Conllu Parsing and Searching

From Apertium
Latest revision as of 05:45, 22 December 2017

Parse and Search through a conllu file

Searching is as follows:


To search for a relation between two words, put a '>' between them. Use '<' instead if you are searching for a word that is a dependent of another word; with '<', the word returned is the dependent. For instance, to see where 'have' takes 'clue' as a dependent (as in "I have no clue"), you could search like this:

For example python conlluparse.py "text.conllu" 'have>clue' might output:

Token: 2, Form: have, Lemma: have, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = 2, Sentence:  I have no clue .
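To make the '>' search concrete, here is a minimal Python sketch of what a query such as 'have>clue' has to do: read the standard 10-column CoNLL-U format, then return each token whose form is 'have' and that is the HEAD of a token whose form is 'clue'. It is not the source of conlluparse.py; the functions read_conllu and search_head_dep and the sample sentence are hypothetical.

```python
# Hypothetical sketch of a 'have>clue' search; not the conlluparse.py source.
FIELDS = ["id", "form", "lemma", "upostag", "xpostag",
          "feats", "head", "deprel", "deps", "misc"]  # the 10 CoNLL-U columns

# Hand-made sample sentence in CoNLL-U format (tab-separated columns).
SAMPLE = "\n".join([
    "# sent_id = 2",
    "# text = I have no clue .",
    "1\tI\tI\tPRON\tPRP\t_\t2\tnsubj\t_\t_",
    "2\thave\thave\tVERB\tVBP\t_\t0\troot\t_\t_",
    "3\tno\tno\tDET\tDT\t_\t4\tdet\t_\t_",
    "4\tclue\tclue\tNOUN\tNN\t_\t2\tobj\t_\t_",
    "5\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_",
    "",
])


def read_conllu(text):
    """Return a list of sentences, each a list of token dicts."""
    sentences, tokens = [], []
    for line in text.splitlines():
        if not line.strip():             # a blank line ends a sentence
            if tokens:
                sentences.append(tokens)
                tokens = []
        elif not line.startswith("#"):   # skip comment lines such as sent_id
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:
                continue                 # skip multiword ranges and empty nodes
            tokens.append(dict(zip(FIELDS, cols)))
    if tokens:                           # the file may not end with a blank line
        sentences.append(tokens)
    return sentences


def search_head_dep(sentences, head_form, dep_form):
    """Tokens with FORM head_form that are the HEAD of a token with FORM dep_form."""
    hits = []
    for tokens in sentences:
        for tok in tokens:
            if tok["form"] == head_form and any(
                    dep["head"] == tok["id"] and dep["form"] == dep_form
                    for dep in tokens):
                hits.append(tok)
    return hits


hits = search_head_dep(read_conllu(SAMPLE), "have", "clue")
for tok in hits:
    print(tok["id"], tok["form"], tok["upostag"], tok["head"], tok["deprel"])
# prints: 2 have VERB 0 root
```

The token returned is the head ('have'), which matches the example output above.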

You can also make your searches more specific or more general. When you do, write the word you are searching for as "form=" followed by the word (e.g. 'form=have'). If nothing is specified on one side, put 'none=none' on that side. For instance, to find every word that is a dependent of 'have', you could search for:

none=none<form=have
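The '<' direction and the 'none=none' wildcard can be sketched the same way. In this hypothetical Python sketch, each side of the query becomes a dict of constraints, 'none=none' becomes an empty dict that matches every token, and the token returned is always the one on the left: with '>' it must head a token matching the right side, with '<' its HEAD must match the right side. That reading is inferred from the examples on this page ('have>clue' returns 'have', 'form=clue<none=none' returns 'clue'); the case-insensitive comparison is also an assumption, based on 'upostag=verb' matching VERB. The names parse_side, matches and search are illustrative, and only one attribute per side is handled here.

```python
# Hypothetical sketch of 'A>B' / 'A<B' evaluation; not the conlluparse.py source.
KEYS = ("id", "form", "lemma", "upostag", "head", "deprel")
ROWS = [  # hand-made sample: "I have no clue ."
    ("1", "I", "I", "PRON", "2", "nsubj"),
    ("2", "have", "have", "VERB", "0", "root"),
    ("3", "no", "no", "DET", "4", "det"),
    ("4", "clue", "clue", "NOUN", "2", "obj"),
    ("5", ".", ".", "PUNCT", "2", "punct"),
]
TOKENS = [dict(zip(KEYS, row)) for row in ROWS]


def parse_side(side):
    """'form=have' -> {'form': 'have'}; 'none=none' -> {} (matches anything)."""
    key, _, value = side.strip().partition("=")
    key = key.strip().lower()
    return {} if key == "none" else {key: value.strip()}


def matches(tok, constraints):
    """Case-insensitive check that every constraint holds for tok."""
    return all(tok[k].lower() == v.lower() for k, v in constraints.items())


def search(tokens, query):
    """Return left-hand tokens of 'A>B' (A heads B) or 'A<B' (A depends on B)."""
    op = ">" if ">" in query else "<"
    left, right = (parse_side(s) for s in query.split(op, 1))
    by_id = {t["id"]: t for t in tokens}
    hits = []
    for tok in tokens:
        if not matches(tok, left):
            continue
        if op == ">":
            ok = any(d["head"] == tok["id"] and matches(d, right) for d in tokens)
        else:
            head = by_id.get(tok["head"])  # None for the root (HEAD 0)
            ok = head is not None and matches(head, right)
        if ok:
            hits.append(tok)
    return hits


print([t["form"] for t in search(TOKENS, "none=none<form=have")])  # ['I', 'clue', '.']
print([t["form"] for t in search(TOKENS, "form=clue<none=none")])  # ['clue']
```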

To search with attributes (e.g. UPOSTAG), list them as comma-separated pairs, like this:

python conlluparse.py "text.conllu" 'upostag=verb, form=have>form=clue' which may output:

Token: 2, Form: have, Lemma: have, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = 2, Sentence:  I have no clue .

You can search with any of these tags: upostag, xpostag, lemma, or deprel. Write the tag name, then '=', then the value, joined directly, e.g. 'upostag=noun' or 'lemma=clue'. A deprel can also be given as '@' followed by its name, e.g. '@root'.

You can also specify an attribute such as 'upostag=noun' instead of 'form=clue', which may output:

Token: 2, Form: have, Lemma: have, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = 2, Sentence:  I have no clue .
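Parsing one side of a query into constraints might look like the following Python sketch. It assumes that attributes are separated by commas, that each is written tag=value with one of the tag names listed above (plus form), that '@x' is shorthand for deprel=x (as in '@root' and '@obj' elsewhere on this page), and that 'none=none' adds no constraint. The function name parse_side and the ValueError for unknown tags are illustrative, not taken from conlluparse.py.

```python
# Hypothetical parser for one side of a query; not the conlluparse.py source.
TAGS = {"form", "lemma", "upostag", "xpostag", "deprel"}


def parse_side(side):
    """Turn 'tag=value, @deprel, ...' into a dict of tag -> value."""
    constraints = {}
    for part in side.split(","):
        part = part.strip()
        if not part:
            continue
        if part.startswith("@"):          # assumption: '@obj' means deprel=obj
            constraints["deprel"] = part[1:]
            continue
        tag, sep, value = part.partition("=")
        tag = tag.strip().lower()
        if tag == "none":                 # 'none=none' leaves this side open
            continue
        if not sep or tag not in TAGS:
            raise ValueError(f"unrecognised attribute: {part!r}")
        constraints[tag] = value.strip()
    return constraints


query = "upostag=verb, form=have>form=clue"
left, right = query.split(">", 1)
print(parse_side(left), parse_side(right))
# {'upostag': 'verb', 'form': 'have'} {'form': 'clue'}
print(parse_side("@root, upostag=noun"), parse_side("none=none"))
# {'deprel': 'root', 'upostag': 'noun'} {}
```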

If you instead search with python conlluparse.py "text.conllu" 'form=clue<none=none', it will print:

Token: 4, Form: clue, Lemma: clue, UPOSTAG: NOUN, HEAD: 2, DEPREL: obj, # sent_id = 2, Sentence:  I have no clue .

You can search for a word with a specific deprel and upostag like this:

@root, upostag=noun>none=none

You can search for relationships between a deprel and a part of speech like this:

@nsubj>upostag=noun

You can search for a plain word like this:

form=have>none=none

You can also run very simple searches without '>' or '<', such as python conlluparse.py "text.conllu" "lemma=Еуровидение,form=Еуровидениенің".
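A query with no '>' or '<' only has to filter tokens on their own attributes. Below is a minimal Python sketch, assuming the comma-separated tag=value syntax above and case-insensitive matching; filter_tokens is a hypothetical name, not a conlluparse.py function, and the token rows are copied from example outputs on this page.

```python
# Hypothetical sketch: a search with no '>' or '<' is a plain attribute filter.
TOKENS = [  # rows taken from the example outputs on this page
    {"form": "have", "lemma": "have", "upostag": "VERB", "deprel": "root"},
    {"form": "clue", "lemma": "clue", "upostag": "NOUN", "deprel": "obj"},
    {"form": "іліп", "lemma": "іл", "upostag": "VERB", "deprel": "root"},
]


def filter_tokens(tokens, query):
    """Keep tokens matching every 'tag=value' pair in a comma-separated query."""
    pairs = [part.split("=", 1) for part in query.split(",") if part.strip()]
    wanted = {tag.strip().lower(): value.strip() for tag, value in pairs}
    # casefold() makes the comparison case-insensitive, including for Cyrillic
    return [t for t in tokens
            if all(t[tag].casefold() == value.casefold()
                   for tag, value in wanted.items())]


print([t["form"] for t in filter_tokens(TOKENS, "lemma=іл,form=іліп")])  # ['іліп']
print([t["form"] for t in filter_tokens(TOKENS, "upostag=verb")])        # ['have', 'іліп']
```

Because Python strings are Unicode, Kazakh queries such as the Еуровидение example need no special handling in a sketch like this.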

Examples

For example, python conlluparse.py "text.conllu" "none=none>none=none, @obj" could output:

Token: 6, Form: іліп, Lemma: іл, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = Ер_Төстік.tagged.txt:23:396, Sentence:  Сөйткенде Төстіктің бір бақайы өрмекті іліп кетеді .
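The result lines all share one layout. A hypothetical Python sketch of producing it is below; the field order is copied from the output above, while the helper name format_hit and passing the sentence text in as an argument (rather than reading it from the file's '# text' comment) are assumptions, not how conlluparse.py is known to work.

```python
# Hypothetical sketch of the result-line layout; not the conlluparse.py source.
def format_hit(tok, sent_id, sentence):
    """Render one matching token in the 'Token: ..., Form: ...' layout."""
    return (f"Token: {tok['id']}, Form: {tok['form']}, Lemma: {tok['lemma']}, "
            f"UPOSTAG: {tok['upostag']}, HEAD: {tok['head']}, "
            f"DEPREL: {tok['deprel']}, # sent_id = {sent_id}, Sentence: {sentence}")


tok = {"id": "6", "form": "іліп", "lemma": "іл", "upostag": "VERB",
       "head": "0", "deprel": "root"}
line = format_hit(tok, "Ер_Төстік.tagged.txt:23:396",
                  "Сөйткенде Төстіктің бір бақайы өрмекті іліп кетеді .")
print(line)
```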