Difference between revisions of "Conllu Parsing and Searching"
Latest revision as of 05:45, 22 December 2017
Parse and Search through a conllu file
Searching is as follows:
These are the terms for searching. Between the two words whose relation you are searching for, add a '>'. You can also use a '<' if you are searching for a word that is a dependent of another word; the '<' will find the dependent word. For instance, if you wanted to see when 'have' acts on 'clue' (e.g. 'I have no clue'), you could do it like this:
For example, python conlluparse.py "text.conllu" 'have>clue' might output:
Token: 2, Form: have, Lemma: have, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = 2, Sentence: I have no clue .
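To make the head>dependent idea concrete, here is a minimal Python sketch of how such a search could work on the ten tab-separated CoNLL-U columns. It is not the code of conlluparse.py: the names read_tokens and find_heads are hypothetical, and the sample annotation (for instance the det relation on 'no') is assumed for illustration.

```python
# Hypothetical sketch, NOT the actual conlluparse.py implementation.
# Sample sentence; the annotation values are assumed for illustration.
SAMPLE = """\
# sent_id = 2
# text = I have no clue .
1\tI\tI\tPRON\t_\t_\t2\tnsubj\t_\t_
2\thave\thave\tVERB\t_\t_\t0\troot\t_\t_
3\tno\tno\tDET\t_\t_\t4\tdet\t_\t_
4\tclue\tclue\tNOUN\t_\t_\t2\tobj\t_\t_
5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_
"""


def read_tokens(block):
    """Return a list of token dicts for one CoNLL-U sentence block."""
    tokens = []
    for line in block.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        tokens.append({
            "id": int(cols[0]), "form": cols[1], "lemma": cols[2],
            "upostag": cols[3], "xpostag": cols[4],
            "head": int(cols[6]) if cols[6].isdigit() else None,
            "deprel": cols[7],
        })
    return tokens


def find_heads(tokens, head_form, dep_form):
    """Return tokens with form head_form that govern a token with form dep_form."""
    by_id = {t["id"]: t for t in tokens}
    hits = []
    for dep in tokens:
        head = by_id.get(dep["head"])  # None for the root (HEAD = 0)
        if head and head["form"] == head_form and dep["form"] == dep_form:
            hits.append(head)
    return hits


for token in find_heads(read_tokens(SAMPLE), "have", "clue"):
    print(token["id"], token["form"])
```

The lookup goes from each dependent to its head through the HEAD column, which is why the root (HEAD 0) never appears as a dependent of anything.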
If you wanted, you could also be more specific or more ambiguous with your searches. When you specify such arguments, you must prefix the word you are searching for with 'form='. When nothing is specified on one side, you need to put 'none=none' on that side. For instance, if you wanted to find anything that is a dependent of 'have', you could do:
 none=none<form=have 
When searching with attributes (e.g. UPOSTAG), you could do this like:
python conlluparse.py "text.conllu" 'upostag=verb, form=have>form=clue' which may output:
Token: 2, Form: have, Lemma: have, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = 2, Sentence: I have no clue .
You can search with any of these tags: upostag, xpostag, lemma, or deprel. To do this, concatenate the tag name, an '=', and the value, like 'upostag=noun' or 'lemma=clue'; a deprel is written with a leading '@', as in '@root'.
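The query strings on this page all follow the same shape: comma-separated terms on each side of an optional '>' or '<'. A rough Python sketch of how such a string could be split into constraints is given here. The function names parse_side and parse_query are hypothetical, and the rules (treating 'none=none' as "no constraint" and a bare word as a form) are assumptions inferred from the examples, not the program's confirmed behaviour.

```python
import re


def parse_side(text):
    """Turn one side of a query, e.g. 'upostag=verb, form=have', into a dict."""
    constraints = {}
    for term in text.split(","):
        term = term.strip()
        if not term or term.lower() == "none=none":
            continue  # assumed: 'none=none' means "nothing specified"
        if term.startswith("@"):
            constraints["deprel"] = term[1:]  # '@obj' -> deprel obj
        elif "=" in term:
            key, value = term.split("=", 1)
            constraints[key.strip().lower()] = value.strip()
        else:
            constraints["form"] = term  # assumed: bare word is a form
    return constraints


def parse_query(query):
    """Return (left constraints, operator or None, right constraints or None)."""
    match = re.search(r"[<>]", query)
    if match is None:
        return parse_side(query), None, None  # simple search, no relation
    op = match.group()
    left, right = query.split(op, 1)
    return parse_side(left), op, parse_side(right)


print(parse_query("upostag=verb, form=have>form=clue"))
```

Splitting on the operator first and on commas second keeps a term such as '@obj' attached to whichever side of the '>' or '<' it was written on.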
You can also specify attributes such as 'upostag=noun' instead of 'form=clue':
Token: 2, Form: have, Lemma: have, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = 2, Sentence: I have no clue .
Now, if you instead search with python conlluparse.py "text.conllu" 'form=clue<none=none', it will print:
Token: 4, Form: clue, Lemma: clue, UPOSTAG: NOUN, HEAD: 2, DEPREL: obj, # sent_id = 2, Sentence: I have no clue .
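The difference between '>' and '<' can be sketched in Python as follows. In the examples on this page the token printed is always the one described on the left of the operator: the head for '>', the dependent for '<'. The names matches and search, the case-insensitive comparison, and the sample token list are assumptions for illustration, not the program's actual code.

```python
# Hypothetical sketch; token values are assumed for illustration.
TOKENS = [
    {"id": 1, "form": "I", "upostag": "PRON", "head": 2, "deprel": "nsubj"},
    {"id": 2, "form": "have", "upostag": "VERB", "head": 0, "deprel": "root"},
    {"id": 3, "form": "no", "upostag": "DET", "head": 4, "deprel": "det"},
    {"id": 4, "form": "clue", "upostag": "NOUN", "head": 2, "deprel": "obj"},
    {"id": 5, "form": ".", "upostag": "PUNCT", "head": 2, "deprel": "punct"},
]


def matches(token, constraints):
    """True if every constraint holds; an empty dict ('none=none') matches all."""
    return all(str(token.get(key, "")).lower() == value.lower()
               for key, value in constraints.items())


def search(tokens, left, op, right):
    """Return the left-hand tokens of every head/dependent pair that matches."""
    by_id = {t["id"]: t for t in tokens}
    hits = []
    for dep in tokens:
        head = by_id.get(dep["head"])
        if head is None:
            continue  # the root has no head token
        if op == ">" and matches(head, left) and matches(dep, right):
            hit = head  # left side describes the head
        elif op == "<" and matches(dep, left) and matches(head, right):
            hit = dep   # left side describes the dependent
        else:
            continue
        if hit not in hits:
            hits.append(hit)
    return hits


# 'form=clue<none=none': clue as a dependent of anything
print([t["form"] for t in search(TOKENS, {"form": "clue"}, "<", {})])
```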
You can search for a word with a specific deprel or upostag like:
 @root, upostag=noun>none=none 
You can search for relationships, such as an nsubj relation to a noun, like:
 @nsubj>upostag=noun 
You can search for a plain word like:
 form=have>none=none 
You can do very simple searches like python conlluparse.py "text.conllu" "lemma=Еуровидение,form=Еуровидениенің" without the '>' or '<'.
Examples
python conlluparse.py "text.conllu" "none=none>none=none, @obj"
This is how you would run the program. It could output:
Token: 6, Form: іліп, Lemma: іл, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = Ер_Төстік.tagged.txt:23:396, Sentence: Сөйткенде Төстіктің бір бақайы өрмекті іліп кетеді .
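The result lines on this page share one layout, which a short Python helper can reproduce. This is only a sketch: format_hit and the dict keys are hypothetical names, and the real program may build its output differently.

```python
def format_hit(token, sent_id, sentence):
    """Format one matched token in the layout of the sample outputs."""
    return (
        "Token: {id}, Form: {form}, Lemma: {lemma}, UPOSTAG: {upostag}, "
        "HEAD: {head}, DEPREL: {deprel}, # sent_id = {sent_id}, "
        "Sentence: {sentence}"
    ).format(sent_id=sent_id, sentence=sentence, **token)


# Values taken from the 'have>clue' example output on this page.
hit = {"id": 2, "form": "have", "lemma": "have",
       "upostag": "VERB", "head": 0, "deprel": "root"}
print(format_hit(hit, "2", "I have no clue ."))
```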

