Conllu Parsing and Searching
Revision as of 04:37, 10 December 2017
Parse and Search through a conllu file
Searching is as follows:
The '<' character
To find a specific word (e.g. the word 'bread') in your CoNLL-U file:
Start your search with a '<'.
Then write the word after the '<' (e.g. '<ести').
This will print your answer in this format:
'Token: 6, Form: ести, Lemma: есті, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = story.tagged.txt:44:776, Sentence: Ол енді ол дыбысты анығырақ ести бастады .'
This gives you the token ID (the word's position in the sentence), the form, lemma, UPOSTAG (part of speech), HEAD, DEPREL, the sentence ID, and the full sentence.
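The '<' search described above can be sketched roughly as follows. This is a minimal illustration, not the actual code of conlluparse.py: the names read_sentences, format_hit and search_form, and the inline SAMPLE data, are hypothetical. It assumes the standard 10-column, tab-separated CoNLL-U layout and an exact match on the FORM column.

```python
# Hypothetical sketch of a '<' (word form) search; not the real conlluparse.py.
SAMPLE = """\
# sent_id = 2
# text = I have no clue .
1\tI\tI\tPRON\t_\t_\t2\tnsubj\t_\t_
2\thave\thave\tVERB\t_\t_\t0\troot\t_\t_
3\tno\tno\tDET\t_\t_\t4\tdet\t_\t_
4\tclue\tclue\tNOUN\t_\t_\t2\tobj\t_\t_
5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_
"""

def read_sentences(text):
    """Yield (sent_id, tokens) per sentence; each token is a dict of columns."""
    sent_id, tokens = None, []
    for line in text.splitlines():
        if line.startswith("# sent_id"):
            sent_id = line.split("=", 1)[1].strip()
        elif line.startswith("#"):
            continue  # other comment lines are ignored in this sketch
        elif not line.strip():
            if tokens:  # a blank line ends the current sentence
                yield sent_id, tokens
            sent_id, tokens = None, []
        else:
            cols = line.split("\t")
            if not cols[0].isdigit():
                continue  # skip multiword ranges (1-2) and empty nodes (1.1)
            tokens.append({"id": cols[0], "form": cols[1], "lemma": cols[2],
                           "upostag": cols[3], "head": cols[6], "deprel": cols[7]})
    if tokens:
        yield sent_id, tokens

def format_hit(tok, sent_id, tokens):
    """Build one result line in the format shown in this page."""
    sentence = " ".join(t["form"] for t in tokens)
    return ("Token: {id}, Form: {form}, Lemma: {lemma}, UPOSTAG: {upostag}, "
            "HEAD: {head}, DEPREL: {deprel}, ".format(**tok)
            + "# sent_id = {}, Sentence: {}".format(sent_id, sentence))

def search_form(text, query):
    """Return formatted hits for a query such as '<have'."""
    word = query[1:] if query.startswith("<") else query
    hits = []
    for sent_id, tokens in read_sentences(text):
        for tok in tokens:
            if tok["form"] == word:
                hits.append(format_hit(tok, sent_id, tokens))
    return hits

for hit in search_form(SAMPLE, "<have"):
    print(hit)
```

Whether the real program matches case-sensitively or also matches lemmas is not stated here, so the exact comparison is an assumption.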
The '{' character
To search by tree relation (i.e. to search for a word by its HEAD, given as attribute values or as a word):
Start your search with a '{'.
Then put a '>' between the two words whose relation you are searching for.
For instance, to find where 'have' acts on 'clue' (as in 'I have no clue'), you would use this character.
An example entry would be '{have>clue'
You can also make the search more specific or more ambiguous.
To search with attributes (e.g. UPOSTAG), write:
'{upostag=verb, form=have>form=clue'
PLEASE NOTE THAT WHEN YOU SPECIFY EXTRA ATTRIBUTES YOU HAVE TO USE THE 'form=' ARGUMENT FOR THE WORD
If you wanted to specify nothing for the first word and look for any word that acts on 'clue', you would use:
'{none=none>form=clue}'
PLEASE NOTE THAT YOU HAVE TO HAVE 'NONE=NONE' WHERE NOTHING IS SPECIFIED
Instead of 'form=clue' you can also specify other attributes, such as 'upostag=noun'.
Example Output:
Token: 2, Form: have, Lemma: have, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = 2, Sentence: I have no clue .
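One way to read the '{' query syntax is sketched below. The helper names (parse_side, parse_tree_query, search_tree) and the hard-coded TOKENS list are hypothetical, and the sketch rests on several assumptions not confirmed by this page: the left side of '>' describes the head and the right side its dependent, a bare word means 'form=word', 'none=none' means no constraint, matching is case-insensitive, and a trailing '}' is optional.

```python
# Hypothetical sketch of the '{' (tree) search; not the real conlluparse.py.
TOKENS = [  # "I have no clue ." as already-parsed CoNLL-U columns
    {"id": "1", "form": "I", "lemma": "I", "upostag": "PRON", "head": "2", "deprel": "nsubj"},
    {"id": "2", "form": "have", "lemma": "have", "upostag": "VERB", "head": "0", "deprel": "root"},
    {"id": "3", "form": "no", "lemma": "no", "upostag": "DET", "head": "4", "deprel": "det"},
    {"id": "4", "form": "clue", "lemma": "clue", "upostag": "NOUN", "head": "2", "deprel": "obj"},
    {"id": "5", "form": ".", "lemma": ".", "upostag": "PUNCT", "head": "2", "deprel": "punct"},
]

def parse_side(side):
    """Turn 'upostag=verb, form=have' (or a bare word) into a constraint dict."""
    constraints = {}
    for part in side.split(","):
        part = part.strip()
        if "=" not in part:
            constraints["form"] = part  # bare word: treated as form=word
            continue
        key, value = (p.strip().lower() for p in part.split("=", 1))
        if key != "none":  # 'none=none' adds no constraint
            constraints[key] = value
    return constraints

def parse_tree_query(query):
    """Split '{head>dependent' into two constraint dicts."""
    body = query.lstrip("{").rstrip("}")
    head_side, dep_side = body.split(">", 1)
    return parse_side(head_side), parse_side(dep_side)

def matches(tok, constraints):
    """True if the token satisfies every attribute constraint."""
    return all(tok.get(k, "").lower() == v.lower() for k, v in constraints.items())

def search_tree(tokens, query):
    """Return head tokens that have a dependent matching the query."""
    head_c, dep_c = parse_tree_query(query)
    by_id = {t["id"]: t for t in tokens}
    hits = []
    for dep in tokens:
        head = by_id.get(dep["head"])  # None when the token is the root
        if head and matches(head, head_c) and matches(dep, dep_c):
            hits.append(head)
    return hits

print([t["form"] for t in search_tree(TOKENS, "{upostag=verb, form=have>form=clue")])
```

Returning the head token mirrors the example output above, where the printed token is 'have'.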
The ':' character
To search for a deprel or UPOSTAG together with a feature of a word:
Start your search with a ':' and enclose the search terms in '[]'.
For instance, to search for a copula with a past feature, you would use:
':[cop, past]'
This would find a copula with a past feature and produce output like:
'Token: 3, Form: болғаныма, Lemma: бол, UPOSTAG: AUX, HEAD: 2, DEPREL: cop, # sent_id = akorda-random.tagged.txt:158:2829, Sentence: Мен осында болғаныма қуаныштымын қуанышты мын .'
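The ':' search can be sketched along these lines. The function names and the English sample sentence are invented for illustration. The sketch assumes the first term is compared against both DEPREL and UPOSTAG, and that the feature term matches any value in the FEATS column (for example 'past' matches 'Tense=Past'); the real program may resolve features differently.

```python
# Hypothetical sketch of the ':' (tag + feature) search; not the real conlluparse.py.
TOKENS = [  # invented sample: "She was happy ."
    {"id": "1", "form": "She", "upostag": "PRON", "deprel": "nsubj", "head": "3",
     "feats": "Case=Nom|Number=Sing|Person=3|PronType=Prs"},
    {"id": "2", "form": "was", "upostag": "AUX", "deprel": "cop", "head": "3",
     "feats": "Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin"},
    {"id": "3", "form": "happy", "upostag": "ADJ", "deprel": "root", "head": "0",
     "feats": "Degree=Pos"},
    {"id": "4", "form": ".", "upostag": "PUNCT", "deprel": "punct", "head": "3",
     "feats": "_"},
]

def parse_feature_query(query):
    """Split ':[cop, past]' into ('cop', 'past')."""
    inner = query.lstrip(":").strip().strip("[]")
    tag, feature = (term.strip().lower() for term in inner.split(",", 1))
    return tag, feature

def feature_values(feats):
    """Collect lowercased values from a FEATS string like 'Tense=Past|Mood=Ind'."""
    values = set()
    for pair in feats.split("|"):
        if "=" in pair:  # '_' (no features) has no '=' and is skipped
            values.update(v.lower() for v in pair.split("=", 1)[1].split(","))
    return values

def search_feature(tokens, query):
    """Return forms whose DEPREL or UPOSTAG is the tag and which carry the feature."""
    tag, feature = parse_feature_query(query)
    return [t["form"] for t in tokens
            if tag in (t["deprel"].lower(), t["upostag"].lower())
            and feature in feature_values(t["feats"])]

print(search_feature(TOKENS, ":[cop, past]"))
```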
The ';' character
If you would like to search with a relationship (e.g. an nsubj relation to another node that has a noun POS):
Start your search with a ';'.
Then type a deprel tag followed by a colon and then a part of speech.
The second term (the one after the ':') can also be the lemma or the word id_name.
To search for a word with an nsubj relationship with a noun, you would use:
';nsubj:noun'
Could output:
'Token: 8, Form: жүзімдік, Lemma: жүзімдік, UPOSTAG: NOUN, HEAD: 6, DEPREL: conj, # sent_id = Шымкент.tagged.txt:8:216, Sentence: Тау етегінде өзен бойындағы алқаптарда егіншілік пен жүзімдік ал көгалды таулы жайылымдарда - мал шаруашылығы дамыған .'
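A possible reading of the ';' search is sketched below. This page does not say which token the program prints or exactly how the second term is resolved, so the sketch makes its own assumptions: a token matches when its DEPREL equals the first term and its head's UPOSTAG, lemma or ID equals the second term, and both the dependent and the head are returned. All names and the sample sentence are hypothetical.

```python
# Hypothetical sketch of the ';' (relation) search; not the real conlluparse.py.
TOKENS = [  # invented sample: "Dogs are animals ."
    {"id": "1", "form": "Dogs", "lemma": "dog", "upostag": "NOUN", "head": "3", "deprel": "nsubj"},
    {"id": "2", "form": "are", "lemma": "be", "upostag": "AUX", "head": "3", "deprel": "cop"},
    {"id": "3", "form": "animals", "lemma": "animal", "upostag": "NOUN", "head": "0", "deprel": "root"},
    {"id": "4", "form": ".", "lemma": ".", "upostag": "PUNCT", "head": "3", "deprel": "punct"},
]

def parse_relation_query(query):
    """Split ';nsubj:noun' into ('nsubj', 'noun')."""
    deprel, target = query.lstrip(";").split(":", 1)
    return deprel.strip().lower(), target.strip().lower()

def search_relation(tokens, query):
    """Return (dependent form, head form) pairs that satisfy the query."""
    deprel, target = parse_relation_query(query)
    by_id = {t["id"]: t for t in tokens}
    pairs = []
    for dep in tokens:
        head = by_id.get(dep["head"])  # None when the token is the root
        if head is None or dep["deprel"].lower() != deprel:
            continue
        # The second term may name the head's POS, its lemma, or its ID.
        if target in (head["upostag"].lower(), head["lemma"].lower(), head["id"]):
            pairs.append((dep["form"], head["form"]))
    return pairs

print(search_relation(TOKENS, ";nsubj:noun"))
```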
Example Of How To Use This Program
python conlluparse.py "text.conllu" ':[cop, past]'
python conlluparse.py "text.conllu" ';nsubj:noun'
python conlluparse.py "text.conllu" '{none=none>form=bread}'
python conlluparse.py "text.conllu" '<bread'
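Since each search type is selected by the first character of the query argument, the dispatch step might look something like the sketch below. The SEARCH_KINDS table and classify_query function are hypothetical names, and the real program may organise this differently.

```python
# Hypothetical sketch of choosing a search type from the query's first character.
SEARCH_KINDS = {
    "<": "form",      # '<bread'
    "{": "tree",      # '{none=none>form=bread}'
    ":": "feature",   # ':[cop, past]'
    ";": "relation",  # ';nsubj:noun'
}

def classify_query(query):
    """Return (search kind, rest of the query) or raise ValueError."""
    if not query or query[0] not in SEARCH_KINDS:
        raise ValueError("query must start with one of: " + " ".join(SEARCH_KINDS))
    return SEARCH_KINDS[query[0]], query[1:]

for example in ["<bread", "{none=none>form=bread}", ":[cop, past]", ";nsubj:noun"]:
    print(example, "->", classify_query(example))
```

The single quotes around the query in the commands above keep the shell from interpreting characters such as '<', '>', ';' and '{'.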