Conllu Parsing and Searching
Revision as of 01:11, 14 December 2017
Parse and Search through a conllu file
Searching is as follows:
Form search: the '<' character
If you want to find a specific word (e.g., the word "bread") in your CoNLL-U file, start the search term with the '<'
character, followed by the word you want to search for.
For example, the search term python conlluparse.py "text.conllu" '<ести'
might return:
Token: 6, Form: ести, Lemma: есті, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = story.tagged.txt:44:776, Sentence: Ол енді ол дыбысты анығырақ ести бастады .
Each result shows the Token (the position in the sentence where the match appeared), the form, the lemma, the UPOSTAG
(part of speech), the HEAD, the DEPREL, the sent_id, and the full sentence.
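As a rough illustration of what a form search involves, here is a minimal Python sketch. It is not the actual conlluparse.py code: the names parse_conllu, format_hit and search_form are hypothetical, and the sample sentence is built in memory instead of being read from a file.

```python
# Hypothetical sketch of a form search; not the real conlluparse.py.
FIELDS = ["id", "form", "lemma", "upostag", "xpostag",
          "feats", "head", "deprel", "deps", "misc"]

# CoNLL-U columns are tab-separated; spaces are used here only for readability.
ROWS = [
    "1 I I PRON _ _ 2 nsubj _ _",
    "2 have have VERB _ _ 0 root _ _",
    "3 no no DET _ _ 4 det _ _",
    "4 clue clue NOUN _ _ 2 obj _ _",
    "5 . . PUNCT _ _ 2 punct _ _",
]
SAMPLE = "# sent_id = 2\n" + "\n".join(r.replace(" ", "\t") for r in ROWS) + "\n"


def parse_conllu(text):
    """Yield (sent_id, tokens) pairs; each token is a dict keyed by FIELDS."""
    sent_id, tokens = None, []
    for line in text.splitlines():
        if not line.strip():              # a blank line ends a sentence
            if tokens:
                yield sent_id, tokens
            sent_id, tokens = None, []
        elif line.startswith("#"):        # comment line, may carry the sent_id
            if line.startswith("# sent_id"):
                sent_id = line.split("=", 1)[1].strip()
        else:
            cols = line.split("\t")
            # Keep plain word lines only; skip ranges (1-2) and empty nodes (1.1).
            if len(cols) == 10 and cols[0].isdigit():
                tokens.append(dict(zip(FIELDS, cols)))
    if tokens:
        yield sent_id, tokens


def format_hit(tok, sent_id, tokens):
    """Format one match in the same layout as the results shown above."""
    sentence = " ".join(t["form"] for t in tokens)
    return ("Token: {id}, Form: {form}, Lemma: {lemma}, UPOSTAG: {upostag}, "
            "HEAD: {head}, DEPREL: {deprel}, ".format(**tok)
            + "# sent_id = {}, Sentence: {}".format(sent_id, sentence))


def search_form(text, form):
    """Return a formatted line for every token whose form equals `form`."""
    return [format_hit(tok, sent_id, tokens)
            for sent_id, tokens in parse_conllu(text)
            for tok in tokens if tok["form"] == form]


for hit in search_form(SAMPLE, "clue"):
    print(hit)
```

The sketch matches forms exactly and case-sensitively; whether the real script does the same is an assumption.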
Tree search: The '{' character
If you would like to search with a tree (i.e., you want to search for a word based on its HEAD relation to another word), start your search with a '{'. Then put a '>' between the two words whose relation you are searching for; this finds the head word. You can also use a '<' if you are searching for a word that is a dependent of another word; this finds the dependent word. For instance, if you wanted to see where 'clue' is a dependent of 'have' (e.g., 'I have no clue'), you would use the '>' character.
For example python conlluparse.py "text.conllu" '{have>clue'
might output:
Token: 2, Form: have, Lemma: have, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = 2, Sentence: I have no clue .
If you wanted, you could also be more specific or more ambiguous with your searches. To search with attributes (e.g., UPOSTAG), you could write:
python conlluparse.py "text.conllu" '{upostag=verb, form=have>form=clue'
which may output:
Token: 2, Form: have, Lemma: have, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = 2, Sentence: I have no clue .
You can search with any of these tags: upostag, xpostag, lemma, or @rel (deprel). To do so, write the tag name, then '=', and then the actual value, like 'upostag=noun' or 'lemma=clue'.
PLEASE NOTE: when you specify extra attributes, you have to use the 'form=' argument for the word.
PLEASE NOTE: you have to put 'none=none' on any side of the search where nothing is specified.
Instead of 'form=clue', you can also specify other attributes, such as 'upostag=noun'.
Token: 2, Form: have, Lemma: have, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = 2, Sentence: I have no clue .
If you instead search with python conlluparse.py "text.conllu" '{form=clue<none=none', it will print:
Token: 4, Form: clue, Lemma: clue, UPOSTAG: NOUN, HEAD: 2, DEPREL: obj, # sent_id = 2, Sentence: I have no clue .
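The tree queries above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the script's real implementation: parse_side, matches and tree_search are hypothetical names, a bare word is treated as form=word, 'none=none' is treated as "no constraint", and values are compared case-insensitively.

```python
# Hypothetical sketch of the '{' tree search; not the real conlluparse.py.
# One sentence, already split into the CoNLL-U columns needed here.
SENTENCE = [
    {"id": "1", "form": "I", "lemma": "I", "upostag": "PRON", "head": "2", "deprel": "nsubj"},
    {"id": "2", "form": "have", "lemma": "have", "upostag": "VERB", "head": "0", "deprel": "root"},
    {"id": "3", "form": "no", "lemma": "no", "upostag": "DET", "head": "4", "deprel": "det"},
    {"id": "4", "form": "clue", "lemma": "clue", "upostag": "NOUN", "head": "2", "deprel": "obj"},
    {"id": "5", "form": ".", "lemma": ".", "upostag": "PUNCT", "head": "2", "deprel": "punct"},
]


def parse_side(spec):
    """Turn 'upostag=verb, form=have' into {'upostag': 'verb', 'form': 'have'}."""
    conditions = {}
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        if not sep:                       # bare word: assume it is a form
            key, value = "form", part
        key, value = key.strip().lower(), value.strip()
        if key == "none":                 # 'none=none' adds no constraint
            continue
        conditions[key] = value
    return conditions


def matches(tok, conditions):
    """True if the token satisfies every attribute=value condition."""
    return all(tok.get(key, "").lower() == value.lower()
               for key, value in conditions.items())


def tree_search(tokens, query):
    """'{A>B' returns heads matching A that have a dependent matching B;
    '{A<B' returns dependents matching A whose head matches B."""
    body = query.lstrip("{").rstrip("}")
    op = ">" if ">" in body else "<"
    left, right = (parse_side(side) for side in body.split(op, 1))
    by_id = {tok["id"]: tok for tok in tokens}
    hits = []
    for tok in tokens:
        if not matches(tok, left):
            continue
        if op == ">":
            found = any(dep["head"] == tok["id"] and matches(dep, right)
                        for dep in tokens)
        else:
            head = by_id.get(tok["head"])  # None for the root (HEAD 0)
            found = head is not None and matches(head, right)
        if found:
            hits.append(tok)
    return hits


for query in ("{have>clue", "{upostag=verb, form=have>form=clue",
              "{form=clue<none=none"):
    print(query, [tok["form"] for tok in tree_search(SENTENCE, query)])
```

In this sketch the token on the left of the operator is always the one returned, which agrees with the sample outputs shown above.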
Find Features: The ':' character
If you would like to search for a word with a given deprel or upostag and a given feature, start your search with a ':' and enclose your search terms in '[]'.
For instance, if you wanted to search for a copula with a past feature, you would run:
python conlluparse.py "text.conllu" ':[cop, past]'
This would find a copula with a past feature and have an output like:
'Token: 3, Form: болғаныма, Lemma: бол, UPOSTAG: AUX, HEAD: 2, DEPREL: cop, # sent_id = akorda-random.tagged.txt:158:2829, Sentence: Мен осында болғаныма қуаныштымын қуанышты мын .'
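A feature search of this kind could be sketched as below. Everything here is an assumption made for illustration: the names parse_feats and feature_search are hypothetical, the English sample tokens are invented, and the sketch assumes that the first term is compared against DEPREL and UPOSTAG while the second term is compared against the values in the FEATS column (e.g. Tense=Past).

```python
# Hypothetical sketch of the ':' feature search; not the real conlluparse.py.
# Invented sample: "She was happy", with FEATS in the usual Name=Value|... form.
TOKENS = [
    {"id": "1", "form": "She", "upostag": "PRON", "head": "3", "deprel": "nsubj",
     "feats": "Case=Nom|Number=Sing|Person=3"},
    {"id": "2", "form": "was", "upostag": "AUX", "head": "3", "deprel": "cop",
     "feats": "Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin"},
    {"id": "3", "form": "happy", "upostag": "ADJ", "head": "0", "deprel": "root",
     "feats": "Degree=Pos"},
]


def parse_feats(feats):
    """'Tense=Past|VerbForm=Fin' -> {'tense': 'past', 'verbform': 'fin'}."""
    if feats in ("_", ""):                # '_' marks an empty FEATS column
        return {}
    pairs = (item.split("=", 1) for item in feats.split("|"))
    return {name.lower(): value.lower() for name, value in pairs}


def feature_search(tokens, query):
    """':[cop, past]' -> tokens tagged 'cop' that carry a 'past' feature value."""
    terms = [term.strip().lower()
             for term in query.lstrip(":").strip("[]").split(",")]
    tag, feature = terms[0], terms[1]
    hits = []
    for tok in tokens:
        # Assumption: the first term may name either the DEPREL or the UPOSTAG.
        tag_ok = tag in (tok["deprel"].lower(), tok["upostag"].lower())
        feature_ok = feature in parse_feats(tok["feats"]).values()
        if tag_ok and feature_ok:
            hits.append(tok)
    return hits


print([tok["form"] for tok in feature_search(TOKENS, ":[cop, past]")])
```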
Relationships: The ';' character
If you would like to search with a relationship (e.g., an nsubj relation to another node that has a noun part of speech), start your search with a ';'. Then type a deprel tag, followed by a colon, and then a part of speech. The second term (the one after the colon) can also be a lemma or a word ID. To search for a word with an nsubj relationship with a noun, you would use:
python conlluparse.py "text.conllu" ';nsubj:noun'
This might output:
'Token: 8, Form: жүзімдік, Lemma: жүзімдік, UPOSTAG: NOUN, HEAD: 6, DEPREL: conj, # sent_id = Шымкент.tagged.txt:8:216, Sentence: Тау етегінде өзен бойындағы алқаптарда егіншілік пен жүзімдік ал көгалды таулы жайылымдарда - мал шаруашылығы дамыған .'
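One possible reading of the relationship search is sketched below. The name relation_search and the invented English sample are hypothetical, and the sketch assumes that the deprel belongs to the dependent while the second term describes its head (by part of speech, lemma, or ID). Because the description does not make clear which of the two nodes the script prints, the sketch returns both.

```python
# Hypothetical sketch of the ';' relationship search; not the real conlluparse.py.
# Invented sample: "She is a teacher", where the nsubj attaches to a noun.
TOKENS = [
    {"id": "1", "form": "She", "lemma": "she", "upostag": "PRON", "head": "4", "deprel": "nsubj"},
    {"id": "2", "form": "is", "lemma": "be", "upostag": "AUX", "head": "4", "deprel": "cop"},
    {"id": "3", "form": "a", "lemma": "a", "upostag": "DET", "head": "4", "deprel": "det"},
    {"id": "4", "form": "teacher", "lemma": "teacher", "upostag": "NOUN", "head": "0", "deprel": "root"},
]


def relation_search(tokens, query):
    """';nsubj:noun' -> (dependent, head) pairs where the dependent has the
    given deprel and its head matches the second term."""
    deprel, _, target = query.lstrip(";").partition(":")
    deprel, target = deprel.strip().lower(), target.strip().lower()
    by_id = {tok["id"]: tok for tok in tokens}
    pairs = []
    for dep in tokens:
        head = by_id.get(dep["head"])     # None when the token is the root
        if head is None or dep["deprel"].lower() != deprel:
            continue
        # The second term may be a part of speech, a lemma, or a token ID.
        if target in (head["upostag"].lower(), head["lemma"].lower(), head["id"]):
            pairs.append((dep, head))
    return pairs


for dep, head in relation_search(TOKENS, ";nsubj:noun"):
    print(dep["form"], "->", head["form"])
```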
Examples
python conlluparse.py "text.conllu" ':[cop, past]'
This is how you would run the program with the ':' character. It could output:
'Token: 3, Form: болғаныма, Lemma: бол, UPOSTAG: AUX, HEAD: 2, DEPREL: cop, # sent_id = akorda-random.tagged.txt:158:2829, Sentence: Мен осында болғаныма қуаныштымын қуанышты мын .'
python conlluparse.py "text.conllu" ';nsubj:noun'
This is how you would run the program with the ';' character. It could output:
'Token: 8, Form: жүзімдік, Lemma: жүзімдік, UPOSTAG: NOUN, HEAD: 6, DEPREL: conj, # sent_id = Шымкент.tagged.txt:8:216, Sentence: Тау етегінде өзен бойындағы алқаптарда егіншілік пен жүзімдік ал көгалды таулы жайылымдарда - мал шаруашылығы дамыған .'
python conlluparse.py "text.conllu" '{none=none>form=clue}'
This is how you would run the program with the '{' character. It could output:
Token: 2, Form: have, Lemma: have, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = 2, Sentence: I have no clue .
python conlluparse.py "text.conllu" '<ести'
This is how you would run the program with the '<' character. It could output:
Token: 6, Form: ести, Lemma: есті, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = story.tagged.txt:44:776, Sentence: Ол енді ол дыбысты анығырақ ести бастады .
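The four examples differ only in the first character of the search term. A small sketch of how a script might pick the search type from that character is shown here; PREFIXES and classify_query are hypothetical names, and the real script may dispatch differently.

```python
# Hypothetical sketch: choose the search type from the first character.
PREFIXES = {
    "<": "form",
    "{": "tree",
    ":": "features",
    ";": "relationship",
}


def classify_query(query):
    """Return (search_type, rest_of_query) or raise ValueError."""
    if not query or query[0] not in PREFIXES:
        raise ValueError("search must start with one of: " + " ".join(PREFIXES))
    return PREFIXES[query[0]], query[1:]


for query in ("<ести", "{have>clue", ":[cop, past]", ";nsubj:noun"):
    print(classify_query(query))
```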