Difference between revisions of "Conllu Parsing and Searching"

From Apertium

Revision as of 04:36, 10 December 2017

Parse and search through a CoNLL-U file

Searching works as follows:

The '<' character

If you want to find a specific word (e.g. you want to find the word 'bread' in your CoNLL-U file):

Start your search with a '<', then write the word after it (e.g. '<ести' to find the word 'ести').

This prints each match in this format:

'Token: 6, Form: ести, Lemma: есті, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = story.tagged.txt:44:776, Sentence: Ол енді ол дыбысты анығырақ ести бастады .'

The output gives the token number (where in the sentence the word appears), the form, the lemma, the UPOSTAG (part of speech), the HEAD, the DEPREL, and the sentence ID, followed by the sentence itself.
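The word search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of conlluparse.py: the names `SAMPLE`, `parse_conllu` and `find_form` are hypothetical, and the sample sentence is hand-written to match the 'I have no clue' example used elsewhere on this page.

```python
# Hypothetical sketch of a '<word' search; not the actual conlluparse.py code.
SAMPLE = "\n".join([
    "# sent_id = 2",
    "# text = I have no clue .",
    "1\tI\tI\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\thave\thave\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tno\tno\tDET\t_\t_\t4\tdet\t_\t_",
    "4\tclue\tclue\tNOUN\t_\t_\t2\tobj\t_\t_",
    "5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
])

# The ten tab-separated columns of a CoNLL-U token line.
COLUMNS = ["id", "form", "lemma", "upostag", "xpostag",
           "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text):
    """Yield (sent_id, tokens) for each blank-line-separated sentence."""
    for block in text.strip().split("\n\n"):
        sent_id, tokens = None, []
        for line in block.splitlines():
            if line.startswith("# sent_id"):
                sent_id = line.split("=", 1)[1].strip()
            elif line and not line.startswith("#"):
                tokens.append(dict(zip(COLUMNS, line.split("\t"))))
        yield sent_id, tokens


def find_form(text, form):
    """Return one formatted line per token whose FORM equals `form`."""
    template = ("Token: {id}, Form: {form}, Lemma: {lemma}, "
                "UPOSTAG: {upostag}, HEAD: {head}, DEPREL: {deprel}")
    hits = []
    for sent_id, tokens in parse_conllu(text):
        sentence = " ".join(t["form"] for t in tokens)
        for t in tokens:
            if t["form"] == form:
                hits.append(template.format(**t)
                            + ", # sent_id = {}, Sentence: {}".format(sent_id, sentence))
    return hits


print(find_form(SAMPLE, "have")[0])
# Token: 2, Form: have, Lemma: have, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = 2, Sentence: I have no clue .
```

The sentence text is rebuilt from the token forms here for simplicity; the real program may take it from elsewhere.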

The '{' character

If you would like to search with a tree (e.g. you want to search for a word by its HEAD value or head word):

Start your search with a '{'.

Then put a '>' between the two words whose relation you are searching for.

For instance, to see where 'have' acts on 'clue' (e.g. 'I have no clue'), put the '>' between the two words.

An example entry would be '{have>clue'

You can also make the search more specific or more ambiguous.

To search with attributes (e.g. UPOSTAG), write:

'{upostag=verb, form=have>form=clue'

Note: when you specify extra attributes, you must give the word itself with a 'form=' argument.

To leave the first word unspecified and look for any word that acts on 'clue', use:

'{none=none>form=clue}'

Note: you must write 'none=none' wherever nothing is specified.

You can also specify other attributes instead of 'form=clue', such as 'upostag=noun'.

Example Output:

Token: 2, Form: have, Lemma: have, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = 2, Sentence: I have no clue .
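
One way to picture how such a tree query could be evaluated is sketched below. It is not the program's actual code: `to_tokens`, `parse_side`, `matches` and `tree_search` are hypothetical names, and the sketch assumes that the left side of '>' describes the head and the right side describes a word attached to it. It also assumes attribute values are compared case-insensitively, so that 'upostag=verb' matches 'VERB'.

```python
# Hypothetical sketch of a '{head>dependent' search; not the actual conlluparse.py code.
COLUMNS = ["id", "form", "lemma", "upostag", "xpostag",
           "feats", "head", "deprel", "deps", "misc"]


def to_tokens(lines):
    """Turn tab-separated CoNLL-U token lines into dicts keyed by column name."""
    return [dict(zip(COLUMNS, line.split("\t"))) for line in lines]


TOKENS = to_tokens([
    "1\tI\tI\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\thave\thave\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tno\tno\tDET\t_\t_\t4\tdet\t_\t_",
    "4\tclue\tclue\tNOUN\t_\t_\t2\tobj\t_\t_",
    "5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
])


def parse_side(side):
    """'upostag=verb, form=have' -> {'upostag': 'verb', 'form': 'have'}; a bare word is a form."""
    conditions = {}
    for part in side.split(","):
        key, sep, value = part.strip().partition("=")
        if sep:
            conditions[key.strip().lower()] = value.strip()
        else:
            conditions["form"] = key
    return conditions


def matches(token, conditions):
    """True if the token satisfies every condition; 'none=none' matches anything."""
    return all(key == "none" or token[key].lower() == value.lower()
               for key, value in conditions.items())


def tree_search(tokens, query):
    """Return (head form, dependent form) pairs that satisfy the query."""
    head_side, dep_side = query.strip("{}").split(">", 1)
    head_cond, dep_cond = parse_side(head_side), parse_side(dep_side)
    by_id = {t["id"]: t for t in tokens}
    hits = []
    for dep in tokens:
        head = by_id.get(dep["head"])  # None for the root, whose HEAD is 0
        if head and matches(head, head_cond) and matches(dep, dep_cond):
            hits.append((head["form"], dep["form"]))
    return hits


print(tree_search(TOKENS, "{have>clue"))             # [('have', 'clue')]
print(tree_search(TOKENS, "{none=none>form=clue}"))  # [('have', 'clue')]
```

The sketch accepts queries with or without the closing '}', since both spellings appear in the examples on this page.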

The ':' character

If you would like to search for a word by a deprel or upostag together with a feature:

Start your search with a ':' and enclose the search terms in '[]'.

For instance, to search for a copula with a past feature, use ':[cop, past]'

This finds each copula that has a past feature and produces output like:

'Token: 3, Form: болғаныма, Lemma: бол, UPOSTAG: AUX, HEAD: 2, DEPREL: cop, # sent_id = akorda-random.tagged.txt:158:2829, Sentence: Мен осында болғаныма қуаныштымын қуанышты мын .'
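
A rough sketch of how a ':[cop, past]' query might be matched is given below. It is an illustration only: `to_tokens` and `feature_search` are hypothetical names, and the sketch assumes that the first term is compared with both DEPREL and UPOSTAG and that the second term is compared with the values in the FEATS column (either bare tags such as 'past' or the value part of pairs such as 'Tense=Past'). The English sample sentence is made up for the example.

```python
# Hypothetical sketch of a ':[deprel-or-upostag, feature]' search; not the actual conlluparse.py code.
COLUMNS = ["id", "form", "lemma", "upostag", "xpostag",
           "feats", "head", "deprel", "deps", "misc"]


def to_tokens(lines):
    """Turn tab-separated CoNLL-U token lines into dicts keyed by column name."""
    return [dict(zip(COLUMNS, line.split("\t"))) for line in lines]


# "She was happy ." with 'was' as a past-tense copula.
TOKENS = to_tokens([
    "1\tShe\tshe\tPRON\t_\tCase=Nom|Number=Sing\t3\tnsubj\t_\t_",
    "2\twas\tbe\tAUX\t_\tMood=Ind|Tense=Past|VerbForm=Fin\t3\tcop\t_\t_",
    "3\thappy\thappy\tADJ\t_\tDegree=Pos\t0\troot\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
])


def feature_search(tokens, query):
    """Return the forms whose DEPREL or UPOSTAG equals the first term and whose FEATS contain the second."""
    inner = query.lstrip(":").strip("[]")
    tag, feature = [part.strip().lower() for part in inner.split(",", 1)]
    hits = []
    for t in tokens:
        # 'Tense=Past' contributes 'past'; a bare tag such as 'past' contributes itself.
        values = {item.split("=")[-1].lower() for item in t["feats"].split("|")}
        if tag in (t["deprel"].lower(), t["upostag"].lower()) and feature in values:
            hits.append(t["form"])
    return hits


print(feature_search(TOKENS, ":[cop, past]"))  # ['was']
```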

The ';' character

If you would like to search with a relationship (e.g. an nsubj relation to another node that has a noun POS):

Start your search with a ';', then type a deprel tag followed by a colon and then a part of speech.

The second term (the one after the ':') can also be the lemma or the word ID.

To search for a word with an nsubj relationship with a noun, use: ';nsubj:noun'

This could output:

'Token: 8, Form: жүзімдік, Lemma: жүзімдік, UPOSTAG: NOUN, HEAD: 6, DEPREL: conj, # sent_id = Шымкент.tagged.txt:8:216, Sentence: Тау етегінде өзен бойындағы алқаптарда егіншілік пен жүзімдік ал көгалды таулы жайылымдарда - мал шаруашылығы дамыған .'
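
The relationship search can be sketched as follows. This is one possible reading, not the program's actual logic: judging from the sample output above, where the printed token is the noun, the sketch assumes that the query returns the node matching the second term (by UPOSTAG, lemma or ID) when at least one other word is attached to it with the given deprel. The names `to_tokens` and `relation_search` and the English sample sentence are hypothetical.

```python
# Hypothetical sketch of a ';deprel:target' search; not the actual conlluparse.py code.
COLUMNS = ["id", "form", "lemma", "upostag", "xpostag",
           "feats", "head", "deprel", "deps", "misc"]


def to_tokens(lines):
    """Turn tab-separated CoNLL-U token lines into dicts keyed by column name."""
    return [dict(zip(COLUMNS, line.split("\t"))) for line in lines]


# "She is a teacher ." where 'She' is the nsubj of the noun 'teacher'.
TOKENS = to_tokens([
    "1\tShe\tshe\tPRON\t_\t_\t4\tnsubj\t_\t_",
    "2\tis\tbe\tAUX\t_\t_\t4\tcop\t_\t_",
    "3\ta\ta\tDET\t_\t_\t4\tdet\t_\t_",
    "4\tteacher\tteacher\tNOUN\t_\t_\t0\troot\t_\t_",
    "5\t.\t.\tPUNCT\t_\t_\t4\tpunct\t_\t_",
])


def relation_search(tokens, query):
    """Return forms of nodes matching the target that govern a word via `deprel`."""
    deprel, target = [part.strip().lower() for part in query.lstrip(";").split(":", 1)]
    # IDs of every node that has a dependent attached with the requested relation.
    heads_with_relation = {t["head"] for t in tokens if t["deprel"].lower() == deprel}
    return [t["form"] for t in tokens
            if t["id"] in heads_with_relation
            and target in (t["upostag"].lower(), t["lemma"].lower(), t["id"])]


print(relation_search(TOKENS, ";nsubj:noun"))  # ['teacher']
```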

Example of how to use this program

python conlluparse.py "text.conllu" ':[cop, past]'

python conlluparse.py "text.conllu" ';nsubj:noun'

python conlluparse.py "text.conllu" '{none=none>form=bread}'

python conlluparse.py "text.conllu" '<bread'
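
Since each search type is selected by the first character of the query, the second command-line argument could be dispatched roughly as below. This is a minimal sketch under assumptions: `PREFIXES` and `classify_query` are hypothetical names, and conlluparse.py may handle its arguments differently.

```python
# Hypothetical sketch of dispatching on the query's first character.
PREFIXES = {
    "<": "word",      # '<bread'
    "{": "tree",      # '{none=none>form=bread}'
    ":": "feature",   # ':[cop, past]'
    ";": "relation",  # ';nsubj:noun'
}


def classify_query(query):
    """Return (search kind, rest of the query) based on the leading character."""
    kind = PREFIXES.get(query[:1])
    if kind is None:
        raise ValueError("query must start with one of: " + " ".join(PREFIXES))
    return kind, query[1:]


print(classify_query("<bread"))       # ('word', 'bread')
print(classify_query(";nsubj:noun"))  # ('relation', 'nsubj:noun')
```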