Conllu Parsing and Searching

From Apertium

== Find Features: The ':' character ==

To '''search for a deprel or upostag together with a feature of a word''', start your search with a ':' and enclose the search terms in '[]'.

For instance, to search for a copula with the past feature, you would run

<code>python conlluparse.py "text.conllu" ':[cop, past]'</code> which may output:

'Token: 3, Form: болғаныма, Lemma: бол, UPOSTAG: AUX, HEAD: 2, DEPREL: cop, # sent_id = akorda-random.tagged.txt:158:2829, Sentence: Мен осында болғаныма қуаныштымын қуанышты мын .'
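
The matching behind a ':' search could be sketched roughly as below. This is only an illustration and is not taken from conlluparse.py: the helper names parse_feature_query and token_matches are hypothetical, the FEATS value in the sample line is made up, and it assumes a term matches when it equals the UPOSTAG, the DEPREL, or a value in the FEATS column.

```python
# Hypothetical helpers for illustration; not the actual conlluparse.py code.

def parse_feature_query(query):
    """Split a query such as ':[cop, past]' into lower-cased terms."""
    if not (query.startswith(":[") and query.endswith("]")):
        raise ValueError("expected a query of the form ':[term, term]'")
    return [term.strip().lower() for term in query[2:-1].split(",") if term.strip()]


def token_matches(columns, terms):
    """Check one 10-column CoNLL-U token line against every query term.

    Assumption: a term matches if it equals the UPOSTAG, the DEPREL,
    or any value in the FEATS column (e.g. 'Past' in 'Tense=Past').
    """
    upostag, feats, deprel = columns[3], columns[5], columns[7]
    candidates = {upostag.lower(), deprel.lower()}
    if feats != "_":
        for pair in feats.split("|"):
            # Keep the value after '=', or the whole item if there is no '='.
            candidates.add(pair.split("=", 1)[-1].lower())
    return all(term in candidates for term in terms)


# Illustrative token line; the FEATS value here is invented for the example.
line = "3\tболғаныма\tбол\tAUX\t_\tTense=Past\t2\tcop\t_\t_"
terms = parse_feature_query(":[cop, past]")
print(terms)
print(token_matches(line.split("\t"), terms))
```

Matching against feature values rather than full 'Name=Value' pairs keeps the query short, at the cost of not distinguishing features that share a value.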



Revision as of 03:18, 22 December 2017

Parse and Search through a conllu file

Searching is as follows:


Tree search: The '{' character

If you would like to search with a tree (i.e. you want to find a word by its HEAD value or head word), start your search with a '{'. Then put a '>' between the two words whose relation you are searching for. You can also use a '<' if you are searching for a word that is a dependent of another word; the '<' will find the dependent word. For instance, if you wanted to see where 'have' is the head of 'clue' (as in 'I have no clue'), you could do it like this:

python conlluparse.py "text.conllu" '{have>clue' which might output:

Token: 2, Form: have, Lemma: have, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = 2, Sentence:  I have no clue .
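
The head/dependent lookup described above can be sketched as follows. This is a simplified illustration, not the conlluparse.py implementation: the function names are hypothetical, and the sentence is written by hand, with the HEAD and DEPREL values for 'have' and 'clue' taken from the outputs on this page and the remaining tokens assumed.

```python
# Tokens of "I have no clue ." as (id, form, head, deprel).
# Only 'have' and 'clue' follow the outputs on this page; the rest is assumed.
SENTENCE = [
    (1, "I", 2, "nsubj"),
    (2, "have", 0, "root"),
    (3, "no", 4, "det"),
    (4, "clue", 2, "obj"),
    (5, ".", 2, "punct"),
]


def find_pairs(sentence, head_form, dependent_form):
    """Return (head, dependent) pairs where the dependent's HEAD is the head's id."""
    by_id = {tok[0]: tok for tok in sentence}
    pairs = []
    for tok in sentence:
        head = by_id.get(tok[2])  # None when HEAD is 0 (the root)
        if head and head[1] == head_form and tok[1] == dependent_form:
            pairs.append((head, tok))
    return pairs


def search(sentence, query):
    """Handle '{a>b' (report the head a) and '{b<a' (report the dependent b)."""
    body = query.lstrip("{")
    if ">" in body:
        head_form, dep_form = body.split(">", 1)
        return [head for head, _ in find_pairs(sentence, head_form, dep_form)]
    dep_form, head_form = body.split("<", 1)
    return [dep for _, dep in find_pairs(sentence, head_form, dep_form)]


print(search(SENTENCE, "{have>clue"))
print(search(SENTENCE, "{clue<have"))
```

Both queries walk the same head/dependent pairs; the only difference is which token of the pair is reported.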

If you want, you can also make your searches more specific or more ambiguous. When you specify these arguments, make sure that you prefix the word you are searching for with "form=". When nothing is specified on one side, you need to add 'none=none' to that side. For instance, if you wanted to find anything that is a dependent of 'have', you could do:

{none=none<form=have

When searching with attributes (e.g. UPOSTAG), you could do it like this:

python conlluparse.py "text.conllu" '{upostag=verb, form=have>form=clue' which may output:

Token: 2, Form: have, Lemma: have, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = 2, Sentence:  I have no clue .

You can search with any of these tags: upostag, xpostag, lemma, or deprel. To do this, concatenate the tag name, an '=', and the value, as in 'upostag=noun' or 'lemma=clue', or use the '@' form that appears in searches such as '{@root, upostag=noun>none=none'.
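
One way to read such a 'tag=value' list is sketched below. The names parse_side and matches are hypothetical and the real conlluparse.py may work differently; the sketch assumes that '@root' is shorthand for a deprel constraint, that 'none=none' means no constraint, and that values are compared case-insensitively.

```python
def parse_side(side):
    """Turn 'upostag=verb, form=have' into {'upostag': 'verb', 'form': 'have'}.

    Assumptions: '@root' stands for 'deprel=root', and 'none=none'
    means "no constraint" and yields an empty dict.
    """
    constraints = {}
    for part in side.split(","):
        part = part.strip()
        if not part or part == "none=none":
            continue
        if part.startswith("@"):
            constraints["deprel"] = part[1:]
        else:
            key, _, value = part.partition("=")
            constraints[key.strip().lower()] = value.strip()
    return constraints


def matches(token, constraints):
    """Compare a token dict against the constraints, ignoring case."""
    return all(token.get(key, "").lower() == value.lower()
               for key, value in constraints.items())


# Token values follow the 'have' output shown on this page.
have = {"form": "have", "lemma": "have", "upostag": "VERB", "deprel": "root"}
print(parse_side("upostag=verb, form=have"))
print(matches(have, parse_side("@root, upostag=verb")))
print(matches(have, parse_side("none=none")))
```

An empty constraint set matches every token, which is what lets 'none=none' act as a wildcard on one side of a tree search.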

You can also specify an attribute such as 'upostag=noun' instead of 'form=clue':

Token: 2, Form: have, Lemma: have, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = 2, Sentence:  I have no clue .

If you search instead with python conlluparse.py "text.conllu" '{form=clue<none=none', it will print:

Token: 4, Form: clue, Lemma: clue, UPOSTAG: NOUN, HEAD: 2, DEPREL: obj, # sent_id = 2, Sentence:  I have no clue .

The tree search essentially combines all the search terms. You can search for a word with a specific deprel or upostag like:

{@root, upostag=noun>none=none

You can search for relationships, as with the ';' character:

{@nsubj>upostag=noun

You can search for a plain word like:

{form=have>none=none

This will output the same thing as the other searches would. Use the other searches to make clear what kind of thing you are searching for.

Relationships: The ';' character

If you would like to search with a relationship (i.e. an nsubj relation to another node that has a noun POS), start your search with a ';'. Then type a deprel tag followed by a colon and then a part of speech. The second term (the one after the ';') can also be one of these tags: lemma or the word id_name. To search for a word with an nsubj relationship with a noun, you would use:

python conlluparse.py "text.conllu" ';nsubj:noun' which might output:

'Token: 8, Form: жүзімдік, Lemma: жүзімдік, UPOSTAG: NOUN, HEAD: 6, DEPREL: conj, # sent_id = Шымкент.tagged.txt:8:216, Sentence:  Тау етегінде өзен бойындағы алқаптарда егіншілік пен жүзімдік  ал көгалды таулы жайылымдарда - мал шаруашылығы дамыған .'
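
A rough sketch of a relationship search is given below. It is not the conlluparse.py code: relation_search is a hypothetical name, the sentence is written by hand (only 'have' and 'clue' follow the outputs on this page), and the behaviour is an assumption inferred from the sample outputs, namely that the search reports tokens with the given part of speech that have a dependent attached with the given deprel.

```python
# Hand-written tokens of "I have no clue ."; UPOSTAG, HEAD and DEPREL values
# other than those of 'have' and 'clue' are assumed for illustration.
SENTENCE = [
    {"id": 1, "form": "I", "upostag": "PRON", "head": 2, "deprel": "nsubj"},
    {"id": 2, "form": "have", "upostag": "VERB", "head": 0, "deprel": "root"},
    {"id": 3, "form": "no", "upostag": "DET", "head": 4, "deprel": "det"},
    {"id": 4, "form": "clue", "upostag": "NOUN", "head": 2, "deprel": "obj"},
    {"id": 5, "form": ".", "upostag": "PUNCT", "head": 2, "deprel": "punct"},
]


def relation_search(sentence, query):
    """Find tokens with the given UPOSTAG that govern a dependent via the given DEPREL.

    Assumption: ';nsubj:verb' means "a VERB that has an nsubj dependent".
    """
    deprel, _, upostag = query.lstrip(";").partition(":")
    by_id = {tok["id"]: tok for tok in sentence}
    hits = []
    for tok in sentence:
        head = by_id.get(tok["head"])  # None when HEAD is 0 (the root)
        if (tok["deprel"] == deprel and head is not None
                and head["upostag"].lower() == upostag.lower()
                and head not in hits):
            hits.append(head)
    return hits


print([tok["form"] for tok in relation_search(SENTENCE, ";nsubj:verb")])
print([tok["form"] for tok in relation_search(SENTENCE, ";nsubj:noun")])
```

If the tool instead reports the dependent rather than the head, only the appended token would change; the lookup through HEAD stays the same.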

Examples

python conlluparse.py "text.conllu" ':[aux, pres]' is how you would run the program with the ':' search. It could output:

Token: 5, Form: жатырсыздар, Lemma: жат, UPOSTAG: AUX, HEAD: 4, DEPREL: aux, # sent_id = akorda-random.tagged.txt:44:775, Sentence:  - Сіздер осында тұрып жатырсыздар  ал Астанада жұмыс істейсіздер .

python conlluparse.py "text.conllu" ';nsubj:verb' is how you would run the program with the ';' search. It could output:

Token: 7, Form: секіреді, Lemma: секір, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = Ер_Төстік.tagged.txt:22:380, Sentence:  Қуып келе жатқан Төстік Төстік те өрмектен секіреді .

python conlluparse.py "text.conllu" "{none=none>none=none, @obj" is how you would run the program with the '{' search. It could output:

Token: 6, Form: іліп, Lemma: іл, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = Ер_Төстік.tagged.txt:23:396, Sentence:  Сөйткенде Төстіктің бір бақайы өрмекті іліп кетеді .

python conlluparse.py "text.conllu" '<іліп' is how you would run the program with the '<' search. It could output:

Token: 6, Form: іліп, Lemma: іл, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = Ер_Төстік.tagged.txt:23:396, Sentence:  Сөйткенде Төстіктің бір бақайы өрмекті іліп кетеді .