Conllu Parsing and Searching


Revision as of 18:20, 10 December 2017

Parse and search through a CoNLL-U file

Searching is as follows:

Form search: the '<' character

If you want to find a specific word (e.g., the word "bread" in your CoNLL-U file), create a search with the '<' symbol followed by the word you want to search for.

For example, the search term '<ести' might return:

Token: 6, Form: ести, Lemma: есті, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = story.tagged.txt:44:776, Sentence:  Ол енді ол дыбысты анығырақ ести бастады .

Each result shows the Token (the position in the sentence where the match appeared), the Form, the Lemma, the UPOSTAG (part of speech), the HEAD, the DEPREL, the sentence ID, and the sentence itself.
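As a rough illustration of how a form search could work, here is a minimal Python sketch. It is not the code of conlluparse.py: the names `parse_conllu` and `form_search` and the small sample sentence are assumptions made for this example, and multiword tokens and empty nodes are ignored.

```python
# Hypothetical sketch of a '<' form search; not the actual conlluparse.py code.
SAMPLE = """\
# sent_id = 2
# text = I have no clue .
1\tI\tI\tPRON\t_\t_\t2\tnsubj\t_\t_
2\thave\thave\tVERB\t_\t_\t0\troot\t_\t_
3\tno\tno\tDET\t_\t_\t4\tdet\t_\t_
4\tclue\tclue\tNOUN\t_\t_\t2\tobj\t_\t_
5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_
"""


def parse_conllu(text):
    """Yield (sent_id, tokens) for each sentence in a CoNLL-U string."""
    sent_id, tokens = None, []
    for line in text.splitlines():
        if line.startswith("# sent_id"):
            sent_id = line.split("=", 1)[1].strip()
        elif line.startswith("#"):
            continue  # other comment lines are not needed here
        elif not line.strip():
            # A blank line ends the current sentence.
            if tokens:
                yield sent_id, tokens
            sent_id, tokens = None, []
        else:
            cols = line.split("\t")
            tokens.append({
                "id": cols[0], "form": cols[1], "lemma": cols[2],
                "upostag": cols[3], "head": cols[6], "deprel": cols[7],
            })
    if tokens:
        yield sent_id, tokens


def form_search(text, query):
    """Return one result line per token whose form equals the word after '<'."""
    word = query.lstrip("<")
    results = []
    for sent_id, tokens in parse_conllu(text):
        sentence = " ".join(t["form"] for t in tokens)
        for t in tokens:
            if t["form"] == word:
                results.append(
                    f"Token: {t['id']}, Form: {t['form']}, Lemma: {t['lemma']}, "
                    f"UPOSTAG: {t['upostag']}, HEAD: {t['head']}, "
                    f"DEPREL: {t['deprel']}, # sent_id = {sent_id}, "
                    f"Sentence: {sentence}"
                )
    return results


for line in form_search(SAMPLE, "<clue"):
    print(line)
```

The sketch rebuilds the sentence text by joining the token forms with spaces, which is why punctuation appears as a separate word in the result line.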

Tree search: The '{' character

If you would like to search with a tree (i.e., you want to search for a word with a particular HEAD value or head word), start your search with a '{'. Then put a '>' between the two words whose relation you are searching for. For instance, you would use this character to find where 'have' acts on 'clue' (as in "I have no clue").

For example: '{have>clue'

If you want, you can make the search more specific or more ambiguous. When searching with attributes (e.g., UPOSTAG), you could write:

{upostag=verb, form=have>form=clue

PLEASE NOTE THAT WHEN YOU SPECIFY EXTRA ATTRIBUTES, YOU HAVE TO INCLUDE THE 'form=' ARGUMENT FOR THE WORD.

If you want to leave one side unspecified and look for any word that acts on 'clue', you would use:

{none=none>form=clue}

PLEASE NOTE THAT YOU HAVE TO PUT 'none=none' WHERE NOTHING IS SPECIFIED.

Instead of 'form=clue', you can also specify an attribute, such as 'upostag=noun'.

Example output:

Token: 2, Form: have, Lemma: have, UPOSTAG: VERB, HEAD: 0, DEPREL: root, # sent_id = 2, Sentence:  I have no clue .
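The matching logic behind a tree search could look roughly like the following Python sketch. It is an illustration under assumptions, not the conlluparse.py implementation: it assumes the left side of '>' describes the head and the right side describes the dependent (which is consistent with the example output above), that a bare word means a form, that 'none=none' means no constraint, and that comparisons ignore case. The helper names `parse_side`, `matches`, and `tree_search` are hypothetical.

```python
# Hypothetical sketch of a '{' tree search; not the actual conlluparse.py code.
SENTENCE = [
    {"id": "1", "form": "I", "upostag": "PRON", "head": "2", "deprel": "nsubj"},
    {"id": "2", "form": "have", "upostag": "VERB", "head": "0", "deprel": "root"},
    {"id": "3", "form": "no", "upostag": "DET", "head": "4", "deprel": "det"},
    {"id": "4", "form": "clue", "upostag": "NOUN", "head": "2", "deprel": "obj"},
    {"id": "5", "form": ".", "upostag": "PUNCT", "head": "2", "deprel": "punct"},
]


def parse_side(side):
    """Turn 'upostag=verb, form=have' (or a bare word) into a dict of conditions."""
    conds = {}
    for part in side.split(","):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        if not sep:
            conds["form"] = key           # bare word: treat it as a form
        elif key.lower() != "none":       # 'none=none' adds no constraint
            conds[key.lower()] = value
    return conds


def matches(token, conds):
    """True if the token satisfies every condition (case-insensitive)."""
    return all(token.get(k, "").lower() == v.lower() for k, v in conds.items())


def tree_search(sentences, query):
    """Return (head form, dependent form) pairs that satisfy the query."""
    left, right = query.strip("{}").split(">", 1)
    head_conds, dep_conds = parse_side(left), parse_side(right)
    hits = []
    for tokens in sentences:
        by_id = {t["id"]: t for t in tokens}
        for dep in tokens:
            head = by_id.get(dep["head"])  # None when the token is the root
            if head and matches(dep, dep_conds) and matches(head, head_conds):
                hits.append((head["form"], dep["form"]))
    return hits


print(tree_search([SENTENCE], "{upostag=verb, form=have>form=clue"))
print(tree_search([SENTENCE], "{none=none>form=clue}"))
```

Looking up each dependent's HEAD in a dictionary keyed by token ID keeps the search to a single pass over the sentence.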

The ':' character

If you would like to search for a word with a given deprel or upostag and a given feature:

Start your search with a ':' and enclose the search terms in '[]'.

For instance, to search for a copula with a past feature, you would use:

':[cop, past]'

This finds a copula with a past feature and produces output like:

Token: 3, Form: болғаныма, Lemma: бол, UPOSTAG: AUX, HEAD: 2, DEPREL: cop, # sent_id = akorda-random.tagged.txt:158:2829, Sentence: Мен осында болғаныма қуаныштымын қуанышты мын .
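A ':' search of this kind could be sketched in Python as follows. This is only an illustration under assumptions, not the conlluparse.py code: it assumes the first term is compared against both DEPREL and UPOSTAG, that the second term is compared against the values in the FEATS column (for example 'Past' in 'Tense=Past'), and that case is ignored. The toy English tokens and the names `feature_values` and `tag_feature_search` are invented for the example.

```python
# Hypothetical sketch of a ':' search; not the actual conlluparse.py code.
TOKENS = [
    {"form": "She", "upostag": "PRON", "deprel": "nsubj",
     "feats": "Case=Nom|Number=Sing"},
    {"form": "was", "upostag": "AUX", "deprel": "cop",
     "feats": "Mood=Ind|Tense=Past|VerbForm=Fin"},
    {"form": "happy", "upostag": "ADJ", "deprel": "root", "feats": "Degree=Pos"},
    {"form": ".", "upostag": "PUNCT", "deprel": "punct", "feats": "_"},
]


def feature_values(feats):
    """Collect lower-cased feature values from a FEATS string like 'Tense=Past|...'."""
    values = set()
    if feats == "_":
        return values  # '_' marks an empty FEATS column
    for pair in feats.split("|"):
        _name, _sep, value = pair.partition("=")
        # A feature may carry several comma-separated values.
        values.update(v.lower() for v in value.split(",") if v)
    return values


def tag_feature_search(tokens, query):
    """Return forms whose deprel or upostag matches the first term and
    whose FEATS contain the second term, for a query like ':[cop, past]'."""
    inner = query.lstrip(":").strip("[]")
    tag, feature = [part.strip().lower() for part in inner.split(",")]
    return [
        t["form"]
        for t in tokens
        if tag in (t["deprel"].lower(), t["upostag"].lower())
        and feature in feature_values(t["feats"])
    ]


print(tag_feature_search(TOKENS, ":[cop, past]"))
```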

The ';' character

If you would like to search with a relationship (e.g., an nsubj relation to another node that has a noun POS):

Start your search with a ';'.

Then type a deprel tag followed by a colon and then a part of speech.

The second term (the one after the ':') can also be the lemma or the word id_name.

To search for a word in an nsubj relationship with a noun, you would use:

';nsubj:noun'

This could output:

Token: 8, Form: жүзімдік, Lemma: жүзімдік, UPOSTAG: NOUN, HEAD: 6, DEPREL: conj, # sent_id = Шымкент.tagged.txt:8:216, Sentence: Тау етегінде өзен бойындағы алқаптарда егіншілік пен жүзімдік ал көгалды таулы жайылымдарда - мал шаруашылығы дамыған .
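One possible reading of the ';' search is sketched below in Python. It is not the conlluparse.py implementation, and the interpretation is an assumption: a token counts as a match when its DEPREL equals the first term and the node it attaches to (its HEAD) has a UPOSTAG or lemma equal to the second term. The function name `relation_search` and the toy English sentence are made up for the example, and the sketch returns both nodes of each matching pair rather than guessing which one the tool reports.

```python
# Hypothetical sketch of a ';' relation search; not the actual conlluparse.py code.
TOKENS = [
    {"id": "1", "form": "The", "lemma": "the", "upostag": "DET", "head": "2", "deprel": "det"},
    {"id": "2", "form": "dog", "lemma": "dog", "upostag": "NOUN", "head": "5", "deprel": "nsubj"},
    {"id": "3", "form": "is", "lemma": "be", "upostag": "AUX", "head": "5", "deprel": "cop"},
    {"id": "4", "form": "a", "lemma": "a", "upostag": "DET", "head": "5", "deprel": "det"},
    {"id": "5", "form": "friend", "lemma": "friend", "upostag": "NOUN", "head": "0", "deprel": "root"},
    {"id": "6", "form": ".", "lemma": ".", "upostag": "PUNCT", "head": "5", "deprel": "punct"},
]


def relation_search(tokens, query):
    """Return (dependent form, head form) pairs for a query like ';nsubj:noun'."""
    deprel, target = query.lstrip(";").split(":", 1)
    deprel, target = deprel.strip().lower(), target.strip().lower()
    by_id = {t["id"]: t for t in tokens}
    pairs = []
    for dep in tokens:
        if dep["deprel"].lower() != deprel:
            continue
        head = by_id.get(dep["head"])  # None when the token is the root
        # The second term may name a part of speech or a lemma.
        if head and target in (head["upostag"].lower(), head["lemma"].lower()):
            pairs.append((dep["form"], head["form"]))
    return pairs


print(relation_search(TOKENS, ";nsubj:noun"))
```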

Examples

The commands below are listed without example output or descriptions:

python conlluparse.py "text.conllu" ':[cop, past]'
python conlluparse.py "text.conllu" ';nsubj:noun'
python conlluparse.py "text.conllu" '{none=none>form=bread}'
python conlluparse.py "text.conllu" '<bread'