Tagging guidelines for Spanish
You can think of part-of-speech tagging a bit like answering a series of multiple-choice questions. Each word is a question, and its possible analyses are the answers. Unknown words are questions for which we do not yet know the possible answers. To "tag" the text, you answer every question by deleting the "incorrect" answers.
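The "delete the incorrect answers" idea can be sketched in a few lines of Python. This is only an illustration, not an Apertium tool: it assumes the usual Apertium stream layout `^surface/analysis/analysis$`, with unknown words marked by a leading `*`, and it ignores escaped characters. The function names and the analyses shown for canto are assumptions made for the example.

```python
def parse_unit(unit: str):
    """Split '^surface/analysis1/analysis2$' into (surface, [analyses])."""
    body = unit.strip()
    if not (body.startswith("^") and body.endswith("$")):
        raise ValueError(f"not a lexical unit: {unit!r}")
    # Naive split: escaped slashes inside a unit are not handled here.
    surface, *analyses = body[1:-1].split("/")
    return surface, analyses


def is_unknown(unit: str) -> bool:
    """An unknown word has no real analyses, only '*surface'."""
    _, analyses = parse_unit(unit)
    return all(a.startswith("*") for a in analyses)


def tag_unit(unit: str, keep: str) -> str:
    """'Answer the question': keep one analysis, delete the others."""
    surface, analyses = parse_unit(unit)
    if keep not in analyses:
        raise ValueError(f"{keep!r} is not one of the offered analyses")
    return f"^{surface}/{keep}$"


# Illustrative analyses (assumed, not copied from a real dictionary).
ambiguous = "^canto/canto<n><m><sg>/cantar<vblex><pri><p1><sg>$"
tagged = tag_unit(ambiguous, "cantar<vblex><pri><p1><sg>")
```

Here `tagged` holds the same word with only the verb reading left, which is what a hand-tagger produces for a sentence where canto means "I sing".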
Why is this important?
Hand-tagged texts are needed in large quantities (tens, or better hundreds, of thousands of words) to "train" the automatic taggers found in some Apertium language pairs. Getting the right tag for a word is important, as the translation depends on it. For instance, the Spanish word canto can be a verb or a noun, and the two readings have different translations in English:
- [verb] Los domingos canto en el coro → On Sundays I sing in the choir.
- [noun] La moneda cayó de canto → The coin fell on its edge.
This is why we have many hand-tagging tasks in the Google Code-In.
These guidelines cover some words that are difficult to hand-tag in Spanish Apertium output.
The word "como" can be a preposition, an adverbial relative, or a form of the verb comer (to eat).
- It is a preposition (pr) when it introduces a noun phrase and may be substituted by del estilo de or del tipo de:
- Un coche como este
- It is an adverbial relative (rel.adv) when it introduces an adverbial subordinate clause and may be substituted by en la manera en que, de la manera que, etc.:
- Vinieron como yo les dije
- And it is a verb (vblex.pri.p1.sg) when it may be substituted by comía or comí:
- Normalmente no como carne
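The three readings of como and their substitution tests can be kept together as a small lookup table, for example when preparing a checklist for taggers. The sketch below is only an illustration: the dictionary layout and the helper names are hypothetical, and the conversion from the dotted tag notation used here to angle-bracket tags assumes the usual Apertium stream convention.

```python
# Readings of "como" as described above: tag, lemma, substitution test, example.
COMO_READINGS = [
    {"tag": "pr", "lemma": "como",
     "test": "del estilo de / del tipo de",
     "example": "Un coche como este"},
    {"tag": "rel.adv", "lemma": "como",
     "test": "en la manera en que / de la manera que",
     "example": "Vinieron como yo les dije"},
    {"tag": "vblex.pri.p1.sg", "lemma": "comer",
     "test": "comía / comí",
     "example": "Normalmente no como carne"},
]


def to_stream(lemma: str, dotted_tags: str) -> str:
    """Turn 'vblex.pri.p1.sg' into 'comer<vblex><pri><p1><sg>' (assumed format)."""
    return lemma + "".join(f"<{t}>" for t in dotted_tags.split("."))


def checklist() -> str:
    """Render one line per reading for a human tagger to work through."""
    lines = []
    for r in COMO_READINGS:
        analysis = to_stream(r["lemma"], r["tag"])
        lines.append(f"{analysis}: try replacing with '{r['test']}' ({r['example']})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(checklist())
```

The table does not decide anything by itself; the substitution test still has to be applied by a person reading the sentence.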