Difference between revisions of "Tagging guidelines for Spanish"

From Apertium

Revision as of 08:15, 26 December 2015

About tagging

You can think of part-of-speech tagging as answering a series of multiple-choice questions. Each word is a question, and its possible analyses are the answers. Unknown words are questions whose possible answers we do not know yet. To "tag" a text, you answer every question by deleting the "incorrect" answers.
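The multiple-choice picture can be sketched in a few lines of Python. This is only an illustration: it assumes the analyser output looks like `^surface/analysis1/analysis2$`, with unknown words carrying a single analysis that starts with `*`, and the helper names `parse_unit` and `choose` are hypothetical.

```python
import re

# Assumed lexical-unit shape: ^surface/analysis1/analysis2$
# Unknown words are assumed to carry one analysis starting with "*".
UNIT = re.compile(r"\^([^/$]+)/([^$]*)\$")

def parse_unit(unit):
    """Split one lexical unit into its surface form and candidate analyses."""
    match = UNIT.fullmatch(unit)
    if match is None:
        raise ValueError(f"not a lexical unit: {unit!r}")
    return match.group(1), match.group(2).split("/")

def choose(unit, keep):
    """Tag a unit by deleting every analysis except the one at index `keep`."""
    surface, analyses = parse_unit(unit)
    return f"^{surface}/{analyses[keep]}$"

# The question is "canto"; the two analyses are the possible answers.
ambiguous = "^canto/canto<n><m><sg>/cantar<vblex><pri><p1><sg>$"
surface, answers = parse_unit(ambiguous)

# In "Los domingos canto en el coro" the verb reading is the right answer.
tagged = choose(ambiguous, 1)
```

The human annotator still decides which index to keep; the code only shows what "deleting the incorrect answers" amounts to.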

Why is this important?

Hand-tagged texts are needed in large quantities (tens, or better hundreds, of thousands of words) to 'train' the automatic taggers found in some Apertium language pairs. Getting the right tag for a word is important, as translation depends on it. For instance, the Spanish word canto can be a verb or a noun, and the two readings have different translations in English:

  • [verb] Los domingos canto en el coro → On Sundays I sing in the choir.
  • [noun] La moneda cayó de canto → The coin fell on its edge.

This is why we have many hand-tagging tasks in Google Code-in.

Guidelines

These guidelines cover some words that are difficult to disambiguate when hand-tagging Spanish Apertium output.

"como"

The word "como" can be a preposition, an adverbial relative, or a form of the verb comer (to eat).

  • It is a preposition (pr) when it introduces a noun phrase and may be substituted by del estilo de or del tipo de:
    • Un coche como este
  • It is an adverbial relative (rel.adv) when it introduces an adverbial subordinate clause and may be substituted by en la manera en que, de la manera que, etc.
    • Vinieron como yo les dije
  • It is a verb (vblex.pri.p1.sg) when it may be substituted by another form of comer, such as comía or comí
    • Normalmente no como carne
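The substitution tests above can be generated mechanically, which may help when checking many sentences. The following is a minimal sketch, not part of Apertium: the function name `substitution_tests` is hypothetical, and it only produces the candidate sentences. Judging which one still reads naturally remains the annotator's job.

```python
import re

# One substitute per tag, taken from the guidelines above.
SUBSTITUTES = {
    "pr": "del estilo de",
    "rel.adv": "de la manera que",
    "vblex.pri.p1.sg": "comía",
}

def substitution_tests(sentence, word="como"):
    """Return {tag: sentence with the first whole-word `word` replaced}."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", flags=re.IGNORECASE)
    return {
        tag: pattern.sub(substitute, sentence, count=1)
        for tag, substitute in SUBSTITUTES.items()
    }

# Print one candidate per tag so the annotator can compare them side by side.
for tag, variant in substitution_tests("Normalmente no como carne").items():
    print(f"{tag:16} {variant}")
```

For "Normalmente no como carne", only the variant with comía is a well-formed sentence, which points to the verb tag.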

"la"

La is the most common ambiguous word in Spanish. It can be a determiner or a pronoun.

  • It is a determiner (det.def.f.sg) when it precedes a noun phrase and can be substituted by another determiner such as una or aquella
    • La historia nos enseña el camino del futuro
  • It is a proclitic pronoun (prn.pro.p3.f.sg) when it precedes a verb and means a ella, a esa, etc.
    • La llamó por teléfono.

"los"

Los can be a determiner or a pronoun.

  • It is a determiner (det.def.m.pl) when it precedes a noun phrase and can be substituted by another determiner such as unos or aquellos
    • Los trabajadores destruyeron la fábrica.
  • It is a proclitic pronoun (prn.pro.p3.m.pl) when it precedes a verb and means a ellos, a esos, etc.
    • Los llamó por teléfono.
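
The rules for la and los follow the same pattern: look at what comes next. A rough sketch of that check is given below, assuming the first tag of each analysis of the following word is available as a set. The function `suggest_tag` and the tag groupings are hypothetical, and the pronoun tag for los is assumed by analogy with la. The sketch deliberately returns nothing when the next word is itself ambiguous.

```python
# Tags from the guidelines above; the pronoun tag for "los" is an
# assumption made by analogy with the one given for "la".
CANDIDATES = {
    "la": {"det": "det.def.f.sg", "prn": "prn.pro.p3.f.sg"},
    "los": {"det": "det.def.m.pl", "prn": "prn.pro.p3.m.pl"},
}

# Assumed first-position tags for words that open a noun phrase or are verbs.
NOUN_PHRASE_TAGS = {"n", "np", "adj"}
VERB_TAGS = {"vblex", "vbser", "vbhaver", "vbmod"}

def suggest_tag(word, next_word_tags):
    """Suggest a tag for la/los from the possible tags of the next word.

    Returns None when the next word is ambiguous or unknown, so the
    final decision stays with the human annotator.
    """
    options = CANDIDATES[word.lower()]
    starts_noun_phrase = bool(next_word_tags & NOUN_PHRASE_TAGS)
    is_verb = bool(next_word_tags & VERB_TAGS)
    if starts_noun_phrase and not is_verb:
        return options["det"]
    if is_verb and not starts_noun_phrase:
        return options["prn"]
    return None

det_example = suggest_tag("La", {"n"})        # "La historia ..."
prn_example = suggest_tag("Los", {"vblex"})   # "Los llamó ..."
unclear = suggest_tag("la", {"n", "vblex"})   # next word could be noun or verb
```

A suggestion like this is only a starting point; the substitution tests in the guidelines remain the deciding criterion.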