
Tagging guidelines for Spanish


These tagging guidelines will never be complete. Problematic words will be added as they are encountered.


About tagging

You can think of part-of-speech tagging as answering a series of multiple-choice questions. The word is the question, and the possible analyses are the answers. Unknown words are questions whose possible answers we do not know yet. To "tag" the text, you answer every question by deleting the "incorrect" answers.
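
As a rough illustration of the multiple-choice idea, the sketch below treats one word in the Apertium stream format (^surface/analysis/analysis$) as a question with several answers, and "tags" it by keeping one analysis. The helper names (parse_unit, is_unknown, tag) are hypothetical, the analyses shown for canto are illustrative rather than copied from real analyser output, and escaped characters in the stream are not handled.

```python
import re

# One lexical unit in the Apertium stream format: ^surface/analysis1/analysis2$
LU_PATTERN = re.compile(r"\^([^/$]+)/([^$]*)\$")


def parse_unit(unit):
    """Split a lexical unit into its surface form and its list of analyses."""
    match = LU_PATTERN.fullmatch(unit)
    if match is None:
        raise ValueError(f"not a lexical unit: {unit!r}")
    return match.group(1), match.group(2).split("/")


def is_unknown(analyses):
    """Unknown words carry a single analysis that starts with '*'."""
    return len(analyses) == 1 and analyses[0].startswith("*")


def tag(unit, keep):
    """Answer the question: keep the analysis at index `keep`, delete the rest."""
    surface, analyses = parse_unit(unit)
    return f"^{surface}/{analyses[keep]}$"


# Illustrative input: the question is "canto", with two possible answers.
ambiguous = "^canto/canto<n><m><sg>/cantar<vblex><pri><p1><sg>$"
surface, analyses = parse_unit(ambiguous)
print(surface, analyses)

# In "Los domingos canto en el coro" the verb reading is the correct answer.
print(tag(ambiguous, 1))
```

In real hand-tagging the choice of which analysis to keep is made by a person reading the sentence; the code only shows what "deleting the incorrect answers" does to the data.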

Why is this important?

Hand-tagged texts are needed in large quantities (tens, or better hundreds, of thousands of words) to 'train' the automatic taggers found in some Apertium language pairs. Getting the right tag for a word is important, as translation depends on it. For instance, the Spanish word canto can be a verb or a noun, and each reading has a different English translation:

  • [verb] Los domingos canto en el coro → On Sundays I sing in the choir.
  • [noun] La moneda cayó de canto → The coin fell on its edge.

This is why we have many hand-tagging tasks in Google Code-in.
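
To make the canto example concrete, here is a toy sketch of why the chosen tag matters downstream. TOY_BIDIX and translate_lemma are hypothetical names, and the two entries simply restate the example sentences above; a real bilingual dictionary is far larger and is not a Python dict.

```python
# Toy bilingual dictionary: (Spanish lemma, part of speech) -> English lemma.
# Both entries are taken from the example sentences in this section.
TOY_BIDIX = {
    ("cantar", "vblex"): "sing",
    ("canto", "n"): "edge",
}


def translate_lemma(lemma, pos):
    """Look up the English lemma for an already disambiguated analysis."""
    return TOY_BIDIX[(lemma, pos)]


print(translate_lemma("cantar", "vblex"))  # Los domingos canto en el coro
print(translate_lemma("canto", "n"))       # La moneda cayó de canto
```

The same surface form leads to two different lookups, so a wrong tag produces a wrong translation even when the dictionary itself is correct.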

Guidelines

These guidelines cover some words that are difficult to tag when hand-tagging Spanish Apertium output.

"como"

The word "como" can be a preposition, an adverbial relative, or a form of the verb comer (to eat).

  • It is a preposition (pr) when it introduces a noun phrase and may be substituted by del estilo de or del tipo de:
    • Un coche como este
  • It is an adverbial relative (rel.adv) when it introduces an adverbial subordinate clause and may be substituted by en la manera en que, de la manera que, etc.:
    • Vinieron como yo les dije
  • It is a verb (vblex.pri.p1.sg) when it may be substituted by another form of comer, such as comía or comí:
    • Normalmente no como carne

"la"

La is the most common ambiguous word in Spanish. It can be a determiner or a pronoun.

  • It is a determiner (det.def.f.sg) when it precedes a noun phrase and can be substituted by another determiner such as una or aquella
    • La historia nos enseña el camino del futuro
  • It is a proclitic pronoun (prn.pro.p3.f.sg) when it precedes a verb and means a ella, a esa, etc.
    • La llamó por teléfono.

"lo"

The proclitic form lo may be a neuter determiner (detnt) or a proclitic pronoun appearing before a verb (prn.pro.p3.m.sg).

  • It is a determiner when it precedes an adjective or a relative clause with que, and refers to something abstract or to a whole situation:
    • Hice lo que me dijiste
    • No me gusta lo rojo.
    • Me asusta lo grande que es.
  • It is a pronoun when it precedes a verb and means a él, a ese, etc.
    • Lo vio por la calle.

"-lo"

The enclitic form -lo appears after a verb in the infinitive, gerund or imperative, and may be masculine (prn.enc.p3.m.sg) or neuter (prn.enc.p3.nt).

  • It is masculine when it may be substituted by ese (ése), aquel (aquél), etc.:
    • En el estante hay un libro. Tráelo.
  • It is neuter when it may be substituted by eso, aquello, usually referring to a whole situation or abstract concept.
    • Suele entrar sin saludar. No puedo tolerarlo.

"los"

Los can be a determiner or a pronoun.

  • It is a determiner (det.def.m.pl) when it precedes a noun phrase and can be substituted by another determiner such as unos or aquellos
    • Los trabajadores destruyeron la fábrica.
  • It is a proclitic pronoun (prn.pro.p3.m.pl) when it precedes a verb and means a ellos, a esos, etc.
    • Los llamó por teléfono.
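
The rules for la, lo and los all depend on what follows the word, so they can be summarised in a small lookup. The sketch below is only a rule of thumb and not part of Apertium: suggest_tag, READINGS and VERB_POS are hypothetical names, the tags returned are the ones given in these guidelines, and the part-of-speech labels for the following word (n, adj, rel, vblex, vbser, vbhaver, vbmod) are assumptions made for the example.

```python
# Verb categories that can follow a proclitic pronoun (assumed label set).
VERB_POS = {"vblex", "vbser", "vbhaver", "vbmod"}

# For each word: its determiner tag, its pronoun tag, and the parts of speech
# that, when they follow the word, point to the determiner reading.
READINGS = {
    "la": {"det": "det.def.f.sg", "prn": "prn.pro.p3.f.sg", "det_before": {"n", "adj"}},
    "los": {"det": "det.def.m.pl", "prn": "prn.pro.p3.m.pl", "det_before": {"n", "adj"}},
    "lo": {"det": "detnt", "prn": "prn.pro.p3.m.sg", "det_before": {"adj", "rel"}},
}


def suggest_tag(word, next_pos):
    """Suggest a tag from the part of speech of the next word, or None if unsure."""
    entry = READINGS[word.lower()]
    if next_pos in VERB_POS:
        return entry["prn"]
    if next_pos in entry["det_before"]:
        return entry["det"]
    return None  # leave the decision to the human tagger


print(suggest_tag("La", "n"))       # La historia nos enseña ...
print(suggest_tag("La", "vblex"))   # La llamó por teléfono.
print(suggest_tag("lo", "rel"))     # Hice lo que me dijiste
print(suggest_tag("Los", "vblex"))  # Los llamó por teléfono.
```

The sketch assumes the following word is already disambiguated, which is often not the case (in la llama, llama may itself be a noun or a verb), so the substitution tests above remain the deciding criterion.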

"más"

Más can be an adverb or an adjective.

  • It is an adjective when it modifies a noun, and acts as a comparative form of mucho, mucha, muchos, muchas. Try substituting with another adjective.
    • Han venido más espectadores
    • Tengo más leche si quieres
    • No tengo más.
  • It is an adverb when it modifies an adjective or a verb, and acts as a comparative form of mucho or as a way to make comparisons. Try substituting with another adverb.
    • Corre más o no llegarás
    • Es más guapo de lo que pensaba.