Difference between revisions of "Tagging guidelines for English"

From Apertium

Revision as of 18:09, 24 November 2013

About tagging

You can think of part-of-speech tagging a bit like answering a series of multiple-choice questions. The word is the question, and the possible analyses are the answers. Unknown words can be thought of as questions for which we don't yet know the possible answers. To "tag" the text, you need to answer all of the questions by deleting the "incorrect" answers.

Why is this important?

Hand-tagged texts are needed in large quantities (tens, or better hundreds, of thousands of words) to 'train' the automatic taggers found in some Apertium language pairs. Getting the right tag for a word is important, as translation depends on it. For instance, the word "book" can be a verb or a noun, and the two have different translations in Spanish:

  • [noun] Ann bought a book about gardening → Ann compró un libro sobre jardinería.
  • [verb] Ann wants to book a room at the Ritz → Ann quiere reservar una habitación en el Ritz.
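In Apertium's stream format, each word is written as a lexical unit that lists all of its possible analyses, for example ^book/book<n><sg>/book<vblex><inf>/book<vblex><pres>$, and an unknown word carries a single pseudo-analysis starting with an asterisk. The Python sketch below shows the multiple-choice idea on that format: it lists the candidate answers for each word and keeps only the one you choose. The tag strings for "book" and the unknown word "Zorbly" are made up for illustration (check the dictionary of the pair you are tagging), and the parser ignores backslash escapes, so treat it as a rough sketch only.

```python
import re

# Matches one lexical unit in Apertium stream format: ^surface/analysis1/analysis2/...$
# Simplification: backslash-escaped characters (e.g. "\/") are not handled.
UNIT_RE = re.compile(r"\^([^$]*)\$")


def parse_units(stream: str) -> list[tuple[str, list[str]]]:
    """Return (surface form, candidate analyses) for every unit in the stream."""
    units = []
    for body in UNIT_RE.findall(stream):
        surface, *analyses = body.split("/")
        units.append((surface, analyses))
    return units


def is_unknown(analyses: list[str]) -> bool:
    """An unknown word has a single pseudo-analysis starting with '*'."""
    return len(analyses) == 1 and analyses[0].startswith("*")


def choose(surface: str, analyses: list[str], keep: str) -> str:
    """Answer the 'question': keep one analysis and delete the incorrect ones."""
    if keep not in analyses:
        raise ValueError(f"{keep!r} is not a possible analysis of {surface!r}")
    return f"^{surface}/{keep}$"


# Hypothetical input: the tag strings are illustrative, not copied from a real dictionary.
stream = "^book/book<n><sg>/book<vblex><inf>/book<vblex><pres>$ ^Zorbly/*Zorbly$"
(book, book_analyses), (unk, unk_analyses) = parse_units(stream)

print(len(book_analyses))         # 3
print(is_unknown(unk_analyses))   # True
# "Ann wants to book a room at the Ritz": the verb (infinitive) reading is correct.
print(choose(book, book_analyses, "book<vblex><inf>"))  # ^book/book<vblex><inf>$
```

Hand-tagging amounts to doing the choose() step by hand for every ambiguous word in the text.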

This is why we have many hand-tagging tasks in the Google Code-In.

Guidelines

"both"

The word "both" can be a conjunction (joining two noun phrases), a determiner (modifying a noun phrase) or a pronoun (substituting for a noun phrase).

  • cnjcoo
    • I like both cats and dogs.
  • det
    • Both children like playing in the garden.
  • prn
    • Both thought it a good idea.
    • They both like playing in the garden.
    • Both of them like playing in the garden.

"Which"

  • det.itg.sp [determiner, modifies a noun or an adjective]
    • Which car have you bought?
  • prn.itg.m.sp [pronoun, does not modify a noun or adjective; means "which one?"]
    • Which is better?
  • rel.an.mf.sp [relative pronoun, introduces a clause that describes a noun]
    • I saw the dog, which stayed there all night.

"this", "several"

The word "this" (along with its plural "these") can be either a determiner, modifying a noun phrase, or a pronoun, replacing a noun phrase.

  • det.dem
    • I don't like this cat.
    • I don't like these cats.
  • prn
    • This is the reason.
    • These are the ones.

The word "several" follows a similar pattern:

  • det.dem
    • He ate several cakes.
  • prn
    • They like cakes; they always buy several when they go to the shops.
    • Several of them thought it was a good idea.

"that"

The word "that" (along with its plural "those") can be a determiner (modifying a noun phrase), a demonstrative pronoun (substituting for a noun phrase), a relative pronoun or a subordinating conjunction.

  • det.dem
    • I don't like that cat.
    • I don't like those cats.
  • prn
    • That is the reason.
    • Those are the ones.
  • rel
    • These are the ones that I like.
  • cnjsub
    • I think that you like cats.

Here is a tip for distinguishing rel and cnjsub: try replacing "that" with "which" and see how it sounds. If it sounds OK, your "that" is probably a relative pronoun; if it sounds bad, it's probably a conjunction.

  • ok: These are the ones which I like.
  • not ok: I think which you like cats.

"no"

The word "no" in English can be a determiner, modifying a noun phrase, or an adverb (or interjection).

  • det.ind
    • There are no cats in my attic.
  • adv
    • No! Don't do that!

Verbs with "-ing"

A verb form ending in -ing can be a gerund (adverbial), a substantive (like a noun) or a present participle (like an adjective).

  • vblex.subs:
    • Roughly, when you can substitute it with a noun: "Flying is hard" → "Flight is hard"
  • vblex.pprs:
    • Roughly, when you can substitute it with a relative clause: "The flying circus" → "The circus that flies"
  • vblex.ger
    • When it follows "to be" in continuous tenses, or when it can be replaced by a prepositional phrase or a different verb phrase:
      • "He came singing" → "He came with a song"
      • "He is singing" → "He sings"

Adverb or adjective

A word like "first" can be either an adverb or an ordinal adjective. An adverb modifies a verb phrase (or an adjective phrase, or another adverb); an ordinal adjective modifies a noun phrase.

  • adj
    • This is my first computer.
  • adv
    • First I'm going to buy a computer. [modifying a verb]

Infinitive or present

In English, the "short" (bare) infinitive often has the same word form as the present tense, for example:

  • inf
    • I like to play football.
  • pres
    • I play football on Wednesdays.

A tip for telling them apart is to put the verb into the third person singular and see how it sounds, e.g.:

  • not ok: He likes to plays football.
  • ok: He plays football.
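The third-person test relies on a speaker's judgement of what "sounds OK", which a program cannot make. A rough context cue that often agrees with it is whether the bare verb form comes right after "to". The Python sketch below uses that cue; the list of modal verbs treated the same way is an assumption, not something these guidelines state, so check how your language pair tags verbs after modals.

```python
# Hypothetical cue words: "to" comes from the examples above; the modals are an assumption.
INF_CUES = {"to", "can", "could", "will", "would", "shall", "should", "may", "might", "must"}


def inf_or_pres(tokens: list[str], i: int) -> str:
    """Guess 'inf' or 'pres' for an ambiguous bare verb form at tokens[i]."""
    previous = tokens[i - 1].lower() if i > 0 else ""
    return "inf" if previous in INF_CUES else "pres"


print(inf_or_pres("I like to play football".split(), 3))        # inf
print(inf_or_pres("I play football on Wednesdays".split(), 1))  # pres
```

A cue this simple misses cases such as "I will often play", where an adverb separates the modal from the verb, which is one reason hand-checked tags are still needed.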

Adverb or preposition

In English, many words can be adverbs or prepositions. Both often modify a verb phrase, but a preposition is typically followed by a noun phrase, whereas an adverb stands on its own.

  • pr
    • He plays by the river.
  • adv
    • He walks by.
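The rule of thumb above (a preposition is followed by a noun phrase, an adverb stands on its own) can be approximated by looking at the next token. The Python sketch below assumes that a noun phrase usually starts with a determiner or a pronoun from a small hand-picked list; that list is hypothetical and incomplete, so a bare noun or a name after the word ("He sat by John") will wrongly come out as adv.

```python
import re

# Hypothetical, deliberately small list of words that usually begin a noun phrase.
NP_STARTERS = {"the", "a", "an", "this", "that", "these", "those",
               "my", "your", "his", "her", "its", "our", "their",
               "me", "you", "him", "it", "us", "them"}


def tokenize(sentence: str) -> list[str]:
    """Split into words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def adv_or_pr(sentence: str, word: str) -> str:
    """Guess 'pr' if the first occurrence of `word` is followed by a likely noun phrase, else 'adv'."""
    tokens = tokenize(sentence)
    i = [t.lower() for t in tokens].index(word.lower())  # raises ValueError if absent
    following = tokens[i + 1] if i + 1 < len(tokens) else ""
    return "pr" if following.lower() in NP_STARTERS else "adv"


print(adv_or_pr("He plays by the river.", "by"))  # pr
print(adv_or_pr("He walks by.", "by"))            # adv
```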

Past tense or past participle

Many verb forms ending in -ed (e.g. worked) can be either a past tense or a past participle.


Here is a trick: replace the verb with the corresponding form of go or drink:

  • if you would use went or drank, it is a past tense (past);
  • if you would use gone or drunk, it is a past participle (pp).
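The go/drink substitution is a test for a human reader. A different, purely mechanical cue that often gives the same answer is to check whether the -ed form directly follows a form of "have" or "be". The Python sketch below swaps in that auxiliary-based heuristic; it is not the trick described above, and the auxiliary list is an assumption made for illustration.

```python
# Hypothetical auxiliary list; this is a swapped-in heuristic, not the go/drink test itself.
AUXILIARIES = {"have", "has", "had", "having",
               "be", "am", "is", "are", "was", "were", "been", "being"}


def past_or_pp(tokens: list[str], i: int) -> str:
    """Guess 'pp' if the -ed form at tokens[i] directly follows a form of have/be, else 'past'."""
    previous = tokens[i - 1].lower() if i > 0 else ""
    return "pp" if previous in AUXILIARIES else "past"


print(past_or_pp("I have worked here".split(), 2))         # pp   (cf. "I have gone")
print(past_or_pp("She worked here last year".split(), 1))  # past (cf. "She went")
print(past_or_pp("The car was repaired".split(), 3))       # pp   (cf. "was drunk")
```

It gets adjectival uses such as "the repaired car" wrong (the go/drink test gives pp there), so it can only ever be a first guess.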


"in", "on", "under"

These words can be adverbs (adv) or prepositions (pr). As an adverb, the word stands alone and modifies the verb. As a preposition, it connects a noun phrase (a phrase built around a noun) to a preceding element in the sentence.

Trick: if you can replace it with out, it is an adverb.

  • The technician is in: adverb ("The technician is out" is OK)
  • The technician is in the restroom: preposition ("The technician is out the restroom" is not OK)
  • The technician is under the table: preposition ("The technician is out the table" is not OK)
  • The light is on: adverb ("The light is out" is OK)