Difference between revisions of "Tagging guidelines for English"

From Apertium

Revision as of 12:03, 1 December 2013

About tagging

You can think of part-of-speech tagging as a bit like answering a series of multiple-choice questions. The word is the question, and the possible analyses are the answers. Unknown words can be thought of as questions whose possible answers we don't know yet. To "tag" the text, you need to answer all of the questions by deleting the "incorrect" answers.
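The "delete the incorrect answers" idea can be sketched in a few lines of Python. This is only an illustration, assuming the Apertium stream format `^surface/analysis/analysis$`; the analyses listed for "book" are illustrative, the helper names `parse_lexical_unit` and `keep_analysis` are hypothetical, and escaped characters are not handled.

```python
# Illustrative ambiguous unit: the word "book" with three candidate analyses.
AMBIGUOUS = "^book/book<n><sg>/book<vblex><inf>/book<vblex><pres>$"


def parse_lexical_unit(unit: str) -> tuple[str, list[str]]:
    """Split '^surface/analysis1/analysis2$' into the surface form and its analyses."""
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError(f"not a lexical unit: {unit!r}")
    surface, *analyses = unit[1:-1].split("/")
    return surface, analyses


def keep_analysis(unit: str, wanted_tag: str) -> str:
    """Delete every analysis that does not contain the tag <wanted_tag>."""
    surface, analyses = parse_lexical_unit(unit)
    kept = [a for a in analyses if f"<{wanted_tag}>" in a]
    if not kept:
        raise ValueError(f"no analysis of {surface!r} has tag <{wanted_tag}>")
    return "^" + "/".join([surface] + kept) + "$"


# "Ann bought a book about gardening": the annotator decides it is a noun.
print(keep_analysis(AMBIGUOUS, "n"))
```

Choosing `vblex` instead would still leave two analyses (infinitive and present), so a further decision would be needed before the word counts as tagged.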

Why is this important?

Hand-tagged texts are needed in large quantities (tens, or better hundreds, of thousands of words) to 'train' the automatic taggers found in some Apertium language pairs. Getting the right tag for a word is important, as translation depends on it. For instance, the word book can be a verb or a noun. When translating to Spanish, the two readings have different translations:

  • [noun] Ann bought a book about gardening → Ann compró un libro sobre jardinería.
  • [verb] Ann wants to book a room at the Ritz → Ann quiere reservar una habitación en el Ritz.

This is why we have many hand-tagging tasks in the Google Code-In.

Guidelines

"both"

The word "both" can be a conjunction, joining two noun phrases; a determiner, modifying a noun phrase; or a pronoun, substituting for a noun phrase.

  • cnjcoo
    • I like both cats and dogs.
  • det
    • Both children like playing in the garden.
  • prn
    • Both thought it was a good idea.
    • They both like playing in the garden.
    • Both of them like playing in the garden.

"which"

  • det.itg.sp [determiner, modifies a noun or an adjective]
    • Which car have you bought?
  • prn.itg.m.sp [pronoun, does not modify a noun or adjective, meaning "which one?"]
    • Which is better?
  • rel.an.mf.sp [relative pronoun, introduces an explanatory sentence that describes a noun]
    • I saw the car, which stayed there all night.

"this", "several", "each"

The word "this" (along with its plural "these") can be either a determiner, modifying a noun phrase, or a pronoun, replacing a noun phrase.

  • det.dem
    • I don't like this cat.
    • I don't like these cats.
  • prn
    • This is the reason.
    • These are the ones.

The word "several" follows a similar pattern:

  • det.dem
    • He ate several cakes.
  • prn
    • They like cakes, they always buy several when they go to the shops.
    • Several of them thought it was a good idea.

"that"

The word "that" can be a determiner, which modifies a noun phrase; a demonstrative pronoun, which substitutes for a noun phrase; a subordinating conjunction; or a relative pronoun.

  • det.dem
    • I don't like that cat.
    • I don't like those cats.
  • prn
    • That is the reason.
    • Those are the ones.
  • rel
    • These are the ones that I like.
  • cnjsub
    • I think that you like cats.

Here is a tip for distinguishing rel and cnjsub. Try replacing the word "that" with the word "which" and see how it sounds. If it sounds ok, then your "that" is probably a relative pronoun; if it sounds bad, it's probably a conjunction.

  • ok: These are the ones which I like.
  • not ok: I think which you like cats.

"no"

The word "no" in English can be a determiner, modifying a noun phrase, or an adverb (or interjection).

  • det.ind
    • There are no cats in my attic.
  • adv
    • No! Don't do that!

Verbs with "-ing"

A verb form ending in -ing in English can be a gerund (adverbial), a substantive (like a noun) or a present participle (like an adjective).

  • vblex.subs:
    • Roughly, when you can substitute it with a noun: "Flying is hard" → "Flight is hard"
  • vblex.pprs:
    • Roughly, when you can substitute it with a relative clause: "The flying circus" → "The circus that flies"
  • vblex.ger
    • When it follows to be in continuous tenses, or when it can be replaced by a prepositional phrase or a different verbal phrase:
      • "He came singing" → "He came with a song"
      • "He is singing" → "He sings"

Adverb or adjective

A word like "first" can be either an adverb or an ordinal adjective. An adverb modifies a verb phrase (or an adjective phrase, or another adverb), whereas an ordinal adjective modifies a noun phrase.

  • adj
    • This is my first computer.
  • adv
    • First I'm going to buy a computer. [modifying a verb]

Infinitive or present

In English, the "short" infinitive often overlaps in wordform with the present tense, for example:

  • inf
    • I like to play football.
  • pres
    • I play football on Wednesdays.

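A tip for distinguishing them is to put the subject into the third person singular and see how the verb sounds with the ending -s. If the form with -s sounds ok, the verb is in the present tense; if not, it is an infinitive. For example: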

  • not ok: He likes to plays football.
  • ok: He plays football.

Adverb or preposition

In English, many words can be adverbs or prepositions. Both often modify a verb phrase, but a preposition is typically followed by a noun phrase, whereas an adverb stands on its own.

  • pr
    • He plays by the river.
  • adv
    • He walks by.

-ed: Past tense or past participle

Many verb forms ending in -ed (worked) may be either past tense or past participle.

Past participles often work as adjectives, modifying a noun: well-defined task; they also appear in perfect verb forms with have: "I have defined a system", or in passive forms with be: "It was defined as a series of actions".

The past tense is a simple verb form that expresses what happens in the sentence: "I defined a procedure", "The procedure he defined".

Here is a trick: change the verb to a form of go, or drink, or take and see if the sentence is syntactically OK (even if it does not make much sense).

  • If you would use went or drank or took, then it is past tense (past);
  • if you would use gone or drunk or taken, then it is a past participle (pp).
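The substitution trick above can be mechanised as far as generating the test sentences; judging which ones sound right is still up to the annotator. The sketch below is a minimal illustration, and the names `PROBES` and `probe_sentences` are hypothetical.

```python
# Probe verbs from the trick above: their past tense and past participle differ.
PROBES = {
    "past": ["went", "drank", "took"],
    "pp": ["gone", "drunk", "taken"],
}


def probe_sentences(sentence: str, ambiguous_word: str) -> dict[str, list[str]]:
    """For each candidate tag, swap the ambiguous -ed form for each probe verb."""
    words = sentence.split()
    if ambiguous_word not in words:
        raise ValueError(f"{ambiguous_word!r} does not occur in the sentence")
    index = words.index(ambiguous_word)  # first occurrence only
    return {
        tag: [" ".join(words[:index] + [probe] + words[index + 1:]) for probe in probes]
        for tag, probes in PROBES.items()
    }


for tag, variants in probe_sentences("I have defined a system", "defined").items():
    print(tag, variants)
```

For "I have defined a system", the pp variants ("I have gone a system", ...) are syntactically OK while the past variants ("I have went a system", ...) are not, so "defined" is a past participle here.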

"in", "on", "under"

These words can be adverbs (adv) or prepositions (pr). As an adverb, the word stands alone and modifies the verb. As a preposition, it connects a noun phrase (a phrase built around a noun) to a preceding element in the sentence.

Trick: If you can replace it with out, then it is an adverb.

  • The technician is in: adverb ("The technician is out" is OK)
  • The technician is in the restroom: preposition ("The technician is out the restroom" is not OK)
  • The technician is under the table: preposition ("The technician is out the table" is not OK)
  • The light is on: adverb ("The light is out" is OK)

"'s"

The word 's is usually appended to nouns and can be three different things:

  • If it can be detached as is, it is a form of the verb to be (vbser): "Mike's an athlete" → "Mike is an athlete"
  • If it can be detached as has, it is a form of the verb to have (vbhaver): "Mike's become an athlete" → "Mike has become an athlete"
  • If it marks the possessor or owner of something, it is just the genitive (gen) ending: "Mike's car" → "The car that Mike owns".
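The first two tests in the list above amount to expanding 's and reading the result. A minimal sketch of that expansion follows; the name `expand_clitic` is hypothetical, and only the first 's in the sentence is expanded.

```python
# Candidate expansions of 's, keyed by the tag each one corresponds to.
EXPANSIONS = {"vbser": " is", "vbhaver": " has"}


def expand_clitic(sentence: str) -> dict[str, str]:
    """Return the sentence with its first 's expanded to each candidate verb."""
    if "'s" not in sentence:
        raise ValueError("sentence contains no 's")
    return {tag: sentence.replace("'s", verb, 1) for tag, verb in EXPANSIONS.items()}


for tag, variant in expand_clitic("Mike's become an athlete").items():
    print(tag, "->", variant)
```

If neither expansion reads well, as with "Mike's car" ("Mike is car", "Mike has car"), the remaining option is the genitive (gen).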

"his"

The word "his" can be a pronoun (prn) or a determiner (det). As a determiner, it specifies a noun or noun phrase. As a pronoun, it stands on its own as a noun phrase.

Trick: If you can replace it with my, your, our, etc., then it is a determiner.

  • His father came to pick him up: determiner ("My father came to pick him up" is OK)
  • I drove my car and he drove his: pronoun ("I drove my car and he drove my" is not OK)

"put"

This verb is the same in many of its forms:

  • inf
    • They did say they were quite willing to put the document before the Pope. (... to throw the document ...)
  • pres
    • They always put their coats on before leaving the house. (They always throw their coats on before ...)
  • past
    • He put forth a form of "radical empiricism" (He threw forth a form ...)
  • pp
    • They have put their coats on the table. (They have thrown their coats on the table)
    • The water to be purified is placed in a chamber and put under great pressure.

Tip: Try replacing "put" with "throw", whose past tense (threw) and past participle (thrown) differ from its base form.
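The tip can be summarised as a small lookup from the form of "throw" that fits the sentence to the tags that remain possible. This is only a sketch; the names `THROW_FORM_TO_TAGS` and `candidate_tags` are hypothetical.

```python
# Which tags for "put" remain possible, given the form of "throw" that fits.
THROW_FORM_TO_TAGS = {
    "throw": ["inf", "pres"],  # the base form does not separate these two
    "threw": ["past"],
    "thrown": ["pp"],
}


def candidate_tags(fitting_form: str) -> list[str]:
    """Return the candidate tags for "put" when fitting_form sounds right in its place."""
    try:
        return THROW_FORM_TO_TAGS[fitting_form]
    except KeyError:
        raise ValueError(f"{fitting_form!r} is not a form of 'throw' used by this tip")


# "They have put their coats on the table" -> "They have thrown their coats ..."
print(candidate_tags("thrown"))
```

When "throw" itself is the form that fits, the choice between inf and pres still has to be made with the third-person test described under "Infinitive or present".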

Toponym or anthroponym

Often a word can be both a person's given name (anthroponym) and a place name (toponym).

  • top
    • I live in Victoria.
  • ant
    • I hang out with Victoria.

But in some cases it can be really ambiguous:

  • I like Victoria.

In this case, try looking at more of the surrounding context to decide.

"do"

The word "do" can be an auxiliary verb in the present tense, or a lexical verb in the infinitive or present. If it is used in a negative construction (I do not like that) or an emphatic construction (I do like that), followed by an infinitive, then it is most likely the auxiliary.

  • vbdo.pres
    • They do not have the luxury of viewing the original film.
  • vblex.inf
    • Modern psychology can do much to explain thought processes.
    • In her words she is ready to do anything.
  • vblex.pres
    • They do so by means of a magic potion.
    • They do the same when they come back.