Difference between revisions of "Part-of-speech tagging"

Line 4: Line 4:
   
 
This page intends to give an overview of how part-of-speech tagging works in Apertium, primarily within the <code>apertium-tagger</code>, but giving a short overview of constraints (as in [[constraint grammar]]) and restrictions (as in <code>apertium-tagger</code>) as well.
 
This page intends to give an overview of how part-of-speech tagging works in Apertium, primarily within the <code>apertium-tagger</code>, but giving a short overview of constraints (as in [[constraint grammar]]) and restrictions (as in <code>apertium-tagger</code>) as well.
  +
  +
==Lexical ambiguity==
  +
  +
After morphological analysis of a sentence, a not insignificant amount of words will have more than one analysis. For example in the following sentence:
  +
  +
:Vino (noun or verb) a ( la playa
  +
   
 
==Hidden Markov models==
 
==Hidden Markov models==
   
A hidden Markov model is a statistical model of .......
+
A hidden Markov model is a statistical model which consists of a number of hidden states, and a number of observable states.
   
 
===Ambiguity classes===
 
===Ambiguity classes===
Line 20: Line 27:
   
 
===Viterbi===
 
===Viterbi===
  +
  +
==See also==
  +
  +
* [[Tagger training]]
  +
* [[Constraint grammar]]
   
 
==Notes==
 
==Notes==

Revision as of 06:58, 16 September 2008

Part-of-speech tagging is the process of assigning unambiguous grammatical categories[1] to words in context. The crux of the problem is that morphological analysis can often assign more than one part of speech to the surface form of a word. For example, in English the word "trap" can be either a singular noun ("a trap") or a verb ("I'll trap it").

This page gives an overview of how part-of-speech tagging works in Apertium, primarily within apertium-tagger, along with a short overview of constraints (as in constraint grammar) and restrictions (as in apertium-tagger).

Lexical ambiguity

After morphological analysis of a sentence, a significant number of words will have more than one analysis. For example, in the following sentence:

Vino (noun or verb) a ( la playa
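
To make the ambiguity concrete, the following is a minimal Python sketch that reads morphological analyses written in the Apertium stream format (^surface/analysis1/analysis2$) and reports which surface forms have more than one analysis. The analyses in EXAMPLE are hand-written for illustration and are an assumption, not the output of any particular dictionary. The helper names (parse_analyses, first_tag, ambiguity_class) are hypothetical, and using only the first tag as the category is a simplification of how coarse tags are really defined. The sketch also ignores escaping and other details of the real stream format.

```python
import re

# One lexical unit looks like: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")
TAG = re.compile(r"<([^>]+)>")


def parse_analyses(stream):
    """Return a list of (surface_form, [analysis, ...]) pairs."""
    units = []
    for match in LEXICAL_UNIT.finditer(stream):
        surface, rest = match.group(1), match.group(2)
        units.append((surface, rest.split("/")))
    return units


def first_tag(analysis):
    """Return the first tag of an analysis, used here as a coarse category."""
    found = TAG.search(analysis)
    return found.group(1) if found else None


def ambiguity_class(analyses):
    """Return the set of coarse categories a surface form can receive."""
    return frozenset(first_tag(a) for a in analyses)


# Illustrative analyses, hand-written for this example (an assumption).
EXAMPLE = (
    "^Vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$ "
    "^a/a<pr>$ "
    "^la/el<det><def><f><sg>/lo<prn><pro><p3><f><sg>$ "
    "^playa/playa<n><f><sg>$"
)

units = parse_analyses(EXAMPLE)
ambiguous = [surface for surface, analyses in units if len(analyses) > 1]

for surface, analyses in units:
    print(surface, sorted(ambiguity_class(analyses)))
```

Under these assumed analyses, "Vino" and "la" are the ambiguous forms: "Vino" can be a noun or a verb, and the set of categories a form can take is what the tagger treats as its ambiguity class.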


Hidden Markov models

A hidden Markov model is a statistical model that consists of a number of hidden states and a number of observable states.
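
As a rough illustration of how hidden and observable states relate in tagging, the sketch below builds a toy model in Python in which the hidden states are tags and the observations are words, then finds the most likely tag sequence with the Viterbi algorithm. All probabilities are invented for the example, and the names STATES, START, TRANS, EMIT and viterbi are hypothetical. This is not how apertium-tagger is implemented: a real tagger estimates its parameters from data and observes ambiguity classes rather than raw words.

```python
# Toy hidden Markov model: hidden states are tags, observations are words.
# Every number below is invented for illustration only.
STATES = ("n", "vblex", "pr", "det")

# Probability of a sentence starting in each state.
START = {"n": 0.4, "vblex": 0.3, "pr": 0.1, "det": 0.2}

# TRANS[a][b] is the probability of moving from tag a to tag b.
TRANS = {
    "n":     {"n": 0.1, "vblex": 0.4,  "pr": 0.4,  "det": 0.1},
    "vblex": {"n": 0.2, "vblex": 0.1,  "pr": 0.4,  "det": 0.3},
    "pr":    {"n": 0.3, "vblex": 0.05, "pr": 0.05, "det": 0.6},
    "det":   {"n": 0.8, "vblex": 0.05, "pr": 0.05, "det": 0.1},
}

# EMIT[tag][word] is the probability of the tag producing the word.
EMIT = {
    "n":     {"vino": 0.5, "playa": 0.5},
    "vblex": {"vino": 1.0},
    "pr":    {"a": 1.0},
    "det":   {"la": 1.0},
}


def viterbi(words):
    """Return the most probable tag sequence for a non-empty word list."""
    # Each column maps a state to (best path probability, previous state).
    best = [{s: (START[s] * EMIT[s].get(words[0], 0.0), None) for s in STATES}]
    for word in words[1:]:
        column = {}
        for s in STATES:
            prob, prev = max((best[-1][p][0] * TRANS[p][s], p) for p in STATES)
            column[s] = (prob * EMIT[s].get(word, 0.0), prev)
        best.append(column)

    # Follow the back-pointers from the best final state.
    state = max(STATES, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path


print(viterbi(["vino", "a", "la", "playa"]))
# prints ['vblex', 'pr', 'det', 'n']
```

With these made-up numbers the ambiguous word "vino" is resolved as a verb, because a verb followed by a preposition is slightly more probable in the toy model than a noun followed by a preposition.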

Ambiguity classes

Training

Expectation-Maximisation (EM)

Baum-Welch

Tagging

Viterbi

See also

Tagger training
Constraint grammar

Notes

  1. Also referred to as "parts-of-speech", e.g. Noun, Verb, Adjective, Adverb, Conjunction, etc.