Part-of-speech tagging

Part-of-speech tagging is the process of assigning unambiguous grammatical categories^[1] to words in context. The crux of the problem is that morphological analysis can often assign more than one part-of-speech to the surface form of a word. For example, in English the word "trap" can be either a singular noun ("a trap") or a verb ("I'll trap it").

This page gives an overview of how part-of-speech tagging works in Apertium, primarily within the apertium-tagger, with a brief look at constraints (as in constraint grammar) and restrictions (as in apertium-tagger) as well.

Introduction

See also: Morphological dictionaries

Consider the following sentence in Spanish ("She came to the beach"):

Vino (noun or verb) a (pr) la (det or prn) playa (noun)

We can see that two of the four words are ambiguous: "vino", which can be a noun ("wine") or a verb ("came"), and "la", which can be a determiner ("the") or a pronoun ("her" or "it"). This gives the following possibilities for the disambiguated analysis of the sentence:

Tag	Gloss
det	Determiner
noun	Noun
prn	Pronoun
pr	Preposition
verb	Verb

noun, pr, det, noun → Wine to the beach

verb, pr, det, noun → She came to the beach

noun, pr, prn, noun → Wine to it beach

verb, pr, prn, noun → She came to it beach

As can be seen, only one of these interpretations (verb, pr, det, noun) yields the correct translation. The task of part-of-speech tagging is to select the correct interpretation. There are a number of ways of doing this, involving both linguistically motivated rules (such as constraint grammar and the Brill tagger) and statistical methods (such as the TnT tagger or the ACOPOST tagger).
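The four candidate readings listed above are simply the Cartesian product of the tags allowed for each word. A minimal sketch of that enumeration follows; the `analyses` structure and the `enumerate_readings` helper are illustrative names, not part of Apertium.

```python
from itertools import product

# Possible tags per word, as listed for "Vino a la playa" above.
analyses = [
    ("Vino", ["noun", "verb"]),
    ("a", ["pr"]),
    ("la", ["det", "prn"]),
    ("playa", ["noun"]),
]

def enumerate_readings(analyses):
    """Return every tag sequence allowed by the per-word analyses."""
    options = [tags for _, tags in analyses]
    return [list(reading) for reading in product(*options)]

readings = enumerate_readings(analyses)
for reading in readings:
    print(", ".join(reading))
```

The number of readings grows multiplicatively with each ambiguous word, which is why a tagger has to score readings rather than list them all by hand.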

The tagger in Apertium (apertium-tagger) uses a combination of rules and a statistical (hidden Markov) model.

Hidden Markov models

A hidden Markov model (HMM) is a statistical model which consists of a number of hidden states and a number of observable states. The hidden states correspond to the "correct" sequence of tags for a given ambiguous sentence; in the above example this would be verb, pr, det, noun.

Ambiguity classes

In the apertium-tagger, and indeed in many HMM-based part-of-speech taggers, the set of observable states corresponds to a set of ambiguity classes. The ambiguity classes of a model are the set of possible ambiguities (often denoted $\Sigma$). In the above example, these would be (noun | verb) and (det | prn). The preposition "a" and the noun "playa" are unambiguous and therefore do not belong to an ambiguity class. The ambiguity classes are calculated automatically from the corpus used for training.

Lexical model

Syntactic model

Training

Preparation

Corpora types

Untagged	Analysed	Tagged
Vino a la playa	Vino`<verb>/<noun>` a`<pr>` la`<det>/<prn>` playa`<noun>`	Vino`<verb>` a`<pr>` la`<det>` playa`<noun>`
Voy a la casa	Voy`<verb>` a`<pr>` la`<det>/<prn>` casa`<noun>/<verb>`	Voy`<verb>` a`<pr>` la`<det>` casa`<noun>`
Bebe vino en casa	Bebe`<verb>` vino`<noun>/<verb>` en`<pr>` casa`<noun>/<verb>`	Bebe`<verb>` vino`<noun>` en`<pr>` casa`<noun>`
La casa es grande	La`<det>/<prn>` casa`<noun>/<verb>` es`<verb>` grande`<adj>`	La`<det>` casa`<noun>` es`<verb>` grande`<adj>`
Es una ciudad grande	Es`<verb>` una`<det>/<prn>/<verb>` ciudad`<noun>` grande`<adj>`	Es`<verb>` una`<det>` ciudad`<noun>` grande`<adj>`

Ambiguity classes

verb / noun
det / prn
det / prn / verb
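The three classes above can be read off the "Analysed" column mechanically. The following is a small sketch, assuming the simplified `word<tag>/<tag>` notation used in the table (not Apertium's real stream format) and following this page's convention that unambiguous words do not form a class; `ambiguity_classes` is a hypothetical helper.

```python
import re

# The "Analysed" column of the table above, in its simplified notation.
analysed = [
    "Vino<verb>/<noun> a<pr> la<det>/<prn> playa<noun>",
    "Voy<verb> a<pr> la<det>/<prn> casa<noun>/<verb>",
    "Bebe<verb> vino<noun>/<verb> en<pr> casa<noun>/<verb>",
    "La<det>/<prn> casa<noun>/<verb> es<verb> grande<adj>",
    "Es<verb> una<det>/<prn>/<verb> ciudad<noun> grande<adj>",
]

def ambiguity_classes(sentences):
    """Collect every distinct set of two or more tags seen on one word."""
    classes = set()
    for sentence in sentences:
        for token in sentence.split():
            tags = frozenset(re.findall(r"<([^>]+)>", token))
            if len(tags) > 1:  # unambiguous words are skipped
                classes.add(tags)
    return classes

classes = ambiguity_classes(analysed)
for cls in sorted(classes, key=sorted):
    print(" / ".join(sorted(cls)))
```

Using `frozenset` makes "verb / noun" and "noun / verb" count as the same class, regardless of the order in which the analyses are listed.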

Transition counts

From the tagged examples we can extract the following transition counts:

	Second tag
First tag	verb	noun	det	prn	pr	adj
verb	0	1	1	0	2	1
noun	1	0	0	0	1	1
det	0	4	0	0	0	0
prn	0	0	0	0	0	0
pr	0	1	2	0	0	0
adj	0	0	0	0	0	0
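The table above can be reproduced by counting adjacent tag pairs in the "Tagged" column. This is a minimal sketch; the `tagged` list and the `transition_counts` helper are illustrative names, and sentence-initial and sentence-final transitions are ignored, as they are in the table.

```python
from collections import Counter

# Tag sequences of the five tagged example sentences.
tagged = [
    ["verb", "pr", "det", "noun"],   # Vino a la playa
    ["verb", "pr", "det", "noun"],   # Voy a la casa
    ["verb", "noun", "pr", "noun"],  # Bebe vino en casa
    ["det", "noun", "verb", "adj"],  # La casa es grande
    ["verb", "det", "noun", "adj"],  # Es una ciudad grande
]

def transition_counts(sentences):
    """Count how often each (first tag, second tag) pair occurs in sequence."""
    counts = Counter()
    for tags in sentences:
        counts.update(zip(tags, tags[1:]))
    return counts

transitions = transition_counts(tagged)
for (first, second), n in sorted(transitions.items()):
    print(f"{first} -> {second}: {n}")
```

Pairs that never occur, such as (pr, prn), are simply absent from the counter and read back as zero.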

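The same tagged examples give the following counts of how often each word occurs with each part-of-speech:

	Part-of-speech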
Word	verb	noun	det	prn	pr	adj
vino	1	1	0	0	0	0
a	0	0	0	0	2	0
la	0	0	3	0	0	0
playa	0	1	0	0	0	0
voy	1	0	0	0	0	0
casa	0	3	0	0	0	0
es	2	0	0	0	0	0
grande	0	0	0	0	0	2
una	0	0	1	0	0	0
ciudad	0	1	0	0	0	0
bebe	1	0	0	0	0	0
en	0	0	0	0	1	0
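The word-by-tag counts can be gathered from the "Tagged" column in the same way. The sketch below assumes the simplified `word<tag>` notation of the corpus table and lower-cases the words, which is what the table appears to do (for instance "La" and "la" are counted together); `emission_counts` is a hypothetical helper.

```python
import re
from collections import Counter

# The "Tagged" column of the corpus table, in its simplified notation.
tagged = [
    "Vino<verb> a<pr> la<det> playa<noun>",
    "Voy<verb> a<pr> la<det> casa<noun>",
    "Bebe<verb> vino<noun> en<pr> casa<noun>",
    "La<det> casa<noun> es<verb> grande<adj>",
    "Es<verb> una<det> ciudad<noun> grande<adj>",
]

TOKEN = re.compile(r"(.+)<([^>]+)>")

def emission_counts(sentences):
    """Count how often each lower-cased word occurs with each tag."""
    counts = Counter()
    for sentence in sentences:
        for token in sentence.split():
            word, tag = TOKEN.fullmatch(token).groups()
            counts[(word.lower(), tag)] += 1
    return counts

emissions = emission_counts(tagged)
for (word, tag), n in sorted(emissions.items()):
    print(f"{word}\t{tag}\t{n}")
```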

Parameter estimation

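The apertium-tagger has two options for training (that is, estimating the parameters of) an HMM. The choice between them depends on whether a pre-disambiguated corpus is available: maximum likelihood estimation (MLE) relies on a pre-tagged corpus, whereas the Baum-Welch algorithm can be trained on an untagged corpus.

With the transition counts above, for example, a determiner follows a preposition twice and a pronoun never, so the estimated probabilities satisfy $P(det|pr)>P(prn|pr)$.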
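As a minimal sketch of the maximum-likelihood case, a transition probability can be estimated as a relative frequency: the count of a tag pair divided by the total count of pairs starting with the same first tag. The counts below are copied from the transition table; the `mle_transition` helper is an illustrative name and the estimate is unsmoothed, which is an assumption rather than a description of what apertium-tagger does internally.

```python
from collections import Counter

# Non-zero cells of the transition count table, keyed by (first tag, second tag).
transitions = Counter({
    ("verb", "noun"): 1, ("verb", "det"): 1, ("verb", "pr"): 2, ("verb", "adj"): 1,
    ("noun", "verb"): 1, ("noun", "pr"): 1, ("noun", "adj"): 1,
    ("det", "noun"): 4,
    ("pr", "noun"): 1, ("pr", "det"): 2,
})

def mle_transition(counts, first, second):
    """Relative-frequency estimate of P(second | first); 0.0 if first was never seen."""
    total = sum(n for (f, _), n in counts.items() if f == first)
    return counts[(first, second)] / total if total else 0.0

p_det = mle_transition(transitions, "pr", "det")
p_prn = mle_transition(transitions, "pr", "prn")
print(f"P(det|pr) = {p_det:.3f}, P(prn|pr) = {p_prn:.3f}")
```

An unsmoothed estimate gives probability zero to any transition that is missing from the training data, so with a corpus this small some form of smoothing would normally be needed before tagging new text.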

Maximum likelihood estimation (MLE)

Baum-Welch

Tagging

Viterbi
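As a rough illustration of Viterbi decoding, the sketch below picks the most probable reading of "Vino a la playa" using counts from the tagged examples. It rests on several assumptions made only for this example: add-one smoothing, emission probabilities over words rather than ambiguity classes, and candidate tags supplied per word. None of the names or choices here are taken from the apertium-tagger implementation.

```python
import math
from collections import Counter

TAGS = ["verb", "noun", "det", "prn", "pr", "adj"]

# Tagged example sentences as (word, tag) pairs, words lower-cased.
TAGGED = [
    [("vino", "verb"), ("a", "pr"), ("la", "det"), ("playa", "noun")],
    [("voy", "verb"), ("a", "pr"), ("la", "det"), ("casa", "noun")],
    [("bebe", "verb"), ("vino", "noun"), ("en", "pr"), ("casa", "noun")],
    [("la", "det"), ("casa", "noun"), ("es", "verb"), ("grande", "adj")],
    [("es", "verb"), ("una", "det"), ("ciudad", "noun"), ("grande", "adj")],
]

start, trans, emit = Counter(), Counter(), Counter()
for sent in TAGGED:
    start[sent[0][1]] += 1                                  # sentence-initial tag
    emit.update(sent)                                       # (word, tag) counts
    trans.update((a[1], b[1]) for a, b in zip(sent, sent[1:]))
VOCAB = {word for word, _ in emit}

def log_prob(count, total, n_outcomes):
    """Add-one smoothed log probability (an assumed smoothing scheme)."""
    return math.log((count + 1) / (total + n_outcomes))

def p_start(tag):
    return log_prob(start[tag], sum(start.values()), len(TAGS))

def p_trans(prev, tag):
    total = sum(trans[(prev, t)] for t in TAGS)
    return log_prob(trans[(prev, tag)], total, len(TAGS))

def p_emit(tag, word):
    total = sum(emit[(w, tag)] for w in VOCAB)
    return log_prob(emit[(word, tag)], total, len(VOCAB))

def viterbi(words, candidates):
    """Return the most probable tag sequence given per-word candidate tags."""
    # best[tag] = (log score of the best path ending in tag, that path)
    best = {t: (p_start(t) + p_emit(t, words[0]), [t]) for t in candidates[0]}
    for word, options in zip(words[1:], candidates[1:]):
        new_best = {}
        for tag in options:
            score, path = max(
                (prev_score + p_trans(prev, tag), prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            new_best[tag] = (score + p_emit(tag, word), path + [tag])
        best = new_best
    return max(best.values())[1]

words = ["vino", "a", "la", "playa"]
candidates = [["noun", "verb"], ["pr"], ["det", "prn"], ["noun"]]
best_path = viterbi(words, candidates)
print(best_path)
```

Working in log space avoids numerical underflow on longer sentences, and keeping only the best path per tag at each position is what makes the search linear in sentence length instead of enumerating every reading.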

Notes

[1] Also referred to as "parts-of-speech", e.g. Noun, Verb, Adjective, Adverb, Conjunction, etc.
