==Hidden Markov models==
 
 
A hidden Markov model (HMM) is a statistical model which consists of a number of hidden states and a number of observable states. The hidden states correspond to the "correct" sequence of tags for a given ambiguous sentence; for the sentence "Vino a la playa" in the examples below, this would be {{sc|verb, pr, det, noun}}.
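
The following is a minimal illustration in Python (not <code>apertium-tagger</code> code) of how the two kinds of state line up for that sentence; the variable names are only for exposition:

<pre>
# Observable states: what the morphological analyser gives us, i.e. the
# ambiguity class (set of possible tags) of each word of "Vino a la playa".
observations = [
    {"verb", "noun"},  # vino
    {"pr"},            # a
    {"det", "prn"},    # la
    {"noun"},          # playa
]

# Hidden states: the "correct" tag of each word, which the tagger
# has to recover from the observations.
hidden_states = ["verb", "pr", "det", "noun"]
</pre>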
 
 
===Ambiguity classes===
 
 
In the <code>apertium-tagger</code>, and indeed in many HMM-based part-of-speech taggers, the set of observable states corresponds to a set of '''ambiguity classes'''. The ambiguity classes of a model are the set of possible ambiguities (often denoted with <math>\Sigma</math>). For example, for the sentence "Vino a la playa" these would be ({{sc|noun &#124; verb}}) and ({{sc|det &#124; prn}}). The preposition "a" and the noun "playa" are unambiguous and therefore do not belong to an ambiguity class. The ambiguity classes are calculated automatically from the corpus to be used for training.
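
A minimal sketch of how the ambiguity classes could be collected from an analysed corpus (the data layout and function name are invented for illustration; this is not the <code>apertium-tagger</code> implementation):

<pre>
# Analysed corpus: each word is the set of tags returned by the analyser.
analysed_corpus = [
    [{"verb", "noun"}, {"pr"}, {"det", "prn"}, {"noun"}],    # Vino a la playa
    [{"verb"}, {"pr"}, {"det", "prn"}, {"noun", "verb"}],    # Voy a la casa
    [{"verb"}, {"noun", "verb"}, {"pr"}, {"noun", "verb"}],  # Bebe vino en casa
    [{"det", "prn"}, {"noun", "verb"}, {"verb"}, {"adj"}],   # La casa es grande
    [{"verb"}, {"det", "prn", "verb"}, {"noun"}, {"adj"}],   # Es una ciudad grande
]

def ambiguity_classes(corpus):
    """Return every distinct ambiguity class (a set of more than one tag);
    unambiguous words are left out, as described above."""
    return {frozenset(word) for sentence in corpus
            for word in sentence if len(word) > 1}

print(ambiguity_classes(analysed_corpus))
# -> verb/noun, det/prn and det/prn/verb, matching the list further down
</pre>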
 
 
===Lexical model===
 
 
===Syntactic model===
 
 
==Training==
 
 
===Preparation===
 
 
;Corpora types
 
 
{|class=wikitable
 
! Untagged !! Analysed !! Tagged
 
|-
 
| Vino a la playa || Vino{{fadetag|<verb>/<noun>}} a{{fadetag|<pr>}} la{{fadetag|<det>/<prn>}} playa{{fadetag|<noun>}} || Vino{{fadetag|<verb>}} a{{fadetag|<pr>}} la{{fadetag|<det>}} playa{{fadetag|<noun>}}
 
|-
 
| Voy a la casa || Voy{{fadetag|<verb>}} a{{fadetag|<pr>}} la{{fadetag|<det>/<prn>}} casa{{fadetag|<noun>/<verb>}} || Voy{{fadetag|<verb>}} a{{fadetag|<pr>}} la{{fadetag|<det>}} casa{{fadetag|<noun>}}
 
|-
 
| Bebe vino en casa || Bebe{{fadetag|<verb>}} vino{{fadetag|<noun>/<verb>}} en{{fadetag|<pr>}} casa{{fadetag|<noun>/<verb>}} || Bebe{{fadetag|<verb>}} vino{{fadetag|<noun>}} en{{fadetag|<pr>}} casa{{fadetag|<noun>}}
 
|-
 
| La casa es grande || La{{fadetag|<det>/<prn>}} casa{{fadetag|<noun>/<verb>}} es{{fadetag|<verb>}} grande{{fadetag|<adj>}} || La{{fadetag|<det>}} casa{{fadetag|<noun>}} es{{fadetag|<verb>}} grande{{fadetag|<adj>}}
 
|-
 
| Es una ciudad grande || Es{{fadetag|<verb>}} una{{fadetag|<det>/<prn>/<verb>}} ciudad{{fadetag|<noun>}} grande{{fadetag|<adj>}} || Es{{fadetag|<verb>}} una{{fadetag|<det>}} ciudad{{fadetag|<noun>}} grande{{fadetag|<adj>}}
 
|}
 
 
;Ambiguity classes
 
 
* verb / noun
 
* det / prn
 
* det / prn / verb
 
 
;Transition and lexical counts


From the tagged examples we can extract the following transition counts (left table) and counts of each word with each part of speech (right table); a counting sketch in code is given after the tables:
 
<div style="float:left">
 
{|class=wikitable
 
! !!colspan=6|Second tag
 
|-
 
! First tag !! verb !! noun !! det !! prn !! pr !! adj
 
|-
 
| '''verb''' || 0 || 1 || 1 || 0 || 2 || 1
 
|-
 
| '''noun''' || 1 || 0 || 0 || 0 || 1 || 1
 
|-
 
| '''det''' || 0 || 4 || 0 || 0 || 0 || 0
 
|-
 
| '''prn''' || 0 || 0 || 0 || 0 || 0 || 0
 
|-
 
| '''pr''' || 0 || 1 || 2 || 0 || 0 || 0
 
|-
 
| '''adj''' || 0 || 0 || 0 || 0 || 0 || 0
 
|-
 
|}
 
</div>
 
<div style="float: right">
 
{|class=wikitable
 
! !!colspan=6|Part-of-speech
 
|-
 
! Word !! verb !! noun !! det !! prn !! pr !! adj
 
|-
 
| vino || 1 || 1 || 0 || 0 || 0 || 0
 
|-
 
| a || 0 || 0 || 0 || 0 || 2 || 0
 
|-
 
| la || 0 || 0 || 3 || 0 || 0 || 0
 
|-
 
| playa || 0 || 1 || 0 || 0 || 0 || 0
 
|-
 
| voy || 1 || 0 || 0 || 0 || 0 || 0
 
|-
 
| casa || 0 || 3 || 0 || 0 || 0 || 0
 
|-
 
| es || 2 || 0 || 0 || 0 || 0 || 0
 
|-
 
| grande || 0 || 0 || 0 || 0 || 0 || 2
 
|-
 
| una || 0 || 0 || 1 || 0 || 0 || 0
 
|-
 
| ciudad || 0 || 1 || 0 || 0 || 0 || 0
 
|-
 
| bebe || 1 || 0 || 0 || 0 || 0 || 0
 
|-
 
| en || 0 || 0 || 0 || 0 || 1 || 0
 
|-
 
 
|}
 
</div>
 
<br style="clear:both"/>
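
The counts in both tables can be reproduced with a single counting pass over the tagged corpus. The following Python sketch is only illustrative (the <code>apertium-tagger</code> does not use this representation); it assumes the tagged corpus is available as lists of (word, tag) pairs:

<pre>
from collections import Counter

# Tagged corpus as (word, tag) pairs, one list per sentence.
tagged_corpus = [
    [("vino", "verb"), ("a", "pr"), ("la", "det"), ("playa", "noun")],
    [("voy", "verb"), ("a", "pr"), ("la", "det"), ("casa", "noun")],
    [("bebe", "verb"), ("vino", "noun"), ("en", "pr"), ("casa", "noun")],
    [("la", "det"), ("casa", "noun"), ("es", "verb"), ("grande", "adj")],
    [("es", "verb"), ("una", "det"), ("ciudad", "noun"), ("grande", "adj")],
]

transition_counts = Counter()  # (first tag, second tag) -> count
lexical_counts = Counter()     # (word, tag) -> count

for sentence in tagged_corpus:
    tags = [tag for _word, tag in sentence]
    transition_counts.update(zip(tags, tags[1:]))  # tag bigrams
    lexical_counts.update(sentence)                # word/tag pairs

print(transition_counts[("det", "noun")])  # 4, as in the left table
print(lexical_counts[("casa", "noun")])    # 3, as in the right table
</pre>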
 
 
===Parameter estimation===
 
 
Estimating the parameters of the HMM means assigning a probability to each transition between tags (the syntactic model) and to the emission of each word from a tag (the lexical model). For example, from the transition counts above we expect a trained model to reflect that a determiner is more likely than a pronoun after a preposition:

<math>P(det|pr) > P(prn|pr)</math>


The <code>apertium-tagger</code> has two options for training (that is, estimating the parameters of) an HMM; the choice between them depends on whether a pre-disambiguated corpus is available. Maximum-likelihood estimation (MLE) requires a pre-tagged corpus, whereas Baum-Welch can also be trained on an untagged, morphologically analysed corpus. A small sketch of the MLE case is given below.
 
 
====Maximum likelihood estimation (MLE)====
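
A minimal sketch of maximum-likelihood estimation over the transition counts above, as plain relative frequencies with no smoothing (the function and variable names are invented for illustration; this is not the <code>apertium-tagger</code> code):

<pre>
from collections import Counter

# Non-zero transition counts from the table above.
transition_counts = Counter({
    ("verb", "noun"): 1, ("verb", "det"): 1, ("verb", "pr"): 2, ("verb", "adj"): 1,
    ("noun", "verb"): 1, ("noun", "pr"): 1, ("noun", "adj"): 1,
    ("det", "noun"): 4,
    ("pr", "noun"): 1, ("pr", "det"): 2,
})

def mle_transition_probs(counts):
    """Relative frequencies: P(second | first) = C(first, second) / C(first, *)."""
    totals = Counter()
    for (first, _second), n in counts.items():
        totals[first] += n
    return {(first, second): n / totals[first]
            for (first, second), n in counts.items()}

probs = mle_transition_probs(transition_counts)
print(probs[("pr", "det")])  # 2/3, while P(prn|pr) is 0: P(det|pr) > P(prn|pr)
</pre>

Unseen transitions get probability zero under this estimate, which is one reason smoothing or Baum-Welch re-estimation may be preferred in practice.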
 
 
====Baum-Welch====
 
 
==Tagging==
 
 
===Viterbi===
 
