User:Gang Chen/GSoC 2013 Application: "Sliding Window PoS Tagger"


Here is how it works

Overview

The task of a PoS tagger is to choose the best PoS tag for each position of the input. According to the documentation, the input to the tagger is an "ambiguity class sequence", and the output is a normal fine-grained tag sequence.

Firstly, "categories" are generated by grouping similar fine-grained tags together, because the information provided by a morphological analyser is too detailed for PoS disambiguation.

For example, the morphological analysis (i.e. the normal fine-grained tags) of the input sentence "I have seen it" is:

^I/I<num><mf><sg>/prpers<prn><subj><p1><mf><sg>$ ^have/have<vbhaver><inf>/have<vbhaver><pres>/have<vblex><inf>/have<vblex><pres>$ ^seen/see<vblex><pp>$ ^it/prpers<prn><subj><p3><nt><sg>/prpers<prn><obj><p3><nt><sg>$

According to the tagger configuration file (or the TSX file), "num.*" will be grouped into a category called "NUM", "prn.subj.*" will be grouped into a category called "PRNSUBJ", etc. So we finally get a category sequence:

^I/NUM/PRNSUBJ$ ^have/VHAVEINF/VHAVEPRES/INF/PRES$ ^seen/VLEXPP$ ^it/PRNSUBJ/PRNOBJ$

Secondly, a partition of the vocabulary is established so that two words belong to the same class if and only if they are assigned the same subset of categories. Each class of the partition is called an ambiguity class.
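
To make the grouping concrete, here is a minimal sketch in Python (not the actual apertium-tagger code) of how fine-grained analyses could be mapped to coarse categories, and how a word's ambiguity class falls out of that mapping. The category patterns and function names are invented for illustration; the real definitions live in the TSX file.

 # Hypothetical category definitions: a category matches an analysis when all
 # of its required fine-grained tags are present.
 CATEGORY_PATTERNS = [
     ("NUM",       ["num"]),
     ("PRNSUBJ",   ["prn", "subj"]),
     ("PRNOBJ",    ["prn", "obj"]),
     ("VHAVEINF",  ["vbhaver", "inf"]),
     ("VHAVEPRES", ["vbhaver", "pres"]),
     ("INF",       ["vblex", "inf"]),
     ("PRES",      ["vblex", "pres"]),
     ("VLEXPP",    ["vblex", "pp"]),
 ]

 def category_of(analysis_tags):
     """Map one fine-grained analysis (a list of tags) to its coarse category."""
     for name, required in CATEGORY_PATTERNS:
         if all(tag in analysis_tags for tag in required):
             return name
     return "UNKNOWN"

 def ambiguity_class(analyses):
     """The ambiguity class of a word is the set of categories of its analyses."""
     return frozenset(category_of(tags) for tags in analyses)

 # "have" in the example sentence has four analyses, hence four categories.
 have_analyses = [
     ["vbhaver", "inf"], ["vbhaver", "pres"], ["vblex", "inf"], ["vblex", "pres"],
 ]
 print(ambiguity_class(have_analyses))
 # the four categories VHAVEINF, VHAVEPRES, INF, PRES (in some order)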

Problem Formulation

For the convenience of discussion, we formulate the PoS tagging problem in Apertium as follows:

Input: an ambiguity class sequence

σ = σ_1 σ_2 … σ_N

where each σ_i has a set of candidate PoS tags; let T(σ_i) denote this set, namely T(σ_i) = { γ : γ is a candidate tag of σ_i }.

Output: the best PoS tag sequence

γ = γ_1 γ_2 … γ_N

where each γ_i ∈ T(σ_i).
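
For illustration, the formulation can be written down in code roughly as follows; the type names and the example tagging of "I have seen it" are chosen here just to connect with the earlier example, not taken from the Apertium source.

 from typing import FrozenSet, List

 AmbiguityClass = FrozenSet[str]          # T(sigma_i): the candidate tags
 sentence: List[AmbiguityClass] = [
     frozenset({"NUM", "PRNSUBJ"}),                        # "I"
     frozenset({"VHAVEINF", "VHAVEPRES", "INF", "PRES"}),  # "have"
     frozenset({"VLEXPP"}),                                # "seen"
     frozenset({"PRNSUBJ", "PRNOBJ"}),                     # "it"
 ]
 # One possible output: a tag per position, drawn from the corresponding set.
 tagging: List[str] = ["PRNSUBJ", "VHAVEPRES", "VLEXPP", "PRNOBJ"]
 assert all(t in s for t, s in zip(tagging, sentence))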

Sliding Window PoS tagger

The core idea of the SWPoS tagger is easy to understand. For an input ambiguity class σ_i, the tagger predicts its PoS tag γ_i by looking at the left context and the right context of σ_i. In the paper, the algorithm works best when N(-) (the left context length) is set to 1, and N(+) (the right context length) is also set to 1.

For example, if we have the input as:

^A/a/b$ ^B/x/y$ ^C/m/n$

where A, B and C are the input ambiguity classes, while a, b, x, y, m and n are candidate tags. The context of B is then A _ C.

  • In the tagging procedure, the algorithm first gets the list of tags that may be assigned to B, i.e. x and y; it then consults the model for the context A _ C to see which tag that context supports most. For example, if p(x | A _ C) is bigger than p(y | A _ C), then the PoS tag for B should be x (see the sketch after this list).
  • In the training procedure, the key problem is to figure out the probability that a particular context supports a particular tag, based on unannotated data. The paper provides an iterative method to approximate that probability.
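
Below is a minimal sketch of the tagging step, assuming a left and right context of length 1 and a table of trained counts indexed by the (left, right) context; the table values and the function name are invented for illustration, a trained model would supply them.

 # Hypothetical trained counts for the context A _ C.
 counts = {
     ("A", "C"): {"x": 1.7, "y": 0.3},
 }

 def tag_word(left, right, candidates, counts):
     """Pick the candidate tag that the context (left _ right) supports most."""
     context = counts.get((left, right), {})
     return max(candidates, key=lambda t: context.get(t, 0.0))

 # For the input ^A/a/b$ ^B/x/y$ ^C/m/n$ the middle word has candidates
 # {x, y} and context A _ C, so the chosen tag is x.
 print(tag_word("A", "C", ["x", "y"], counts))   # -> x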

Unsupervised training for the SWPoS tagger

For computational convenience, the algorithm estimates ñ(t), the expected number of times that tag t would appear in a given context C_{-1} _ C_{+1}, instead of the probability p(t | C_{-1} _ C_{+1}). The two are interchangeable, since normalising the counts of a context gives the probabilities.
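
As a small illustration of that interchangeability, normalising the counts of one context yields the corresponding probabilities; the helper name and the numbers are made up.

 def probabilities(context_counts):
     """Turn the counts of one context into a probability distribution."""
     total = sum(context_counts.values())
     return {t: c / total for t, c in context_counts.items()}

 print(probabilities({"x": 3.0, "y": 1.0}))   # {'x': 0.75, 'y': 0.25}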

We use a very simple example to describe how the iterative procedure runs.

Suppose we had only two training cases:

(1) A ^B/x/y$   C
(2) A ^D/x/y/z$ C

where we only focus on the context A _ C. Whether A or C is itself ambiguous or not does not affect the context.

Firstly, for each training case, we assume that the probability of every candidate tag is equal, because we don't have any other information yet. That is, the first case contributes 1/2 to x and 1/2 to y, while the second case contributes 1/3 to each of x, y and z. This is the initial step (or 0-th step) of the algorithm, and we finally get:

ñ(x) = 1/2 + 1/3 = 5/6,   ñ(y) = 1/2 + 1/3 = 5/6,   ñ(z) = 1/3

Then, we can use these counts to estimate the following probabilities (the total count is 5/6 + 5/6 + 1/3 = 2):

p(x | A _ C) = (5/6) / 2 = 5/12,   p(y | A _ C) = 5/12,   p(z | A _ C) = 1/6

And then, using these probabilities, we can re-estimate the counts as expectations: the first case distributes its single occurrence over {x, y} in proportion to p(x) and p(y), and the second case distributes its occurrence over {x, y, z} in proportion to p(x), p(y) and p(z):

ñ(x) = (5/12) / (5/12 + 5/12) + (5/12) / (5/12 + 5/12 + 1/6) = 1/2 + 5/12 = 11/12
ñ(y) = 1/2 + 5/12 = 11/12
ñ(z) = (1/6) / (5/12 + 5/12 + 1/6) = 1/6

With this, we have finished one iteration, and all the counts have been updated. We can observe that ñ(x) and ñ(y) are growing bigger and ñ(z) is growing smaller, while the total count remains the same: 11/12 + 11/12 + 1/6 = 2.
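
The following sketch reproduces this first iteration numerically for the context A _ C, using exact fractions; the variable names are invented, and the two training cases are the ones above.

 from fractions import Fraction

 # The two training cases restrict the middle word to {x, y} and {x, y, z}.
 cases = [{"x", "y"}, {"x", "y", "z"}]
 tags = ["x", "y", "z"]

 # Step 0: distribute each case uniformly over its candidate tags.
 counts = {t: Fraction(0) for t in tags}
 for case in cases:
     for t in case:
         counts[t] += Fraction(1, len(case))
 print(counts)       # x: 5/6, y: 5/6, z: 1/3

 # Turn the counts into probabilities (the total count is 2) ...
 total = sum(counts.values())
 probs = {t: c / total for t, c in counts.items()}
 print(probs)        # x: 5/12, y: 5/12, z: 1/6

 # ... and re-estimate the counts as expectations over each training case.
 new_counts = {t: Fraction(0) for t in tags}
 for case in cases:
     mass = sum(probs[t] for t in case)
     for t in case:
         new_counts[t] += probs[t] / mass
 print(new_counts)   # x: 11/12, y: 11/12, z: 1/6 (total is still 2)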

Repeating the above procedure, the process will converge to a fixed point, with:

ñ(x) = 1,   ñ(y) = 1,   ñ(z) = 0

and the total count still equal to 2.

This is a simple example to show the iterative procedure of the unsupervised training algorithm. Real data may be more complex, but the core mechanism remains the same.
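
As a closing illustration, the same re-estimation can be repeated until the counts stop changing; under the same toy setup as the previous sketch, the counts approach ñ(x) = ñ(y) = 1 and ñ(z) = 0.

 from fractions import Fraction

 # Repeat the re-estimation for a fixed number of rounds (60 is plenty here).
 cases = [{"x", "y"}, {"x", "y", "z"}]
 tags = ["x", "y", "z"]

 counts = {t: sum(Fraction(1, len(c)) for c in cases if t in c) for t in tags}
 for _ in range(60):
     total = sum(counts.values())
     probs = {t: c / total for t, c in counts.items()}
     counts = {t: sum(probs[t] / sum(probs[u] for u in c)
                      for c in cases if t in c)
               for t in tags}

 print({t: float(c) for t, c in counts.items()})
 # roughly {'x': 1.0, 'y': 1.0, 'z': 0.0}, with the total count still 2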