Difference between revisions of "Unigram tagger"

From Apertium
Line 1: Line 1:
 
==Install==
 
==Install==
The code is a clone of apertium and is at [https://github.com/m5w/apertium m5w/apertium]. It has the same dependencies as apertium, and one should install it in the same way. See [[Installation]] and [[Minimal installation from SVN]] for more information.
+
The code is a clone of apertium and is at [https://github.com/m5w/apertium m5w/apertium]. It has the same dependencies as apertium, so one should install it in the same way. See [[Installation]] and [[Minimal installation from SVN]] for more information.
  +
==Unigram Models==
  +
This code's <code>apertium-tagger</code> implements the three unigram models in [http://coltekin.net/cagri/papers/trmorph-tools.pdf A set of open-source tools for Turkish natural language processing]. See section 5.3.
  +
===Model 1===
  +
See section 5.3.1.
  +
This model scores each analysis string in proportion to its frequency with add-one smoothing.
  +
Consider the following corpus.
  +
<pre>
  +
^a/a<a>$
  +
^a/a<b>$
  +
^a/a<b>$
  +
</pre>
  +
Passed the lexical unit <code>^a/a<a>/a<b>/a<c>$</code>, the tagger assigns the analysis string <code>a<a></code> a score of
  +
<pre>
  +
f + 1 =
  +
(1) + 1 =
  +
2
  +
</pre> and <code>a<b></code> a score of <code>(2) + 1 = 3</code>. The tagger assigns the unknown analysis string <code>a<c></code> a score of <code>1</code>.
 
[[Category:Development]]
 
[[Category:Development]]

Revision as of 03:20, 14 January 2016

Install

The code is a clone of apertium and is at m5w/apertium. It has the same dependencies as apertium, so one should install it in the same way. See Installation and Minimal installation from SVN for more information.

Unigram Models

This code's apertium-tagger implements the three unigram models in A set of open-source tools for Turkish natural language processing. See section 5.3.

Model 1

See section 5.3.1. This model scores each analysis string in proportion to its frequency with add-one smoothing. Consider the following corpus.

^a/a<a>$
^a/a<b>$
^a/a<b>$

Passed the lexical unit ^a/a<a>/a<b>/a<c>$, the tagger assigns the analysis string a<a> a score of

f + 1 =
  (1) + 1 =
  2

and a<b> a score of (2) + 1 = 3. The tagger assigns the unknown analysis string a<c> a score of 1.
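
The scoring for Model 1 can be reproduced with a minimal Python sketch. This is not the apertium-tagger implementation: the function names `train`, `score`, and `best_analysis` are hypothetical, and choosing the highest-scoring analysis is an assumption about how the scores are used. The corpus is reduced to its list of analysis strings.

```python
from collections import Counter


def train(analyses):
    """Count how often each analysis string occurs in the corpus."""
    return Counter(analyses)


def score(counts, analysis):
    """Add-one smoothed score: frequency f plus 1 (unseen strings get 1)."""
    return counts[analysis] + 1


def best_analysis(counts, candidates):
    """Assumption: the tagger prefers the candidate with the highest score."""
    return max(candidates, key=lambda a: score(counts, a))


# Analysis strings from the three-line corpus above.
counts = train(["a<a>", "a<b>", "a<b>"])

# Candidate analyses of the lexical unit ^a/a<a>/a<b>/a<c>$.
candidates = ["a<a>", "a<b>", "a<c>"]
scores = {a: score(counts, a) for a in candidates}
```

Under these assumptions, `scores` maps a<a> to 2, a<b> to 3, and the unseen a<c> to 1, matching the figures above.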