Unigram tagger
Install
The code is a fork of apertium, hosted at m5w/apertium. It has the same dependencies as apertium, so install it the same way. See Installation and Minimal installation from SVN for more information.
Unigram Models
This code's apertium-tagger implements the three unigram models described in section 5.3 of "A set of open-source tools for Turkish natural language processing".
Model 1
See section 5.3.1. This model scores each analysis string in proportion to its frequency with add-one smoothing. Consider the following corpus.
^a/a<a>$ ^a/a<b>$ ^a/a<b>$
Passed the lexical unit ^a/a<a>/a<b>/a<c>$, the tagger assigns the analysis string a<a> a score of f + 1 = (1) + 1 = 2, where f is the number of times the analysis string occurs in the corpus, and a<b> a score of (2) + 1 = 3. The tagger assigns the unknown analysis string a<c> a score of (0) + 1 = 1.
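The Model 1 scoring in this example can be reproduced with a short Python sketch. It is not the apertium-tagger implementation. The function names (parse_lexical_unit, train, score) are hypothetical, and the sketch assumes that every lexical unit in the training corpus carries exactly one analysis and that no analysis contains an escaped slash.

```python
from collections import Counter


def parse_lexical_unit(unit):
    """Split '^surface/analysis1/analysis2$' into (surface, [analyses]).

    Assumption: analyses contain no escaped '/' characters.
    """
    unit = unit.strip()
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError(f"not a lexical unit: {unit!r}")
    surface, *analyses = unit[1:-1].split("/")
    return surface, analyses


def train(corpus):
    """Count how often each analysis string occurs in a tagged corpus.

    Assumption: each training unit has exactly one analysis.
    """
    counts = Counter()
    for unit in corpus.split():
        _, analyses = parse_lexical_unit(unit)
        counts[analyses[0]] += 1
    return counts


def score(counts, unit):
    """Score every analysis of an ambiguous unit as frequency + 1."""
    _, analyses = parse_lexical_unit(unit)
    # A Counter returns 0 for unseen keys, so unknown analyses score 1.
    return {analysis: counts[analysis] + 1 for analysis in analyses}


counts = train("^a/a<a>$ ^a/a<b>$ ^a/a<b>$")
scores = score(counts, "^a/a<a>/a<b>/a<c>$")
print(scores)                       # {'a<a>': 2, 'a<b>': 3, 'a<c>': 1}
print(max(scores, key=scores.get))  # a<b>
```

The add-one term is what lets the unseen analysis a<c> receive a nonzero score. Selecting the highest-scoring analysis with max is an illustrative choice for this sketch.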