Unigram tagger

From Apertium
Revision as of 03:21, 14 January 2016 by M5w (talk | contribs) (→‎Model 1)

Install

The code is a fork of apertium, hosted at m5w/apertium. It has the same dependencies as apertium, so install it the same way. See Installation and Minimal installation from SVN for more information.

Unigram Models

The apertium-tagger in this fork implements the three unigram models described in section 5.3 of "A set of open-source tools for Turkish natural language processing".

Model 1

See section 5.3.1. This model scores each analysis string in proportion to its frequency with add-one smoothing. Consider the following corpus.

^a/a<a>$
^a/a<b>$
^a/a<b>$

Passed the lexical unit ^a/a<a>/a<b>/a<c>$, the tagger assigns the analysis string a<a> a score of

  f + 1 = 1 + 1 = 2

where f is the number of times the analysis string occurs in the corpus. Likewise, a<b> scores 2 + 1 = 3. The unknown analysis string a<c>, which does not occur in the corpus, scores 0 + 1 = 1.
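
The scoring above can be sketched in a few lines of Python. This is only an illustration of add-one smoothing over analysis-string counts, not the apertium-tagger implementation: the function names (analyses, train, score, tag) are hypothetical, the parser assumes lexical units with no escaped characters, and choosing the highest-scoring analysis is an assumption about how the scores are used.

```python
from collections import Counter
import re

# Matches one lexical unit in stream format, e.g. ^a/a<a>/a<b>$.
# Simplification: escaped characters inside the unit are not handled.
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def analyses(lexical_unit):
    """Return the analysis strings of a lexical unit (hypothetical helper)."""
    body = LEXICAL_UNIT.fullmatch(lexical_unit.strip()).group(1)
    # The first field is the surface form; the rest are analysis strings.
    return body.split("/")[1:]


def train(corpus_lines):
    """Count how often each analysis string occurs in a tagged corpus."""
    counts = Counter()
    for line in corpus_lines:
        # Assumption: every training unit carries exactly one analysis.
        (analysis,) = analyses(line)
        counts[analysis] += 1
    return counts


def score(counts, lexical_unit):
    """Score each analysis as f + 1, where f is its corpus frequency."""
    # Counter returns 0 for unseen keys, so unknown analyses score 1.
    return {a: counts[a] + 1 for a in analyses(lexical_unit)}


def tag(counts, lexical_unit):
    """Pick the highest-scoring analysis (assumed selection rule)."""
    scores = score(counts, lexical_unit)
    return max(scores, key=scores.get)


corpus = ["^a/a<a>$", "^a/a<b>$", "^a/a<b>$"]
counts = train(corpus)
scores = score(counts, "^a/a<a>/a<b>/a<c>$")
best = tag(counts, "^a/a<a>/a<b>/a<c>$")
```

With the three-line corpus above, scores maps a<a> to 2, a<b> to 3 and a<c> to 1, matching the worked example, and best is a<b>.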