Unigram tagger

From Apertium
Revision as of 15:43, 14 January 2016

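The apertium-tagger in m5w/apertium (https://github.com/m5w/apertium) supports all three unigram models from the paper "A set of open-source tools for Turkish natural language processing" (http://coltekin.net/cagri/papers/trmorph-tools.pdf).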

Install

First, install all prerequisites; see Installation, section "If you want to add language data / do more advanced stuff". Then, clone the repository, replacing <directory> with the directory you'd like to clone m5w/apertium into:

git clone https://github.com/m5w/apertium.git <directory>

Then, set up the environment as described in Minimal installation from SVN, section "Set up environment".

Unigram Models

The apertium-tagger in m5w/apertium implements the three unigram models described in section 5.3 of "A set of open-source tools for Turkish natural language processing".

Model 1

See section 5.3.1. This model scores each analysis string by its frequency f in the training corpus plus one (add-one smoothing), so analysis strings that never occur in the corpus still receive a nonzero score. Consider the following corpus:

^a/a<a>$
^a/a<b>$
^a/a<b>$
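The corpus above is in Apertium stream format: each lexical unit is delimited by ^ and $, and the surface form is separated from each analysis string by /. The following minimal Python sketch splits such a unit. It assumes simple units with no backslash escapes or blanks, and the function name parse_lexical_unit is hypothetical, not part of apertium-tagger.

```python
def parse_lexical_unit(unit: str) -> tuple[str, list[str]]:
    """Split a lexical unit such as '^a/a<a>/a<b>$' into its surface form
    and its list of analysis strings.

    Assumes a simple unit: no backslash escapes, blanks, or superblanks.
    """
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError(f"not a lexical unit: {unit!r}")
    # Drop the ^ and $ delimiters, then split on '/'.
    surface, *analyses = unit[1:-1].split("/")
    return surface, analyses


# Each unit in the training corpus carries exactly one analysis string.
corpus = ["^a/a<a>$", "^a/a<b>$", "^a/a<b>$"]
for line in corpus:
    print(parse_lexical_unit(line))

# An ambiguous unit carries several candidate analysis strings.
print(parse_lexical_unit("^a/a<a>/a<b>/a<c>$"))
```

For the ambiguous unit ^a/a<a>/a<b>/a<c>$ it returns the surface form a and the candidates a<a>, a<b> and a<c>.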

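Given the lexical unit ^a/a<a>/a<b>/a<c>$, the tagger assigns the analysis string a<a>, which occurs once in the corpus (f = 1), a score of

$$f + 1 = 1 + 1 = 2$$

The analysis string a<b>, which occurs twice, is assigned a score of $2 + 1 = 3$, and a<c>, which does not occur in the corpus at all (f = 0), is assigned a score of $0 + 1 = 1$.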
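The arithmetic above can be reproduced with a short Python sketch of the Model 1 scoring rule: count how often each analysis string occurs in the tagged corpus, score each candidate as its count plus one, and pick the highest-scoring candidate. This illustrates the rule only and is not the apertium-tagger implementation; the function names are hypothetical, and breaking ties by candidate order is an assumption.

```python
from collections import Counter


def train_model1(tagged_analyses: list[str]) -> Counter:
    """Count the frequency f of each analysis string in a tagged corpus."""
    return Counter(tagged_analyses)


def score_model1(counts: Counter, analysis: str) -> int:
    """Model 1 score: frequency plus one (add-one smoothing).

    A Counter returns 0 for keys it has never seen, so unseen
    analysis strings score 0 + 1 = 1.
    """
    return counts[analysis] + 1


def tag_model1(counts: Counter, candidates: list[str]) -> str:
    """Return the candidate analysis string with the highest score.

    max() keeps the first maximum, so ties go to the earliest candidate
    (an assumption; the real tagger may break ties differently).
    """
    return max(candidates, key=lambda a: score_model1(counts, a))


# Analysis strings from the corpus ^a/a<a>$ ^a/a<b>$ ^a/a<b>$.
counts = train_model1(["a<a>", "a<b>", "a<b>"])
candidates = ["a<a>", "a<b>", "a<c>"]
for analysis in candidates:
    print(f"score({analysis!r}) == {score_model1(counts, analysis)}")
print(f"^{tag_model1(counts, candidates)}$")
```

Running it prints scores of 2, 3 and 1 for a<a>, a<b> and a<c>, matching the values above, and selects a<b>.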

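If the package is configured with --enable-debug, the tagger also prints these calculations to stderr. For the example above, the output is as follows, ending with the chosen analysis ^a<b>$: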



score("a<a>") ==
  2 ==
  2.000000000000000000
score("a<b>") ==
  3 ==
  3.000000000000000000
score("a<c>") ==
  1 ==
  1.000000000000000000
^a<b>$