Unigram tagger
Revision as of 15:43, 14 January 2016
[m5w/apertium]'s apertium-tagger supports all three unigram models described in "A set of open-source tools for Turkish natural language processing".
Install
First, install all prerequisites. See the "If you want to add language data / do more advanced stuff" section of Installation.
Then, clone the repository. (Replace <directory> with the directory you'd like to clone [m5w/apertium] into.)
git clone https://github.com/m5w/apertium.git <directory>
Then, set up your environment as described in the "Set up environment" section of Minimal installation from SVN.
Unigram Models
This code's apertium-tagger implements the three unigram models in "A set of open-source tools for Turkish natural language processing". See section 5.3.
Model 1
See section 5.3.1. This model scores each analysis string by its frequency in the training corpus, with add-one smoothing. Consider the following corpus.
^a/a<a>$ ^a/a<b>$ ^a/a<b>$
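Training Model 1 amounts to counting how often each analysis string occurs in the corpus. The following is a minimal Python sketch of that counting step, not apertium-tagger's actual code; it assumes every lexical unit in the tagged corpus has the form ^surface/analysis$ with exactly one (already disambiguated) analysis, and the function name count_analyses is hypothetical.

```python
import re
from collections import Counter

# Assumption: each lexical unit is ^surface/analysis$ with exactly one analysis.
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def count_analyses(corpus: str) -> Counter:
    """Count how often each analysis string occurs in a tagged corpus."""
    counts = Counter()
    for _surface, analysis in LEXICAL_UNIT.findall(corpus):
        counts[analysis] += 1
    return counts


corpus = "^a/a<a>$ ^a/a<b>$ ^a/a<b>$"
print(count_analyses(corpus))  # Counter({'a<b>': 2, 'a<a>': 1})
```

For the corpus above, a<a> occurs once and a<b> twice; these are the frequencies f used in the scores below.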
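When passed the lexical unit ^a/a<a>/a<b>/a<c>$, the tagger assigns the analysis string a<a> a score of f + 1 = 1 + 1 = 2, where f is the number of times the analysis string occurs in the corpus, and a<b> a score of 2 + 1 = 3. The unknown analysis string a<c>, which never occurs in the corpus (f = 0), is assigned a score of 0 + 1 = 1.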
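The scoring and selection step can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the tagger's implementation: the counts are hard-coded from the example corpus, the lexical unit is assumed to have the form ^surface/analysis1/analysis2/...$, and score and tag are hypothetical names. Ties are broken by taking the first candidate, which may differ from what apertium-tagger does.

```python
from collections import Counter

# Hypothetical training counts from the example corpus
# ^a/a<a>$ ^a/a<b>$ ^a/a<b>$ (analysis string -> frequency f).
COUNTS = Counter({"a<a>": 1, "a<b>": 2})


def score(analysis: str, counts: Counter) -> int:
    """Model 1 score: frequency plus one (add-one smoothing).

    A Counter returns 0 for unseen keys, so unknown analyses score 1.
    """
    return counts[analysis] + 1


def tag(lexical_unit: str, counts: Counter) -> str:
    """Return the highest-scoring analysis of an ambiguous lexical unit.

    Assumes the form ^surface/analysis1/analysis2/...$; on a tie, max()
    keeps the first candidate.
    """
    _surface, *analyses = lexical_unit.strip("^$").split("/")
    best = max(analyses, key=lambda a: score(a, counts))
    return f"^{best}$"


unit = "^a/a<a>/a<b>/a<c>$"
for candidate in unit.strip("^$").split("/")[1:]:
    print(f'score("{candidate}") == {score(candidate, COUNTS)}')
print(tag(unit, COUNTS))  # ^a<b>$
```

Running it prints the scores 2, 3 and 1 for a<a>, a<b> and a<c>, then ^a<b>$, matching the worked example.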
If reconfigured with --enable-debug, the tagger prints such calculations to stderr. For the example above, the output (the debug lines followed by the chosen analysis) is:
score("a<a>") == 2 == 2.000000000000000000
score("a<b>") == 3 == 3.000000000000000000
score("a<c>") == 1 == 1.000000000000000000
^a<b>$