Difference between revisions of "Unigram tagger"
Line 1: | Line 1: | ||
==Install== |
==Install== |
||
− | The code is a clone of apertium and is at [https://github.com/m5w/apertium m5w/apertium]. It has the same dependencies as apertium, |
+ | The code is a clone of apertium and is at [https://github.com/m5w/apertium m5w/apertium]. It has the same dependencies as apertium, so one should install it in the same way. See [[Installation]] and [[Minimal installation from SVN]] for more information. |
+ | ==Unigram Models== |
||
+ | This code's <code>apertium-tagger</code> implements the three unigram models in [http://coltekin.net/cagri/papers/trmorph-tools.pdf A set of open-source tools for Turkish natural language processing]. See section 5.3. |
||
+ | ===Model 1=== |
||
+ | See section 5.3.1. |
||
+ | This model scores each analysis string in proportion to its frequency with add-one smoothing. |
||
+ | Consider the following corpus. |
||
+ | <pre> |
||
+ | ^a/a<a>$ |
||
+ | ^a/a<b>$ |
||
+ | ^a/a<b>$ |
||
+ | </pre> |
||
+ | Passed the lexical unit <code>^a/a<a>/a<b>/a<c>$</code>, the tagger assigns the analysis string <code>a<a></code> a score of |
||
+ | <pre> |
||
+ | f + 1 = |
||
+ | (1) + 1 = |
||
+ | 2 |
||
+ | </pre> and <code>a<b></code> a score of <code>(2) + 1 = 3</code>. The tagger assigns the unknown analysis string <code>a<c></code> a score of <code>1</code>. |
||
[[Category:Development]] |
[[Category:Development]] |
Revision as of 03:20, 14 January 2016
Install
The code is a clone of apertium and is at m5w/apertium. It has the same dependencies as apertium, so one should install it in the same way. See Installation and Minimal installation from SVN for more information.
Unigram Models
This code's apertium-tagger implements the three unigram models in A set of open-source tools for Turkish natural language processing. See section 5.3.
Model 1
See section 5.3.1. This model scores each analysis string in proportion to its frequency with add-one smoothing. Consider the following corpus.
^a/a<a>$
^a/a<b>$
^a/a<b>$
Given the lexical unit ^a/a<a>/a<b>/a<c>$, the tagger assigns the analysis string a<a> a score of f + 1 = (1) + 1 = 2, where f is the frequency of the analysis string in the corpus, and a<b> a score of (2) + 1 = 3. The tagger assigns the unknown analysis string a<c> a score of 1.
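The Model 1 arithmetic above can be reproduced with a minimal Python sketch. This is an illustration only, not the code in m5w/apertium: the function names count_analyses and score_lexical_unit are hypothetical, and the parsing assumes one analysis per corpus unit and no escaped slashes.

```python
import re
from collections import Counter


def count_analyses(corpus_lines):
    """Count how often each analysis string occurs in a tagged corpus.

    Assumes each unit looks like ^surface/analysis$ (hypothetical helper).
    """
    counts = Counter()
    for line in corpus_lines:
        for unit in re.findall(r"\^(.*?)\$", line):
            _surface, _sep, analysis = unit.partition("/")
            if analysis:
                counts[analysis] += 1
    return counts


def score_lexical_unit(unit, counts):
    """Score every analysis of one lexical unit as frequency + 1."""
    body = unit.strip()[1:-1]          # drop the leading ^ and trailing $
    analyses = body.split("/")[1:]     # first field is the surface form
    # Counter returns 0 for unseen analyses, so unknown strings score 1.
    return {analysis: counts[analysis] + 1 for analysis in analyses}


corpus = ["^a/a<a>$", "^a/a<b>$", "^a/a<b>$"]
counts = count_analyses(corpus)
scores = score_lexical_unit("^a/a<a>/a<b>/a<c>$", counts)
best = max(scores, key=scores.get)
print(scores, best)
```

With the corpus above, the sketch gives a<a> a score of 2, a<b> a score of 3, and the unseen a<c> a score of 1, so a<b> is the highest-scoring analysis.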