Difference between revisions of "CG hybrid tagging"

From Apertium

Revision as of 11:35, 15 June 2016

Tagging

The tagger is more robust against missing ambiguity sets. If it encounters an ambiguity set that it has not seen before, it picks one of the known sets instead, choosing a) the smallest and, among those, b) the most frequent (in that order). This substitution of the "nearest" ambiguity set is used in other places too.
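The selection rule above can be sketched in Python as follows. This is only an illustration, not the tagger's actual code: the function name `nearest_ambiguity_set`, the frequency dictionary, and in particular the assumption that the candidates are the known sets containing every tag of the unseen set are all hypothetical, since the page does not say how candidates are chosen.

```python
from typing import Dict, FrozenSet, Optional

AmbiguitySet = FrozenSet[str]


def nearest_ambiguity_set(
    unseen: AmbiguitySet,
    known_freqs: Dict[AmbiguitySet, int],
) -> Optional[AmbiguitySet]:
    """Pick a substitute for an ambiguity set missing from the model.

    ASSUMPTION (not stated on the page): candidates are the known sets
    that contain every tag of the unseen set.
    """
    if unseen in known_freqs:
        return unseen  # already known, nothing to substitute

    candidates = [s for s in known_freqs if unseen <= s]
    if not candidates:
        return None  # no known set covers the unseen one

    # a) smallest set first, b) then most frequent; the sorted tag list
    # is only a deterministic tie-breaker.
    return min(candidates, key=lambda s: (len(s), -known_freqs[s], sorted(s)))


# Hypothetical frequencies, for illustration only.
known = {
    frozenset({"n", "vblex"}): 10,
    frozenset({"n", "vblex", "adj"}): 50,
    frozenset({"n", "adj", "adv"}): 70,
    frozenset({"n", "adj", "det"}): 5,
}

by_frequency = nearest_ambiguity_set(frozenset({"n", "adj"}), known)
by_size = nearest_ambiguity_set(frozenset({"n"}), known)
uncovered = nearest_ambiguity_set(frozenset({"prn"}), known)
```

In this example `{n, adj}` has three candidates of equal size, so frequency decides. For `{n}` the two-tag set wins on size even though larger sets are more frequent, which reflects the "in that order" part of the rule.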

Apart from feeding in ambiguity sets as is

Tagger training

The following modes apply to both supervised and unsupervised training:

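{| class="wikitable"
|-
! {{diagonal split header|Model part|Mode}} !! 0 !! 1 !! 2 !! 3 !! 4
|-
| Ambiguity classes || Dictionary || CG tagged || Dictionary || Dictionary || CG tagged + trimming
|-
| Ambiguity class frequency || Untagged || CG tagged || Untagged || Untagged || CG tagged
|-
| Corpus || Untagged || CG tagged || CG tagged (nearest) || Mix || CG tagged (nearest)
|}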

Note that in supervised training, the corpus listed in the table is used in conjunction with the tagged corpus.

Results

Compare with Comparison of part-of-speech tagging systems.