CG hybrid tagging
Tagging
The tagger is more robust against missing ambiguity sets. If it encounters an ambiguity set it has not seen before, it falls back to a known one, preferring a) the smallest and then b) the most frequent (in that order). This fallback to the "nearest" ambiguity set is used in other places too.
Apart from feeding in ambiguity sets as is
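The fallback to the "nearest" ambiguity set described above can be sketched roughly as follows. This is only an illustration, not the tagger's actual code. The function name `nearest_ambiguity_set`, the frequency table, and the tag names are hypothetical. It also assumes that the candidates are the known ambiguity sets that contain every tag of the unseen set, which the text above does not state explicitly.

```python
from typing import FrozenSet, Mapping, Optional

AmbiguitySet = FrozenSet[str]


def nearest_ambiguity_set(
    unseen: AmbiguitySet,
    known_freq: Mapping[AmbiguitySet, int],
) -> Optional[AmbiguitySet]:
    """Pick a known ambiguity set to stand in for an unseen one.

    Assumption (not from the source): candidates are known sets that
    are supersets of the unseen set.
    """
    candidates = [s for s in known_freq if unseen <= s]
    if not candidates:
        return None
    # a) smallest set first, b) then most frequent; the sorted tag list
    # only makes the result deterministic when both criteria tie.
    return min(candidates, key=lambda s: (len(s), -known_freq[s], sorted(s)))


# Hypothetical frequencies of ambiguity sets seen during training.
known = {
    frozenset({"n", "vblex"}): 10,
    frozenset({"n", "vblex", "adj"}): 50,
    frozenset({"n", "adj", "adv"}): 5,
    frozenset({"n", "adj", "vblex", "adv"}): 80,
}

chosen = nearest_ambiguity_set(frozenset({"n", "adj"}), known)
print(sorted(chosen))
```

Here the two three-tag sets beat the more frequent four-tag set on size, and frequency then decides between them.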
Tagger training
Both supervised and unsupervised training are available. Each mode takes the parts of its model from different sources:
Model part \ Mode | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Ambiguity classes | Dictionary | CG tagged | Dictionary | Dictionary | CG tagged + trimming |
Ambiguity class frequency | Untagged | CG tagged | Untagged | Untagged | CG tagged |
Corpus | Untagged | CG tagged | CG tagged (nearest) | Mix | CG tagged (nearest) |
Note that in supervised training, the untagged corpus is used in conjunction with the tagged corpus.
Results
Compare with Comparison of part-of-speech tagging systems.