CG hybrid tagging
Latest revision as of 11:37, 15 June 2016
Tagging
The tagger is more robust against ambiguity sets that are missing from its model. If it encounters a new ambiguity set, it substitutes the "nearest" known set, picking (a) the smallest and then (b) the most frequent candidate, in that order. The same "nearest" ambiguity set lookup is used in other places too.
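A minimal sketch of the nearest-set lookup, assuming ambiguity sets are frozensets of tag names and that the model exposes each known set together with its frequency. It also assumes the candidates are the known sets that contain every tag of the new set; that superset criterion is an assumption, since the text only says "nearest". The function name `nearest_ambiguity_set` and the tags in the example are hypothetical.

```python
from collections.abc import Mapping
from typing import Optional

AmbiguitySet = frozenset[str]


def nearest_ambiguity_set(
    new_set: AmbiguitySet,
    known_freqs: Mapping[AmbiguitySet, int],
) -> Optional[AmbiguitySet]:
    """Map an unseen ambiguity set onto a known one (hypothetical helper).

    Assumption: candidates are the known sets containing every tag of new_set.
    Among them, pick (a) the smallest, then (b) the most frequent.
    """
    if new_set in known_freqs:
        return new_set  # already in the model, no substitution needed
    candidates = [s for s in known_freqs if new_set <= s]
    if not candidates:
        return None  # no known superset; the caller must fall back somehow
    # Sort key: fewer tags first, then higher frequency first.
    return min(candidates, key=lambda s: (len(s), -known_freqs[s]))


known = {
    frozenset({"n", "vblex"}): 120,
    frozenset({"n", "adj"}): 300,
    frozenset({"n", "adj", "vblex"}): 15,
}
print(sorted(nearest_ambiguity_set(frozenset({"n"}), known)))  # prints ['adj', 'n']
```

Both two-tag sets are equally small here, so the frequency tie-break selects `{"n", "adj"}`.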
Apart from feeding in the ambiguity sets left after CG as-is (the common practice before this work), the tagger can tag using a mix of untagged and CG output, discarding the CG analysis in favour of the untagged analysis whenever there is any ambiguity.
Invasive...
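The mixed untagged/CG input can be sketched as a per-word choice between two aligned streams. This is a sketch under stated assumptions, not the tagger's actual code: `mix_ambiguity_sets` is a hypothetical name, both streams are assumed to be aligned one ambiguity set per word, and "any ambiguity" is read as CG leaving more than one reading for that word.

```python
from collections.abc import Sequence

AmbiguitySet = frozenset[str]


def mix_ambiguity_sets(
    untagged: Sequence[AmbiguitySet],
    cg_output: Sequence[AmbiguitySet],
) -> list[AmbiguitySet]:
    """Build tagger input word by word from untagged and CG streams.

    Assumption: the CG analysis is kept only when CG reduced the word to a
    single reading; otherwise the untagged (dictionary) set is used.
    """
    if len(untagged) != len(cg_output):
        raise ValueError("streams must be aligned word by word")
    mixed = []
    for full_set, cg_set in zip(untagged, cg_output):
        if len(cg_set) == 1:
            mixed.append(cg_set)  # CG fully disambiguated this word: keep it
        else:
            mixed.append(full_set)  # ambiguity remains: discard the CG analysis
    return mixed


untagged = [frozenset({"det", "prn"}), frozenset({"n", "vblex"}), frozenset({"adj", "n", "vblex"})]
cg = [frozenset({"det"}), frozenset({"n", "vblex"}), frozenset({"adj", "n"})]
for ambiguity_set in mix_ambiguity_sets(untagged, cg):
    print(sorted(ambiguity_set))
# prints ['det'], then ['n', 'vblex'], then ['adj', 'n', 'vblex']
```

Feeding CG output in as-is would pass `cg` through unchanged; the mix differs on the last word, where CG's partially reduced set is replaced by the full untagged one.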
Tagger training
Both supervised and unsupervised training support the following modes, which differ in where each part of the model comes from:
Model part \ Mode | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Ambiguity classes | Dictionary | CG tagged | Dictionary | Dictionary | CG tagged + trimming |
Ambiguity class frequency | Untagged | CG tagged | Untagged | Untagged | CG tagged |
Corpus | Untagged | CG tagged | CG tagged (nearest) | Mix | CG tagged (nearest) |
Note that in supervised training, the corpus given in the Corpus row is used in conjunction with the tagged corpus.
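To make the table concrete, it can be transcribed as data and used to decide which corpus the ambiguity class frequencies are counted from. This is a hypothetical sketch: `TRAINING_MODES`, `class_frequencies` and the short field names are illustrative, and both corpora are assumed to be represented as one ambiguity set per word. It does not model trimming or the "nearest" mapping.

```python
from collections import Counter
from collections.abc import Sequence

AmbiguitySet = frozenset[str]

# Transcription of the table above: the source of each model part, per mode.
TRAINING_MODES = {
    0: {"classes": "Dictionary", "class_freq": "Untagged", "corpus": "Untagged"},
    1: {"classes": "CG tagged", "class_freq": "CG tagged", "corpus": "CG tagged"},
    2: {"classes": "Dictionary", "class_freq": "Untagged", "corpus": "CG tagged (nearest)"},
    3: {"classes": "Dictionary", "class_freq": "Untagged", "corpus": "Mix"},
    4: {"classes": "CG tagged + trimming", "class_freq": "CG tagged", "corpus": "CG tagged (nearest)"},
}


def class_frequencies(
    mode: int,
    untagged: Sequence[AmbiguitySet],
    cg_tagged: Sequence[AmbiguitySet],
) -> Counter[AmbiguitySet]:
    """Count how often each ambiguity class occurs in the corpus the mode names.

    Assumption: both corpora are given as one ambiguity set per word.
    """
    source = TRAINING_MODES[mode]["class_freq"]
    corpus = untagged if source == "Untagged" else cg_tagged
    return Counter(corpus)


untagged = [frozenset({"n", "vblex"}), frozenset({"n", "vblex"}), frozenset({"det"})]
cg_tagged = [frozenset({"n"}), frozenset({"n", "vblex"}), frozenset({"det"})]
print(class_frequencies(0, untagged, cg_tagged)[frozenset({"n", "vblex"})])  # prints 2
print(class_frequencies(1, untagged, cg_tagged)[frozenset({"n", "vblex"})])  # prints 1
```

Counting over the CG tagged corpus shifts frequency mass towards the smaller sets CG leaves behind, which in turn affects the "most frequent" tie-break when looking up the nearest ambiguity set.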
Results
Compare with the page Comparison of part-of-speech tagging systems.