Ideas for Google Summer of Code/Improvements to target-language tagger training

Enhance source segmentation used during target-language tagger training and improve the disambiguation path pruning algorithm

Apertium's target-language tagger training tunes the parameters of a hidden Markov model (HMM) tagger based on the quality of the translations produced by the whole translation pipeline. To do so, it segments the source-language training corpus by taking into account the patterns detected by the structural transfer module, and translates into the target language all possible disambiguation paths of each source-language segment.

The project consists of two parts. The first part is to make apertium-tagger-training-tools able to segment using any level of structural transfer rules (at the moment it only "understands" one-level, shallow-transfer rules). The second part is to implement a k-best Viterbi algorithm so that only the k best disambiguation paths of each segment are translated into the target language, avoiding the need to compute the a priori likelihood of all paths before pruning.
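
As a rough illustration of the second part, below is a minimal k-best Viterbi sketch in C++. It is not taken from apertium-tagger-training-tools: the dense log-probability matrices, the Hypothesis struct and the kbest_viterbi function are all assumptions made for the example. The idea is that each chart cell keeps up to k partial hypotheses instead of one, so that at the end only the k most likely disambiguation paths need to be passed on for translation and scoring.

<pre>
// Minimal k-best Viterbi sketch (log-space HMM decoding).
// Assumptions (not from the Apertium code base): the HMM is given as dense
// matrices of log-probabilities; states and observations are integer ids.
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct Hypothesis {
    double logprob;   // log-probability of the partial path
    int prev_state;   // state at the previous time step (-1 at t == 0)
    int prev_rank;    // rank of the predecessor hypothesis in its cell
};

// Returns up to k state sequences for the observation sequence `obs`,
// ordered from most to least probable.
std::vector<std::vector<int>> kbest_viterbi(
    const std::vector<std::vector<double>>& log_trans,  // [from][to]
    const std::vector<std::vector<double>>& log_emit,   // [state][symbol]
    const std::vector<double>& log_init,                // [state]
    const std::vector<int>& obs,
    std::size_t k)
{
    const std::size_t n_states = log_init.size();
    const std::size_t T = obs.size();
    if (T == 0 || k == 0 || n_states == 0) return {};

    // chart[t][s] holds up to k hypotheses ending in state s at time t.
    std::vector<std::vector<std::vector<Hypothesis>>> chart(
        T, std::vector<std::vector<Hypothesis>>(n_states));

    for (std::size_t s = 0; s < n_states; ++s)
        chart[0][s].push_back({log_init[s] + log_emit[s][obs[0]], -1, -1});

    for (std::size_t t = 1; t < T; ++t) {
        for (std::size_t s = 0; s < n_states; ++s) {
            std::vector<Hypothesis> cands;
            for (std::size_t p = 0; p < n_states; ++p)
                for (std::size_t r = 0; r < chart[t - 1][p].size(); ++r)
                    cands.push_back({chart[t - 1][p][r].logprob
                                     + log_trans[p][s] + log_emit[s][obs[t]],
                                     static_cast<int>(p), static_cast<int>(r)});
            // Keep only the k best extensions ending in this state.
            std::sort(cands.begin(), cands.end(),
                      [](const Hypothesis& a, const Hypothesis& b) {
                          return a.logprob > b.logprob;
                      });
            if (cands.size() > k) cands.resize(k);
            chart[t][s] = std::move(cands);
        }
    }

    // Gather the k best final hypotheses over all states, then backtrack.
    std::vector<std::pair<double, std::pair<int, int>>> finals;  // (score, (state, rank))
    for (std::size_t s = 0; s < n_states; ++s)
        for (std::size_t r = 0; r < chart[T - 1][s].size(); ++r)
            finals.push_back({chart[T - 1][s][r].logprob,
                              {static_cast<int>(s), static_cast<int>(r)}});
    std::sort(finals.begin(), finals.end(),
              [](const auto& a, const auto& b) { return a.first > b.first; });
    if (finals.size() > k) finals.resize(k);

    std::vector<std::vector<int>> paths;
    for (const auto& f : finals) {
        std::vector<int> path(T);
        int s = f.second.first, r = f.second.second;
        for (std::size_t t = T; t-- > 0;) {
            path[t] = s;
            const Hypothesis& h = chart[t][s][r];
            s = h.prev_state;
            r = h.prev_rank;
        }
        paths.push_back(path);
    }
    return paths;
}
</pre>

In the actual project the pruning would be applied to the disambiguation paths of each source-language segment, so that only those k paths are sent through the rest of the pipeline to be translated and scored, instead of computing the a priori likelihood of every path first.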