Ideas for Google Summer of Code/Improvements to target-language tagger training

Enhance source segmentation used during target-language tagger training and improve the disambiguation path pruning algorithm

Apertium's target-language tagger training tunes the parameters of a hidden Markov model (HMM) tagger based on the quality of the translations produced by the whole translation pipeline. To do so, it segments the source-language training corpus by taking into account the patterns detected by the structural transfer module, and translates into the target language all possible disambiguation paths of each source-language segment.

The project consists of two parts. The first part is to make apertium-tagger-training-tools able to segment using any level of structural transfer rules (at the moment it only "understands" one-level, shallow-transfer rules). The second part is to implement a k-best Viterbi algorithm so that only the k best disambiguation paths of each segment are translated into the target language, avoiding the need to compute the a priori likelihood of all paths before pruning.
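
As a rough illustration of the second part, below is a minimal k-best Viterbi sketch in C++. It is not taken from apertium-tagger-training-tools: the dense log-probability matrices, the Hypothesis struct and the kbest_viterbi function are all assumptions made for the example. The idea is that each chart cell keeps up to k partial hypotheses instead of one, so that at the end only the k most likely disambiguation paths need to be passed on for translation and scoring.

<pre>
// Minimal k-best Viterbi sketch (log-space HMM decoding).
// Assumptions (not from the Apertium code base): the HMM is given as dense
// matrices of log-probabilities; states and observations are integer ids.
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct Hypothesis {
    double logprob;   // log-probability of the partial path
    int prev_state;   // state at the previous time step (-1 at t == 0)
    int prev_rank;    // rank of the predecessor hypothesis in its cell
};

// Returns up to k state sequences for the observation sequence `obs`,
// ordered from most to least probable.
std::vector<std::vector<int>> kbest_viterbi(
    const std::vector<std::vector<double>>& log_trans,  // [from][to]
    const std::vector<std::vector<double>>& log_emit,   // [state][symbol]
    const std::vector<double>& log_init,                // [state]
    const std::vector<int>& obs,
    std::size_t k)
{
    const std::size_t n_states = log_init.size();
    const std::size_t T = obs.size();
    if (T == 0 || k == 0 || n_states == 0) return {};

    // chart[t][s] holds up to k hypotheses ending in state s at time t.
    std::vector<std::vector<std::vector<Hypothesis>>> chart(
        T, std::vector<std::vector<Hypothesis>>(n_states));

    for (std::size_t s = 0; s < n_states; ++s)
        chart[0][s].push_back({log_init[s] + log_emit[s][obs[0]], -1, -1});

    for (std::size_t t = 1; t < T; ++t) {
        for (std::size_t s = 0; s < n_states; ++s) {
            std::vector<Hypothesis> cands;
            for (std::size_t p = 0; p < n_states; ++p)
                for (std::size_t r = 0; r < chart[t - 1][p].size(); ++r)
                    cands.push_back({chart[t - 1][p][r].logprob
                                     + log_trans[p][s] + log_emit[s][obs[t]],
                                     static_cast<int>(p), static_cast<int>(r)});
            // Keep only the k best extensions ending in this state.
            std::sort(cands.begin(), cands.end(),
                      [](const Hypothesis& a, const Hypothesis& b) {
                          return a.logprob > b.logprob;
                      });
            if (cands.size() > k) cands.resize(k);
            chart[t][s] = std::move(cands);
        }
    }

    // Gather the k best final hypotheses over all states, then backtrack.
    std::vector<std::pair<double, std::pair<int, int>>> finals;  // (score, (state, rank))
    for (std::size_t s = 0; s < n_states; ++s)
        for (std::size_t r = 0; r < chart[T - 1][s].size(); ++r)
            finals.push_back({chart[T - 1][s][r].logprob,
                              {static_cast<int>(s), static_cast<int>(r)}});
    std::sort(finals.begin(), finals.end(),
              [](const auto& a, const auto& b) { return a.first > b.first; });
    if (finals.size() > k) finals.resize(k);

    std::vector<std::vector<int>> paths;
    for (const auto& f : finals) {
        std::vector<int> path(T);
        int s = f.second.first, r = f.second.second;
        for (std::size_t t = T; t-- > 0;) {
            path[t] = s;
            const Hypothesis& h = chart[t][s][r];
            s = h.prev_state;
            r = h.prev_rank;
        }
        paths.push_back(path);
    }
    return paths;
}
</pre>

In the actual project the pruning would be applied to the disambiguation paths of each source-language segment, so that only those k paths are sent through the rest of the pipeline to be translated and scored, instead of computing the a priori likelihood of every path first.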