Lexical feature transfer - Second report

From Apertium
Revision as of 14:44, 26 July 2012 by Fpetkovski

Review

In the first attempt at corpus-based preposition selection, both a Naive Bayes classifier and an SVM classifier were tried. The lemmas and some of the tags of the surrounding words were extracted as classifier features. The source-language corpus was used to extract training examples matching the <n1> <pr> <n2> -> <n1> <pr> <n2> pattern (a noun-preposition-noun sequence translated as a noun-preposition-noun sequence), and the target-language corpus was used to label the extracted examples.
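The extraction step can be illustrated with a minimal Python sketch. It assumes the source corpus has already been run through the Apertium analyser and tagger, so each line is a stream of disambiguated lexical units such as ^house<n><sg>$, with no surface forms or alternative analyses. The functions parse_stream and extract_examples, and the feature names they produce, are hypothetical and chosen for illustration only. They are not the code or the feature set used in this project.

```python
import re

# Matches one lexical unit, e.g. ^house<n><sg>$ -> lemma "house", tags "<n><sg>".
# Assumes disambiguated tagger output: one analysis per unit, no surface form.
LU_RE = re.compile(r"\^([^<$]*)((?:<[^>]+>)*)\$")
TAG_RE = re.compile(r"<([^>]+)>")


def parse_stream(line):
    """Split an Apertium-style stream line into (lemma, [tags]) pairs."""
    return [(m.group(1), TAG_RE.findall(m.group(2))) for m in LU_RE.finditer(line)]


def extract_examples(line):
    """Yield one feature dict per <n1> <pr> <n2> window in the line.

    The feature names (hypothetical) cover the lemmas of the three words
    plus the number tag of each noun, when one is present.
    """
    units = parse_stream(line)
    for i in range(len(units) - 2):
        (l1, t1), (lp, tp), (l2, t2) = units[i:i + 3]
        # Only the first tag decides the part of speech: n, pr, n.
        if t1[:1] == ["n"] and tp[:1] == ["pr"] and t2[:1] == ["n"]:
            yield {
                "n1_lemma": l1,
                "pr_lemma": lp,
                "n2_lemma": l2,
                "n1_number": t1[1] if len(t1) > 1 else None,
                "n2_number": t2[1] if len(t2) > 1 else None,
            }
```

For a line such as ^the<det><def><sp>$ ^house<n><sg>$ ^of<pr>$ ^card<n><pl>$, only the house/of/card window matches, so a single example is produced.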

Around 12,000 of the extracted examples were aligned to their target-language translations and labeled. Translation quality improved somewhat. However, there were many wrong predictions, caused by the small size of the training set and by formatting errors in it.
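The labeling and classification steps can be sketched in the same hedged way. The sketch assumes that each extracted source example has been paired with its aligned target-language (n1, pr, n2) lemma triple, or None when no alignment was found, and that the target preposition serves as the label. It then trains a categorical Naive Bayes classifier with add-one smoothing. The names label_examples and NaiveBayes and the pair format are illustrative assumptions, not this project's implementation.

```python
import math
from collections import Counter, defaultdict


def label_examples(pairs):
    """Attach the aligned target-language preposition to each feature dict.

    Assumed input format (hypothetical): (source_features, target_triple),
    where target_triple is an (n1, pr, n2) lemma tuple or None if unaligned.
    """
    return [(feats, tgt[1]) for feats, tgt in pairs if tgt is not None]


class NaiveBayes:
    """Categorical Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.class_counts = Counter(label for _, label in examples)
        self.feat_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.values = defaultdict(set)           # feature -> values seen in training
        for feats, label in examples:
            for name, value in feats.items():
                self.feat_counts[(label, name)][value] += 1
                self.values[name].add(value)
        self.total = len(examples)
        return self

    def log_score(self, feats, label):
        """Log prior plus smoothed log likelihood of each known feature."""
        n_label = self.class_counts[label]
        score = math.log(n_label / self.total)
        for name, value in feats.items():
            if name not in self.values:
                continue  # skip features never seen during training
            count = self.feat_counts[(label, name)][value]
            # The extra +1 in the denominator reserves mass for unseen values.
            score += math.log((count + 1) / (n_label + len(self.values[name]) + 1))
        return score

    def predict(self, feats):
        """Return the label (target preposition) with the highest score."""
        return max(self.class_counts, key=lambda label: self.log_score(feats, label))
```

Scores are summed in log space to avoid numeric underflow, and smoothing keeps a lemma that never co-occurred with a given target preposition in training from forcing that preposition's probability to zero. Sparsity of this kind is most pronounced when the training set is small.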

First Model