Lexical feature transfer - Second report
In the first attempt at trying to solve the problem of corpus-based preposition selection, both a Naive Bayes and and SVM classifier were tried out. The lemmas and some of the tags of the surrounding words were extracted as features for the classifier. The source-language corpus was used to extract training examples from <n1> <pr> <n2> -> <n1> <pr> <n2> patterns, and the target-language corpus was used to label the extracted training examples.
Around 12.000 of the extracted examples were aligned to their target-language translations and labeled. There was some improvement in the translation quality, however, there were many wrong predictions as a result of the small training set and formatting errors in the training set.