Ideas for Google Summer of Code/Unsupervised weighting of automata

From Apertium


Coding challenge

  • Install HFST
  • Install lttoolbox
  • Define an evaluation metric
  • Perform a baseline experiment using a tagged corpus:
    • Select a language
    • Split the corpus into 90% training, 10% testing (or use existing test/train split)
    • Use the Apertium morphological analyser to analyse the test data
    • Use the training data to rank the analyses produced for each test token
    • Compare this ranking to the default order from the transducer, and to a "random" ranking
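
The baseline steps above can be sketched roughly as follows. This is only an illustration under several assumptions, not a prescribed solution: the metric is mean reciprocal rank (one possible choice), the ranking is by how often each analysis occurs in the training data, and the analyser is a hypothetical in-memory dictionary (TOY_ANALYSER) standing in for the real Apertium transducer. The corpus format (lists of surface form and gold analysis pairs) and all function names are likewise invented for the example.

```python
import random
from collections import Counter

# Hypothetical stand-in for the morphological analyser: surface form ->
# analyses in the transducer's default order. A real experiment would get
# these from the Apertium analyser instead.
TOY_ANALYSER = {
    "the": ["the<det><def><sp>"],
    "dog": ["dog<n><sg>"],
    "saw": ["saw<n><sg>", "see<vblex><past>"],
}

# Toy tagged corpus: ten identical sentences of (form, gold analysis) pairs.
SENTENCE = [
    ("the", "the<det><def><sp>"),
    ("dog", "dog<n><sg>"),
    ("saw", "see<vblex><past>"),
]
CORPUS = [list(SENTENCE) for _ in range(10)]


def split_corpus(sentences, train_fraction=0.9):
    """Split sentences into (train, test), keeping the original order."""
    cut = int(len(sentences) * train_fraction)
    return sentences[:cut], sentences[cut:]


def count_analyses(train):
    """Count how often each gold analysis occurs in the training data."""
    return Counter(gold for sent in train for _form, gold in sent)


def rank_by_counts(analyses, counts):
    """Most frequent analysis first; ties keep the transducer's order."""
    return sorted(analyses, key=lambda a: -counts[a])


def rank_randomly(analyses, rng):
    """Return the analyses in a random order."""
    shuffled = list(analyses)
    rng.shuffle(shuffled)
    return shuffled


def mean_reciprocal_rank(test, rank):
    """Average of 1/position of the gold analysis (0 if it is missing)."""
    scores = []
    for sent in test:
        for form, gold in sent:
            ranked = rank(TOY_ANALYSER.get(form, []))
            scores.append(1.0 / (ranked.index(gold) + 1) if gold in ranked else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


train, test = split_corpus(CORPUS)
counts = count_analyses(train)
rng = random.Random(0)  # fixed seed so the random baseline is repeatable

mrr_default = mean_reciprocal_rank(test, lambda a: list(a))
mrr_trained = mean_reciprocal_rank(test, lambda a: rank_by_counts(a, counts))
mrr_random = mean_reciprocal_rank(test, lambda a: rank_randomly(a, rng))

print("default order:", mrr_default)
print("trained ranking:", mrr_trained)
print("random ranking:", mrr_random)
```

On real data the three scores would be computed over the full test split, and the random baseline would normally be averaged over several seeds.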