Difference between revisions of "Ideas for Google Summer of Code/Unsupervised weighting of automata"

From Apertium

Revision as of 17:05, 29 March 2017


Coding challenge

  • Install HFST
  • Install lttoolbox
  • Define an evaluation metric (talk to your mentor)
  • Perform a baseline experiment using a tagged corpus:
    • Select a language
    • Split the corpus into 90% training and 10% testing (or use an existing train/test split)
    • Use the Apertium morphological analyser to analyse the test data
    • Rank the analyses produced for the test data using the training data
    • Using your metric, compare this ranking to the default order from the transducer and to a "random" ranking
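
The baseline experiment above could be sketched roughly as follows. This is only an illustration under several assumptions, none of which come from this page: the corpus is a list of (surface form, gold analysis) pairs, `analyse` is a hypothetical stand-in for a call to the real morphological analyser that returns analyses in the transducer's default order, the ranking uses plain frequency counts of gold analyses in the training data, and the metric is mean reciprocal rank. The actual metric should be agreed with your mentor.

```python
import random
from collections import Counter

# Hypothetical toy analyser output, in "default transducer order".
# In a real experiment this would come from the Apertium analyser.
TOY_ANALYSES = {
    "saw": ["saw<n><sg>", "see<vblex><past>"],
    "dogs": ["dog<n><pl>"],
}

def toy_analyse(surface):
    """Stand-in for the analyser: return candidate analyses for a surface form."""
    return list(TOY_ANALYSES.get(surface, []))

def split_corpus(tokens, train_fraction=0.9):
    """Split a list of (surface, gold_analysis) pairs into train and test parts."""
    cut = int(len(tokens) * train_fraction)
    return tokens[:cut], tokens[cut:]

def rank_by_training_counts(analyses, counts):
    """Order analyses by how often they were seen as gold in the training data.

    sorted() is stable, so ties keep the transducer's default order.
    """
    return sorted(analyses, key=lambda a: -counts[a])

def mean_reciprocal_rank(test, analyse, rank):
    """Average of 1/position of the gold analysis (0 if the gold is missing)."""
    total = 0.0
    for surface, gold in test:
        ranked = rank(analyse(surface))
        if gold in ranked:
            total += 1.0 / (ranked.index(gold) + 1)
    return total / len(test) if test else 0.0

def run_baseline(corpus, analyse, seed=0):
    """Score the trained, default and random rankings on the held-out 10%."""
    train, test = split_corpus(corpus)
    counts = Counter(gold for _, gold in train)
    rng = random.Random(seed)  # fixed seed so the random baseline is repeatable
    rankers = {
        "trained": lambda a: rank_by_training_counts(a, counts),
        "default": lambda a: list(a),
        "random": lambda a: rng.sample(a, len(a)),
    }
    return {name: mean_reciprocal_rank(test, analyse, r) for name, r in rankers.items()}

if __name__ == "__main__":
    corpus = [("saw", "see<vblex><past>")] * 19 + [("dogs", "dog<n><pl>")]
    print(run_baseline(corpus, toy_analyse))
```

A frequency count is only the simplest possible way to use the training data; the same `run_baseline` shape should work with any other ranking function or metric you settle on.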