User:Fpetkovski/GSOC 2012 Report

From Apertium
Revision as of 12:19, 11 April 2013 by Fpetkovski.

Documentation / HOWTO

  • Corpus based preposition selection - HOWTO
  • Building a pseudo-parallel corpus

  • Try generating a corpus from a monolingual SL corpus:
    • Оваа лабавост на регулативите се одразува врз третманот на уапсените корисници на дрога. ("This laxity of the regulations is reflected in the treatment of arrested drug users.")
      • Run through lexical transfer (mk-en-biltrans)
      • Run through apertium-lex-tools/scripts/
      • Run through the rest of the pipeline from apertium-transfer -b onwards
      • Run through apertium-lex-learner/irstlm-ranker
    • This will give:
      • SL:TL selection possibilities
      • probabilities from the TL language model for each selection
    • Select a subset for training where one translation has a substantially higher proportion of the probability mass than the rest.
    • Work out what "substantially" should mean, i.e. what share of the probability mass counts as high enough.
  • Improve current method:
    • Split the test corpus in two (dev, test)
      • Rerun the experiments and evaluate on the test corpus
      • Look at the dev corpus to see what patterns occur in lines that aren't getting matched
    • Look at combining the 1-feature with the 2-feature model as backoff.
  • Evaluation
    • Try paired bootstrap resampling between the best system and the default translation, for both WER and BLEU.
  • Check the bidix entries that were added automatically
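The "Try generating a corpus" item turns each SL sentence into its "SL:TL selection possibilities". That step can be thought of as expanding every ambiguous lexical unit into all of its translations. Below is a minimal Python sketch. It assumes a simplified form of the biltrans stream in which each lexical unit looks like `^source/target1/target2$`. The regular expression and the `expand` function are hypothetical and are not part of apertium-lex-tools.

```python
import itertools
import re
from typing import List

# Hypothetical, simplified lexical-unit pattern: ^source/target1/target2$
# (ignores escaped characters, superblanks and other details of the real stream format)
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def expand(biltrans_line: str) -> List[List[str]]:
    """Return every combination of target-side choices for one biltrans line."""
    options = []
    for unit in LEXICAL_UNIT.findall(biltrans_line):
        source, *targets = unit.split("/")
        # a unit with no target side is kept as its source form
        options.append(targets or [source])
    # Cartesian product: one candidate TL sentence per combination of choices
    return [list(choice) for choice in itertools.product(*options)]


if __name__ == "__main__":
    line = "^a<n>/x<n>/y<n>$ ^b<vblex>/z<vblex>$"
    for candidate in expand(line):
        print(" ".join(candidate))
```

The number of candidates is the product of the number of translations per word. A sentence with ten two-way ambiguous words already yields 1024 candidates, so some cap would likely be needed before scoring with the language model.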
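"Select a subset for training" asks for sentences where one translation holds a "substantially higher proportion of the probability mass". One way to make that concrete:

1. Renormalise the LM scores of each sentence's candidates so they sum to one.
2. Keep the sentence only if its best candidate's share reaches a threshold.

The sketch below makes two assumptions. First, the ranker output has already been read into a dictionary of log10 scores per candidate, the convention of ARPA-style language models; this input format is hypothetical. Second, the default threshold of 0.9 is an arbitrary placeholder, and choosing it is the open question in the list.

```python
from typing import Dict, List, Tuple

# Hypothetical input: sentence id -> list of (candidate translation, log10 LM score)
Candidates = Dict[str, List[Tuple[str, float]]]


def normalise(log10_scores: List[float]) -> List[float]:
    """Turn LM log10 scores for one sentence's candidates into a distribution."""
    top = max(log10_scores)
    # shift by the maximum before exponentiating so long sentences do not underflow
    weights = [10 ** (score - top) for score in log10_scores]
    total = sum(weights)
    return [w / total for w in weights]


def select_confident(candidates: Candidates,
                     threshold: float = 0.9) -> Dict[str, Tuple[str, float]]:
    """Keep sentences whose best candidate holds at least `threshold` of the mass."""
    selected = {}
    for sent_id, cands in candidates.items():
        probs = normalise([score for _, score in cands])
        best = max(range(len(cands)), key=lambda i: probs[i])
        if probs[best] >= threshold:
            selected[sent_id] = (cands[best][0], probs[best])
    return selected
```

One way to pick the threshold would be to sweep `threshold` over a few values and measure the resulting system on the dev half of the split test corpus.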
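Splitting the test corpus into dev and test only helps if two things hold:

- Source lines and reference translations stay aligned.
- The split is reproducible, so the test half stays unseen while the dev half is used to study unmatched lines.

Here is a small sketch, assuming the corpus is available as a list of (source, reference) pairs. The 50/50 ratio, the seed and the `split_dev_test` name are arbitrary choices for illustration, not taken from the report.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (source line, reference translation)


def split_dev_test(pairs: List[Pair], dev_fraction: float = 0.5,
                   seed: int = 1234) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle line-aligned pairs with a fixed seed and cut them into dev and test."""
    order = list(range(len(pairs)))
    random.Random(seed).shuffle(order)  # fixed seed: the same split on every rerun
    cut = int(len(order) * dev_fraction)
    dev = [pairs[i] for i in order[:cut]]
    test = [pairs[i] for i in order[cut:]]
    return dev, test
```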
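The backoff item ("combining the 1-feature with the 2-feature model") can be read in the usual way:

1. Use the more specific 2-feature model when it has an entry for the current context.
2. Otherwise fall back to the 1-feature model.
3. Otherwise use the default bidix translation.

That reading is an assumption made for this sketch, as is treating a feature as a neighbouring word. The dictionaries and `choose_translation` are hypothetical and do not reflect the actual rule format.

```python
from typing import Dict, Tuple

# Hypothetical model tables (placeholders, not the real rule format):
#   two_feature: (word, left neighbour, right neighbour) -> translation
#   one_feature: (word, one neighbour) -> translation
#   default:     word -> default bidix translation
TwoFeature = Dict[Tuple[str, str, str], str]
OneFeature = Dict[Tuple[str, str], str]


def choose_translation(word: str, left: str, right: str,
                       two_feature: TwoFeature, one_feature: OneFeature,
                       default: Dict[str, str]) -> str:
    # 1. most specific model: both neighbours must match
    if (word, left, right) in two_feature:
        return two_feature[(word, left, right)]
    # 2. back off to the 1-feature model, trying the left neighbour first
    for neighbour in (left, right):
        if (word, neighbour) in one_feature:
            return one_feature[(word, neighbour)]
    # 3. nothing matched: keep the default translation
    return default[word]
```

Trying the left neighbour before the right one is an arbitrary tie-break. Ordering by how reliable each 1-feature entry was in training would be an obvious alternative.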
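Paired bootstrap resampling compares two systems on the same resampled test sets:

1. Draw sentence indices with replacement.
2. Score both systems on each sample.
3. Count how often one system beats the other.

If the best system wins in, say, at least 95% of samples, the difference is commonly taken as significant at that level. Below is a minimal sketch of the procedure for both WER and BLEU, assuming tokenised, single-reference output. The BLEU here is a simplified, unsmoothed corpus BLEU written for illustration, not a replacement for a standard scorer. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Sequence

Stats = List[int]


def edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level Levenshtein distance, computed one row at a time."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,               # delete h
                           cur[j - 1] + 1,            # insert r
                           prev[j - 1] + (h != r)))   # substitute or match
        prev = cur
    return prev[-1]


def wer_stats(hyp: Sequence[str], ref: Sequence[str]) -> Stats:
    return [edit_distance(hyp, ref), len(ref)]


def corpus_wer(stats: List[Stats]) -> float:
    return sum(s[0] for s in stats) / sum(s[1] for s in stats)


def bleu_stats(hyp: Sequence[str], ref: Sequence[str], max_n: int = 4) -> Stats:
    out: Stats = []
    for n in range(1, max_n + 1):
        h = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        r = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # clipped n-gram matches, then number of hypothesis n-grams
        out += [sum((h & r).values()), max(len(hyp) - n + 1, 0)]
    return out + [len(hyp), len(ref)]


def corpus_bleu(stats: List[Stats]) -> float:
    """Unsmoothed single-reference corpus BLEU from summed sentence statistics."""
    tot = [sum(col) for col in zip(*stats)]
    max_n = (len(tot) - 2) // 2
    log_prec = 0.0
    for n in range(max_n):
        matches, total = tot[2 * n], tot[2 * n + 1]
        if matches == 0:
            return 0.0
        log_prec += math.log(matches / total) / max_n
    hyp_len, ref_len = tot[-2], tot[-1]
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_prec)


def paired_bootstrap(stats_a: List[Stats], stats_b: List[Stats],
                     metric: Callable[[List[Stats]], float],
                     higher_is_better: bool = True,
                     samples: int = 1000, seed: int = 0) -> float:
    """Fraction of resampled test sets on which system A beats system B."""
    rng = random.Random(seed)
    n = len(stats_a)
    wins = 0
    for _ in range(samples):
        # the same sentence indices are drawn for both systems: the "paired" part
        idx = [rng.randrange(n) for _ in range(n)]
        a = metric([stats_a[i] for i in idx])
        b = metric([stats_b[i] for i in idx])
        better = a > b if higher_is_better else a < b
        if better:
            wins += 1
    return wins / samples
```

Both metrics are expressed as per-sentence counts rather than per-sentence scores. That way the sentence statistics are computed once and only re-summed inside the loop, which keeps each resample cheap. For WER, pass `higher_is_better=False`.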