Difference between revisions of "User:Fpetkovski"

From Apertium
Revision as of 19:20, 1 August 2012

Reports

TODO

  • Try generating a corpus from a monolingual SL corpus:
    • Оваа лабавост на регулативите се одразува врз третманот на уапсените корисници на дрога. ("This laxity in the regulations is reflected in the treatment of arrested drug users.")
      • Run through lexical transfer mk-en-biltrans
      • Run through apertium-lex-tools/scripts/biltrans-to-multitrans.py
      • Run through the rest of the pipeline from apertium-transfer -b onwards
      • Run through apertium-lex-learner/irstlm-ranker
    • This will give:
      • SL:TL selection possibilities
      • probabilities from the TL language model for each selection
    • Select a subset for training where one translation has a substantially higher proportion of the probability mass than the rest.
    • Work out what threshold "substantially" should correspond to.
  • Improve current method:
    • Split the test corpus in two (dev, test)
      • Rerun the experiments and evaluate on the test corpus
      • Look at dev corpus to see what kind of patterns there are in lines that aren't getting matched
    • Look at combining the 1-feature and 2-feature models, using the 1-feature model as a backoff.
  • Evaluation
    • Try paired bootstrap resampling between the best system and the default translation, for both WER and BLEU.
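The expansion step in the corpus-generation plan (lexical transfer, then biltrans-to-multitrans.py) can be pictured with the Python sketch below. It is not the actual apertium-lex-tools script: it assumes the simple stream format ^source<tags>/target1<tags>/target2<tags>$ with no escaped characters or superblanks, and the example tokens and tags are invented for illustration. Every combination of translations becomes its own line, so each SL:TL selection can later be scored separately by the language model.

```python
import itertools
import re
from typing import Iterator, List, Optional, Tuple

# One lexical unit in the (assumed) stream format ^source/target1/target2$.
# Escaped characters (\^, \$, \/) and superblanks are NOT handled here.
LU_RE = re.compile(r"\^(.*?)\$")

Unit = Tuple[str, List[str]]  # (source analysis, [candidate translations])


def parse_biltrans(line: str) -> List[Unit]:
    """Split a biltrans line into (source, [translations]) pairs."""
    units = []
    for body in LU_RE.findall(line):
        source, *targets = body.split("/")
        units.append((source, targets))
    return units


def expand(units: List[Unit]) -> Iterator[List[str]]:
    """Yield one list of unambiguous lexical units per translation combination."""
    # Units with no translation (e.g. unknown words) are kept unchanged.
    options: List[List[Optional[str]]] = [targets or [None] for _, targets in units]
    for choice in itertools.product(*options):
        yield [f"^{src}/{tgt}$" if tgt is not None else f"^{src}$"
               for (src, _), tgt in zip(units, choice)]


if __name__ == "__main__":
    # Hypothetical biltrans output, for illustration only.
    line = "^корисник<n>/user<n>/client<n>$ ^на<pr>/of<pr>$ ^дрога<n>/drug<n>/narcotic<n>$"
    for units in expand(parse_biltrans(line)):
        print(" ".join(units))
```

With two ambiguous words of two translations each, this prints four lines, one per combination.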
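For the selection step, one way to make "a substantially higher proportion of the probability mass" concrete is to normalise the language-model scores of all candidates for a sentence and keep the sentence only when the best candidate's share reaches a threshold. This sketch assumes the ranker output has already been parsed into (candidate, log10 score) pairs; the 0.8 default threshold and all function names are hypothetical. Sweeping the threshold with kept_fraction, and checking how much of the corpus survives, is one possible way of deciding what "substantially" should be.

```python
from typing import List, Optional, Sequence, Tuple

Candidate = Tuple[str, float]  # (candidate translation, log10 LM score)


def normalise(log10_scores: Sequence[float]) -> List[float]:
    """Turn log10 LM scores into a probability distribution over candidates."""
    top = max(log10_scores)
    # Subtract the maximum before exponentiating to avoid underflow.
    weights = [10.0 ** (s - top) for s in log10_scores]
    total = sum(weights)
    return [w / total for w in weights]


def select_confident(candidates: Sequence[Candidate],
                     threshold: float = 0.8) -> Optional[str]:
    """Return the best candidate if it holds >= threshold of the mass, else None."""
    probs = normalise([score for _, score in candidates])
    best = max(range(len(probs)), key=probs.__getitem__)
    return candidates[best][0] if probs[best] >= threshold else None


def kept_fraction(corpus: Sequence[Sequence[Candidate]], threshold: float) -> float:
    """Share of sentences that would be kept for training at this threshold."""
    kept = sum(select_confident(c, threshold) is not None for c in corpus)
    return kept / len(corpus) if corpus else 0.0
```

A candidate that is two orders of magnitude more probable than its only rival holds about 99% of the mass and is kept; two candidates whose scores differ by 0.1 in log10 split the mass roughly 56/44, and the sentence is dropped at a threshold of 0.8.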
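A reproducible way to split the test corpus into dev and test halves is to shuffle the sentence pairs with a fixed seed and cut the list in the middle. This is a minimal sketch: it assumes the corpus is held as (source, reference) pairs so that both sides stay aligned, and the seed value is arbitrary.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dev_test(pairs: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the sentence pairs with a fixed seed and halve it."""
    shuffled = list(pairs)  # copy, so the caller's corpus is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed: same split every run
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]
```

With an odd number of pairs, the extra pair goes to the test half.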
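One plausible reading of combining the 1-feature and 2-feature models "as backoff" is: ask the more specific 2-feature model first, and fall back to the 1-feature model, then to the default translation, when the context was never seen in training. The context features used here (left and right neighbouring words), the toy tables, and the function names are hypothetical stand-ins for whatever the actual models use.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple

# (left neighbour, source word, right neighbour): hypothetical context features
Context = Tuple[str, str, str]
# A model is a key function over the context plus a lookup table.
Model = Tuple[Callable[[Context], Hashable], Dict[Hashable, str]]


def backoff_translate(ctx: Context, models: List[Model],
                      default: Dict[str, str]) -> Optional[str]:
    """Return the answer of the first (most specific) model that knows ctx."""
    for key_fn, table in models:
        key = key_fn(ctx)
        if key in table:
            return table[key]
    # No model matched: fall back to the default translation, if any.
    return default.get(ctx[1])


# Toy tables with placeholder tokens, for illustration only.
two_feature = {("w1", "X", "w2"): "x-a"}
one_feature = {("w3", "X"): "x-b"}
default = {"X": "x-default"}
models: List[Model] = [
    (lambda c: (c[0], c[1], c[2]), two_feature),  # 2-feature: both neighbours
    (lambda c: (c[0], c[1]), one_feature),        # 1-feature: left neighbour only
]
```

Ordering the models from most to least specific keeps the backoff policy in one list, so adding or reordering models does not change the lookup code.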
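Paired bootstrap resampling repeatedly draws test sets of the same size, with replacement, from the original test sentences. It scores both systems on the same draw and counts how often one of them wins. The sketch below does this for corpus-level WER. BLEU fits the same paired_bootstrap function if the per-sentence statistics are n-gram match counts and lengths, metric is a corpus BLEU computed from their sums, and lower_is_better=False. All names here are illustrative and not part of any Apertium tool.

```python
import random
from typing import Callable, List, Sequence, Tuple

Stats = List[Tuple[int, int]]  # per sentence: (word edits, reference length)


def edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,              # extra hypothesis word (insertion)
                           cur[j - 1] + 1,           # missing reference word (deletion)
                           prev[j - 1] + (h != r)))  # substitution or match
        prev = cur
    return prev[-1]


def wer_stats(hyps: Sequence[str], refs: Sequence[str]) -> Stats:
    """Per-sentence sufficient statistics for corpus-level WER."""
    return [(edit_distance(h.split(), r.split()), len(r.split()))
            for h, r in zip(hyps, refs)]


def corpus_wer(stats: Stats) -> float:
    """Total edits divided by total reference words (not a mean of sentence WERs)."""
    words = sum(n for _, n in stats)
    return sum(e for e, _ in stats) / words if words else 0.0


def paired_bootstrap(stats_a: Stats, stats_b: Stats,
                     metric: Callable[[Stats], float] = corpus_wer,
                     lower_is_better: bool = True,
                     samples: int = 1000, seed: int = 0) -> float:
    """Fraction of resampled test sets on which system A scores better than B."""
    if len(stats_a) != len(stats_b):
        raise ValueError("both systems must be scored on the same sentences")
    rng = random.Random(seed)
    n = len(stats_a)
    wins = 0
    for _ in range(samples):
        # Draw n sentence indices with replacement; both systems share the draw.
        idx = [rng.randrange(n) for _ in range(n)]
        a = metric([stats_a[i] for i in idx])
        b = metric([stats_b[i] for i in idx])
        if (a < b) if lower_is_better else (a > b):
            wins += 1
    return wins / samples
```

A common reading is that if the best system wins on at least 95% of the samples against the default translation, the improvement is significant at p < 0.05.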