Difference between revisions of "User:Fpetkovski"

From Apertium
==Reports==

* [[Lexical feature transfer - First report]]
* [[Lexical feature transfer - Second report]]

==TODO==

* '''Try generating a training corpus from a monolingual SL corpus:'''
** Оваа лабавост на регулативите се одразува врз третманот на уапсените корисници на дрога. (English: "This laxity in the regulations is reflected in the treatment of arrested drug users.")
*** Run through lexical transfer (<code>-biltrans</code>)
*** Run through <code>apertium-lex-tools/scripts/biltrans-to-multitrans.py</code>
*** Run through <code>apertium-lex-learner/irstlm-ranker</code>
** This will give:
*** the SL:TL selection possibilities
*** a probability from the TL language model for each selection
** Select a subset for training in which one translation has a substantially higher share of the probability mass than the rest.
** Work out what threshold "substantially" should correspond to.

* '''Improve the current method:'''
** Split the test corpus in two (dev, test)
** Rerun the experiments and check the results against the test corpus
** Look at the dev corpus to see what kinds of patterns occur in the lines that are not getting matched
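
The subset selection step in the first TODO item could be sketched roughly as follows. This is only an illustration, not the actual <code>irstlm-ranker</code> code or output format: it assumes the ranker output has already been parsed into a hypothetical dictionary mapping a sentence id to a list of (translation, log10 score) pairs, and the names <code>probability_shares</code>, <code>select_confident</code>, <code>coverage_by_threshold</code> and the default threshold of 0.8 are all made up for the example.

```python
def probability_shares(log10_scores):
    """Convert log10 language-model scores into shares of the probability mass."""
    top = max(log10_scores)
    # Subtract the maximum before exponentiating to avoid underflow.
    weights = [10 ** (score - top) for score in log10_scores]
    total = sum(weights)
    return [weight / total for weight in weights]


def select_confident(candidates, threshold=0.8):
    """Keep sentences whose best translation holds at least `threshold` of the mass.

    `candidates` is a hypothetical structure: {sentence_id: [(translation, log10_score), ...]}.
    Returns {sentence_id: (best_translation, share)}.
    """
    selected = {}
    for sent_id, scored in candidates.items():
        shares = probability_shares([score for _, score in scored])
        best = max(range(len(scored)), key=lambda i: shares[i])
        if shares[best] >= threshold:
            selected[sent_id] = (scored[best][0], shares[best])
    return selected


def coverage_by_threshold(candidates, thresholds):
    """Count how many sentences survive at each threshold value."""
    return {t: len(select_confident(candidates, t)) for t in thresholds}


# Invented scores, for illustration only.
example = {
    1: [("translation A", -10.0), ("translation B", -12.0)],
    2: [("translation C", -10.0), ("translation D", -10.1)],
}
confident = select_confident(example)
coverage = coverage_by_threshold(example, [0.5, 0.8, 0.999])
```

Sweeping the threshold with <code>coverage_by_threshold</code> is one possible way to approach the question of what "substantially" should mean: it shows how much training data each candidate value would leave, which can then be weighed against results on held-out data.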
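
For the dev/test split in the second TODO item, a minimal sketch is given below. It assumes the test corpus is available as a list of lines; the function name <code>split_dev_test</code>, the 50/50 split and the fixed seed are assumptions made for the example, not something the plan above specifies.

```python
import random


def split_dev_test(lines, dev_fraction=0.5, seed=0):
    """Shuffle the corpus lines reproducibly and cut them into (dev, test)."""
    shuffled = list(lines)
    # A private, seeded generator keeps the split identical across reruns.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]


corpus = ["line %d" % i for i in range(10)]
dev, test = split_dev_test(corpus)
```

Fixing the seed matters here because the experiments are meant to be rerun: the dev set used for inspecting unmatched lines must stay disjoint from the test set used for the final check.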
[[Category:Users|Fpetkovski]]

Revision as of 12:34, 27 July 2012
