User:Fpetkovski/GSOC 2012 Report
Revision as of 13:16, 11 April 2013
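Documentation / HOWTO
- Corpus based preposition selection - HOWTO
- Building a pseudo-parallel corpus
Reports
- Lexical feature transfer - First report
- …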
TODO
- Try generating a corpus from a monolingual source-language (SL) corpus:
  - Example: Оваа лабавост на регулативите се одразува врз третманот на уапсените корисници на дрога. ("This laxity of the regulations is reflected in the treatment of arrested drug users.")
    - Run it through lexical transfer (mk-en-biltrans)
    - Run it through apertium-lex-tools/scripts/biltrans-to-multitrans.py
    - Run it through the rest of the pipeline, from apertium-transfer -b onwards
    - Run it through apertium-lex-learner/irstlm-ranker
  - This will give:
    - SL:TL selection possibilities
    - probabilities from the TL language model for each selection
  - Select a subset for training where one translation has a substantially higher proportion of the probability mass than the rest.
    - Work out what "substantially" should mean in practice.
- Improve the current method:
  - Split the test corpus in two (dev, test).
  - Rerun the experiments and check the results on the test corpus.
  - Look at the dev corpus to see what kinds of patterns occur in lines that aren't getting matched.
  - Look at combining the 1-feature model with the 2-feature model as a backoff.
- Evaluation:
  - Try paired bootstrap resampling between the best system and the default translation, for both WER and BLEU.
  - Check the bidix (bilingual dictionary) entries that were added automatically.
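The step "Run it through biltrans-to-multitrans.py" is what turns one ambiguous sentence into its set of SL:TL selection possibilities. The sketch below only illustrates that idea; it is not the script's actual code or output format. It assumes Apertium stream input of the form ^sl/tl1/tl2$, ignores escaped characters and the text between lexical units, and uses a hypothetical helper name, expand_biltrans.

```python
import itertools
import re

# One Apertium lexical unit, e.g. ^word<n>/trans1<n>/trans2<n>$.
# Simplification: escaped characters (\/, \$, \^) are not handled.
LU_PATTERN = re.compile(r"\^(.*?)\$")


def expand_biltrans(line: str) -> list[str]:
    """Expand one biltrans-style line into every combination of TL choices.

    Hypothetical helper illustrating the idea only; blanks and punctuation
    between lexical units are dropped, and units are rejoined with spaces.
    """
    units = []
    for match in LU_PATTERN.finditer(line):
        sl, *translations = match.group(1).split("/")
        # Each option keeps the SL side plus exactly one TL choice;
        # a unit with no TL side is passed through unchanged.
        options = [f"^{sl}/{tl}$" for tl in translations] or [f"^{sl}$"]
        units.append(options)
    # Cartesian product: one output line per selection path.
    return [" ".join(path) for path in itertools.product(*units)]
```

For example, a line with two 2-way ambiguous words, such as "^лабавост<n>/looseness<n>/laxity<n>$ ^на<pr>/of<pr>/on<pr>$", expands into four lines. Because the count grows multiplicatively with the number of ambiguous words, long sentences may need a cap.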
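The item "Select a subset for training where one translation has a substantially higher proportion of the probability mass" can be made concrete by normalising the language-model scores of a sentence's candidates and checking the share held by the best one. This is a minimal sketch under stated assumptions: scores are base-10 log-probabilities (as in ARPA-format language models), the input layout is hypothetical, and the threshold value is a placeholder, since choosing it is exactly the open question in the TODO.

```python
def top_share(log10_probs: list[float]) -> float:
    """Fraction of the probability mass held by the best candidate.

    Assumes base-10 log-probabilities. Subtracting the maximum before
    exponentiating avoids underflow for very negative scores.
    """
    best = max(log10_probs)
    weights = [10 ** (lp - best) for lp in log10_probs]
    return 1.0 / sum(weights)  # the best candidate's weight is 10**0 == 1


def select_confident(candidates_by_sentence, threshold=0.8):
    """Keep sentences whose best translation holds >= threshold of the mass.

    candidates_by_sentence: iterable of
        (sentence_id, [(translation, log10_prob), ...])  (hypothetical layout)
    threshold: placeholder cut-off, not a tuned value.
    Returns (sentence_id, best_translation, share) tuples.
    """
    selected = []
    for sent_id, candidates in candidates_by_sentence:
        if not candidates:
            continue
        share = top_share([lp for _, lp in candidates])
        if share >= threshold:
            best_translation = max(candidates, key=lambda c: c[1])[0]
            selected.append((sent_id, best_translation, share))
    return selected
```

A candidate 2 log10 units ahead of its only rival holds about 99% of the mass, while one 0.1 units ahead holds only about 56%. One way to pick the threshold would be to sweep it and compare downstream results on the dev half of the test corpus.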
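For "Split the test corpus in two (dev, test)", the main thing to get right is keeping each source line aligned with its reference and making the split reproducible. The sketch below assumes the corpus is held as a list of (source, reference) pairs; the function name and the 50/50 default are illustrative choices.

```python
import random


def split_dev_test(pairs, dev_fraction=0.5, seed=0):
    """Randomly split (source, reference) pairs into dev and test sets.

    Shuffling indices rather than lines keeps each source aligned with its
    reference; sorting each half preserves the original corpus order.
    A fixed seed makes the split reproducible across runs.
    """
    indices = list(range(len(pairs)))
    random.Random(seed).shuffle(indices)
    cut = int(len(pairs) * dev_fraction)
    dev = [pairs[i] for i in sorted(indices[:cut])]
    test = [pairs[i] for i in sorted(indices[cut:])]
    return dev, test
```

A random rather than contiguous split avoids putting all lines from one document or domain into the same half.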
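"Combining the 1-feature with the 2-feature model as backoff" could work as follows: use the more specific 2-feature rule when one matches, otherwise fall back to a 1-feature rule, and finally to the default translation. The rule tables below are hypothetical stand-ins (one neighbouring word per feature), not the project's actual rule format, and trying the left neighbour before the right is an arbitrary choice made for this sketch.

```python
def select_translation(sl_word, left, right, rules_2, rules_1, default):
    """Choose a TL translation for sl_word, backing off from 2 features to 1.

    All rule tables are hypothetical stand-ins, not the project's format:
      rules_2: {(sl_word, left_word, right_word): tl}  both neighbours
      rules_1: {(sl_word, offset, word): tl}           one neighbour, offset -1 or +1
      default: {sl_word: tl}                           default translation
    """
    # Most specific model first: both context words must match.
    key_2 = (sl_word, left, right)
    if key_2 in rules_2:
        return rules_2[key_2]
    # Back off to the 1-feature model; the left neighbour is tried first
    # (an arbitrary tie-break chosen for this sketch).
    for offset, word in ((-1, left), (+1, right)):
        key_1 = (sl_word, offset, word)
        if key_1 in rules_1:
            return rules_1[key_1]
    # Nothing matched: use the default translation, or the word itself.
    return default.get(sl_word, sl_word)
```

A natural refinement would be to rank competing 1-feature matches by their training counts instead of by position.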
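The evaluation item uses WER. One standard definition is the word-level Levenshtein distance between hypothesis and reference, summed over the corpus and divided by the total number of reference words. The sketch below implements that definition with whitespace tokenisation, which is an assumption; the evaluation tool actually used may tokenise or normalise differently.

```python
def word_edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # extra hypothesis word
                curr[j - 1] + 1,         # missing reference word
                prev[j - 1] + (h != r),  # substitution (free if the words match)
            )
        prev = curr
    return prev[-1]


def corpus_wer(hyps: list[str], refs: list[str]) -> float:
    """Corpus-level WER: total word edits divided by total reference words."""
    edits = sum(word_edit_distance(h.split(), r.split()) for h, r in zip(hyps, refs))
    ref_words = sum(len(r.split()) for r in refs)
    return edits / ref_words
```

Computing WER from corpus totals rather than averaging per-sentence rates keeps short sentences from dominating the score.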
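Paired bootstrap resampling compares two systems on the same test sentences. It repeatedly draws a test set of the same size with replacement, scores both systems on that same draw, and counts how often one system wins. The sketch below assumes per-sentence WER statistics (word errors, reference length) have already been computed for both systems. For BLEU the loop is the same, but it sums BLEU's sufficient statistics (n-gram matches and lengths) and treats higher as better. The function name and the defaults are illustrative.

```python
import random


def paired_bootstrap(stats_a, stats_b, n_samples=1000, seed=0):
    """Paired bootstrap resampling on corpus-level WER (lower is better).

    stats_a, stats_b: per-sentence (word_errors, reference_length) tuples
    for systems A and B, on the same sentences in the same order.
    Assumes every resample has a non-zero total reference length.
    Returns the fraction of resampled test sets on which A beats B.
    """
    assert len(stats_a) == len(stats_b), "systems must cover the same sentences"
    rng = random.Random(seed)
    n = len(stats_a)
    wins_a = 0
    for _ in range(n_samples):
        # Draw sentence indices with replacement; both systems share the draw.
        idx = [rng.randrange(n) for _ in range(n)]
        wer_a = sum(stats_a[i][0] for i in idx) / sum(stats_a[i][1] for i in idx)
        wer_b = sum(stats_b[i][0] for i in idx) / sum(stats_b[i][1] for i in idx)
        if wer_a < wer_b:
            wins_a += 1
    return wins_a / n_samples
```

With A as the best system and B as the default translation, a win fraction of at least 0.95 is the conventional reading of a significant improvement at roughly the 5% level.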