User:Snippyhollow
Revision as of 16:44, 26 May 2009 by Snippyhollow
I'm a Computer Science student from France. This year, I completed both an engineering degree (ENSIMAG) and a Master's in Computer Science, specializing in Artificial Intelligence and the Web. I am currently doing my Master's research internship at the National Institute of Informatics in Tokyo, working on Inductive Logic Programming applied to biological systems.
Basic pruning
Needed files:
- replace.py (in apertium-combine/scripts/)
- split_prune.sh (in apertium-combine/scripts/)
- pruning.cc (in apertium-combine/pruning/)
- your lowercased corpus (see here)
python replace.py lowercase.en lowercase.cy
Then use the resulting "lowercase.rep.en" and "lowercase.rep.cy" for the rest of the phrase-table building (see below, Workflow for training a LM).
sh split_prune.sh phrase-table
Before running it, set "exe" inside the script to the path of your compiled pruning.cc (g++ -O3 apertium-combine/pruning/pruning.cc).
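The file names in the steps above follow a pattern: "lowercase.en" becomes "lowercase.rep.en", with ".rep" inserted just before the language suffix. Here is a minimal sketch of that naming rule, handy if you script the later steps. The helper name rep_name is hypothetical, and the rule itself is only inferred from the examples on this page, not read from replace.py.

```python
from pathlib import PurePosixPath


def rep_name(corpus_file: str) -> str:
    """Guess the output name of replace.py for one corpus file.

    Assumption (inferred from the examples on this page, not from
    replace.py itself): ".rep" is inserted before the language suffix,
    e.g. lowercase.en -> lowercase.rep.en.
    """
    path = PurePosixPath(corpus_file)
    # path.suffix is the last extension (".en", ".cy");
    # path.stem is the file name without that extension.
    return str(path.with_name(f"{path.stem}.rep{path.suffix}"))


if __name__ == "__main__":
    for name in ("lowercase.en", "lowercase.cy"):
        print(name, "->", rep_name(name))
```

If replace.py turns out to name its output differently, only this one function needs to change.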
Workflow for training a LM
snippy:moses snippy$ python replace.py work/corpus/30k.lowercased.en work/corpus/30k.lowercased.cy
snippy:moses snippy$ build-lm.sh -i work/corpus/30k.lowercased.rep.en -n 3 -o work/lm/30k-en.ilm.gz
snippy:moses snippy$ compile-lm work/lm/30k-en.ilm.gz --text yes work/lm/30k-en.lm
snippy:moses snippy$ rm work/model/*
snippy:moses snippy$ nohup nice tools/moses-scripts/scripts-20090409-0149/training/train-factored-phrase-model.perl \
    -scripts-root-dir tools/moses-scripts/scripts-20090409-0149/ -root-dir work -corpus work/corpus/30k.lowercased.rep -f cy \
    -e en -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 0:3:/Users/snippy/moses/work/lm/30k-en.lm >& work/training.out &
...
...
snippy:moses snippy$ rm -rf work/tuning/mert/filtered/
snippy:moses snippy$ nohup nice tools/moses-scripts/scripts-20090409-0149/training/mert-moses.pl work/tuning/100.lowercased.cy \
    work/tuning/100.lowercased.en tools/moses/moses-cmd/src/moses work/model/moses.ini --working-dir work/tuning/mert \
    --rootdir /Users/snippy/moses/tools/moses-scripts/scripts-20090409-0149/ --decoder-flags "-v 0" >& work/tuning/mert.out &
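The language-model part of the session above can be sketched as a small command builder, which makes it easier to keep the n-gram order consistent between build-lm.sh (-n 3) and the -lm argument of the training script (0:3:...). The function names lm_commands and lm_spec are hypothetical. Reading "0:3:path" as factor:order:path is an assumption based on the usual Moses training convention, not something this page states. The sketch only builds and prints the commands; it does not run them.

```python
import shlex


def lm_commands(rep_corpus, order, ilm_path, lm_path):
    """Return the two LM commands from the session as argument lists."""
    return [
        # Build the intermediate LM from the ".rep" corpus.
        ["build-lm.sh", "-i", rep_corpus, "-n", str(order), "-o", ilm_path],
        # Convert it to a plain-text LM file.
        ["compile-lm", ilm_path, "--text", "yes", lm_path],
    ]


def lm_spec(factor, order, lm_path):
    """Format the value passed to -lm.

    Assumption: the three fields are factor:order:path, so the order
    here should match the -n value given to build-lm.sh.
    """
    return f"{factor}:{order}:{lm_path}"


if __name__ == "__main__":
    order = 3
    cmds = lm_commands(
        "work/corpus/30k.lowercased.rep.en",
        order,
        "work/lm/30k-en.ilm.gz",
        "work/lm/30k-en.lm",
    )
    for cmd in cmds:
        print(shlex.join(cmd))
    print("-lm", lm_spec(0, order, "/Users/snippy/moses/work/lm/30k-en.lm"))
```

Passing the same order variable to both functions avoids training a 3-gram model and then declaring a different order to the training script.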