Difference between revisions of "User:Snippyhollow"

From Apertium
 

Latest revision as of 16:46, 26 May 2009

I'm a Computer Science student from France. This year, I completed both an engineering degree (ENSIMAG) and a Master's in Computer Science specializing in Artificial Intelligence and the Web. I am currently doing my Master's research internship at the National Institute of Informatics in Tokyo, on Inductive Logic Programming applied to biological systems.

Basic pruning

Needed files:

  • replace.py (in apertium-combine/scripts/)
  • split_prune.sh (in apertium-combine/scripts/)
  • pruning.cc (in apertium-combine/pruning/)
  • your lowercased corpus (see here)

python replace.py lowercase.en lowercase.cy

Then use the resulting "lowercase.rep.en" and "lowercase.rep.cy" files for the rest of the phrase-table building (see Workflow for building a phrase-table, below).
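
Judging only from the file names used on this page, replace.py appears to write its output with ".rep" inserted before the language suffix. A minimal shell sketch of that naming, assuming the convention holds; rep_name is a hypothetical helper and not part of apertium-combine:

```shell
# Hypothetical helper: derive the ".rep" file name that replace.py
# seems to produce (assumption based on the examples on this page).
rep_name() {
    path=$1
    # ${path%.*}  strips the last ".suffix"  (e.g. ".en")
    # ${path##*.} keeps only that suffix     (e.g. "en")
    printf '%s.rep.%s\n' "${path%.*}" "${path##*.}"
}

rep_name work/corpus/30k.lowercased.en   # prints work/corpus/30k.lowercased.rep.en
```

This can be handy for checking that both ".rep" files exist before starting the training step.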

sh split_prune.sh phrase-table

Before running it, set the "exe" variable inside the script to the path of your compiled pruning.cc binary (compile with g++ -O3 apertium-combine/pruning/pruning.cc).


Workflow for building a phrase-table

 snippy:moses snippy$ python replace.py work/corpus/30k.lowercased.en work/corpus/30k.lowercased.cy 
 snippy:moses snippy$ build-lm.sh -i work/corpus/30k.lowercased.rep.en -n 3 -o work/lm/30k-en.ilm.gz
 snippy:moses snippy$ compile-lm work/lm/30k-en.ilm.gz --text yes work/lm/30k-en.lm
 snippy:moses snippy$ rm work/model/*
 snippy:moses snippy$ nohup nice tools/moses-scripts/scripts-20090409-0149/training/train-factored-phrase-model.perl \
-scripts-root-dir tools/moses-scripts/scripts-20090409-0149/ -root-dir work -corpus work/corpus/30k.lowercased.rep -f cy \
-e en -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 0:3:/Users/snippy/moses/work/lm/30k-en.lm >& work/training.out &
 ...
 ...
 snippy:moses snippy$ rm -rf work/tuning/mert/filtered/
 snippy:moses snippy$ nohup nice tools/moses-scripts/scripts-20090409-0149/training/mert-moses.pl work/tuning/100.lowercased.cy \
work/tuning/100.lowercased.en tools/moses/moses-cmd/src/moses work/model/moses.ini --working-dir work/tuning/mert \
--rootdir /Users/snippy/moses/tools/moses-scripts/scripts-20090409-0149/ --decoder-flags "-v 0" >& work/tuning/mert.out &
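
The language-model file names in the transcript above follow a pattern (corpus prefix, language, n-gram order). As a rough sketch, the two LM commands can be generated for other corpora like this. The function only prints the commands, it does not run them; lm_commands and its arguments are hypothetical names, and the paths simply mirror the ones used above:

```shell
# Sketch: print (not run) the two language-model commands from the
# transcript for a given corpus prefix, language and n-gram order.
# lm_commands is a hypothetical helper; paths mirror the transcript.
lm_commands() {
    prefix=$1   # corpus prefix, e.g. 30k
    lang=$2     # language of the LM, e.g. en
    order=$3    # n-gram order, e.g. 3
    ilm="work/lm/${prefix}-${lang}.ilm.gz"
    lm="work/lm/${prefix}-${lang}.lm"
    echo "build-lm.sh -i work/corpus/${prefix}.lowercased.rep.${lang} -n ${order} -o ${ilm}"
    echo "compile-lm ${ilm} --text yes ${lm}"
}

lm_commands 30k en 3
```

With the arguments shown, the two printed lines match the build-lm.sh and compile-lm invocations in the transcript.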