Difference between revisions of "User:Snippyhollow"

Latest revision as of 16:46, 26 May 2009

I'm a Computer Science student from France. This year I completed both an engineering degree (ENSIMAG) and a Master's in Computer Science specializing in Artificial Intelligence and the Web. I am currently doing my Master's research internship at the National Institute of Informatics in Tokyo, on Inductive Logic Programming applied to biological systems.

Basic pruning

Needed files:

  • replace.py (in apertium-combine/scripts/)
  • split_prune.sh (in apertium-combine/scripts/)
  • pruning.cc (in apertium-combine/pruning/)
  • your lowercased corpus (see http://ufallab2.ms.mff.cuni.cz/~bojar/teaching/NPFL087/export/HEAD/lectures/02-phrase-based-Moses-installation-tutorial.html)

First, run replace.py on both sides of the lowercased corpus:

 python replace.py lowercase.en lowercase.cy

Then use the resulting "lowercase.rep.en" and "lowercase.rep.cy" for the rest of the phrase-table building (see below, Workflow for building a phrase-table). Once the phrase table is built, prune it:

 sh split_prune.sh phrase-table

Before running it, set "exe" inside the script to the path of your compiled pruning.cc (compile it with g++ -O3 apertium-combine/pruning/pruning.cc).
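
The file naming used in the steps above can be made explicit. The following is a minimal sketch, assuming (as the commands on this page suggest) that the output of replace.py sits next to each input with ".rep" inserted before the language suffix. The helper name rep_name is hypothetical and is not part of apertium-combine.

```shell
# Hypothetical helper: derive the ".rep" file name this page uses for
# the output of replace.py (e.g. lowercase.en -> lowercase.rep.en).
rep_name() {
    stem=${1%.*}     # everything before the last dot
    lang=${1##*.}    # language suffix after the last dot
    printf '%s.rep.%s\n' "$stem" "$lang"
}

rep_name lowercase.en
rep_name work/corpus/30k.lowercased.cy
```

This only computes names; it does not run replace.py or check that the files exist.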


Workflow for building a phrase-table

 # Preprocess both sides of the corpus; the .rep files are used from here on
 snippy:moses snippy$ python replace.py work/corpus/30k.lowercased.en work/corpus/30k.lowercased.cy
 # Build a 3-gram English language model (build-lm.sh and compile-lm come with IRSTLM)
 snippy:moses snippy$ build-lm.sh -i work/corpus/30k.lowercased.rep.en -n 3 -o work/lm/30k-en.ilm.gz
 snippy:moses snippy$ compile-lm work/lm/30k-en.ilm.gz --text yes work/lm/30k-en.lm
 # Clear the previous model, then train the cy-to-en phrase model in the background
 snippy:moses snippy$ rm work/model/*
 snippy:moses snippy$ nohup nice tools/moses-scripts/scripts-20090409-0149/training/train-factored-phrase-model.perl \
   -scripts-root-dir tools/moses-scripts/scripts-20090409-0149/ -root-dir work -corpus work/corpus/30k.lowercased.rep -f cy \
   -e en -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 0:3:/Users/snippy/moses/work/lm/30k-en.lm >& work/training.out &
 ...
 ...
 # Clear the previously filtered tuning data, then run MERT tuning in the background
 snippy:moses snippy$ rm -rf work/tuning/mert/filtered/
 snippy:moses snippy$ nohup nice tools/moses-scripts/scripts-20090409-0149/training/mert-moses.pl work/tuning/100.lowercased.cy \
   work/tuning/100.lowercased.en tools/moses/moses-cmd/src/moses work/model/moses.ini --working-dir work/tuning/mert \
   --rootdir /Users/snippy/moses/tools/moses-scripts/scripts-20090409-0149/ --decoder-flags "-v 0" >& work/tuning/mert.out &
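
To reuse the session above with another corpus or language pair, the commands up to the training step can be parameterised. This is only a sketch: it prints the commands rather than running them, every variable and function name in it (SCRIPTS, CORPUS, SRC, TGT, LM, show, workflow) is an illustrative assumption, and the absolute language-model path from the session is approximated with $PWD, which assumes you start from the moses directory. The flags themselves are copied from the session.

```shell
# Sketch only: prints the commands from the session with the corpus
# name and language pair factored out. Nothing is executed.
SCRIPTS=tools/moses-scripts/scripts-20090409-0149
CORPUS=work/corpus/30k.lowercased
SRC=cy
TGT=en
LM=work/lm/30k-$TGT

# Print a command line instead of running it.
show() { printf '%s\n' "$*"; }

workflow() {
    show python replace.py "$CORPUS.$TGT" "$CORPUS.$SRC"
    show build-lm.sh -i "$CORPUS.rep.$TGT" -n 3 -o "$LM.ilm.gz"
    show compile-lm "$LM.ilm.gz" --text yes "$LM.lm"
    show "$SCRIPTS/training/train-factored-phrase-model.perl" \
        -scripts-root-dir "$SCRIPTS/" -root-dir work \
        -corpus "$CORPUS.rep" -f "$SRC" -e "$TGT" \
        -alignment grow-diag-final-and -reordering msd-bidirectional-fe \
        -lm "0:3:$PWD/$LM.lm"
}

workflow
```

Review the printed lines before running any of them by hand; the tuning step is left out because it depends on the trained model and a separate tuning set.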