User:Snippyhollow
Latest revision as of 16:46, 26 May 2009
I'm a Computer Science student from France. This year, I did both an engineering degree (ENSIMAG) and a Master's in Computer Science, specializing in Artificial Intelligence and the Web. I am currently doing my Master's research internship at the National Institute of Informatics in Tokyo, on Inductive Logic Programming applied to biological systems.
Basic pruning
Needed files:
- replace.py (in apertium-combine/scripts/)
- split_prune.sh (in apertium-combine/scripts/)
- pruning.cc (in apertium-combine/pruning/)
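- your lowercased corpus (see http://ufallab2.ms.mff.cuni.cz/~bojar/teaching/NPFL087/export/HEAD/lectures/02-phrase-based-Moses-installation-tutorial.html)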
python replace.py lowercase.en lowercase.cy
Then use the resulting "lowercase.rep.en" and "lowercase.rep.cy" for the rest of the phrase-table build (see "Workflow for building a phrase-table" below). Once the phrase table is built, prune it with:
sh split_prune.sh phrase-table
Before running it, set the "exe" variable inside the script to the path of your compiled pruning.cc binary (compile it with g++ -O3 apertium-combine/pruning/pruning.cc).
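The file naming used in this section (replace.py turning lowercase.en into lowercase.rep.en) can be sketched as a small helper, which is handy when scripting the later steps. The function name rep_path is hypothetical, and the rule of inserting ".rep" before the language suffix is inferred from the file names on this page, not read from replace.py itself.

```python
import os


def rep_path(path):
    """Guess the output name of replace.py for a given corpus file.

    Assumption: '.rep' is inserted just before the language suffix,
    e.g. lowercase.en -> lowercase.rep.en (inferred from this page).
    """
    stem, lang = os.path.splitext(path)  # e.g. ('lowercase', '.en')
    if not lang:
        raise ValueError("expected a language suffix such as .en or .cy: %r" % path)
    return stem + ".rep" + lang


if __name__ == "__main__":
    for name in ("lowercase.en", "lowercase.cy"):
        print(name, "->", rep_path(name))
```

The helper only computes names; it does not check that the files exist or that replace.py was actually run.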
Workflow for building a phrase-table
snippy:moses snippy$ python replace.py work/corpus/30k.lowercased.en work/corpus/30k.lowercased.cy
snippy:moses snippy$ build-lm.sh -i work/corpus/30k.lowercased.rep.en -n 3 -o work/lm/30k-en.ilm.gz
snippy:moses snippy$ compile-lm work/lm/30k-en.ilm.gz --text yes work/lm/30k-en.lm
snippy:moses snippy$ rm work/model/*
snippy:moses snippy$ nohup nice tools/moses-scripts/scripts-20090409-0149/training/train-factored-phrase-model.perl \
    -scripts-root-dir tools/moses-scripts/scripts-20090409-0149/ -root-dir work -corpus work/corpus/30k.lowercased.rep -f cy \
    -e en -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 0:3:/Users/snippy/moses/work/lm/30k-en.lm >& work/training.out &
...
...
snippy:moses snippy$ rm -rf work/tuning/mert/filtered/
snippy:moses snippy$ nohup nice tools/moses-scripts/scripts-20090409-0149/training/mert-moses.pl work/tuning/100.lowercased.cy \
    work/tuning/100.lowercased.en tools/moses/moses-cmd/src/moses work/model/moses.ini --working-dir work/tuning/mert \
    --rootdir /Users/snippy/moses/tools/moses-scripts/scripts-20090409-0149/ --decoder-flags "-v 0" >& work/tuning/mert.out &
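To rerun the first part of this workflow on another corpus size or language pair, the commands can be generated from a few parameters. The following is a minimal sketch: the function workflow_commands and all of its parameter names are hypothetical, the default values are simply the ones used in the transcript above, and the function only builds argument lists without running anything. The nohup/nice wrappers, the output redirection, the rm steps, and the MERT tuning step are left out.

```python
import shlex


def workflow_commands(stem, src="cy", tgt="en", order=3, root="work",
                      moses_home="/Users/snippy/moses",
                      scripts="tools/moses-scripts/scripts-20090409-0149"):
    """Build (but do not run) the corpus, LM, and training commands.

    All names and defaults here are illustrative assumptions taken
    from the transcript on this page.
    """
    corpus = "%s/corpus/%s.lowercased" % (root, stem)
    ilm = "%s/lm/%s-%s.ilm.gz" % (root, stem, tgt)
    lm = "%s/lm/%s-%s.lm" % (root, stem, tgt)
    return [
        # Preprocess both sides; produces the .rep.<lang> files.
        ["python", "replace.py", corpus + "." + tgt, corpus + "." + src],
        # Train and compile the target-side language model.
        ["build-lm.sh", "-i", corpus + ".rep." + tgt, "-n", str(order), "-o", ilm],
        ["compile-lm", ilm, "--text", "yes", lm],
        # Train the phrase model; -lm takes factor:order:absolute-path.
        [scripts + "/training/train-factored-phrase-model.perl",
         "-scripts-root-dir", scripts + "/", "-root-dir", root,
         "-corpus", corpus + ".rep", "-f", src, "-e", tgt,
         "-alignment", "grow-diag-final-and",
         "-reordering", "msd-bidirectional-fe",
         "-lm", "0:%d:%s/%s" % (order, moses_home, lm)],
    ]


if __name__ == "__main__":
    for cmd in workflow_commands("30k"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the commands first makes it easy to compare them against the transcript before passing them to a shell or to subprocess.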