Moses
Latest revision as of 08:57, 29 April 2015
Prerequisites
- GIZA++ (see the GIZA++ page for how to compile it). Moses also supports mgiza as an alternative to GIZA++.
- IRSTLM (see the IRSTLM page for how to compile it and how to build a language model).
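Before training, you can check that the GIZA++ prerequisite is in place by looking inside the directory you plan to give train-model.perl as its external binary directory (-external-bin-dir). The sketch below assumes the classic GIZA++ file names GIZA++, snt2cooc.out and mkcls. An mgiza setup installs different binaries, so adjust the list in that case. The function name check_giza_tools is hypothetical and is not part of Moses.

```bash
#!/usr/bin/env bash
# Sketch only: check_giza_tools is a hypothetical helper, not part of Moses.
# The tool names below ASSUME a classic GIZA++ build; mgiza installs
# different binaries, so edit the list if you use mgiza instead.

check_giza_tools() {
  local dir=$1 missing=0 tool
  for tool in GIZA++ snt2cooc.out mkcls; do
    # Each tool must exist in the directory and be executable.
    if [ ! -x "$dir/$tool" ]; then
      echo "missing: $dir/$tool" >&2
      missing=1
    fi
  done
  # A non-zero status means at least one tool is missing.
  return "$missing"
}
```

For example, running check_giza_tools against a hypothetical ~/mosesdecoder/tools directory prints each missing file and returns a non-zero status, so the call can guard the start of a training script.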
Compiling
Clone the repository and build it with bjam:
git clone https://github.com/moses-smt/mosesdecoder
cd mosesdecoder/
./bjam
The bjam step takes a long time.
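The build is slow, so it can help to let bjam run several jobs in parallel with its -j option. The sketch below wraps the clone and build steps in two hypothetical shell functions, clone_moses and build_moses. It takes the job count from nproc (GNU coreutils) and falls back to one job when nproc is unavailable. An optional second argument is passed to bjam as --with-irstlm, which points the Moses build at an IRSTLM installation. The path you supply there is your own.

```bash
#!/usr/bin/env bash
# Sketch only: clone_moses and build_moses are hypothetical helper names.

MOSES_REPO=https://github.com/moses-smt/mosesdecoder

# Clone the Moses decoder into the given directory (default: mosesdecoder).
clone_moses() {
  git clone "$MOSES_REPO" "${1:-mosesdecoder}"
}

# Build an existing checkout with bjam, one job per CPU core.
#   $1: checkout directory (default: mosesdecoder)
#   $2: optional IRSTLM install prefix, passed as --with-irstlm=...
build_moses() {
  local src=${1:-mosesdecoder} irstlm=${2:-} njobs
  njobs=$(nproc 2>/dev/null || echo 1)
  # Run in a subshell so the caller's working directory is unchanged.
  (
    cd "$src" || exit 1
    if [ -n "$irstlm" ]; then
      ./bjam -j"$njobs" --with-irstlm="$irstlm"
    else
      ./bjam -j"$njobs"
    fi
  )
}
```

Typical use is clone_moses followed by build_moses mosesdecoder, optionally adding your IRSTLM prefix as the second argument. The subshell keeps the cd from leaking into your interactive session.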
Troubleshooting
If any of your logs mention a UnicodeEncodeError, you may need to run
export PYTHONIOENCODING=utf-8
before running train-model.perl, or fix merge_alignments.py yourself.
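You may prefer not to export the variable for the whole shell session. In that case, you can set it for a single command instead. The sketch below uses hypothetical helper names. It runs any command with PYTHONIOENCODING=utf-8 in that command's environment, and Python scripts the command launches inherit the setting. It also adds a grep-based check for the error in a saved log file.

```bash
#!/usr/bin/env bash
# Sketch only: with_utf8_io and log_has_unicode_error are hypothetical names.

# Run a command with PYTHONIOENCODING=utf-8 in its environment.
# Child processes it starts (e.g. Python helper scripts) inherit the value.
with_utf8_io() {
  PYTHONIOENCODING=utf-8 "$@"
}

# Succeed if the given log file mentions UnicodeEncodeError.
log_has_unicode_error() {
  grep -q 'UnicodeEncodeError' "$1"
}
```

For example, prefix your usual train-model.perl command with with_utf8_io. You can then run log_has_unicode_error on the saved log to see whether the problem recurs.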