Difference between revisions of "Moses"
Latest revision as of 08:57, 29 April 2015
Prerequisites
- GIZA++; see its page for how to compile it. Moses also supports mgiza as an alternative to GIZA++.
- IRSTLM; see its page for how to compile it and how to build a language model.
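Before compiling, it can help to confirm that the prerequisite tools are actually on your PATH. The following is a minimal sketch, not part of Moses: the function name missing_tools is hypothetical, and the tool names you pass to it depend on how GIZA++ (or mgiza) and IRSTLM were installed on your system.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Moses): print every command
# from the argument list that cannot be found on PATH.
# Returns 0 if all tools were found, 1 if at least one is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}

# Example call (tool names are assumptions about your installation):
#   missing_tools git GIZA++ mkcls build-lm.sh
```

Any name the function prints still needs to be installed or added to PATH before you continue.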
Compiling
Clone the repository and build it with bjam:
git clone https://github.com/moses-smt/mosesdecoder
cd mosesdecoder/
./bjam
The bjam step takes a long time.
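Because the build is slow, you may want to wrap the steps in a small script that skips the clone when a checkout already exists and lets bjam use several parallel jobs. This is only a sketch: the helper names run and build_moses and the variables DRY_RUN and JOBS are hypothetical, and it assumes that your bjam accepts the -j option for parallel jobs.

```shell
#!/bin/sh
# Sketch only: run, build_moses, DRY_RUN and JOBS are hypothetical
# names introduced for this example; they are not part of Moses.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Clone the repository (only if no checkout exists yet), then build.
build_moses() {
    if [ ! -d mosesdecoder ]; then
        run git clone https://github.com/moses-smt/mosesdecoder || return 1
    fi
    # The subshell keeps the caller's working directory unchanged.
    (
        run cd mosesdecoder || exit 1
        # Assumption: bjam supports -j for parallel jobs.
        run ./bjam -j"${JOBS:-1}"
    )
}
```

Calling it as DRY_RUN=1 JOBS=4 build_moses only prints the commands, which is a cheap way to check the plan before starting the real build with JOBS=4 build_moses.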
Troubleshooting
If a UnicodeEncodeError appears anywhere in your logs, you may need to run
export PYTHONIOENCODING=utf-8
before running train-model.perl (or fix merge_alignments.py yourself).
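If you would rather not change the encoding for your whole shell session, the variable can be set for a single command instead. A minimal sketch, assuming a POSIX shell with the standard env utility; the wrapper name with_utf8_io is hypothetical:

```shell
#!/bin/sh
# Hypothetical wrapper: run one command with PYTHONIOENCODING=utf-8
# in its environment, leaving the calling shell's environment as it was.
with_utf8_io() {
    env PYTHONIOENCODING=utf-8 "$@"
}

# Example call; replace the placeholder with your own arguments:
#   with_utf8_io train-model.perl <your usual arguments>
```

Child processes inherit the environment, so any Python script started by the wrapped command (such as merge_alignments.py) sees the setting as well.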