Difference between revisions of "Moses"

 
(2 intermediate revisions by the same user not shown)
Line 3: Line 3:
{{TOCD}}


==Requisites==
==Prerequisites==
* [[GIZA++]]: see its page for compilation instructions. Moses also supports [[mgiza]] as an alternative to GIZA++.

* [[IRSTLM]]: see its page for compilation instructions and for how to build a language model.
* GIZA++ and mkcls (<code>git clone https://github.com/moses-smt/giza-pp</code>)
* Moses (<code>git clone git@github.com:moses-smt/mosesdecoder.git</code>)
* IRST LM (<code>svn checkout svn://svn.code.sf.net/p/irstlm/code/trunk irstlm</code>)


==Compiling==
Clone the repository and build it with bjam:
{{see-also|Using GIZA++}}

;GIZA++

<pre>
<pre>
git clone https://github.com/moses-smt/mosesdecoder
tar -xzvf giza-pp-v1.0.2.tar.gz
cd mosesdecoder/
cd giza-pp
./bjam
make
cp mkcls-v2/mkcls /path/prefix/bin
cp GIZA++-v2/GIZA++ /path/prefix/bin
cp GIZA++-v2/plain2snt.out /path/prefix/bin
cp GIZA++-v2/snt2cooc.out /path/prefix/bin
cp GIZA++-v2/snt2plain.out /path/prefix/bin
cp GIZA++-v2/trainGIZA++.sh /path/prefix/bin
cd ..
</pre>

;Moses
<pre>
cd trunk
./regenerate-makefiles.sh
./configure --prefix=/path/prefix
make
make install
cd scripts/training/symal
make
cp symal giza2bal.pl /path/prefix/bin
cd ../../../
cd scripts/training/phrase-extract
make
cp extract score /path/prefix/bin
cd ../../../
</pre>

Now edit the file <code>scripts/training/train-factored-phrase-model.perl</code> and change the following lines:

<pre>
my $SCRIPTS_ROOTDIR = "/home/fran/source/moses/trunk/scripts/";

...

# the following line is set installation time by 'make release'. BEWARE!
my $BINDIR="/path/prefix/bin";
</pre>

<pre>
cp scripts/training/train-factored-phrase-model.perl /path/prefix/bin/
cp scripts/training/symal/giza2bal.pl /path/prefix/bin/

cd ..
</pre>

;IRSTLM
<pre>
cd irstlm
cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=/path/prefix
make -j4
make install
</pre>

==Building language model==

<pre>
export IRSTLM=/path/prefix
build-lm.sh -i cy.crp.txt -o cy.lm.gz -t /tmp
</pre>
</pre>
The bjam step takes a long time.


==Troubleshooting==
If a UnicodeEncodeError appears anywhere in your logs, you may need to run
<pre>
<pre>
do
export PYTHONIOENCODING=utf-8
</pre>
before running train-model.perl (or fix merge_alignments.py yourself).
</pre>


==See also==

Latest revision as of 08:57, 29 April 2015


Prerequisites

  • GIZA++: see its page for compilation instructions. Moses also supports mgiza as an alternative to GIZA++.
  • IRSTLM: see its page for compilation instructions and for how to build a language model.

Compiling

Clone the repository and build it with bjam:

git clone https://github.com/moses-smt/mosesdecoder
cd mosesdecoder/
./bjam 

The bjam step takes a long time.
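
Since the build is slow, running it in parallel may help. The following is only a sketch: it assumes that the bjam shipped with mosesdecoder accepts the usual -j option for parallel jobs and that getconf is available to count CPU cores. The variable names jobs and build_cmd are illustrative and do not come from Moses itself. The snippet only prints the command, so you can check it before running it inside mosesdecoder/.

```shell
# Count available CPU cores; fall back to 1 if getconf is unavailable.
jobs="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"

# Assemble the build command (assumption: bjam accepts -jN for parallel jobs).
build_cmd="./bjam -j${jobs}"

# Print the command rather than running it, so it can be checked first.
echo "Run inside mosesdecoder/: ${build_cmd}"
```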

Troubleshooting

If a UnicodeEncodeError appears anywhere in your logs, you may need to run

export PYTHONIOENCODING=utf-8

before running train-model.perl (or fix merge_alignments.py yourself).
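
The check described above could be scripted roughly as follows. This is a hedged sketch, not part of Moses: the helper name log_has_unicode_error and the default log path training.log are hypothetical, and you would point it at wherever your training run actually writes its log.

```shell
# Hypothetical helper: succeed if the given log file mentions UnicodeEncodeError.
log_has_unicode_error() {
    grep -q "UnicodeEncodeError" "$1"
}

# Hypothetical log path; replace with the log your training run writes.
log_file="${1:-training.log}"

if [ -f "$log_file" ] && log_has_unicode_error "$log_file"; then
    # Force Python's standard streams to UTF-8 for subsequent commands.
    export PYTHONIOENCODING=utf-8
    echo "PYTHONIOENCODING set to utf-8; re-run train-model.perl"
fi
```

Because export only affects the current shell and its children, source this snippet in the shell that will run train-model.perl rather than executing it as a separate script.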

See also

External links