Difference between revisions of "Moses"

From Apertium
(the new version uses "which" (not that that's best practice either but meh))
 
(6 intermediate revisions by the same user not shown)
Line 3: Line 3:
{{TOCD}}
{{TOCD}}


==Requisites==
==Prerequisites==
* [[GIZA++]], see the page for how to compile that. Moses also supports [[mgiza]] as an alternative to Giza.

* [[IRSTLM]], see the page for how to compile that, and how to make a language model.
* GIZA++ and mkcls http://giza-pp.googlecode.com/files/giza-pp-v1.0.2.tar.gz
* Moses (<code>git clone git@github.com:moses-smt/mosesdecoder.git</code>)
* IRST LM (<code>svn checkout svn://svn.code.sf.net/p/irstlm/code/trunk irstlm</code>)


==Compiling==
==Compiling==
Do
{{see-also|Using GIZA++}}

;GIZA++

<pre>
<pre>
git clone https://github.com/moses-smt/mosesdecoder
tar -xzvf giza-pp-v1.0.2.tar.gz
cd mosesdecoder/
cd giza-pp
./bjam
make
cp mkcls-v2/mkcls /path/prefix/bin
cp GIZA++-v2/GIZA++ /path/prefix/bin
cp GIZA++-v2/plain2snt.out /path/prefix/bin
cp GIZA++-v2/snt2cooc.out /path/prefix/bin
cp GIZA++-v2/snt2plain.out /path/prefix/bin
cp GIZA++-v2/trainGIZA++.sh /path/prefix/bin
cd ..
</pre>
</pre>
The bjam part takes a long while.


==Troubleshooting==
;Moses
If your logs anywhere say anything about UnicodeEncodeError, you might have to do
<pre>
cd trunk
./regenerate-makefiles.sh
./configure --prefix=/path/prefix
make
make install
cd scripts/training/symal
make
cp symal giza2bal.pl /path/prefix/bin
cd ../../../
cd scripts/training/phrase-extract
make
cp extract score /path/prefix/bin
cd ../../../
</pre>

Now edit the file <code>scripts/training/train-factored-phrase-model.perl</code> and change the following lines:

<pre>
my $SCRIPTS_ROOTDIR = "/home/fran/source/moses/trunk/scripts/";

...

# the following line is set installation time by 'make release'. BEWARE!
my $BINDIR="/path/prefix/bin";
</pre>

<pre>
cp scripts/training/train-factored-phrase-model.perl /path/prefix/bin/
cp scripts/training/symal/giza2bal.pl /path/prefix/bin/

cd ..
</pre>

;IRSTLM
<pre>
cd irstlm
./install
</pre>

<pre>
cp bin/* /path/prefix/bin/
cp bin/x86_64-pc-linux-gnu/* /path/prefix/bin/
mkdir -p /path/prefix/include
cp include/* /path/prefix/include
cp lib/x86_64-pc-linux-gnu/libirstlm.a /path/prefix/lib/
cd ..
</pre>

==Building language model==

<pre>
<pre>
export IRSTLM=/path/prefix
export PYTHONIOENCODING=utf-8
build-lm.sh -i cy.crp.txt -o cy.lm.gz -t /tmp
</pre>
</pre>
before running train-model.perl (or fix merge_alignments.py yourself)


==See also==
==See also==

Latest revision as of 08:57, 29 April 2015


Prerequisites

  • GIZA++: see its page for how to compile it. Moses also supports mgiza as an alternative to GIZA++.
  • IRSTLM: see its page for how to compile it and how to build a language model.

Compiling

Run:

git clone https://github.com/moses-smt/mosesdecoder
cd mosesdecoder/
./bjam 

The bjam step takes a long time.
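
Since the build is slow, it may help to run it with parallel jobs. The following is only a sketch: it assumes `nproc` (GNU coreutils) is available to count CPU cores and that your bjam accepts the usual `-j` option; the variable names `njobs` and `bjam_cmd` are made up for illustration. It prints the command rather than running it.

```shell
# Assumption: nproc exists; fall back to a single job if it does not.
njobs=$(nproc 2>/dev/null || echo 1)

# Build the command string; -j sets the number of parallel build jobs.
bjam_cmd="./bjam -j$njobs"
echo "Would run: $bjam_cmd"

# To actually build, run the command from inside mosesdecoder/:
# $bjam_cmd
```

Printing first makes it easy to check the job count before committing to a long build.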

Troubleshooting

If UnicodeEncodeError appears anywhere in your logs, you may need to run

export PYTHONIOENCODING=utf-8

before running train-model.perl (or fix merge_alignments.py yourself).
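
As a rough sketch of the check described above, a small shell function can scan a log directory for the error and set the variable only when it is found. The function name `fix_encoding_if_needed` and the idea of passing a log directory are assumptions for illustration; where your training logs actually end up depends on your setup.

```shell
# Hypothetical helper: look for UnicodeEncodeError in a log directory and,
# if it is present, export PYTHONIOENCODING as suggested above.
fix_encoding_if_needed() {
    log_dir="$1"   # assumption: directory containing your training logs
    if grep -rq "UnicodeEncodeError" "$log_dir" 2>/dev/null; then
        export PYTHONIOENCODING=utf-8
        echo "UnicodeEncodeError found; PYTHONIOENCODING=$PYTHONIOENCODING"
    else
        echo "No UnicodeEncodeError found in $log_dir"
    fi
}
```

Call it in the same shell session that will later run train-model.perl, for example `fix_encoding_if_needed ./logs`, so that the exported variable is inherited by the training script.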

See also

External links