Difference between revisions of "Moses"

(the new version uses "which" (not that that's best practice either but meh))
 
(6 intermediate revisions by the same user not shown)
Line 3: Line 3:
 
{{TOCD}}
 
{{TOCD}}
   
==Requisites==
+
==Prerequisites==
  +
* [[GIZA++]], see the page for how to compile that. Moses also supports [[mgiza]] as an alternative to Giza.
 
  +
* [[IRSTLM]], see the page for how to compile that, and how to make a language model.
* GIZA++ and mkcls http://giza-pp.googlecode.com/files/giza-pp-v1.0.2.tar.gz
 
* Moses (<code>git clone git@github.com:moses-smt/mosesdecoder.git</code>)
 
* IRST LM (<code>svn checkout svn://svn.code.sf.net/p/irstlm/code/trunk irstlm</code>)
 
   
 
==Compiling==
 
==Compiling==
  +
Do
{{see-also|Using GIZA++}}
 
 
;GIZA++
 
 
 
<pre>
 
<pre>
 
git clone https://github.com/moses-smt/mosesdecoder
tar -xzvf giza-pp-v1.0.2.tar.gz
 
  +
cd mosesdecoder/
cd giza-pp
 
  +
./bjam
make
 
cp mkcls-v2/mkcls /path/prefix/bin
 
cp GIZA++-v2/GIZA++ /path/prefix/bin
 
cp GIZA++-v2/plain2snt.out /path/prefix/bin
 
cp GIZA++-v2/snt2cooc.out /path/prefix/bin
 
cp GIZA++-v2/snt2plain.out /path/prefix/bin
 
cp GIZA++-v2/trainGIZA++.sh /path/prefix/bin
 
cd ..
 
 
</pre>
 
</pre>
  +
The bjam part takes a long while.
   
  +
==Troubleshooting==
;Moses
 
  +
If your logs anywhere say anything about UnicodeEncodeError, you might have to do
<pre>
 
cd trunk
 
./regenerate-makefiles.sh
 
./configure --prefix=/path/prefix
 
make
 
make install
 
cd scripts/training/symal
 
make
 
cp symal giza2bal.pl /path/prefix/bin
 
cd ../../../
 
cd scripts/training/phrase-extract
 
make
 
cp extract score /path/prefix/bin
 
cd ../../../
 
</pre>
 
 
Now edit the file <code>scripts/training/train-factored-phrase-model.perl</code> and change the following lines:
 
 
<pre>
 
my $SCRIPTS_ROOTDIR = "/home/fran/source/moses/trunk/scripts/";
 
 
...
 
 
# the following line is set installation time by 'make release'. BEWARE!
 
my $BINDIR="/path/prefix/bin";
 
</pre>
 
 
<pre>
 
cp scripts/training/train-factored-phrase-model.perl /path/prefix/bin/
 
cp scripts/training/symal/giza2bal.pl /path/prefix/bin/
 
 
cd ..
 
</pre>
 
 
;IRSTLM
 
<pre>
 
cd irstlm
 
./install
 
</pre>
 
 
<pre>
 
cp bin/* /path/prefix/bin/
 
cp bin/x86_64-pc-linux-gnu/* /path/prefix/bin/
 
mkdir -p /path/prefix/include
 
cp include/* /path/prefix/include
 
cp lib/x86_64-pc-linux-gnu/libirstlm.a /path/prefix/lib/
 
cd ..
 
</pre>
 
 
==Building language model==
 
 
 
<pre>
 
<pre>
export IRSTLM=/path/prefix
+
export PYTHONIOENCODING=utf-8
build-lm.sh -i cy.crp.txt -o cy.lm.gz -t /tmp
 
 
</pre>
 
</pre>
  +
before running train-model.perl (or fix merge_alignments.py yourself)
   
 
==See also==
 
==See also==

Latest revision as of 08:57, 29 April 2015


Prerequisites

  • GIZA++: see its page for how to compile it. Moses also supports mgiza as an alternative to GIZA++.
  • IRSTLM: see its page for how to compile it and how to build a language model.

Compiling

Clone the Moses repository and build it with bjam:

git clone https://github.com/moses-smt/mosesdecoder
cd mosesdecoder/
./bjam 

The bjam step takes a long time.
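
Because the build is slow, it may help to run it in parallel. The following is a minimal sketch, assuming you are inside the mosesdecoder/ checkout and that nproc (GNU coreutils) is available; -j is the standard Boost.Build option for the number of parallel jobs, and the JOBS variable is a name introduced here for illustration only.

```shell
# Number of parallel jobs: use nproc if it exists, otherwise fall back to 1.
# (JOBS is a hypothetical helper variable, not something Moses requires.)
JOBS="$(nproc 2>/dev/null || echo 1)"

# Only start the build when run from a mosesdecoder checkout.
if [ -x ./bjam ]; then
    ./bjam -j"$JOBS"
else
    echo "./bjam not found: run this from the mosesdecoder/ directory"
fi
```

On a machine with little memory, a smaller job count than the number of cores may be the safer choice.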

Troubleshooting

If your logs mention UnicodeEncodeError anywhere, you may need to run

export PYTHONIOENCODING=utf-8

before running train-model.perl (alternatively, fix merge_alignments.py yourself).
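
To confirm that the setting is picked up, you can ask Python which encoding it will use for standard output. This is a sketch under assumptions: it calls python3, which may not be the interpreter your Moses scripts actually invoke, and STDOUT_ENCODING is a variable name chosen here for illustration.

```shell
# Make Python use UTF-8 for stdin/stdout/stderr in this shell session.
export PYTHONIOENCODING=utf-8

# Sanity check: ask Python which encoding it will use for stdout.
# Assumes python3 is on PATH; substitute the interpreter your setup uses.
STDOUT_ENCODING="$(python3 -c 'import sys; print(sys.stdout.encoding)')"
echo "Python stdout encoding: $STDOUT_ENCODING"

# Now run train-model.perl from this same shell, with your usual arguments.
```

The export only affects the current shell and its child processes, so the training script has to be started from the same session (or the export added to the script that launches it).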

See also

External links