Difference between revisions of "Building a pseudo-parallel corpus"

Revision as of 19:55, 22 August 2012

Acquiring parallel corpora can be difficult, and for some language pairs such resources might not exist. However, we can use a language model for the target language to create pseudo-parallel corpora and use them in the same way as parallel ones.

IRSTLM

IRSTLM is a tool for building n-gram language models from corpora. It supports different smoothing methods, including Witten-Bell smoothing, Kneser-Ney smoothing and others.

The full documentation can be viewed at http://sourceforge.net/apps/mediawiki/irstlm/index.php?title=Main_Page
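
To give a feel for what smoothing does in an n-gram language model, here is a minimal Python sketch of interpolated Witten-Bell smoothing for bigrams. It is an illustration only and not IRSTLM's implementation: the function names `train_bigram_counts` and `witten_bell_prob`, the `<s>` and `</s>` boundary markers, the whitespace tokenization and the two-sentence toy corpus are all assumptions made for this example. The estimate used is P(w | h) = (c(h, w) + T(h) * P(w)) / (c(h) + T(h)), where T(h) is the number of distinct word types seen after the history h and P(w) is the unsmoothed unigram probability.

```python
from collections import Counter, defaultdict


def train_bigram_counts(sentences):
    """Collect the counts needed for a bigram model (hypothetical helper)."""
    unigrams = Counter()          # c(w)
    bigrams = Counter()           # c(h, w)
    histories = Counter()         # c(h): how often h is followed by any word
    followers = defaultdict(set)  # distinct word types seen after h
    for sentence in sentences:
        # Assumption: whitespace tokenization with sentence boundary markers.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[(prev, word)] += 1
            histories[prev] += 1
            followers[prev].add(word)
    return unigrams, bigrams, histories, followers


def witten_bell_prob(word, prev, counts):
    """Interpolated Witten-Bell estimate of P(word | prev)."""
    unigrams, bigrams, histories, followers = counts
    # Lower-order model: plain relative frequency of the word.
    p_unigram = unigrams[word] / sum(unigrams.values())
    if histories[prev] == 0:
        # History never observed: fall back to the unigram estimate.
        return p_unigram
    distinct = len(followers.get(prev, ()))  # T(h)
    return (bigrams[(prev, word)] + distinct * p_unigram) / (histories[prev] + distinct)


counts = train_bigram_counts(["the cat sat", "the dog sat"])
print(round(witten_bell_prob("cat", "the", counts), 3))  # bigram seen in the corpus
print(round(witten_bell_prob("sat", "the", counts), 3))  # unseen bigram, still non-zero
```

The point of the sketch is that a bigram never observed in the corpus ("the sat") still receives some probability mass, taken from the observed bigrams in proportion to how many different words followed the history. A real toolkit such as IRSTLM applies the same idea to higher-order n-grams and much larger corpora.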


@@ Line 1: / Line 1: @@
-Acquiring parallel corpora can be a difficult process and for some language pairs such resources might not exist.
+Acquiring parallel corpora can be a difficult process and for some language pairs such resources might not exist. However, we can use a language model for the target language in order to create pseudo-parallel corpora, and use them in the same way as parallel ones.
 == IRSTLM ==
+IRSTLM is a tool for building n-gram language models from corpora. It supports different smoothing methods, including Witten-Bell smoothing, Kneser-Ney smoothing and others.
+The full documentation can be viewed [http://sourceforge.net/apps/mediawiki/irstlm/index.php?title=Main_Page here]