Building a pseudo-parallel corpus
Revision as of 19:55, 22 August 2012 by Fpetkovski
Acquiring parallel corpora can be difficult, and for some language pairs such resources may not exist at all. However, a language model for the target language can be used to create pseudo-parallel corpora, which can then be used in the same way as genuine parallel ones.
IRSTLM
IRSTLM is a tool for building n-gram language models from corpora. It supports several smoothing methods, including Witten-Bell and Kneser-Ney smoothing, among others.
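To illustrate what a smoothed n-gram model computes, here is a minimal sketch of an interpolated Witten-Bell bigram model in plain Python. It is not IRSTLM's implementation or API; the class name `WittenBellBigramLM`, the `<s>`/`</s>` boundary markers and the whitespace tokenisation are assumptions made for the example. The bigram estimate is interpolated with the unigram distribution, weighted by the number of distinct words seen after each history: P(w | h) = (c(h, w) + T(h) * P(w)) / (c(h) + T(h)).

```python
from collections import Counter, defaultdict


class WittenBellBigramLM:
    """Toy interpolated Witten-Bell bigram model (illustration only, not IRSTLM)."""

    def __init__(self, sentences):
        self.bigrams = Counter()          # c(h, w)
        self.history_counts = Counter()   # c(h): bigram tokens starting with h
        self.followers = defaultdict(set) # distinct words seen after h
        self.unigrams = Counter()         # counts of predicted tokens
        for sentence in sentences:
            # Assumed tokenisation: whitespace split plus sentence boundary markers
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            # <s> is never predicted, so it is excluded from the unigram counts
            self.unigrams.update(tokens[1:])
            for prev, word in zip(tokens, tokens[1:]):
                self.bigrams[(prev, word)] += 1
                self.history_counts[prev] += 1
                self.followers[prev].add(word)
        self.total = sum(self.unigrams.values())

    def unigram_prob(self, word):
        # Maximum-likelihood unigram estimate used as the lower-order model
        return self.unigrams[word] / self.total

    def prob(self, word, prev):
        c_h = self.history_counts[prev]
        t_h = len(self.followers.get(prev, ()))  # T(h): number of distinct followers
        if c_h == 0:
            # Unseen history: fall back entirely to the unigram model
            return self.unigram_prob(word)
        # Interpolate bigram counts with T(h) units of unigram probability mass
        return (self.bigrams[(prev, word)] + t_h * self.unigram_prob(word)) / (c_h + t_h)
```

For example, after `lm = WittenBellBigramLM(["the cat sat", "the dog sat", "a cat ran"])`, the call `lm.prob("cat", "the")` gives 1/3, while `lm.prob("ran", "the")` is still non-zero (1/24) even though "the ran" never occurs in the training data. That reserved probability mass for unseen continuations is the point of smoothing.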
The full documentation can be viewed at http://sourceforge.net/apps/mediawiki/irstlm/index.php?title=Main_Page