Building a pseudo-parallel corpus
Acquiring parallel corpora can be difficult, and for some language pairs no such resources exist. However, we can use a language model for the target language to create pseudo-parallel corpora, and then use them in the same way as genuine parallel ones.
IRSTLM is a tool for building n-gram language models from corpora. It supports different smoothing and interpolation methods, including Witten-Bell smoothing, Kneser-Ney smoothing and others.
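To give a feel for what a smoothed n-gram model computes, here is a minimal sketch of an interpolated Witten-Bell bigram model in plain Python. It is an illustration only and does not reflect how IRSTLM is implemented or invoked: the function name train_witten_bell_bigram, the whitespace tokenisation, the sentence markers and the toy sentences are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def train_witten_bell_bigram(sentences):
    """Return (prob, vocab) for a toy interpolated Witten-Bell bigram model.

    Hypothetical helper for illustration; not part of IRSTLM.
    """
    unigrams = Counter()
    bigrams = defaultdict(Counter)  # bigrams[prev][word] = count of "prev word"
    for sentence in sentences:
        # Assumption: whitespace tokenisation with sentence boundary markers.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[prev][word] += 1
    total = sum(unigrams.values())

    def prob(word, prev):
        # Lower-order estimate: maximum-likelihood unigram probability.
        p_uni = unigrams[word] / total
        followers = bigrams.get(prev)
        if not followers:
            # Unseen history: fall back to the unigram estimate.
            return p_uni
        c_h = sum(followers.values())  # how often the history was seen
        t_h = len(followers)           # distinct word types seen after it
        # Witten-Bell: the more distinct continuations a history has,
        # the more probability mass is handed to the lower-order model.
        return (followers[word] + t_h * p_uni) / (c_h + t_h)

    return prob, set(unigrams)


prob, vocab = train_witten_bell_bigram(["the cat sat", "the dog sat"])
print(prob("cat", "the"), prob("sat", "the"))
```

In this sketch a bigram that never occurred in the training text ("the sat") still receives a non-zero probability, while the probabilities of all words after a given history continue to sum to one. That is the property smoothing is meant to provide.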