Assimilation and Dissemination

From Apertium

MT systems are mainly useful for two tasks:[1]

  • creating almost-translated text that needs post-editing before being publishable, and
  • creating an understandable translation of a foreign-language text, so that the reader can get the gist of it.

In the gisting task, the user does not know the foreign language, while in the post-editing task the user is a translator who does know the foreign/source language (and will often read both the source text and the translation).

MT of post-editing quality into a lesser-resourced language can help with creating more text in that language. We call this dissemination.

MT of gisting quality from a lesser-resourced language can help with letting people who don't speak that language understand text (e.g. blogs, news articles) written in that language (thus removing an argument against writing in the lesser-resourced language). We call this assimilation.

Note that speakers of many non-central/lesser-resourced/minority languages are multilingual and also understand the "central" language; e.g. North Sámi speakers typically understand Norwegian (but not vice versa). Even for mutually intelligible, closely related language pairs, those who write the non-central language more often know how to write the central language than vice versa, because the central language has more visibility, is included in education, etc.

Including a non-central language in an MT system can thus help create more text in that language, raise its profile, increase its visibility and normalcy, and even contribute to standardisation.[2]

How to make an MT system useful

One way of making MT systems more useful is to restrict their applications/use-cases.[3] The "ideal" MT system would do Fully Automatic High Quality MT of Unrestricted text between any language pair. No such system exists :-) The systems that are considered successful are the ones that have relaxed some of these ideal requirements. For example:

  • Meteo was already successful in the early 1980s because it translated only a restricted domain (weather reports).
    • Molto/GF has a similar focus.
  • Some systems are successful because they focus on gisting, where the demand for quality is a lot lower (a reader won't mind much if you always leave out the word "the" when translating from Russian, whereas a post-editor would find it a chore).
    • Apertium pairs such as eus-spa, sme-nob and tat-rus do this (where all these language pairs are in a non-central/central situation).
  • If we focus on translating between closely related languages, getting post-editable-quality results can be easier because vocabulary, syntax, etc. overlap more.
    • This has historically been Apertium's niche.


  1. There are of course other practical uses, e.g. MT-supported language learning (see Geriouaeg).
  2. Forcada, Mikel L. (2006) "Open-source machine translation: an opportunity for minor languages" in B. Williams (ed.): Proceedings of the Workshop "Strategies for developing machine translation for minority languages (5th SALTMIL workshop on Minority Languages)" (organised in conjunction with LREC 2006 (22-28.05.2006)). Genoa, Italy, pp. 1-6.
  3. Church, Kenneth W. and Hovy, Eduard H. (1993) "Good Applications for Crummy Machine Translation", Machine Translation 8(4), pp. 239-258.