Difference between revisions of "Курсы машинного перевода для языков России/Session 8"

From Apertium

Revision as of 15:27, 8 January 2012

Case studies

The following case studies highlight different development styles.

Spanish and Catalan

Norwegian Nynorsk and Norwegian Bokmål

Breton and French

...

Existing resources

Objectives

When starting a project, it is important to ask questions and clearly define objectives. For example:

  • Who is the target audience?
    • Do we want the system to be used by professional translators, by lay translators, or by ordinary members of the public?
  • What is the system intended to be used for?
    • Assimilation: giving an idea of what a text is about
    • Dissemination: producing draft translations
    • Domain: will it be used for news texts, encyclopaedic texts, legal texts, the weather, etc.?
  • What existing linguistic resources can be reused?
    • Are good, free dictionaries already available?
  • How long do we have to build the system?
    • Six months is probably not enough time to build your ideal wide-domain interlingua MT system between all the languages of the Middle Volga, ...
      • ... but it might be enough to build some prototype systems for translating the weather forecast.

A system planned to translate governmental texts for dissemination will necessarily have different features from one planned to translate Wikipedia articles for assimilation.

Making a high-coverage, open-domain system for both assimilation and dissemination is a nice idea, but it is not practical given limited resources.

Time

The amount of time taken to make a new language pair based on the Apertium platform depends greatly on the objectives of the project, the existing resources available, and the experience of the developers. A prototype or proof-of-concept system can be created in anywhere from 10 days to 3 months, whereas a production system can take from 3 months (as in the case of Nynorsk-Bokmål) to several years.

Year | Total pairs | New pairs | Language pairs
2005 | 3 | 3 | es-ca, es-gl, es-pt
2006 | 6 | 3 | es-ca, es-gl, es-pt, en-ca, fr-ca, oc-ca
2007 | 8 | 2 | es-ca, es-gl, es-pt, en-ca, fr-ca, oc-ca, es-ro, fr-es
2008 | 18 | 10 | es-ca, es-gl, es-pt, en-ca, fr-ca, oc-ca, es-ro, fr-es, oc-es, en-gl, cy-en, eo-ca, eo-es, eu-es, pt-gl, eo-en, en-es, pt-ca
2009 | 21 | 3 | es-ca, es-gl, es-pt, en-ca, fr-ca, oc-ca, es-ro, fr-es, oc-es, en-gl, cy-en, eo-ca, eo-es, eu-es, pt-gl, eo-en, en-es, pt-ca, nn-nb, sv-da, br-fr
2010 | 23 | 2 | es-ca, es-gl, es-pt, en-ca, fr-ca, oc-ca, es-ro, fr-es, oc-es, en-gl, cy-en, eo-ca, eo-es, eu-es, pt-gl, eo-en, en-es, pt-ca, nn-nb, sv-da, br-fr, es-ast, mk-bg
2011 | 33 | 10 | es-ca, es-gl, es-pt, en-ca, fr-ca, oc-ca, es-ro, fr-es, oc-es, en-gl, cy-en, eo-ca, eo-es, eu-es, pt-gl, eo-en, en-es, pt-ca, nn-nb, sv-da, br-fr, es-ast, mk-bg, is-en, ca-it, eo-fr, mk-en, es-an, eu-en, es-it, sh-mk, tr-az, tr-ky
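
The "Total pairs" and "New pairs" columns above can be cross-checked with a short script. The following is a minimal sketch, not part of the Apertium tooling: the names `NEW_PAIRS` and `growth_table` are hypothetical, and the per-year lists were obtained by differencing consecutive rows of the table, on the assumption that each row is cumulative.

```python
# Pairs that first appear in each year's row of the table
# (assumption: each row lists all pairs released up to that year).
NEW_PAIRS = {
    2005: ["es-ca", "es-gl", "es-pt"],
    2006: ["en-ca", "fr-ca", "oc-ca"],
    2007: ["es-ro", "fr-es"],
    2008: ["oc-es", "en-gl", "cy-en", "eo-ca", "eo-es",
           "eu-es", "pt-gl", "eo-en", "en-es", "pt-ca"],
    2009: ["nn-nb", "sv-da", "br-fr"],
    2010: ["es-ast", "mk-bg"],
    2011: ["is-en", "ca-it", "eo-fr", "mk-en", "es-an",
           "eu-en", "es-it", "sh-mk", "tr-az", "tr-ky"],
}


def growth_table(new_pairs):
    """Return (year, total pairs so far, pairs added that year) rows."""
    rows = []
    total = 0
    for year in sorted(new_pairs):
        added = len(new_pairs[year])
        total += added  # running total gives the "Total pairs" column
        rows.append((year, total, added))
    return rows


ROWS = growth_table(NEW_PAIRS)

for year, total, added in ROWS:
    print(f"{year}  total={total:2d}  new={added:2d}")
```

Keeping only the newly added pairs per year avoids repeating the cumulative lists and makes it easy to spot a pair that has been counted twice.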

Funding

Apertium language pairs have been funded and created in many different ways.

The following table summarises how development of the "stable" machine translation systems in Apertium was funded. The most used systems online are highlighted in bold face.

Funder | Type | Language pair(s)
Google Summer of Code | Competition | mk-bg, nn-nb, sh-mk, sv-da, tr-az, tr-ky
- | Volunteers | eo-fr, es-an, mk-en, ca-it, eo-en
Generalitat de Catalunya | Regional government | oc-ca, oc-es, en-ca
- | Thesis / Dissertation | cy-en, fr-ca, pt-ca
Universitat d'Alacant | Educational institution | eu-es, (br-fr), es-pt
Ministry of Industry, Commerce and Tourism | National government | es-ca, es-gl
ABC Enciklopedioj | Company | eo-es, eo-ca
imaxin|software | Company | en-gl, pt-gl
Universidá d'Uviéu | Educational institution | es-ast
Prompsit | Company | es-it, (br-fr), (fr-es)
Eleka Ingenieritza Linguistikoa | Company | fr-es
Icelandic Research Council | National government | is-en
Ofis ar Brezhoneg | Quasi non-governmental organisation | br-fr
European Association for Machine Translation | Non-governmental organisation | eu-en