Difference between revisions of "Promotion HQ"

Revision as of 11:06, 3 February 2009

A scratchpad of ideas for expanding and promoting Apertium.

Ideas for papers

  • The use of lttoolbox to develop analysers for under-resourced languages (e.g. Welsh/Afrikaans ...)
  • Retrieving bilingual dictionary entries using Wikipedia interwiki links.
  • On dealing pragmatically with multiword expressions (MWEs)
  • On Spanish-French, Catalan-French
  • On apertium-2/3 transfer

Ideal pairs for development

These pairs are ideal for development because the languages are closely related or share a historical connection. Some are closer than others, but all are fairly close.

European Union official languages

  • Danish <-> Swedish <-> Norwegian Bokmål <-> Norwegian Nynorsk <-> Icelandic <-> Faroese (North-Germanic dialect continuum)
For Nynorsk and Bokmål there is a proprietary implementation, Nynodata; some discussion here
Fran made a dictionary for Faroese: here (neither Icelandic nor Faroese is an official EU language)
  • Czech <-> Slovak
  • Slovenian <-> Serbo-Croatian <-> Macedonian <-> Bulgarian (South-Slavic dialect continuum)
  • Afrikaans <-> Dutch
  • Irish <-> Scots Gaelic — Kevin Scannell already has a system, but it could be Apertiumised.
see Scottish Gaelic and Irish
  • Finnish <-> Estonian (Balto-Finnic, with agglutinative morphology)
  • Romanian <-> Aromanian
  • Romanian <-> Italian
  • Italian <-> Neapolitan <-> Piedmontese <-> Friulian
  • English <-> Scots/Ulster Scots (like Occitan, Scots might benefit from the standardisation effort, as described in Mikel's LREC paper); the SLC may have funds.

Non-EU

  • Hindi <-> Urdu
  • Persian <-> Tajik
  • Northern Sotho <-> Sotho
  • Turkish <-> Azerbaijani <-> Turkmen <-> Tatar (Southwestern-Turkic, Oghuz dialect continuum)
  • Uyghur <-> Uzbek
  • Russian <-> Ukrainian <-> Belarusian (East-Slavic dialect continuum)
  • Dungan <-> Mandarin (not that many people speak Dungan)
  • Indonesian <-> Malaysian
  • Xhosa <-> Zulu
  • North Sámi <-> Lule Sámi
see North Sámi and Lule Sámi

Large pairs for which we should have something

These pairs are not particularly close, but they involve important languages.

  • Italian <-> French
  • Dutch <-> German
  • Italian <-> Spanish

See also