Promotion HQ
Revision as of 22:47, 3 September 2010
A scratchpad of ideas for expanding and promoting Apertium.
Ideas for papers
- The use of lttoolbox to develop analysers for under-resourced languages (e.g. Welsh/Afrikaans ...)
- Retrieving bilingual dictionary entries using Wikipedia interwiki links
- On pragmatic dealing with MWEs
- On Spanish-French, Catalan-French
- On apertium-2/3 transfer
Ideal pairs for development
These pairs are ideal for development because the languages in question are closely related or historically connected. Some are closer than others, but all are pretty close.
European Union official languages
- Danish <-> Swedish <-> Norwegian Bokmål <-> Norwegian Nynorsk <-> Icelandic <-> Faroese (North-Germanic dialect continuum)
- Czech <-> Slovak
- Slovenian <-> Serbo-Croatian <-> Macedonian <-> Bulgarian (South-Slavic dialect continuum)
- Afrikaans <-> Dutch
- Irish <-> Scots Gaelic — Kevin Scannell already has a system, but it could be Apertiumised.
- Finnish <-> Estonian (Balto-Finnic, with agglutinative morphology)
- Romanian <-> Aromanian
- Romanian <-> Italian
- Italian <-> Neapolitan <-> Piedmontese <-> Friulian
- English <-> Scots/Ulster Scots (like Occitan, Scots might benefit from the standardisation effort, as described in Mikel's LREC paper); the SLC may have funds.
Non-EU
- Hindi <-> Urdu
- Persian <-> Tajik
- Northern Sotho <-> Sotho
- Turkish <-> Azerbaijani <-> Turkmen <-> Tatar (Southwestern-Turkic, Oghuz dialect continuum)
- Uyghur <-> Uzbek
- Russian <-> Ukrainian <-> Belarusian (East-Slavic dialect continuum)
- Dungan <-> Mandarin (not that many people speak Dungan)
- Indonesian <-> Malaysian
- Xhosa <-> Zulu
- North Sámi <-> Lule Sámi
Large pairs for which we should have something
The languages in these pairs are not especially close, but they are important languages.
- Italian <-> French
- Dutch <-> German
- Italian <-> Spanish