Difference between revisions of "Promotion HQ"

Revision as of 09:26, 22 January 2008

Ideas for papers

The use of lttoolbox to develop analysers for under-resourced languages (e.g. Welsh/Afrikaans ...)
Open-source Afrikaans-English machine translation
Longest-match left-to-right compound splitting in the context of Afrikaans-English machine translation.
Retrieving bilingual dictionary entries using Wikipedia interwiki links.
On pragmatic dealing with MWEs
On Spanish-French, Catalan-French
On apertium-2/3 transfer
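
As a rough illustration of the compound-splitting idea in the list above, here is a minimal sketch of greedy longest-match left-to-right splitting. It is not taken from Apertium or lttoolbox: the function name `split_compound`, the `min_len` parameter, and the toy lexicon are all assumptions made for this example, and linking elements between compound parts are not handled.

```python
def split_compound(word, lexicon, min_len=2):
    """Greedy longest-match left-to-right split of `word`.

    `lexicon` is any set-like container of known word forms (hypothetical
    toy data here). Returns a list of parts, or None when some remainder
    cannot be matched. The search never backtracks.
    """
    parts = []
    pos = 0
    while pos < len(word):
        # Try the longest candidate first, shrinking it until it is in the lexicon.
        for end in range(len(word), pos + min_len - 1, -1):
            if word[pos:end] in lexicon:
                parts.append(word[pos:end])
                pos = end
                break
        else:
            # No lexicon entry starts at this position: give up.
            return None
    return parts


# Toy lexicon, for illustration only.
toy_lexicon = {"rekenaar", "reken", "program"}
result = split_compound("rekenaarprogram", toy_lexicon)
```

Because the match is greedy, a long first part can block a valid split: with the lexicon `{"ab", "abc", "cd"}` the word `abcd` is not split, since `abc` is taken first and `d` has no match. A real system would need backtracking or a fallback for such cases.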

Ideal pairs for development

These pairs are ideal for development because the languages involved are closely related or historically connected. Some are closer than others, but all are fairly close.

European Union official languages

Danish <-> Swedish <-> Norwegian Bokmål <-> Norwegian Nynorsk <-> Icelandic <-> Faroese (North-Germanic dialect continuum)

For Nynorsk <-> Bokmål there is a proprietary implementation, Nynodata; some discussion here

Fran made a dictionary for Faroese: here (neither Icelandic nor Faroese is an official EU language)

Czech <-> Slovak
Slovenian <-> Serbo-Croatian <-> Macedonian <-> Bulgarian (South-Slavic dialect continuum)
Afrikaans <-> Dutch
Irish <-> Scots Gaelic — Kevin Scannell already has a system, but it could be Apertiumised.

The data used in Kevin Scannell's system is available, ask Francis Tyers

Finnish <-> Estonian (Balto-Finnic, with agglutinative morphology)
Romanian <-> Aromanian
Romanian <-> Italian
Italian <-> Neapolitan <-> Piedmontese <-> Friulian
English <-> Scots/Ulster Scots (Scots might benefit in some way like Occitan from the standardisation effort as described in Mikel's LREC paper) — the SLC may have funds.

Non-EU

Hindi <-> Urdu
Persian <-> Tajik
Northern Sotho <-> Sotho
Turkish <-> Azerbaijani <-> Turkmen <-> Tatar (the first three are Southwestern Turkic, i.e. the Oghuz dialect continuum; Tatar is Kipchak)
Uyghur <-> Uzbek
Russian <-> Ukrainian <-> Belarusian (East-Slavic dialect continuum)
Dungan <-> Mandarin (not that many people speak Dungan)
Indonesian <-> Malaysian
Xhosa <-> Zulu

Large pairs for which we should have something

The languages in these pairs are not especially close, but they are important languages.

Italian <-> French
Dutch <-> German
Italian <-> Spanish
English <-> Spanish

@@ Line 24: / Line 24: @@
 * Afrikaans <-> Dutch
 * Irish <-> Scots Gaelic &mdash; Kevin Scannell already has a system, but it could be Apertiumised.
+::The data used in Kevin Scannell's system is available, ask [[User:Francis Tyers|Francis Tyers]]
 * Finnish <-> Estonian (Balto-Finnic, with [[agglutinative morphology]])
 * Romanian <-> Aromanian
 * Romanian <-> Italian
 * Italian <-> Neapolitan <-> Piedmontese <-> Friulian
-* English <-> Scots (Scots might benefit in some way like Occitan from the standardisation effort as described in Mikel's LREC paper) &mdash; the [http://www.scotslanguage.com/Scots_Language_Centre/ SLC] may have funds.
+* English <-> Scots/Ulster Scots (Scots might benefit in some way like Occitan from the standardisation effort as described in Mikel's LREC paper) &mdash; the [http://www.scotslanguage.com/Scots_Language_Centre/ SLC] may have funds.
 ===Non-EU===
