Bringing the problem of definition into perspective.

Email me | IRC nick: spectie, spectei or spectre

There are at least two other more serious problems for endangered languages, more acute than just lack of mother-tongue transmission. There are languages whose last fluent speakers are already gone or are about to go. At a meeting at Glorieta near Santa Fe, New Mexico, a few months ago, we had actually the last living speaker of one of the languages come. It was a very sad experience for everyone, not just for that woman. And perhaps the saddest thing is that she cannot even talk to her sister anymore, who was the next-to-last speaker before she recently died. She can not call up anybody. The only person for her to talk to is a linguist and that is no fun.[1]

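  • "Если человек не понимает слово, это не проблема перевода - это проблема человека." ("If a person does not understand a word, that is not a translation problem, it is the person's problem.") - Варвара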
  • idiomas de oficialidad más débil o peso demográfico más reducido → languages of officiality feebleer or demographical weight more reduced

Apertium — Machine translation for languages of officiality feebleer or reduced demographic weight.

  • We have met the enemy and it is us.
  • "I am a fundamentalist, I use MT 100% of the time" -- Maria Machado, EU DGT.
  • The idea of an Apertium MT system is quite at odds with many other NLP applications. For morphological analysers, part-of-speech taggers, etc., the idea is to model as much of the language as possible: the wider the coverage, the better. An Apertium MT system, on the other hand, is a closed system. The idea is to analyse and generate only as much as can be translated. This can often seem counterintuitive to people who are used to working on other NLP software. They can find it frustrating that they can't just take their state-of-the-art analyser or tagger and get an equivalently good MT system. The thing to remember is that if something can't be translated, then being able to analyse it does more harm than good. This usually takes some time to grasp fully. Many people give up before they get it.
  • Why we try not to translate between parts of speech: We do not try to translate between parts of speech because it makes transfer more complicated. Rules match on source language patterns, and output target language patterns. For most pairs, these patterns are modelled on part of speech, or part of speech and subtags. The rules usually have a single 'out' section which outputs the target pattern. If we want to translate between parts of speech, we probably need more 'out' sections, making the rules more complicated and harder to maintain.
  • Choosing a successful pair:
    • Not in Google or can get better quality than in Google
    • High quality translation
    • Existing closed-source system available
  • Mus uni non fidit antro.
  • ein burde ikkje sove
  • når natta fell på
  • ein burde sjå på stjernene
  • ein burde vere to.
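
The "closed system" point above can be made concrete with a small sketch. This is only an illustration under stated assumptions: the toy lexicon, the toy bilingual dictionary, and the helper names `lemma_pos`, `trim` and `analyse` are all hypothetical, and real Apertium dictionaries are XML files compiled into transducers, not Python dicts. The sketch drops every analysis whose lemma and part of speech have no entry in the bilingual dictionary, so the analyser only ever produces what can be translated.

```python
# Hypothetical toy data, for illustration only.
ANALYSER = {
    "hus": ["hus<n><nt><sg><ind>"],
    "huset": ["hus<n><nt><sg><def>"],
    "stjerne": ["stjerne<n><f><sg><ind>"],
    "sove": ["sove<vblex><inf>"],
}

# Toy bilingual dictionary keyed on lemma plus first tag (an assumption).
BIDIX = {
    "hus<n>": "house<n>",
    "sove<vblex>": "sleep<vblex>",
}


def lemma_pos(analysis):
    """Return the lemma plus its first tag, e.g. 'hus<n>'."""
    head, _, _ = analysis.partition(">")
    return head + ">"


def trim(analyser, bidix):
    """Keep only the analyses that the bilingual dictionary can translate."""
    trimmed = {}
    for surface, analyses in analyser.items():
        kept = [a for a in analyses if lemma_pos(a) in bidix]
        if kept:
            trimmed[surface] = kept
    return trimmed


def analyse(surface, analyser):
    """Look up a surface form; unknown words are marked with a leading '*'."""
    return analyser.get(surface, ["*" + surface])


trimmed = trim(ANALYSER, BIDIX)
print(analyse("huset", trimmed))
print(analyse("stjerne", trimmed))
```

After trimming, "stjerne" is treated as an unknown word even though the full analyser knows it. Later stages then never receive an analysis they cannot translate.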

  • Some kind of generic web verb conjugator which uses lttoolbox, to increase the value of having Apertium-style data.
See 'apertium-verbconj'
  • Some kind of generic analyser with human-readable output; see the work on Faroese.
See here etc.
  • Investigate whether SFST or hunmorph can be used as an analyser for languages with more complicated morphology, and how it could be included in an Apertium pipeline.
See HFST, Foma, SFST and hunmorph
  • A HOWTO on approaching transfer rules, similar to the "getting started with a language pair" HOWTO
  • A HOWTO on designing a language pair from scratch, including milestones, for example doing syntactic analysis of sentences, working out where the translation problems will be, which ones can be tackled easily, and which ones to leave until later.
  • Why is machine translation good?
  • An MT system in one thousand steps
EU project

"Bridging the gap: Machine translation into morphologically complex languages"

{en,de,fr} -> {fi,et,hu,tr,eu}

<bogdan> spectie: we want more spectish poetry!
<spectie> haha :D
<bogdan> spectie: I had not heard any bad non-rhyming non-sensical poetry in a long time and I miss it!
<spectie> s/bad/good
<spectie> bogdan2005, ok
<spectie> here's a variation on a popular theme:
<spectie> its called "Machine translation"
<spectie> ...
<spectie> machine translation
<spectie>   sometimes it works
<spectie> sometimes it doesn't
<spectie> ...
<zocky> machine translation / sometimes it works / manchmal he don't
Murat: kovayla bira içerim, ama sen bilmezsin. yarın gelir misin?
Murat: vedrəyle pivə içirəm, ama sen bilməzsən. yarın gələrmisən?
Murat: I drink beer with the bucket, but you don't know it. Do you come tomorrow?
Murat: a poem by msalperen
<isaac> apertium es un software occitante too
<spectei> intransitive verbs usually don't have a present participle
<Unhammer> ah
<spectei> note also: the prefix for the pp depends on the stem, not on the paradigm
  • lexical economy → wordwise thrift
  • linguistic economy → speakwise thrift
  • morphological annotation → wordbound adornment
  • language exchange → speechshare
  • homonymy → samenameness
  • polysemy → manymeaningness
  • birthplace → birthstead
  • prediction → forsaying
  • predict → forsay

Substantiv är namn på ting, 
till exempel boll och ring 
Verb är sådant man kan göra, 
som att hoppa, se och höra 
Adjektiven sen oss lär, 
hurudana tingen är

