User:Francis Tyers

From Apertium

Email me | IRC nick: spectie, spectei or spectre

There are at least two other more serious problems for endangered languages, more acute than just lack of mother-tongue transmission. There are languages whose last fluent speakers are already gone or are about to go. At a meeting at Glorieta near Santa Fe, New Mexico, a few months ago, we had actually the last living speaker of one of the languages come. It was a very sad experience for everyone, not just for that woman. And perhaps the saddest thing is that she cannot even talk to her sister anymore, who was the next-to-last speaker before she recently died. She can not call up anybody. The only person for her to talk to is a linguist and that is no fun.[1]

Translations

  • DIM EISIAU → ZERO WANT
  • idiomas de oficialidad más débil o peso demográfico más reducido → languages of officiality feebleer or demographical weight more reduced

Apertium — Machine translation for languages of officiality feebleer or reduced demographic weight.

  • We have met the enemy and it is us.
  • The idea of an Apertium MT system is quite at odds with many other NLP applications. For morphological analysers, part-of-speech taggers, etc., the idea is to model as much of the language as possible: the wider the coverage, the better. An Apertium MT system, on the other hand, is a closed system. The idea is to analyse and generate only as much as can be translated. This can often seem counter-intuitive to people who are used to working on other NLP software. They can find it frustrating that they can't just take their state-of-the-art analyser or tagger and get an equivalently good MT system. The thing to remember is that if it can't be translated, then being able to analyse it does more harm than good. It usually takes some time to grasp fully, and many people give up before they get it.
  • Why we try not to translate between parts of speech: We do not try to translate between parts of speech because it makes transfer more complicated. Rules match on source language patterns, and output target language patterns. For most pairs, these patterns are modelled on part of speech, or part of speech and subtags. The rules usually have a single 'out' section which outputs the target pattern. If we want to translate between parts of speech, we probably need more 'out' sections, making the rules more complicated and harder to maintain.
  • Choosing a successful pair:
    • Not covered by Google, or able to get better quality than Google
    • High quality translation
    • Existing closed-source system available
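To make the closed-system point and the single-'out' point concrete, here is a toy Spanish→English pipeline. Everything in it is hypothetical: the miniature lexicons, the `LU` record and the function names are made up for illustration and are not Apertium's real data structures. It borrows the usual Apertium debug marks as an assumption: `*` for a word the analyser doesn't know, `@` for a word that was analysed but has no bilingual entry, and `#` for a form the generator can't produce. The one transfer rule reorders noun + adjective with a single output template, which only works while both words keep their part of speech.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy lexicons (not real Apertium data).
# Analyser: Spanish surface form -> (lemma, tags). Adjective agreement omitted for brevity.
ANALYSER = {
    "gato": ("gato", ("n", "sg")),
    "gatos": ("gato", ("n", "pl")),
    "negro": ("negro", ("adj",)),
    "negros": ("negro", ("adj",)),
    "pardo": ("pardo", ("adj",)),     # analysable, but absent from the bilingual dictionary
    "casero": ("casero", ("adj",)),
}
# Bilingual dictionary: Spanish lemma -> (English lemma, English part of speech).
BIDIX = {
    "gato": ("cat", "n"),
    "negro": ("black", "adj"),
    "casero": ("house", "n"),         # part-of-speech change: adjective -> noun
}
# Generator: (English lemma, tags) -> surface form.
GENERATOR = {
    ("cat", ("n", "sg")): "cat",
    ("cat", ("n", "pl")): "cats",
    ("black", ("adj",)): "black",
}


@dataclass
class LU:
    """A lexical unit moving through the pipeline."""
    surface: str
    lemma: Optional[str] = None   # None: the analyser did not recognise the word
    tags: tuple = ()
    translated: bool = False      # True once the bilingual dictionary supplied a target lemma


def analyse(word: str) -> LU:
    if word in ANALYSER:
        lemma, tags = ANALYSER[word]
        return LU(word, lemma, tags)
    return LU(word)


def lookup(lu: LU) -> LU:
    """Bilingual lookup: replace the lemma and part of speech, keep the remaining tags."""
    if lu.lemma is None or lu.lemma not in BIDIX:
        return lu
    tl_lemma, tl_pos = BIDIX[lu.lemma]
    return LU(lu.surface, tl_lemma, (tl_pos,) + lu.tags[1:], True)


def rule_n_adj(noun: LU, adj: LU) -> Optional[List[LU]]:
    """Pattern 'n adj' with a single 'out' section that emits 'adj n'.

    The template assumes both words still have their original part of speech;
    if the bilingual dictionary changed one, the rule does not apply."""
    if noun.tags[:1] != ("n",) or adj.tags[:1] != ("adj",):
        return None
    return [LU(adj.surface, adj.lemma, ("adj",), adj.translated),
            LU(noun.surface, noun.lemma, ("n", noun.tags[1]), noun.translated)]


def generate(lu: LU) -> str:
    if lu.lemma is None:
        return "*" + lu.surface   # unknown word: surface form passed through
    if not lu.translated:
        return "@" + lu.lemma     # analysed but untranslatable: bare source lemma
    return GENERATOR.get((lu.lemma, lu.tags), "#" + lu.lemma)  # '#': no such target form


def translate(sentence: str) -> str:
    lus = [lookup(analyse(word)) for word in sentence.split()]
    out, i = [], 0
    while i < len(lus):
        matched = rule_n_adj(lus[i], lus[i + 1]) if i + 1 < len(lus) else None
        if matched:
            out.extend(matched)
            i += 2
        else:
            out.append(lus[i])    # no rule matched: word-for-word
            i += 1
    return " ".join(generate(lu) for lu in out)
```

With these toy lexicons, `gato negro` comes out as `black cat`, but `gato pardo` comes out as `@pardo cat`: the analyser recognised *pardo* and the rule moved it, yet the reader still gets an untranslated bare lemma. With the hypothetical entry mapping *casero* to the noun *house*, the rule no longer fits and `gato casero` falls back to `cat #house`; handling it properly would need a second output section.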

Todo

  • Some kind of generic web verb conjugator which uses lttoolbox, to increase the value of having Apertium-style data.
See 'apertium-verbconj'
  • Some kind of generic analyser, with human-readable output (see work on Faroese).
See here etc.
  • Investigate if SFST or hunmorph may be used as an analyser for more complicated language morphology, and how it may be included into an Apertium pipeline.
See HFST, Foma, SFST and hunmorph
  • A HOWTO on approaching transfer rules, similar to the "getting started with a language pair" HOWTO
  • A HOWTO on designing a language pair from scratch, including milestones, e.g. doing syntactic analysis of sentences, working out where translation problems will be, which ones can be tackled easily, and which ones to leave until later.
  • Why is machine translation good?
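For the verb-conjugator idea, one possible starting point is lttoolbox's generation mode: build one lexical form per cell of the paradigm in Apertium stream format and pass them through `lt-proc -g` with a compiled generator. This is a minimal sketch under assumptions: the tag names (`<vblex>`, `<pri>`, `<p1>` to `<p3>`, `<sg>`/`<pl>`) follow common Apertium usage but a given pair may use others, the function names are made up, and the generator path is a placeholder.

```python
import subprocess
from itertools import product
from typing import List, Tuple

PERSONS = ["p1", "p2", "p3"]
NUMBERS = ["sg", "pl"]


def lexical_forms(lemma: str, pos: str = "vblex", tense: str = "pri") -> List[str]:
    """One ^lemma<tags>$ unit per number/person cell, singular first."""
    return [f"^{lemma}<{pos}><{tense}><{person}><{number}>$"
            for number, person in product(NUMBERS, PERSONS)]


def conjugate(lemma: str, generator_bin: str, tense: str = "pri") -> List[Tuple[str, str]]:
    """Feed the units to lttoolbox's generator and pair each with its output line.

    Assumes generator_bin is a compiled generator (hypothetical path) and that
    lt-proc passes the newlines between units through unchanged."""
    forms = lexical_forms(lemma, tense=tense)
    result = subprocess.run(
        ["lt-proc", "-g", generator_bin],
        input="\n".join(forms) + "\n",
        capture_output=True, text=True, check=True,
    )
    return list(zip(forms, result.stdout.splitlines()))
```

Something like `conjugate("cantar", "es.autogen.bin")` (hypothetical path) would return each lexical form paired with whatever the generator produced for it; a web front end would then only need to lay the pairs out as a table.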

Scratchpad

Humour and poetry

<bogdan> spectie: we want more spectish poetry!
<spectie> haha :D
<bogdan> spectie: I had not heard any bad non-rhyming non-sensical poetry in a long time and I miss it!
<spectie> s/bad/good
<spectie> bogdan2005, ok
<spectie> here's a variation on a popular theme:
<spectie> its called "Machine translation"
<spectie> SILENCE PLEASE
<spectie> ...
<spectie> machine translation
<spectie>   sometimes it works
<spectie> sometimes it doesn't
<spectie> ...
<zocky> machine translation / sometimes it works / manchmal he don't
Murat: kovayla bira içerim, ama sen bilmezsin. yarın gelir misin?
Murat: vedrəyle pivə içirəm, ama sen bilməzsən. yarın gələrmisən?
Murat: I drink beer with the bucket, but you don't know it. Do you come tomorrow?
Murat: a poem by msalperen
<jimregan> nah... there's some junk in the JRC parallel text I have here
<isaac> jimregan: what are you doing? trying to use retratos?
<jimregan> yep
<jimregan> my 'beginner's polish' mini-corpus is at my parents' house
<isaac> that happens usually, you never have your mini-corpus when you need it
<Garbine> what's an ex. oc?
<spectie> Garbine, an example of Occitan
<isaac> Garbine: oc == occitante
<isaac> ops :P
<spectie> occitante ??
<isaac> typo :P
<spectie> hah
<spectie> :D
<isaac> occitante sounds cool though :P
<carmentano> occitante?!?!?!
<spectie> occitant
<carmentano> occitano
<Garbine> OK, thanks a lot, everyone
<carmentano> occitante isn't in the RAE
<isaac> it will
<spectie> lol isaac 
<carmentano> :S
<Garbine> I'm going to eat now, and then I'll run some tests
<spectie> Garbine, see you later
<carmentano> I'm leaving in a moment too
<Garbine> enjoy your meal!
<carmentano> I've got a class. An occitante one?
<spectie> haha!
<spectie> isaac, what does occitante mean? full of Occitan?
<isaac> that's the first sense
<spectie> "it was an occitante class... "
<carmentano> here the occitante classes are full of French people...
<spectie> :/
<isaac> yes, it means "exciting and full of Occitan"
<spectie> isaac, lol
<carmentano> uhm...!
<carmentano> sounds good
<isaac> <isaac> carmentano: how did the class go?
<isaac> <carmentano> it was occitante
<spectie> XD
<carmentano> :D
<isaac> there you have a usage example
<isaac> apertium is occitante software too
<Afal> this is rubbish spectie
<Afal> no wonder you need a welsh person to help you with this
<CIA-29> apertium: ftyers * r5542 /trunk/apertium-cy-en/apertium-cy-en.en.dix.xml: +1
<CIA-29> apertium: ftyers * r5543 /trunk/apertium-cy-en/apertium-cy-en.cy.tsx: Adding tsx file
<CIA-29> apertium: ftyers * r5544 /trunk/apertium-cy-en/ (3 files): Bla
<CIA-29> apertium: ftyers * r5545 /trunk/apertium-cy-en/apertium-cy-en.cy.tsx: Minor addition to tsx
<CIA-29> apertium: jimregan * r5546 /trunk/apertium-cy-en/ (apertium-cy-en.cy-en.dix.xml apertium-cy-en.cy.dix.xml): llawer o -> a lot of
<CIA-29> apertium: ftyers * r5547 /trunk/apertium-cy-en/apertium-cy-en.cy.tsx: Minor thing
<CIA-29> apertium: ftyers * r5548 /trunk/apertium-cy-en/ (apertium-cy-en.cy.tsx apertium-cy-en.cy.dix.xml): Minor thing
<CIA-29> apertium: ftyers * r5549 /trunk/apertium-cy-en/apertium-cy-en.cy.tsx: One more
<CIA-29> apertium: ftyers * r5550 /trunk/apertium-cy-en/apertium-cy-en.cy-en.t1x: More crud
<CIA-29> apertium: ftyers * r5551 /trunk/apertium-cy-en/cy-en.prob: New prob
<CIA-29> apertium: ftyers * r5552 /trunk/apertium-cy-en/ (apertium-cy-en.cy.dix.xml cy-en.prob): Minor thing
<CIA-29> apertium: ftyers * r5553 /trunk/apertium-cy-en/apertium-cy-en.cy-en.dix.xml: AErgaerg
<CIA-29> apertium: ftyers * r5554 /trunk/apertium-cy-en/ (3 files): RELATIVE
<CIA-29> apertium: sortiz * r5555 /trunk/apertium/apertium/apertium-header.sh: Minor fix in apertium script
<spectie> damn
<CIA-29> apertium: jimregan * r5556 /trunk/apertium-cy-en/apertium-cy-en.cy-en.dix.xml: fix unicode conversion debris
<CIA-29> apertium: garbine * r5557 /trunk/apertium-fr-es/ (3 files): New vocabulary added by Eleka
<CIA-29> apertium: ftyers * r5558 /trunk/apertium-cy-en/ (apertium-cy-en.cy-en.t1x cy-en.prob): Blergh
<CIA-29> apertium: jimregan * r5559 /trunk/apertium-cy-en/ (6 files): currency
<CIA-29> apertium: ftyers * r5560 /trunk/apertium-cy-en/ (3 files): Blerg
<CIA-29> apertium: ftyers * r5561 /trunk/apertium-cy-en/apertium-cy-en.cy.tsx: TSX
<CIA-29> apertium: ftyers * r5562 /trunk/apertium-cy-en/ (apertium-cy-en.cy-en.t1x apertium-cy-en.cy.dix.xml): Blah
<spectie> my commit messages get more desperate as the day goes on
(22:01:36) murat: I have the very first mk-tr corpus in the world.
(22:01:43) murat: Hope the poverty will end!
<HannesP> haha, our 5 y.o. neighbour is fluent in both polish and swedish. he explains his ability of speaking polish as performing magic in his mouth transforming the speech to polish
<jacobEo> yes, already standardised tagsets
<spectie> parole tags look like:
<spectie> NC0000S
<spectie> and penn treebank tags look like:
<Unhammer> *silent scream*
<spectie> NN VBZ
<spectie> lol Unhammer 
<spectie> yes
<jacobEo> argh! No thanks. What does it mean?
<spectie> common noun, singular
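The exchange above is about how opaque compact tags are. For comparison, here are the two Penn Treebank tags quoted, with their standard descriptions, next to Apertium-style tag strings that use one readable tag per feature. The Apertium-style equivalents are an illustrative assumption (for instance, `NN` also covers mass nouns), and `split_tags` is a made-up helper.

```python
import re

# Penn Treebank tags from the exchange, with their standard descriptions.
PENN = {
    "NN": "noun, singular or mass",
    "VBZ": "verb, 3rd person singular present",
}

# Illustrative Apertium-style equivalents: one angle-bracketed tag per feature.
APERTIUM = {
    "NN": "<n><sg>",
    "VBZ": "<vblex><pri><p3><sg>",
}


def split_tags(tagstring: str) -> list:
    """Split '<vblex><pri><p3><sg>' into ['vblex', 'pri', 'p3', 'sg']."""
    return re.findall(r"<([^<>]+)>", tagstring)


for penn_tag in ("NN", "VBZ"):
    print(penn_tag, "=", PENN[penn_tag], "|", APERTIUM[penn_tag], "->", split_tags(APERTIUM[penn_tag]))
```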

Phrases

  • lexical economy → wordwise thrift
  • linguistic economy → speakwise thrift
  • morphological annotation → wordbound adornment