User:Francis Tyers
Stuff
- DIM EISIAU → ZERO WANT
- idiomas de oficialidad más débil o peso demográfico más reducido → languages of officiality feebleer or demographical weight more reduced
Apertium — Machine translation for languages of officiality feebleer or reduced demographic weight.
- We have met the enemy and it is us.
- "I am a fundamentalist, I use MT 100% of the time" -- Maria Machado, EU DGT.
- The idea of an Apertium MT system is quite at odds with many other NLP applications. For morphological analysers, part-of-speech taggers, etc., the idea is to model as much of the language as possible: the wider the coverage, the better. An Apertium MT system, on the other hand, is a closed system. The idea is to analyse and generate only as much as can be translated. This can often seem counterintuitive to people who are used to working on other NLP software. They can find it frustrating that they can't just take their state-of-the-art analyser or tagger and get an equivalently good MT system. The thing to remember is that if it can't be translated, then being able to analyse it does more harm than good. It usually takes some time to grasp this fully. Many people give up before they get it.
- Why we try not to translate between parts of speech: We do not try to translate between parts of speech because it makes transfer more complicated. Rules match on source language patterns, and output target language patterns. For most pairs, these patterns are modelled on part of speech, or part of speech and subtags. The rules usually have a single 'out' section which outputs the target pattern. If we want to translate between parts of speech, we probably need more 'out' sections, making the rules more complicated and harder to maintain.
- Choosing a successful pair:
- Not in Google or can get better quality than in Google
- High quality translation
- Existing closed-source system available
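The "closed system" idea from the notes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionaries, the `trim` and `analyse` helpers, and the toy Welsh entries are all hypothetical, and none of this is Apertium's actual data format or API. It shows an analyser being cut down to the entries that the bilingual dictionary can translate.

```python
# Toy data: every name here is hypothetical and not part of Apertium.
# Monolingual analyser lexicon: (lemma, part of speech) -> surface forms.
ANALYSER = {
    ("cath", "n"): ["cath", "cathod"],
    ("ci", "n"): ["ci", "cŵn"],
    ("draig", "n"): ["draig", "dreigiau"],
}

# Bilingual dictionary: (source lemma, pos) -> (target lemma, pos).
BILINGUAL = {
    ("cath", "n"): ("cat", "n"),
    ("ci", "n"): ("dog", "n"),
}


def trim(analyser, bilingual):
    """Keep only the analyser entries that the bilingual dictionary covers."""
    return {key: forms for key, forms in analyser.items() if key in bilingual}


def analyse(word, lexicon):
    """Return the (lemma, pos) readings of a surface form, or [] if unknown."""
    return [key for key, forms in lexicon.items() if word in forms]


TRIMMED = trim(ANALYSER, BILINGUAL)
```

With the full lexicon, `analyse("dreigiau", ANALYSER)` finds a reading that nothing downstream can translate. With `TRIMMED`, the same word is simply unknown, so the gap is visible at the first stage instead of surfacing later in the pipeline.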
Todo
Some kind of generic web verb conjugator which uses lttoolbox, to increase the value of having Apertium-style data.
- See 'apertium-verbconj'
Some kind of generic analyser, with human-readable output; see the work on Faroese.
- See here etc.
Investigate whether SFST or hunmorph can be used as an analyser for more complicated language morphology, and how it could be integrated into an Apertium pipeline.
- A HOWTO on approaching transfer rules, similar to the "getting started with a language pair" HOWTO
- A HOWTO on designing a language pair from scratch, including milestones, e.g. doing syntactic analysis of sentences, working out where the translation problems will be, which ones can be tackled easily, and which ones to leave until later.
- Why is machine translation good?
- EU project
"Bridging the gap: Machine translation into morphologically complex languages"
{en,de,fr} -> {fi,et,hu,tr,eu}
Scratchpad
- http://www.services.gov.za/en-za/Home.htm — Available in 11 official languages.
- http://www-user.tu-chemnitz.de/~fri/ding/ — German-English dictionary (GPL) ~100,000 lemmata.
- http://www.hakikatkitabevi.com/ — text available for aligning in many languages (incl. Turkish, Azerbaijani).
- http://www.harunyahya.org/ — related to above.
- Emores, an Empirical MOrphological REaSoning engine for the automatic acquisition of lemmas from a word list. (lexical acquisition)
Humour and poetry
<bogdan> spectie: we want more spectish poetry!
<spectie> haha :D
<bogdan> spectie: I had not heard any bad non-rhyming non-sensical poetry in a long time and I miss it!
<spectie> s/bad/good
<spectie> bogdan2005, ok
<spectie> here's a variation on a popular theme:
<spectie> its called "Machine translation"
<spectie> SILENCE PLEASE
<spectie> ...
<spectie> machine translation
<spectie> sometimes it works
<spectie> sometimes it doesn't
<spectie> ...
<zocky> machine translation / sometimes it works / manchmal he don't

Murat: kovayla bira içerim, ama sen bilmezsin. yarın gelir misin?
Murat: vedrəyle pivə içirəm, ama sen bilməzsən. yarın gələrmisən?
Murat: I drink beer with the bucket, but you don't know it. Do you come tomorrow?
Murat: a poem by msalperen

<jimregan> nah... there's some junk in the JRC parallel text I have here
<isaac> jimregan: what are you doing? trying to use retratos?
<jimregan> yep
<jimregan> my 'beginner's polish' mini-corpus is at my parents' house
<isaac> that happens usually, you never have your mini-corpus when you need it

<Garbine> what is an "ej. oc"?
<spectie> Garbine, an Occitan example
<isaac> Garbine: oc == occitante
<isaac> ops :P
<spectie> occitante ??
<isaac> typo :P
<spectie> hah
<spectie> :D
<isaac> occitante sounds cool though :P
<carmentano> occitante?!?!?!
<spectie> occitant
<carmentano> occitano
<Garbine> ok, many thanks to everyone
<carmentano> occitante isn't in the RAE
<isaac> it will
<spectie> lol isaac
<carmentano> :S
<Garbine> I'm off to eat now, and then I'll run some tests
<spectie> Garbine, see you later
<carmentano> I'm leaving very soon too
<Garbine> enjoy your meal!
<carmentano> because I have an occitante (?) class
<spectie> haha!
<spectie> isaac, what does occitante mean? full of Occitan ?
<isaac> that is the first sense
<spectie> "it was an occitante class... "
<carmentano> here the occitante classes are full of French people...
<spectie> :/
<isaac> yes, it means "exciting and full of Occitan"
<spectie> isaac, lol
<carmentano> uhm...!
<carmentano> sounds good
<isaac> <isaac> carmentano: how did the class go?
<isaac> <carmentano> it was occitante
<spectie> XD
<carmentano> :D
<isaac> there you have a usage example
<isaac> apertium is an occitante piece of software too

<Afal> this is rubbish spectie
<Afal> no wonder you need a welsh person to help you with this

<CIA-29> apertium: ftyers * r5542 /trunk/apertium-cy-en/apertium-cy-en.en.dix.xml: +1
<CIA-29> apertium: ftyers * r5543 /trunk/apertium-cy-en/apertium-cy-en.cy.tsx: Adding tsx file
<CIA-29> apertium: ftyers * r5544 /trunk/apertium-cy-en/ (3 files): Bla
<CIA-29> apertium: ftyers * r5545 /trunk/apertium-cy-en/apertium-cy-en.cy.tsx: Minor addition to tsx
<CIA-29> apertium: jimregan * r5546 /trunk/apertium-cy-en/ (apertium-cy-en.cy-en.dix.xml apertium-cy-en.cy.dix.xml): llawer o -> a lot of
<CIA-29> apertium: ftyers * r5547 /trunk/apertium-cy-en/apertium-cy-en.cy.tsx: Minor thing
<CIA-29> apertium: ftyers * r5548 /trunk/apertium-cy-en/ (apertium-cy-en.cy.tsx apertium-cy-en.cy.dix.xml): Minor thing
<CIA-29> apertium: ftyers * r5549 /trunk/apertium-cy-en/apertium-cy-en.cy.tsx: One more
<CIA-29> apertium: ftyers * r5550 /trunk/apertium-cy-en/apertium-cy-en.cy-en.t1x: More crud
<CIA-29> apertium: ftyers * r5551 /trunk/apertium-cy-en/cy-en.prob: New prob
<CIA-29> apertium: ftyers * r5552 /trunk/apertium-cy-en/ (apertium-cy-en.cy.dix.xml cy-en.prob): Minor thing
<CIA-29> apertium: ftyers * r5553 /trunk/apertium-cy-en/apertium-cy-en.cy-en.dix.xml: AErgaerg
<CIA-29> apertium: ftyers * r5554 /trunk/apertium-cy-en/ (3 files): RELATIVE
<CIA-29> apertium: sortiz * r5555 /trunk/apertium/apertium/apertium-header.sh: Minor fix in apertium script
<spectie> joder
<CIA-29> apertium: jimregan * r5556 /trunk/apertium-cy-en/apertium-cy-en.cy-en.dix.xml: fix unicode conversion debris
<CIA-29> apertium: garbine * r5557 /trunk/apertium-fr-es/ (3 files): New vocabulary added by Eleka
<CIA-29> apertium: ftyers * r5558 /trunk/apertium-cy-en/ (apertium-cy-en.cy-en.t1x cy-en.prob): Blergh
<CIA-29> apertium: jimregan * r5559 /trunk/apertium-cy-en/ (6 files): currency
<CIA-29> apertium: ftyers * r5560 /trunk/apertium-cy-en/ (3 files): Blerg
<CIA-29> apertium: ftyers * r5561 /trunk/apertium-cy-en/apertium-cy-en.cy.tsx: TSX
<CIA-29> apertium: ftyers * r5562 /trunk/apertium-cy-en/ (apertium-cy-en.cy-en.t1x apertium-cy-en.cy.dix.xml): Blah
<spectie> my commit messages get more desperate as the day goes on

(22:01:36) murat: I have the very first mk-tr corpus in the world.
(22:01:43) murat: Hope the poverty will end!

<HannesP> haha, our 5 y.o. neighbour is fluent in both polish and swedish. he explains his ability of speaking polish as performing magic in his mouth transforming the speech to polish
<jacobEo> yes, already standardised tagsets
<spectie> parole tags look like:
<spectie> NC0000S
<spectie> and penn treebank tags look like:
<Unhammer> *silent scream*
<spectie> NN VBZ
<spectie> lol Unhammer
<spectie> yes
<jacobEo> argh! No thanks. What does it mean?
<spectie> common noun, singular

<spectei> heh, the word for monday is as in russian
<spectei> interestingly
<spectei> in komi
<spectei> all the days of the week are from russian
<spectei> except monday
<firespeaker> really?
<spectei> yep
<firespeaker> what's Monday?
<spectei> 'sec
<spectei> i have it written down on a piece of paper
<firespeaker> spectei: in case of devastating EM bursts?
<spectei> firespeaker, :D

Phrases
- lexical economy → wordwise thrift
- linguistic economy → speakwise thrift
- morphological annotation → wordbound adornment
- language exchange → speechshare
- homonymy → samenameness
- polysemy → manymeaningness
- birthplace → birthstead