User:Francis Tyers
Revision as of 12:40, 29 March 2009 by Jacob Nordfalk
IRC nick: spectie or spectre
Translations
- DIM EISIAU → ZERO WANT
- idiomas de oficialidad más débil o peso demográfico más reducido → languages of officiality feebleer or demographical weight more reduced
Apertium — Machine translation for languages of officiality feebleer or reduced demographic weight.
Todo
- A script to tag a corpus of sentences/phrases and then produce sequences of tags with frequencies.
  1023040 <det><n><vblex><det><n>
     9004 <det><n><vblex>
      400 <n><cnjcoo><n><vblex><n>
- Some kind of generic web verb conjugator which uses lttoolbox, to increase the value of having Apertium-style data.
- Some kind of generic analyser with human-readable output (see work on Faroese).
- See here etc.
- Investigate whether SFST or hunmorph could be used as an analyser for languages with more complicated morphology, and how it could be integrated into an Apertium pipeline.
- A HOWTO on approaching transfer rules, similar to the "getting started with a language pair" HOWTO.
- A HOWTO on designing a language pair from scratch, including milestones, e.g. doing syntactic analysis of sentences, working out where the translation problems will be, which ones can be tackled easily and which ones to leave until later.
- Why is machine translation good?
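The first Todo item (tag a corpus, then count tag sequences with their frequencies) could start out as something like the minimal Python sketch below. It assumes the input is already-disambiguated Apertium stream output (e.g. from the tagger stage of a pair), one sentence per line, with lexical units of the form `^dogs/dog<n><pl>$`. Like the example output above, it keeps only the first tag of each word. The function names are hypothetical, escaped characters such as `\$` inside the stream are not handled, and punctuation tags such as `<sent>` are counted like any other tag.

```python
import re
import sys
from collections import Counter

# One disambiguated lexical unit, e.g. ^dogs/dog<n><pl>$ ; group 1 is "dog<n><pl>".
# Assumption: no escaped '^', '/' or '$' inside the unit.
LU = re.compile(r"\^[^/$]*/([^$]*)\$")
# The first tag of an analysis, e.g. "<n>" in "dog<n><pl>".
FIRST_TAG = re.compile(r"<[^>]+>")


def tag_sequence(line):
    """Return the coarse tag sequence of one tagged sentence, e.g. '<det><n><vblex>'."""
    tags = []
    for analysis in LU.findall(line):
        m = FIRST_TAG.search(analysis)
        if m:  # unknown words such as ^Foo/*Foo$ carry no tags and are skipped
            tags.append(m.group(0))
    return "".join(tags)


def count_sequences(lines):
    """Count how often each tag sequence occurs, one sentence per line."""
    counts = Counter()
    for line in lines:
        seq = tag_sequence(line)
        if seq:
            counts[seq] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 tagseq.py tagged.txt  (hypothetical file names)
    with open(sys.argv[1], encoding="utf-8") as f:
        for seq, freq in count_sequences(f).most_common():
            print(f"{freq:>8} {seq}")
```

Keeping only the first tag gives coarse part-of-speech sequences like the example; keeping the full tag string instead would give much sparser but more detailed counts.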
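For the verb conjugator idea, the core could be a small wrapper around lttoolbox's generation mode (`lt-proc -g`), which turns lexical forms such as `^bark<vblex><pri><p3><sg>$` into surface forms. The sketch below is only an assumption of how this might look: the tag inventory is hypothetical (real tag names and combinations depend on the monolingual dictionary of the pair), and the path to the compiled generator (e.g. one of the `.autogen.bin` files a pair builds) is supplied by the caller. The web front end is left out.

```python
import itertools
import subprocess

# Hypothetical tag inventory; a real conjugator would read the valid
# combinations from the dictionary of the language in question.
TENSES = ["pri", "past"]
PERSONS = ["p1", "p2", "p3"]
NUMBERS = ["sg", "pl"]


def generation_queries(lemma, pos="vblex"):
    """Build one Apertium lexical unit per tense/person/number combination."""
    return [
        f"^{lemma}<{pos}><{t}><{p}><{n}>$"
        for t, p, n in itertools.product(TENSES, PERSONS, NUMBERS)
    ]


def conjugate(lemma, generator_bin):
    """Run lt-proc in generation mode and pair each query with its output line.

    Requires lttoolbox to be installed; generator_bin is the path to a compiled
    generator (assumed, not provided here). Forms lt-proc cannot generate
    come back marked with a leading '#'.
    """
    queries = generation_queries(lemma)
    result = subprocess.run(
        ["lt-proc", "-g", generator_bin],
        input="\n".join(queries) + "\n",  # one lexical unit per line
        capture_output=True,
        text=True,
        check=True,
    )
    return list(zip(queries, result.stdout.splitlines()))
```

Generating every combination and letting the generator mark the impossible ones keeps the wrapper independent of any one language; a front end could then simply hide the `#` rows.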
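The generic analyser with human-readable output could, as a first step, post-process ordinary analyser output (`lt-proc` on an analysis binary), where each word may have several analyses separated by `/`, e.g. `^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$`. A minimal sketch follows; the gloss table is a small hypothetical sample (the tag names follow common Apertium usage, but a real table would cover the whole tag set of the dictionary), and escaped characters in the stream are not handled.

```python
import re

# Hypothetical, partial gloss table: tag -> human-readable label.
GLOSSES = {
    "n": "noun", "vblex": "verb", "adj": "adjective", "det": "determiner",
    "m": "masculine", "f": "feminine", "sg": "singular", "pl": "plural",
    "pri": "present", "ifi": "preterite", "p1": "1st person", "p3": "3rd person",
}

LU = re.compile(r"\^([^$]*)\$")  # one lexical unit between ^ and $
TAG = re.compile(r"<([^>]+)>")   # tag names without the angle brackets


def readable_analyses(stream):
    """Turn analyser output into one readable line per analysis."""
    lines = []
    for unit in LU.findall(stream):
        surface, *analyses = unit.split("/")
        for analysis in analyses:
            if analysis.startswith("*"):  # the analyser marks unknown words with '*'
                lines.append(f"{surface}: unknown")
                continue
            lemma = analysis.split("<", 1)[0]
            # Tags missing from the table are shown as-is
            tags = [GLOSSES.get(t, t) for t in TAG.findall(analysis)]
            lines.append(f"{surface}: " + ", ".join([lemma] + tags))
    return lines
```

For example, `readable_analyses("^casas/casa<n><f><pl>$")` gives `["casas: casa, noun, feminine, plural"]`, and an ambiguous word produces one line per analysis.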
Scratchpad
- http://www.services.gov.za/en-za/Home.htm — Available in 11 official languages.
- http://www-user.tu-chemnitz.de/~fri/ding/ — German-English dictionary (GPL) ~100,000 lemmata.
- http://corpora.informatik.uni-leipzig.de/download.html — corpora (Leipzig Corpora Collection).
- http://www.hakikatkitabevi.com/ — text available for aligning in many languages (incl. Turkish, Azerbaijani).
- http://www.harunyahya.org/ — related to the above.
- http://natura.di.uminho.pt/wiki/index.cgi?NATools — NATools is a workbench for parallel corpora processing. It includes a sentence aligner and a Probabilistic Translation Dictionary extractor, a word aligner and a set of other tools to study the aligned parallel corpora.
- http://www.setimes.com/cocoon/setimes/xhtml/en_GB/homepage/default — Newspaper in all the Balkan languages, public domain.
- Emores, an Empirical MOrphological REaSoning engine for the automatic acquisition of lemmas from a word list. (lexical acquisition)
Humour and poetry
<bogdan> spectie: we want more spectish poetry!
<spectie> haha :D
<bogdan> spectie: I had not heard any bad non-rhyming non-sensical poetry in a long time and I miss it!
<spectie> s/bad/good
<spectie> bogdan2005, ok
<spectie> here's a variation on a popular theme:
<spectie> its called "Machine translation"
<spectie> SILENCE PLEASE
<spectie> ...
<spectie> machine translation
<spectie> sometimes it works
<spectie> sometimes it doesn't
<spectie> ...
<zocky> machine translation / sometimes it works / manchmal he don't
Murat: kovayla bira içerim, ama sen bilmezsin. yarın gelir misin?
Murat: vedrəyle pivə içirəm, ama sen bilməzsən. yarın gələrmisən?
Murat: I drink beer with the bucket, but you don't know it. Do you come tomorrow?
Murat: a poem by msalperen
<jimregan> nah... there's some junk in the JRC parallel text I have here
<isaac> jimregan: what are you doing? trying to use retratos?
<jimregan> yep
<jimregan> my 'beginner's polish' mini-corpus is at my parents' house
<isaac> that happens usually, you never have your mini-corpus when you need it
<Garbine> what's an "ej. oc"?
<spectie> Garbine, an example of Occitan
<isaac> Garbine: oc == occitante
<isaac> ops :P
<spectie> occitante ??
<isaac> typo :P
<spectie> hah
<spectie> :D
<isaac> occitante sounds cool though :P
<carmentano> occitante?!?!?!
<spectie> occitant
<carmentano> occitano
<Garbine> OK, thanks a lot, everyone
<carmentano> occitante isn't in the RAE dictionary
<isaac> it will be
<spectie> lol isaac
<carmentano> :S
<Garbine> I'm off to eat now, and then I'll run some tests
<spectie> Garbine, see you later
<carmentano> I'm leaving soon too
<Garbine> enjoy your meal!
<carmentano> I've got a class... an occitante one?
<spectie> haha!
<spectie> isaac, what does occitante mean? full of Occitan?
<isaac> that's the first sense
<spectie> "it was an occitante class..."
<carmentano> here the occitante classes are full of French people...
<spectie> :/
<isaac> yes, it means "exciting and full of Occitan"
<spectie> isaac, lol
<carmentano> uhm...!
<carmentano> sounds good
<isaac> <isaac> carmentano: how did the class go?
<isaac> <carmentano> it was occitante
<spectie> XD
<carmentano> :D
<isaac> there you have a usage example
<isaac> apertium is occitante software too
<Afal> this is rubbish spectie
<Afal> no wonder you need a welsh person to help you with this
<CIA-29> apertium: ftyers * r5542 /trunk/apertium-cy-en/apertium-cy-en.en.dix.xml: +1
<CIA-29> apertium: ftyers * r5543 /trunk/apertium-cy-en/apertium-cy-en.cy.tsx: Adding tsx file
<CIA-29> apertium: ftyers * r5544 /trunk/apertium-cy-en/ (3 files): Bla
<CIA-29> apertium: ftyers * r5545 /trunk/apertium-cy-en/apertium-cy-en.cy.tsx: Minor addition to tsx
<CIA-29> apertium: jimregan * r5546 /trunk/apertium-cy-en/ (apertium-cy-en.cy-en.dix.xml apertium-cy-en.cy.dix.xml): llawer o -> a lot of
<CIA-29> apertium: ftyers * r5547 /trunk/apertium-cy-en/apertium-cy-en.cy.tsx: Minor thing
<CIA-29> apertium: ftyers * r5548 /trunk/apertium-cy-en/ (apertium-cy-en.cy.tsx apertium-cy-en.cy.dix.xml): Minor thing
<CIA-29> apertium: ftyers * r5549 /trunk/apertium-cy-en/apertium-cy-en.cy.tsx: One more
<CIA-29> apertium: ftyers * r5550 /trunk/apertium-cy-en/apertium-cy-en.cy-en.t1x: More crud
<CIA-29> apertium: ftyers * r5551 /trunk/apertium-cy-en/cy-en.prob: New prob
<CIA-29> apertium: ftyers * r5552 /trunk/apertium-cy-en/ (apertium-cy-en.cy.dix.xml cy-en.prob): Minor thing
<CIA-29> apertium: ftyers * r5553 /trunk/apertium-cy-en/apertium-cy-en.cy-en.dix.xml: AErgaerg
<CIA-29> apertium: ftyers * r5554 /trunk/apertium-cy-en/ (3 files): RELATIVE
<CIA-29> apertium: sortiz * r5555 /trunk/apertium/apertium/apertium-header.sh: Minor fix in apertium script
<spectie> damn it
<CIA-29> apertium: jimregan * r5556 /trunk/apertium-cy-en/apertium-cy-en.cy-en.dix.xml: fix unicode conversion debris
<CIA-29> apertium: garbine * r5557 /trunk/apertium-fr-es/ (3 files): New vocabulary added by Eleka
<CIA-29> apertium: ftyers * r5558 /trunk/apertium-cy-en/ (apertium-cy-en.cy-en.t1x cy-en.prob): Blergh
<CIA-29> apertium: jimregan * r5559 /trunk/apertium-cy-en/ (6 files): currency
<CIA-29> apertium: ftyers * r5560 /trunk/apertium-cy-en/ (3 files): Blerg
<CIA-29> apertium: ftyers * r5561 /trunk/apertium-cy-en/apertium-cy-en.cy.tsx: TSX
<CIA-29> apertium: ftyers * r5562 /trunk/apertium-cy-en/ (apertium-cy-en.cy-en.t1x apertium-cy-en.cy.dix.xml): Blah
<spectie> my commit messages get more desperate as the day goes on
Phrases
- lexical economy → wordwise thrift
- linguistic economy → speakwise thrift