User:Francis Tyers
Translations
- DIM EISIAU → ZERO WANT
Todo
- A script to tag a corpus of sentences/phrases and then produce sequences of tags with frequencies.
1023040 <det><n><vblex><det><n>
9004 <det><n><vblex>
400 <n><cnjcoo><n><vblex><n>
- Some kind of generic web verb conjugator which uses lttoolbox, to increase the value of having Apertium-style data.
- Some kind of generic analyser with human-readable output (see the work on Faroese).
- Investigate whether SFST or hunmorph could be used as an analyser for languages with more complicated morphology, and how it could be included in an Apertium pipeline.
- A HOWTO on approaching transfer rules, similar to the "getting started with a language pair" HOWTO.
- A HOWTO on designing a language pair from scratch, including milestones, e.g. doing syntactic analysis of sentences, working out where the translation problems will be, which ones can be tackled easily, and which ones to leave until later.
- Why is machine translation good?
- Compare VISLCG with apertium-tagger.
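A rough sketch of the tag-sequence script from the first Todo item might look like the following. It assumes the corpus has already been tagged and disambiguated, one sentence per line, in Apertium stream format (e.g. `^cat<n><sg>$`), and that the first tag of each lexical unit is its part of speech. The function names are made up for illustration, escaped `^` and `$` characters are not handled, and units without tags (such as unknown words) are simply skipped.

```python
import re
from collections import Counter

# One lexical unit in Apertium stream format, e.g. ^cat<n><sg>$
# (assumption: input is disambiguated, so one analysis per unit)
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")
# The first tag inside a unit, assumed here to be the part of speech
FIRST_TAG = re.compile(r"<[^>]+>")


def tag_sequence(tagged_line):
    """Return the concatenated first tags of one tagged sentence, e.g. '<det><n>'."""
    tags = []
    for unit in LEXICAL_UNIT.findall(tagged_line):
        match = FIRST_TAG.search(unit)
        if match:  # units without tags (e.g. unknown words) are skipped
            tags.append(match.group(0))
    return "".join(tags)


def count_sequences(lines):
    """Count how often each tag sequence occurs over an iterable of tagged lines."""
    counts = Counter()
    for line in lines:
        sequence = tag_sequence(line)
        if sequence:
            counts[sequence] += 1
    return counts


def format_counts(counts):
    """Format as 'frequency sequence' lines, most frequent first."""
    return "\n".join(f"{n} {seq}" for seq, n in counts.most_common())


if __name__ == "__main__":
    sample = [
        "^the<det><def><sp>$ ^cat<n><sg>$ ^eat<vblex><past>$",
        "^a<det><ind><sg>$ ^dog<n><sg>$ ^sleep<vblex><past>$",
        "^dog<n><pl>$",
    ]
    print(format_counts(count_sequences(sample)))
```

In practice the lines would come from the tagger's output (for example read from `sys.stdin`) rather than from a hard-coded sample.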
Scratchpad
- http://mokk.bme.hu/resources/hunalign — GPL text aligner.
- http://www.services.gov.za/en-za/Home.htm — Available in 11 official languages.
- http://www-user.tu-chemnitz.de/~fri/ding/ — German-English dictionary (GPL) ~100,000 lemmata.
- http://corpora.informatik.uni-leipzig.de/download.html — corpora
- http://www.hakikatkitabevi.com/ — text available for aligning in many languages (incl. Turkish, Azerbaijani).
- http://www.harunyahya.org/ — related to above.
- http://natura.di.uminho.pt/wiki/index.cgi?NATools — NATools is a workbench for parallel corpora processing. It includes a sentence aligner and a Probabilistic Translation Dictionary extractor, a word aligner and a set of other tools to study the aligned parallel corpora.
- http://www.setimes.com/cocoon/setimes/xhtml/en_GB/homepage/default — Newspaper in all the Balkan languages, public domain.
- Emores, an Empirical MOrphological REaSoning engine for the automatic acquisition of lemmas from a word list. (lexical acquisition)
- http://mokk.bme.hu/resources/hunmorph — hunmorph, a morphological analyser for Hungarian (and other agglutinative languages).
Humour and poetry
<bogdan> spectie: we want more spectish poetry!
<spectie> haha :D
<bogdan> spectie: I had not heard any bad non-rhyming non-sensical poetry in a long time and I miss it!
<spectie> s/bad/good
<spectie> bogdan2005, ok
<spectie> here's a variation on a popular theme:
<spectie> its called "Machine translation"
<spectie> SILENCE PLEASE
<spectie> ...
<spectie> machine translation
<spectie> sometimes it works
<spectie> sometimes it doesn't
<spectie> ...
<zocky> machine translation / sometimes it works / manchmal he don't
Murat: kovayla bira içerim, ama sen bilmezsin. yarın gelir misin?
Murat: vedrəyle pivə içirəm, ama sen bilməzsən. yarın gələrmisən?
Murat: I drink beer with the bucket, but you don't know it. Do you come tomorrow?
Murat: a poem by msalperen
<jimregan> nah... there's some junk in the JRC parallel text I have here
<isaac> jimregan: what are you doing? trying to use retratos?
<jimregan> yep
<jimregan> my 'beginner's polish' mini-corpus is at my parents' house
<isaac> that happens usually, you never have your mini-corpus when you need it
Phrases
- lexical economy → wordwise thrift
- linguistic economy → speakwise thrift