Difference between revisions of "User:Francis Tyers"
There are at least two other more serious problems for endangered languages, more acute than just lack of mother-tongue transmission. There are languages whose last fluent speakers are already gone or are about to go. At a meeting at Glorieta near Santa Fe, New Mexico, a few months ago, we had actually the last living speaker of one of the languages come. It was a very sad experience for everyone, not just for that woman. And perhaps the saddest thing is that she cannot even talk to her sister anymore, who was the next-to-last speaker before she recently died. She can not call up anybody. The only person for her to talk to is a linguist and that is no fun.[1]
(→Scratchpad: committed to memory)
Revision as of 08:46, 2 April 2008
Translations
- DIM EISIAU → ZERO WANT
- idiomas de oficialidad más débil o peso demográfico más reducido → languages of officiality feebleer or demographical weight more reduced
Apertium — Machine translation for languages of officiality feebleer or reduced demographic weight.
Todo
- A script to tag a corpus of sentences/phrases and then produce sequences of tags with frequencies.
  1023040 <det><n><vblex><det><n>
  9004 <det><n><vblex>
  400 <n><cnjcoo><n><vblex><n>
- Some kind of generic web verb conjugator which uses lttoolbox, to increase the value of having Apertium-style data.
- Some kind of generic analyser, with human-readable output — see work on Faroese.
- Investigate whether SFST or hunmorph can be used as an analyser for languages with more complex morphology, and how it could be integrated into an Apertium pipeline.
- A HOWTO on approaching transfer rules, similar to the "getting started with a language pair" HOWTO.
- A HOWTO on designing a language pair from scratch, including milestones, e.g. doing syntactic analysis of sentences, working out where translation problems will be, which ones can be tackled easily, which ones to leave until later, etc.
- Why is machine translation good?
- Compare VISLCG with apertium-tagger.
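A minimal sketch of the first Todo item (tag sequences with frequencies), in Python. It assumes the corpus has already been tagged and is available one sentence per line in Apertium stream format, with each lexical unit written as `^lemma<tag1><tag2>...$`. Keeping only the first tag of each unit is also an assumption, chosen to match the sample output above. The function names are illustrative and not part of any Apertium tool.

```python
import re
from collections import Counter

# One lexical unit in the assumed input format: ^lemma<tag1><tag2>...$
LEXICAL_UNIT = re.compile(r"\^([^<$]*)((?:<[^>]+>)+)\$")
# The first <tag> in a run of tags (assumed to be the part-of-speech tag).
FIRST_TAG = re.compile(r"<[^>]+>")


def tag_sequence(line):
    """Return the concatenated first tags of every lexical unit on a line."""
    tags = []
    for unit in LEXICAL_UNIT.finditer(line):
        tags.append(FIRST_TAG.match(unit.group(2)).group(0))
    return "".join(tags)


def count_tag_sequences(lines):
    """Count how often each tag sequence occurs, one sentence per line."""
    counts = Counter()
    for line in lines:
        sequence = tag_sequence(line)
        if sequence:  # skip lines with no lexical units
            counts[sequence] += 1
    return counts


def format_counts(counts):
    """Format counts as 'frequency sequence' lines, most frequent first."""
    return [f"{n} {sequence}" for sequence, n in counts.most_common()]
```

For example, given the three tagged sentences `^the<det><def><sp>$ ^cat<n><sg>$ ^sleep<vblex><pri><p3><sg>$`, `^a<det><ind><sg>$ ^dog<n><sg>$ ^bark<vblex><pri><p3><sg>$` and `^cat<n><pl>$ ^and<cnjcoo>$ ^dog<n><pl>$`, `format_counts(count_tag_sequences(lines))` gives `2 <det><n><vblex>` followed by `1 <n><cnjcoo><n>`.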
Scratchpad
- http://www.services.gov.za/en-za/Home.htm — Available in 11 official languages.
- http://www-user.tu-chemnitz.de/~fri/ding/ — German-English dictionary (GPL) ~100,000 lemmata.
- http://corpora.informatik.uni-leipzig.de/download.html — corpora
- http://www.hakikatkitabevi.com/ — text available for aligning in many languages (incl. Turkish, Azerbaijani).
- http://www.harunyahya.org/ — related to above.
- http://natura.di.uminho.pt/wiki/index.cgi?NATools — NATools is a workbench for parallel corpora processing. It includes a sentence aligner and a Probabilistic Translation Dictionary extractor, a word aligner and a set of other tools to study the aligned parallel corpora.
- http://www.setimes.com/cocoon/setimes/xhtml/en_GB/homepage/default — Newspaper in all the Balkan languages, public domain.
- Emores, an Empirical MOrphological REaSoning engine for the automatic acquisition of lemmas from a word list. (lexical acquisition)
Humour and poetry
<bogdan> spectie: we want more spectish poetry!
<spectie> haha :D
<bogdan> spectie: I had not heard any bad non-rhyming non-sensical poetry in a long time and I miss it!
<spectie> s/bad/good
<spectie> bogdan2005, ok
<spectie> here's a variation on a popular theme:
<spectie> its called "Machine translation"
<spectie> SILENCE PLEASE
<spectie> ...
<spectie> machine translation
<spectie> sometimes it works
<spectie> sometimes it doesn't
<spectie> ...
<zocky> machine translation / sometimes it works / manchmal he don't
Murat: kovayla bira içerim, ama sen bilmezsin. yarın gelir misin?
Murat: vedrəyle pivə içirəm, ama sen bilməzsən. yarın gələrmisən?
Murat: I drink beer with the bucket, but you don't know it. Do you come tomorrow?
Murat: a poem by msalperen
<jimregan> nah... there's some junk in the JRC parallel text I have here
<isaac> jimregan: what are you doing? trying to use retratos?
<jimregan> yep
<jimregan> my 'beginner's polish' mini-corpus is at my parents' house
<isaac> that happens usually, you never have your mini-corpus when you need it
Phrases
- lexical economy → wordwise thrift
- linguistic economy → speakwise thrift