Difference between revisions of "User:Francis Tyers"
There are at least two other more serious problems for endangered languages, more acute than just lack of mother-tongue transmission. There are languages whose last fluent speakers are already gone or are about to go. At a meeting at Glorieta near Santa Fe, New Mexico, a few months ago, we had actually the last living speaker of one of the languages come. It was a very sad experience for everyone, not just for that woman. And perhaps the saddest thing is that she cannot even talk to her sister anymore, who was the next-to-last speaker before she recently died. She can not call up anybody. The only person for her to talk to is a linguist and that is no fun.[1]
Revision as of 11:30, 22 January 2008
Translations
- DIM EISIAU → ZERO WANT
Todo
- A script to tag a corpus of sentences/phrases and then produce sequences of tags with frequencies.
  1023040 <det><n><vblex><det><n>
  9004 <det><n><vblex>
  400 <n><cnjcoo><n><vblex><n>
- Some kind of generic web verb conjugator that uses lttoolbox, to increase the value of having Apertium-style data.
- Some kind of generic analyser with human-readable output (see the work on Faroese).
- Investigate whether SFST or hunmorph could serve as an analyser for languages with more complex morphology, and how it could be included in an Apertium pipeline.
- A HOWTO on approaching transfer rules, similar to the "getting started with a language pair" HOWTO.
- A HOWTO on designing a language pair from scratch, including milestones, e.g. doing syntactic analysis of sentences, working out where the translation problems will be, which ones can be tackled easily, which ones to leave until later, etc.
- Why is machine translation good?
- Compare VISLCG with apertium-tagger.
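The tag-sequence script from the first Todo item could be sketched roughly as follows. This is only a minimal sketch under several assumptions that are not stated in the notes above: the corpus has already been tagged, each line holds one sentence in Apertium stream format (lexical units written as ^lemma<tag1><tag2>$), and only the first tag of each unit is kept, as in the sample output. The function names are hypothetical, and escaped characters and unknown words (units with no tags) are simply skipped.

```python
import re
from collections import Counter

# One lexical unit in (assumed) Apertium stream format, e.g. ^cat<n><sg>$.
# The group captures the first tag after the lemma; units without any
# tag, such as unknown words, do not match and are skipped.
LEXICAL_UNIT = re.compile(r"\^[^<$]*<([^>]+)>[^$]*\$")


def tag_sequence(tagged_line):
    """Return the first-tag sequence of one tagged sentence, e.g. '<det><n><vblex>'."""
    return "".join("<%s>" % tag for tag in LEXICAL_UNIT.findall(tagged_line))


def count_tag_sequences(tagged_lines):
    """Count how often each first-tag sequence occurs across the given lines."""
    counts = Counter()
    for line in tagged_lines:
        sequence = tag_sequence(line)
        if sequence:  # ignore lines that contain no tagged units
            counts[sequence] += 1
    return counts


def format_counts(counts):
    """Format the counts as 'frequency sequence' lines, most frequent first."""
    return ["%d %s" % (n, seq) for seq, n in counts.most_common()]


if __name__ == "__main__":
    # Illustrative input only; the tags are examples, not real tagger output.
    sample = [
        "^the<det><def><sp>$ ^cat<n><sg>$ ^sleep<vblex><pri><p3><sg>$",
        "^a<det><ind><sg>$ ^dog<n><sg>$ ^bark<vblex><pri><p3><sg>$",
        "^cat<n><pl>$ ^and<cnjcoo>$ ^dog<n><pl>$",
    ]
    for row in format_counts(count_tag_sequences(sample)):
        print(row)
```

In practice the lines would come from the tagger's output, for example by passing an open file object to count_tag_sequences instead of the sample list.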
Scratchpad
- http://mokk.bme.hu/resources/hunalign — GPL text aligner.
- http://www.services.gov.za/en-za/Home.htm — Available in 11 official languages.
- http://www-user.tu-chemnitz.de/~fri/ding/ — German-English dictionary (GPL) ~100,000 lemmata.
- http://corpora.informatik.uni-leipzig.de/download.html — corpora
- http://www.hakikatkitabevi.com/ — text available for aligning in many languages (incl. Turkish, Azerbaijani).
- http://www.harunyahya.org/ — related to above.
- http://natura.di.uminho.pt/wiki/index.cgi?NATools — NATools is a workbench for parallel corpora processing. It includes a sentence aligner and a Probabilistic Translation Dictionary extractor, a word aligner and a set of other tools to study the aligned parallel corpora.
- http://www.setimes.com/cocoon/setimes/xhtml/en_GB/homepage/default — Newspaper in all the Balkan languages, public domain.
- Emores, an Empirical MOrphological REaSoning engine for the automatic acquisition of lemmas from a word list. (lexical acquisition)
- http://mokk.bme.hu/resources/hunmorph — hunmorph, a morphological analyser for Hungarian (and other agglutinative languages).
Humour and poetry
<bogdan> spectie: we want more spectish poetry!
<spectie> haha :D
<bogdan> spectie: I had not heard any bad non-rhyming non-sensical poetry in a long time and I miss it!
<spectie> s/bad/good
<spectie> bogdan2005, ok
<spectie> here's a variation on a popular theme:
<spectie> its called "Machine translation"
<spectie> SILENCE PLEASE
<spectie> ...
<spectie> machine translation
<spectie> sometimes it works
<spectie> sometimes it doesn't
<spectie> ...
<zocky> machine translation / sometimes it works / manchmal he don't
Murat: kovayla bira içerim, ama sen bilmezsin. yarın gelir misin?
Murat: vedrəyle pivə içirəm, ama sen bilməzsən. yarın gələrmisən?
Murat: I drink beer with the bucket, but you don't know it. Do you come tomorrow?
Murat: a poem by msalperen
Phrases
- lexical economy → wordwise thrift
- linguistic economy → speakwise thrift