Difference between revisions of "User:Francis Tyers"

From Apertium
 
(39 intermediate revisions by 2 users not shown)
Line 1: Line 1:
<center>
[[File:Duck-Rabbit illusion.jpg|thumb|200px|Bringing the problem of definition into perspective.]]
<br/>
[[Special:Emailuser/Francis_Tyers|Email me]] | [[IRC]] nick: spectie, spectei or spectre | link_id: ftyers
</center>
<center><big>''There are at least two other more serious problems for endangered languages, more acute than just lack of mother-tongue transmission. There are languages whose last fluent speakers are already gone or are about to go. At a meeting at Glorieta near Santa Fe, New Mexico, a few months ago, we had actually the last living speaker of one of the languages come. It was a very sad experience for everyone, not just for that woman. And perhaps the saddest thing is that she cannot even talk to her sister anymore, who was the next-to-last speaker before she recently died. She can not call up anybody. '''The only person for her to talk to is a linguist and that is no fun.'''''[http://www.ncela.gwu.edu/pubs/stabilize/conclusion.htm]
</big></center>
==Translations==
==Stuff==

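* "Если человек не понимает слово, это не проблема перевода - это проблема человека." - Варвара ("If a person doesn't understand a word, that is not a problem of translation, it is a problem of the person." - Varvara)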


* DIM EISIAU → ZERO WANT
Line 14: Line 18:


* We have met the enemy and it is us.

* "I am a fundamentalist, I use MT 100% of the time" -- Maria Machado, EU DGT.

* The idea of an Apertium MT system is quite at odds with many other NLP applications. For morphological analysers, part-of-speech taggers, etc., the idea is to model as much of the language as possible: the wider the coverage, the better. An Apertium MT system, on the other hand, is a closed system. The idea is to analyse and generate only as much as can be translated. This can often seem counterintuitive to people who are used to working on other NLP software. They can find it frustrating that they can't just take their state-of-the-art analyser or tagger and get an equivalently good MT system. The thing to remember is that ''if it can't be translated, then being able to analyse it does more harm than good''. It usually takes some time to grasp fully. Many people give up before they get it.

* Why we try not to translate between parts of speech: We do not try to translate between parts of speech because it makes transfer more complicated. Rules match on source language patterns, and output target language patterns. For most pairs, these patterns are modelled on part of speech, or part of speech and subtags. The rules usually have a single 'out' section which outputs the target pattern. If we want to translate between parts of speech, we probably need more 'out' sections, making the rules more complicated and harder to maintain.

* Choosing a successful pair:
** Not in Google '''or''' can get better quality than in Google
** High quality translation
** Existing closed-source system available

* Mus uni non fidit antro. (A mouse does not trust a single hole.)

* {{sc|ein burde ikkje sove}}
* {{sc|når natta fell på}}
* {{sc|ein burde sjå på stjernene}}
* {{sc|ein burde vere to.}}
* (One should not sleep / when night falls / one should look at the stars / one should be two.)


==Todo==



* Some kind of generic web verb conjugator that uses lttoolbox, to increase the value of having Apertium-style data.
* <s>Some kind of generic analyser, with human readable output &mdash; see work on [http://xixona.dlsi.ua.es/~fran/faroese/ faroese].</s>
:See [http://xixona.dlsi.ua.es/~fran/faroese/ here] etc.
* <s>Investigate if [http://www.ims.uni-stuttgart.de/projekte/gramotron/SOFTWARE/SFST.html SFST] or ''hunmorph'' may be used as an analyser for more complicated language morphology, and how it may be included into an Apertium pipeline.</s>
:See [[SFST]] and [[hunmorph]]
* A HOWTO of approaching transfer rules, similar to the "getting started with a language pair" HOWTO
* A HOWTO of designing a language pair from scratch, including milestones. e.g. doing syntactic analysis of sentences, working out where translation problems will be, which ones can be tackled easily, which ones to leave until later etc.
* [[Why is machine translation good]]?
* [[/An MT system in one thousand steps]]

==Scratchpad==

* http://www.services.gov.za/en-za/Home.htm &mdash; Available in 11 official languages.
* http://www-user.tu-chemnitz.de/~fri/ding/ &mdash; German-English dictionary (GPL) ~100,000 lemmata.
* http://corpora.informatik.uni-leipzig.de/download.html &mdash; corpora
* http://www.hakikatkitabevi.com/ &mdash; text available for aligning in many languages (incl. Turkish, Azerbaijani).
** http://www.harunyahya.org/ &mdash; related to above.
* http://natura.di.uminho.pt/wiki/index.cgi?NATools &mdash; NATools is a workbench for parallel corpora processing. It includes a sentence aligner and a Probabilistic Translation Dictionary extractor, a word aligner and a set of other tools to study the aligned parallel corpora.
* http://www.setimes.com/cocoon/setimes/xhtml/en_GB/homepage/default &mdash; Newspaper in all the Balkan languages, '''public domain'''.
* Emores, an Empirical MOrphological REaSoning engine for the automatic acquisition of lemmas from a word list. (lexical acquisition)


==Humour and poetry==
Line 164: Line 172:
<jacobEo> argh! No thanks. What does it mean?
<spectie> common noun, singular


<spectei> heh, the word for monday is as in russian
<spectei> interestingly
<spectei> in komi
<spectei> all the days of the week are from russian
<spectei> except monday
<firespeaker> really?
<spectei> yep
<firespeaker> what's Monday?
<spectei> 'sec
<spectei> i have it written down on a piece of paper
<firespeaker> spectei: in case of devastating EM bursts?
<spectei> firespeaker, :D

<spectei> intransitive verbs usually don't have a present participle
<Unhammer> ah
<spectei> note also: the prefix for the pp depends on the stem, not on the paradigm
<n0nick> ooh
</pre>


<pre>

<firespeaker> түш:түшүр%>%{I%}%{l%} V-INFL-IV-IRREG-CAUS-PASS ; ! "to make fall"
<firespeaker> is that "right" or "wrong"?
<spectre> well, it's both
<spectre> it's right
<spectre> but inelegant
<firespeaker> why do you always think of the inelegant solutions first? :(
<spectre> it's like my brain is wired that way :((

</pre>

<pre>
<firespeaker> spectre: he's right you know
<firespeaker> your Russian there did happen to be almost perfect
<firespeaker> you made some silly errors with cases and stuff, but it just came off looking like someone who didn't know how to spell,
<firespeaker> and not like your Russian normally does
</pre>

<pre>
<spectie> user1, which language(s) do you speak ?
<user1> only english..
<spectie> user1, you don't speak hindi ?
<spectie> or kannada ?
<user1> yeah, i speak kannada and hindi

<spectie> flammie's tactic of translating genitive +postposition into " 's + adverb/noun" in english
<spectie> still has me smiling
<spectie> this morning i was smiling while thinking about it in the shower
<Flammie> table's under seems perfectly understandable
<Flammie> I think I'm gonna start using that from now on
<spectie> :D
</pre>

<pre>
<spectie> <sirex> https://bitbucket.org/sirex/morfologija - this is my attempt to parse data and http://donelaitis.vdu.lt/~vytas/lmdb/ this is actual data.
<spectie> <spectie> ciklonas 1 - 1 1 1 1 1 1 1 0 0 2 0 0 0 0 1 1 0 0 0 0
<spectie> <spectie> wow
<spectie> <spectie> nice codes XD
<spectie> <Unhammer> <spectie> ciklonas 1 - 1 1 1 1 1 1 1 0 0 2 0 0 0 0 1 1 0 0 0 0 [21:59]
<spectie> <Unhammer> AGH
<spectie> <spectie> Unhammer, SILENT SCREAM
<spectie> <spectie> or not so silent
<spectie> <Unhammer> hahah
<spectie> <Unhammer> "I was walking along the road with two friends – the sun was setting – suddenly the sky turned blood red – I paused, feeling exhausted, and leaned on the fence – there was ciklonas 1 - 1 1 1 1 1 1 1 0 0 2 0 0 0 0 1 1 0 0 0 0 http://is.gd/z31AgG "
<spectie> <Unhammer> true story
</pre>

<pre>
* tachyons (~tachyons@117.221.159.182) has joined #hfst
<spectre> tachyons, \o/
<tachyons> postposition
</pre>

<pre>
<spectre> they have a song in Scots Gaelic about Linux
</pre>

<pre>
<fotonzade> okay oguz so these pipes | are used for taking the output of one program and using them as input to another program
<fotonzade> they're called pipes because they work like aqueducts
</pre>
</pre>


Line 170: Line 261:
* lexical economy → wordwise thrift
* linguistic economy → speakwise thrift
* morphological annotation → wordbound adornment
* language exchange → speechshare
* homonymy → samenameness
* polysemy → manymeaningness
* birthplace → birthstead
* prediction → forsaying
* predict → forsay

==Songs==

<pre>

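Substantiv är namn på ting,
till exempel boll och ring
Verb är sådant man kan göra,
som att hoppa, se och höra
Adjektiven sen oss lär,
hurudana tingen är

(Nouns are names of things,
for example ball and ring.
Verbs are things that you can do,
like jumping, seeing and hearing.
Adjectives then teach us
what the things are like.)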

</pre>


__NOTOC__

Latest revision as of 00:06, 1 July 2020

Bringing the problem of definition into perspective.


Email me | IRC nick: spectie, spectei or spectre | link_id: ftyers

There are at least two other more serious problems for endangered languages, more acute than just lack of mother-tongue transmission. There are languages whose last fluent speakers are already gone or are about to go. At a meeting at Glorieta near Santa Fe, New Mexico, a few months ago, we had actually the last living speaker of one of the languages come. It was a very sad experience for everyone, not just for that woman. And perhaps the saddest thing is that she cannot even talk to her sister anymore, who was the next-to-last speaker before she recently died. She can not call up anybody. The only person for her to talk to is a linguist and that is no fun.[1]

Stuff

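  • "Если человек не понимает слово, это не проблема перевода - это проблема человека." - Варвара ("If a person doesn't understand a word, that is not a problem of translation, it is a problem of the person." - Varvara)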
  • DIM EISIAU → ZERO WANT
  • idiomas de oficialidad más débil o peso demográfico más reducido → languages of officiality feebleer or demographical weight more reduced

Apertium — Machine translation for languages of officiality feebleer or reduced demographic weight.

  • We have met the enemy and it is us.
  • "I am a fundamentalist, I use MT 100% of the time" -- Maria Machado, EU DGT.
  • The idea of an Apertium MT system is quite at odds with many other NLP applications. For morphological analysers, part-of-speech taggers, etc., the idea is to model as much of the language as possible: the wider the coverage, the better. An Apertium MT system, on the other hand, is a closed system. The idea is to analyse and generate only as much as can be translated. This can often seem counterintuitive to people who are used to working on other NLP software. They can find it frustrating that they can't just take their state-of-the-art analyser or tagger and get an equivalently good MT system. The thing to remember is that if it can't be translated, then being able to analyse it does more harm than good. It usually takes some time to grasp fully. Many people give up before they get it.
  • Why we try not to translate between parts of speech: We do not try to translate between parts of speech because it makes transfer more complicated. Rules match on source language patterns, and output target language patterns. For most pairs, these patterns are modelled on part of speech, or part of speech and subtags. The rules usually have a single 'out' section which outputs the target pattern. If we want to translate between parts of speech, we probably need more 'out' sections, making the rules more complicated and harder to maintain.
  • Choosing a successful pair:
    • Not in Google or can get better quality than in Google
    • High quality translation
    • Existing closed-source system available
  • Mus uni non fidit antro. (A mouse does not trust a single hole.)
  • ein burde ikkje sove
  • når natta fell på
  • ein burde sjå på stjernene
  • ein burde vere to.
  • (One should not sleep / when night falls / one should look at the stars / one should be two.)
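The note on parts of speech can be illustrated with a minimal Python sketch, assuming a toy English-to-Spanish pair. This is not Apertium's transfer engine. In a real pair, rules live in the XML transfer file (for example apertium-cy-en.cy-en.t1x from the commit log), and the names LU, BIDIX, translate and adj_n_rule here are hypothetical. The rule matches a source pattern of parts of speech and has one output step that reorders the units. It ignores agreement and everything else a real rule would handle.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LU:
    """A lexical unit, printed in the ^lemma<tag>...$ stream style Apertium uses."""
    lemma: str
    tags: Tuple[str, ...]  # assumption: the first tag is the part of speech

    @property
    def pos(self) -> str:
        return self.tags[0]

    def __str__(self) -> str:
        return "^" + self.lemma + "".join(f"<{t}>" for t in self.tags) + "$"


# Hypothetical bilingual dictionary: (source lemma, part of speech) -> target lemma.
# Every entry keeps the part of speech, so translation only swaps the lemma.
BIDIX = {
    ("red", "adj"): "rojo",
    ("car", "n"): "coche",
}


def translate(lu: LU) -> LU:
    # Because the part of speech is unchanged, the source tags are reused as-is.
    return LU(BIDIX[(lu.lemma, lu.pos)], lu.tags)


def adj_n_rule(chunk: List[LU]) -> Optional[List[LU]]:
    """Pattern <adj> <n>: the single output step emits the target noun, then the adjective."""
    if [lu.pos for lu in chunk] != ["adj", "n"]:
        return None  # the pattern does not match this chunk
    adj, noun = (translate(lu) for lu in chunk)
    # One fixed output order is enough: the slots are defined by part of speech,
    # and the dictionary guarantees that the part of speech survives translation.
    return [noun, adj]


if __name__ == "__main__":
    source = [LU("red", ("adj",)), LU("car", ("n", "sg"))]
    print(" ".join(str(lu) for lu in adj_n_rule(source)))  # ^coche<n><sg>$ ^rojo<adj>$
```

Suppose the dictionary could instead map the adjective to, say, a noun phrase. Then adj_n_rule could no longer rely on one fixed output order. It would need a second output branch for that case, and that is exactly the extra 'out' section complexity the note warns about.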

Todo

  • A HOWTO of approaching transfer rules, similar to the "getting started with a language pair" HOWTO
  • A HOWTO of designing a language pair from scratch, including milestones. e.g. doing syntactic analysis of sentences, working out where translation problems will be, which ones can be tackled easily, which ones to leave until later etc.
  • Why is machine translation good?
  • /An MT system in one thousand steps

Humour and poetry

<bogdan> spectie: we want more spectish poetry!
<spectie> haha :D
<bogdan> spectie: I had not heard any bad non-rhyming non-sensical poetry in a long time and I miss it!
<spectie> s/bad/good
<spectie> bogdan2005, ok
<spectie> here's a variation on a popular theme:
<spectie> its called "Machine translation"
<spectie> SILENCE PLEASE
<spectie> ...
<spectie> machine translation
<spectie>   sometimes it works
<spectie> sometimes it doesn't
<spectie> ...
<zocky> machine translation / sometimes it works / manchmal he don't
Murat: kovayla bira içerim, ama sen bilmezsin. yarın gelir misin?
Murat: vedrəyle pivə içirəm, ama sen bilməzsən. yarın gələrmisən?
Murat: I drink beer with the bucket, but you don't know it. Do you come tomorrow?
Murat: a poem by msalperen
<jimregan> nah... there's some junk in the JRC parallel text I have here
<isaac> jimregan: what are you doing? trying to use retratos?
<jimregan> yep
<jimregan> my 'beginner's polish' mini-corpus is at my parents' house
<isaac> that happens usually, you never have your mini-corpus when you need it
<Garbine> qué es un ej. oc? [what is an "ej. oc"?]
<spectie> Garbine, un ejemplo de occitano [an Occitan example]
<isaac> Garbine: oc == occitante
<isaac> ops :P
<spectie> occitante ??
<isaac> typo :P
<spectie> hah
<spectie> :D
<isaac> occitante sounds cool though :P
<carmentano> occitante?!?!?!
<spectie> occitant
<carmentano> occitano [Occitan]
<Garbine> vale, muchas gracias a todos [OK, thanks a lot, everyone]
<carmentano> occitante no sale en la rae [occitante isn't in the RAE dictionary]
<isaac> it will
<spectie> lol isaac
<carmentano> :S
<Garbine> ahora me voy a comer, y luego haré unas pruebas [I'm off to eat now, and afterwards I'll run some tests]
<spectie> Garbine, hasta luego [see you later]
<carmentano> yo también me voy dentro de nada [I'm leaving in a moment too]
<Garbine> que aproveche! [enjoy your meal!]
<carmentano> que tengo una clase ¿occitante? [I've got a class... an occitante one?]
<spectie> haha!
<spectie> isaac, que significa occitante? llena de occitano ? [isaac, what does occitante mean? full of Occitan?]
<isaac> esa es la primera acepcion [that's the first sense]
<spectie> "fue una clase occitante... " ["it was an occitante class..."]
<carmentano> aquí las clases occitantes están llenas de franceses... [here the occitante classes are full of French people...]
<spectie> :/
<isaac> si, quiere decir "excitante y llena de occitano" [yes, it means "exciting and full of Occitan"]
<spectie> isaac, lol
<carmentano> uhm...!
<carmentano> suena bien [sounds good]
<isaac> <isaac> carmentano: como ha ido la clase? [how did the class go?]
<isaac> <carmentano> ha sido occitante [it was occitante]
<spectie> XD
<carmentano> :D
<isaac> there you have a usage example
<isaac> apertium es un software occitante too [apertium is occitante software too]
<Afal> this is rubbish spectie
<Afal> no wonder you need a welsh person to help you with this
<CIA-29> apertium: ftyers * r5542 /trunk/apertium-cy-en/apertium-cy-en.en.dix.xml: +1
<CIA-29> apertium: ftyers * r5543 /trunk/apertium-cy-en/apertium-cy-en.cy.tsx: Adding tsx file
<CIA-29> apertium: ftyers * r5544 /trunk/apertium-cy-en/ (3 files): Bla
<CIA-29> apertium: ftyers * r5545 /trunk/apertium-cy-en/apertium-cy-en.cy.tsx: Minor addition to tsx
<CIA-29> apertium: jimregan * r5546 /trunk/apertium-cy-en/ (apertium-cy-en.cy-en.dix.xml apertium-cy-en.cy.dix.xml): llawer o -> a lot of
<CIA-29> apertium: ftyers * r5547 /trunk/apertium-cy-en/apertium-cy-en.cy.tsx: Minor thing
<CIA-29> apertium: ftyers * r5548 /trunk/apertium-cy-en/ (apertium-cy-en.cy.tsx apertium-cy-en.cy.dix.xml): Minor thing
<CIA-29> apertium: ftyers * r5549 /trunk/apertium-cy-en/apertium-cy-en.cy.tsx: One more
<CIA-29> apertium: ftyers * r5550 /trunk/apertium-cy-en/apertium-cy-en.cy-en.t1x: More crud
<CIA-29> apertium: ftyers * r5551 /trunk/apertium-cy-en/cy-en.prob: New prob
<CIA-29> apertium: ftyers * r5552 /trunk/apertium-cy-en/ (apertium-cy-en.cy.dix.xml cy-en.prob): Minor thing
<CIA-29> apertium: ftyers * r5553 /trunk/apertium-cy-en/apertium-cy-en.cy-en.dix.xml: AErgaerg
<CIA-29> apertium: ftyers * r5554 /trunk/apertium-cy-en/ (3 files): RELATIVE
<CIA-29> apertium: sortiz * r5555 /trunk/apertium/apertium/apertium-header.sh: Minor fix in apertium script
<spectie> joder [damn]
<CIA-29> apertium: jimregan * r5556 /trunk/apertium-cy-en/apertium-cy-en.cy-en.dix.xml: fix unicode conversion debris
<CIA-29> apertium: garbine * r5557 /trunk/apertium-fr-es/ (3 files): New vocabulary added by Eleka
<CIA-29> apertium: ftyers * r5558 /trunk/apertium-cy-en/ (apertium-cy-en.cy-en.t1x cy-en.prob): Blergh
<CIA-29> apertium: jimregan * r5559 /trunk/apertium-cy-en/ (6 files): currency
<CIA-29> apertium: ftyers * r5560 /trunk/apertium-cy-en/ (3 files): Blerg
<CIA-29> apertium: ftyers * r5561 /trunk/apertium-cy-en/apertium-cy-en.cy.tsx: TSX
<CIA-29> apertium: ftyers * r5562 /trunk/apertium-cy-en/ (apertium-cy-en.cy-en.t1x apertium-cy-en.cy.dix.xml): Blah
<spectie> my commit messages get more desperate as the day goes on
(22:01:36) murat: I have the very first mk-tr corpus in the world.
(22:01:43) murat: Hope the poverty will end!
<HannesP> haha, our 5 y.o. neighbour is fluent in both polish and swedish. he explains his ability of speaking polish as performing magic in his mouth transforming the speech to polish
<jacobEo> yes, already standardised tagsets
<spectie> parole tags look like:
<spectie> NC0000S
<spectie> and penn treebank tags look like:
<Unhammer> *silent scream*
<spectie> NN VBZ
<spectie> lol Unhammer 
<spectie> yes
<jacobEo> argh! No thanks. What does it mean?
<spectie> common noun, singular


<spectei> heh, the word for monday is as in russian
<spectei> interestingly 
<spectei> in komi
<spectei> all the days of the week are from russian
<spectei> except monday
<firespeaker> really?
<spectei> yep
<firespeaker> what's Monday?
<spectei> 'sec
<spectei> i have it written down on a piece of paper
<firespeaker> spectei: in case of devastating EM bursts?
<spectei> firespeaker, :D

<spectei> intransitive verbs usually don't have a present participle
<Unhammer> ah
<spectei> note also: the prefix for the pp depends on the stem, not on the paradigm
<n0nick> ooh



<firespeaker>  түш:түшүр%>%{I%}%{l%} V-INFL-IV-IRREG-CAUS-PASS ; ! "to make fall"
<firespeaker> is that "right" or "wrong"?
<spectre> well, it's both
<spectre> it's right
<spectre> but inelegant
<firespeaker> why do you always think of the inelegant solutions first? :(
<spectre> it's like my brain is wired that way :((

<firespeaker> spectre: he's right you know
<firespeaker> your Russian there did happen to be almost perfect
<firespeaker> you made some silly errors with cases and stuff, but it just came off looking like someone who didn't know how to spell, 
<firespeaker> and not like your Russian normally does
<spectie> user1, which language(s) do you speak ?
<user1> only english..
<spectie> user1, you don't speak hindi ?
<spectie> or kannada ?
<user1> yeah, i speak kannada and hindi

<spectie> flammie's tactic of translating genitive +postposition into " 's + adverb/noun" in english
<spectie> still has me smiling
<spectie> this morning i was smiling while thinking about it in the shower
<Flammie> table's under seems perfectly understandable
<Flammie> I think I'm gonna start using that from now on
<spectie> :D
<spectie> <sirex> https://bitbucket.org/sirex/morfologija - this is my attempt to parse data and http://donelaitis.vdu.lt/~vytas/lmdb/ this is actual data.
<spectie> <spectie> ciklonas 1 - 1 1 1 1 1 1 1 0 0 2 0 0 0 0 1 1 0 0 0 0
<spectie> <spectie> wow
<spectie> <spectie> nice codes XD
<spectie> <Unhammer> <spectie> ciklonas 1 - 1 1 1 1 1 1 1 0 0 2 0 0 0 0 1 1 0 0 0 0  [21:59]
<spectie> <Unhammer> AGH
<spectie> <spectie> Unhammer, SILENT SCREAM
<spectie> <spectie> or not so silent
<spectie> <Unhammer> hahah
<spectie> <Unhammer>  "I was walking along the road with two friends – the sun was setting – suddenly the sky turned blood red – I paused, feeling exhausted, and leaned on the fence – there was ciklonas 1 - 1 1 1 1 1 1 1 0 0 2 0 0 0 0 1 1 0 0 0 0 http://is.gd/z31AgG "
<spectie> <Unhammer> true story
* tachyons (~tachyons@117.221.159.182) has joined #hfst
<spectre> tachyons, \o/
<tachyons> postposition
<spectre> they have a song in Scots Gaelic about Linux
<fotonzade> okay oguz so these pipes | are used for taking the output of one program and using them as input to another program
<fotonzade> they're called pipes because they work like aqueducts

Phrases

  • lexical economy → wordwise thrift
  • linguistic economy → speakwise thrift
  • morphological annotation → wordbound adornment
  • language exchange → speechshare
  • homonymy → samenameness
  • polysemy → manymeaningness
  • birthplace → birthstead
  • prediction → forsaying
  • predict → forsay

Songs


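Substantiv är namn på ting,
till exempel boll och ring
Verb är sådant man kan göra,
som att hoppa, se och höra
Adjektiven sen oss lär,
hurudana tingen är

(Nouns are names of things,
for example ball and ring.
Verbs are things that you can do,
like jumping, seeing and hearing.
Adjectives then teach us
what the things are like.)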