User:Firespeaker/Templatic bidix
Latest revision as of 18:29, 14 March 2013
I have an idea that I think would make translations better (through more explicit mappings between languages, as well as arbitrary structure mapping) and development easier. It would work by offloading disambiguation and "syntax" to the bidix, which would accept "translation templates" instead of "words".
This would create a few issues:
- The user would then have to know the languages in depth before even beginning to work on a bidix. But isn't this already the ideal case?
- Addition to / rewrite of bidix (maybe best to fork it and release it as something different)
- Some way to deal with ranking of preference between different possible mappings
- Tokenisation / longest-match
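The last two issues (ranking between possible mappings, and tokenisation/longest-match) can be illustrated with a minimal Python sketch. It assumes a hypothetical representation in which each template is a sequence of token patterns, where `*<n>` means "any lemma tagged `<n>`". It segments the input greedily from left to right, prefers the longest match, and breaks ties with a hypothetical numeric `weight`. This is one possible strategy, not a settled design.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    name: str
    # Each element is "lemma<tags>" (exact token) or "*<tags>" (any lemma with these tags).
    pattern: tuple[str, ...]
    # Hypothetical preference score, consulted only when two matches have the same length.
    weight: int = 0


def tags_of(token: str) -> str:
    """Return the tag string of a token, e.g. 'хип-хоп<n>' -> '<n>'."""
    return token[token.index("<"):] if "<" in token else ""


def element_matches(element: str, token: str) -> bool:
    if element.startswith("*"):
        return tags_of(token) == element[1:]
    return token == element


def matches_at(template: Template, tokens: list[str], i: int) -> bool:
    n = len(template.pattern)
    if i + n > len(tokens):
        return False
    return all(element_matches(e, t) for e, t in zip(template.pattern, tokens[i:i + n]))


def segment(tokens: list[str], templates: list[Template]) -> list[tuple[str | None, tuple[str, ...]]]:
    """Greedy left-to-right segmentation: the longest match wins, then the higher weight.
    Remaining ties go to the template listed first; unmatched tokens get the name None."""
    result = []
    i = 0
    while i < len(tokens):
        candidates = [t for t in templates if matches_at(t, tokens, i)]
        if candidates:
            best = max(candidates, key=lambda t: (len(t.pattern), t.weight))
            n = len(best.pattern)
            result.append((best.name, tuple(tokens[i:i + n])))
            i += n
        else:
            result.append((None, (tokens[i],)))
            i += 1
    return result


TEMPLATES = [
    Template("noun", ("*<n>",)),
    Template("via", ("*<n>", "аркылуу<post>"), weight=1),
    Template("through", ("*<n>", "аркылуу<post>"), weight=2),
]

print(segment(["хип-хоп<n>", "аркылуу<post>", "жол<n>"], TEMPLATES))
# [('through', ('хип-хоп<n>', 'аркылуу<post>')), ('noun', ('жол<n>',))]
```

Greedy longest match is simple but can commit too early; a CG-like ranking over whole sentences would be a more expressive alternative.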
Test cases
Mostly English/Turkic translations.
a long example
- Хип-хоптун алгачкы хореографы, америкалык өнөрпоз жергиликтүү бийчилер менен жолугушуп, хип-хоп аркылуу ич ара араздашууну жөнгө салуу тажрыйбасын көрсөтүп, маданияттын бул түрү аркылуу жаштарды туура жолго салып, ак жолтой келечек курса болот деген көз карашын жайылтууда.
- The first hip-hop choreographer, an American specialist, met with local dancers, presented his experience in settling internal disagreements through hip-hop, and advanced his stance that through this sort of culture you can set youth on the right path and build a bright future.
mappings needed
- [1]<n><gen> [2]<det> [3]<n><px3sp> = the [2]<det> [1]<n> [3]<n>
- хип-хоп<n> = hip-hop<n>
- алгачкы<det> = first<det>
- хореограф<n> = choreographer<n>
- америкалык<adj> = American<adj>
- өнөрпоз<n> = specialist<n>
- {{{1}}}
- жергиликтүү<adj> = local<adj>
- бийчи<n> = dancer<n>
- <pl> = <pl> (a fall-back default?)
- ( [1 <n>|(<np>.*)] ~ [2 <n>|(<np>.*)] менен ) жолук<v><coop> [3 _tags_] = [1] meet<v> [3] with<prep> [2]
- {{{1}}}
- [1 <n>] аркылуу<post> = via<prep> [1]<n>
- [1 <n>] аркылуу<post> = through<prep> [1]<n>
- [1
- ич ара = ...
- араздашууну жөнгө салуу тажрыйбасын көрсөтүп, маданияттын бул түрү аркылуу жаштарды туура жолго салып, ак жолтой келечек курса болот деген көз карашын жайылтууда.
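To illustrate how the first mapping in the list, `[1]<n><gen> [2]<det> [3]<n><px3sp> = the [2]<det> [1]<n> [3]<n>`, could combine with the word-level entries, here is a minimal Python sketch. It matches the three-slot source pattern, looks each captured lemma up in a word-level lexicon, and emits the slots in English order with the inserted article. The `SOURCE`/`TARGET` encoding, the `LEXICON` keyed by (lemma, part-of-speech tag), and `apply_template` are all hypothetical.

```python
import re

TAG_RE = re.compile(r"<([^<>]+)>")


def parse(token):
    """Split an analysis such as 'хип-хоп<n><gen>' into ('хип-хоп', ('n', 'gen'))."""
    return token.split("<", 1)[0], tuple(TAG_RE.findall(token))


# Hypothetical encoding of: [1]<n><gen> [2]<det> [3]<n><px3sp> = the [2]<det> [1]<n> [3]<n>
SOURCE = [(1, ("n", "gen")), (2, ("det",)), (3, ("n", "px3sp"))]
TARGET = ["the", (2, ("det",)), (1, ("n",)), (3, ("n",))]

# Word-level entries from the same list, keyed by (lemma, part-of-speech tag).
LEXICON = {
    ("хип-хоп", "n"): "hip-hop",
    ("алгачкы", "det"): "first",
    ("хореограф", "n"): "choreographer",
}


def apply_template(tokens, source, target, lexicon):
    """Return the translated token list, or None if the template does not match."""
    if len(tokens) != len(source):
        return None
    slots = {}
    for token, (slot, tags) in zip(tokens, source):
        lemma, token_tags = parse(token)
        if token_tags != tags:
            return None
        # Remember the lemma and its part of speech (first tag) for the lexical lookup.
        slots[slot] = (lemma, tags[0])
    out = []
    for item in target:
        if isinstance(item, str):
            out.append(item)  # literal target word, e.g. the inserted article
        else:
            slot, tags = item
            # Raises KeyError if the lexicon lacks an entry; a real system needs a fallback.
            out.append(lexicon[slots[slot]] + "".join(f"<{t}>" for t in tags))
    return out


print(apply_template(["хип-хоп<n><gen>", "алгачкы<det>", "хореограф<n><px3sp>"], SOURCE, TARGET, LEXICON))
# ['the', 'first<det>', 'hip-hop<n>', 'choreographer<n>']
```

The structural template only reorders slots and adjusts tags, while the lexical choices stay in ordinary word-level entries, which keeps each kind of mapping small.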
an English example
- (eng) He gets really angry when the person he's talking with starts throwing all his stuff out of the window → Аны менен сүйлөшүп жаткан киши буюмдарын терезеге таштай баштаганда жини жаман келет
GSoC task
Templatic bidix (Hard)
How? (required skills)
Python, XML, C++
What? (description)
Design a format similar to bidix (declarative XML establishing language 1 <> language 2 correspondences) that allows the use of templates, as well as the back-end to process it (i.e., it should compile into an FST). It should handle discontiguous multiwords and complex multiwords so that they can be translated easily, and it should provide some mechanism (some sort of ranking, similar to CG) for choosing among multiple matching sets of templates for a given translation. In effect, it should make it possible to bypass transfer rules and disambiguation while achieving similar (if not better) translation accuracy.
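The two halves of the task, a declarative XML format and a back-end that compiles it, can be sketched in Python under heavy assumptions. The XML element and attribute names below are invented for illustration and are not an existing Apertium schema. The compiled structure is a trie over whole-token labels (an acyclic automaton whose final states carry weighted outputs), a simplified stand-in for the FST the task asks for, not something built with Apertium's tools. The `weight` attribute is one hypothetical way to rank competing templates.

```python
import xml.etree.ElementTree as ET

# Hypothetical format: <template>, <left>, <right>, <slot>, <w> and the
# weight/tags attributes are invented for this sketch.
SPEC = """
<templates>
  <template weight="1">
    <left><slot n="1" tags="n"/><w lemma="аркылуу" tags="post"/></left>
    <right><w lemma="via" tags="prep"/><slot n="1" tags="n"/></right>
  </template>
  <template weight="2">
    <left><slot n="1" tags="n"/><w lemma="аркылуу" tags="post"/></left>
    <right><w lemma="through" tags="prep"/><slot n="1" tags="n"/></right>
  </template>
</templates>
"""


def tag_string(elem):
    """tags="n gen" -> '<n><gen>'."""
    return "".join(f"<{t}>" for t in elem.get("tags", "").split())


def left_label(elem):
    # On the input side a slot matches any lemma with the given tags.
    return ("*" if elem.tag == "slot" else elem.get("lemma")) + tag_string(elem)


def right_label(elem):
    # On the output side a slot is a back-reference such as [1].
    return (f"[{elem.get('n')}]" if elem.tag == "slot" else elem.get("lemma")) + tag_string(elem)


def new_state():
    return {"arcs": {}, "outputs": []}


def compile_templates(xml_text):
    """Compile every <left> side into a shared-prefix trie; final states carry weighted outputs."""
    root = ET.fromstring(xml_text.strip())
    start = new_state()
    for tpl in root.iter("template"):
        state = start
        for elem in tpl.find("left"):
            state = state["arcs"].setdefault(left_label(elem), new_state())
        output = [right_label(e) for e in tpl.find("right")]
        state["outputs"].append((int(tpl.get("weight", "0")), output))
    return start


def lookup(start, labels):
    """Follow one arc per label; return all outputs, highest weight first."""
    state = start
    for label in labels:
        state = state["arcs"].get(label)
        if state is None:
            return []
    return [out for _, out in sorted(state["outputs"], key=lambda wo: -wo[0])]


trie = compile_templates(SPEC)
print(lookup(trie, ["*<n>", "аркылуу<post>"]))
# [['through<prep>', '[1]<n>'], ['via<prep>', '[1]<n>']]
```

Here `lookup` expects the caller to supply abstract labels such as `*<n>`; a real matcher would explore both the lemma-specific arc and the wildcard arc at each step, and would need extra machinery for discontiguous multiwords.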
Why? (rationale)
A templatic bidix forces the designer of a language pair to be more explicit, and it also consolidates pair development. Furthermore, such a system could deal with several types of phenomena that are currently highly problematic.