User:Pyry/Sandbox
Candidate titles:
- Challenges in Finnish to North Sámi rule-based machine translation
- Translating the Bible from Finnish to North Sámi
- Trials and tribulations in Finnish to North Sámi rule-based machine translation
http://www.uoc.edu/freerbmt11/
Submission deadline: Nov 8
(13:38:36) francis: 1) underspecification in omorfi (e.g. cc/cs vs. conj)
(13:39:08) francis: 2) differing grammatical traditions (acc/gen??) merge in omorfi but not in GT
(13:39:28) francis: 3) overgeneration in sme
(13:40:27) francis: 4) sometimes it wasn't clear when words were assigned to defective paradigms (e.g. some pronouns?? didn't decline in some cases)
(13:41:20) francis: btw, we actually had a three way tagset disjunct
(13:41:29) francis: between omorfi, fred's CG and giellatekno
(13:44:30) ryan: one of the larger problems i thought was figuring out what exactly trying to do with compound words
(13:45:18) ryan: sankari probably
(13:45:20) francis: i haveo ne
(13:45:21) ryan: means hero
(13:45:26) ryan: but it ended up with a compound analysis
(13:45:27) ryan: san# kari
(13:45:29) francis: saamelainen
(13:45:32) ryan: ooh, that too
(13:45:43) ryan: even more related ;)
(13:46:18) francis: 6) differing lexicalisation
(13:46:25) francis: kritiserema kritiseret+V+TV+Der3+Der/n+N+Sg+Acc vs. kritisoinnin kritisointi+N+Sg+Gen
Paper
Introduction
- MT from major to minor language
- MT between related languages
- MT between agglutinative closely-related languages: Turkish--{Tatar,Turkmen,...}
- MT between Finno-Ugric / Sámi languages
Languages
- Contrastive analysis of Finnish and North Sámi
- Cases
- Tenses
Implementation
- Tools
- HFST
- Constraint Grammar
- Apertium
- Problematic aspects
- Tagset differences (three-way: Omorfi, Fred's CG and Giellatekno)
- underspecification in omorfi (e.g. cc/cs vs. conj)
- differing grammatical traditions (acc/gen??): merged in omorfi but not in Giellatekno (GT)
- overgeneration in sme (North Sámi)
- it was sometimes unclear when words had been assigned to defective paradigms (e.g. some pronouns?? did not decline in some cases)
- differing lexicalisation ('saamelainen'; 'kritiserema' vs. 'kritisoinnin')
- compound words (e.g. 'sankari', 'hero', analysed as the compound san#kari)
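A minimal sketch of how two of the problems above (differing lexicalisation and spurious compound analyses) could be inspected programmatically. It assumes analyses are plain strings of the form lemma+Tag+Tag, as in the 'kritiserema' example from the notes, and that '#' marks a compound boundary in the lemma. The helper names and the tags attached to 'sankari' in the usage lines are illustrative assumptions, not part of omorfi, Giellatekno or Apertium.

```python
from typing import NamedTuple


class Analysis(NamedTuple):
    """One morphological analysis: a lemma plus its ordered tags."""
    lemma: str
    tags: tuple[str, ...]


def parse_analysis(text: str) -> Analysis:
    """Split a 'lemma+Tag+Tag' string into lemma and tags (assumed format)."""
    lemma, *tags = text.split("+")
    return Analysis(lemma, tuple(tags))


def is_derived(analysis: Analysis) -> bool:
    """True if any tag looks like a derivation tag (assumed 'Der' prefix)."""
    return any(tag.startswith("Der") for tag in analysis.tags)


def prefer_non_compound(analyses: list[Analysis]) -> list[Analysis]:
    """Drop compound readings ('#' in the lemma) when a simple reading exists."""
    simple = [a for a in analyses if "#" not in a.lemma]
    return simple if simple else analyses


# Differing lexicalisation: a derived verb on one side, a plain noun on the other.
sme = parse_analysis("kritiseret+V+TV+Der3+Der/n+N+Sg+Acc")
fin = parse_analysis("kritisointi+N+Sg+Gen")
print(is_derived(sme), is_derived(fin))

# Compound problem: the tags on 'sankari' here are hypothetical.
candidates = [parse_analysis("san#kari+N+Sg+Nom"), parse_analysis("sankari+N+Sg+Nom")]
print(prefer_non_compound(candidates))
```

Preferring the non-compound reading is only a heuristic; it would need to fall back to the compound analysis whenever no lexicalised entry exists, which is what the last branch of `prefer_non_compound` does.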
Evaluation
Discussion
- Future work
- Conclusion
References
- Tantuğ, A. C. and Adalı, E. and Oflazer, K. (2007) "An MT system from Turkmen to Turkish employing finite state and statistical methods". Machine Translation Summit XI, Copenhagen, Denmark
- Altıntaş, K. (2001) "Turkish to Crimean Tatar machine translation system". Master's thesis, Bilkent University
- Fatullayev, A. and Shagavatov, S. (2008) "Turkish-Azerbaijani translation module of Dilmanc MT system". The Second International Conference “Problems of Cybernetics and Informatics”, September 10-12, 2008, Baku, Azerbaijan
- Tyers, F. M. and Wiechetek, L. and Trosterud, T. (2009) "Developing prototypes for machine translation between two Sámi languages". Proceedings of the 13th Annual Conference of the European Association for Machine Translation, EAMT09. pp. 120--128
- Wiechetek, L. and Tyers, F. M. and Omma, T. (2010) "Shooting at flies in the dark: Rule-based lexical selection for a minority language pair". Lecture Notes in Artificial Intelligence Volume 6233/2010, pp. 418--429