User:Jonasfromseier/GSoC 2013 Application: "Danish-Norwegian (Bokmål) language pair"
Why are you interested in machine translation?
For me, MT is one of the most exciting disciplines in linguistics. The crossroads between natural languages and formal languages is a source of wonder to me. How can we boil down the unlimited human capacity for creating and understanding new sentences to a set of formal rules? The task is challenging. From a functional-cognitive linguistic perspective, I'm interested in finding the translations that best represent the function of the original sentence. Areas of particular interest to me are the translation of metaphors and idioms and the ways these differ across the world's languages.
Why are you interested in the Apertium project?
The idea of free translation seems as natural to me as the idea of free speech and free Internet access for everyone. Information in any language should be effortlessly accessible to anyone. Apertium's open-source philosophy fits the bill perfectly.
Why should Google and Apertium sponsor it?
Adding a new language pair to the Apertium trunk is of value to small language communities. Danish and Norwegian each have a limited number of speakers, and the smaller the language community, the less funding is normally allocated to Natural Language Processing tools. Sponsoring my development of an nb-da language pair will yield a substantial result for a modest investment.
Who will it benefit in society, and how?
The language pair will help Norwegian students learning Danish, and vice versa, in their learning process. Although the two languages share many cognates, the differences between them are plentiful.
Which of the published tasks are you interested in? What do you plan to do?
I plan to work on the nb-da language pair, which is currently in the nursery stage. I will write a set of transfer rules and work on a Constraint Grammar to improve the tagging process. I will mainly work on the nb-da direction, since Danish is my first language and I can make reliable grammaticality judgments in it.
Coding challenge
For the coding challenge I've worked on the nb-da language pair that already exists in the Apertium nursery. The nursery version contains only limited monolingual dictionaries (monodices) and a limited bilingual dictionary (bidix), with no transfer rules, so I've been working mainly on the transfer rule file (.t1x).
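To illustrate the kind of structural difference an nb-da transfer rule has to handle, here is a minimal Python sketch. It is not written in Apertium's .t1x XML format and is not one of the rules in the nursery pair. It assumes each lexical unit is already a (lemma, tags) pair after bilingual lookup, using Apertium-style tag names such as det, adj, n, def and ind; the function name drop_double_definiteness is hypothetical. Norwegian Bokmål marks definiteness twice when an adjective precedes the noun (det store huset), while Danish marks it only on the article (det store hus), so the sketch switches the noun to its indefinite form in that context.

```python
from typing import List, Tuple

# Hypothetical representation: (lemma, tags) after bilingual lookup.
LexUnit = Tuple[str, List[str]]


def drop_double_definiteness(units: List[LexUnit]) -> List[LexUnit]:
    """Turn nb 'det store huset' into da 'det store hus' (sketch only).

    If a determiner is followed by one or more adjectives and then a
    definite noun, the noun's 'def' tag is replaced with 'ind'.
    Only the adjective case is covered here.
    """
    # Copy the tag lists so the caller's input is left unchanged.
    out = [(lemma, list(tags)) for lemma, tags in units]
    i = 0
    while i < len(out):
        if "det" in out[i][1]:
            j = i + 1
            # Skip over any adjectives following the determiner.
            while j < len(out) and "adj" in out[j][1]:
                j += 1
            # Require at least one adjective, then a definite noun.
            if j > i + 1 and j < len(out) and "n" in out[j][1] and "def" in out[j][1]:
                noun_tags = out[j][1]
                noun_tags[noun_tags.index("def")] = "ind"
            i = j
        else:
            i += 1
    return out


sentence = [("det", ["det", "def", "nt", "sg"]),
            ("stor", ["adj", "def", "sg"]),
            ("hus", ["n", "nt", "sg", "def"])]
print(drop_double_definiteness(sentence))
# [('det', ['det', 'def', 'nt', 'sg']), ('stor', ['adj', 'def', 'sg']), ('hus', ['n', 'nt', 'sg', 'ind'])]
```

In the real pair this logic would be expressed as a pattern and action in the .t1x file rather than as Python; the sketch only shows the input/output behaviour such a rule is meant to have.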
Community Bonding Period
In the run-up to the bonding period I've been active on the Apertium IRC channel. The community has been very welcoming, and it's been great having a 20-hour help desk full of helpful people. I will continue to be as active as possible during the bonding period.
Work plan
Week | Plan |
---|---|
Week 1 | Identification of Norwegian-Danish morphological and syntactic differences with the aid of grammar books (Norsk referansegrammatikk and Dansk for Norsker), supplemented by the study of parallel nb-da corpora and Norwegian texts. |
Week 2 | Design and preliminary testing of transfer rules. |
Week 3 | Testing and debugging of transfer rules. |
Week 4 | Correction of transfer rule errors. |
Deliverable #1 | Complete set of transfer rules for the nb-da direction. |
Week 5 | Begin work on the Constraint Grammar. Porting of the Oslo-Bergen tagger. |
Week 6 | Design of the Constraint Grammar. |
Week 7 | Testing of the Constraint Grammar. |
Deliverable #2 | nb-da pair with Danish Constraint Grammar. |
Week 8 | Extension of the bilingual dictionary and monodices to include the Nynorsk variant of Norwegian. |
Week 9 | Testvoc. |
Week 10 | Testvoc. |
Week 11 | Debugging of transfer rules and CG. |
Week 12 | Cleanup and dissemination. |
List your skills and give evidence of your qualifications
I'm currently a second-year BA student in Linguistics at Copenhagen University. Through my work and studies I've gained the following skills:
Linguistics: semantics, pragmatics, grammatical analysis, and analysis of the morphology and syntax of exotic languages.
Danish language: I'm a Danish language instructor at the respected Danish language institute Studieskolen. Through my work I've gained valuable insights into Danish morphology and syntax, and specifically into the problems that arise when the typology of a foreign language is carried over into translation to Danish.
Programming skills: I've attended a programming class at Copenhagen University, gaining knowledge of Python programming, and finished the class with top grades. Apart from the class, I've taken the Coursera online object-oriented programming class and done extensive reading on Natural Language Processing. Furthermore, I do HTML and CSS web development for my own websites.
My non-Summer-of-Code plans for the Summer
I'm taking the summer off work, so I will be free of any work commitments apart from GSoC.
--Jonasfromseier 13:54, 27 April 2013 (UTC)