User:Jonasfromseier/GSoC 2013 Application: "Danish-Norwegian (Bokmål) language pair"

From Apertium

Why are you interested in machine translation?

MT is, for me, one of the most exciting disciplines in linguistics. The crossroads between natural languages and formal languages is a source of wonder to me: how can we boil down the unlimited human capacity for creating and understanding new sentences to a set of formal rules? The task is challenging. From a functional-cognitive linguistic perspective, I'm interested in finding the translations that best represent the function of the original sentence. Areas of particular interest to me are the translation of metaphors and idioms and how these differ across the world's languages.


Why are you interested in the Apertium project?

The idea of free translation seems as natural to me as the idea of free speech and free Internet access. Information in any language should be effortlessly accessible to anyone. Apertium's open-source philosophy fits the bill perfectly.


Why should Google and Apertium sponsor it?

Adding a new language pair to the Apertium trunk is valuable to small language communities. Both languages have relatively few speakers, and the smaller a language community is, the less funding is normally allocated to natural language processing tools for it. Sponsoring my development of the nb-da language pair will yield a significant result for a modest investment.


How will it benefit society, and whom?

The language pair will help Norwegian students learning Danish, and vice versa. Although the two languages share many cognates, the differences between them are plentiful.


Which of the published tasks are you interested in? What do you plan to do?

I plan to work on the nb-da language pair, which is currently in the nursery stage. I will write a set of transfer rules and work on a constraint grammar to improve the tagging process. Time permitting, I'd like to extend coverage to the other written standard of Norwegian, Nynorsk. This should be feasible thanks to the existing nn-nb Apertium language pair.


Coding challenge

For the coding challenge, I have worked on the nb-da language pair that already exists in the Apertium nursery. The nursery version contains only limited monolingual dictionaries (monodices) and a limited bilingual dictionary (bidix), with no transfer rules, so I have been working mainly on the transfer rule file (.t1x).
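To illustrate the kind of nb-to-da difference a transfer rule has to handle: Bokmål uses double definiteness when an adjective precedes the noun (det store huset, "the big house"), whereas Danish marks definiteness only on the article (det store hus). In the pair itself this would be written as a rule in the XML .t1x file; the Python below is only a minimal sketch that mimics the rule's pattern and action on a simplified Apertium-style stream. The tag names (det, adj, n, def, ind), the lemmas and the helper functions are assumptions for illustration, not taken from the nursery files.

```python
import re

# Simplified Apertium-style stream: each lexical unit is written as
# ^lemma<tag1><tag2>...$ (the tag names used here are assumptions).
LU_PATTERN = re.compile(r"\^([^<$]+)((?:<[^>]+>)*)\$")
TAG_PATTERN = re.compile(r"<([^>]+)>")


def parse_stream(stream):
    """Split a stream string into a list of (lemma, [tags]) pairs."""
    return [
        (lemma, TAG_PATTERN.findall(tag_str))
        for lemma, tag_str in LU_PATTERN.findall(stream)
    ]


def format_stream(units):
    """Turn (lemma, [tags]) pairs back into stream notation."""
    return " ".join(
        "^" + lemma + "".join(f"<{tag}>" for tag in tags) + "$"
        for lemma, tags in units
    )


def drop_double_definiteness(units):
    """Hypothetical nb-to-da rule.

    Pattern: determiner + adjective + definite noun (Bokmål "det store huset").
    Action: make the noun indefinite, as in Danish "det store hus".
    """
    out = list(units)
    for i in range(len(out) - 2):
        det_tags, adj_tags = out[i][1], out[i + 1][1]
        noun_lemma, noun_tags = out[i + 2]
        if "det" in det_tags and "adj" in adj_tags and "n" in noun_tags and "def" in noun_tags:
            # Swap the definite tag for the indefinite one, keep the rest.
            out[i + 2] = (noun_lemma, ["ind" if t == "def" else t for t in noun_tags])
    return out


if __name__ == "__main__":
    nb = "^den<det><def><nt><sg>$ ^stor<adj><def><sg>$ ^hus<n><nt><sg><def>$"
    print(format_stream(drop_double_definiteness(parse_stream(nb))))
```

A bare definite noun such as huset falls outside the pattern and is left unchanged, since Danish also uses the suffixed article in that case (huset).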


Community Bonding Period

In the period leading up to community bonding, I have been active on the Apertium IRC channel. The community has been very welcoming, and it has been great having a 20-hour help desk full of helpful people. I will continue to be as active as possible during the bonding period.

Work plan

Week 1: Identification of Norwegian-Danish differences in morphology and syntax with the aid of grammar books (Norsk referansegrammatikk and Dansk for Norsker), supplemented by a study of parallel nb-da corpora.
Week 2: Design of transfer rules.
Week 3: Testing and identification of transfer rule errors.
Week 4: Rectification of transfer rule errors.
Deliverable #1: Complete set of transfer rules for the nb-da pair.

Week 5: Start work on the Constraint Grammar: porting of the Oslo-Bergen Tagger and identification of problems.
Week 6: Work on the Constraint Grammar.
Week 7: Testing of the Constraint Grammar.
Deliverable #2: nb-da pair with Constraint Grammar.

Week 8: Extension of the bilingual dictionary and monodices.
Week 9: Error identification and analysis.
Week 11: Error rectification.
Week 12: Cleanup and dissemination.




List your skills and give evidence of your qualifications

I'm currently a second-year BA student in Linguistics at Copenhagen University. Through my work and studies I've gained the following skills:

Linguistics: semantics, pragmatics, grammatical analysis, and analysis of the morphology and syntax of exotic languages.

Danish language: I'm a Danish language instructor at the respected Danish language institute Studieskolen. Through my work I've gained valuable insights into Danish morphology and syntax, and specifically into the problems that arise when structures from typologically different languages are translated into Danish.

Programming skills: I've completed a programming course at Copenhagen University, where I gained knowledge of Python programming, and finished the course with an A. I also write HTML and CSS for my own websites.


My non-Summer-of-Code plans for the Summer

I'm taking the summer off work, so I will be free of any commitments apart from GSoC.