User:MaryX/



Why is it that you are interested in machine translation?

I've long been interested in languages, in how they're structured, and in how they relate to one another, and machine translation strikes me as a wonderfully sensible way to approach those questions.

Why is it that you are interested in the Apertium project?

My interest in Apertium follows from my interest in machine translation. I came across the project via the Google Summer of Code list of organizations, and it looked like a way to do something interesting (machine translation) in support of a good cause: I figure having good-quality machine translation readily available for lots of different languages is pretty crucial for making the internet accessible to people all over the world.

Which of the published tasks are you interested in? What do you plan to do?

I plan to build the Hebrew-Arabic language pair, which currently doesn't exist. Hebrew and Arabic have a lot of grammar in common and often have identical syntax, which should make it easier to produce coherent machine translation between the two languages. I will also be able to adapt some of the material from the Maltese-Arabic and Maltese-Hebrew pairs, both of which are currently in staging. A Hebrew-Arabic pair would add to the body of resources for Semitic languages, which is comparatively small despite the large number of speakers. A language pair involving Arabic could also be expanded later to cover various dialects of colloquial Arabic, which would fill a gap, since existing language resources are almost entirely restricted to Modern Standard Arabic. In terms of benefiting society, improving translation resources between Hebrew and Arabic would help enable more individual dialogue and cultural exchange between Israelis and Arabs, which in turn could have positive repercussions for the Middle East peace process.

To create a specific work plan, I was wondering whether there are work plans for similar tasks from previous years that I could look at for ideas and to see how long different aspects of building a language pair took. In the absence of those, my (very sketchy) plan is as follows:

- End of June: Planning period, including finding resources for building dictionaries and finding/translating texts for testing
- First half of July: Input a basic framework that covers the most common grammatical structures
- Second half of July and first half of August: Test the basic framework with a series of texts, adding and making changes as needed
- Second half of August: Expand the dictionaries
- September: Tie up loose ends, write/expand documentation

The test texts can be made to serve as milestones (i.e., I can aim to be able to translate such-and-such text by such-and-such date).

Some of the other projects listed on the "Ideas" page also looked interesting and within my abilities, such as creating an interface for hand-tagging corpora or improving bilingual dictionary induction. If working on one of those (or on developing a different language pair) would be more useful, I'd be open to doing that instead.

Skills and Qualifications

I'm majoring in religious studies and vacillating between a major and a minor in math as well. These aren't the most obviously applicable disciplines, I know, but I've been supplementing the math with computer science classes, and my religious studies major involves a lot of work with languages. I've been learning German since middle school, and in college I've studied Hebrew (both biblical and modern) and Arabic. Last summer I started building a website (to be launched for beta testing this fall, hopefully) that serves as a platform for collaborative grammatical analysis of the Hebrew Bible. Since the grammatical interpretation of the text is often ambiguous, the site allows users to hand-tag words based on their grammatical features and to compare their interpretations with those of other users and published sources. However, I haven't programmed in an open-source project before. I'm in the process of doing the coding challenge from the wiki, and I'll send you a link to what I've done some time in the next couple of days!

Summer Plans

I will be traveling June 1 - June 20, and I return to school for the fall at the start of September. The time in between is currently completely unscheduled, so I can devote as much of my time to the project as is needed. (I will probably wind up doing a few other small things as well, but these can be scheduled around the project.) Once I start classes I will have much less free time, so my general idea is to get almost all of the work done before then, and to use those last few weeks for tying up loose ends and making sure everything is documented.

Contact Info

I'm reachable via this wiki, via the #apertium IRC channel (also as MaryX), and by email at miss [dot] mary [dot] x [at] gmail [dot] com.