User:Kanmuri/GSoC 2010 Application/Easy Dictionary Maintenance

From Apertium
Revision as of 11:26, 6 April 2010 by Kanmuri (talk | contribs) (Created page)

Name: Stephen Tigner

E-mail address: stephen.tigner@gmail.com

Other information that may be useful to contact you: Kanmuri on Freenode IRC (Will provide other contact information directly and privately to project mentors. I don't wish to post such publicly, however.)

Why is it you are interested in machine translation?

The promise of machine translation (along with the internet) is that eventually all the world's information and knowledge will be available to everyone, regardless of location, nationality, or language. I want to help fulfill that promise, as it were.

Being able to communicate with people across the language barrier, and helping to break that barrier down, also interests me. I'm certain that there are people throughout the world who share my interests, but we may never be able to connect with each other because of the language barrier.

Why is it that you are interested in the Apertium project?

Well, because I'm trying to decide what field in computer science I want to pursue further. One field that had somewhat piqued my interest was computational linguistics, but I know pretty much nothing about linguistics, so I hoped to learn during the process of working with Apertium, to help me know if it's really the path I want to pursue or not.

Also, as I said above, Apertium is working toward fulfilling the promise of machine translation.

Qualifications and previous experience

I have my BS in CS (I did my final project in Java), and am currently studying abroad in Japan for a year, returning home (to the US) in late May. My programming experience has been mostly in-class work and small utilities or scripts I've written for my own use.

One exception was a "MUCK emulation" layer for a MOO. Basically it was creating a set of utilities that were written from scratch to emulate the operation as well as look and feel of some of the most common commands and utilities on MUCKs. This involved digging into the MOO core, understanding how it worked, and extending it as needed to support the MUCK style of operation, while not breaking the existing MOO-style utilities for users who wanted to use them.

(If you don't know what I'm talking about, that's fine. They're basically text-based shared user environments. MOO is an object-oriented language for creating these shared environments. The term refers to both the language and the environment itself. MUDs, MUCKs, MUSHes, MOOs, etc., are the predecessors of today's MMOs, and many still see somewhat active use even today.)

As for open-source contributions, I haven't really contributed to open-source projects before, unfortunately.

Which of the published tasks are you interested in? What do you plan to do?

2) Easy dictionary maintenance

This would be a tool to allow for easy editing and addition of words (with inflections) to Apertium dictionaries. The plan would be to provide a localizable Swing-based GUI for that purpose. The goal would be to make an interface that would be usable by those who would like to contribute by adding words, but are otherwise put off by the complexity of editing XML files by hand.

This would encourage more development of, and contribution to, dictionary files. My impression has been that adding words takes a significant amount of time, so the more people who are able to add words, the better. Removing the barrier to entry created by having to understand the XML file format would, I believe, significantly increase the likelihood that people will contribute.
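To make the XML barrier concrete, here is a minimal sketch of what such a tool might generate behind the scenes for one new word. It assumes the usual Apertium monolingual dictionary entry shape (an `e` element with an `lm` lemma attribute, an `i` element holding the stem, and a `par` element naming an inflection paradigm). The class name `DixEntrySketch`, the method `buildEntry`, and the paradigm name `house__n` are hypothetical and only for illustration. It uses the standard Java XML APIs rather than dixtools, whose API is not described here.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DixEntrySketch {

    // Builds one dictionary entry and returns it as an XML string.
    // Element and attribute names are an assumption about the entry format.
    static String buildEntry(String lemma, String stem, String paradigm) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        Element entry = doc.createElement("e");
        entry.setAttribute("lm", lemma);

        Element identity = doc.createElement("i");
        identity.setTextContent(stem);
        entry.appendChild(identity);

        Element par = doc.createElement("par");
        par.setAttribute("n", paradigm);
        entry.appendChild(par);

        doc.appendChild(entry);

        // Serialize without the XML declaration so only the entry is returned.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The user would only type the word and pick a paradigm in the GUI.
        System.out.println(buildEntry("house", "house", "house__n"));
    }
}
```

The point of the design is that a contributor supplies only the word and a paradigm choice; the tool, not the contributor, is responsible for producing well-formed XML.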

The reason for making it localizable is, of course, to allow those who know only their (non-English) native language, or who know a few languages but not English, to contribute to dictionaries as well.
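One common way to keep a Java GUI localizable is to look up every user-visible string by key instead of hard-coding it. The sketch below illustrates that idea with `java.util.PropertyResourceBundle`, loading the translations from in-memory strings so the example is self-contained; a real tool would more likely ship `.properties` files. The class name `LabelsSketch`, the keys, and the Spanish strings are hypothetical, and the explicit English fallback is a simplifying assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.MissingResourceException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class LabelsSketch {

    // Parses key=value lines into a resource bundle.
    static ResourceBundle load(String properties) throws IOException {
        return new PropertyResourceBundle(new StringReader(properties));
    }

    // Returns the localized label, or the fallback label if no translation exists.
    static String label(ResourceBundle localized, ResourceBundle fallback, String key) {
        try {
            return localized.getString(key);
        } catch (MissingResourceException e) {
            return fallback.getString(key);
        }
    }

    public static void main(String[] args) throws IOException {
        ResourceBundle english = load("add.word=New word\nsave=Save\nparadigm=Paradigm\n");
        // Deliberately incomplete translation: "paradigm" is missing.
        ResourceBundle spanish = load("add.word=Nueva palabra\nsave=Guardar\n");

        System.out.println(label(spanish, english, "add.word"));
        System.out.println(label(spanish, english, "paradigm"));
    }
}
```

With this arrangement a partially translated interface still works: untranslated labels fall back to English rather than failing.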

I did the stem-checking challenge on the Apertium wiki, so I'm fairly confident I can use dixtools as a library for reading and writing the XML files, and that I'll be focusing mostly on presentation and user interaction issues.

Preliminary Schedule (Week of [Monday]):

May


May 24th: Not available (will be returning from Japan, settling back in, and then attending an event)

May 31st: Clarify requirements, work on UI design and mockups, get feedback.

June


June 7th: Continuation of the previous week.

June 14th: Start coding UI based on design and feedback.

June 21st: Continue coding.

June 28th: Have initial working prototype.

This prototype may be as ugly as sin and slightly clunky in places, but it should actually *work*.

July


July 5th: Start work on internationalization.

This should hopefully have been happening all along, but this is the time to go back through the code and make sure there aren't any assumptions that would break localizability, or any text that should be localized but hasn't been set up to be.

July 12th: Mid-term evaluation, continuing internationalization.

July 19th: Begin work on cleaning up the interface, making it look nicer. Work on enhancing the user experience with the tool.

July 26th: Continue UI cleanup.

August


Aug 2nd: Begin work on documentation. Code should have Javadoc throughout, which will make developer documentation easier. User documentation may be a bit more difficult.

Aug 9th: Stop additional coding. Clean up code and documentation.

Aug 16th: Everything finished by this date. Final evaluation.