From Apertium
Revision as of 04:45, 3 April 2010 by Deepakjoy (talk | contribs)

GERIAOUEG rebuild
Name: Deepak Joy Cheenath
Contact Email:
IRC nickname: deepakjoy
Website:
I am a 3rd year student doing an integrated MSc(tech) Information Systems at the Birla Institute of Technology and Science (BITS), Pilani.

My aim for Geriaoueg
Aim: Create a new web interface for Apertium that lets any user harness the power of Apertium on the internet, without any kind of setup. This would help popularize Apertium and so attract support for its development. I would also prepare it for installation on the local servers of institutions that work with people from around the world (such as universities and companies).

Features planned to achieve this?
Here are the features I plan to implement:
1.) A sleek, unobtrusive (drop-down) and convenient UI to give Geriaoueg the facelift it badly requires. 
2.) Modification of apertium-deshtml to handle broken web code better.
3.) Automatic language detection using libtextcat, to make it even more of a single-click service. This feature might slow down Geriaoueg, and so would be optional.
4.) Automatic hyperlink conversion, so that once a user starts browsing with Geriaoueg, they need not paste each link into the Geriaoueg address input. Link addresses will automatically be modified so that they send the link data to Geriaoueg.
5.) Shortform expansion: The web is filled with shorthand and abbreviations. I would create functionality that consults a list of “shortforms:fullforms” for each language whenever an unknown word is encountered. Any user could then add to these lists, which could prove very useful in the future.
6.) Stepwise GUI for installing and configuring (i.e. an admin module), so that users can easily set up Geriaoueg on local servers. The admin module would include features for easy updating and for adding or removing language pairs. This would be especially useful for organizations like universities, which would like to provide faster Apertium translations from their locally hosted server (for foreign students etc.)
7.) WAP site interface for easy use through mobile phones with smaller screens and slower connection speeds.
8.) User suggestions. I would include a method for users to point out discrepancies in the translation, which would help us to iron out bugs in translation.
9.) Usage statistics using an open source tool like TraceWatch. It would be useful for developers to understand how and why people use Geriaoueg.
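
As a rough illustration of feature 4 above, the following Python sketch rewrites the links in a fetched page so that they point back at the translation front end. The endpoint address, the `url` and `pair` query parameter names, and the function name are all assumptions made for the example; the proposal does not fix any of them. It is a minimal sketch using a regular expression, so unquoted `href` values are left untouched.

```python
import html
import re
from urllib.parse import urljoin, urlencode

# Hypothetical address of the Geriaoueg front end (an assumption, not from the proposal).
GERIAOUEG_ENDPOINT = "http://example.org/geriaoueg/index.php"

# Matches the href attribute of an <a> tag when the value is quoted.
HREF_RE = re.compile(
    r'(<a\b[^>]*?\bhref\s*=\s*)(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)


def rewrite_links(markup, page_url, pair, endpoint=GERIAOUEG_ENDPOINT):
    """Return `markup` with every quoted <a href> routed through `endpoint`."""

    def replace(match):
        prefix, quote, target = match.groups()
        # Leave in-page anchors and non-HTTP schemes as they are.
        if target.lower().startswith(("#", "mailto:", "javascript:")):
            return match.group(0)
        # Resolve relative links against the address of the original page.
        absolute = urljoin(page_url, html.unescape(target))
        # "url" and "pair" are hypothetical parameter names.
        query = urlencode({"url": absolute, "pair": pair})
        new_target = html.escape(f"{endpoint}?{query}", quote=True)
        return f"{prefix}{quote}{new_target}{quote}"

    return HREF_RE.sub(replace, markup)
```

Called as `rewrite_links(page_source, "http://example.com/dir/page.html", "cy-en")`, each followed link would carry the original address and the language pair back to the front end, so the user never has to paste an address again.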

What will I ensure?
1.) Compatibility and uniformity across all major browsers (Firefox, IE, Safari, Opera)
2.) Easy updating: A complete GUI for quickly updating the dictionaries and adding new language pairs (already implemented). This could easily be automated as well.
3.) Handling broken html and other code fragments that might mangle the translation.
4.) Thorough documentation on the wiki and comments in the code, to ensure further development of Geriaoueg.
5.) Internationalized interface, supporting (almost) all languages, so that everyone can at least see whether their language is currently supported or not.

Work plan
Bonding period:
- Create wiki page for Geriaoueg and put up my ideas for its development.
- Discuss various possibilities with fellow members.
- Learn about the Apertium tool.
- Start basic development on some modules.
- Explore the other tools available on the internet.

Week 1
- Work on a new web interface for Geriaoueg.
- Update all languages for the Geriaoueg tool online.
Deliverables: Better Geriaoueg interface.

Week 2
- Work on apertium-deshtml, or do some pre-processing, to improve its handling of broken code.
- Create link conversion code (Point 4 in features list).
Deliverables: Broken code resolution.
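
One possible shape for the pre-processing step in Week 2 is a pass that balances tags before the page reaches apertium-deshtml. The Python sketch below is only an illustration under that assumption: it uses the standard library's tolerant `html.parser`, closes elements that were left open, and drops closing tags that have no matching opening tag. The names `TagBalancer` and `balance_tags` are invented for the example, and this is not how apertium-deshtml itself works.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class TagBalancer(HTMLParser):
    """Re-emit a page while keeping a stack of currently open elements."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []    # pieces of the repaired page
        self.stack = []  # names of elements that are still open

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: nothing to track.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag: drop it
        # Close everything opened after `tag`, then `tag` itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def balance_tags(markup):
    """Return `markup` with unclosed elements closed and stray end tags removed."""
    parser = TagBalancer()
    parser.feed(markup)
    parser.close()
    while parser.stack:
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)
```

For instance, `balance_tags("<p>Hello <b>world</p>")` closes the bold element before the paragraph ends, so the translated text is not wrapped in a tag that runs on through the rest of the page.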

Week 3 and 4
- Integrate libtextcat as an optional feature for automatic language detection. This would include some code to speed up the process, such as reordering the list of candidate languages based on user details (for example, the IP address indicates location) and on the link name (which might give away the language). Usage statistics could be analyzed to further improve this algorithm.
- Once a language has been identified for a user, further pages are checked for that language first.
- Write documentation.
Deliverables: A web optimized version of libtextcat.
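
The reordering idea in Weeks 3 and 4 could look roughly like the Python sketch below. It does not call libtextcat; it only decides the order in which candidate languages would be tried. The scoring weights, the `TLD_HINTS` table, and the function name are assumptions made for illustration, and the IP-based location hint mentioned above is left out for brevity.

```python
from urllib.parse import urlparse

# Hypothetical mapping from a top-level domain to a likely language code.
TLD_HINTS = {"es": "es", "fr": "fr", "cat": "ca", "pt": "pt"}


def order_candidates(languages, last_detected=None, url=None, usage_counts=None):
    """Return `languages` reordered so that the likeliest ones are tried first."""
    usage_counts = usage_counts or {}
    total = sum(usage_counts.values()) or 1

    tld = ""
    if url:
        host = urlparse(url).hostname or ""
        tld = host.rsplit(".", 1)[-1]

    def score(lang):
        points = 0.0
        if lang == last_detected:
            points += 100  # the language already found for this user wins
        if TLD_HINTS.get(tld) == lang:
            points += 10   # the link name hints at this language
        # Past usage acts as a tie-breaker (always at most 1 point).
        points += usage_counts.get(lang, 0) / total
        return points

    # sorted() is stable, so languages with equal scores keep their order.
    return sorted(languages, key=score, reverse=True)
```

With `order_candidates(["en", "es", "fr", "ca"], last_detected="fr", url="http://www.example.es/noticias")`, French is tried first because it was detected earlier, Spanish second because of the `.es` domain, and the rest keep their original order.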

Week 5 and 6
- Write code for actual translation.
- Write code for hoverboxes. Provide an option for single-word or sentence-long translation boxes. If possible, allow highlighted portions to be translated.
- Page-flip (new!): This would load the translated page and allow toggling between the original and the translation with a mouse-click.
Deliverables: Complete-page translation, hoverboxes and page-flip translations.

Week 7
- Compile a list of commonly used shortforms in English (and a few other languages if possible). These can be gathered from various lists on the internet.
- Create shortform to fullform conversion code which refers to these files.
Deliverables: Shortform resolution code and database.
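
A minimal Python sketch of the shortform lookup planned for Week 7 is given below. It assumes one “shortform:fullform” entry per line, as described in the features list; skipping blank lines and lines starting with `#` is an extra assumption, and `is_known` is a hypothetical stand-in for whatever check reports that the translator does not recognize a word.

```python
def load_shortforms(lines):
    """Build a lookup table from lines of the form 'shortform:fullform'."""
    table = {}
    for line in lines:
        line = line.strip()
        # Skip blanks, comments (an assumed convention) and malformed lines.
        if not line or line.startswith("#") or ":" not in line:
            continue
        short, full = line.split(":", 1)
        table[short.strip().lower()] = full.strip()
    return table


def expand_shortforms(words, table, is_known):
    """Replace unknown words that have a listed full form; keep the rest."""
    return [
        word if is_known(word) else table.get(word.lower(), word)
        for word in words
    ]
```

For example, with a table loaded from `["btw:by the way", "u:you"]` and a checker that only knows the word "are", the input `["btw", "u", "are", "gr8"]` becomes `["by the way", "you", "are", "gr8"]`. The last word stays as it is because no full form is listed for it.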

Week 8
- Thorough testing of all modules.
- Bug-fixing.
- Check for browser compatibility.
- Documentation.
Deliverables: More robust code.

Week 9
- Create admin module quick installation.
- Create admin module for updating and language addition.
- Create input for user feedback/corrections.
Deliverables: Admin modules

Week 10
- Enable usage statistics.
- Set up new Geriaoueg on server for testing.
Deliverables: Complete working installation

Week 11
- Ask others to use the tool and collect feedback.
- Identify problems and try fixing them.
Deliverables: Improved Geriaoueg

Week 12
- Wrap Up
- Complete Documentation

Why I like Apertium and the Geriaoueg project.
Over the past year I’ve become interested in the fields of machine learning, natural language processing and the semantic web. I find it interesting to explore the huge potential that these fields together have in spreading the use of computers to new frontiers. I have been working on projects involving compilers, and have experience in coding lexical analyzers and parsers as well as developing finite state automata. I am currently working on a project on natural language processing.
I found the Apertium project to be of great academic value to me, since it involves many of the subjects I want to pursue further studies in. Right now I have limited experience in these fields, and working on a well-developed project like Apertium could give me many insights. I also found the user community to be very helpful and friendly, and think that this would be a good place for someone like me to learn and contribute.
Though I may not be very experienced in the field of machine translation, I have a lot of solid experience working with PHP, C++, HTML, CSS, Javascript and AJAX. I have done numerous projects in my college and even as an intern. These include a web-based Daily Routine Management System, an ecommerce website (customization), work on a registration website of my college, and numerous HTML websites for various organizations and individuals. I have a keen interest in UI design, and take great care in ensuring usability and speed. This is why I thought the Geriaoueg project would be great for me, since it would allow me to ease into the field of machine learning, while still allowing me to do a good amount of development.

Apertium encourages the use of different languages by eliminating the language barrier. I think that being open source is a great strength for Apertium, one that could make it a leading tool, especially for more marginalized languages. Developing countries like India, which have a multitude of ethnic languages, would especially be able to use this tool to save their languages from becoming obsolete as computers and the internet become ubiquitous.

I have no other commitments during the GSoC period and can easily devote at least 30 hours a week.