Difference between revisions of "User:Ote/proposal"

From Apertium

Revision as of 12:30, 16 November 2007

Open Translation Engine (OTE) Version 2.0

Proposal for NLnet[1]

Abstract

With this funding, I will produce a web-based system to allow a community of users to cooperatively build translation dictionaries.

While there are many Open Source projects in the field of Machine Translation, there is a lack of tools for the community creation of translation dictionaries. The goal of the Open Translation Engine (OTE)[2] Version 2.0 is to create a robust tool for this space.


Use Cases

How will the OTE be used?

By the Linguist

In the field of Machine Translation, I have chosen the Apertium project as the first external system whose dictionary format the OTE will support.

Apertium is a robust project with many active participants, but it lacks tools that allow multiple developers to collaboratively create the translation dictionaries needed to run Apertium.

Currently Apertium dictionaries are modified by direct editing of multiple XML files. While these XML files are versioned with Subversion, this is not an ideal solution for true community involvement.

The OTE will first be enhanced to support tagging of Parts of Speech (Noun, Verb, Adjective, etc.). This is an important first step toward the second step: Dictionary Export in the proper Apertium XML formats.
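A minimal sketch of what exporting tagged word pairs to an Apertium bilingual dictionary could look like. Only the overall shape (a `dictionary` root with `sdefs` symbol declarations and a `section` of `e`/`p`/`l`/`r` entries carrying `s` tags) follows Apertium's dictionary format; the function name `export_bidix`, the input tuple layout and the part-of-speech mapping are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from OTE part-of-speech labels to Apertium symbols.
# "n", "vblex" and "adj" are common Apertium tags; the mapping is an assumption.
POS_TO_APERTIUM = {"Noun": "n", "Verb": "vblex", "Adjective": "adj"}


def export_bidix(pairs):
    """Build an Apertium-style bilingual dictionary from (source, target, pos) triples."""
    root = ET.Element("dictionary")
    ET.SubElement(root, "alphabet")
    sdefs = ET.SubElement(root, "sdefs")
    # Declare each symbol once, in the order it is first used.
    used = []
    for _, _, pos in pairs:
        sym = POS_TO_APERTIUM[pos]
        if sym not in used:
            used.append(sym)
            ET.SubElement(sdefs, "sdef", n=sym)
    section = ET.SubElement(root, "section", id="main", type="standard")
    for src, tgt, pos in pairs:
        sym = POS_TO_APERTIUM[pos]
        entry = ET.SubElement(section, "e")
        pair = ET.SubElement(entry, "p")
        left = ET.SubElement(pair, "l")    # source-language side
        left.text = src
        ET.SubElement(left, "s", n=sym)
        right = ET.SubElement(pair, "r")   # target-language side
        right.text = tgt
        ET.SubElement(right, "s", n=sym)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(export_bidix([("huis", "house", "Noun"), ("lopen", "walk", "Verb")]))
```

This also shows why tagging has to come first: without a part of speech on each word, there is nothing to put in the `s` elements.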

At the end of this project, I will produce a basic, workable system that integrates with Apertium. A proposal for OTE Version 3.0 will then be submitted for further enhancements (Dictionary Import, support for Transfer Rules and Apertium Paradigms).

By the Student

I started the OTE project to help me learn the Dutch language. ....

By the Developer

A primary goal of this project is to create tools that are easily usable with other Open Source projects.

License

Currently the OTE is under the BSD License. This includes both the source code and the translation dictionaries.

During this project, I will re-evaluate which Open Source license is the best choice, with particular attention to possible differences in needs between the source code and translation dictionaries.


Project

Core System

Conversion to Unicode - The current prototype is not Unicode-aware. Unicode is an absolute necessity to continue the project with all possible world languages.

Genericization of the code base - The OTE will be built to be as generic as possible, allowing for ease of future enhancements. Currently the prototype code is hard-coded with the Dutch/English language pair.

Install Procedure - Installation will be made as user-friendly as possible.

Documentation
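For the Unicode conversion item, one easily missed detail is that the same word can arrive in different Unicode forms (a precomposed "é" versus "e" plus a combining accent), which would otherwise create duplicate dictionary entries. A minimal sketch, assuming the OTE stores words as UTF-8 and normalizes them to NFC before saving or looking them up; the function name `normalize_word` is hypothetical.

```python
import unicodedata


def normalize_word(word):
    """Return a canonical form of a word for storage and lookup.

    Assumption: words are trimmed and normalized to Unicode NFC, so that a
    precomposed "é" and "e" + COMBINING ACUTE ACCENT are stored identically.
    """
    return unicodedata.normalize("NFC", word.strip())


if __name__ == "__main__":
    composed = "\u00e9\u00e9n"        # Dutch "één" with precomposed é
    decomposed = "e\u0301e\u0301n"    # same word, e + combining acute accent
    print(composed == decomposed)                                  # False
    print(normalize_word(composed) == normalize_word(decomposed))  # True
    # Byte length in UTF-8 differs from character length.
    print(len(composed.encode("utf-8")), len(composed))            # 5 3
```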

User System

User accounts

User Permissions

User administration
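The User System could map each account to a role and each role to a set of permissions. The sketch below assumes a simple role-based scheme; the role names, the `Permission` values and the `can` helper are all hypothetical, since the proposal does not define specific roles.

```python
from enum import Enum


class Permission(Enum):
    VIEW = "view"
    EDIT_WORDS = "edit_words"          # words, tags and translation pairs
    EDIT_LANGUAGES = "edit_languages"
    ADMIN_USERS = "admin_users"


# Hypothetical roles; an administrator receives every permission.
ROLE_PERMISSIONS = {
    "guest": {Permission.VIEW},
    "contributor": {Permission.VIEW, Permission.EDIT_WORDS},
    "administrator": set(Permission),
}


def can(role, permission):
    """Return True if the role grants the permission; unknown roles get nothing."""
    return permission in ROLE_PERMISSIONS.get(role, set())


if __name__ == "__main__":
    print(can("contributor", Permission.EDIT_WORDS))      # True
    print(can("contributor", Permission.EDIT_LANGUAGES))  # False
```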

Core Tools

Word Viewer

Dictionary Viewer

Dictionary Export

Formats: CSV, database (MySQL) dump, OTE XML, Apertium XML
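A minimal sketch of the CSV export, assuming one translation pair per row. The column names are hypothetical; carrying language codes in each row, rather than assuming Dutch/English, is in the spirit of the genericization goal. Python's `csv` module takes care of quoting words that contain commas or quotes.

```python
import csv
import io

# Hypothetical column layout: one translation pair per row.
FIELDS = ["source_lang", "source_word", "target_lang", "target_word", "pos"]


def export_csv(pairs):
    """Serialize translation pairs (dicts keyed by FIELDS) to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for pair in pairs:
        writer.writerow(pair)   # values with commas or quotes are quoted automatically
    return buf.getvalue()


if __name__ == "__main__":
    rows = [{"source_lang": "nl", "source_word": "huis",
             "target_lang": "en", "target_word": "house", "pos": "Noun"}]
    print(export_csv(rows), end="")
    # source_lang,source_word,target_lang,target_word,pos
    # nl,huis,en,house,Noun
```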

Dictionary Import

Formats: CSV, OTE XML
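The CSV half of Dictionary Import could read rows back and report incomplete ones instead of silently storing them. A minimal sketch, assuming a header row with the hypothetical column names used here; OTE XML import is not sketched because its schema is not described in this proposal.

```python
import csv
import io

# Hypothetical required columns; "pos" and any extra columns are optional.
REQUIRED = ["source_lang", "source_word", "target_lang", "target_word"]


def import_csv(text):
    """Parse CSV text into (pairs, rejected), where rejected lists bad record numbers."""
    reader = csv.DictReader(io.StringIO(text))
    pairs, rejected = [], []
    # Records are numbered with the header as record 1.
    for record_no, row in enumerate(reader, start=2):
        # Reject rows where any required column is missing or blank.
        if any(not (row.get(col) or "").strip() for col in REQUIRED):
            rejected.append(record_no)
            continue
        # Drop overflow (key None) and missing (value None) cells, trim the rest.
        pairs.append({k: v.strip() for k, v in row.items()
                      if k is not None and v is not None})
    return pairs, rejected


if __name__ == "__main__":
    text = ("source_lang,source_word,target_lang,target_word,pos\n"
            "nl,huis,en,house,Noun\n"
            "nl,,en,walk,Verb\n")
    pairs, rejected = import_csv(text)
    print(pairs)     # one valid pair (huis -> house)
    print(rejected)  # [3]
```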

Word-2-Word Translator
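In its simplest form, a word-to-word translator looks up each word independently and leaves punctuation and spacing untouched. The sketch below assumes a plain in-memory mapping from lowercase source words to target words (`translate_word_by_word` is a hypothetical name). Marking unknown words with `*` mirrors Apertium's convention for unknown words, but here it is only an illustrative choice. The sketch ignores capitalization, multi-word expressions and ambiguity, which is where richer tools such as Apertium's transfer rules come in.

```python
import re


def translate_word_by_word(sentence, dictionary):
    """Replace each word with its translation; prefix unknown words with '*'."""
    def replace(match):
        word = match.group(0)
        translation = dictionary.get(word.lower())
        return translation if translation is not None else "*" + word

    # \w+ matches runs of Unicode letters and digits, so accented words stay whole;
    # everything between matches (spaces, punctuation) is copied unchanged.
    return re.sub(r"\w+", replace, sentence)


if __name__ == "__main__":
    nl_en = {"het": "the", "huis": "house", "is": "is", "groot": "big"}
    print(translate_word_by_word("Het huis is heel groot.", nl_en))
    # the house is *heel big.
```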

Classroom Tools

Random Word, Flash Cards, Word Lists
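The Random Word and Flash Card tools can both be built on a shuffled view of a word list. A minimal sketch, assuming word pairs are available as (source, target) tuples; the function names are hypothetical. Passing in a seeded `random.Random` makes a session reproducible, for example when a teacher wants the whole class to see the same order.

```python
import random


def flash_cards(pairs, rng=None):
    """Yield (prompt, answer) cards in shuffled order, each pair exactly once."""
    if rng is None:
        rng = random.Random()
    deck = list(pairs)   # copy, so the caller's word list is not reordered
    rng.shuffle(deck)
    for source, target in deck:
        yield source, target


def random_word(pairs, rng=None):
    """Pick a single random (source, target) pair from a word list."""
    if rng is None:
        rng = random.Random()
    return rng.choice(list(pairs))


if __name__ == "__main__":
    words = [("huis", "house"), ("kat", "cat"), ("boom", "tree")]
    for prompt, answer in flash_cards(words, random.Random(42)):
        print(f"{prompt} -> {answer}")
```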

Community Tools

Add / Delete / Modify: Individual Word

Add / Delete / Modify: Tagging: Parts of Speech for a Word

Add / Delete / Modify: Translation Word Pairs

Add / Delete / Modify: Languages

Versioning - all words, translation pairs, and tags are versioned, allowing reversion to previous states.
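One simple way to get reversible history is an append-only log per record, where reverting adds a copy of an old state instead of deleting newer ones, so a revert can itself be undone. A minimal in-memory sketch; the `VersionedRecord` class is hypothetical, and a real installation would presumably keep this history in the database.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedRecord:
    """All saved states of one word, translation pair or tag, oldest first."""
    history: list = field(default_factory=list)

    def save(self, value, author):
        """Append a new state and return its revision number (starting at 0)."""
        self.history.append({"value": value, "author": author})
        return len(self.history) - 1

    @property
    def current(self):
        return self.history[-1]["value"] if self.history else None

    def revert(self, revision, author):
        # Copy the old state forward; later revisions stay in the history.
        return self.save(self.history[revision]["value"], author)


if __name__ == "__main__":
    pair = VersionedRecord()
    pair.save(("huis", "house"), "alice")
    pair.save(("huis", "home"), "bob")
    pair.revert(0, "alice")
    print(pair.current)        # ('huis', 'house')
    print(len(pair.history))   # 3
```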


Future Work

Upon successful completion of this project, I will submit a new proposal for OTE Version 3.0.

Possible items:

Further Apertium support

More robust classroom management tools

Support for further public installations of OTE

Integration with more projects... More export formats

Misc....

Comparison of the OTE to current 'translation memory' systems, such as the many gettext (.po file) editors and aggregators.