User:Sphinx/Application for "Adopt a language pair" GSOC 2013


Contact information

Name: Yishan Jiang

Email: yishanj13@gmail.com

IRC: sphinx

Github repo: https://github.com/sphinx-jiang

Tasks and proposed ideas

Adopt a language pair: zh_CN-zh_TW (Simplified Chinese to Traditional Chinese)

Related tasks:

  • Writing linguistic data, including morphological rules and transfer rules, which are specified in a declarative language.
  • A Constraint Grammar will be written if necessary.

Proposed idea

To create a translator consisting of:

  • Morphological dictionaries for Simplified Chinese and Traditional Chinese: apertium-zh_CN-zh_TW.zh_CN.dix, apertium-zh_CN-zh_TW.zh_TW.dix
  • Bilingual dictionary: apertium-zh_CN-zh_TW.dix
  • Transfer rules: apertium-zh_CN-zh_TW.zh_CN-zh_TW.t1x, apertium-zh_CN-zh_TW.zh_TW-zh_CN.t1x

The translator should be testvoc clean and reach a coverage of around 80% or more on a range of free corpora.
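The coverage target can be checked with a small script. The following is a minimal sketch, assuming the analyser output is in Apertium stream format and that unknown words appear as `^word/*word$`. The function name `coverage`, the sample sentence, and its tags are illustrative assumptions, not part of the proposal, and escaped characters in the stream are not handled.

```python
import re

# Matches one lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (simplified: backslash-escaped characters are not handled).
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def coverage(analysed_text: str) -> float:
    """Return the fraction of lexical units that received at least one analysis.

    An unknown word is assumed to be marked with a leading asterisk,
    e.g. ^word/*word$.
    """
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


# Illustrative analyser output: two known words and one unknown word.
sample = "^我/我<prn>$ ^喜欢/喜欢<vblex>$ ^苹果/*苹果$"
print(f"{coverage(sample):.2%}")
```

Running the analyser over a corpus and passing its output to a function like this gives a single number to compare against the 80% goal.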

Work plan

Task                          Coding hours   TDD hours
Apertium stream format tool   30-40          20-30
Closed-class words            60-70          40-50
Open-class words              40-50          20-30
Transfer rules (basic)        40-50          20-30
Transfer rules (supplementary) 30-40         20-30
CG and ambiguity reduction    50-60          30-40

There are 3 periods of 4 weeks each. The workload for the 25th to 28th weeks of the year is 105h, and for each of the other periods it is 140h.

TDD = Test/Debugging/Documentation
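The "Apertium stream format tool" item in the work plan could start from something like the sketch below. It assumes the lookup table maps a surface form to a list of (lemma, tags) analyses and that the text is already segmented into tokens. The names `LOOKUP` and `to_stream` and the table contents are hypothetical.

```python
from typing import Dict, List, Tuple

Analysis = Tuple[str, List[str]]  # (lemma, list of tags)

# Hypothetical lookup table, for illustration only.
LOOKUP: Dict[str, List[Analysis]] = {
    "电脑": [("电脑", ["n"])],
    "学习": [("学习", ["vblex"]), ("学习", ["n"])],
}


def to_stream(tokens: List[str], table: Dict[str, List[Analysis]]) -> str:
    """Render tokens as Apertium stream format lexical units.

    Known tokens become ^surface/lemma<tag>...$ with alternative analyses
    separated by '/'. Unknown tokens become ^surface/*surface$.
    """
    units = []
    for token in tokens:
        analyses = table.get(token)
        if analyses:
            rendered = "/".join(
                lemma + "".join(f"<{tag}>" for tag in tags)
                for lemma, tags in analyses
            )
        else:
            rendered = "*" + token
        units.append(f"^{token}/{rendered}$")
    # Chinese text has no spaces between words, so units are joined directly.
    return "".join(units)


print(to_stream(["学习", "电脑", "好"], LOOKUP))
```

Segmentation of unspaced Chinese text into tokens is a separate problem and is deliberately left out of this sketch.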

GSoC week   Week of the year   Tasks
1           25th               Basic function to convert the lookup table to Apertium stream format 50h, TDD 10h
2           26th               Create closed-class <sdefs> 20h, frequent words added 20h, TDD 20h
3           27th               Create open-class <sdefs> 30h, basic words added 10h, TDD 20h
4           28th               Completion of monodix 20/30h, verify the tag definition files 15/40h
First deliverable
5           29th               Bilingual dictionary completion 40h, TDD 20/40h
6           30th               Basic transfer rules 30/40h, TDD 20/30h
7           31st               Constraint Grammar of existing 30/40h
8           32nd               Stabilize the language pair by regression testing (as a new language pair)
Second deliverable
9           33rd               Separate character transfer added 50h (many papers use this approach), 50/70h
10          34th               Focus on reducing ambiguity 20/40h, CG 30/40h
11          35th               Performance measurement
12          36th               Performance measurement
Finalization
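The schedule above includes creating <sdefs> and completing the bilingual dictionary. As a rough illustration of what that data looks like, the sketch below builds a tiny bilingual dictionary with Python's standard `xml.etree.ElementTree`. The helper name `build_bidix` and the two word pairs are assumptions chosen for illustration; the real dictionaries would normally be written and maintained as .dix files directly.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Tuple


def build_bidix(pairs: Iterable[Tuple[str, str, str]], tags: List[str]) -> ET.Element:
    """Build a minimal bilingual dictionary tree.

    pairs: (zh_CN lemma, zh_TW lemma, tag) triples.
    tags:  symbol names to declare in <sdefs>.
    """
    root = ET.Element("dictionary")
    ET.SubElement(root, "alphabet")
    sdefs = ET.SubElement(root, "sdefs")
    for tag in tags:
        ET.SubElement(sdefs, "sdef", n=tag)
    section = ET.SubElement(root, "section", id="main", type="standard")
    for left, right, tag in pairs:
        entry = ET.SubElement(section, "e")
        pair = ET.SubElement(entry, "p")
        left_el = ET.SubElement(pair, "l")
        left_el.text = left
        ET.SubElement(left_el, "s", n=tag)
        right_el = ET.SubElement(pair, "r")
        right_el.text = right
        ET.SubElement(right_el, "s", n=tag)
    return root


# Illustrative entries: one pure character conversion and one lexical difference.
bidix = build_bidix([("电脑", "電腦", "n"), ("软件", "軟體", "n")], ["n"])
print(ET.tostring(bidix, encoding="unicode"))
```

The second pair shows why a bilingual dictionary is useful here beyond character-by-character conversion: the two variants sometimes use different words, not just different characters.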

Coding challenge

  • Install Apertium (see Minimal installation from SVN)
  • Go through the HOWTO
  • Go through the MT course here (or here)
  • Write a translator that translates as much of this story as possible (minimum one sentence). (Other translations of the story are here.)

If there is no translation, translate it into the languages of your language pair first.

  • Upload your work to Apertium SVN.

Uploaded here: https://github.com/sphinx-jiang/apertium_language-pair

Here is the automorf.bin output for sentence one: Testm1.png

Here is the translation output for sentence one: Tests1.png