Difference between revisions of "User:Sphinx/Application for "Adopt a language pair" GSOC 2013"

Revision as of 09:29, 26 April 2013

Contact information

Name: Yishan Jiang

Email: yishanj13@gmail.com

IRC: sphinx

GitHub repo: https://github.com/sphinx-jiang

Tasks and proposed ideas

Adopt a language pair: zh_CN-zh_TW (Simplified Chinese to Traditional Chinese)

Related tasks:

  • Writing linguistic data, including morphological rules and transfer rules, which are specified in a declarative language.
  • A Constraint Grammar will be written if necessary.

Proposed idea

To create a translator with

  • Morphological dictionaries for Simplified Chinese and Traditional Chinese: apertium-zh_CN-zh_TW.zh_CN.dix, apertium-zh_CN-zh_TW.zh_TW.dix
  • Bilingual dictionary: apertium-zh_CN-zh_TW.dix
  • Transfer rules: apertium-zh_CN-zh_TW.zh_CN-zh_TW.t1x, apertium-zh_CN-zh_TW.zh_TW-zh_CN.t1x.

which is testvoc clean, and has a coverage of around 80% or more on a range of free corpora.
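The coverage target can be estimated with a naive count over analyser output. The sketch below assumes the text has already been run through the morphological analyser and is in Apertium stream format, where an unknown word carries an analysis that starts with an asterisk. The function name `coverage` and the sample string (including its tags) are hypothetical and only illustrate the calculation.

```python
import re

# One lexical unit looks like ^surface/analysis1/analysis2$.
# Backslash-escaped characters inside a unit are skipped over.
LEXICAL_UNIT = re.compile(r"\^((?:\\.|[^\\$^])*)\$")


def coverage(stream: str) -> float:
    """Return the share of lexical units that have a known analysis."""
    total = 0
    known = 0
    for match in LEXICAL_UNIT.finditer(stream):
        # Split on unescaped slashes: first part is the surface form.
        parts = re.split(r"(?<!\\)/", match.group(1))
        total += 1
        # An analysis starting with '*' marks an unknown word.
        if len(parts) > 1 and not parts[1].startswith("*"):
            known += 1
    return known / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical analyser output: three known units, one unknown.
    sample = "^我/我<prn>$ ^喜欢/喜欢<vblex>$ ^foo/*foo$ ^书/书<n>$"
    print(f"naive coverage: {coverage(sample):.0%}")
```

Running this over several free corpora and comparing each result with the 80% goal would give a rough progress measure; it says nothing about whether the analyses are correct.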

Work plan

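{| class="wikitable" border="1"
|-
! Task !! Coding hours !! TDD hours
|-
| Apertium stream format tool || 30-40 || 20-30
|-
| Closed-class words || 60-70 || 40-50
|-
| Open-class words || 40-50 || 20-30
|-
| Transfer rules (basic) || 40-50 || 20-30
|-
| Transfer rules (supplementary) || 30-40 || 20-30
|-
| CG and ambiguity reduction || 50-60 || 30-40
|}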

The work is split into three periods of four weeks each. The first period (weeks 25 to 28 of the year) has a workload of 105 h, and the other two have 140 h each.

TDD = Test/Debugging/Documentation
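As a quick sanity check on the estimates, the hour ranges in the task table can be summed and set beside the period budgets. This is only a sketch: the task labels are shortened, and reading "the others" as 140 h for each of the two later periods is an assumption.

```python
# (coding low, coding high), (TDD low, TDD high) in hours, from the task table.
WORK_PLAN = {
    "stream format tool": ((30, 40), (20, 30)),
    "closed-class words": ((60, 70), (40, 50)),
    "open-class words": ((40, 50), (20, 30)),
    "transfer rules (basic)": ((40, 50), (20, 30)),
    "transfer rules (supplementary)": ((30, 40), (20, 30)),
    "CG and ambiguity reduction": ((50, 60), (30, 40)),
}

# Assumption: first period 105 h, the two later periods 140 h each.
PERIOD_BUDGET_HOURS = [105, 140, 140]


def total_range(plan):
    """Sum the low and high estimates (coding plus TDD) over all tasks."""
    low = sum(coding[0] + tdd[0] for coding, tdd in plan.values())
    high = sum(coding[1] + tdd[1] for coding, tdd in plan.values())
    return low, high


if __name__ == "__main__":
    low, high = total_range(WORK_PLAN)
    print(f"estimated work: {low}-{high} h")
    print(f"period budget:  {sum(PERIOD_BUDGET_HOURS)} h")
```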

{| class="wikitable" border="1"
|-
! GSoC week !! Week of the year !! Tasks
|-
| 1 || 25th week || Basic function to transfer a lookup table to Apertium stream format 50h, TDD 10h
|-
| 2 || 26th week || Create closed-class <sdefs> 20h, add frequent words 20h, TDD 20h
|-
| 3 || 27th week || Create open-class <sdefs> 30h, add basic words 10h, TDD 20h
|-
| 4 || 28th week || Completion of monodix 20/30h, verify the tag definition files 15/40h
|-
| colspan="3" | '''First Deliverable'''
|-
| 5 || 29th week || Bilingual dictionary completion 40h, TDD 20/40h
|-
| 6 || 30th week || Basic transfer rules 30/40h, TDD 20/30h
|-
| 7 || 31st week || Constraint Grammar of existing 30/40h
|-
| 8 || 32nd week || Stabilize the language pair by regression testing (as a new language pair)
|-
| colspan="3" | '''Second Deliverable'''
|-
| 9 || 33rd week || Add separate character transfer 50h (many papers use this), 50/70h
|-
| 10 || 34th week || Focus on reducing ambiguity 20/40h, CG 30/40h
|-
| 11 || 35th week || Performance measurement
|-
| 12 || 36th week || Performance measurement
|-
| colspan="3" | '''Finalization'''
|}
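One possible reading of the week 1 task is sketched below: a character lookup table is turned into lexical units of the form ^surface/reading1/reading2$, so that a character with several candidate target forms becomes an ambiguous unit that later rules can resolve. The names `LOOKUP`, `escape` and `to_stream`, the tiny table, and the exact set of escaped characters are all assumptions made for illustration, not the tool described in the plan.

```python
# Characters treated as reserved in the stream format and therefore
# backslash-escaped. The exact set is an assumption, chosen conservatively.
RESERVED = set("^$/<>{}\\@#+~*[]")

# Hypothetical lookup table: one source character, one or more target forms.
LOOKUP = {
    "发": ["發", "髮"],
    "后": ["後", "后"],
    "国": ["國"],
}


def escape(text: str) -> str:
    """Backslash-escape reserved characters."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def to_stream(text: str, table: dict) -> str:
    """Emit one lexical unit per character; whitespace passes through."""
    units = []
    for ch in text:
        if ch.isspace():
            units.append(ch)
            continue
        targets = table.get(ch)
        if targets:
            readings = "/".join(escape(t) for t in targets)
        else:
            # Not in the table: mark as unknown with a leading asterisk.
            readings = "*" + escape(ch)
        units.append("^" + escape(ch) + "/" + readings + "$")
    return "".join(units)


if __name__ == "__main__":
    print(to_stream("中国 发", LOOKUP))
```

Working per character keeps the sketch short; a word-level table would need segmentation first, which is outside what this example covers.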

Coding challenge