Difference between revisions of "User:Sphinx/Application for "Adopt a language pair" GSOC 2013"
Revision as of 09:29, 26 April 2013
Contact information
Name: Yishan Jiang
Email: yishanj13@gmail.com
IRC: sphinx
Github repo: https://github.com/sphinx-jiang
Tasks and proposed ideas
Adopt a language pair: zh_CN-zh_TW (Simplified Chinese to Traditional Chinese)
Related tasks:
- Writing linguistic data, including morphological rules and transfer rules, which are specified in a declarative language.
- A Constraint Grammar will be written if necessary.
Proposed idea
Create a translator with:
- Morphological dictionaries for Simplified Chinese and Traditional Chinese: apertium-zh_CN-zh_TW.zh_CN.dix, apertium-zh_CN-zh_TW.zh_TW.dix
- Bilingual dictionary: apertium-zh_CN-zh_TW.dix
- Transfer rules: apertium-zh_CN-zh_TW.zh_CN-zh_TW.t1x, apertium-zh_CN-zh_TW.zh_TW-zh_CN.t1x
The translator should be testvoc clean and reach a coverage of around 80% or more on a range of free corpora.
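The coverage target can be checked with a small script. The following is a minimal sketch, assuming the analyser output is in Apertium stream format, where each lexical unit looks like `^surface/analysis$` and an unknown word is written as `^word/*word$`. The function name `naive_coverage`, the sample sentence, and its tags are illustrative assumptions, not part of the proposal.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def naive_coverage(analysed_text: str) -> float:
    """Return the fraction of lexical units that have a known analysis.

    An unknown word is marked by the analyser as ^word/*word$,
    so its analysis part starts with '*'.
    """
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


# Hypothetical analyser output: four lexical units, one of them unknown.
sample = "^我/我<prn>$ ^电脑/*电脑$ ^好/好<adj>$ ^软件/软件<n>$"
print(naive_coverage(sample))  # 0.75
```

This counts every lexical unit equally and ignores escaped reserved characters, so it only gives a rough figure.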
Work plan
Task | Coding hours | TDD hours |
---|---|---|
Apertium stream format tool | 30-40 | 20-30 |
Closed-class words | 60-70 | 40-50 |
Open-class words | 40-50 | 20-30 |
Transfer rules (basic) | 40-50 | 20-30 |
Transfer rules (supplementary) | 30-40 | 20-30 |
Constraint Grammar (CG) and ambiguity reduction | 50-60 | 30-40 |
The plan has three periods of four weeks each. The workload for the period covering weeks 25 to 28 of the year is 105 h, and each of the other periods is 140 h.
TDD = Test/Debugging/Documentation
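The per-task estimates in the work plan table can be totalled with a few lines of Python. This is only an arithmetic sketch. The figures are copied from the table, and the names `ESTIMATES` and `total_range` are illustrative.

```python
# (task, coding hours range, TDD hours range), copied from the work plan table
ESTIMATES = [
    ("Apertium stream format tool", (30, 40), (20, 30)),
    ("Closed-class words", (60, 70), (40, 50)),
    ("Open-class words", (40, 50), (20, 30)),
    ("Transfer rules (basic)", (40, 50), (20, 30)),
    ("Transfer rules (supplementary)", (30, 40), (20, 30)),
    ("CG and ambiguity reduction", (50, 60), (30, 40)),
]


def total_range(ranges):
    """Sum a list of (low, high) ranges into a single (low, high) range."""
    lows, highs = zip(*ranges)
    return sum(lows), sum(highs)


coding_total = total_range([coding for _task, coding, _tdd in ESTIMATES])
tdd_total = total_range([tdd for _task, _coding, tdd in ESTIMATES])
overall = (coding_total[0] + tdd_total[0], coding_total[1] + tdd_total[1])

print(coding_total)  # (250, 310)
print(tdd_total)     # (150, 210)
print(overall)       # (400, 520)
```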
GSoC week | Week of the year | Tasks |
---|---|---|
1 | 25th week | Basic function to convert a lookup table to Apertium stream format 50h, TDD 10h |
2 | 26th week | Create closed-class <sdefs> 20h, add frequent words 20h, TDD 20h |
3 | 27th week | Create open-class <sdefs> 30h, add basic words 10h, TDD 20h |
4 | 28th week | Completion of monodix 20/30h, verify the tag definition files 15/40h |
First Deliverable | | |
5 | 29th week | Bilingual dictionary completion 40h, TDD 20/40h |
6 | 30th week | Basic transfer rules 30/40h, TDD 20/30h |
7 | 31st week | Constraint Grammar of existing 30/40h |
8 | 32nd week | Stabilize the language pair by regression testing (as a new language pair) |
Second Deliverable | | |
9 | 33rd week | Add separate-character transfer 50h (many papers use this) 50/70h |
10 | 34th week | Focus on reducing ambiguity 20/40h, CG 30/40h |
11 | 35th week | Performance measurement |
12 | 36th week | Performance measurement |
Finalization | | |
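The week 1 task, turning a lookup table into Apertium stream format, could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the planned tool: the `LOOKUP` table, its tags, and the function `to_stream` are hypothetical, the input is assumed to be already segmented into words, and reserved characters are not escaped.

```python
# Hypothetical lookup table: surface form -> list of (lemma, tags) analyses.
LOOKUP = {
    "软件": [("软件", ["n"])],
    "好": [("好", ["adj"]), ("好", ["adv"])],
}


def to_stream(tokens, lookup):
    """Render already-segmented tokens as Apertium stream format lexical units.

    Known tokens become ^surface/lemma<tag>...$ with one slash-separated
    entry per analysis; tokens missing from the table are marked as
    unknown with a leading '*'.
    """
    units = []
    for token in tokens:
        analyses = lookup.get(token)
        if analyses:
            rendered = "/".join(
                lemma + "".join(f"<{tag}>" for tag in tags)
                for lemma, tags in analyses
            )
        else:
            rendered = "*" + token
        units.append(f"^{token}/{rendered}$")
    # Units are joined with a space only for readability in this sketch.
    return " ".join(units)


print(to_stream(["软件", "好", "电脑"], LOOKUP))
# ^软件/软件<n>$ ^好/好<adj>/好<adv>$ ^电脑/*电脑$
```

Keeping the table as plain data separates the linguistic content from the formatting code, so the same entries could later be exported to dictionary files.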