Difference between revisions of "User:Firespeaker/GSoC2014/Workplan"
Firespeaker (talk | contribs) |
Firespeaker (talk | contribs) |
||
Line 3: | Line 3: | ||
== Schedule == |
== Schedule == |
||
=== Schedule === |
=== Schedule === |
||
− | Dates need to be verified. |
||
{|class="wikitable" |
{|class="wikitable" |
||
Line 25: | Line 25: | ||
| |
| |
||
* tur-uzb bidix to 7000 stems |
* tur-uzb bidix to 7000 stems |
||
− | * make some |
+ | * make some baseline CG for kir, uzb |
* one 200-word kaz-kir text to <10% WER |
* one 200-word kaz-kir text to <10% WER |
||
* one 200-word kir-kaz text to <10% WER |
* one 200-word kir-kaz text to <10% WER |
||
Line 43: | Line 43: | ||
* one 200-word kaz-kir text to <10% WER |
* one 200-word kaz-kir text to <10% WER |
||
* one 200-word tur-kir text to <10% WER |
* one 200-word tur-kir text to <10% WER |
||
+ | * work on kaz CG and lrx |
||
* work on tur CG and lrx |
* work on tur CG and lrx |
||
|- |
|- |
||
Line 50: | Line 51: | ||
* one 200-word uzb-tur text to <10% WER |
* one 200-word uzb-tur text to <10% WER |
||
* work on uzb CG and lrx |
* work on uzb CG and lrx |
||
+ | * work on tur CG and lrx |
||
|- |
|- |
||
! 4 !! 8 - 14 June |
! 4 !! 8 - 14 June |
||
Line 62: | Line 64: | ||
* one 500-word kaz-kir text to <10% WER |
* one 500-word kaz-kir text to <10% WER |
||
* one 500-word tur-kir text to <10% WER |
* one 500-word tur-kir text to <10% WER |
||
+ | * work on kaz CG and lrx |
||
* work on tur CG and lrx |
* work on tur CG and lrx |
||
* continue testvoc nouns for all pairs |
* continue testvoc nouns for all pairs |
||
Line 69: | Line 72: | ||
* one 500-word tur-uzb text to <10% WER |
* one 500-word tur-uzb text to <10% WER |
||
* one 500-word uzb-tur text to <10% WER |
* one 500-word uzb-tur text to <10% WER |
||
+ | * work on tur CG and lrx |
||
* work on uzb CG and lrx |
* work on uzb CG and lrx |
||
* continue testvoc nouns for all pairs |
* continue testvoc nouns for all pairs |
||
Line 109: | Line 113: | ||
|} |
|} |
||
− | === |
+ | === Getting started === |
+ | * make scripts for: |
||
+ | ** getting raw numbers for [[User:Firespeaker/GSoC2014/Workplan|Progress] |
||
− | * March 10th - March 21st: application |
||
+ | ** doing regression tests |
||
− | * April 21st - May 19th: community bonding |
||
+ | * get updated corpora for: |
||
− | * May 19th: coding begins |
||
+ | ** Uzbek |
||
− | * ??: midterm evaluations |
||
+ | ** Turkish |
||
− | * August 18th?: pencils down |
||
− | * ??: final evaluation |
||
− | |||
− | === Goals by time === |
||
− | * Community bonding (4+ weeks): |
||
− | ** apertium-kir, apertium-tur, apertium-uzb coverages to 90% |
||
− | ** one 200-word text for each direction to <10% WER |
||
− | ** make some real CG for kir, uzb |
||
− | ** build arsenal of 4 200-word texts and 4 500-word texts translated to all languages |
||
− | ** tur-uzb bidix to 7000 stems |
||
+ | === Recurring === |
||
− | * Coding period (13 weeks) |
||
− | + | * The end of every week: |
|
+ | ** Update [[User:Firespeaker/GSoC2014/Progress|Progress]] |
||
− | *** work on WER (one text per week) |
||
+ | * Constantly: |
||
− | *** beef up CG for each language |
||
+ | ** Add good sentences to regression tests |
||
− | *** lrx, transfer as needed |
||
− | ** |
+ | ** Clean up lexc files |
− | *** |
+ | *** remove duplicate entries |
+ | *** alphabetise sections? |
||
+ | *** add glosses, etc. |
Revision as of 05:35, 13 March 2014
Major goals
Schedule
Schedule
See GSoC 2014 Timeline for complete timeline. Dates need to be verified.
week | dates | goals | eval | accomplishments | notes |
---|---|---|---|---|---|
post-application period 22 March - 20 April | | | | | |
community bonding period 21 April - 19 May | | | | | |
1 | 19 - 24 May | | | | |
2 | 25 - 31 May | | | | |
3 | 1 - 7 June | | | | |
4 | 8 - 14 June | | | | |
5 | 15 - 21 June | | | | |
6 | 22 - 28 June | | | | |
7 | 29 June - 5 July | | | | |
midterm eval July 6 | | | | | |
8 | 6 - 12 July | | | | |
9 | 13 - 19 July | | | | |
10 | 20 - 26 July | | | | |
11 | 27 July - 2 August | | | | |
12 | 3 - 9 August | | | | |
13 | 10 - 18 August | | | | |
pencils-down week final evaluation 18 August - 24 August | | | | | |
Getting started
- make scripts for:
  - getting raw numbers for Progress
  - doing regression tests
- get updated corpora for:
  - Uzbek
  - Turkish
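One of the raw numbers the schedule tracks is WER (word error rate) on 200- and 500-word texts. As a rough illustration of what such a script could compute, here is a minimal sketch that measures WER as word-level edit distance divided by the length of the reference translation. The function name `word_error_rate` and the whitespace tokenisation are assumptions made for this example; the evaluation tool actually used for the project may tokenise and count differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    `reference` is the post-edited (correct) translation and `hypothesis`
    is the raw machine translation. Tokenisation is plain whitespace
    splitting, which is an assumption of this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # reference word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a 200-word text meets the "<10% WER" goal when fewer than 20 word-level edits separate the machine output from the reference.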
Recurring
- The end of every week:
  - Update Progress
- Constantly:
  - Add good sentences to regression tests
  - Clean up lexc files
    - remove duplicate entries
    - alphabetise sections?
    - add glosses, etc.
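The "remove duplicate entries" step could start from a report of which entries repeat. The following is a minimal sketch, assuming entries sit one per line, end in `;`, and use `!` for comments (with `%!` as a literal exclamation mark). The function name `find_duplicate_entries` and the sample stems are hypothetical; entries that differ only in their gloss comment are treated as duplicates.

```python
import re
from collections import defaultdict


def find_duplicate_entries(lexc_text: str) -> dict:
    """Return {(lexicon, entry): [line numbers]} for repeated entries."""
    seen = defaultdict(list)
    lexicon = None
    for lineno, line in enumerate(lexc_text.splitlines(), start=1):
        # Drop the comment part: an unescaped "!" starts a comment.
        code = re.split(r"(?<!%)!", line, maxsplit=1)[0]
        # Normalise whitespace so spacing differences do not hide duplicates.
        code = " ".join(code.split())
        if not code:
            continue
        if code.startswith("LEXICON "):
            lexicon = code.split()[1]
            continue
        # Only count entry lines inside a lexicon; anything before the
        # first LEXICON header (e.g. symbol declarations) is skipped.
        if lexicon is not None and code.endswith(";"):
            seen[(lexicon, code)].append(lineno)
    return {key: nums for key, nums in seen.items() if len(nums) > 1}
```

The report only lists line numbers and leaves deletion to a human, since two copies of an entry may carry different glosses worth merging by hand.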