Difference between revisions of "User:Firespeaker/GSoC2014/Workplan"
Firespeaker (talk | contribs)
Line 3: | Line 3:
  == Schedule ==
  === Schedule ===
+ Dates need to be verified.
  {|class="wikitable"
Line 25: | Line 25:
  |
  * tur-uzb bidix to 7000 stems
- * make some
+ * make some baseline CG for kir, uzb
  * one 200-word kaz-kir text to <10% WER
  * one 200-word kir-kaz text to <10% WER
Line 43: | Line 43:
  * one 200-word kaz-kir text to <10% WER
  * one 200-word tur-kir text to <10% WER
+ * work on kaz CG and lrx
  * work on tur CG and lrx
  |-
Line 50: | Line 51:
  * one 200-word uzb-tur text to <10% WER
  * work on uzb CG and lrx
+ * work on tur CG and lrx
  |-
  ! 4 !! 8 - 14 June
Line 62: | Line 64:
  * one 500-word kaz-kir text to <10% WER
  * one 500-word tur-kir text to <10% WER
+ * work on kaz CG and lrx
  * work on tur CG and lrx
  * continue testvoc nouns for all pairs
Line 69: | Line 72:
  * one 500-word tur-uzb text to <10% WER
  * one 500-word uzb-tur text to <10% WER
+ * work on tur CG and lrx
  * work on uzb CG and lrx
  * continue testvoc nouns for all pairs
Line 109: | Line 113:
  |}
- ===
- * March 10th - March 21st: application
- * April 21st - May 19th: community bonding
- * May 19th: coding begins
- * ??: midterm evaluations
- * August 18th?: pencils down
- * ??: final evaluation
- === Goals by time ===
- * Community bonding (4+ weeks):
- ** apertium-kir, apertium-tur, apertium-uzb coverages to 90%
- ** one 200-word text for each direction to <10% WER
- ** make some real CG for kir, uzb
- ** build arsenal of 4 200-word texts and 4 500-word texts translated to all languages
- ** tur-uzb bidix to 7000 stems
- * Coding period (13 weeks)
- *** work on WER (one text per week)
- *** beef up CG for each language
- *** lrx, transfer as needed
- **
- ***
+ === Getting started ===
+ * make scripts for:
+ ** getting raw numbers for [[User:Firespeaker/GSoC2014/Workplan|Progress]
+ ** doing regression tests
+ * get updated corpora for:
+ ** Uzbek
+ ** Turkish
+ === Recurring ===
+ * The end of every week:
+ ** Update [[User:Firespeaker/GSoC2014/Progress|Progress]]
+ * Constantly:
+ ** Add good sentences to regression tests
+ ** Clean up lexc files
+ *** remove duplicate entries
+ *** alphabetise sections?
+ *** add glosses, etc.
Revision as of 05:35, 13 March 2014
Major goals
Schedule
Schedule
See GSoC 2014 Timeline for complete timeline. Dates need to be verified.
| week | dates | goals | eval | accomplishments | notes |
|---|---|---|---|---|---|
| post-application period | 22 March - 20 April | | | | |
| community bonding period | 21 April - 19 May | | | | |
| 1 | 19 - 24 May | | | | |
| 2 | 25 - 31 May | | | | |
| 3 | 1 - 7 June | | | | |
| 4 | 8 - 14 June | | | | |
| 5 | 15 - 21 June | | | | |
| 6 | 22 - 28 June | | | | |
| 7 | 29 June - 5 July | | | | |
| midterm eval | July 6 | | | | |
| 8 | 6 - 12 July | | | | |
| 9 | 13 - 19 July | | | | |
| 10 | 20 - 26 July | | | | |
| 11 | 27 July - 2 August | | | | |
| 12 | 3 - 9 August | | | | |
| 13 | 10 - 18 August | | | | |
| pencils-down week, final evaluation | 18 August - 24 August | | | | |
Getting started
- make scripts for:
  - getting raw numbers for Progress
  - doing regression tests
- get updated corpora for:
  - Uzbek
  - Turkish
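
The schedule sets its targets in terms of word error rate (WER), so a script for "raw numbers" would likely need to compute it. The following is a minimal sketch, not the tool actually used for this project: it assumes whitespace tokenisation and a plain word-level edit distance, and the function name `wer` is hypothetical.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

Under these assumptions, a 200-word reference meets the "<10% WER" target when fewer than 20 word-level edits separate it from the post-edited translation.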
Recurring
- The end of every week:
  - Update Progress
- Constantly:
  - Add good sentences to regression tests
  - Clean up lexc files
    - remove duplicate entries
    - alphabetise sections?
    - add glosses, etc.
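
For the "remove duplicate entries" item, a small helper could list candidate duplicates before anything is deleted by hand. This is a sketch under stated assumptions: one entry per line, `!` starting a comment, and no escaped `%!` inside entries. The function name `find_duplicate_entries` and the sample entries are hypothetical.

```python
from collections import defaultdict

def find_duplicate_entries(lines):
    """Return {(lexicon, entry): [line numbers]} for entries repeated within one lexicon."""
    seen = defaultdict(list)
    lexicon = None  # entries before the first LEXICON header are grouped under None
    for lineno, line in enumerate(lines, start=1):
        # Drop the trailing comment and normalise whitespace (assumes no escaped "%!").
        entry = " ".join(line.split("!", 1)[0].split())
        if not entry:
            continue
        if entry.startswith("LEXICON "):
            lexicon = entry.split()[1]
            continue
        seen[(lexicon, entry)].append(lineno)
    return {key: nums for key, nums in seen.items() if len(nums) > 1}

# Hypothetical sample: the same stem repeated inside one lexicon.
sample = [
    "LEXICON Nouns",
    'kitap:kitap N1 ; ! "book"',
    "ev:ev N1 ;",
    "kitap:kitap   N1 ; ! repeated entry",
    "LEXICON Verbs",
    "ev:ev N1 ;",
]
duplicates = find_duplicate_entries(sample)
```

Entries are keyed by lexicon so that the same line appearing in two different lexicons is not reported; only the repeated line numbers are returned, leaving the decision about which copy to keep to the editor.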