Difference between revisions of "User:Firespeaker/GSoC2014/Workplan"

From Apertium

Revision as of 05:35, 13 March 2014

Major goals

Schedule

See the GSoC 2014 Timeline for the complete timeline. Dates still need to be verified.

week | dates | goals | eval | accomplishments | notes
post-application period | 22 March - 20 April
  • apertium-kir to 90% coverage
  • apertium-tur to 90% coverage
  • apertium-uzb to 90% coverage
  • build an arsenal of texts with post-edited translations:
    • four 200-word texts in each of kaz, kir, tur, uzb
    • four 500-word texts in each of kaz, kir, tur, uzb
community bonding period | 21 April - 19 May
  • tur-uzb bidix to 7000 stems
  • make some baseline CG for kir and uzb
  • one 200-word kaz-kir text to <10% WER
  • one 200-word kir-kaz text to <10% WER
  • one 200-word tur-kir text to <10% WER
  • one 200-word kir-tur text to <10% WER
  • one 200-word tur-uzb text to <10% WER
  • one 200-word uzb-tur text to <10% WER
1 | 19 - 24 May
  • one 200-word kir-tur text to <10% WER
  • one 200-word kir-kaz text to <10% WER
  • work on kir CG and lrx
2 | 25 - 31 May
  • one 200-word kaz-kir text to <10% WER
  • one 200-word tur-kir text to <10% WER
  • work on kaz CG and lrx
  • work on tur CG and lrx
3 | 1 - 7 June
  • one 200-word tur-uzb text to <10% WER
  • one 200-word uzb-tur text to <10% WER
  • work on uzb CG and lrx
  • work on tur CG and lrx
4 | 8 - 14 June
  • one 500-word kir-tur text to <10% WER
  • one 500-word kir-kaz text to <10% WER
  • work on kir CG and lrx
  • start testvoc nouns for all pairs
5 | 15 - 21 June
  • one 500-word kaz-kir text to <10% WER
  • one 500-word tur-kir text to <10% WER
  • work on kaz CG and lrx
  • work on tur CG and lrx
  • continue testvoc nouns for all pairs
6 | 22 - 28 June
  • one 500-word tur-uzb text to <10% WER
  • one 500-word uzb-tur text to <10% WER
  • work on tur CG and lrx
  • work on uzb CG and lrx
  • continue testvoc nouns for all pairs
7 | 29 June - 5 July
  • finish testvoc nouns for all pairs
midterm eval | July 6
8 | 6 - 12 July
  • testvoc adjectives for all pairs
9 | 13 - 19 July
  • testvoc numerals for all pairs
10 | 20 - 26 July
  • testvoc intransitive verbs (v.iv) for all pairs
11 | 27 July - 2 August
  • testvoc transitive verb (v.tv) categories for all pairs
12 | 3 - 9 August
  • testvoc adverbs for all pairs
13 | 10 - 18 August
  • testvoc miscellaneous categories for all pairs
pencils-down week / final evaluation | 18 August - 24 August
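Most goals in the schedule are stated as word error rate (WER) between the machine translation and its post-edited version. The sketch below shows how that number might be computed. It assumes WER is the usual word-level Levenshtein distance divided by the length of the reference (the post-edited text), and that tokens are split on whitespace. The function name and the sample sentences are illustrative and do not come from any Apertium tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions + deletions + insertions)
    divided by the number of words in the reference."""
    ref = reference.split()  # assumption: whitespace tokenisation
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and
    # the first j hypothesis words (Levenshtein DP, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,          # reference word deleted
                          curr[j - 1] + 1,      # extra word inserted
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: MT output vs. its post-edited reference.
    post_edited = "the cat sat on the mat"
    mt_output = "the cat sat on mat"
    wer = word_error_rate(post_edited, mt_output)
    print(f"WER = {wer:.1%}, goal met: {wer < 0.10}")  # WER = 16.7%, goal met: False
```

The post-edited text serves as the reference because the texts are collected with post-edited translations. WER therefore approximates how much a translator had to change the output.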

Getting started

  • make scripts for:
    • getting raw numbers for the Progress page
    • doing regression tests
  • get updated corpora for:
    • Uzbek
    • Turkish
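The 90% coverage goals suggest one possible "raw numbers" script. Below is a minimal sketch of naive coverage: the share of corpus tokens that the morphological analyser recognises. It assumes the analyser output is in Apertium stream format. In that format each token looks like ^surface/analysis$ and an unknown word is marked with *, as in ^okudım/*okudım$. The function name is hypothetical, and the regex deliberately ignores escaped ^, $ and / characters.

```python
import re

# One lexical unit in Apertium stream format, e.g. ^kitap/kitap<n>$.
# Simplifying assumption: no escaped ^, $ or / inside a unit.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def naive_coverage(analysed_text: str) -> tuple[int, int, float]:
    """Return (known, total, coverage) for analyser output, counting a
    token as unknown when its analysis part starts with '*'."""
    known = 0
    total = 0
    for unit in LEXICAL_UNIT.findall(analysed_text):
        _surface, _, analyses = unit.partition("/")
        total += 1
        if not analyses.startswith("*"):
            known += 1
    coverage = known / total if total else 0.0
    return known, total, coverage


if __name__ == "__main__":
    sample = "^men/men<prn>$ ^kitap/kitap<n>$ ^okudım/*okudım$"
    known, total, coverage = naive_coverage(sample)
    print(f"{known}/{total} tokens known ({coverage:.1%})")
```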

Recurring

  • The end of every week:
    • Update the Progress page
  • Constantly:
    • Add good sentences to regression tests
    • Clean up lexc files
      • remove duplicate entries
      • alphabetise sections?
      • add glosses, etc.
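For the "remove duplicate entries" cleanup, here is a minimal sketch that drops repeated entries within each LEXICON of a lexc file and keeps the first copy. It assumes that:

  • an entry is a single line ending in ;
  • ! starts a comment unless escaped as %!
  • two entries are duplicates when they match after the comment is removed and whitespace is collapsed

The helper names are hypothetical. Multi-line entries are not handled.

```python
import re

# '!' starts a comment in lexc unless escaped as '%!'.
COMMENT = re.compile(r"(?<!%)!.*$")


def entry_key(line: str) -> str:
    """Normalise a line for comparison: drop the comment, collapse spaces."""
    return " ".join(COMMENT.sub("", line).split())


def remove_duplicate_entries(lexc_text: str) -> str:
    """Drop later copies of an entry already seen in the same LEXICON."""
    out = []
    seen: set[str] = set()
    for line in lexc_text.splitlines():
        key = entry_key(line)
        if key.startswith("LEXICON "):
            seen = set()  # duplicates are only checked within one section
        elif key.endswith(";"):
            if key in seen:
                continue  # skip the repeat, keep the first copy and its gloss
            seen.add(key)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    src = "LEXICON N\nkitap:kitap N1 ; ! book\nkitap:kitap N1 ;\nalma:alma N1 ;"
    print(remove_duplicate_entries(src))
```

Because the comment is ignored when comparing, an entry repeated with a different gloss still counts as a duplicate. Only the gloss on the first copy survives.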