User:Firespeaker/GSoC2014/Workplan

From Apertium

Revision as of 06:55, 21 March 2014

Primary goals

  • A production-ready release of kaz-kir
    • Translates kaz→kir and kir→kaz with consistently <10% WER
    • Trimmed coverage for kaz and kir ≥90%
  • A production-ready release of tur-kir
    • Translates tur→kir and kir→tur with consistently <20% WER
    • Trimmed coverage for tur and kir ≥90%
  • A stable release of uzb-tur
    • Translates tur→uzb and uzb→tur with consistently <25% WER
    • Trimmed coverage for tur and uzb ≥80%
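
The WER targets above can be checked with a word-level edit distance between the raw translation and its post-edited version. The following is a minimal sketch, assuming WER is defined as (substitutions + insertions + deletions) divided by the number of words in the post-edited reference. The function name `word_error_rate` is hypothetical, and the project's own evaluation tooling may tokenise or normalise text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    reference  -- the post-edited (corrected) translation
    hypothesis -- the raw machine translation output
    Tokenisation is plain whitespace splitting (an assumption).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a 200-word text meets the <10% goal when fewer than 20 word-level edits separate the output from its post-edited version.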

Plan

Schedule

See GSoC 2014 Timeline for complete timeline.

Columns: week, dates, goals, eval, accomplishments, notes
post-application period
22 March - 20 April
  • apertium-kir to 90% coverage (with kaz-like transducer)
  • apertium-tur to 90% coverage (with kaz-like transducer)
  • apertium-uzb to 90% coverage (with kaz-like transducer)
  • build arsenal of texts with post-edited translations:
    • four 200-word texts in each kaz, kir, tur, uzb
    • four 500-word texts in each kaz, kir, tur, uzb
community bonding period
21 April - 19 May
  • tur-uzb bidix to 7000 stems
  • make some baseline CG for kir, uzb
  • one 200-word kaz-kir text to <10% WER
  • one 200-word kir-kaz text to <10% WER
  • one 200-word tur-kir text to <10% WER
  • one 200-word kir-tur text to <10% WER
  • one 200-word tur-uzb text to <10% WER
  • one 200-word uzb-tur text to <10% WER
1 19 - 24 May
  • one 200-word kir-tur text to <10% WER
  • one 200-word kir-kaz text to <10% WER
  • work on kir CG and lrx
2 25 - 31 May
  • one 200-word kaz-kir text to <10% WER
  • one 200-word tur-kir text to <10% WER
  • work on kaz CG and lrx
  • work on tur CG and lrx
3 1 - 7 June
  • one 200-word tur-uzb text to <10% WER
  • one 200-word uzb-tur text to <10% WER
  • work on uzb CG and lrx
  • work on tur CG and lrx
4 8 - 14 June
  • one 500-word kir-tur text to <10% WER
  • one 500-word kir-kaz text to <10% WER
  • work on kir CG and lrx
  • start testvoc nouns for all pairs
5 15 - 21 June
  • one 500-word kaz-kir text to <10% WER
  • one 500-word tur-kir text to <10% WER
  • work on kaz CG and lrx
  • work on tur CG and lrx
  • continue testvoc nouns for all pairs
6 22 - 28 June
  • one 500-word tur-uzb text to <10% WER
  • one 500-word uzb-tur text to <10% WER
  • work on tur CG and lrx
  • work on uzb CG and lrx
  • continue testvoc nouns for all pairs
midterm eval
29 June
  • kaz(-kir) trimmed coverage ≥90%
  • kir(-kaz) trimmed coverage ≥90%
  • tur(-kir) trimmed coverage ≥90%
  • kir(-tur) trimmed coverage ≥90%
  • tur(-uzb) trimmed coverage ≥80%
  • uzb(-tur) trimmed coverage ≥80%
7 29 June - 5 July
  • finish testvoc nouns for all pairs
8 6 - 12 July
  • testvoc adjs for all pairs
9 13 - 19 July
  • testvoc numerals for all pairs
10 20 - 26 July
  • testvoc v.iv for all pairs
11 27 July - 2 August
  • testvoc v.tv categories for all pairs
12 3 - 10 August
  • testvoc adverbs for all pairs
  • testvoc misc categories for all pairs
pencils-down week
final evaluation
11 August - 18 August
  • move pairs to trunk
  • document stuff better on the wiki
  • make the pairs live at turkic.apertium.org
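
The coverage targets in the schedule can be tracked by counting how many tokens of a corpus receive at least one analysis. The sketch below assumes the analyser output is in the Apertium stream format, where each lexical unit looks like `^surface/analysis$` and an unknown word is written as `^surface/*surface$`. The function name `coverage` is hypothetical, and escaped characters inside lexical units are ignored for simplicity. For trimmed coverage, the input would be the output of the analyser trimmed to the bilingual dictionary.

```python
import re

# One lexical unit in the stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def coverage(analysed_text: str) -> float:
    """Fraction of lexical units with at least one non-unknown analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        raise ValueError("no lexical units found in input")

    known = 0
    for unit in units:
        parts = unit.split("/")
        # parts[0] is the surface form; an analysis starting with '*'
        # marks a word the analyser does not know.
        if len(parts) > 1 and not parts[1].startswith("*"):
            known += 1
    return known / len(units)
```

A corpus meets the ≥90% goal when the returned value is at least 0.9.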

Getting started

  • make scripts for:
  • get updated corpora for:
    • Uzbek
    • Turkish

Recurring

  • The end of every week:
  • Constantly:
    • Add good sentences to regression tests
    • Clean up lexc files
      • remove duplicate entries
      • alphabetise sections?
      • add glosses, etc.
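
The duplicate-entry cleanup could be partly automated. This is a minimal sketch, assuming one entry per line, `!` starting a comment, and `LEXICON Name` lines opening each section. The function name `find_duplicate_entries` is hypothetical, and the sketch does not handle escaped `%!` or entries that span several lines.

```python
from collections import Counter


def find_duplicate_entries(lexc_text: str) -> list[tuple[str, str]]:
    """Return (lexicon, entry) pairs that occur more than once in a lexicon."""
    current_lexicon = None
    counts = Counter()
    for line in lexc_text.splitlines():
        # Drop the comment part and surrounding whitespace
        # (simplification: an escaped "%!" is treated as a comment too).
        entry = line.split("!", 1)[0].strip()
        if not entry:
            continue
        parts = entry.split()
        if parts[0] == "LEXICON" and len(parts) > 1:
            current_lexicon = parts[1]
            continue
        if current_lexicon is None:
            continue  # header material such as Multichar_Symbols
        # Collapse runs of whitespace so spacing differences do not hide duplicates.
        counts[(current_lexicon, " ".join(parts))] += 1
    return [key for key, n in counts.items() if n > 1]
```

Entries are compared only within the same lexicon, since the same stem can legitimately appear in different sections.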