Primary goals
- A production-ready release of kaz-kir
  - Translates kaz→kir and kir→kaz with consistently <10% WER
  - Trimmed coverage for kaz and kir ≥90%
- A production-ready release of tur-kir
  - Translates tur→kir and kir→tur with consistently <20% WER
  - Trimmed coverage for tur and kir ≥85%
- A stable release of uzb-tur
  - Translates tur→uzb and uzb→tur with consistently <25% WER
  - Trimmed coverage for tur and uzb ≥80%
- Although bidix size is not itself one of the goals, the trimmed coverage figures serve as a more relevant proxy for the same basic idea.
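The WER thresholds above compare machine output against a post-edited reference translation. As a rough illustration of how such a figure is computed, here is a minimal Python sketch: word-level edit distance divided by the number of words in the reference. The function name and the plain whitespace tokenisation are assumptions made for illustration; this is not the evaluation script used for these pairs.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    `reference` is assumed to be the post-edited translation and
    `hypothesis` the raw machine translation output.
    """
    ref = reference.split()  # assumption: whitespace tokenisation is good enough
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the hypothesis
                curr[j - 1] + 1,     # extra word in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder tokens, not real translations.
    post_edited = "w1 w2 w3 w4 w5 w6 w7 w8 w9 w10"
    machine_out = "w1 w2 xx w4 w5 w6 w7 w8 w9 w10"
    print(f"WER: {word_error_rate(post_edited, machine_out):.2%}")
```

Because the denominator is the reference length, a hypothesis with many inserted words can score above 100%.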
Schedule
See the GSoC 2014 Timeline for the complete timeline.
week | dates | goals | eval | accomplishments | notes
post-application period 22 March - 20 April
|
- apertium-kir to 90% coverage (with kaz-like transducer)
- apertium-tur to 90% coverage (with kaz-like transducer)
- apertium-uzb to 90% coverage (with kaz-like transducer)
- build arsenal of texts with post-edited translations:
- four 200-word texts in each kaz, kir, tur, uzb
- four 500-word texts in each kaz, kir, tur, uzb
|
|
- Reworked apertium-tur verb morphology on paper
- Much better disam in apertium-tur
- Have a bunch of texts, not many post-edited
|
- Need to rework transfer rules for apertium-tur
|
community bonding period 21 April - 19 May
|
- tur-uzb bidix to 7000 stems
- make some baseline CG for kir, uzb
- one 200-word kaz-kir text to <10% WER
- one 200-word kir-kaz text to <10% WER
- one 200-word tur-kir text to <10% WER
- one 200-word kir-tur text to <10% WER
- one 200-word tur-uzb text to <10% WER
- one 200-word uzb-tur text to <10% WER
|
|
- Implemented reworking of apertium-tur verb morphology
|
- Was around, but lost time to the end of the semester and conferences
|
1 |
19 - 24 May
|
- one 200-word kir-tur text to <10% WER
- one 200-word kir-kaz text to <10% WER
- work on kir CG and lrx
|
|
- Fixed some apertium-tur phonology to go with morphology rework
|
- Mostly worked on LREC poster
|
2 |
25 - 31 May
|
- one 200-word kaz-kir text to <10% WER
- one 200-word tur-kir text to <10% WER
- work on kaz CG and lrx
- work on tur CG and lrx
|
|
- At LREC, worked a lot on apertium-uig and apertium-kaz-uig
|
3 |
1 - 7 June
|
- one 200-word tur-uzb text to <10% WER
- one 200-word uzb-tur text to <10% WER
- work on uzb CG and lrx
- work on tur CG and lrx
|
- Fixed a small handful of transfer issues in apertium-tur-kir
|
- Got apertium-uig in a state for others to work with
- Completed eval for last few weeks
|
4 |
8 - 14 June
|
- one 500-word kir-tur text to <10% WER
- one 500-word kir-kaz text to <10% WER
- work on kir CG and lrx
- start testvoc nouns for all pairs
|
|
- Brought coverage of apertium-tur-kir on SETimes corpus up by 2%
- Brought testvoc of apertium-tur-kir on SETimes corpus down from 10.39% to 0.22%
|
|
5 |
15 - 21 June
|
- one 500-word kaz-kir text to <10% WER
- one 500-word tur-kir text to <10% WER
- work on kaz CG and lrx
- work on tur CG and lrx
- continue testvoc nouns for all pairs
|
|
- Brought testvoc of apertium-tur-kir on SETimes corpus down from 0.22% to 0.04%
|
- Made and presented poster for Morphology Fest
|
6 |
22 - 28 June
|
- one 500-word tur-uzb text to <10% WER
- one 500-word uzb-tur text to <10% WER
- work on tur CG and lrx
- work on uzb CG and lrx
- continue testvoc nouns for all pairs
|
|
|
- Moving and getting situated week
- (break for personal reasons)
|
midterm eval 29 June
|
- kaz(-kir) trimmed coverage ≥90%
- kir(-kaz) trimmed coverage ≥90%
- tur(-kir) trimmed coverage ≥90%
- kir(-tur) trimmed coverage ≥90%
- tur(-uzb) trimmed coverage ≥80%
- uzb(-tur) trimmed coverage ≥80%
|
|
|
|
7 |
29 June - 5 July
|
- get texts for kaz-kir translating
|
8 |
6 - 12 July
|
|
9 |
13 - 19 July
|
- corpus testvoc for kaz-kir
|
10 |
20 - 26 July
|
|
11 |
27 July - 2 August
|
|
12 |
3 - 10 August
|
|
pencils-down week final evaluation 11 August - 18 August
|
- move pairs to trunk
- document stuff better on the wiki
- make the pairs live at turkic.apertium.org
|
Getting started
- make scripts for:
- getting raw numbers for Progress
- doing regression tests (or learning how to use the existing scripts)
- guesser
- get updated corpora for:
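For the "raw numbers" script, one possible starting point is a naive coverage count over analyser output. The sketch below assumes the text has already been run through a morphological analyser and is in Apertium stream format, where each lexical unit looks like `^surface/analysis$` and an unknown word is marked as `^surface/*surface$`. The function name is hypothetical, the tags in the sample are illustrative only, and escaped `$` characters inside a unit are not handled. Running it on the output of a trimmed analyser would give a trimmed coverage figure.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification: escaped "$" inside a unit is not handled.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def coverage(analysed_text: str) -> float:
    """Share of lexical units that received at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        raise ValueError("no lexical units found in input")

    unknown = 0
    for unit in units:
        parts = unit.split("/")
        # Unknown words come back as surface/*surface
        if len(parts) > 1 and parts[1].startswith("*"):
            unknown += 1
    return (len(units) - unknown) / len(units)


if __name__ == "__main__":
    # Illustrative sample; the tag names here are assumptions.
    sample = "^ev/ev<n><nom>$ ^geldi/gel<v><iv><past><p3><sg>$ ^xyzzy/*xyzzy$ ^./.<sent>$"
    print(f"coverage: {coverage(sample):.1%}")
```

This counts tokens rather than types, so frequent unknown words weigh more heavily than rare ones.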
Recurring
- The end of every week:
- Constantly:
- Add good sentences to regression tests
- Clean up lexc files
- remove duplicate entries
- alphabetise sections?
- add glosses, etc.
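The "remove duplicate entries" item could be partly automated. The following is a minimal sketch, assuming entries sit on one line each, end in `;`, and carry glosses in `!` comments. The function name, the sample lexicon names and the continuation classes are hypothetical. It treats every `!` as a comment start (so an escaped `%!` would be mishandled) and only reports duplicates inside the same LEXICON, leaving the decision about which copy to delete to a human.

```python
from collections import defaultdict


def find_duplicate_entries(lexc_text: str) -> dict:
    """Map (lexicon name, normalised entry) to the line numbers where it recurs."""
    seen = defaultdict(list)
    current_lexicon = None
    for lineno, line in enumerate(lexc_text.splitlines(), start=1):
        # Drop the comment (gloss) part; simplification: "%!" is not handled.
        entry = line.split("!", 1)[0]
        # Collapse runs of whitespace so spacing differences do not hide duplicates.
        entry = " ".join(entry.split())
        if entry.startswith("LEXICON "):
            current_lexicon = entry.split()[1]
            continue
        if not entry.endswith(";"):
            continue  # blank line, comment-only line, or something multi-line
        seen[(current_lexicon, entry)].append(lineno)
    return {key: lines for key, lines in seen.items() if len(lines) > 1}


if __name__ == "__main__":
    # Hypothetical lexc fragment for illustration.
    sample = (
        "LEXICON Nouns\n"
        'алма:алма N1 ; ! "apple"\n'
        'үй:үй N1 ; ! "house"\n'
        'алма:алма   N1 ; ! "apple (again)"\n'
        "LEXICON Verbs\n"
        'кел:кел V-IV ; ! "come"\n'
    )
    for (lexicon, entry), lines in find_duplicate_entries(sample).items():
        print(f"{lexicon}: {entry} on lines {lines}")
```

Entries that differ only in their gloss are reported as duplicates, which is usually what is wanted before merging glosses by hand.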