Major goals
- Good WER (~10% on a 500-word evaluation)
- Clean testvoc
- 12,000 stems in bidix (~1,000 stems per week, or ~200 per working day)
- Sort adjective and noun stems in kir.lexc into appropriate categories
- Trimmed coverage approaching 90%
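The stem goal above implies a steady weekly rate. A minimal sketch of that arithmetic follows; the function name `required_rate` and the five-day working week are assumptions made for illustration, not something the plan states.

```python
def required_rate(current_stems, target_stems, weeks_left, workdays_per_week=5):
    """Return (stems per week, stems per working day) needed to hit the target.

    workdays_per_week=5 is an assumption; the plan does not say how many
    days per week are spent adding stems.
    """
    if weeks_left <= 0 or workdays_per_week <= 0:
        raise ValueError("weeks_left and workdays_per_week must be positive")
    # Never report a negative rate if the target is already reached.
    remaining = max(target_stems - current_stems, 0)
    per_week = remaining / weeks_left
    per_day = per_week / workdays_per_week
    return per_week, per_day


if __name__ == "__main__":
    # 380 stems at application time, 12000 as the final goal,
    # 14 scheduled weeks (weeks 1-13 plus the pencils-down week).
    per_week, per_day = required_rate(380, 12000, 14)
    print(f"{per_week:.0f} stems/week, {per_day:.0f} stems/working day")
```

Re-running this with the current stem count each week shows whether the remaining rate is still realistic.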
Schedule
Timeline
See the GSoC 2013 Timeline for the complete timeline. Important coding dates follow:
- June 17th: coding begins
- July 29th - August 2nd: midterm evaluations
- September 16th - September 23rd: pencils down
- September 27th: final evaluation
Workplan
| week | dates | goals | eval | accomplishments | notes |
| post-application period | 3 - 24 May | finish coding challenge with WER ~10%; trimmed coverage 45%; total 250 stems in dix | | coding challenge: WER ~9%; trimmed coverage: 52%, 48%; stems in dix: 380 | Demonstrated ability to add stems to dix and lexc. A couple of easy lexical selection rules are still not written. Needs to learn more about other aspects of apertium and evaluation. -- Firespeaker 06:45, 20 May 2013 (UTC) |
| community bonding period | 27 May - 16 June | run first testvoc; run coverage scripts; get first frequency lists; write ≥4 lexical selection rules; write ≥2 transfer rules; write ≥3 disambig rules | | | |
| 1 | 17 - 22 June | total 1500 stems in dix; clean testvoc for <postadv> <ij>; 500-word evaluation, WER ~10%; trimmed coverage 51% | | | |
| 2 | 23 - 29 June | total 2400 stems in dix; clean testvoc for <num> <post>; trimmed coverage 53% | | | |
| 3 | 30 June - 6 July | total 3200 stems in dix; clean testvoc for <cnjcoo> <cnjadv> <cnjsub>; trimmed coverage 55% | | | |
| 4 | 7 - 13 July | total 4000 stems in dix; clean testvoc for <adv>; trimmed coverage 59% | | | |
| 5 | 14 - 20 July | total 4800 stems in dix; clean testvoc for <prn> <det>; trimmed coverage 63% | | | |
| 6 | 21 - 27 July | total 5600 stems in dix; clean testvoc for <adj> <adj><advl>; trimmed coverage 68% | | | |
| 7 | 28 July - 3 August | total 6400 stems in dix; trimmed coverage 70% | | | |
| midterm eval | 2 August | total 6500 stems in dix; 500-word evaluation, WER ~10%; trimmed coverage 72% | | | |
| 8 | 4 - 10 August | total 7200 stems in dix; clean testvoc for <n> <num><subst> <np> <adj><subst>; trimmed coverage 75% | | | |
| 9 | 11 - 17 August | total 8000 stems in dix; trimmed coverage 78% | | | |
| 10 | 18 - 24 August | total 8800 stems in dix; trimmed coverage 81% | | | |
| 11 | 25 - 31 August | total 9600 stems in dix; clean testvoc for <v>; trimmed coverage 83% | | | |
| 12 | 1 - 7 September | total 10400 stems in dix; trimmed coverage 85% | | | |
| 13 | 8 - 15 September | total 11200 stems in dix; trimmed coverage 87% | | | |
| pencils-down week, final evaluation | 16 - 23 September | total 12000 stems in dix; 500-word evaluation, WER ~10%; clean testvoc for all categories; trimmed coverage 88% | | | |
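The 500-word evaluations in the workplan are scored by WER (word error rate): the word-level edit distance between the system output and a reference translation, divided by the length of the reference. The sketch below is a generic illustration of that metric, not the evaluation script used for this project; the function name `word_error_rate` is hypothetical, and tokenisation is assumed to be plain whitespace splitting.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn ref[:i] into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("the cat sat down", "the dog sat down"))
```

On a 500-word reference, a WER of ~10% corresponds to roughly 50 word-level edits.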
Tips and Tricks
Adding stems quickly
- Add top stems from frequency lists of unknown forms
- Use spectie's dix-entries-to-be-checked script
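A frequency list of unknown forms can be built from analyser output. The sketch below assumes the text has already been run through the analyser and is in the Apertium stream format, where each token looks like `^surface/analysis$` and an unknown form is marked with a leading `*` in its analysis. The function names and the sample string are made up for illustration, and escaped special characters in the stream are not handled.

```python
import re
from collections import Counter

# One lexical unit: ^surface/analysis1/analysis2...$
LEXICAL_UNIT = re.compile(r"\^([^/^$]*)/([^$]*)\$")


def unknown_frequencies(analysed_text):
    """Return (Counter of unknown surface forms, total number of tokens)."""
    unknown = Counter()
    total = 0
    for surface, analyses in LEXICAL_UNIT.findall(analysed_text):
        total += 1
        if analyses.startswith("*"):  # the analyser did not recognise the form
            unknown[surface] += 1
    return unknown, total


def coverage(unknown, total):
    """Share of tokens that received at least one analysis."""
    if total == 0:
        return 0.0
    return 1 - sum(unknown.values()) / total


if __name__ == "__main__":
    # Illustrative sample only; the tags are not taken from kir.lexc.
    sample = "^бул/бул<prn>$ ^китеп/*китеп$ ^жакшы/жакшы<adj>$ ^китеп/*китеп$"
    unknown, total = unknown_frequencies(sample)
    for form, count in unknown.most_common(10):
        print(count, form)
    print(f"coverage: {coverage(unknown, total):.0%}")
```

Working down the list from the top adds the stems that raise coverage fastest.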