Difference between revisions of "Apertium-kaz-kir/Workplan"
Firespeaker (talk | contribs) |
Firespeaker (talk | contribs) |
||
(51 intermediate revisions by the same user not shown) | |||
Line 4: | Line 4: | ||
* 12'000 stems in bidix (~1000 stems per week, or ~200 per day) |
* 12'000 stems in bidix (~1000 stems per week, or ~200 per day) |
||
* Sort Adjective and Noun stems in kir.lexc into appropriate categories |
* Sort Adjective and Noun stems in kir.lexc into appropriate categories |
||
− | * Trimmed coverage approaching 90% |
+ | * [[Apertium-kaz-kir/stats#Over-all_stats|Trimmed coverage]] approaching 90% |
== Schedule == |
== Schedule == |
||
Line 23: | Line 23: | ||
!style="width: 35%"| notes |
!style="width: 35%"| notes |
||
|- |
|- |
||
− | + | !colspan="2" style="text-align: right"|post-application period<br />3 - 24 May |
|
| |
| |
||
# finish coding challenge with WER ~10% |
# finish coding challenge with WER ~10% |
||
# trimmed coverage 45% |
# trimmed coverage 45% |
||
# total 250 stems in dix |
# total 250 stems in dix |
||
+ | | {{Workeval5|4}} |
||
− | | 4/5 '''pass''' |
||
| |
| |
||
# coding challenge: WER ~9% |
# coding challenge: WER ~9% |
||
Line 36: | Line 36: | ||
* Demonstrated ability to add stems to dix and lexc. |
* Demonstrated ability to add stems to dix and lexc. |
||
* A couple easy lexical selection rules are still not written. |
* A couple easy lexical selection rules are still not written. |
||
− | * Needs to learn more about other aspects of apertium and evaluation. |
+ | * Needs to learn more about [[User:Firespeaker/Steps_for_writing_a_language_pair#Solve_more_complicated_translation_problems|other aspects of apertium]] and [[User:Firespeaker/Steps_for_writing_a_language_pair#Evaluate_the_pair|evaluation]]. |
: —[[User:Firespeaker|Firespeaker]] 06:45, 20 May 2013 (UTC) |
: —[[User:Firespeaker|Firespeaker]] 06:45, 20 May 2013 (UTC) |
||
|- |
|- |
||
− | + | !colspan="2" style="text-align: right"|community bonding period<br />27 May - 16 June |
|
| |
| |
||
# run first testvoc |
# run first testvoc |
||
Line 47: | Line 47: | ||
# write ≥2 transfer rules |
# write ≥2 transfer rules |
||
# write ≥3 disambig rules |
# write ≥3 disambig rules |
||
+ | note: should be in IRC every day |
||
+ | | {{Workeval5|3}} |
||
| |
| |
||
+ | # — |
||
− | | |
||
+ | # ran trimmed coverage script on a corpus |
||
+ | # took a look at frequency lists |
||
+ | # wrote 4 pairs of lexical selection rules |
||
+ | # wrote 4 variants of 1 transfer rule |
||
+ | # — |
||
| |
| |
||
+ | * demonstrated ability to work with lexical selection rules |
||
+ | * demonstrated ability to work with transfer rules |
||
+ | * got only some experience with coverage scripts |
||
+ | * did not get experience with testvoc |
||
+ | * did not get experience with disambig rules |
||
+ | * '''was not around IRC frequently''' |
||
+ | * worked in bursts, did not spend a single long period of time |
||
+ | —[[User:Firespeaker|Firespeaker]] 02:28, 2 July 2013 (UTC) |
||
|- |
|- |
||
+ | ! 1 |
||
− | | 1 ||align="right"| 17 - 22 June |
||
+ | !style="text-align: right"| 17 - 22 June |
||
| |
| |
||
# total 1500 stems in dix |
# total 1500 stems in dix |
||
Line 57: | Line 73: | ||
# 500-word evaluation, WER ~10% |
# 500-word evaluation, WER ~10% |
||
# trimmed coverage 51% |
# trimmed coverage 51% |
||
+ | | {{Workeval5|0}} |
||
− | | |
||
| |
| |
||
| |
| |
||
+ | * did not show up —[[User:Firespeaker|Firespeaker]] 02:28, 2 July 2013 (UTC) |
||
|- |
|- |
||
+ | ! 2 |
||
− | | 2 ||align="right"| 23 - 29 June |
||
+ | !style="text-align: right"| 23 - 29 June |
||
| |
| |
||
# total 2400 stems in dix |
# total 2400 stems in dix |
||
# clean testvoc for {{tag|num}} {{tag|post}} |
# clean testvoc for {{tag|num}} {{tag|post}} |
||
# trimmed coverage 53% |
# trimmed coverage 53% |
||
+ | | {{Workeval5|0}} |
||
| |
| |
||
+ | # stems in dix: 408 |
||
| |
| |
||
+ | * did not show up —[[User:Firespeaker|Firespeaker]] 02:28, 2 July 2013 (UTC) |
||
− | | |
||
|- |
|- |
||
+ | ! 3 |
||
− | | 3 ||align="right"| 30 - 6 July |
||
+ | !style="text-align: right"| 30 - 6 July |
||
| |
| |
||
# total 3200 stems in dix |
# total 3200 stems in dix |
||
# clean testvoc for {{tag|cnjcoo}} {{tag|cnjadv}} {{tag|cnjsub}} |
# clean testvoc for {{tag|cnjcoo}} {{tag|cnjadv}} {{tag|cnjsub}} |
||
# trimmed coverage 55% |
# trimmed coverage 55% |
||
+ | | {{Workeval5|2}} |
||
+ | | |
||
+ | # stems in dix: 508 |
||
+ | # trimmed coverage: 59.5%,51.5% |
||
| |
| |
||
+ | * trimmed coverage good |
||
− | | |
||
+ | * too narrow a focus on a single corpus |
||
− | | |
||
+ | * number of stems too low |
||
+ | * no new WER text |
||
+ | * no testvoc |
||
+ | —[[User:Firespeaker|Firespeaker]] 20:43, 8 July 2013 (UTC) |
||
|- |
|- |
||
+ | ! 4 |
||
− | | 4 ||align="right"| 7 - 13 July |
||
+ | !style="text-align: right"| 7 - 13 July |
||
| |
| |
||
# total 4000 stems in dix |
# total 4000 stems in dix |
||
# clean testvoc for {{tag|adv}} |
# clean testvoc for {{tag|adv}} |
||
# trimmed coverage 59% |
# trimmed coverage 59% |
||
+ | |{{Workeval5|2}} |
||
− | | |
||
+ | |rowspan="2"| |
||
− | | |
||
+ | # stems in dix: 2574 |
||
− | | |
||
+ | # trimmed coverage: 69.3%,63.8% |
||
+ | # azattyq_24455849 WER: 14.78% |
||
+ | # completed most of TODO-list |
||
+ | |rowspan="2"| |
||
+ | * good progress on adding stems |
||
+ | * fixed little things as directed |
||
+ | * good progress on post-editing process |
||
+ | * didn't make good progress on reducing WER |
||
+ | * still no testvoc |
||
+ | * committed once every 3 or 4 days; '''should be committing every day''' |
||
+ | * poor communication with mentors; needs to be around more often |
||
+ | —[[User:Firespeaker|Firespeaker]] 22:16, 22 July 2013 (UTC) |
||
|- |
|- |
||
+ | ! 5 |
||
− | | 5 ||align="right"| 14 -20 July |
||
+ | !style="text-align: right"| 14 - 20 July |
||
| |
| |
||
# total 4800 stems in dix |
# total 4800 stems in dix |
||
# clean testvoc for {{tag|prn}} {{tag|det}} |
# clean testvoc for {{tag|prn}} {{tag|det}} |
||
# trimmed coverage 63% |
# trimmed coverage 63% |
||
+ | |{{Workeval5|3}} |
||
− | | |
||
− | | |
||
− | | |
||
|- |
|- |
||
+ | ! 6 |
||
− | | 6 ||align="right"| 21 - 27 July |
||
+ | !style="text-align: right"| 21 - 27 July |
||
| |
| |
||
# total 5600 stems in dix |
# total 5600 stems in dix |
||
# clean testvoc for {{tag|adj}} {{tag|adj}}{{tag|advl}} |
# clean testvoc for {{tag|adj}} {{tag|adj}}{{tag|advl}} |
||
# trimmed coverage 68% |
# trimmed coverage 68% |
||
+ | |{{Workeval5|3}} |
||
− | | |
||
+ | |rowspan="3"| |
||
− | | |
||
+ | # stems in dix: 5552 |
||
− | | |
||
+ | # trimmed coverage: 72%,67% |
||
+ | # azattyq_24455849 WER: 18.01% |
||
+ | |rowspan="2"| |
||
+ | * good improvement in dix |
||
+ | ** should be checking for errors (e.g., extra spaces) |
||
+ | * not much progress with WER text |
||
+ | ** simple lrx and t1x should be enough here |
||
+ | * still no indication of progress with testvoc |
||
+ | * better communication and commit frequency, but could still improve |
||
+ | —[[User:Firespeaker|Firespeaker]] 18:21, 1 August 2013 (UTC) |
||
|- |
|- |
||
+ | ! 7 |
||
− | | 7 ||align="right"| 28 - 3 August |
||
+ | !style="text-align: right"| 28 - 3 August |
||
| |
| |
||
# total 6400 stems in dix |
# total 6400 stems in dix |
||
# trimmed coverage 70% |
# trimmed coverage 70% |
||
+ | |{{Workeval5|2}} |
||
− | | |
||
− | | |
||
− | | |
||
|- |
|- |
||
− | + | !colspan="2" style="text-align: right"| [[Apertium-kaz-kir/TODO#By_midterm|midterm eval]]<br />2 August |
|
| |
| |
||
# total 6500 stems in dix |
# total 6500 stems in dix |
||
# 500-word evaluation, WER ~10% |
# 500-word evaluation, WER ~10% |
||
# trimmed coverage 72% |
# trimmed coverage 72% |
||
+ | |{{Workeval5|2}} |
||
| |
| |
||
+ | * midterm TODO list goals only partially attained |
||
− | | |
||
+ | * overall progress has been mediocre |
||
− | | |
||
+ | * among the lowest-performing students |
||
+ | * noticeable improvement in the last few weeks |
||
+ | * needs to improve more to pass the final |
||
+ | —[[User:Firespeaker|Firespeaker]] 18:26, 1 August 2013 (UTC) |
||
|- |
|- |
||
+ | ! 8 |
||
− | | 8 ||align="right"| 4 - 10 August |
||
+ | !style="text-align: right"| 4 - 10 August |
||
| |
| |
||
# total 7200 stems in dix |
# total 7200 stems in dix |
||
# clean testvoc for {{tag|n}} {{tag|num}}{{tag|subst}} {{tag|np}} {{tag|adj}}{{tag|subst}} |
# clean testvoc for {{tag|n}} {{tag|num}}{{tag|subst}} {{tag|np}} {{tag|adj}}{{tag|subst}} |
||
# trimmed coverage 75% |
# trimmed coverage 75% |
||
+ | |{{Workeval5|2}} |
||
− | | |
||
+ | |rowspan="3"| |
||
− | | |
||
+ | # stems in dix: 6493 |
||
+ | # trimmed coverage: 79.6%,74.1% |
||
| |
| |
||
|- |
|- |
||
+ | ! 9 |
||
− | | 9 ||align="right"| 11 - 17 August |
||
+ | !style="text-align: right"| 11 - 17 August |
||
| |
| |
||
# total 8000 stems in dix |
# total 8000 stems in dix |
||
# trimmed coverage 78% |
# trimmed coverage 78% |
||
+ | |{{Workeval5|2}} |
||
− | | |
||
− | | |
||
| |
| |
||
|- |
|- |
||
+ | ! 10 |
||
− | | 10 ||align="right"| 18 - 24 August |
||
+ | !style="text-align: right"| 18 - 24 August |
||
| |
| |
||
# total 8800 stems in dix |
# total 8800 stems in dix |
||
# trimmed coverage 81% |
# trimmed coverage 81% |
||
+ | |{{Workeval5|3}} |
||
− | | |
||
− | | |
||
| |
| |
||
|- |
|- |
||
+ | ! 11 |
||
− | | 11 ||align="right"| 25 - 31 August |
||
+ | !style="text-align: right"| 25 - 31 August |
||
| |
| |
||
# total 9600 stems in dix |
# total 9600 stems in dix |
||
# clean testvoc for {{tag|v}} |
# clean testvoc for {{tag|v}} |
||
# trimmed coverage 83% |
# trimmed coverage 83% |
||
+ | |{{Workeval5|3}} |
||
| |
| |
||
+ | # stems in dix: 6730 |
||
− | | |
||
+ | # trimmed coverage: 82.5%,78.4% |
||
+ | # azattyq_24455849 WER: 6.62% |
||
| |
| |
||
|- |
|- |
||
+ | ! 12 |
||
− | | 12 ||align="right"| 1 - 7 September |
||
+ | !style="text-align: right"| 1 - 7 September |
||
| |
| |
||
# total 10400 stems in dix |
# total 10400 stems in dix |
||
# trimmed coverage 85% |
# trimmed coverage 85% |
||
+ | |{{Workeval5|3}} |
||
| |
| |
||
+ | # stems in dix: 7007 |
||
+ | # trimmed coverage: 84.2%,79.8% |
||
| |
| |
||
+ | * Good [[Turkic_lexicon#Kyrgyz|adjective typology]] |
||
− | | |
||
+ | * Decent progress on coverage |
||
+ | * Not around much later in the week |
||
+ | * Still no testvoc... |
||
+ | —[[User:Firespeaker|Firespeaker]] 07:29, 10 September 2013 (UTC) |
||
|- |
|- |
||
+ | ! 13 |
||
− | | 13 ||align="right"| 8 - 15 September |
||
+ | !style="text-align: right"| 8 - 15 September |
||
| |
| |
||
# total 11200 stems in dix |
# total 11200 stems in dix |
||
# trimmed coverage 87% |
# trimmed coverage 87% |
||
+ | |{{Workeval5|1}} |
||
| |
| |
||
+ | # stems in dix: 7454 |
||
+ | # trimmed coverage: 85.2%,80.4% |
||
| |
| |
||
+ | * Decent increase in coverage |
||
− | | |
||
+ | * Still no testvoc |
||
+ | * Still ~600 unsorted ADJ |
||
+ | * Not around much |
||
+ | —[[User:Firespeaker|Firespeaker]] 20:06, 22 September 2013 (UTC) |
||
|- |
|- |
||
− | + | !colspan="2" style="text-align: right"| pencils-down week<br />final evaluation<br />16 - 23 September |
|
| |
| |
||
# total 12000 stems in dix |
# total 12000 stems in dix |
||
Line 179: | Line 258: | ||
# clean testvoc for all categories |
# clean testvoc for all categories |
||
# trimmed coverage 88% |
# trimmed coverage 88% |
||
+ | # release 0.1.0 and move to trunk |
||
| |
| |
||
| |
| |
||
+ | # stems in dix: 7546 |
||
+ | # trimmed coverage: 85.8%,81.6% |
||
| |
| |
||
+ | * Good coverage |
||
+ | * "Good" WER results |
||
+ | ** But lots of # and * errors :( |
||
+ | * No work on testvoc |
||
+ | * Some ADJ sorted; still >500 unsorted |
||
+ | * only 2 sets of LRX rules since early in GSoC |
||
+ | * only 1 transfer rule since early in GSoC |
||
+ | |- |
||
+ | !colspan="2" style="text-align: right"| Final evaluation |
||
+ | | |
||
+ | | |
||
+ | | |
||
+ | | |
||
+ | * Has improved coverage a certain amount |
||
+ | * Has not done anything else |
||
+ | * Mentors have had to nag to get him to work |
||
+ | * Has not been around enough |
||
+ | * Among the lowest-performing students |
||
+ | * Has not improved since midterm |
||
+ | * Last-ditch efforts not at all impressive |
||
|} |
|} |
||
+ | |||
+ | == Tips and Tricks == |
||
+ | === Adding stems quickly === |
||
+ | * Add top stems from frequency lists of unknown forms |
||
+ | * Use spectie's dix-entries-to-be-checked script |
Latest revision as of 06:42, 23 September 2013
Major goals
- Good WER
- Clean testvoc
- 12'000 stems in bidix (~1000 stems per week, or ~200 per day)
- Sort Adjective and Noun stems in kir.lexc into appropriate categories
- Trimmed coverage approaching 90%
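The "Good WER" goal refers to word error rate, which is conventionally the word-level edit distance between the system output and a post-edited reference, divided by the reference length. The following is a minimal sketch of that calculation, not the evaluation tool used for this pair; the function name `word_error_rate` and the whitespace tokenisation are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()   # assumption: plain whitespace tokenisation
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a WER of about 10% on a 500-word text corresponds to roughly 50 word-level edits needed to turn the output into the reference.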
Schedule
Timeline
See the GSoC 2013 Timeline for the complete timeline. Important coding dates follow:
- June 17th: coding begins
- July 29th - August 2nd: midterm evaluations
- September 16th - September 23rd: pencils down
- September 27th: final evaluation
Workplan
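week | dates | goals | eval | accomplishments | notes |
---|---|---|---|---|---|
post-application period | 3 - 24 May | finish coding challenge with WER ~10%<br />trimmed coverage 45%<br />total 250 stems in dix | 4/5 | coding challenge: WER ~9%<br />[...] | [...]<br />Demonstrated ability to add stems to dix and lexc.<br />A couple easy lexical selection rules are still not written.<br />Needs to learn more about other aspects of apertium and evaluation.<br />-- Firespeaker 06:45, 20 May 2013 (UTC) |
community bonding period | 27 May - 16 June | run first testvoc<br />[...]<br />write ≥2 transfer rules<br />write ≥3 disambig rules<br />note: should be in IRC every day | 3/5 | -<br />ran trimmed coverage script on a corpus<br />took a look at frequency lists<br />wrote 4 pairs of lexical selection rules<br />wrote 4 variants of 1 transfer rule<br />- | demonstrated ability to work with lexical selection rules<br />demonstrated ability to work with transfer rules<br />got only some experience with coverage scripts<br />did not get experience with testvoc<br />did not get experience with disambig rules<br />was not around IRC frequently<br />worked in bursts, did not spend a single long period of time<br />-- Firespeaker 02:28, 2 July 2013 (UTC) |
1 | 17 - 22 June | total 1500 stems in dix<br />[...]<br />500-word evaluation, WER ~10%<br />trimmed coverage 51% | 0/5 | | did not show up<br />-- Firespeaker 02:28, 2 July 2013 (UTC) |
2 | 23 - 29 June | total 2400 stems in dix<br />clean testvoc for `<num>` `<post>`<br />trimmed coverage 53% | 0/5 | stems in dix: 408 | did not show up<br />-- Firespeaker 02:28, 2 July 2013 (UTC) |
3 | 30 - 6 July | total 3200 stems in dix<br />clean testvoc for `<cnjcoo>` `<cnjadv>` `<cnjsub>`<br />trimmed coverage 55% | 2/5 | stems in dix: 508<br />trimmed coverage: 59.5%,51.5% | trimmed coverage good<br />too narrow a focus on a single corpus<br />number of stems too low<br />no new WER text<br />no testvoc<br />-- Firespeaker 20:43, 8 July 2013 (UTC) |
4 | 7 - 13 July | total 4000 stems in dix<br />clean testvoc for `<adv>`<br />trimmed coverage 59% | 2/5 | (weeks 4-5)<br />stems in dix: 2574<br />trimmed coverage: 69.3%,63.8%<br />azattyq_24455849 WER: 14.78%<br />completed most of TODO-list | (weeks 4-5)<br />good progress on adding stems<br />fixed little things as directed<br />good progress on post-editing process<br />didn't make good progress on reducing WER<br />still no testvoc<br />committed once every 3 or 4 days; should be committing every day<br />poor communication with mentors; needs to be around more often<br />-- Firespeaker 22:16, 22 July 2013 (UTC) |
5 | 14 - 20 July | total 4800 stems in dix<br />clean testvoc for `<prn>` `<det>`<br />trimmed coverage 63% | 3/5 | see week 4 | see week 4 |
6 | 21 - 27 July | total 5600 stems in dix<br />clean testvoc for `<adj>` `<adj><advl>`<br />trimmed coverage 68% | 3/5 | (weeks 6-7 and midterm)<br />stems in dix: 5552<br />trimmed coverage: 72%,67%<br />azattyq_24455849 WER: 18.01% | (weeks 6-7)<br />good improvement in dix (should be checking for errors, e.g., extra spaces)<br />not much progress with WER text (simple lrx and t1x should be enough here)<br />still no indication of progress with testvoc<br />better communication and commit frequency, but could still improve<br />-- Firespeaker 18:21, 1 August 2013 (UTC) |
7 | 28 - 3 August | total 6400 stems in dix<br />trimmed coverage 70% | 2/5 | see week 6 | see week 6 |
midterm eval | 2 August | total 6500 stems in dix<br />500-word evaluation, WER ~10%<br />trimmed coverage 72% | 2/5 | see week 6 | midterm TODO list goals only partially attained<br />overall progress has been mediocre<br />among the lowest-performing students<br />noticeable improvement in the last few weeks<br />needs to improve more to pass the final<br />-- Firespeaker 18:26, 1 August 2013 (UTC) |
8 | 4 - 10 August | total 7200 stems in dix<br />clean testvoc for `<n>` `<num><subst>` `<np>` `<adj><subst>`<br />trimmed coverage 75% | 2/5 | (weeks 8-10)<br />stems in dix: 6493<br />trimmed coverage: 79.6%,74.1% | |
9 | 11 - 17 August | total 8000 stems in dix<br />trimmed coverage 78% | 2/5 | see week 8 | |
10 | 18 - 24 August | total 8800 stems in dix<br />trimmed coverage 81% | 3/5 | see week 8 | |
11 | 25 - 31 August | total 9600 stems in dix<br />clean testvoc for `<v>`<br />trimmed coverage 83% | 3/5 | stems in dix: 6730<br />trimmed coverage: 82.5%,78.4%<br />azattyq_24455849 WER: 6.62% | |
12 | 1 - 7 September | total 10400 stems in dix<br />trimmed coverage 85% | 3/5 | stems in dix: 7007<br />trimmed coverage: 84.2%,79.8% | Good adjective typology<br />Decent progress on coverage<br />Not around much later in the week<br />Still no testvoc...<br />-- Firespeaker 07:29, 10 September 2013 (UTC) |
13 | 8 - 15 September | total 11200 stems in dix<br />trimmed coverage 87% | 1/5 | stems in dix: 7454<br />trimmed coverage: 85.2%,80.4% | Decent increase in coverage<br />Still no testvoc<br />Still ~600 unsorted ADJ<br />Not around much<br />-- Firespeaker 20:06, 22 September 2013 (UTC) |
pencils-down week, final evaluation | 16 - 23 September | total 12000 stems in dix<br />[...]<br />clean testvoc for all categories<br />trimmed coverage 88%<br />release 0.1.0 and move to trunk | | stems in dix: 7546<br />trimmed coverage: 85.8%,81.6% | Good coverage<br />"Good" WER results (but lots of # and * errors :( )<br />No work on testvoc<br />Some ADJ sorted; still >500 unsorted<br />only 2 sets of LRX rules since early in GSoC<br />only 1 transfer rule since early in GSoC |
Final evaluation | | | | | Has improved coverage a certain amount<br />Has not done anything else<br />Mentors have had to nag to get him to work<br />Has not been around enough<br />Among the lowest-performing students<br />Has not improved since midterm<br />Last-ditch efforts not at all impressive |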
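The workplan tracks trimmed coverage week by week. As a rough illustration of what such a figure measures, the sketch below computes naive coverage from analyser output in the Apertium stream format, where each lexical unit looks like `^surface/analysis$` and an unknown form is marked with a leading `*` in its analysis. It is an assumption-laden sketch rather than the coverage script used for this pair: the function name `coverage` is hypothetical, escaped characters in the stream are ignored, and the result only counts as trimmed coverage if the input came from an analyser trimmed to the bilingual dictionary.

```python
import re

# One lexical unit: ^surface/analysis1/analysis2$
# (simplification: escaped "^", "/" and "$" inside a unit are not handled)
LEXICAL_UNIT = re.compile(r"\^([^/^$]+)/([^$]*)\$")


def coverage(analysed: str) -> float:
    """Fraction of lexical units that received at least one analysis."""
    total = 0
    known = 0
    for match in LEXICAL_UNIT.finditer(analysed):
        total += 1
        # Unknown forms are emitted as ^form/*form$
        if not match.group(2).startswith("*"):
            known += 1
    if total == 0:
        raise ValueError("no lexical units found in input")
    return known / total


# Made-up sample output, for illustration only.
sample = "^the/the<det>$ ^cat/cat<n><sg>$ ^zzyx/*zzyx$ ^./.<sent>$"
print(f"{coverage(sample):.1%}")
```

Counting punctuation as a token, as this sketch does, raises the figure slightly; whether to include it is a choice that should be kept consistent across weekly measurements.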
Tips and Tricks
Adding stems quickly
- Add top stems from frequency lists of unknown forms
- Use spectie's dix-entries-to-be-checked script
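The first tip can be sketched as follows: collect the forms the analyser could not analyse, count them, and work through the most frequent ones first. This is a minimal sketch assuming analyser output in the Apertium stream format, where unknown forms appear as `^form/*form$`; the function name `top_unknown_forms` and the lowercasing step are illustrative assumptions, and it is unrelated to the dix-entries-to-be-checked script mentioned above.

```python
import re
from collections import Counter

# Lexical units whose analysis starts with "*" are unknown forms.
# (simplification: escaped "^", "/" and "$" inside a unit are not handled)
UNKNOWN_UNIT = re.compile(r"\^([^/^$]+)/\*[^$]*\$")


def top_unknown_forms(analysed: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent unknown surface forms with their counts."""
    counts = Counter(
        match.group(1).lower()  # assumption: fold case so "Foo" and "foo" merge
        for match in UNKNOWN_UNIT.finditer(analysed)
    )
    return counts.most_common(n)


# Made-up sample output, for illustration only.
sample = "^foo/*foo$ ^bar/bar<n>$ ^Foo/*Foo$ ^baz/*baz$"
for form, count in top_unknown_forms(sample, n=2):
    print(count, form)
```

Folding case merges sentence-initial and mid-sentence occurrences of the same word, but it also hides capitalisation that may matter for proper nouns, so it may be worth dropping that step when adding names.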