Difference between revisions of "Apertium-kaz-kir/Workplan"
Firespeaker (talk | contribs) |
Firespeaker (talk | contribs) |
||
(62 intermediate revisions by the same user not shown) | |||
Line 2: | Line 2: | ||
* Good WER |
* Good WER |
||
* Clean testvoc |
* Clean testvoc |
||
* |
* 12'000 stems in bidix (~1000 stems per week, or ~200 per day) |
||
* Sort Adjective and Noun stems in kir.lexc into appropriate categories |
* Sort Adjective and Noun stems in kir.lexc into appropriate categories |
||
* [[Apertium-kaz-kir/stats#Over-all_stats|Trimmed coverage]] approaching 90% |
|||
== Schedule == |
== Schedule == |
||
Line 15: | Line 16: | ||
=== Workplan === |
=== Workplan === |
||
{|class="wikitable" |
{|class="wikitable" |
||
! week |
|||
! week !! dates !! goals !! eval !! notes |
|||
! dates |
|||
!style="width: 25%"| goals |
|||
! eval |
|||
!style="width: 25%"| accomplishments |
|||
!style="width: 35%"| notes |
|||
|- |
|- |
||
!colspan="2" style="text-align: right"|post-application period<br />3 - 24 May |
|||
| |
| |
||
# finish coding challenge with WER ~10% |
# finish coding challenge with WER ~10% |
||
# trimmed coverage 45% |
# trimmed coverage 45% |
||
# total 250 stems in dix |
|||
| 4/5 '''pass''' |
|||
| {{Workeval5|4}} |
|||
| |
| |
||
# coding challenge WER ~ |
# coding challenge: WER ~9% |
||
# trimmed coverage: 52%,48% |
# trimmed coverage: 52%,48% |
||
# stems in dix: 380 |
|||
| |
|||
* Demonstrated ability to add stems to dix and lexc. |
|||
* A couple easy lexical selection rules are still not written. |
|||
* Needs to learn more about [[User:Firespeaker/Steps_for_writing_a_language_pair#Solve_more_complicated_translation_problems|other aspects of apertium]] and [[User:Firespeaker/Steps_for_writing_a_language_pair#Evaluate_the_pair|evaluation]]. |
|||
: —[[User:Firespeaker|Firespeaker]] 06:45, 20 May 2013 (UTC) |
|||
|- |
|- |
||
!colspan="2" style="text-align: right"|community bonding period<br />27 May - 16 June |
|||
| |
| |
||
# run first testvoc |
# run first testvoc |
||
# run coverage scripts |
|||
# get first frequency lists |
|||
# write ≥4 lexical selection rules |
# write ≥4 lexical selection rules |
||
# write ≥2 transfer rules |
# write ≥2 transfer rules |
||
# write ≥3 disambig rules |
# write ≥3 disambig rules |
||
note: should be in IRC every day |
|||
| {{Workeval5|3}} |
|||
| |
| |
||
# — |
|||
| |
|||
# ran trimmed coverage script on a corpus |
|||
# took a look at frequency lists |
|||
# wrote 4 pairs of lexical selection rules |
|||
# wrote 4 variants of 1 transfer rule |
|||
# — |
|||
| |
|||
* demonstrated ability to work with lexical selection rules |
|||
* demonstrated ability to work with transfer rules |
|||
* got only some experience with coverage scripts |
|||
* did not get experience with testvoc |
|||
* did not get experience with disambig rules |
|||
* '''was not around IRC frequently''' |
|||
* worked in bursts, did not spend a single long period of time |
|||
—[[User:Firespeaker|Firespeaker]] 02:28, 2 July 2013 (UTC) |
|||
|- |
|- |
||
! 1 |
|||
| 1 ||align="right"| 17 - 22 June |
|||
!style="text-align: right"| 17 - 22 June |
|||
| |
| |
||
# total 1500 stems in dix |
# total 1500 stems in dix |
||
Line 41: | Line 73: | ||
# 500-word evaluation, WER ~10% |
# 500-word evaluation, WER ~10% |
||
# trimmed coverage 51% |
# trimmed coverage 51% |
||
| {{Workeval5|0}} |
|||
| |
|||
| |
|||
| |
| |
||
* did not show up —[[User:Firespeaker|Firespeaker]] 02:28, 2 July 2013 (UTC) |
|||
|- |
|- |
||
! 2 |
|||
| 2 ||align="right"| 23 - 29 June |
|||
!style="text-align: right"| 23 - 29 June |
|||
| |
| |
||
# total 2400 stems in dix |
# total 2400 stems in dix |
||
# clean testvoc for {{tag|num}} {{tag|post}} |
# clean testvoc for {{tag|num}} {{tag|post}} |
||
# trimmed coverage 53% |
# trimmed coverage 53% |
||
| {{Workeval5|0}} |
|||
| |
| |
||
# stems in dix: 408 |
|||
| |
| |
||
* did not show up —[[User:Firespeaker|Firespeaker]] 02:28, 2 July 2013 (UTC) |
|||
|- |
|- |
||
! 3 |
|||
| 3 ||align="right"| 30 - 6 July |
|||
!style="text-align: right"| 30 - 6 July |
|||
| |
| |
||
# total 3200 stems in dix |
# total 3200 stems in dix |
||
# clean testvoc for {{tag|cnjcoo}} {{tag|cnjadv}} {{tag|cnjsub}} |
# clean testvoc for {{tag|cnjcoo}} {{tag|cnjadv}} {{tag|cnjsub}} |
||
# trimmed coverage 55% |
# trimmed coverage 55% |
||
| {{Workeval5|2}} |
|||
| |
|||
# stems in dix: 508 |
|||
# trimmed coverage: 59.5%,51.5% |
|||
| |
| |
||
* trimmed coverage good |
|||
| |
|||
* too narrow a focus on a single corpus |
|||
* number of stems too low |
|||
* no new WER text |
|||
* no testvoc |
|||
—[[User:Firespeaker|Firespeaker]] 20:43, 8 July 2013 (UTC) |
|||
|- |
|- |
||
! 4 |
|||
| 4 ||align="right"| 7 - 13 July |
|||
!style="text-align: right"| 7 - 13 July |
|||
| |
| |
||
# total 4000 stems in dix |
# total 4000 stems in dix |
||
# clean testvoc for {{tag|adv}} |
|||
# trimmed coverage 59% |
# trimmed coverage 59% |
||
|{{Workeval5|2}} |
|||
| |
|||
|rowspan="2"| |
|||
| |
|||
# stems in dix: 2574 |
|||
# trimmed coverage: 69.3%,63.8% |
|||
# azattyq_24455849 WER: 14.78% |
|||
# completed most of TODO-list |
|||
|rowspan="2"| |
|||
* good progress on adding stems |
|||
* fixed little things as directed |
|||
* good progress on post-editing process |
|||
* didn't make good progress on reducing WER |
|||
* still no testvoc |
|||
* committed once every 3 or 4 days; '''should be committing every day''' |
|||
* poor communication with mentors; needs to be around more often |
|||
—[[User:Firespeaker|Firespeaker]] 22:16, 22 July 2013 (UTC) |
|||
|- |
|- |
||
! 5 |
|||
| 5 ||align="right"| 14 -20 July |
|||
!style="text-align: right"| 14 - 20 July |
|||
| |
| |
||
# total 4800 stems in dix |
# total 4800 stems in dix |
||
# clean testvoc for {{tag|prn}} {{tag|det}} |
|||
# trimmed coverage 63% |
# trimmed coverage 63% |
||
|{{Workeval5|3}} |
|||
| |
|||
| |
|||
|- |
|- |
||
! 6 |
|||
| 6 ||align="right"| 21 - 27 July |
|||
!style="text-align: right"| 21 - 27 July |
|||
| |
| |
||
# total 5600 stems in dix |
# total 5600 stems in dix |
||
# clean testvoc for {{tag|adj}} {{tag|adj}}{{tag|advl}} |
|||
# trimmed coverage 68% |
# trimmed coverage 68% |
||
|{{Workeval5|3}} |
|||
| |
|||
|rowspan="3"| |
|||
| |
|||
# stems in dix: 5552 |
|||
# trimmed coverage: 72%,67% |
|||
# azattyq_24455849 WER: 18.01% |
|||
|rowspan="2"| |
|||
* good improvement in dix |
|||
** should be checking for errors (e.g., extra spaces) |
|||
* not much progress with WER text |
|||
** simple lrx and t1x should be enough here |
|||
* still no indication of progress with testvoc |
|||
* better communication and commit frequency, but could still improve |
|||
—[[User:Firespeaker|Firespeaker]] 18:21, 1 August 2013 (UTC) |
|||
|- |
|- |
||
! 7 |
|||
| 7 ||align="right"| 28 - 3 August |
|||
!style="text-align: right"| 28 - 3 August |
|||
| |
| |
||
# total 6400 stems in dix |
# total 6400 stems in dix |
||
# trimmed coverage 70% |
# trimmed coverage 70% |
||
|{{Workeval5|2}} |
|||
| |
|||
| |
|||
|- |
|- |
||
!colspan="2" style="text-align: right"| [[Apertium-kaz-kir/TODO#By_midterm|midterm eval]]<br />2 August |
|||
| |
| |
||
# total 6500 stems in dix |
# total 6500 stems in dix |
||
# 500-word evaluation, WER ~10% |
# 500-word evaluation, WER ~10% |
||
# trimmed coverage 72% |
# trimmed coverage 72% |
||
|{{Workeval5|2}} |
|||
| |
| |
||
* midterm TODO list goals only partially attained |
|||
| |
|||
* overall progress has been mediocre |
|||
* among the lowest-performing students |
|||
* noticeable improvement in the last few weeks |
|||
* needs to improve more to pass the final |
|||
—[[User:Firespeaker|Firespeaker]] 18:26, 1 August 2013 (UTC) |
|||
|- |
|- |
||
! 8 |
|||
| 8 ||align="right"| 4 - 10 August |
|||
!style="text-align: right"| 4 - 10 August |
|||
| |
| |
||
# total 7200 stems in dix |
# total 7200 stems in dix |
||
# clean testvoc for {{tag|n}} {{tag|num}}{{tag|subst}} {{tag|np}} {{tag|adj}}{{tag|subst}} |
|||
# trimmed coverage 75% |
# trimmed coverage 75% |
||
|{{Workeval5|2}} |
|||
| |
|||
|rowspan="3"| |
|||
# stems in dix: 6493 |
|||
# trimmed coverage: 79.6%,74.1% |
|||
| |
| |
||
|- |
|- |
||
! 9 |
|||
| 9 ||align="right"| 11 - 17 August |
|||
!style="text-align: right"| 11 - 17 August |
|||
| |
| |
||
# total 8000 stems in dix |
# total 8000 stems in dix |
||
# trimmed coverage 78% |
# trimmed coverage 78% |
||
|{{Workeval5|2}} |
|||
| |
|||
| |
| |
||
|- |
|- |
||
! 10 |
|||
| 10 ||align="right"| 18 - 24 August |
|||
!style="text-align: right"| 18 - 24 August |
|||
| |
| |
||
# total 8800 stems in dix |
# total 8800 stems in dix |
||
# trimmed coverage 81% |
# trimmed coverage 81% |
||
|{{Workeval5|3}} |
|||
| |
|||
| |
| |
||
|- |
|- |
||
! 11 |
|||
| 11 ||align="right"| 25 - 31 August |
|||
!style="text-align: right"| 25 - 31 August |
|||
| |
| |
||
# total 9600 stems in dix |
# total 9600 stems in dix |
||
# clean testvoc for {{tag|v}} |
|||
# trimmed coverage 83% |
# trimmed coverage 83% |
||
|{{Workeval5|3}} |
|||
| |
| |
||
# stems in dix: 6730 |
|||
# trimmed coverage: 82.5%,78.4% |
|||
# azattyq_24455849 WER: 6.62% |
|||
| |
| |
||
|- |
|- |
||
! 12 |
|||
| 12 ||align="right"| 1 - 7 September |
|||
!style="text-align: right"| 1 - 7 September |
|||
| |
| |
||
# total 10400 stems in dix |
# total 10400 stems in dix |
||
# trimmed coverage 85% |
# trimmed coverage 85% |
||
|{{Workeval5|3}} |
|||
| |
| |
||
# stems in dix: 7007 |
|||
# trimmed coverage: 84.2%,79.8% |
|||
| |
| |
||
* Good [[Turkic_lexicon#Kyrgyz|adjective typology]] |
|||
* Decent progress on coverage |
|||
* Not around much later in the week |
|||
* Still no testvoc... |
|||
—[[User:Firespeaker|Firespeaker]] 07:29, 10 September 2013 (UTC) |
|||
|- |
|- |
||
! 13 |
|||
| 13 ||align="right"| 8 - 15 September |
|||
!style="text-align: right"| 8 - 15 September |
|||
| |
| |
||
# total 11200 stems in dix |
# total 11200 stems in dix |
||
# trimmed coverage 87% |
# trimmed coverage 87% |
||
|{{Workeval5|1}} |
|||
| |
| |
||
# stems in dix: 7454 |
|||
# trimmed coverage: 85.2%,80.4% |
|||
| |
| |
||
* Decent increase in coverage |
|||
* Still no testvoc |
|||
* Still ~600 unsorted ADJ |
|||
* Not around much |
|||
—[[User:Firespeaker|Firespeaker]] 20:06, 22 September 2013 (UTC) |
|||
|- |
|- |
||
!colspan="2" style="text-align: right"| pencils-down week<br />final evaluation<br />16 - 23 September |
|||
| |
| |
||
# total 12000 stems in dix |
# total 12000 stems in dix |
||
Line 144: | Line 258: | ||
# clean testvoc for all categories |
# clean testvoc for all categories |
||
# trimmed coverage 88% |
# trimmed coverage 88% |
||
# release 0.1.0 and move to trunk |
|||
| |
| |
||
| |
| |
||
# stems in dix: 7546 |
|||
# trimmed coverage: 85.8%,81.6% |
|||
| |
|||
* Good coverage |
|||
* "Good" WER results |
|||
** But lots of # and * errors :( |
|||
* No work on testvoc |
|||
* Some ADJ sorted; still >500 unsorted |
|||
* only 2 sets of LRX rules since early in GSoC |
|||
* only 1 transfer rule since early in GSoC |
|||
|- |
|||
!colspan="2" style="text-align: right"| Final evaluation |
|||
| |
|||
| |
|||
| |
|||
| |
|||
* Has improved coverage a certain amount |
|||
* Has not done anything else |
|||
* Mentors have had to nag to get him to work |
|||
* Has not been around enough |
|||
* Among the lowest-performing students |
|||
* Has not improved since midterm |
|||
* Last-ditch efforts not at all impressive |
|||
|} |
|} |
||
== Tips and Tricks == |
|||
=== Adding stems quickly === |
|||
* Add top stems from frequency lists of unknown forms |
|||
* Use spectie's dix-entries-to-be-checked script |
Latest revision as of 06:42, 23 September 2013
Major goals
- Good WER
- Clean testvoc
- 12'000 stems in bidix (~1000 stems per week, or ~200 per day)
- Sort Adjective and Noun stems in kir.lexc into appropriate categories
- Trimmed coverage approaching 90%
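WER (word error rate) is tracked throughout this workplan. As a rough illustration only, and not the evaluation tool the project actually used, it can be sketched as the word-level edit distance between a post-edited reference and the system output, divided by the number of reference words. The function name word_error_rate and the whitespace tokenisation are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper: tokens are obtained by splitting on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For instance, one substituted word in a four-word reference gives a WER of 25%.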
Schedule
Timeline
See GSoC 2013 Timeline for complete timeline. Important coding dates follow:
- June 17th: coding begins
- July 29th - August 2nd: midterm evaluations
- September 16th - September 23rd: pencils down
- September 27th: final evaluation
Workplan
week | dates | goals | eval | accomplishments | notes |
---|---|---|---|---|---|
post-application period | 3 - 24 May | | | | |
community bonding period | 27 May - 16 June | note: should be in IRC every day | | | Firespeaker 02:28, 2 July 2013 (UTC) |
1 | 17 - 22 June | | | | |
2 | 23 - 29 June | | | | |
3 | 30 - 6 July | | | | Firespeaker 20:43, 8 July 2013 (UTC) |
4 | 7 - 13 July | | | | Firespeaker 22:16, 22 July 2013 (UTC) |
5 | 14 - 20 July | | | | |
6 | 21 - 27 July | | | | Firespeaker 18:21, 1 August 2013 (UTC) |
7 | 28 - 3 August | | | | |
midterm eval | 2 August | | | | Firespeaker 18:26, 1 August 2013 (UTC) |
8 | 4 - 10 August | | | | |
9 | 11 - 17 August | | | | |
10 | 18 - 24 August | | | | |
11 | 25 - 31 August | | | | |
12 | 1 - 7 September | | | | Firespeaker 07:29, 10 September 2013 (UTC) |
13 | 8 - 15 September | | | | Firespeaker 20:06, 22 September 2013 (UTC) |
pencils-down week, final evaluation | 16 - 23 September | | | | |
Final evaluation | | | | | |
Tips and Tricks
Adding stems quickly
- Add top stems from frequency lists of unknown forms
- Use spectie's dix-entries-to-be-checked script
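One possible way to get a frequency list of unknown forms is sketched below. It assumes the translator's output marks each unknown word with a leading asterisk, as Apertium output conventionally does. The function name unknown_form_frequencies is hypothetical, and lowercasing is a simplification that also folds proper nouns into lowercase entries.

```python
import re
from collections import Counter

# Assumption: an unknown word appears in the output as "*word",
# preceded by whitespace or the start of the text.
UNKNOWN_FORM = re.compile(r"(?<!\S)\*(\w+)")


def unknown_form_frequencies(translated_text: str) -> list:
    """Return (form, count) pairs for unknown forms, most frequent first."""
    counts = Counter(
        match.group(1).lower()
        for match in UNKNOWN_FORM.finditer(translated_text)
    )
    return counts.most_common()


if __name__ == "__main__":
    sample = "*алма жакшы *алма *китеп"
    for form, count in unknown_form_frequencies(sample):
        print(count, form)
```

The forms at the top of the resulting list are the candidates worth adding to the dictionaries first, since each one accounts for the most unanalysed tokens in the corpus.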