Difference between revisions of "Apertium-kaz-kir/Workplan"
		
		
		
		
		
		
| Firespeaker (talk | contribs) | Firespeaker (talk | contribs)  | ||
| (43 intermediate revisions by the same user not shown) | |||
| Line 4: | Line 4: | ||
| * 12'000 stems in bidix (~1000 stems per week, or ~200 per day) | * 12'000 stems in bidix (~1000 stems per week, or ~200 per day) | ||
| * Sort Adjective and Noun stems in kir.lexc into appropriate categories | * Sort Adjective and Noun stems in kir.lexc into appropriate categories | ||
| * Trimmed coverage approaching 90% | * [[Apertium-kaz-kir/stats#Over-all_stats|Trimmed coverage]] approaching 90% | ||
| == Schedule == | == Schedule == | ||
| Line 48: | Line 48: | ||
| # write ≥3 disambig rules | # write ≥3 disambig rules | ||
| note: should be in IRC every day | note: should be in IRC every day | ||
| | {{Workeval5|3}} | |||
| | | | | ||
| # — | |||
| # ran trimmed coverage script on a corpus | |||
| # took a look at frequency lists | |||
| # wrote 4 pairs of lexical selection rules | |||
| # wrote 4 variants of 1 transfer rule | |||
| # — | |||
| | | | | ||
| * demonstrated ability to work with lexical selection rules | |||
| #  | |||
| * demonstrated ability to work with transfer rules | |||
| #  | |||
| * got only some experience with coverage scripts | |||
| #  | |||
| * did not get experience with testvoc | |||
| #  | |||
| * did not get experience with disambig rules | |||
| #  | |||
| * '''was not around IRC frequently''' | |||
| #   | |||
| * worked in bursts, did not spend a single long period of time | |||
| | | |||
| —[[User:Firespeaker|Firespeaker]] 02:28, 2 July 2013 (UTC) | |||
| |- | |- | ||
| ! 1 | ! 1 | ||
| Line 65: | Line 73: | ||
| # 500-word evaluation, WER ~10% | # 500-word evaluation, WER ~10% | ||
| # trimmed coverage 51% | # trimmed coverage 51% | ||
| | {{Workeval5|0}}  | |||
| |  | |||
| | | | | ||
| | | | | ||
| * did not show up —[[User:Firespeaker|Firespeaker]] 02:28, 2 July 2013 (UTC) | |||
| |- | |- | ||
| ! 2 | ! 2 | ||
| Line 75: | Line 84: | ||
| # clean testvoc for {{tag|num}} {{tag|post}} | # clean testvoc for {{tag|num}} {{tag|post}} | ||
| # trimmed coverage 53% | # trimmed coverage 53% | ||
| | {{Workeval5|0}} | |||
| | | | | ||
| # stems in dix: 408 | |||
| | | | | ||
| * did not show up —[[User:Firespeaker|Firespeaker]] 02:28, 2 July 2013 (UTC) | |||
| | | |||
| |- | |- | ||
| ! 3  | ! 3  | ||
| Line 85: | Line 96: | ||
| # clean testvoc for {{tag|cnjcoo}} {{tag|cnjadv}} {{tag|cnjsub}} | # clean testvoc for {{tag|cnjcoo}} {{tag|cnjadv}} {{tag|cnjsub}} | ||
| # trimmed coverage 55% | # trimmed coverage 55% | ||
| | {{Workeval5|2}} | |||
| |  | |||
| # stems in dix: 508 | |||
| # trimmed coverage: 59.5%,51.5% | |||
| | | | | ||
| * trimmed coverage good | |||
| | | |||
| * too narrow a focus on a single corpus | |||
| | | |||
| * number of stems too low | |||
| * no new WER text | |||
| * no testvoc | |||
| —[[User:Firespeaker|Firespeaker]] 20:43, 8 July 2013 (UTC) | |||
| |-  | |-  | ||
| ! 4 | ! 4 | ||
| Line 95: | Line 114: | ||
| # clean testvoc for {{tag|adv}} | # clean testvoc for {{tag|adv}} | ||
| # trimmed coverage 59% | # trimmed coverage 59% | ||
| |{{Workeval5|2}} | |||
| | | |||
| |rowspan="2"| | |||
| | | |||
| # stems in dix: 2574 | |||
| | | |||
| # trimmed coverage: 69.3%,63.8% | |||
| # azattyq_24455849 WER: 14.78% | |||
| # completed most of TODO-list | |||
| |rowspan="2"| | |||
| * good progress on adding stems | |||
| * fixed little things as directed | |||
| * good progress on post-editing process | |||
| * didn't make good progress on reducing WER | |||
| * still no testvoc | |||
| * committed once every 3 or 4 days; '''should be committing every day''' | |||
| * poor communication with mentors; needs to be around more often | |||
| —[[User:Firespeaker|Firespeaker]] 22:16, 22 July 2013 (UTC) | |||
| |-  | |-  | ||
| ! 5  | ! 5  | ||
| !style="text-align: right"| 14 -20 July | !style="text-align: right"| 14 - 20 July | ||
| | | | | ||
| # total 4800 stems in dix | # total 4800 stems in dix | ||
| # clean testvoc for {{tag|prn}} {{tag|det}} | # clean testvoc for {{tag|prn}} {{tag|det}} | ||
| # trimmed coverage 63% | # trimmed coverage 63% | ||
| |{{Workeval5|3}} | |||
| | | |||
| | | |||
| | | |||
| |-  | |-  | ||
| ! 6  | ! 6  | ||
| Line 115: | Line 144: | ||
| # clean testvoc for {{tag|adj}} {{tag|adj}}{{tag|advl}} | # clean testvoc for {{tag|adj}} {{tag|adj}}{{tag|advl}} | ||
| # trimmed coverage 68% | # trimmed coverage 68% | ||
| |{{Workeval5|3}} | |||
| | | |||
| |rowspan="3"| | |||
| | | |||
| # stems in dix: 5552 | |||
| | | |||
| # trimmed coverage: 72%,67% | |||
| # azattyq_24455849 WER: 18.01% | |||
| |rowspan="2"| | |||
| * good improvement in dix | |||
| ** should be checking for errors (e.g., extra spaces) | |||
| * not much progress with WER text | |||
| ** simple lrx and t1x should be enough here | |||
| * still no indication of progress with testvoc | |||
| * better communication and commit frequency, but could still improve | |||
| —[[User:Firespeaker|Firespeaker]] 18:21, 1 August 2013 (UTC) | |||
| |-  | |-  | ||
| ! 7  | ! 7  | ||
| Line 124: | Line 163: | ||
| # total 6400 stems in dix | # total 6400 stems in dix | ||
| # trimmed coverage 70% | # trimmed coverage 70% | ||
| |{{Workeval5|2}} | |||
| | | |||
| | | |||
| | | |||
| |- | |- | ||
| !colspan="2" style="text-align: right"| midterm eval<br />2 August | !colspan="2" style="text-align: right"| [[Apertium-kaz-kir/TODO#By_midterm|midterm eval]]<br />2 August | ||
| | | | | ||
| # total 6500 stems in dix | # total 6500 stems in dix | ||
| # 500-word evaluation, WER ~10% | # 500-word evaluation, WER ~10% | ||
| # trimmed coverage 72% | # trimmed coverage 72% | ||
| |{{Workeval5|2}} | |||
| | | | | ||
| * midterm TODO list goals only partially attained | |||
| | | |||
| * overall progress has been mediocre | |||
| | | |||
| * among the lowest-performing students | |||
| * noticeable improvement in the last few weeks | |||
| * needs to improve more to pass the final | |||
| —[[User:Firespeaker|Firespeaker]] 18:26, 1 August 2013 (UTC) | |||
| |-  | |-  | ||
| ! 8 | ! 8 | ||
| Line 143: | Line 185: | ||
| # clean testvoc for {{tag|n}} {{tag|num}}{{tag|subst}} {{tag|np}} {{tag|adj}}{{tag|subst}} | # clean testvoc for {{tag|n}} {{tag|num}}{{tag|subst}} {{tag|np}} {{tag|adj}}{{tag|subst}} | ||
| # trimmed coverage 75% | # trimmed coverage 75% | ||
| |{{Workeval5|2}} | |||
| | | |||
| |rowspan="3"| | |||
| | | |||
| # stems in dix: 6493 | |||
| # trimmed coverage: 79.6%,74.1% | |||
| | | | | ||
| |-  | |-  | ||
| Line 152: | Line 196: | ||
| # total 8000 stems in dix | # total 8000 stems in dix | ||
| # trimmed coverage 78% | # trimmed coverage 78% | ||
| |{{Workeval5|2}} | |||
| | | |||
| | | |||
| | | | | ||
| |-  | |-  | ||
| Line 161: | Line 204: | ||
| # total 8800 stems in dix | # total 8800 stems in dix | ||
| # trimmed coverage 81% | # trimmed coverage 81% | ||
| |{{Workeval5|3}} | |||
| | | |||
| | | |||
| | | | | ||
| |-  | |-  | ||
| Line 171: | Line 213: | ||
| # clean testvoc for {{tag|v}} | # clean testvoc for {{tag|v}} | ||
| # trimmed coverage 83% | # trimmed coverage 83% | ||
| |{{Workeval5|3}} | |||
| | | | | ||
| # stems in dix: 6730 | |||
| | | |||
| # trimmed coverage: 82.5%,78.4% | |||
| # azattyq_24455849 WER: 6.62% | |||
| | | | | ||
| |-  | |-  | ||
| Line 180: | Line 225: | ||
| # total 10400 stems in dix | # total 10400 stems in dix | ||
| # trimmed coverage 85% | # trimmed coverage 85% | ||
| |{{Workeval5|3}} | |||
| | | | | ||
| # stems in dix: 7007 | |||
| # trimmed coverage: 84.2%,79.8% | |||
| | | | | ||
| * Good [[Turkic_lexicon#Kyrgyz|adjective typology]] | |||
| | | |||
| * Decent progress on coverage | |||
| * Not around much later in the week | |||
| * Still no testvoc... | |||
| —[[User:Firespeaker|Firespeaker]] 07:29, 10 September 2013 (UTC) | |||
| |-  | |-  | ||
| ! 13 | ! 13 | ||
| Line 189: | Line 241: | ||
| # total 11200 stems in dix | # total 11200 stems in dix | ||
| # trimmed coverage 87% | # trimmed coverage 87% | ||
| |{{Workeval5|1}} | |||
| | | | | ||
| # stems in dix: 7454 | |||
| # trimmed coverage: 85.2%,80.4% | |||
| | | | | ||
| * Decent increase in coverage | |||
| | | |||
| * Still no testvoc | |||
| * Still ~600 unsorted ADJ | |||
| * Not around much | |||
| —[[User:Firespeaker|Firespeaker]] 20:06, 22 September 2013 (UTC) | |||
| |- | |- | ||
| !colspan="2" style="text-align: right"| pencils-down week<br />final evaluation<br />16 - 23 September | !colspan="2" style="text-align: right"| pencils-down week<br />final evaluation<br />16 - 23 September | ||
| Line 200: | Line 259: | ||
| # trimmed coverage 88% | # trimmed coverage 88% | ||
| # release 0.1.0 and move to trunk | # release 0.1.0 and move to trunk | ||
| | | |||
| | | |||
| # stems in dix: 7546 | |||
| # trimmed coverage: 85.8%,81.6% | |||
| | | |||
| * Good coverage | |||
| * "Good" WER results | |||
| ** But lots of # and * errors :( | |||
| * No work on testvoc | |||
| * Some ADJ sorted; still >500 unsorted | |||
| * only 2 sets of LRX rules since early in GSoC | |||
| * only 1 transfer rule since early in GSoC | |||
| |- | |||
| !colspan="2" style="text-align: right"| Final evaluation | |||
| | | |||
| | | | | ||
| | | | | ||
| | | | | ||
| * Has improved coverage a certain amount | |||
| * Has not done anything else | |||
| * Mentors have had to nag to get him to work | |||
| * Has not been around enough | |||
| * Among the lowest-performing students | |||
| * Has not improved since midterm | |||
| * Last-ditch efforts not at all impressive | |||
| |} | |} | ||
Latest revision as of 06:42, 23 September 2013
Major goals
- Good WER
- Clean testvoc
- 12'000 stems in bidix (~1000 stems per week, or ~200 per day)
- Sort Adjective and Noun stems in kir.lexc into appropriate categories
- Trimmed coverage approaching 90%
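The two numeric goals above (coverage and WER) can be sketched in a few lines of Python. This is a minimal illustration, not the project's own coverage or evaluation script. It assumes analyser output in the Apertium stream format (`^surface/analysis$`, with unknown forms marked by a leading `*`), and it ignores backslash-escaped characters. The function names `coverage` and `wer` and the sample strings are hypothetical.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Assumption: no backslash-escaped '/', '^' or '$' inside the unit.
LEXICAL_UNIT = re.compile(r"\^([^/$]*)((?:/[^/$]*)*)\$")


def coverage(analysed_text: str) -> float:
    """Naive coverage: share of tokens with at least one real analysis."""
    total = known = 0
    for match in LEXICAL_UNIT.finditer(analysed_text):
        total += 1
        analyses = match.group(2).split("/")[1:]
        # An unknown form is echoed back with a leading asterisk.
        if analyses and not analyses[0].startswith("*"):
            known += 1
    return known / total if total else 0.0


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref) if ref else 0.0


# Hypothetical sample: one known token and one unknown token.
sample = "^бала/бала<n><nom>$ ^фуу/*фуу$"
print(coverage(sample))            # 0.5
print(wer("a b c d", "a x c"))     # 0.5
```

Running the same coverage function over output from the trimmed analyser, rather than the full one, would give a trimmed-coverage figure; the reference text for WER would typically be a post-edited translation.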
Schedule
Timeline
See the GSoC 2013 Timeline for the complete timeline. Important coding dates follow:
- June 17th: coding begins
- July 29th - August 2nd: midterm evaluations
- September 16th - September 23rd: pencils down
- September 27th: final evaluation
Workplan
| week | dates | goals | eval | accomplishments | notes |
|---|---|---|---|---|---|
| post-application period | 3 - 24 May | | | | |
| community bonding period | 27 May - 16 June | note: should be in IRC every day | | | —Firespeaker 02:28, 2 July 2013 (UTC) |
| 1 | 17 - 22 June | | | | |
| 2 | 23 - 29 June | | | | |
| 3 | 30 June - 6 July | | | | —Firespeaker 20:43, 8 July 2013 (UTC) |
| 4 | 7 - 13 July | | | | —Firespeaker 22:16, 22 July 2013 (UTC) |
| 5 | 14 - 20 July | | | | |
| 6 | 21 - 27 July | | | | —Firespeaker 18:21, 1 August 2013 (UTC) |
| 7 | 28 July - 3 August | | | | |
| midterm eval | 2 August | | | | —Firespeaker 18:26, 1 August 2013 (UTC) |
| 8 | 4 - 10 August | | | | |
| 9 | 11 - 17 August | | | | |
| 10 | 18 - 24 August | | | | |
| 11 | 25 - 31 August | | | | |
| 12 | 1 - 7 September | | | | —Firespeaker 07:29, 10 September 2013 (UTC) |
| 13 | 8 - 15 September | | | | —Firespeaker 20:06, 22 September 2013 (UTC) |
| pencils-down week, final evaluation | 16 - 23 September | | | | |
| Final evaluation | | | | | |
Tips and Tricks
Adding stems quickly
- Add top stems from frequency lists of unknown forms
- Use spectie's dix-entries-to-be-checked script
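As a rough illustration of the first tip, the sketch below builds a frequency list of unknown forms from analyser output. It assumes the Apertium stream format, in which an unknown form appears as `^form/*form$`, and it ignores backslash-escaped characters. The function name `unknown_frequencies` and the sample text are hypothetical; this is not the dix-entries-to-be-checked script mentioned above.

```python
import re
from collections import Counter

# Assumption: an unknown form looks like ^form/*form$ in the analyser output.
UNKNOWN_UNIT = re.compile(r"\^([^/$]*)/\*[^/$]*\$")


def unknown_frequencies(analysed_text: str) -> list[tuple[str, int]]:
    """Return (form, count) pairs for unknown forms, most frequent first."""
    counts = Counter(
        match.group(1).lower()  # fold case so sentence-initial forms merge
        for match in UNKNOWN_UNIT.finditer(analysed_text)
    )
    return counts.most_common()


# Hypothetical sample: "фуу" is unknown twice, "бар" once, "бала" is known.
sample = "^фуу/*фуу$ ^бала/бала<n><nom>$ ^Фуу/*Фуу$ ^бар/*бар$"
for form, count in unknown_frequencies(sample):
    print(count, form)
```

Working down such a list from the top means each stem added covers as many corpus tokens as possible, which is why it speeds up coverage gains.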

