CG tagging hybrid and tagger improvements/Work plan
|Week beginning||Planned work|
|22 April||Community bonding.|
|23 May||Classes end the previous week.
Set up cross-validation and held-out validation scripts for a few languages and give them reproducible language models.
Clean up apertium-tagger.cc and fix up its CLI. The cleanup applies only if we have agreed on a code style standard that forbids or discourages some of the control-flow trickery the file uses.
Add the new tagger file format.|
|30 May||For the smoothed trigram, naive bigram and LSW taggers, implement strategies 1, 2 and 3 for integrating CG into training. Strategy 4 is not part of this proposal because it is too speculative. Also implement the tagging strategies from "Boosting Statistical Tagger Accuracy with Simple Rule-Based Grammars" (http://www.lrec-conf.org/proceedings/lrec2012/pdf/1075_Paper.pdf), which apply to taggers trained without CG.|
|13 June||Evaluate the work of the last two weeks.|
|20 June||Begin submitting midterm evaluation.
Implement averaged perceptron tagger. The initial implementation will be similar to the blog post implementation. Note that "averaged" here refers to averaging the weights over the course of training, so that the most recent training examples aren’t given too much weight.|
|27 June||Implement averaged perceptron tagger.|
|4 July||Implement averaged perceptron tagger. Holiday from 6th.|
|11 July||RBMT workshop. A chance to promote and get feedback on the work.|
|18 July||RBMT workshop until 22nd. Holiday|
|25 July||Holiday ends 27th.|
|1 August||Implement averaged perceptron tagger.|
|8 August||Documentation, cleanup, testing, QA, incorporating feedback & evaluation|
|15 August||Final week. Documentation, cleanup, testing, QA, incorporating feedback & evaluation|
|23 August 19:00 UTC||Ended|
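
The note on averaging in the 20 June entry can be illustrated with a minimal sketch. It is not the blog post's code, nor the planned apertium-tagger implementation. The class name, the `train` helper and the feature strings used with it are all hypothetical. The sketch assumes a plain multiclass perceptron whose final weights are the mean of the weights held at every training step, computed lazily with per-weight timestamps.

```python
from collections import defaultdict


class AveragedPerceptron:
    """Multiclass perceptron; final weights are averaged over all training steps."""

    def __init__(self, classes):
        self.classes = sorted(classes)
        # feature -> tag -> current weight
        self.weights = defaultdict(lambda: defaultdict(float))
        # (feature, tag) -> sum of the weight's value over the steps seen so far
        self._totals = defaultdict(float)
        # (feature, tag) -> step at which the weight last changed
        self._stamps = defaultdict(int)
        self.step = 0

    def predict(self, features):
        scores = {c: 0.0 for c in self.classes}
        for feat in features:
            for tag, w in self.weights.get(feat, {}).items():
                scores[tag] += w
        # Break ties by tag name so results are deterministic.
        return max(self.classes, key=lambda c: (scores[c], c))

    def update(self, truth, guess, features):
        self.step += 1
        if truth == guess:
            return
        for feat in features:
            self._adjust(feat, truth, 1.0)
            self._adjust(feat, guess, -1.0)

    def _adjust(self, feat, tag, delta):
        key = (feat, tag)
        w = self.weights[feat][tag]
        # Credit the old value for every step during which it was in force.
        self._totals[key] += (self.step - self._stamps[key]) * w
        self._stamps[key] = self.step
        self.weights[feat][tag] = w + delta

    def average(self):
        """Replace each weight with its mean value over all steps."""
        if self.step == 0:
            return
        for feat, tags in self.weights.items():
            for tag, w in tags.items():
                key = (feat, tag)
                total = self._totals[key] + (self.step - self._stamps[key]) * w
                tags[tag] = total / self.step


def train(model, examples, epochs):
    """examples: list of (features, gold_tag) pairs (hypothetical format)."""
    for _ in range(epochs):
        for features, gold in examples:
            model.update(gold, model.predict(features), features)
    model.average()
```

Because a weight that was changed late in training has held its new value for only a few steps, it contributes little to the average. That is the sense in which recent examples are not given too much weight.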