CG tagging hybrid and tagger improvements/Work plan

From Apertium
Revision as of 15:15, 13 June 2016 by Frankier

{|
! Week beginning !! Planned work
|-
| 22 April || Community bonding.
|-
| 23 May ||
Classes end the previous week.
* Set up cross-validation and held-back validation scripts for a few languages, and give them reproducible language models.
* Clean up apertium-tagger.cc, if and only if we have decided on a code style standard that forbids or discourages some of the control flow trickery it uses, and fix up the CLI.
* Add the new tagger file format.
* Fix find_similar_ambiguity_class.
|-
| 30 May || For smoothed trigram, naive bigram and LSW, implement strategies 1, 2 and 3 of integrating CG into training. Strategy 4 is not part of this proposal as it is too speculative. Also implement the tagging strategies of "Boosting Statistical Tagger Accuracy with Simple Rule-Based Grammars" (http://www.lrec-conf.org/proceedings/lrec2012/pdf/1075_Paper.pdf), which are for taggers trained without CG.
|-
| 6 June || Ditto.
|-
| 13 June || Evaluate the last two weeks.
|-
| 20 June ||
Begin submitting midterm evaluation.

Implement averaged perceptron tagger. The initial implementation will be similar to the blog post implementation. Note that "averaged" here refers to averaging over time, so that new training data isn't given too much weight.
|-
| 27 June || Implement averaged perceptron tagger.
|-
| 4 July || Implement averaged perceptron tagger. Holiday from the 6th.
|-
| 11 July || RBMT workshop. A chance to promote and get feedback on the work.
|-
| 18 July || RBMT workshop until the 22nd. Holiday.
|-
| 25 July || Holiday ends on the 27th.
|-
| 1 August || Implement averaged perceptron tagger.
|-
| 8 August || Documentation, cleanup, testing, QA, incorporating feedback and evaluation.
|-
| 15 August || Final week. Documentation, cleanup, testing, QA, incorporating feedback and evaluation.
|-
| 23 August 19:00 UTC || Ended
|}
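To illustrate what "averaging over time" means for the perceptron tagger in the plan, here is a minimal Python sketch. It is not the planned Apertium implementation: the class name `AveragedPerceptron`, its methods and the `"word=run"` feature string are all hypothetical, and the lazy time-stamped bookkeeping is just one common way of computing the average without touching every weight at every step.

```python
from collections import defaultdict


class AveragedPerceptron:
    """Multiclass perceptron that can report weights averaged over all steps.

    Hypothetical illustration only; names and layout are assumptions.
    """

    def __init__(self, classes):
        self.classes = sorted(classes)          # sorted so ties break predictably
        self.weights = defaultdict(lambda: defaultdict(float))  # feature -> tag -> weight
        self._totals = defaultdict(float)       # (feature, tag) -> weight summed over steps
        self._stamps = defaultdict(int)         # (feature, tag) -> step of last change
        self.step = 0                           # number of training examples seen

    def predict(self, features):
        """Return the highest-scoring tag for a collection of active features."""
        scores = {tag: 0.0 for tag in self.classes}
        for feat in features:
            if feat in self.weights:
                for tag, weight in self.weights[feat].items():
                    scores[tag] += weight
        return max(self.classes, key=lambda tag: scores[tag])

    def update(self, truth, guess, features):
        """Count one training step; on a mistake, reward truth and penalise guess."""
        self.step += 1
        if truth == guess:
            return
        for feat in features:
            self._adjust(feat, truth, 1.0)
            self._adjust(feat, guess, -1.0)

    def _adjust(self, feat, tag, delta):
        key = (feat, tag)
        current = self.weights[feat][tag]
        # Credit the old value for every step it stayed unchanged.
        self._totals[key] += (self.step - self._stamps[key]) * current
        self._stamps[key] = self.step
        self.weights[feat][tag] = current + delta

    def averaged_weights(self):
        """Return {(feature, tag): weight averaged over all steps so far}."""
        averaged = {}
        for feat, per_tag in self.weights.items():
            for tag, weight in per_tag.items():
                key = (feat, tag)
                total = self._totals[key] + (self.step - self._stamps[key]) * weight
                averaged[key] = total / self.step if self.step else 0.0
        return averaged


if __name__ == "__main__":
    model = AveragedPerceptron({"noun", "verb"})
    feats = ["word=run"]
    first_guess = model.predict(feats)
    model.update("verb", first_guess, feats)            # mistake: weights change
    model.update("verb", model.predict(feats), feats)   # correct: only the step counts
    print(model.averaged_weights())
```

Because each weight is averaged over every training step rather than taken from the last update, a late burst of corrections moves the final model less than it would in a plain perceptron, which is the effect the plan describes.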