CG tagging hybrid and tagger improvements/Work plan

From Apertium
Revision as of 15:18, 13 June 2016 by Frankier (talk | contribs)
Week beginning Planned work
22 April Community bonding.
23 May Classes end the previous week.

Set up cross-validation and held-out validation scripts for a few languages, and give them reproducible language models.

Clean up the code, if and only if we have decided on a code style standard that forbids or discourages some of the control-flow trickery it uses, and fix up the CLI.

Add new tagger file format

Fix find_similar_ambiguity_class.
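
The cross-validation setup planned for this week could look roughly like the sketch below. It is only an illustration, not the project's actual scripts: `k_fold_splits`, `cross_validate`, `train_fn` and `tag_fn` are hypothetical names, and the corpus is assumed to be a list of sentences, each a list of `(word, tag)` pairs. Fixing the random seed is what makes the folds reproducible between runs.

```python
import random


def k_fold_splits(sentences, k=10, seed=0):
    """Yield (train, test) sentence lists for each of k folds.

    The shuffle uses its own seeded generator, so the same corpus,
    k and seed always produce the same folds.
    """
    if not 2 <= k <= len(sentences):
        raise ValueError("k must be between 2 and the number of sentences")
    order = list(range(len(sentences)))
    random.Random(seed).shuffle(order)
    # Fold i takes every k-th shuffled index, starting at position i.
    folds = [order[i::k] for i in range(k)]
    for held_out in folds:
        held_out_set = set(held_out)
        train = [sentences[i] for i in order if i not in held_out_set]
        test = [sentences[i] for i in held_out]
        yield train, test


def cross_validate(sentences, train_fn, tag_fn, k=10, seed=0):
    """Return the token-level accuracy of each fold.

    train_fn(train_sentences) -> model and tag_fn(model, words) -> tags
    are placeholders for whichever tagger is being evaluated.
    """
    accuracies = []
    for train, test in k_fold_splits(sentences, k, seed):
        model = train_fn(train)
        correct = total = 0
        for sentence in test:
            words = [word for word, _ in sentence]
            gold = [tag for _, tag in sentence]
            predicted = tag_fn(model, words)
            correct += sum(p == g for p, g in zip(predicted, gold))
            total += len(gold)
        accuracies.append(correct / total if total else 0.0)
    return accuracies
```

Held-out validation would be the degenerate case of keeping a single fixed test split and never training on it.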

30 May For the smoothed trigram, naive bigram & LSW taggers, implement strategies 1, 2 & 3 for integrating CG into training. Strategy 4 is not part of this proposal as it is too speculative. Also implement the tagging strategies of "Boosting Statistical Tagger Accuracy with Simple Rule-Based Grammars" (which is for taggers trained without CG).
6 June Ditto
13 June Evaluate the work of the last two weeks.
20 June Begin submitting midterm evaluation.

Implement averaged perceptron tagger. The initial implementation will be similar to the blog post implementation. Note that "averaged" here refers to averaging the weights over all training updates, so that the most recently seen training examples are not given too much weight.
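
To make the averaging concrete, here is a minimal sketch of an averaged perceptron classifier. It is an assumption-laden illustration, not the planned Apertium implementation: the class name, the feature strings and the `train` helper are all hypothetical, and shuffling between epochs is left out for brevity. Each weight's running total is credited lazily for the steps during which the weight was in force, and `average()` divides those totals by the number of steps.

```python
from collections import defaultdict


class AveragedPerceptron:
    """Multiclass perceptron whose final weights are time-averaged."""

    def __init__(self, classes):
        self.classes = sorted(classes)
        # feature -> tag -> current weight
        self.weights = defaultdict(lambda: defaultdict(float))
        # (feature, tag) -> sum of the weight over all steps so far
        self._totals = defaultdict(float)
        # (feature, tag) -> step at which the weight last changed
        self._stamps = defaultdict(int)
        self.steps = 0

    def predict(self, features):
        scores = dict.fromkeys(self.classes, 0.0)
        for feature in features:
            for tag, weight in self.weights.get(feature, {}).items():
                scores[tag] += weight
        # Ties are broken by tag name so that results are deterministic.
        return max(self.classes, key=lambda tag: (scores[tag], tag))

    def update(self, truth, guess, features):
        self.steps += 1
        if truth == guess:
            return
        for feature in features:
            self._bump(feature, truth, 1.0)
            self._bump(feature, guess, -1.0)

    def _bump(self, feature, tag, delta):
        key = (feature, tag)
        # Credit the old weight for every step it was in force.
        elapsed = self.steps - self._stamps[key]
        self._totals[key] += elapsed * self.weights[feature][tag]
        self._stamps[key] = self.steps
        self.weights[feature][tag] += delta

    def average(self):
        """Replace each weight by its mean value over all steps."""
        for feature, per_tag in self.weights.items():
            for tag, weight in per_tag.items():
                key = (feature, tag)
                elapsed = self.steps - self._stamps[key]
                total = self._totals[key] + elapsed * weight
                per_tag[tag] = total / self.steps if self.steps else 0.0


def train(model, examples, epochs=5):
    """examples is a list of (features, tag) pairs."""
    for _ in range(epochs):
        for features, tag in examples:
            model.update(tag, model.predict(features), features)
    model.average()
```

A toy run under these assumptions:

```python
examples = [(("suffix=ing",), "VERB"), (("suffix=ly",), "ADV"), (("suffix=ed",), "VERB")]
model = AveragedPerceptron({"VERB", "ADV"})
train(model, examples)
print(model.predict(("suffix=ly",)))
```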

27 June Implement averaged perceptron tagger.
4 July Implement averaged perceptron tagger. Holiday from 6th.
11 July RBMT workshop. A chance to promote and get feedback on the work.
18 July RBMT workshop until 22nd. Holiday.
25 July Holiday ends 27th.
1 August Implement averaged perceptron tagger.
8 August Documentation, cleanup, testing, QA, incorporating feedback & evaluation
15 August Final week. Documentation, cleanup, testing, QA, incorporating feedback & evaluation
23 August 19:00 UTC Ended