Revision as of 23:08, 13 July 2014

MSc

Pending	Estimated date	Notes
Write a DCG for the apertium stream format.
Research UTF and Prolog.
Write a simple PoS disambiguator that makes a random choice.
Set up a repository for the project.
Check licensing of MIL code.
Design internal representation of the input data.
Design rules.
Implement basic predicates.
Learn rules using MIL.


@@ Line 63: / Line 63: @@
 * Wrote stream tokenizer in Prolog.
 * Wrote token splitter in Prolog.
+=== 13.07.2014 ===
+* Implemented and tested output writer.
+* Implemented a trivial disambiguator that always selects the first reading. The number of mismatches against the hand-tagged version dropped from 107 to 45 (this was mainly a sanity check that the two files are actually aligned and now match a bit better).
+* Designed the internal structure of the data. We will keep the initially split tokens in lists and remove tags from these lists. Each list will have a metadata slot allocated somewhere (probably in a metadata list). I should research hash tables, trees, or whatever other fast-lookup data structures Prolog might have.
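
A minimal sketch of the ideas in this entry, assuming SWI-Prolog. The `token/2`, `reading/2` and `meta/2` terms and all predicate names are hypothetical and are not taken from the project code. The sketch shows a first-reading disambiguator, a mismatch count against a hand-tagged list, and one candidate for the fast-lookup structure: the AVL-tree association lists from `library(assoc)`, used here to hold one metadata slot per token position.

```prolog
:- use_module(library(assoc)).   % AVL-tree based association lists
:- use_module(library(lists)).   % nth1/3
:- use_module(library(apply)).   % maplist/3

% Hypothetical representation: token(Surface, Readings), where Readings
% is the list of candidate analyses, each written reading(Lemma, Tags).

% select_first(+Token, -Disambiguated)
% Keep only the first reading; a token without readings is left as is.
select_first(token(Surface, [First|_]), token(Surface, [First])).
select_first(token(Surface, []), token(Surface, [])).

% disambiguate_first(+Tokens, -Disambiguated)
disambiguate_first(Tokens, Disambiguated) :-
    maplist(select_first, Tokens, Disambiguated).

% mismatches(+Tagged, +Gold, -Count)
% Count the positions at which two aligned token lists differ.
% Fails if the lists have different lengths, i.e. are not aligned.
mismatches([], [], 0).
mismatches([T|Ts], [G|Gs], Count) :-
    mismatches(Ts, Gs, Rest),
    (   T == G
    ->  Count = Rest
    ;   Count is Rest + 1
    ).

% index_metadata(+Tokens, -Assoc)
% Build one metadata slot per token, keyed by its 1-based position.
% The slot content (surface form and number of readings) is an assumption.
index_metadata(Tokens, Assoc) :-
    findall(I-meta(Surface, N),
            ( nth1(I, Tokens, token(Surface, Readings)),
              length(Readings, N) ),
            Pairs),
    list_to_assoc(Pairs, Assoc).
```

A slot can then be fetched with `get_assoc(Index, Assoc, Meta)` in logarithmic time, instead of walking a plain metadata list. `library(rbtrees)` would be a comparable alternative.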