Difference between revisions of "User:Asfrent/MSc Log"

From Apertium
Line 7: Line 7:
{| class="wikitable"
{| class="wikitable"
!Pending
!Pending
!Estimated date
!Notes
!Notes
|- style="background-color: #FFF68F;" |
|- style="background-color: #BCED91;" |
| Write a DCG for the apertium stream format.
| Write a DCG for the apertium stream format.
|
|
|
|- style="background-color: #FFF68F;" |
|- style="background-color: #FFF68F;" |
| Research UTF and Prolog.
| Research UTF and Prolog.
|
|
|- style="background-color: #BCED91;" |
|
| Write a simple PoS disambiguator that keeps only the first reading.
|- style="background-color: #FFF68F;" |
| Write a simple PoS disambiguator that makes a random choice.
| A '''random''' disambiguator would also be useful.
|
|
|- style="background-color: #FFF68F;" |
|- style="background-color: #FFF68F;" |
| Set up a repository for the project.
| Set up a repository for the project.
|
|
|
|- style="background-color: #FFF68F;" |
|- style="background-color: #FFF68F;" |
| Check licensing of MIL code.
| Check licensing of MIL code.
| No answer by email. :-(
|
|- style="background-color: #BCED91;" |
|
|- style="background-color: #FFF68F;" |
| Design internal representation of the input data.
| Design internal representation of the input data.
|
|
|
|- style="background-color: #FFF68F;" |
|- style="background-color: #FFF68F;" |
| Design rules.
| Design rules.
|
|
|
|- style="background-color: #FFF68F;" |
|- style="background-color: #FFF68F;" |
| Implement basic predicates.
| Implement basic predicates.
|
|
|
|- style="background-color: #FFF68F;" |
|- style="background-color: #FFF68F;" |
| Learn rules using MIL.
| Learn rules using MIL.
|
|
|- style="background-color: #FFF68F;" |
| Write a python script that aligns and tests two outputs (handtagged vs disambiguated).
|
|
|}
|}

Revision as of 23:13, 13 July 2014

MSc

Plan, questions, stuff

Short term plan / Pendings

Pending | Notes
Write a DCG for the Apertium stream format. |
Research UTF and Prolog. |
Write a simple PoS disambiguator that keeps only the first reading. | A random disambiguator would also be useful.
Set up a repository for the project. |
Check licensing of MIL code. | No answer by email. :-(
Design internal representation of the input data. |
Design rules. |
Implement basic predicates. |
Learn rules using MIL. |
Write a Python script that aligns and tests two outputs (hand-tagged vs disambiguated). |
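
The last pending item (aligning and testing two outputs) could be sketched roughly as follows. This is only an illustration, not the script that was eventually written: the function names `units` and `count_mismatches` are hypothetical, the regular expression assumes lexical units of the form `^...$` with backslash escapes, and the alignment is delegated to Python's standard `difflib.SequenceMatcher`.

```python
import difflib
import re

# Assumption: a lexical unit is '^' ... '$', with backslash-escaped characters allowed inside.
UNIT = re.compile(r"\^((?:\\.|[^\\$])*)\$")


def units(stream):
    """Return the raw contents of every ^...$ unit found in the stream."""
    return [m.group(1) for m in UNIT.finditer(stream)]


def count_mismatches(handtagged, disambiguated):
    """Align the two unit sequences and count the units that do not match."""
    a, b = units(handtagged), units(disambiguated)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    mismatches = 0
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            # A replaced, inserted or deleted block counts once per unit
            # on its longer side.
            mismatches += max(i2 - i1, j2 - j1)
    return mismatches


gold = "^the/the<det>$ ^books/book<n><pl>$ ^./.<sent>$"
auto = "^the/the<det>$ ^books/book<vblex><pri><p3><sg>$ ^./.<sent>$"
print(count_mismatches(gold, auto))
```

Aligning before comparing means a single missing or extra unit in one file does not turn every later unit into a mismatch.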

Questions

Log

11.07.2014

  • Read ILP paper from Francis.
  • Got MIL code, did a few tests.
  • Tracked down and downloaded the tagger test data from the Apertium project.
  • Read about tagging, CG and rules.
  • Wrote a Prolog script that reads all the lines from a file.

12.07.2014

  • Started reading the CG docs to inform the design of the data structures.
  • Did a bit of research on Prolog DCG.
  • Wrote stream tokenizer in Prolog.
  • Wrote token splitter in Prolog.
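
The tokenizer and splitter from this entry were written in Prolog and are not reproduced here. As a rough illustration of the same two steps, here is a minimal Python sketch. It assumes the usual shape of the Apertium stream format (`^surface/reading/reading$`, with tags in angle brackets); the function names are hypothetical, and the escape handling is simplified (an escaped backslash directly before a `/` is not handled).

```python
import re

# Assumption: a lexical unit is '^' ... '$', with backslash-escaped characters allowed inside.
LEXICAL_UNIT = re.compile(r"\^((?:\\.|[^\\$])*)\$")


def tokenize(stream):
    """Return the raw contents of every ^...$ lexical unit in the stream."""
    return [m.group(1) for m in LEXICAL_UNIT.finditer(stream)]


def split_token(unit):
    """Split a unit into its surface form and the list of its readings."""
    parts = re.split(r"(?<!\\)/", unit)  # split on unescaped '/'
    return parts[0], parts[1:]


def split_reading(reading):
    """Split a reading such as 'book<n><pl>' into ('book', ['n', 'pl'])."""
    lemma = reading.split("<", 1)[0]
    tags = re.findall(r"<([^<>]+)>", reading)
    return lemma, tags


stream = "^the/the<det><def><sp>$ ^books/book<n><pl>/book<vblex><pri><p3><sg>$"
for unit in tokenize(stream):
    surface, readings = split_token(unit)
    print(surface, [split_reading(r) for r in readings])
```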

13.07.2014

  • Implemented and tested output writer.
  • Implemented a trivial disambiguator that always selects the first reading. The number of mismatches against the hand-tagged version dropped from 107 to 45 (this was mainly a sanity check that the two outputs are actually aligned and match somewhat better).
  • Designed the internal structure of the data. The initially split tokens will be kept in lists, and tags will be removed from these lists. Each list will have a metadata slot allocated somewhere (probably in a separate metadata list). I still need to research hash tables, trees, or whatever other fast-lookup data structure Prolog offers.
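
To make the entry above more concrete, here is a minimal Python sketch of a first-reading disambiguator over list-based tokens with a parallel metadata list. It is not the Prolog implementation. The names `parse`, `keep_first_reading` and `write`, and the `removed` metadata field, are hypothetical. It also drops whole readings rather than individual tags, which is only one interpretation of the design, and the writer discards any text between units.

```python
import re

# Assumption: a lexical unit is '^' ... '$', with backslash-escaped characters allowed inside.
UNIT = re.compile(r"\^((?:\\.|[^\\$])*)\$")


def parse(stream):
    """Return one list per lexical unit: [surface, reading1, reading2, ...]."""
    return [re.split(r"(?<!\\)/", m.group(1)) for m in UNIT.finditer(stream)]


def keep_first_reading(tokens):
    """Drop every reading after the first, in place.

    Returns a metadata list parallel to `tokens`; the only field kept here
    (how many readings were removed) is purely illustrative.
    """
    metadata = []
    for token in tokens:
        removed = max(len(token) - 2, 0)
        del token[2:]  # token[0] is the surface form, token[1] the first reading
        metadata.append({"removed": removed})
    return metadata


def write(tokens):
    """Serialise the tokens back into ^surface/reading$ units."""
    return " ".join("^" + "/".join(token) + "$" for token in tokens)


tokens = parse("^the/the<det><def><sp>$ ^books/book<n><pl>/book<vblex><pri><p3><sg>$")
metadata = keep_first_reading(tokens)
print(write(tokens))
print(metadata)
```

Keeping the metadata in a separate list with the same indices as the token list avoids changing the token lists themselves, at the cost of having to keep the two lists in step.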