User:Asfrent/MSc Log
Revision as of 23:13, 13 July 2014
MSc
Plan, questions, stuff
Short-term plan / Pending tasks
Pending | Notes |
---|---|
Write a DCG for the Apertium stream format. | |
Research Unicode (UTF-8) handling in Prolog. | |
Write a simple PoS disambiguator that keeps only the first reading. | A random disambiguator would also be useful. |
Set up a repository for the project. | |
Check licensing of MIL code. | No answer by email. :-( |
Design internal representation of the input data. | |
Design rules. | |
Implement basic predicates. | |
Learn rules using MIL. | |
Write a Python script that aligns and tests two outputs (hand-tagged vs. disambiguated). | |
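A possible shape for the planned alignment script, sketched under assumptions: both files use `^surface/analysis$` units and both keep the surface form, so the two token sequences can be aligned on surface forms with `difflib.SequenceMatcher`, and every aligned token whose analysis differs counts as a mismatch. The names `units` and `compare` are hypothetical, and the regex is deliberately naive (it does not handle escaped `^` or `$`).

```python
import difflib
import re

# Naive unit matcher (assumption): ignores backslash-escaped '^' and '$'.
UNIT_RE = re.compile(r"\^(.*?)\$")


def units(text: str) -> list[tuple[str, str]]:
    """Split each ^surface/analysis$ unit into a (surface, analysis) pair."""
    pairs = []
    for raw in UNIT_RE.findall(text):
        surface, _, analysis = raw.partition("/")
        pairs.append((surface, analysis))
    return pairs


def compare(reference_text: str, tagged_text: str) -> tuple[int, int, int]:
    """Return (aligned tokens, aligned tokens with a different analysis,
    reference tokens that could not be aligned)."""
    ref, hyp = units(reference_text), units(tagged_text)
    matcher = difflib.SequenceMatcher(
        a=[surface for surface, _ in ref],
        b=[surface for surface, _ in hyp],
        autojunk=False,  # do not treat frequent tokens as junk
    )
    aligned = mismatched = 0
    # The last matching block is always a size-0 sentinel, so it adds nothing.
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            aligned += 1
            if ref[block.a + k][1] != hyp[block.b + k][1]:
                mismatched += 1
    return aligned, mismatched, len(ref) - aligned
```

Reading both files with `open(path, encoding="utf-8").read()` and passing the hand-tagged text as `reference_text` would give a mismatch count comparable to the one reported in the log. Aligning on surface forms rather than on raw positions keeps a single dropped or extra token from shifting every later comparison.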
Questions
Log
11.07.2014
- Read the ILP paper from Francis.
- Got the MIL code and ran a few tests.
- Tracked down and downloaded test data for the tagger from the Apertium project.
- Read about tagging, Constraint Grammar (CG) and rules.
- Wrote a Prolog script that reads all the lines from a file.
12.07.2014
- Started reading the CG documentation to inform the design of the data structures.
- Did some research on Prolog DCGs (definite clause grammars).
- Wrote a stream tokenizer in Prolog.
- Wrote a token splitter in Prolog.
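The tokenizer and splitter above are written in Prolog. As a language-neutral illustration only, here is a minimal Python sketch of the same two steps: extracting the `^...$` lexical units from an Apertium stream, then splitting each unit into a surface form and its readings (lemma plus tags). It assumes the common `^surface/lemma<tag>.../lemma<tag>...$` layout, with backslash escapes and `[...]` superblanks between units. `Reading`, `LexicalUnit`, `lexical_units`, `split_unescaped` and `parse_unit` are hypothetical names, not the author's predicates, and an escaped `<` inside a lemma is not handled.

```python
import re
from dataclasses import dataclass, field


# Hypothetical data types for illustration; not the author's Prolog terms.
@dataclass
class Reading:
    lemma: str
    tags: list[str]


@dataclass
class LexicalUnit:
    surface: str
    readings: list[Reading] = field(default_factory=list)


TAG_RE = re.compile(r"<([^<>]+)>")


def lexical_units(stream: str):
    """Yield the raw text inside each ^...$ unit, skipping blanks and [superblanks]."""
    i, n = 0, len(stream)
    while i < n:
        ch = stream[i]
        if ch == "\\":
            i += 2  # escaped character in blank text
        elif ch == "[":
            i += 1
            while i < n and stream[i] != "]":  # skip superblank contents
                i += 2 if stream[i] == "\\" else 1
            i += 1
        elif ch == "^":
            start = i = i + 1
            while i < n and stream[i] != "$":  # unit ends at first unescaped '$'
                i += 2 if stream[i] == "\\" else 1
            yield stream[start:i]
            i += 1
        else:
            i += 1


def split_unescaped(text: str, sep: str) -> list[str]:
    """Split on `sep`, keeping backslash-escaped separators inside the fields."""
    parts, current, i = [], [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            current.append(text[i : i + 2])
            i += 2
        elif text[i] == sep:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_unit(raw: str) -> LexicalUnit:
    """Split 'surface/reading1/reading2...' into a surface form and readings."""
    surface, *analyses = split_unescaped(raw, "/")
    readings = [Reading(a.split("<", 1)[0], TAG_RE.findall(a)) for a in analyses]
    return LexicalUnit(surface, readings)
```

For example, `[parse_unit(u) for u in lexical_units(text)]` turns a whole stream into a list of units, each holding a list of candidate readings. A Prolog DCG would express the same grammar declaratively, with one nonterminal per level (unit, reading, tag).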
13.07.2014
- Implemented and tested the output writer.
- Implemented a trivial disambiguator that always selects the first reading. The number of mismatches against the hand-tagged version dropped from 107 to 45 (this was mostly a sanity check that the two outputs are actually aligned and that the disambiguated one matches a bit better).
- Designed the internal structure of the data. We will keep the initial split tokens in lists and remove tags from these lists. Each list will have a metadata slot allocated somewhere (probably in a separate metadata list). I should look into hashtables, trees, or whatever fast-lookup data structure Prolog offers.
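The first-reading disambiguator (and the random one listed in the plan) can be expressed as a filter over the stream that keeps exactly one reading per unit. A minimal Python sketch, assuming ambiguous units look like `^surface/reading1/reading2$` and that no `/`, `^` or `$` is escaped inside a unit; `disambiguate`, `first_reading` and `random_reading_chooser` are hypothetical names, not the author's Prolog code.

```python
import random
import re

# Naive unit matcher (assumption): no escaped '^', '$' or '/' inside units.
UNIT_RE = re.compile(r"\^(.*?)\$")


def disambiguate(stream: str, choose) -> str:
    """Rewrite every ^surface/r1/r2/...$ unit to keep only the reading
    returned by `choose`; text between units is left untouched."""
    def rewrite(match: re.Match) -> str:
        surface, *readings = match.group(1).split("/")
        if not readings:  # unit without analyses: leave it as-is
            return match.group(0)
        return "^" + surface + "/" + choose(readings) + "$"

    return UNIT_RE.sub(rewrite, stream)


def first_reading(readings: list[str]) -> str:
    """Baseline: always keep the first reading."""
    return readings[0]


def random_reading_chooser(seed=None):
    """Return a chooser that picks a reading uniformly at random (seedable)."""
    rng = random.Random(seed)
    return lambda readings: rng.choice(readings)
```

Calling `disambiguate(text, first_reading)` gives the first-reading baseline, and `disambiguate(text, random_reading_chooser(seed))` gives a reproducible random baseline; running either output through an alignment check against the hand-tagged file gives a floor that learned rules should beat.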