Indonesian and Malaysian/Work plan
|Week||Dates||Main activities||Coverage reached (wp)||Trimmed coverage reached (wp)||Testvoc clean||Evaluation||WER reached|
|0|| ||Translating the story to get a baseline WER.|| || || ||500 words||4.68% (id->ms)|
|1|| ||Working on Indonesian analyzer/generator.|| || || || || |
|2|| ||Working on Indonesian analyzer/generator. Translating Malaysian Wikipedia articles to Indonesian to get a parallel corpus. Bilingual dictionaries will be extracted from the corpus.|| || || || || |
|3|| ||Translating Malaysian Wikipedia articles to Indonesian. Working on Malaysian analyzer/generator.|| || || || || |
|4|| ||Working on Malaysian analyzer/generator and bidix.||75.6%, 72.9%|| || || || |
|5|| ||Working on Malaysian analyzer/generator and bidix.||80.1%, 77.5%|| || ||300 words||2.97% (ms->id)|
|6|| ||Working on bidix.||80.1%, -||73.3%, -|| || || |
|7|| ||Working on bidix.||80.3%, 77.1%||76.5%, 74.6%|| ||500 words||24.34% (ms->id)|
|8|| ||Parallel corpus development.|| || || || || |
|9|| ||Working on bidix.|| || || || || |
|10|| ||A little break during this period.|| || || || || |
|11|| ||Working on bidix.|| || || || || |
|12|| ||Cleaning up.||80.7%, 80.1%||80.7%, 80.1%||all categories clean||2,000 words||14.43% (id->ms), 7.58% (ms->id)|
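The WER column above reports word error rate: the word-level edit distance between the machine translation and a reference translation, divided by the number of reference words. The following is a minimal sketch of that calculation, assuming whitespace tokenization and equal cost for insertions, deletions and substitutions. The function name is hypothetical, and the evaluation tooling actually used for the figures in the table may differ in details such as tokenization and case handling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words to reach an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, with the illustrative reference `saya mau pergi ke kantor` and the output `saya mahu pergi ke pejabat`, two of the five reference words need substitution, so the function returns 0.4, which the table would show as 40%.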
Ideas for getting Indonesian-Malaysian bilingual dictionaries
- Filtering the Indonesian lemma list: for each lemma, check whether it is also a valid Malaysian word.
- Interlanguage wiki links.
- Extracting bilingual dictionaries from parallel corpus.
- Building dictionaries
- Extracting bilingual dictionaries with Giza++
- Generating lexical-selection rules from a parallel corpus
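
The first idea in the list, filtering the Indonesian lemma list, can be sketched as follows. This assumes two inputs that are hypothetical here: an iterable of Indonesian lemmas and an iterable of known Malaysian word forms, for instance read from plain-text files with one word per line. The helper names and the default `n` (noun) tag are placeholders for illustration. Lemmas present in both lists are only candidates for identical-translation bidix entries and would still need manual review, since a shared spelling does not guarantee a shared meaning.

```python
from xml.sax.saxutils import escape


def shared_lemmas(id_lemmas, ms_words):
    """Return Indonesian lemmas that also occur in the Malaysian word list.

    Order follows the Indonesian list; duplicates and blank lines are dropped.
    """
    ms_set = {w.strip() for w in ms_words if w.strip()}
    seen = set()
    result = []
    for lemma in id_lemmas:
        lemma = lemma.strip()
        if lemma and lemma in ms_set and lemma not in seen:
            seen.add(lemma)
            result.append(lemma)
    return result


def bidix_entry(lemma, pos="n"):
    """Format one identical-translation entry in Apertium bidix XML.

    The part-of-speech tag is a placeholder assumption; a real dictionary
    needs the correct tag per lemma. Spaces in multiword lemmas become <b/>.
    """
    text = escape(lemma).replace(" ", "<b/>")
    side = f'{text}<s n="{pos}"/>'
    return f"<e><p><l>{side}</l><r>{side}</r></p></e>"


if __name__ == "__main__":
    # Tiny in-memory lists standing in for the real lemma and word lists.
    indonesian = ["rumah", "kantor", "buku"]
    malaysian = ["rumah", "pejabat", "buku"]
    for lemma in shared_lemmas(indonesian, malaysian):
        print(bidix_entry(lemma))
```

Words that fail the filter, such as `kantor` in the toy input, are the ones the remaining ideas in the list (interlanguage links and corpus-based extraction) would have to cover.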