Indonesian and Malaysian/Work plan
 Work plan
|Week||Dates||Main activities||Coverage reached (wp)||Trimmed coverage reached (wp)||Testvoc clean||Evaluation||WER reached|
|0|| ||Translating the story to get a baseline WER.|| || || ||500 words||4.68% (id->ms)|
|1|| ||Working on Indonesian analyzer/generator.|| || || || || |
|2|| ||Working on Indonesian analyzer/generator. Translating Malaysian Wikipedia articles to Indonesian to get a parallel corpus. Bilingual dictionaries will be extracted from the corpus.|| || || || || |
|3|| ||Translating Malaysian Wikipedia articles to Indonesian. Working on Malaysian analyzer/generator.|| || || || || |
|4|| ||Working on Malaysian analyzer/generator and bidix.||75.6%, 72.9%|| || || || |
|5|| ||Working on Malaysian analyzer/generator and bidix.||80.1%, 77.5%|| || ||300 words||2.97% (ms->id)|
|6|| ||Working on bidix.||80.1%, -||73.3%, -|| || || |
|7|| ||Working on bidix.||80.3%, 77.1%||76.5%, 74.6%|| ||500 words||24.34% (ms->id)|
|8|| ||Parallel corpus development.|| || || || || |
|9|| ||Working on bidix.|| || || || || |
|10|| ||A little break during this period.|| || || || || |
|11|| ||Working on bidix.|| || || || || |
|12|| ||Cleaning up.||80.7%, 80.1%||80.7%, 80.1%||all categories clean||2,000 words||14.43% (id->ms), 7.58% (ms->id)|
 Ideas for getting Indonesian-Malaysian bilingual dictionaries
- Filtering the Indonesian lemma list. For each lemma, check whether it is also a valid Malaysian word.
- Interlanguage wiki links.
- Extracting bilingual dictionaries from a parallel corpus.
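
The first idea, filtering the Indonesian lemma list, could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names `shared_lemmas` and `to_bidix_entry` are hypothetical, the word lists are assumed to be plain sequences of strings, and the part-of-speech tag `n` is a placeholder. Identical spelling does not guarantee identical meaning, so the output should be treated as candidate entries to review, not as finished bidix content.

```python
from xml.sax.saxutils import escape


def shared_lemmas(id_lemmas, ms_words):
    """Return Indonesian lemmas that also occur in the Malaysian word list.

    Comparison is case-insensitive and ignores surrounding whitespace.
    Order follows the Indonesian list; duplicates are dropped.
    """
    ms_set = {w.strip().lower() for w in ms_words if w.strip()}
    seen = set()
    shared = []
    for lemma in id_lemmas:
        key = lemma.strip().lower()
        if key and key in ms_set and key not in seen:
            seen.add(key)
            shared.append(key)
    return shared


def to_bidix_entry(lemma, pos="n"):
    """Format an identity candidate as a bidix-style <e> entry.

    Assumption: the same POS tag applies on both sides, and the lemma is a
    single word (multiword lemmas would need <b/> for spaces).
    """
    word = escape(lemma)
    return f'<e><p><l>{word}<s n="{pos}"/></l><r>{word}<s n="{pos}"/></r></p></e>'


if __name__ == "__main__":
    # Toy data for illustration only.
    indonesian = ["rumah", "mobil", "Rumah", " air "]
    malaysian = ["rumah", "kereta", "air"]
    for lemma in shared_lemmas(indonesian, malaysian):
        print(to_bidix_entry(lemma))
```

In practice the Malaysian side could be checked against the Malaysian analyzer instead of a flat word list, which would also catch inflected forms; the set lookup here is just the simplest stand-in.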
 See also
- Building dictionaries
- Extracting bilingual dictionaries with Giza++
- Generating lexical-selection rules from a parallel corpus