Indonesian and Malaysian/Work plan
This is the work plan for developing the Indonesian and Malaysian translator during Google Summer of Code 2012.
Work plan
Week | Dates | Main activities | Coverage reached (wp) | Trimmed coverage reached (wp) | Testvoc clean | Evaluation | WER reached |
---|---|---|---|---|---|---|---|
0 | | Translating the story to get a baseline WER. | | | | 500 words | 4.68% (id->ms) |
1 | | Working on Indonesian analyzer/generator. | | | | | |
2 | | Working on Indonesian analyzer/generator. Translating Malaysian Wikipedia articles to Indonesian to get a parallel corpus. Bilingual dictionaries will be extracted from the corpus. | 72.9%, 29.9% | | | | |
3 | | Translating Malaysian Wikipedia articles to Indonesian. Working on Malaysian analyzer/generator. | 74.9%, 46.4% | | | | |
4 | | Working on Malaysian analyzer/generator and bidix. | 75.6%, 72.9% | | | | |
5 | | Working on Malaysian analyzer/generator and bidix. | 80.1%, 77.5% | | | 300 words | 2.97% (ms->id) |
6 | | Working on bidix. | 80.1%, - | 73.3%, - | <ij> <cnjcoo> <cnjsub> <cnjadv> <det> <pr> <num> <prn> <np> <adv> | | |
7 | | Working on bidix. | 80.3%, 77.1% | 76.5%, 74.6% | | 500 words | 24.34% (ms->id) |
8 | | Parallel corpus development. | | | | | |
9 | | Working on bidix. | | | | | |
10 | | A little break during this period. | | | | | |
11 | | Working on bidix. | | | | | |
12 | | Cleaning up. | 80.7%, 80.1% | 80.7%, 80.1% | all categories clean | 2,000 words | 14.43% (id->ms), 7.58% (ms->id) |
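The WER column reports word error rate. This page does not say which tool produced the figures, so the following is only a minimal sketch of the usual definition: the word-level edit distance between a reference translation and the system output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenisation are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()   # assumption: plain whitespace tokenisation
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Two of the five reference words differ, so this prints 40.00%.
    wer = word_error_rate("saya mahu pergi ke pejabat",
                          "saya mau pergi ke kantor")
    print(f"{wer:.2%}")
```

Under this definition, a figure such as 4.68% over a 500-word evaluation text would correspond to roughly 23 word-level edits.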
Ideas for getting Indonesian-Malaysian bilingual dictionaries
- Filtering the Indonesian lemma list. For each lemma, check whether it is also a valid Malaysian word.
- Interlanguage wiki links.
- Extracting bilingual dictionaries from a parallel corpus.
See also
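The first idea in the list above, filtering the Indonesian lemma list, can be sketched as follows. This is an illustration under stated assumptions, not the method actually used in the project: the helper names `shared_lemmas` and `bidix_entry` are hypothetical, the Malaysian word list is assumed to be available as plain strings, matching is by exact spelling, and the default part-of-speech tag `n` is a placeholder. The generated line follows the usual bidix entry shape with `<l>` and `<r>` sides.

```python
from xml.sax.saxutils import escape


def shared_lemmas(indonesian_lemmas, malaysian_words):
    """Return Indonesian lemmas that also occur in the Malaysian word list.

    Order follows the Indonesian list; duplicates are dropped.
    """
    malaysian = {w.strip() for w in malaysian_words if w.strip()}
    seen = set()
    shared = []
    for lemma in indonesian_lemmas:
        lemma = lemma.strip()
        if lemma and lemma in malaysian and lemma not in seen:
            seen.add(lemma)
            shared.append(lemma)
    return shared


def bidix_entry(lemma, tag="n"):
    """Format an identity bidix entry (same lemma on both sides).

    The tag is an assumption; in practice it would come from the
    monolingual dictionaries.
    """
    side = f'{escape(lemma)}<s n="{tag}"/>'
    return f"<e><p><l>{side}</l><r>{side}</r></p></e>"


if __name__ == "__main__":
    indonesian = ["rumah", "kantor", "buku", "rumah"]  # toy lemma list
    malaysian = ["rumah", "buku", "pejabat"]           # toy word list
    for lemma in shared_lemmas(indonesian, malaysian):
        print(bidix_entry(lemma))
```

Identical spelling does not guarantee identical meaning across the two languages, so entries produced this way are best treated as candidates for manual review.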
- Building dictionaries
- Extracting bilingual dictionaries with Giza++
- Generating lexical-selection rules from a parallel corpus