Difference between revisions of "Indonesian and Malaysian/Work plan"
==External links==
* [http://pusatbahasa.kemdiknas.go.id/kbbi/ KBBI Daring]
* [http://prpm.dbp.gov.my/ PRPM's website]
* [http://kateglo.bahtera.org/api.php kateglo's website]
* [http://opus.lingfil.uu.se/ OPUS project]
* [http://wiki.apertium.org/wiki/Building_dictionaries#Getting_cheap_bilingual_dictionary_entries Getting cheap bilingual dictionary entries]
* [http://wiki.apertium.org/wiki/Extracting_bilingual_dictionaries_with_Giza%2B%2B Extracting bilingual dictionaries with Giza++]
[[Category:Indonesian and Malaysian]]
Revision as of 20:32, 10 June 2012
This is the work plan for developing the Indonesian and Malaysian translator during Google Summer of Code 2012.
Week | Dates | Activities | Coverage reached (wp) | Evaluation | WER reached |
---|---|---|---|---|---|
0 | 23/04–21/05 | Translating the story to get a baseline WER. | - | 500 words | 4.68% |
1 | 21/05–27/05 | Working on Indonesian analyzer/generator. | - | - | - |
2 | 28/05–03/06 | Working on Indonesian analyzer/generator. Translating Malaysian Wikipedia articles to Indonesian to get a parallel corpus. Bilingual dictionaries will be extracted from the corpus. | 72.9%, 29.9% | - | - |
3 | 04/06–10/06 | Translating Malaysian Wikipedia articles to Indonesian. Working on Malaysian analyzer/generator. | 74.9%, 46.4% | - | - |
4 | 11/06–17/06 | Working on Malaysian analyzer/generator and bidix. | | - | - |
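The WER column reports word error rate. As a rough illustration of what that figure measures, here is a minimal sketch that computes word-level edit distance between a reference translation and a system output, divided by the reference length. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; the tool actually used to obtain the numbers in the table is not stated on this page and may tokenize or count differently.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokens are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "saya suka buku ini"
    hypothesis = "saya suka buku itu"
    print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
```

With this definition, one wrong word in a four-word reference gives a WER of 25%.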
Ideas for getting Indonesian-Malaysian bilingual dictionaries
- Filtering the Indonesian lemma list: for each lemma, check whether it is also a valid Malaysian word.
- Interlanguage wiki links.
- Extracting bilingual dictionaries from a parallel corpus.
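The first idea above (filtering the Indonesian lemma list) could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `shared_lemmas` and `bidix_entry`, the sample word lists, and the choice of the `n` tag are all hypothetical, and the membership test stands in for whatever "valid Malaysian word" check is actually used (for example a lookup in the Malaysian analyzer). The output follows the usual `<e><p><l>…</l><r>…</r></p></e>` shape of Apertium bilingual dictionary entries.

```python
from xml.sax.saxutils import escape


def shared_lemmas(id_lemmas, ms_words):
    """Return Indonesian lemmas that also occur in the Malaysian word list.

    Comparison is case-insensitive; order follows the Indonesian list and
    duplicates are dropped. Both inputs are plain iterables of strings.
    """
    ms = {w.strip().lower() for w in ms_words if w.strip()}
    seen = set()
    result = []
    for lemma in id_lemmas:
        key = lemma.strip().lower()
        if key and key in ms and key not in seen:
            seen.add(key)
            result.append(key)
    return result


def bidix_entry(lemma, pos):
    """Format an identity entry (same lemma on both sides) for a bidix.

    `pos` is a part-of-speech symbol such as "n"; it is an assumption here
    and must match the symbols defined in the real dictionaries.
    """
    word = escape(lemma).replace(" ", "<b/>")
    return (
        f'<e><p><l>{word}<s n="{pos}"/></l>'
        f'<r>{word}<s n="{pos}"/></r></p></e>'
    )


if __name__ == "__main__":
    # Toy data: "mobil" (id) and "kereta" (ms) differ, so they are skipped.
    id_lemmas = ["buku", "rumah", "mobil", "Buku"]
    ms_words = ["buku", "rumah", "kereta"]
    for lemma in shared_lemmas(id_lemmas, ms_words):
        print(bidix_entry(lemma, "n"))
```

Identical spelling does not guarantee identical meaning, so candidates produced this way would still need a manual check for false friends before going into the bidix.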
Todo list
- <s>Convert the Malaysian dictionary to Apertium format</s>
- <s>Create a script to get the Indonesian word list</s>
- <s>Add missing words from the story</s>
- <s>Add conjunctions and interjections</s>
- Assign the correct parameter for what will be reduplicated, for verbs with meN- (id)
- Passive form for verbs with meN- (id) (Done: V -> V no suffix; N -> V -kan)