Indonesian and Malaysian/Work plan

Latest revision as of 17:05, 22 August 2012

This is the work plan for developing the Indonesian and Malaysian translator during Google Summer of Code 2012.

Work plan

| Week | Dates | Main activities | Coverage reached (wp) | Trimmed coverage reached (wp) | Testvoc clean | Evaluation | WER reached |
|---|---|---|---|---|---|---|---|
| 0 | 23/04–21/05 | Translating the story to get a baseline WER. | | | | 500 words | 4.68% (id->ms) |
| 1 | 21/05–27/05 | Working on Indonesian analyzer/generator. | | | | | |
| 2 | 28/05–03/06 | Working on Indonesian analyzer/generator. Translating Malaysian Wikipedia articles to Indonesian to get a parallel corpus. Bilingual dictionaries will be extracted from the corpus. | 72.9%, 29.9% | | | | |
| 3 | 04/06–10/06 | Translating Malaysian Wikipedia articles to Indonesian. Working on Malaysian analyzer/generator. | 74.9%, 46.4% | | | | |
| 4 | 11/06–17/06 | Working on Malaysian analyzer/generator and bidix. | 75.6%, 72.9% | | | | |
| 5 | 18/06–24/06 | Working on Malaysian analyzer/generator and bidix. | 80.1%, 77.5% | | | 300 words | 2.97% (ms->id) |
| 6 | 25/06–01/07 | Working on bidix. | 80.1%, - | 73.3%, - | <ij> <cnjcoo> <cnjsub> <cnjadv> <det> <pr> <num> <prn> <np> <adv> | | |
| 7 | 02/07–08/07 | Working on bidix. | 80.3%, 77.1% | 76.5%, 74.6% | | 500 words | 24.34% (ms->id) |
| 8 | 09/07–15/07 | Parallel corpus development. | | | | | |
| 9 | 16/07–22/07 | Working on bidix. | | | | | |
| 10 | 23/07–29/07 | A little break during this period. | | | | | |
| 11 | 30/07–05/08 | Working on bidix. | | | | | |
| 12 | 06/08–12/08 | Cleaning up. | 80.7%, 80.1% | 80.7%, 80.1% | all categories clean | 2,000 words | 14.43% (id->ms), 7.58% (ms->id) |
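
The WER column reports word error rate. As a rough illustration only, the sketch below computes WER under the usual definition: word-level edit distance between a reference translation and the system output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenisation are assumptions made for this example; the figures in the table may have been produced with a different tool or different settings.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace and compared exactly
    (no lowercasing, no punctuation handling).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, one substituted word in a four-word reference gives a WER of 0.25, so lower values mean output closer to the reference.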

Ideas for getting Indonesian-Malaysian bilingual dictionaries

  1. Filtering the Indonesian lemma list. For each lemma, check whether it is also a valid Malaysian word.
  2. Interlanguage wiki links.
  3. Extracting bilingual dictionaries from parallel corpus.
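
The first idea above can be sketched as a simple set lookup. This is a minimal illustration, assuming both word lists are available as plain iterables of strings; the function name `shared_lemmas` and the toy word lists are hypothetical and are not taken from the project's dictionaries.

```python
def shared_lemmas(indonesian_lemmas, malaysian_words):
    """Return Indonesian lemmas that also occur in the Malaysian word list.

    Comparison is case-insensitive; the result keeps the order of the
    Indonesian list and drops duplicates.
    """
    malaysian = {w.strip().lower() for w in malaysian_words if w.strip()}
    seen = set()
    shared = []
    for lemma in indonesian_lemmas:
        key = lemma.strip().lower()
        if key and key in malaysian and key not in seen:
            seen.add(key)
            shared.append(key)
    return shared


# Toy data for illustration only.
indonesian = ["rumah", "mobil", "buku", "Rumah"]
malaysian = ["rumah", "kereta", "buku"]

# Each shared lemma is a candidate identity pair for the bilingual dictionary.
candidates = [(lemma, lemma) for lemma in shared_lemmas(indonesian, malaysian)]
print(candidates)
```

A shared spelling does not guarantee a shared meaning, so pairs found this way are only candidates and would still need manual review before being added to the bidix.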

See also

External links