Todo list
- Bulgarian dictionary: Some noun paradigms are incomplete, e.g. they lack the vocative and the subject/object forms with the definite article.
- Macedonian dictionary: Regenerate noun paradigms to have the article as <def> / <ind>, not with <j/>
- Bilingual dictionary: Add entries from exact matches, cognates and GIZA++ alignments
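
The exact-match and cognate steps for the bilingual dictionary could be sketched roughly as follows. This is only an illustration, not the method actually used: the function name `candidate_pairs`, the similarity threshold and the use of Python's `difflib` string similarity as a stand-in for cognate detection are all assumptions, and the word lists in the usage note are transliterated placeholders.

```python
from difflib import SequenceMatcher


def candidate_pairs(bg_words, mk_words, threshold=0.8):
    """Propose (bg, mk, score) candidate entries for a bilingual dictionary.

    Exact matches get score 1.0. Otherwise the most similar mk word is
    proposed if its similarity ratio reaches the (assumed) threshold.
    """
    mk_set = set(mk_words)
    pairs = []
    for bg in bg_words:
        # Step 1: identical word forms in both lists.
        if bg in mk_set:
            pairs.append((bg, bg, 1.0))
            continue
        # Step 2: closest mk word by character similarity (cognate candidate).
        best, best_score = None, 0.0
        for mk in mk_words:
            score = SequenceMatcher(None, bg, mk).ratio()
            if score > best_score:
                best, best_score = mk, score
        if best is not None and best_score >= threshold:
            pairs.append((bg, best, best_score))
    return pairs
```

For example, `candidate_pairs(["voda", "kniga"], ["voda", "knigi"], threshold=0.75)` proposes one exact match and one similarity-based candidate. Candidates found this way would still need manual checking before going into the bidix.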
Plan
===========================================================
7th June : Closed category words complete (bg, bg-mk, mk)
14th June : 5,000 high frequency words in the bg-mk dix
21st June : Macedonian (mk.dix) paradigms
28th June : Macedonian (mk.dix) words (from bidix)
4th July : Revision bg.dix
11th July : Revision and expansion bg.dix
18th July : 5,000 entries in (bg, bg-mk, mk).dix files
25th July : Testvoc for bg-mk and mk-bg
===========================================================
Basic system working with word-for-word translation from
bg->mk and mk->bg. At this point, an evaluation on ~1,000
words should be carried out to calculate the word error
rate. This is the minimum amount of work expected.
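
The word error rate mentioned above is commonly computed as the word-level edit distance between the system output and a reference translation, divided by the number of reference words. A minimal sketch under that assumption (the function name `word_error_rate` and whitespace tokenisation are illustrative choices, not part of the plan):

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

With placeholder tokens, `word_error_rate("a b c d", "a x c")` gives 0.5: one substitution and one missing word against four reference words.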
===========================================================
2nd August: Increase number of words to the top-6,500 (freq.)
9th August: Improve part-of-speech tagging (start work on bg
and mk constraint grammars).
16th August: Implement transfer rules in both directions based
on contrastive analysis.
23rd August: Increase number of words to top-7,500 (freq.)
30th August: Find and add frequent multiwords.
===========================================================
See also