Macedonian and Bulgarian
Todo list
- Bulgarian dictionary: Some noun paradigms are incomplete. For example, they lack vocative forms and the subject/object forms with the definite article.
- Macedonian dictionary: Regenerate noun paradigms so that the article is marked with the tags <def> and <ind>, not with <j/>.
- Bilingual dictionary: Collect entries from exact matches, cognates and GIZA++ word alignments.
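The bilingual-dictionary item names exact matches and cognates as sources of entries. One common way to harvest both, shown here as a minimal sketch rather than the method used for this pair, is to compare Bulgarian and Macedonian word lists by normalized character edit distance. The function names, the default threshold of 0.6 and the sample words are hypothetical. GIZA++ alignment is not covered, and every candidate would still need manual review because similar-looking words can be false friends.

```python
from typing import Dict, Iterable, List, Tuple


def char_similarity(a: str, b: str) -> float:
    """1 - (character Levenshtein distance / length of the longer word)."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,                # delete a character from `a`
                curr[j - 1] + 1,            # insert a character from `b`
                prev[j - 1] + (ca != cb),   # match (0) or substitution (1)
            )
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))


def propose_pairs(bg_words: Iterable[str], mk_words: Iterable[str],
                  threshold: float = 0.6) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (exact matches, cognate candidates per Bulgarian word).

    `threshold` is a hypothetical cut-off; it is not tuned on real data.
    """
    bg = sorted(set(bg_words))
    mk = sorted(set(mk_words))
    mk_set = set(mk)
    exact = [w for w in bg if w in mk_set]
    cognates: Dict[str, List[str]] = {}
    for w in bg:
        if w in mk_set:
            continue  # already an exact match, no need for fuzzy search
        hits = [m for m in mk if char_similarity(w, m) >= threshold]
        if hits:
            # Most similar candidates first, for a human to review.
            cognates[w] = sorted(hits, key=lambda m: -char_similarity(w, m))
    return exact, cognates


if __name__ == "__main__":
    # Hypothetical word lists: book, night, language/tongue.
    exact, cognates = propose_pairs(["книга", "нощ", "език"],
                                    ["книга", "ноќ", "јазик"], threshold=0.5)
    print("exact:", exact)
    print("cognates:", cognates)
```

Plain edit distance ignores regular sound and spelling correspondences between the two languages; a real setup might normalize those before comparing, which would let a stricter threshold be used.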
Plan
===========================================================
7th June: Closed-category words complete (bg, bg-mk, mk)
14th June: 5,000 high-frequency words in the bg-mk dix
21st June: Macedonian (mk.dix) paradigms
28th June: Macedonian (mk.dix) words (from bidix)
4th July: Revision of bg.dix
11th July: Revision and expansion of bg.dix
18th July: 5,000 entries in (bg, bg-mk, mk).dix files
25th July: Testvoc for bg-mk and mk-bg
===========================================================
Basic system working with word-for-word translation from bg->mk and mk->bg. At this point, an evaluation of ~1,000 words should be done to calculate the word error rate. This is the minimum amount of work expected.
===========================================================
2nd August: Increase the number of words to the top 6,500 (by frequency)
9th August: Improve part-of-speech tagging (start work on bg and mk constraint grammars)
16th August: Implement transfer rules in both directions based on contrastive analysis
23rd August: Increase the number of words to the top 7,500 (by frequency)
30th August: Find and add frequent multiwords
===========================================================
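The plan calls for a ~1,000-word evaluation to calculate the word error rate (WER). WER is the word-level edit distance between the system output and a reference translation, WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions and N is the number of reference words. The following is a minimal Python sketch, assuming whitespace tokenization and a reference produced by post-editing the MT output. The function names and sample sentences are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions
    needed to turn `hyp` into `ref` (Levenshtein distance over words)."""
    # prev[j] holds the distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing from output (deletion)
                curr[j - 1] + 1,     # extra word in output (insertion)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER: total word edits divided by total reference words."""
    edits = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        edits += word_edit_distance(ref, hypothesis.split())
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("reference side contains no words")
    return edits / ref_words


if __name__ == "__main__":
    # Hypothetical (post-edited reference, raw MT output) pairs in Macedonian.
    sample = [
        ("ова е нова книга", "ова е нов книга"),
        ("тоа е добро", "тоа е добро"),
    ]
    print(f"WER = {corpus_wer(sample):.2%}")
```

Summing edits over the whole test set before dividing, rather than averaging per-sentence rates, keeps short sentences from dominating the score.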