Macedonian and Bulgarian

From Apertium
Revision as of 16:53, 17 June 2010 by Francis Tyers

Todo list

  • Bulgarian dictionary: Some noun paradigms are incomplete, e.g. they lack the vocative and the subject/object forms with the definite article.
  • Macedonian dictionary: Regenerate the noun paradigms so that the article is encoded with the <def> / <ind> tags rather than with <j/>.
  • Bilingual dictionary: Add entries from exact matches, cognates and GIZA++ alignments.
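
The exact-match and cognate steps for the bilingual dictionary could be bootstrapped roughly as follows. This is a minimal sketch, not the method actually used for this pair: the function name `candidate_pairs`, the plain word lists as input and the similarity cut-off of 0.8 are all assumptions made for illustration, and string similarity from Python's `difflib` stands in for whatever cognate detection is eventually chosen. Candidates would still need checking by hand before going into the bg-mk dix.

```python
from difflib import SequenceMatcher


def candidate_pairs(bg_words, mk_words, threshold=0.8):
    """Return (exact, cognates) candidate lists for a bilingual dictionary.

    exact:    words spelled identically in both lists.
    cognates: (bg, mk, score) triples whose similarity ratio is at least
              `threshold` (a hypothetical cut-off, to be tuned by hand).
    """
    bg_set, mk_set = set(bg_words), set(mk_words)
    exact = sorted(bg_set & mk_set)
    exact_set = set(exact)

    cognates = []
    # Compare only the words that were not already matched exactly.
    for bg in sorted(bg_set - exact_set):
        for mk in sorted(mk_set - exact_set):
            score = SequenceMatcher(None, bg, mk).ratio()
            if score >= threshold:
                cognates.append((bg, mk, round(score, 2)))
    return exact, cognates


if __name__ == "__main__":
    bg = ["вода", "мляко", "езеро"]
    mk = ["вода", "млеко", "езеро", "куќа"]
    exact, cognates = candidate_pairs(bg, mk)
    print("exact:", exact)
    print("cognate candidates:", cognates)
```

The pairwise comparison is quadratic in the size of the word lists, which is acceptable for a few thousand high-frequency words but would need pruning (for example by first letter or length) for larger lists.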

Plan



===========================================================
 7th June  : Closed category words complete (bg, bg-mk, mk)
14th June  : 5,000 high frequency words in the bg-mk dix
21st June  : Macedonian (mk.dix) paradigms
28th June  : Macedonian (mk.dix) words (from bidix)
 4th July  : Revision of bg.dix
11th July  : Revision and expansion of bg.dix
18th July  : 5,000 entries in (bg, bg-mk, mk).dix files
25th July  : Testvoc for bg-mk and mk-bg
===========================================================
Basic system working, with word-for-word translation in both
directions (bg->mk and mk->bg). At this point, ~1,000 words
should be evaluated to calculate the word error rate. This
is the minimum amount of work expected.
===========================================================
 2nd August: Increase number of words to the top-6,500 (freq.)
 9th August: Improve part-of-speech tagging (start work on bg
             and mk constraint grammars).
16th August: Implement transfer rules in both directions based
             on contrastive analysis.
23rd August: Increase number of words to top-7,500 (freq.)
30th August: Find and add frequent multiwords.
===========================================================
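
The word error rate mentioned at the midpoint of the plan can be computed along the following lines. This is a sketch under stated assumptions: the plan does not say how the rate is to be measured, so the code assumes the common definition (word-level edit distance between the system output and a reference translation, divided by the number of reference words), whitespace tokenisation, and a hypothetical function name `word_error_rate`.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length.

    `reference` is assumed to be a corrected (post-edited) translation and
    `hypothesis` the raw system output; both are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

For the ~1,000-word evaluation, the rate would normally be computed over the whole test text (total edits divided by total reference words) rather than averaged per sentence.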

See also