Difference between revisions of "Macedonian and Bulgarian"

From Apertium


Latest revision as of 16:53, 17 June 2010

Todo list

  • Bulgarian dictionary: some noun paradigms are incomplete, e.g. they lack the vocative and the subject/object forms with the definite article.
  • Macedonian dictionary: regenerate the noun paradigms so that the article is encoded with the <def> and <ind> tags rather than joined with <j/>.
  • Bilingual dictionary: collect entries from exact matches, cognates and giza++ alignments.
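
The first two sources for the bilingual dictionary (exact matches and cognates) could be approximated with a simple string comparison. The following is only a rough sketch, not the method used for this language pair: the function name candidate_pairs, the "exact" and "cognate?" labels, and the 0.75 similarity threshold are all assumptions made for illustration, and every proposed pair would still need manual checking before going into the bidix.

```python
from difflib import SequenceMatcher


def candidate_pairs(bg_words, mk_words, threshold=0.75):
    """Propose bg-mk dictionary candidates (hypothetical helper).

    Identical strings are labelled "exact"; otherwise the most similar
    Macedonian word is proposed as a possible cognate if its similarity
    ratio reaches the (assumed) threshold.
    """
    mk_set = set(mk_words)
    pairs = []
    for bg in bg_words:
        if bg in mk_set:
            pairs.append((bg, bg, "exact"))
            continue
        best, best_score = None, 0.0
        for mk in mk_words:
            # ratio() is 2 * matches / (len(bg) + len(mk)), in [0, 1]
            score = SequenceMatcher(None, bg, mk).ratio()
            if score > best_score:
                best, best_score = mk, score
        if best is not None and best_score >= threshold:
            pairs.append((bg, best, "cognate?"))
    return pairs
```

Each Bulgarian word is compared against every Macedonian word, so this is only practical for word lists of a few thousand entries, which matches the sizes in the plan.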

Plan



===========================================================
 7th June  : Closed category words complete (bg, bg-mk, mk)
14th June  : 5,000 high frequency words in the bg-mk dix
21st June  : Macedonian (mk.dix) paradigms
28th June  : Macedonian (mk.dix) words (from bidix)
 4th July  : Revision of bg.dix
11th July  : Revision and expansion of bg.dix
18th July  : 5,000 entries in (bg, bg-mk, mk).dix files
25th July  : Testvoc for bg-mk and mk-bg
===========================================================
Basic system working with word-for-word translation from 
bg->mk and mk->bg. At this point an evaluation should be 
done of ~1,000 words to calculate the word error rate. This
is the minimum amount of work expected.
===========================================================
 2nd August: Increase number of words to the top-6,500 (freq.)
 9th August: Improve part-of-speech tagging (start work on bg
             and mk constraint grammars).
16th August: Implement transfer rules in both directions based
             on contrastive analysis.
23rd August: Increase number of words to top-7,500 (freq.)
30th August: Find and add frequent multiwords.
===========================================================
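
The evaluation mentioned in the plan (word error rate over roughly 1,000 words) can be computed as the word-level edit distance between a post-edited reference and the system output, divided by the number of reference words. The sketch below assumes whitespace tokenisation and equal cost for substitutions, insertions and deletions; the function name word_error_rate is illustrative and is not an existing Apertium tool.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j words of the hypothesis
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

The function would be called with the corrected translation as the reference and the raw system output as the hypothesis; a value of 0.0 means the output needed no changes.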

See also

  • /Pending tests
  • /Regression tests