Macedonian and Bulgarian

==Todo list==
* Bulgarian dictionary: Some noun paradigms are not 100% complete, e.g. they have no vocative and no subject/object forms with the definite article.
* Macedonian dictionary: Regenerate the noun paradigms so that the article is tagged <code><def></code> / <code><ind></code> rather than marked with <code><j/></code>.
* Bilingual dictionary: Add entries from exact matches, cognates and GIZA++ alignments.
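The exact-match and cognate step for the bilingual dictionary can be approximated by comparing Bulgarian and Macedonian headwords by string similarity. The sketch below is a hypothetical illustration, not the pair's actual tooling: the helper names, the 0.8 threshold and the sample word lists are assumptions, and GIZA++ alignment is not covered. It relies only on the standard library's <code>difflib.SequenceMatcher</code>. A threshold is needed because related words are often spelled differently in the two languages, e.g. Bulgarian мляко and Macedonian млеко ("milk").

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """difflib ratio in [0, 1]; 1.0 means the two strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def bidix_candidates(bg_words, mk_words, threshold=0.8):
    """Propose (bg, mk, kind) pairs for manual review.

    kind is 'exact' when the spelling is identical in both languages and
    'cognate' when the most similar Macedonian word reaches the threshold.
    The default threshold of 0.8 is an illustrative assumption, not a tuned value.
    """
    mk_set = set(mk_words)
    pairs = []
    for bg in bg_words:
        if bg in mk_set:
            pairs.append((bg, bg, "exact"))
            continue
        # Pick the Macedonian word with the highest similarity score.
        best = max(mk_words, key=lambda mk: similarity(bg, mk), default=None)
        if best is not None and similarity(bg, best) >= threshold:
            pairs.append((bg, best, "cognate"))
    return pairs


if __name__ == "__main__":
    # Hypothetical sample lists; хляб/леб ("bread") falls below the threshold.
    print(bidix_candidates(["вода", "мляко", "хляб"], ["вода", "млеко", "леб"]))
```

Pairs proposed this way would still need checking by hand before going into the bg-mk dix, since similar spelling does not guarantee the same meaning.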

==Plan==

<pre>
===========================================================
7th June   : Closed-category words complete (bg, bg-mk, mk)
14th June  : 5,000 high-frequency words in the bg-mk dix
21st June  : Macedonian (mk.dix) paradigms
28th June  : Macedonian (mk.dix) words (from the bidix)
4th July   : Revision of bg.dix
11th July  : Revision and expansion of bg.dix
18th July  : 5,000 entries in the (bg, bg-mk, mk).dix files
25th July  : Testvoc for bg-mk and mk-bg
===========================================================
Basic system working with word-for-word translation from
bg->mk and mk->bg. At this point, an evaluation of ~1,000
words should be done to calculate the word error rate. This
is the minimum amount of work expected.
===========================================================
2nd August : Increase the number of words to the top 6,500 (freq.)
9th August : Improve part-of-speech tagging (start work on bg
             and mk constraint grammars).
16th August: Implement transfer rules in both directions based
             on contrastive analysis.
23rd August: Increase the number of words to the top 7,500 (freq.)
30th August: Find and add frequent multiwords.
===========================================================
</pre>
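The word error rate named in the plan is the word-level edit distance between the system output and a reference translation, divided by the number of reference words: WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions and N is the reference length. A minimal sketch, assuming one reference translation per sentence and plain whitespace tokenisation (both assumptions; the helper names are hypothetical):

```python
def edit_distance(ref, hyp):
    """Token-level Levenshtein distance (substitutions + deletions + insertions)."""
    # prev[j] holds the distance between the previous reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing from output
                curr[j - 1] + 1,         # insertion: extra word in output
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """WER over (reference, hypothesis) sentence pairs: total errors / total reference words."""
    errors = words = 0
    for reference, hypothesis in pairs:
        ref, hyp = reference.split(), hypothesis.split()
        errors += edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("no reference words")
    return errors / words
```

Summing errors and reference words over the whole ~1,000-word sample, as <code>corpus_wer</code> does, avoids giving short sentences the same weight as long ones.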

==See also==
* [[/Pending tests]]
* [[/Regression tests]]

[[Category:Macedonian and Bulgarian|*]]
Latest revision as of 16:53, 17 June 2010