Difference between revisions of "Macedonian and Bulgarian"

From Apertium

Latest revision as of 16:53, 17 June 2010

Todo list

  • Bulgarian dictionary: Some noun paradigms are incomplete, e.g. they lack the vocative and the subject/object forms with the definite article.
  • Macedonian dictionary: Regenerate the noun paradigms so that the article is encoded with the <def> and <ind> tags rather than with <j/>.
  • Bilingual dictionary: Add entries from exact matches, cognates, and GIZA++.
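
The exact-match and cognate steps for the bilingual dictionary could be prototyped roughly as follows. This is only a sketch under stated assumptions: the function name bidix_candidates, the 0.7 threshold, and the use of Python's difflib.SequenceMatcher as the similarity measure are illustrative choices, not part of the project, and the GIZA++ alignment step is not covered. Any pair marked "cognate?" is a suggestion that still needs manual checking.

```python
from difflib import SequenceMatcher


def bidix_candidates(bg_words, mk_words, threshold=0.7):
    """Suggest bg-mk pairs: exact matches first, then the closest cognate candidate.

    The threshold is an arbitrary, hypothetical cut-off on the
    SequenceMatcher ratio (0.0 = nothing shared, 1.0 = identical).
    """
    mk_set = set(mk_words)
    pairs = []
    for bg in bg_words:
        # Identical surface forms are taken as exact matches.
        if bg in mk_set:
            pairs.append((bg, bg, "exact"))
            continue
        # Otherwise look for the most similar Macedonian word.
        best, best_score = None, 0.0
        for mk in mk_words:
            score = SequenceMatcher(None, bg, mk).ratio()
            if score > best_score:
                best, best_score = mk, score
        if best is not None and best_score >= threshold:
            pairs.append((bg, best, "cognate?"))
    return pairs


if __name__ == "__main__":
    bg = ["вода", "ръка", "куче"]
    mk = ["вода", "рака", "маса"]
    for pair in bidix_candidates(bg, mk):
        print(pair)
```

Words with no sufficiently similar counterpart (such as "куче" in the toy lists above) are simply left out, so they can be handled later by alignment or by hand.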

Plan



===========================================================
 7th June  : Closed category words complete (bg, bg-mk, mk)
14th June  : 5,000 high frequency words in the bg-mk dix
21st June  : Macedonian (mk.dix) paradigms
28th June  : Macedonian (mk.dix) words (from bidix)
 4th July  : Revision bg.dix
11th July  : Revision and expansion bg.dix
18th July  : 5,000 entries in (bg, bg-mk, mk).dix files
25th July  : Testvoc for bg-mk and mk-bg
===========================================================
Basic system working with word-for-word translation from 
bg->mk and mk->bg. At this point an evaluation should be 
done of ~1,000 words to calculate the word error rate. This
is the minimum amount of work expected.
===========================================================
 2nd August: Increase number of words to the top-6,500 (freq.)
 9th August: Improve part-of-speech tagging (start work on bg
             and mk constraint grammars).
16th August: Implement transfer rules in both directions based
             on contrastive analysis.
23rd August: Increase number of words to top-7,500 (freq.)
30th August: Find and add frequent multiwords.
===========================================================
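
The word error rate mentioned in the plan can be illustrated with a small sketch. It assumes the common definition of WER as the word-level edit distance (substitutions, insertions, deletions) between the system output and a reference translation, divided by the number of words in the reference; the function name word_error_rate and whitespace tokenisation are assumptions made for this example, not a description of the evaluation tool actually used.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

For the ~1,000-word evaluation, the reference would be the corrected translation of the test text and the hypothesis the raw system output.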

See also

  • /Pending tests
  • /Regression tests