Bengali and English/Anubadok

From Apertium

Anubadok is an open-source English-to-Bengali machine translation (MT) system developed by G M Hossain. It is currently at an experimental stage.

The program is licensed under the GPL. It is accessible from here.

Inflection Rules (BnSondhi.pm)

Legend

  • C Consonant
  • V Vowel
  • _ Any Letter
  • K Kar (Short form of Vowel)
  • G General Rule - The consonants and vowels in the example are interchangeable with any other consonants and vowels
  • S Special Rule - The consonants and vowels in the example are NOT interchangeable with any other consonants and vowels
  • H Hasanth - The joiner, e.g. ঙ+্+গ = ঙ্গ, as in মঙ্গল

__CC means that a word ends in two consonants, and CV__ means that a word begins with a consonant followed by a vowel.
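As a rough illustration, the legend classes can be computed one code point at a time. This is a minimal sketch, not code from BnSondhi.pm: the helper names (legend_class, signature, ends_cc, starts_cv) are hypothetical, and the sketch assumes that Unicode character names from Python's unicodedata module are enough to tell consonants (C), independent vowels (V), kars (K) and the hasanta (H) apart.

```python
import unicodedata

# Unicode names of the Bengali independent vowels ("BENGALI LETTER <name>").
# Every other "BENGALI LETTER ..." is treated as a consonant.
VOWEL_NAMES = {
    "A", "AA", "I", "II", "U", "UU", "VOCALIC R", "VOCALIC L",
    "E", "AI", "O", "AU", "VOCALIC RR", "VOCALIC LL",
}


def legend_class(ch: str) -> str:
    """Map one code point to a legend symbol: C, V, K or H ('?' otherwise)."""
    name = unicodedata.name(ch, "")
    if name.startswith("BENGALI VOWEL SIGN "):
        return "K"  # kar, the dependent (short) form of a vowel
    if name == "BENGALI SIGN VIRAMA":
        return "H"  # hasanta, the joiner
    if name.startswith("BENGALI LETTER "):
        letter = name[len("BENGALI LETTER "):]
        return "V" if letter in VOWEL_NAMES else "C"
    return "?"  # e.g. anusvara (ং), nukta, digits, punctuation


def signature(word: str) -> str:
    """Concatenate the legend class of every code point in the word."""
    return "".join(legend_class(ch) for ch in word)


def ends_cc(word: str) -> bool:
    """__CC: the word ends in two consonants."""
    return signature(word).endswith("CC")


def starts_cv(word: str) -> bool:
    """CV__: the word starts with a consonant followed by a vowel."""
    return signature(word).startswith("CV")
```

For example, signature("মঙ্গল") gives "CCHCC": the hasanta appears as H between the two consonants of the conjunct ঙ্গ.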

Rules

Verb Rules

(These apply if the length of the first word is at least 3. The rules are not exclusive: one word can qualify for multiple rules.)

  • G (C+ো+C) + (তে) = (C+ু+C+তে) e.g. খোল + তে = খুলতে
  • G (া+C+া) + (তে) = (া+C+া+তে) e.g. পাঠা + তে = পাঠাতে
  • G (C+ি)/(C+ে) + (ে+র)/(া+র) = (C+ে+ও+য়+া+র) e.g. নি + ের = নেওয়ার, নে + ার = নেওয়ার
  • G (C+া) + (তে) = (C+ে+তে) e.g. পা + তে = পেতে
  • G (C+ে) + (তে) = (C+ি+তে) e.g. দে + তে = দিতে
  • G (C+C) + (ে+র) = (C+C+া+র) e.g. কর + ের = করার
  • G (__K1) + (K2__) = (__K1__) [needs review]
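The তে and ের verb rules above can be sketched as ordered pattern checks. This is a minimal sketch under stated assumptions, not the BnSondhi.pm implementation. Although the rules are described as not exclusive, the sketch simply stops at the first match. It tries the rules in the order listed, so (া+C+া) is checked before (C+া); otherwise পাঠা would wrongly take the (C+া) path. The function names join_te and join_er and the consonant class C are hypothetical. The length precondition is omitted because the examples পা and দে are only two code points long, so it is unclear how length is measured. The (__K1) + (K2__) rule, marked for review, is not covered.

```python
import re
import unicodedata

# Hypothetical consonant class: the main range ক..হ plus khanda ta (ৎ).
# Nukta letters such as য় are not handled.
C = "[\u0995-\u09B9\u09CE]"
YYA = "\u09AF\u09BC"  # য় spelled as য + nukta, which is its NFC form


def join_te(stem: str) -> str:
    """Join a verb stem with তে; the first matching rule wins."""
    stem = unicodedata.normalize("NFC", stem)  # compose ে+া into a single ো
    if re.search(f"{C}ো{C}$", stem):  # খোল + তে = খুলতে
        return stem[:-2] + "ু" + stem[-1] + "তে"
    if re.search(f"া{C}া$", stem):  # পাঠা + তে = পাঠাতে (must precede C+া)
        return stem + "তে"
    if re.search(f"{C}া$", stem):  # পা + তে = পেতে
        return stem[:-1] + "েতে"
    if re.search(f"{C}ে$", stem):  # দে + তে = দিতে
        return stem[:-1] + "িতে"
    return stem + "তে"  # assumption: plain concatenation when no rule applies


def join_er(stem: str, suffix: str) -> str:
    """Join a verb stem with ের or ার; the first matching rule wins."""
    stem = unicodedata.normalize("NFC", stem)
    if suffix in ("ের", "ার") and re.search(f"{C}[িে]$", stem):
        return stem[:-1] + "েও" + YYA + "ার"  # নি + ের = নেওয়ার
    if suffix == "ের" and re.search(f"{C}{C}$", stem):
        return stem + "ার"  # কর + ের = করার
    return stem + suffix  # assumption: plain concatenation
```

Each example from the rule list round-trips, e.g. join_te("খোল") returns "খুলতে" and join_er("কর", "ের") returns "করার". Outputs that contain য় should be compared after NFC normalization, since য় has two encodings.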

Preposition Rules

(These apply if the length of the first word is at least 2. The rules are not exclusive: one word can qualify for multiple rules.)

  • G (ল+ে+C) -> (ল+ি+C) e.g. লেখ -> লিখ
  • G (এ+ই) + (টি) = (এটি) e.g. এই + টি = এটি
  • S (_+ক+ে) + (ে+র) = (_+র) e.g. আমাকে + ের = আমার
  • G (C+C) + (এ+র) = (C+C+া+র) e.g. কর + এর = করার
  • G (C/V/H/K + C) + (তে) = (C/V/H/K + C + ে) [needs review]
  • S (ং) + (ে+র) = (ং+এ+র) e.g. সং + ের = সংএর
  • G (__K1) + (K2__) = (__K1__) [needs review]
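Three of the preposition rules (the ল+ে+C stem change, the _+ক+ে special rule and the ং special rule) can be sketched the same way. This is a minimal sketch, assuming plain concatenation when no rule matches; the function names adjust_stem and join_er_suffix and the consonant class C are hypothetical. The length precondition and the rules marked [needs review] are left out.

```python
import re

# Hypothetical consonant class: the main Bengali consonant range ক..হ.
C = "[\u0995-\u09B9]"


def adjust_stem(word: str) -> str:
    """(ল+ে+C) -> (ল+ি+C), e.g. লেখ -> লিখ."""
    return re.sub(f"লে(?={C})", "লি", word)


def join_er_suffix(word: str, suffix: str) -> str:
    """Join a word with ের using two of the special (S) rules."""
    if suffix == "ের" and len(word) > 2 and word.endswith("কে"):
        return word[:-2] + "র"  # আমাকে + ের = আমার
    if suffix == "ের" and word.endswith("ং"):
        return word + "এর"  # সং + ের = সংএর
    return word + suffix  # assumption: plain concatenation
```

The ক+ে check requires at least one letter before কে, mirroring the "_" in (_+ক+ে).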

Main Rules

(These apply if the length of the first word is at least 2. The rules are not exclusive: one word can qualify for multiple rules.)

  • G (K+C) + (তে) = (K+C+ে) e.g. মার + তে = মারে [needs review]
  • G (__K1) + (K2__) = (__K1__) [needs review]
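The (K+C) + (তে) main rule reduces to a single pattern check on the end of the word. This is a minimal sketch; the function name join_main_te, the KAR and C character classes and the fallback to plain concatenation are assumptions, not taken from BnSondhi.pm.

```python
import re

# Hypothetical classes: kar signs া ি ী ু ূ ৃ ৄ ে ৈ ো ৌ and consonants ক..হ.
KAR = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"
C = "[\u0995-\u09B9]"


def join_main_te(word: str) -> str:
    """(K+C) + (তে) = (K+C+ে), e.g. মার + তে = মারে."""
    if re.search(f"{KAR}{C}$", word):
        return word + "ে"
    return word + "তে"  # assumption: plain concatenation otherwise
```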

Sondhi for the progressive POS tag

  • G (K) + (_) = (K)

Basic Verb Sondhi

  • G (_+ি) + (_) = (_+ে+ও+য়+_)

Verb Sondhi for passive sentences

  • G (C/K + C) + (_) = (C/K+C+_); otherwise:
  • G (_+ি) + (_) = (_+ে+ও+য়+_)
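The basic verb sondhi and the passive-sentence chain can be read as an if/else: keep the word unchanged when it ends in (C/K + C), otherwise fall back to the (_+ি) rule. This is a minimal sketch of that reading; the function names and character classes are hypothetical, and the inputs দি + া and কর + া used below are illustrations chosen here, not examples from the rule list.

```python
import re

# Hypothetical classes: consonants ক..হ and kar signs া ি ী ু ূ ৃ ৄ ে ৈ ো ৌ.
C = "[\u0995-\u09B9]"
KAR = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"
YYA = "\u09AF\u09BC"  # য় spelled as য + nukta


def basic_verb_sondhi(word: str, suffix: str) -> str:
    """(_+ি) + (_) = (_+ে+ও+য়+_)."""
    if len(word) >= 2 and word.endswith("ি"):
        return word[:-1] + "েও" + YYA + suffix
    return word + suffix  # assumption: plain concatenation otherwise


def passive_verb_sondhi(word: str, suffix: str) -> str:
    """(C/K + C) + (_) = (C/K+C+_), else the basic (_+ি) rule."""
    if re.search(f"(?:{C}|{KAR}){C}$", word):
        return word + suffix
    return basic_verb_sondhi(word, suffix)
```

With these inputs, passive_verb_sondhi("কর", "া") returns "করা", while passive_verb_sondhi("দি", "া") falls through to the basic rule and returns দেওয়া.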

Tag Set

Anubadok uses the Penn Treebank tag set, listed below:

  1. CC Coordinating conjunction
  2. CD Cardinal number
  3. DT Determiner
  4. EX Existential there
  5. FW Foreign word
  6. IN Preposition or subordinating conjunction
  7. JJ Adjective
  8. JJR Adjective, comparative
  9. JJS Adjective, superlative
  10. LS List item marker
  11. MD Modal
  12. NN Noun, singular or mass
  13. NNS Noun, plural
  14. NP Proper noun, singular
  15. NPS Proper noun, plural
  16. PDT Predeterminer
  17. POS Possessive ending
  18. PP Personal pronoun
  19. PP$ Possessive pronoun
  20. RB Adverb
  21. RBR Adverb, comparative
  22. RBS Adverb, superlative
  23. RP Particle
  24. SYM Symbol
  25. TO to
  26. UH Interjection
  27. VB Verb, base form
  28. VBD Verb, past tense
  29. VBG Verb, gerund or present participle
  30. VBN Verb, past participle
  31. VBP Verb, non-3rd person singular present
  32. VBZ Verb, 3rd person singular present
  33. WDT Wh-determiner
  34. WP Wh-pronoun
  35. WP$ Possessive wh-pronoun
  36. WRB Wh-adverb
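For checking tagger output against this list, the 36 tags fit naturally in a set. This is a minimal sketch; the "word/TAG" token format and the names PENN_TAGS, split_token and is_verb are assumptions made for illustration, not something this page says Anubadok uses.

```python
# The 36 tags listed above.
PENN_TAGS = {
    "CC", "CD", "DT", "EX", "FW", "IN", "JJ", "JJR", "JJS", "LS", "MD",
    "NN", "NNS", "NP", "NPS", "PDT", "POS", "PP", "PP$", "RB", "RBR", "RBS",
    "RP", "SYM", "TO", "UH", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ",
    "WDT", "WP", "WP$", "WRB",
}


def split_token(token: str) -> tuple[str, str]:
    """Split a hypothetical 'word/TAG' token and validate the tag."""
    word, sep, tag = token.rpartition("/")
    if not sep or tag not in PENN_TAGS:
        raise ValueError(f"unknown or missing tag in {token!r}")
    return word, tag


def is_verb(tag: str) -> bool:
    """True for the six verb tags VB, VBD, VBG, VBN, VBP and VBZ."""
    return tag.startswith("VB")
```

rpartition splits on the last "/", so a word that itself contains a slash still keeps its tag intact.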