Difference between revisions of "Bengali and English/Anubadok"

From Apertium
Revision as of 19:30, 10 April 2009

Anubadok is an open-source English-to-Bengali machine translation (MT) system developed by G M Hossain. It is currently at an experimental stage.

The program is licensed under the GPL. It is accessible from here.

Inflection Rules (BnSondhi.pm)

Legend

  • C Consonant
  • V Vowel
  • _ Any Letter
  • K Kar (the short, dependent form of a vowel)
  • G General Rule - The consonants and vowels in the example are interchangeable with any other consonants and vowels
  • S Special Rule - The consonants and vowels in the example are NOT interchangeable with any other consonants and vowels

__CC denotes a word that ends in two consonants, and CV__ denotes a word that begins with a consonant followed by a vowel.

Rules

Verb Rules

  • G (C+ো+C)+(তে) = (C+ু+C+তে) e.g. খোল + তে = খুলতে
  • G (া+C+া) + (তে) = (া+C+া+তে) e.g. পাঠা + তে = পাঠাতে
  • G (C+ি)/(C+ে) + (ে+র)/(া+র) = (C+ে+ও+য়+া+র) e.g. নি + ের = নেওয়ার, নে + ার = নেওয়ার
  • G (C+া) + (তে) = (C+ে+তে) e.g. পা + তে = পেতে
  • G (C+ে) + (তে) = (C+ি+তে) e.g. দে + তে = দিতে
  • G (C+C) + (ে+র) = (C+C+া+র) e.g. কর + ের = করার
  • G __K1 + K2__ = __K1__
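
The verb rules above can be read as pattern rewrites on the end of a stem. The following is a minimal Python sketch of five of them, not Anubadok's own code (BnSondhi.pm is a Perl module and is not reproduced here). The function name `join_verb`, the `RULES` table, and the consonant character class are hypothetical. It also assumes that patterns match the end of the stem and that the first matching rule wins, which is why the (া+C+া) rule is listed before the (C+া) rule.

```python
import re
import unicodedata

# Approximate character class for Bengali consonant letters (an assumption,
# not taken from Anubadok).
C = "[\u0995-\u09B9\u09CE\u09DC\u09DD\u09DF]"

# Kars (dependent vowel signs) used by the rules.
AA_KAR = "\u09BE"  # া
I_KAR = "\u09BF"   # ি
U_KAR = "\u09C1"   # ু
E_KAR = "\u09C7"   # ে
O_KAR = "\u09CB"   # ো

TE = "\u09A4" + E_KAR   # suffix তে
ER = E_KAR + "\u09B0"   # suffix ের
AR = AA_KAR + "\u09B0"  # suffix ার

# Each entry: (stem-ending pattern, suffix it applies to,
#              replacement for the matched ending, suffix actually appended).
RULES = [
    # (C+ো+C) + তে = C+ু+C+তে
    (re.compile(f"({C}){O_KAR}({C})$"), TE, rf"\g<1>{U_KAR}\g<2>", TE),
    # (া+C+া) + তে: stem unchanged; must precede the (C+া) rule
    (re.compile(f"{AA_KAR}{C}{AA_KAR}$"), TE, r"\g<0>", TE),
    # (C+া) + তে = C+ে+তে
    (re.compile(f"({C}){AA_KAR}$"), TE, rf"\g<1>{E_KAR}", TE),
    # (C+ে) + তে = C+ি+তে
    (re.compile(f"({C}){E_KAR}$"), TE, rf"\g<1>{I_KAR}", TE),
    # (C+C) + ের = C+C+ার
    (re.compile(f"{C}{C}$"), ER, r"\g<0>", AR),
]


def join_verb(stem: str, suffix: str) -> str:
    """Join a verb stem and a suffix, applying the first matching rule."""
    # NFC composes a decomposed ো (ে + া) into the single code point U+09CB.
    stem = unicodedata.normalize("NFC", stem)
    suffix = unicodedata.normalize("NFC", suffix)
    for pattern, rule_suffix, stem_repl, new_suffix in RULES:
        if suffix == rule_suffix and pattern.search(stem):
            return pattern.sub(stem_repl, stem, count=1) + new_suffix
    # No rule applies: plain concatenation.
    return stem + suffix


if __name__ == "__main__":
    for stem, suffix in [("খোল", "তে"), ("পাঠা", "তে"), ("পা", "তে"),
                         ("দে", "তে"), ("কর", "ের")]:
        print(stem, "+", suffix, "=", join_verb(stem, suffix))
```

The (C+ি)/(C+ে) rule and the K1/K2 rule are left out of this sketch. Keeping the rules in an ordered table rather than in nested conditionals makes the precedence between overlapping patterns explicit.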

Preposition Rules

  • S লেখ -> লিখ
  • S এই + টি = এটি
  • S __কে + এর = __ার


Tag Set

Anubadok uses the Penn Treebank tag set, which is as follows:

  1. CC Coordinating conjunction
  2. CD Cardinal number
  3. DT Determiner
  4. EX Existential there
  5. FW Foreign word
  6. IN Preposition or subordinating conjunction
  7. JJ Adjective
  8. JJR Adjective, comparative
  9. JJS Adjective, superlative
  10. LS List item marker
  11. MD Modal
  12. NN Noun, singular or mass
  13. NNS Noun, plural
  14. NP Proper noun, singular
  15. NPS Proper noun, plural
  16. PDT Predeterminer
  17. POS Possessive ending
  18. PP Personal pronoun
  19. PP$ Possessive pronoun
  20. RB Adverb
  21. RBR Adverb, comparative
  22. RBS Adverb, superlative
  23. RP Particle
  24. SYM Symbol
  25. TO to
  26. UH Interjection
  27. VB Verb, base form
  28. VBD Verb, past tense
  29. VBG Verb, gerund or present participle
  30. VBN Verb, past participle
  31. VBP Verb, non-3rd person singular present
  32. VBZ Verb, 3rd person singular present
  33. WDT Wh-determiner
  34. WP Wh-pronoun
  35. WP$ Possessive wh-pronoun
  36. WRB Wh-adverb
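
The list above uses the older Penn Treebank names NP, NPS, PP and PP$, whereas later Treebank releases and many current taggers call the same categories NNP, NNPS, PRP and PRP$. The sketch below shows one way to map tagger output onto the names listed here. Whether Anubadok needs such a step is not stated on this page, and the names `NEW_TO_LISTED` and `to_listed_tags` are hypothetical.

```python
# Later Penn Treebank tag names -> the names used in the list above.
NEW_TO_LISTED = {
    "NNP": "NP",    # proper noun, singular
    "NNPS": "NPS",  # proper noun, plural
    "PRP": "PP",    # personal pronoun
    "PRP$": "PP$",  # possessive pronoun
}


def to_listed_tags(tagged):
    """Rename tags in a list of (word, tag) pairs; other tags pass through."""
    return [(word, NEW_TO_LISTED.get(tag, tag)) for word, tag in tagged]


if __name__ == "__main__":
    sentence = [("She", "PRP"), ("met", "VBD"), ("John", "NNP")]
    print(to_listed_tags(sentence))
```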