Difference between revisions of "Bengali and English/Anubadok"

From Apertium

Revision as of 08:15, 11 April 2009

Anubadok is an open-source English to Bengali MT system developed by G M Hossain, currently in an experimental stage.

The program is licensed under the GPL. It is accessible from here.

Inflection Rules (BnSondhi.pm)

Legend

  • C Consonant
  • V Vowel
  • _ Any Letter
  • K Kar (Short form of Vowel)
  • G General Rule - The consonants and vowels in the example are exchangeable with any other consonants and vowels
  • S Special Rule - The consonants and vowels in the example are NOT exchangeable with any other consonants and vowels
  • H Hasanth - The joiner, e.g. ঙ+্+গ = ঙ্গ, as in মঙ্গল

__CC denotes the last two consonants of a word, and CV__ denotes a word whose first letter is a consonant followed by a vowel.
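The legend symbols can be made concrete with a small classifier. The following is a minimal Python sketch, not Anubadok's own code (BnSondhi.pm is a Perl module): the function names `classify` and `pattern` are hypothetical, and the code point ranges are assumptions based on the Unicode Bengali block.

```python
import unicodedata

HASANTH = "\u09cd"  # the joiner, H in the legend
NUKTA = "\u09bc"    # combining dot; treated as part of the preceding consonant

def classify(ch):
    """Map one code point to a legend symbol: C, V, K, H, or ? if none apply."""
    cp = ord(ch)
    if ch == HASANTH:
        return "H"
    # Consonants: U+0995..U+09B9 plus U+09CE, U+09DC, U+09DD, U+09DF
    # (assumed ranges; unassigned gaps inside the block are not filtered out)
    if 0x0995 <= cp <= 0x09B9 or cp in (0x09CE, 0x09DC, 0x09DD, 0x09DF):
        return "C"
    if 0x0985 <= cp <= 0x0994:  # independent vowels
        return "V"
    if 0x09BE <= cp <= 0x09CC:  # dependent vowel signs (kars)
        return "K"
    return "?"

def pattern(word):
    """Return the legend pattern of a word, e.g. 'CKC'."""
    # NFC joins a two-part vowel sign into a single kar before classification.
    word = unicodedata.normalize("NFC", word)
    return "".join(classify(ch) for ch in word if ch != NUKTA)
```

Under these assumptions, খোল maps to `CKC` and মঙ্গল maps to `CCHCC`, which matches how the rules below describe word shapes.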

Rules

Verb Rules

(Applies if the length of the first word is >= 3; rules are not exclusive, i.e. one word can qualify for multiple rules)

  • G (C+ো+C)+(তে) = (C+ু+C+তে) e.g. খোল + তে = খুলতে
  • G (া+C+া) + (তে) = (া+C+া+তে) e.g. পাঠা + তে = পাঠাতে
  • G (C+ি)/(C+ে) + (ে+র)/(া+র) = (C+ে+ও+য়+া+র) e.g. নি + ের = নেওয়ার, নে + ার = নেওয়ার
  • G (C+া) + (তে) = (C+ে+তে) e.g. পা + তে = পেতে
  • G (C+ে) + (তে) = (C+ি+তে) e.g. দে + তে = দিতে
  • G (C+C) + (ে+র) = (C+C+া+র) e.g. কর + ের = করার
  • G (__K1) + (K2__) = (__K1__) [needs review]
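Four of the verb rules above can be sketched in Python as follows. This is an illustration only, not the BnSondhi.pm implementation: `join_verb` is a hypothetical name, the consonant class is an assumed Unicode range, only the first matching rule is applied, and the length guard is omitted because the text does not say whether length is counted in characters or bytes. Consonants written with a separate nukta are not handled.

```python
import re
import unicodedata

C = "[\u0995-\u09b9\u09ce\u09dc\u09dd\u09df]"  # consonant class (assumed range)
AA, I, U, E, O = "\u09be", "\u09bf", "\u09c1", "\u09c7", "\u09cb"  # kars
TE = "\u09a4\u09c7"  # the suffix written (তে) in the rules
ER = "\u09c7\u09b0"  # the suffix written (ে+র) in the rules
AR = "\u09be\u09b0"  # the ending written (া+র) in the rules

def join_verb(stem, suffix):
    """Join a verb stem and a suffix using a subset of the verb rules."""
    stem = unicodedata.normalize("NFC", stem)
    suffix = unicodedata.normalize("NFC", suffix)
    if suffix == TE:
        m = re.fullmatch(f"({C}){O}({C})", stem)
        if m:  # (C+o-kar+C) + te = (C+u-kar+C+te)
            return m.group(1) + U + m.group(2) + TE
        if re.fullmatch(f"{C}{AA}", stem):  # (C+aa-kar) + te = (C+e-kar+te)
            return stem[0] + E + TE
        if re.fullmatch(f"{C}{E}", stem):   # (C+e-kar) + te = (C+i-kar+te)
            return stem[0] + I + TE
    if suffix == ER and re.fullmatch(f"{C}{C}", stem):
        return stem + AR                    # (C+C) + er = (C+C+aa-kar+r)
    return stem + suffix                    # no rule matched: plain join
```

With these definitions, খোল + তে gives খুলতে, পা + তে gives পেতে, দে + তে gives দিতে, and কর + ের gives করার, while পাঠা + তে falls through to the plain join and gives পাঠাতে.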

Preposition Rules

(Applies if the length of the first word is >= 2; rules are not exclusive, i.e. one word can qualify for multiple rules)

  • G (ল+ে+C) -> (ল+ি+C) e.g. লেখ -> লিখ
  • G (এ+ই) + (টি) = (এটি) e.g. এই + টি = এটি
  • S (_+ক+ে) + (ে+র) = (_+র) e.g. আমাকে + ের = আমার
  • G (C+C) + (ে+র) = (C+C+া+র) e.g. কর + এর = করার
  • G (C/V/H/K + C) + (তে) = (C/V/H/K + C + ে) [needs review]
  • S (ং) + (ে+র) = (ং+এ+র) e.g. সং + ের = সংএর
  • G (__K1) + (K2__) = (__K1__) [needs review]
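The three preposition rules that come with unambiguous worked examples can be sketched in the same way. Again this is a hedged illustration rather than Anubadok's code: `join_preposition` is a hypothetical name, the rules are tried in a fixed order with first match winning, and the rules marked [needs review] are left out.

```python
import unicodedata

EI = "\u098f\u0987"     # the word written (এ+ই) in the rules
TI = "\u099f\u09bf"     # the suffix used in the worked example for that rule
KE = "\u0995\u09c7"     # the ending written (ক+ে) in the rules
ER = "\u09c7\u09b0"     # the suffix written (ে+র) in the rules
ANUSVARA = "\u0982"     # the sign written (ং) in the rules
RA = "\u09b0"           # the consonant r
E_VOWEL = "\u098f"      # independent vowel e

def join_preposition(word, suffix):
    """Join a word and a suffix using a subset of the preposition rules."""
    word = unicodedata.normalize("NFC", word)
    suffix = unicodedata.normalize("NFC", suffix)
    if word == EI and suffix == TI:
        return E_VOWEL + TI              # drop the second vowel of the word
    if suffix == ER and word.endswith(KE):
        return word[: -len(KE)] + RA     # (_+k+e-kar) + er = (_+r)
    if suffix == ER and word.endswith(ANUSVARA):
        return word + E_VOWEL + RA       # anusvara + er keeps a full vowel
    return word + suffix                 # no rule matched: plain join
```

Under these assumptions, আমাকে + ের gives আমার, সং + ের gives সংএর, and এই + টি gives এটি.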

Main Rules

(Applies if the length of the first word is >= 2; rules are not exclusive, i.e. one word can qualify for multiple rules)

  • G (K+C) + (তে) = (K+C+ে) e.g. মার + তে = মারে [needs review]
  • G (__K1) + (K2__) = (__K1__) [needs review]

Sondhi for progressive tag POS

  • G (K) + (_) = (K)

Basic Verb Sondhi

  • G (_+ি) + (_) = (_+ে+ও+য়+_)

Verb Sondhi for passive sentence

  • G (C/K + C) + (_) = (C/K+C+_); otherwise:
  • G (_+ি) + (_) = (_+ে+ও+য়+_)

Verb Sondhi for active sentence

(Applies if the length of the first word is >= 1; rules are not exclusive, i.e. one word can qualify for multiple rules)

  • G (_ে) + (ো) = (_া+ও) e.g. নে + ো = নাও
  • G (C+ে) + (_) = (C+ি+_) e.g. দে -> দি
  • G (ল+ে+C) -> (ল+ি+C) e.g. লেখ -> লিখ [why does Anubadok repeat this rule?]
  • 3rd person, present simple
  • 2nd person, present simple
  • 1st/2nd/3rd person, future simple
  • 1st/2nd/3rd person, past simple
  • 1st person, present simple
  • 1st/2nd/3rd person, present/past continuous
  • 1st/2nd/3rd person, present/past, perfect
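The three stem rules at the top of this list can be sketched as below. This is a minimal Python illustration under stated assumptions, not the actual module: `join_active` is a hypothetical name, the rules are tried in the order listed with first match winning, and the third rule is generalised from ল to any consonant because it is marked G. The per-tense entries are not covered.

```python
import re
import unicodedata

C = "[\u0995-\u09b9\u09ce\u09dc\u09dd\u09df]"  # consonant class (assumed range)
AA, I, E, O = "\u09be", "\u09bf", "\u09c7", "\u09cb"  # kars aa, i, e, o
O_VOWEL = "\u0993"  # independent vowel o

def join_active(stem, suffix=""):
    """Adjust an active-voice verb stem, then append the suffix."""
    stem = unicodedata.normalize("NFC", stem)
    suffix = unicodedata.normalize("NFC", suffix)
    if stem.endswith(E) and suffix == O:
        return stem[:-1] + AA + O_VOWEL              # (_+e-kar) + o-kar = (_+aa-kar+o)
    if re.fullmatch(f"{C}{E}", stem):
        return stem[0] + I + suffix                  # (C+e-kar) + _ = (C+i-kar+_)
    m = re.fullmatch(f"({C}){E}({C})", stem)
    if m:
        return m.group(1) + I + m.group(2) + suffix  # (C+e-kar+C) -> (C+i-kar+C)
    return stem + suffix                             # no rule matched: plain join
```

With these definitions, নে + ো gives নাও, দে becomes দি, and লেখ becomes লিখ.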

Tag Set

Anubadok uses the Penn Treebank tag set, which is as follows:

  1. CC Coordinating conjunction
  2. CD Cardinal number
  3. DT Determiner
  4. EX Existential there
  5. FW Foreign word
  6. IN Preposition or subordinating conjunction
  7. JJ Adjective
  8. JJR Adjective, comparative
  9. JJS Adjective, superlative
  10. LS List item marker
  11. MD Modal
  12. NN Noun, singular or mass
  13. NNS Noun, plural
  14. NP Proper noun, singular
  15. NPS Proper noun, plural
  16. PDT Predeterminer
  17. POS Possessive ending
  18. PP Personal pronoun
  19. PP$ Possessive pronoun
  20. RB Adverb
  21. RBR Adverb, comparative
  22. RBS Adverb, superlative
  23. RP Particle
  24. SYM Symbol
  25. TO to
  26. UH Interjection
  27. VB Verb, base form
  28. VBD Verb, past tense
  29. VBG Verb, gerund or present participle
  30. VBN Verb, past participle
  31. VBP Verb, non-3rd person singular present
  32. VBZ Verb, 3rd person singular present
  33. WDT Wh-determiner
  34. WP Wh-pronoun
  35. WP$ Possessive wh-pronoun
  36. WRB Wh-adverb