Difference between revisions of "Bengali and English/Anubadok"

From Apertium
Jump to navigation Jump to search
m
m
Line 92: Line 92:


;2nd person, present simple
;2nd person, present simple

*S (ল+ি+C) + (েন) = (ল+ে+C+ে+ন) e.g. লিখ + েন-> লেখ + েন-> লেখেন
*G (C/V/ে+C) + (েন) = (C/V/ে+C+েন) e.g. বল + েন = বলেন, চাল + েন + চালেন
*G (imperative) লিখ + েন -> লিখ + ুন -> লিখুন
*G (C+K) + (েন) = (C+K+ন) বাড়া + েন -> বাড়ান

;1st/2nd/3rd person, future simple
;1st/2nd/3rd person, future simple
;1st/2nd/3rd person, past simple
;1st/2nd/3rd person, past simple

Revision as of 18:43, 11 April 2009

Anubadok is an open source English to Bengali MT system developed by G M Hossain, currently in experimental stage.

The program is licensed under GPL. Its accessible from here.

Inflection Rules (BnSondhi.pm)

Legend

  • C Consonant
  • V Vowel
  • _ Any Letter
  • K Kar (Short form of Vowel)
  • G General Rule - The consonants and vowels in the example are exchangeable with any other consonants and vowels
  • S Special Rule - The consonants and vowels in the example are NOT exchangeable with any other consonants and vowels
  • H Hasanth - The joiner, e.g. ঙ+্+গ = ঙ্গ, as in মঙ্গল

__CC means the last two consonants of a word, and CV__ means the first letter of this word is consonant followed by a vowel

Rules

Verb Rules (Applies if the length of the first word is >= 3, rules are not exclusive e.g. one word can qualify for multiple rules)
  • G (C+ো+C)+(তে) = (C+ু+C+তে) e.g. খোল + তে = খুলতে
  • G (া+C+া) + (তে) = (া+C+া+তে) e.g. পাঠা + তে = পাঠাতে
  • G (C+ি)/(C+ে) + (ে+র)্(া+র) = (C+ে+ও+য়+া+র) e.g. নি + ের = নেওয়ার, নে + ার = নেওয়ার
  • G (C+া) + (তে) = (C+ে+তে) e.g. পা + তে = পেতে
  • G (C+ে) + (তে) = (C+ি+তে) e.g. দে + তে = দিতে
  • G (C+C) + (ে+র) = (C+C+া+র) e.g. কর + ের = করার
  • G (__K1) + (K2__) = (__K1__) [needs review]
Preposition Rules

(Applies if the length of the first word is >= 2, rules are not exclusive e.g. one word can qualify for multiple rules)

  • G (ল+ে+C) -> (ল+ি+C) e.g. লেখ -> লিখ
  • G (এ+ই) + (টা) = (এটি) e.g. এই + টি = এটি
  • S (_+ক+ে) + (ে+র) = (_+র) e.g. আমাকে + ের = আমার
  • G (C+C) + (ৈ+র) = (C+C+া+র) e.g. কর + এর = করার
  • G (C/V/H/K + C) + (তে) = (C/V/H/K + C + ে) [needs review]
  • S (ং) + (ে+র) = (ং+এ+র) e.g. সং + ের = সংএর
  • G (__K1) + (K2__) = (__K1__) [needs review]
Main Rules

(Applies if the length of the first word is >= 2, rules are not exclusive e.g. one word can qualify for multiple rules)

  • G (K+C) + (তে) = (K+C+ে) e.g. মার + তে = মারে [need review]
  • G (__K1) + (K2__) = (__K1__) [needs review]
Sondhi for progressive tag POS
  • G (K) + (_) = (K)
Basic Verb Shondhi
  • G (_+ি) + (_) = (_+ে+ও+য়+_)
Verb Shondhi for passive sentence
  • G (C/K + C) + (_) = (C/K+C+_) else
  • G (_+ি) + (_) = (_+ে+ও+য়+_)
Verb Shondhi for active sentence

(Applies if the length of the first word is >= 1, rules are not exclusive e.g. one word can qualify for multiple rules)

  • G (_ে) + (ো) = (_া+ও) e.g. নে + ো = নাও
  • G (C+ে) + (_) = (C+ি+_) e.g. দে -> দি
  • G (ল+ে+C) -> (ল+ি+C) e.g. লেখ -> লিখ [why anubadok repeats this rule?]
3rd person, present simple
  • S (ল+ি+C) + (ে) = (ল+ে+C+ে) e.g. লিখ + ে-> লেখ + ে-> লেখে
  • S (দি) + (ে) = দেয় e.g. দি + ে-> দে + ে-> দেয়
  • G (C/K+C) + (ে) = (C/K+C+ে) else
  • G (_) + (ে) = (_য়) e.g. হ + ে-> হয়, খা + ে-> খায়
2nd person, present simple
  • S (ল+ি+C) + (েন) = (ল+ে+C+ে+ন) e.g. লিখ + েন-> লেখ + েন-> লেখেন
  • G (C/V/ে+C) + (েন) = (C/V/ে+C+েন) e.g. বল + েন = বলেন, চাল + েন + চালেন
  • G (imperative) লিখ + েন -> লিখ + ুন -> লিখুন
  • G (C+K) + (েন) = (C+K+ন) বাড়া + েন -> বাড়ান
1st/2nd/3rd person, future simple
1st/2nd/3rd person, past simple
1st person, present simple
1st/2nd/3rd person, present/past continuous
1st/2nd/3rd person, present/past, perfect

Tag Set

Anubadok uses Penn Treebank Tag Set, the tag set is as follows:

Tag Gloss Example
CC Coordinating conjunction
CD Cardinal number
DT Determiner
EX Existential there
FW Foreign word
IN Preposition or subordinating conjunction
JJ Adjective
JJR Adjective, comparative
JJS Adjective, superlative
LS List item marker
MD Modal
NN Noun, singular or mass
NNS Noun, plural
NP Proper noun, singular
NPS Proper noun, plural
PDT Predeterminer
POS Possessive ending
PP Personal pronoun
PP$ Possessive pronoun
RB Adverb
RBR Adverb, comparative
RBS Adverb, superlative
RP Particle
SYM Symbol
TO to
UH Interjection
VB Verb, base form
VBD Verb, past tense
VBG Verb, gerund or present participle
VBN Verb, past participle
VBP Verb, non-3rd person singular present
VBZ Verb, 3rd person singular present
WDT Wh-determiner
WP Wh-pronoun
WP$ Possessive wh-pronoun
WRB Wh-adverb