Difference between revisions of "Bengali and English/Anubadok"
Darthxaher (talk | contribs) m
Revision as of 01:42, 11 April 2009
Anubadok is an open source English-to-Bengali machine translation (MT) system developed by G M Hossain, currently at an experimental stage.
The program is licensed under the GPL. It is accessible from here.
Inflection Rules (BnSondhi.pm)
Legend
- C Consonant
- V Vowel
- _ Any Letter
- K Kar (Short form of Vowel)
- G General Rule - The consonants and vowels in the example are exchangeable with any other consonants and vowels
- S Special Rule - The consonants and vowels in the example are NOT exchangeable with any other consonants and vowels
- H Hasanth - The joiner, e.g. ঙ+্+গ = ঙ্গ, as in মঙ্গল
__CC means that the last two letters of a word are consonants, and CV__ means that the word starts with a consonant followed by a vowel.
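The legend can be made concrete with a small sketch that maps each letter of a word to its legend class. This is not taken from BnSondhi.pm; the names `legend_class` and `pattern` are hypothetical, and the character sets are an assumption based on the Unicode Bengali block. Characters outside the legend (for example ং) are shown as `?`.

```python
import unicodedata

HASANT = "\u09cd"  # ্ (H in the legend)
NUKTA = "\u09bc"   # ়, modifies the preceding consonant (as in য়)

# Independent vowels অ..ঔ (the range spans a few unassigned code points, which is harmless).
VOWELS = {chr(c) for c in range(0x0985, 0x0995)}
# Consonants ক..হ plus ৎ.
CONSONANTS = {chr(c) for c in range(0x0995, 0x09BA)} | {"\u09ce"}
# Kars (dependent vowel signs): া ি ী ু ূ ৃ ৄ ে ৈ ো ৌ
KARS = set("\u09be\u09bf\u09c0\u09c1\u09c2\u09c3\u09c4\u09c7\u09c8\u09cb\u09cc")


def legend_class(ch):
    """Return the legend letter (C, V, K or H) for one character, or '?'."""
    if ch in CONSONANTS:
        return "C"
    if ch in VOWELS:
        return "V"
    if ch in KARS:
        return "K"
    if ch == HASANT:
        return "H"
    return "?"


def pattern(word):
    """Spell a word as a string of legend classes, e.g. 'CKC'."""
    # NFC gives one canonical form: two-part kars such as ো become a single
    # code point, and letters like য় become base consonant + nukta.
    word = unicodedata.normalize("NFC", word)
    return "".join(legend_class(ch) for ch in word if ch != NUKTA)


print(pattern("মঙ্গল"))               # CCHCC
print(pattern("কর").endswith("CC"))  # True, i.e. the word fits __CC
```

A check such as `pattern(word).endswith("CC")` then corresponds to the `__CC` notation.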
Rules
Verb Rules (Applies if the length of the first word is >= 3; rules are not exclusive, i.e. one word can qualify for multiple rules)
- G (C+ো+C)+(তে) = (C+ু+C+তে) e.g. খোল + তে = খুলতে
- G (া+C+া) + (তে) = (া+C+া+তে) e.g. পাঠা + তে = পাঠাতে
- G (C+ি)/(C+ে) + (ে+র)/(া+র) = (C+ে+ও+য়+া+র) e.g. নি + ের = নেওয়ার, নে + ার = নেওয়ার
- G (C+া) + (তে) = (C+ে+তে) e.g. পা + তে = পেতে
- G (C+ে) + (তে) = (C+ি+তে) e.g. দে + তে = দিতে
- G (C+C) + (ে+র) = (C+C+া+র) e.g. কর + ের = করার
- G (__K1) + (K2__) = (__K1__) [needs review]
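A few of the verb rules above can be sketched as pattern-plus-rewrite pairs. This is a minimal illustration in Python, not the Perl code of BnSondhi.pm; `join_verb`, `VERB_RULES` and the first-match strategy are assumptions. Stem patterns are matched against the whole first word, and the length guard is left out because the text does not say how length is counted. Anything that matches no rule is simply concatenated, which also covers the পাঠা + তে case.

```python
import re
import unicodedata


def nfc(text):
    """Normalize to NFC so visually identical strings compare equal."""
    return unicodedata.normalize("NFC", text)


# One consonant (ক..হ or ৎ), optionally followed by a nukta.
C = "(?:[\u0995-\u09b9\u09ce]\u09bc?)"
KAR_AA, KAR_I, KAR_U = "\u09be", "\u09bf", "\u09c1"  # া ি ু
KAR_E, KAR_O = "\u09c7", "\u09cb"                    # ে ো
TE = "\u09a4" + KAR_E    # তে
ER = KAR_E + "\u09b0"    # ের
AR = KAR_AA + "\u09b0"   # ার

# (pattern for the whole stem, required suffix, function building the result)
VERB_RULES = [
    # (C+ো+C) + (তে) = (C+ু+C+তে)
    (re.compile(f"({C}){KAR_O}({C})"), TE, lambda m: m[1] + KAR_U + m[2] + TE),
    # (C+া) + (তে) = (C+ে+তে)
    (re.compile(f"({C}){KAR_AA}"), TE, lambda m: m[1] + KAR_E + TE),
    # (C+ে) + (তে) = (C+ি+তে)
    (re.compile(f"({C}){KAR_E}"), TE, lambda m: m[1] + KAR_I + TE),
    # (C+C) + (ে+র) = (C+C+া+র)
    (re.compile(f"({C})({C})"), ER, lambda m: m[1] + m[2] + AR),
]


def join_verb(stem, suffix):
    """Apply the first matching rule; otherwise concatenate unchanged."""
    stem, suffix = nfc(stem), nfc(suffix)
    for stem_pattern, rule_suffix, build in VERB_RULES:
        match = stem_pattern.fullmatch(stem)
        if match and suffix == rule_suffix:
            return build(match)
    return stem + suffix


for stem, suffix in [("খোল", "তে"), ("পা", "তে"), ("দে", "তে"), ("কর", "ের")]:
    print(stem, "+", suffix, "=", join_verb(stem, suffix))
```

Keeping the rules in a table rather than in nested conditionals makes it easy to add or reorder entries while the rules marked "needs review" are still being settled.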
Preposition Rules
(Applies if the length of the first word is >= 2; rules are not exclusive, i.e. one word can qualify for multiple rules)
- G (ল+ে+C) -> (ল+ি+C) e.g. লেখ -> লিখ
- G (এ+ই) + (টি) = (এটি) e.g. এই + টি = এটি
- S (_+ক+ে) + (ে+র) = (_+র) e.g. আমাকে + ের = আমার
- G (C+C) + (এ+র) = (C+C+া+র) e.g. কর + এর = করার
- G (C/V/H/K + C) + (তে) = (C/V/H/K + C + ে) [needs review]
- S (ং) + (ে+র) = (ং+এ+র)
- G (__K1) + (K2__) = (__K1__) [needs review]
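The difference between a general (G) and a special (S) rule can be illustrated with three of the preposition rules above. Again this is only a sketch under assumptions, not the BnSondhi.pm implementation: `raise_stem` and `join_preposition` are hypothetical names, and the input রং used for the (ং) rule is an illustrative word that does not come from the rule list.

```python
import re
import unicodedata


def nfc(text):
    """Normalize to NFC so visually identical strings compare equal."""
    return unicodedata.normalize("NFC", text)


# One consonant (ক..হ or ৎ), optionally followed by a nukta.
C = "(?:[\u0995-\u09b9\u09ce]\u09bc?)"
KAR_I, KAR_E = "\u09bf", "\u09c7"        # ি ে
KA, RA, LA = "\u0995", "\u09b0", "\u09b2"  # ক র ল
ANUSVARA = "\u0982"                      # ং
VOWEL_E = "\u098f"                       # এ
ER = KAR_E + RA                          # ের


def raise_stem(word):
    """G (ল+ে+C) -> (ল+ি+C): the final consonant is a free slot."""
    word = nfc(word)
    match = re.fullmatch(f"{LA}{KAR_E}({C})", word)
    return LA + KAR_I + match[1] if match else word


def join_preposition(word, suffix):
    """Apply two special rules for the suffix ের; otherwise concatenate."""
    word, suffix = nfc(word), nfc(suffix)
    if suffix == ER:
        # S (_+ক+ে) + (ে+র) = (_+র): the literal ending কে is replaced by র.
        if len(word) > 2 and word.endswith(KA + KAR_E):
            return word[:-2] + RA
        # S (ং) + (ে+র) = (ং+এ+র): the kar becomes the full vowel এ.
        if word.endswith(ANUSVARA):
            return word + VOWEL_E + RA
    return word + suffix


print(raise_stem("লেখ"))                 # লিখ
print(join_preposition("আমাকে", "ের"))   # আমার
```

The special rules test for literal letters (ক, ে, ং), whereas the general rule uses the consonant class `C` as a placeholder, which mirrors the G and S entries in the legend.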
Main Rules
(Applies if the length of the first word is >= 2; rules are not exclusive, i.e. one word can qualify for multiple rules)
- G (K+C) + (তে) = (K+C+ে) e.g. মার + তে = মারে [needs review]
- G (__K1) + (K2__) = (__K1__) [needs review]
Sondhi for progressive tag POS
- G (K) + (_) = (K)
Basic Verb Sondhi
- G (_+ি) + (_) = (_+ে+ও+য়+_)
Tag Set
Anubadok uses the Penn Treebank tag set, which is as follows:
- CC Coordinating conjunction
- CD Cardinal number
- DT Determiner
- EX Existential there
- FW Foreign word
- IN Preposition or subordinating conjunction
- JJ Adjective
- JJR Adjective, comparative
- JJS Adjective, superlative
- LS List item marker
- MD Modal
- NN Noun, singular or mass
- NNS Noun, plural
- NP Proper noun, singular
- NPS Proper noun, plural
- PDT Predeterminer
- POS Possessive ending
- PP Personal pronoun
- PP$ Possessive pronoun
- RB Adverb
- RBR Adverb, comparative
- RBS Adverb, superlative
- RP Particle
- SYM Symbol
- TO to
- UH Interjection
- VB Verb, base form
- VBD Verb, past tense
- VBG Verb, gerund or present participle
- VBN Verb, past participle
- VBP Verb, non-3rd person singular present
- VBZ Verb, 3rd person singular present
- WDT Wh-determiner
- WP Wh-pronoun
- WP$ Possessive wh-pronoun
- WRB Wh-adverb