Difference between revisions of "Bengali and English/Anubadok"
Darthxaher (talk | contribs) m
Revision as of 08:15, 11 April 2009
Anubadok is an open source English to Bengali MT system developed by G M Hossain, currently in experimental stage.
The program is licensed under the GPL. It is accessible from here.
Inflection Rules (BnSondhi.pm)
Legend
- C Consonant
 - V Vowel
 - _ Any Letter
 - K Kar (Short form of Vowel)
 - G General Rule - The consonants and vowels in the example are exchangeable with any other consonants and vowels
 - S Special Rule - The consonants and vowels in the example are NOT exchangeable with any other consonants and vowels
 - H Hasanth - The joiner, e.g. ঙ+্+গ = ঙ্গ, as in মঙ্গল
 
__CC means the last two letters of a word are consonants, and CV__ means the word begins with a consonant followed by a vowel.
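The legend can be made concrete with a small classifier. The following is a minimal sketch, not Anubadok's own code: the function names `classify` and `signature` are hypothetical, and the code point ranges are an assumption based on the Unicode Bengali block.

```python
def classify(ch: str) -> str:
    """Map one Bengali code point to a legend letter (C, V, K, H) or '_'."""
    cp = ord(ch)
    if cp == 0x09CD:
        return "H"  # hasanta, the joiner
    if 0x09BE <= cp <= 0x09CC:
        return "K"  # kar: dependent vowel signs such as া, ি, ে, ো
    if 0x0985 <= cp <= 0x0994:
        return "V"  # independent vowels such as অ, আ, এ, ও
    if 0x0995 <= cp <= 0x09B9 or cp in (0x09CE, 0x09DC, 0x09DD, 0x09DF):
        return "C"  # consonants ক..হ plus ৎ, ড়, ঢ়, য়
    return "_"      # anything else is treated as "any letter"


def signature(word: str) -> str:
    """Return the legend pattern of a whole word, one letter per code point."""
    return "".join(classify(ch) for ch in word)


# খোল (খ + ো + ল) has the pattern C K C
print(signature("\u0996\u09cb\u09b2"))
# মঙ্গল (ম + ঙ + ্ + গ + ল) has the pattern C C H C C
print(signature("\u09ae\u0999\u09cd\u0997\u09b2"))
```

The sketch works on single code points, so a conjunct such as ঙ্গ shows up as C, H, C rather than as one unit.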
Rules
- Verb Rules (Apply if the first word is at least 3 characters long. Rules are not exclusive, i.e. one word can qualify for multiple rules.)
 
- G (C+ো+C)+(তে) = (C+ু+C+তে) e.g. খোল + তে = খুলতে
 
- G (া+C+া) + (তে) = (া+C+া+তে) e.g. পাঠা + তে = পাঠাতে
 
- G (C+ি)/(C+ে) + (ে+র)/(া+র) = (C+ে+ও+য়+া+র) e.g. নি + ের = নেওয়ার, নে + ার = নেওয়ার
 
- G (C+া) + (তে) = (C+ে+তে) e.g. পা + তে = পেতে
 
- G (C+ে) + (তে) = (C+ি+তে) e.g. দে + তে = দিতে
 
- G (C+C) + (ে+র) = (C+C+া+র) e.g. কর + ের = করার
 
- G (__K1) + (K2__) = (__K1__) [needs review]
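Some of the verb rules above can be sketched in code. This is an illustration only, under several assumptions: the function name `add_te` is hypothetical, each pattern is taken to describe the whole stem, the first matching rule wins (the real rules are described as non-exclusive), and the length check is omitted because the listed examples পা and দে are shorter than 3 characters.

```python
import re
import unicodedata

# Assumed consonant class: ক..হ plus ৎ, ড়, ঢ়, য়
C = "[\u0995-\u09b9\u09ce\u09dc\u09dd\u09df]"
AA, I, U, E, O = "\u09be", "\u09bf", "\u09c1", "\u09c7", "\u09cb"  # া ি ু ে ো
TE = "\u09a4" + E  # the suffix তে


def add_te(stem: str) -> str:
    """Attach তে to a verb stem using three of the listed G rules."""
    stem = unicodedata.normalize("NFC", stem)  # make ো a single code point
    m = re.fullmatch(f"({C}){O}({C})", stem)
    if m:  # (C+ো+C) + (তে) = (C+ু+C+তে)
        return m.group(1) + U + m.group(2) + TE
    if re.fullmatch(f"{C}{AA}", stem):  # (C+া) + (তে) = (C+ে+তে)
        return stem[0] + E + TE
    if re.fullmatch(f"{C}{E}", stem):  # (C+ে) + (তে) = (C+ি+তে)
        return stem[0] + I + TE
    return stem + TE  # e.g. (া+C+া) + (তে): plain concatenation


for stem in ("\u0996\u09cb\u09b2", "\u09aa\u09be", "\u09a6\u09c7", "\u09aa\u09be\u09a0\u09be"):
    print(stem, "+", TE, "=", add_te(stem))
```

Run on খোল, পা, দে and পাঠা, the sketch reproduces the four examples listed for these rules.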
 
- Preposition Rules
 
(Apply if the first word is at least 2 characters long. Rules are not exclusive, i.e. one word can qualify for multiple rules.)
- G (ল+ে+C) -> (ল+ি+C) e.g. লেখ -> লিখ
 
- G (এ+ই) + (টি) = (এটি) e.g. এই + টি = এটি
 
- S (_+ক+ে) + (ে+র) = (_+র) e.g. আমাকে + ের = আমার
 
- G (C+C) + (ৈ+র) = (C+C+া+র) e.g. কর + এর = করার
 
- G (C/V/H/K + C) + (তে) = (C/V/H/K + C + ে) [needs review]
 
- S (ং) + (ে+র) = (ং+এ+র) e.g. সং + ের = সংএর
 
- G (__K1) + (K2__) = (__K1__) [needs review]
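The two special (S) rules in this group are fixed-string rewrites, which makes them easy to sketch. The function name `add_er` is hypothetical, and the fallback of plain concatenation is an assumption rather than something the rule list states.

```python
KE = "\u0995\u09c7"        # কে
ER_KAR = "\u09c7\u09b0"    # ের (suffix written with the kar ে)
ER_FULL = "\u098f\u09b0"   # এর (suffix written with the full vowel এ)
RA = "\u09b0"              # র
ANUSVARA = "\u0982"        # ং


def add_er(word: str) -> str:
    """Attach the suffix ের using the two listed S rules."""
    if len(word) > 2 and word.endswith(KE):
        # S (_+ক+ে) + (ে+র) = (_+র), e.g. আমাকে + ের = আমার
        return word[:-2] + RA
    if word.endswith(ANUSVARA):
        # S (ং) + (ে+র) = (ং+এ+র), e.g. সং + ের = সংএর
        return word + ER_FULL
    return word + ER_KAR  # assumed fallback: plain concatenation


print(add_er("\u0986\u09ae\u09be\u0995\u09c7"))  # আমাকে
print(add_er("\u09b8\u0982"))                    # সং
```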
 
- Main Rules
 
(Apply if the first word is at least 2 characters long. Rules are not exclusive, i.e. one word can qualify for multiple rules.)
- G (K+C) + (তে) = (K+C+ে) e.g. মার + তে = মারে [needs review]
 
- G (__K1) + (K2__) = (__K1__) [needs review]
 
- Sondhi for progressive tag POS
 
- G (K) + (_) = (K)
 
- Basic Verb Sondhi
 
- G (_+ি) + (_) = (_+ে+ও+য়+_)
 
- Verb Sondhi for passive sentence
 
- G (C/K + C) + (_) = (C/K+C+_); otherwise:
 
- G (_+ি) + (_) = (_+ে+ও+য়+_)
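The passive pair of rules reads like an if/else, which the sketch below spells out. It assumes that "else" means the second rule is tried only when the first does not match, that the patterns describe the end of the stem, and that unmatched stems are simply concatenated. The function name `passive_join` and the code point ranges are hypothetical.

```python
import re

C = "\u0995-\u09b9\u09ce\u09dc\u09dd\u09df"  # assumed consonant code points
K = "\u09be-\u09cc"                          # assumed kar code points
I_KAR = "\u09bf"                             # ি
E_O_Y = "\u09c7\u0993\u09df"                 # ে + ও + য়


def passive_join(stem: str, suffix: str) -> str:
    """Join a verb stem and a suffix following the two passive rules."""
    if re.search(f"[{C}{K}][{C}]$", stem):
        # G (C/K + C) + (_) = (C/K+C+_): nothing changes
        return stem + suffix
    if stem.endswith(I_KAR):
        # G (_+ি) + (_) = (_+ে+ও+য়+_): the final ি becomes ে+ও+য়
        return stem[:-1] + E_O_Y + suffix
    return stem + suffix  # assumed fallback


print(passive_join("\u0995\u09b0", "\u09be"))  # কর + া
print(passive_join("\u09a8\u09bf", "\u09be"))  # নি + া
```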
 
- Verb Sondhi for active sentence
 
(Apply if the first word is at least 1 character long. Rules are not exclusive, i.e. one word can qualify for multiple rules.)
- G (_ে) + (ো) = (_া+ও) e.g. নে + ো = নাও
 
- G (C+ে) + (_) = (C+ি+_) e.g. দে -> দি
 
- G (ল+ে+C) -> (ল+ি+C) e.g. লেখ -> লিখ [why does Anubadok repeat this rule?]
 
- 3rd person, present simple
 - 2nd person, present simple
 - 1st/2nd/3rd person, future simple
 - 1st/2nd/3rd person, past simple
 - 1st person, present simple
 - 1st/2nd/3rd person, present/past continuous
 - 1st/2nd/3rd person, present/past, perfect
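The three active-sentence rules listed before the tense list can be sketched as one function. This is a simplification: the function name `active_join` is hypothetical, only the first matching rule is applied, and the per-tense endings are left out because the page does not list them.

```python
import re
import unicodedata

C = "[\u0995-\u09b9\u09ce\u09dc\u09dd\u09df]"  # assumed consonant class
AA, I, E, O = "\u09be", "\u09bf", "\u09c7", "\u09cb"  # া ি ে ো
LA, O_VOWEL = "\u09b2", "\u0993"                      # ল and ও


def active_join(stem: str, suffix: str) -> str:
    """Join a verb stem and a suffix using the three listed active rules."""
    stem = unicodedata.normalize("NFC", stem)
    suffix = unicodedata.normalize("NFC", suffix)
    if suffix == O and stem.endswith(E):
        # G (_ে) + (ো) = (_া+ও), e.g. নে + ো = নাও
        return stem[:-1] + AA + O_VOWEL
    if re.fullmatch(f"{C}{E}", stem):
        # G (C+ে) + (_) = (C+ি+_), e.g. দে -> দি
        return stem[0] + I + suffix
    m = re.fullmatch(f"{LA}{E}({C})", stem)
    if m:
        # G (ল+ে+C) -> (ল+ি+C), e.g. লেখ -> লিখ
        return LA + I + m.group(1) + suffix
    return stem + suffix  # assumed fallback


print(active_join("\u09a8\u09c7", O))             # নে + ো
print(active_join("\u09b2\u09c7\u0996", ""))      # লেখ with an empty suffix
```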
 
Tag Set
Anubadok uses the Penn Treebank tag set, which is as follows:
- CC Coordinating conjunction
 - CD Cardinal number
 - DT Determiner
 - EX Existential there
 - FW Foreign word
 - IN Preposition or subordinating conjunction
 - JJ Adjective
 - JJR Adjective, comparative
 - JJS Adjective, superlative
 - LS List item marker
 - MD Modal
 - NN Noun, singular or mass
 - NNS Noun, plural
 - NP Proper noun, singular
 - NPS Proper noun, plural
 - PDT Predeterminer
 - POS Possessive ending
 - PP Personal pronoun
 - PP$ Possessive pronoun
 - RB Adverb
 - RBR Adverb, comparative
 - RBS Adverb, superlative
 - RP Particle
 - SYM Symbol
 - TO to
 - UH Interjection
 - VB Verb, base form
 - VBD Verb, past tense
 - VBG Verb, gerund or present participle
 - VBN Verb, past participle
 - VBP Verb, non-3rd person singular present
 - VBZ Verb, 3rd person singular present
 - WDT Wh-determiner
 - WP Wh-pronoun
 - WP$ Possessive wh-pronoun
 - WRB Wh-adverb
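For scripting against tagged output, the list above can be kept as a lookup table. The sketch below copies the tags exactly as listed on this page. The names `PENN_TAGS`, `describe` and `is_verb` are hypothetical and are not part of Anubadok.

```python
PENN_TAGS = {
    "CC": "Coordinating conjunction", "CD": "Cardinal number",
    "DT": "Determiner", "EX": "Existential there", "FW": "Foreign word",
    "IN": "Preposition or subordinating conjunction",
    "JJ": "Adjective", "JJR": "Adjective, comparative",
    "JJS": "Adjective, superlative", "LS": "List item marker",
    "MD": "Modal", "NN": "Noun, singular or mass", "NNS": "Noun, plural",
    "NP": "Proper noun, singular", "NPS": "Proper noun, plural",
    "PDT": "Predeterminer", "POS": "Possessive ending",
    "PP": "Personal pronoun", "PP$": "Possessive pronoun",
    "RB": "Adverb", "RBR": "Adverb, comparative",
    "RBS": "Adverb, superlative", "RP": "Particle", "SYM": "Symbol",
    "TO": "to", "UH": "Interjection", "VB": "Verb, base form",
    "VBD": "Verb, past tense", "VBG": "Verb, gerund or present participle",
    "VBN": "Verb, past participle",
    "VBP": "Verb, non-3rd person singular present",
    "VBZ": "Verb, 3rd person singular present",
    "WDT": "Wh-determiner", "WP": "Wh-pronoun",
    "WP$": "Possessive wh-pronoun", "WRB": "Wh-adverb",
}


def describe(tag: str) -> str:
    """Return the description of a tag, or a placeholder if it is not listed."""
    return PENN_TAGS.get(tag, "unknown tag")


def is_verb(tag: str) -> bool:
    """All verb tags in the list start with VB."""
    return tag in PENN_TAGS and tag.startswith("VB")


print(describe("VBG"))
print(is_verb("VBZ"), is_verb("NN"))
```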