Difference between revisions of "Bengali and English/Anubadok"
		
		
		
		
		
		
Revision as of 08:17, 11 April 2009
Anubadok is an open-source English-to-Bengali machine translation (MT) system developed by G M Hossain. It is currently at an experimental stage.
The program is licensed under the GPL. It's accessible from here.
Inflection Rules (BnSondhi.pm)
Legend
- C Consonant
- V Vowel
- _ Any letter
- K Kar (short form of a vowel)
- G General rule: the consonants and vowels in the example are exchangeable with any other consonants and vowels
- S Special rule: the consonants and vowels in the example are NOT exchangeable with any other consonants and vowels
- H Hasanth: the joiner, e.g. ঙ+্+গ = ঙ্গ, as in মঙ্গল
 
__CC means that the word ends in two consonants, and CV__ means that the word begins with a consonant followed by a vowel.
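The legend can be made concrete with a small classifier. The following Python sketch is only an illustration and is not taken from BnSondhi.pm: the function names (`classify`, `pattern`, `ends_with`, `starts_with`) are hypothetical, and the code point ranges are assumptions based on the Unicode Bengali block.

```python
HASANTA = "\u09cd"  # the joiner, written H in the legend

def classify(ch: str) -> str:
    """Map one Bengali code point to a legend symbol: C, V, K or H.

    Anything else (for example anusvara) is reported as '?'.
    """
    cp = ord(ch)
    if ch == HASANTA:
        return "H"
    if 0x0985 <= cp <= 0x0994:          # independent vowels
        return "V"
    if 0x0995 <= cp <= 0x09B9 or cp in (0x09CE, 0x09DC, 0x09DD, 0x09DF):
        return "C"                      # consonants, including rra, rha, yya
    if 0x09BE <= cp <= 0x09CC:          # dependent vowel signs (kars)
        return "K"
    return "?"

def pattern(word: str) -> str:
    """Spell out a word as a string of legend symbols."""
    return "".join(classify(ch) for ch in word)

def ends_with(word: str, shape: str) -> bool:
    """True if the word matches __shape, e.g. shape 'CC' for __CC."""
    return pattern(word).endswith(shape)

def starts_with(word: str, shape: str) -> bool:
    """True if the word matches shape__, e.g. shape 'CV' for CV__."""
    return pattern(word).startswith(shape)
```

For instance, `pattern` spells out মঙ্গল as `CCHCC`, so the word satisfies `__CC`.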
Rules
- Verb Rules (Apply when the length of the first word is >= 3. Rules are not exclusive: one word can qualify for multiple rules.)
 
- G (C+ো+C)+(তে) = (C+ু+C+তে) e.g. খোল + তে = খুলতে
 
- G (া+C+া) + (তে) = (া+C+া+তে) e.g. পাঠা + তে = পাঠাতে
 
- G (C+ি)/(C+ে) + (ে+র)/(া+র) = (C+ে+ও+য়+া+র) e.g. নি + ের = নেওয়ার, নে + ার = নেওয়ার
 
- G (C+া) + (তে) = (C+ে+তে) e.g. পা + তে = পেতে
 
- G (C+ে) + (তে) = (C+ি+তে) e.g. দে + তে = দিতে
 
- G (C+C) + (ে+র) = (C+C+া+র) e.g. কর + ের = করার
 
- G (__K1) + (K2__) = (__K1__) [needs review]
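Four of the verb rules above can be sketched as a small rule table. This Python sketch is an illustration under stated assumptions, not Anubadok's implementation: `join_verb` and `VERB_RULES` are hypothetical names, the consonant class is an assumed Unicode range, the length precondition is not modelled, and the function stops at the first matching rule even though the page says rules are not exclusive.

```python
import re

CONS = "[\u0995-\u09b9\u09ce\u09dc\u09dd\u09df]"  # assumed consonant range

TE = "\u09a4\u09c7"  # suffix "te"
ER = "\u09c7\u09b0"  # suffix "er"
AR = "\u09be\u09b0"  # ending "ar"

# Each entry: (stem pattern, required suffix, function building the result).
VERB_RULES = [
    # (C + o-kar + C) + te = C + u-kar + C + te
    (re.compile(f"^({CONS})\u09cb({CONS})$"), TE,
     lambda m: m.group(1) + "\u09c1" + m.group(2) + TE),
    # (C + aa-kar) + te = C + e-kar + te
    (re.compile(f"^({CONS})\u09be$"), TE,
     lambda m: m.group(1) + "\u09c7" + TE),
    # (C + e-kar) + te = C + i-kar + te
    (re.compile(f"^({CONS})\u09c7$"), TE,
     lambda m: m.group(1) + "\u09bf" + TE),
    # (C + C) + er = C + C + ar
    (re.compile(f"^({CONS})({CONS})$"), ER,
     lambda m: m.group(1) + m.group(2) + AR),
]

def join_verb(stem: str, suffix: str) -> str:
    """Join stem and suffix, applying the first matching rule (assumption)."""
    for stem_pattern, rule_suffix, build in VERB_RULES:
        match = stem_pattern.match(stem)
        if suffix == rule_suffix and match:
            return build(match)
    return stem + suffix  # no rule applies: plain concatenation
```

With these rules, `join_verb` reproduces the examples খোল + তে = খুলতে, পা + তে = পেতে, দে + তে = দিতে and কর + ের = করার. Stopping at the first match matters here: applying the rules in sequence would turn পা + তে into পে + তে and then match the next rule as well.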
 
- Preposition Rules
 
(Apply when the length of the first word is >= 2. Rules are not exclusive: one word can qualify for multiple rules.)
- G (ল+ে+C) -> (ল+ি+C) e.g. লেখ -> লিখ
 
- G (এ+ই) + (টি) = (এটি) e.g. এই + টি = এটি
 
- S (_+ক+ে) + (ে+র) = (_+র) e.g. আমাকে + ের = আমার
 
- G (C+C) + (ৈ+র) = (C+C+া+র) e.g. কর + এর = করার
 
- G (C/V/H/K + C) + (তে) = (C/V/H/K + C + ে) [needs review]
 
- S (ং) + (ে+র) = (ং+এ+র) e.g. সং + ের = সংএর
 
- G (__K1) + (K2__) = (__K1__) [needs review]
 
- Main Rules
 
(Apply when the length of the first word is >= 2. Rules are not exclusive: one word can qualify for multiple rules.)
- G (K+C) + (তে) = (K+C+ে) e.g. মার + তে = মারে [needs review]
 
- G (__K1) + (K2__) = (__K1__) [needs review]
 
- Sondhi for progressive tag POS
 
- G (K) + (_) = (K)
 
- Basic Verb Sondhi
 
- G (_+ি) + (_) = (_+ে+ও+য়+_)
 
- Verb Sondhi for passive sentence
 
- G (C/K + C) + (_) = (C/K+C+_); otherwise:
 
- G (_+ি) + (_) = (_+ে+ও+য়+_)
 
- Verb Sondhi for active sentence
 
(Apply when the length of the first word is >= 1. Rules are not exclusive: one word can qualify for multiple rules.)
- G (_ে) + (ো) = (_া+ও) e.g. নে + ো = নাও
 
- G (C+ে) + (_) = (C+ি+_) e.g. দে -> দি
 
- G (ল+ে+C) -> (ল+ি+C) e.g. লেখ -> লিখ [why does Anubadok repeat this rule?]
 
- 3rd person, present simple
 - 2nd person, present simple
 - 1st/2nd/3rd person, future simple
 - 1st/2nd/3rd person, past simple
 - 1st person, present simple
 - 1st/2nd/3rd person, present/past continuous
 - 1st/2nd/3rd person, present/past, perfect
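
The three stem changes listed for active sentences can be sketched in the same spirit. Again this is a hedged Python illustration rather than the BnSondhi.pm code: `raise_le_stem` and `join_active` are hypothetical names, the consonant class is an assumed range, and the order in which the rules are tried is a guess.

```python
import re

CONS = "[\u0995-\u09b9\u09ce\u09dc\u09dd\u09df]"  # assumed consonant range
E_KAR = "\u09c7"
I_KAR = "\u09bf"
AA_KAR = "\u09be"
O_KAR = "\u09cb"
O_VOWEL = "\u0993"  # independent vowel "o"
LA = "\u09b2"       # the consonant "la"

def raise_le_stem(stem: str) -> str:
    """(la + e-kar + C) -> (la + i-kar + C); other stems are unchanged."""
    return re.sub(f"^{LA}{E_KAR}({CONS})$", LA + I_KAR + r"\1", stem)

def join_active(stem: str, suffix: str) -> str:
    """Join stem and suffix using the active-sentence rules (order assumed)."""
    # (_ + e-kar) + (o-kar) = (_ + aa-kar + o)
    if stem.endswith(E_KAR) and suffix == O_KAR:
        return stem[:-1] + AA_KAR + O_VOWEL
    # (C + e-kar) + (_) = (C + i-kar + _)
    if re.fullmatch(CONS + E_KAR, stem):
        return stem[:-1] + I_KAR + suffix
    # (la + e-kar + C) -> (la + i-kar + C), then plain concatenation
    return raise_le_stem(stem) + suffix
```

Under these assumptions, `join_active` gives নে + ো = নাও, turns the stem দে into দি, and `raise_le_stem` maps লেখ to লিখ.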
 
Tag Set
Anubadok uses the Penn Treebank tag set, which is listed below:
| Tag | Gloss | Example |
|---|---|---|
| CC | Coordinating conjunction | |
| CD | Cardinal number | |
| DT | Determiner | |
| EX | Existential there | |
| FW | Foreign word | |
| IN | Preposition or subordinating conjunction | |
| JJ | Adjective | |
| JJR | Adjective, comparative | |
| JJS | Adjective, superlative | |
| LS | List item marker | |
| MD | Modal | |
| NN | Noun, singular or mass | |
| NNS | Noun, plural | |
| NP | Proper noun, singular | |
| NPS | Proper noun, plural | |
| PDT | Predeterminer | |
| POS | Possessive ending | |
| PP | Personal pronoun | |
| PP$ | Possessive pronoun | |
| RB | Adverb | |
| RBR | Adverb, comparative | |
| RBS | Adverb, superlative | |
| RP | Particle | |
| SYM | Symbol | |
| TO | to | |
| UH | Interjection | |
| VB | Verb, base form | |
| VBD | Verb, past tense | |
| VBG | Verb, gerund or present participle | |
| VBN | Verb, past participle | |
| VBP | Verb, non-3rd person singular present | |
| VBZ | Verb, 3rd person singular present | |
| WDT | Wh-determiner | |
| WP | Wh-pronoun | |
| WP$ | Possessive wh-pronoun | |
| WRB | Wh-adverb | |
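
For code that consumes tagger output, the tag set can be held in a lookup table. The Python sketch below is illustrative only: `TAG_GLOSS`, `gloss` and `is_verb` are hypothetical names, not Anubadok data structures, and the tag spellings follow the table on this page (NP, NPS, PP, PP$).

```python
# Tag -> gloss, copied from the table on this page.
TAG_GLOSS = {
    "CC": "Coordinating conjunction",
    "CD": "Cardinal number",
    "DT": "Determiner",
    "EX": "Existential there",
    "FW": "Foreign word",
    "IN": "Preposition or subordinating conjunction",
    "JJ": "Adjective",
    "JJR": "Adjective, comparative",
    "JJS": "Adjective, superlative",
    "LS": "List item marker",
    "MD": "Modal",
    "NN": "Noun, singular or mass",
    "NNS": "Noun, plural",
    "NP": "Proper noun, singular",
    "NPS": "Proper noun, plural",
    "PDT": "Predeterminer",
    "POS": "Possessive ending",
    "PP": "Personal pronoun",
    "PP$": "Possessive pronoun",
    "RB": "Adverb",
    "RBR": "Adverb, comparative",
    "RBS": "Adverb, superlative",
    "RP": "Particle",
    "SYM": "Symbol",
    "TO": "to",
    "UH": "Interjection",
    "VB": "Verb, base form",
    "VBD": "Verb, past tense",
    "VBG": "Verb, gerund or present participle",
    "VBN": "Verb, past participle",
    "VBP": "Verb, non-3rd person singular present",
    "VBZ": "Verb, 3rd person singular present",
    "WDT": "Wh-determiner",
    "WP": "Wh-pronoun",
    "WP$": "Possessive wh-pronoun",
    "WRB": "Wh-adverb",
}

def gloss(tag: str) -> str:
    """Return the gloss for a tag, or 'Unknown tag' if it is not listed."""
    return TAG_GLOSS.get(tag, "Unknown tag")

def is_verb(tag: str) -> bool:
    """True for the six verb tags (VB, VBD, VBG, VBN, VBP, VBZ)."""
    return tag in TAG_GLOSS and tag.startswith("VB")
```

A check such as `is_verb` is one way a translator could decide whether the verb sondhi rules are relevant to a token; whether Anubadok does this is not stated on this page.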