Difference between revisions of "Bengali and English/Anubadok"

From Apertium

Revision as of 08:17, 11 April 2009

Anubadok is an open-source English-to-Bengali machine translation (MT) system developed by G M Hossain. It is currently at an experimental stage.

The program is licensed under the GPL. It is accessible from here.

Inflection Rules (BnSondhi.pm)

Legend

  • C Consonant
  • V Vowel
  • _ Any Letter
  • K Kar (Short form of Vowel)
  • G General Rule - The consonants and vowels in the example are exchangeable with any other consonants and vowels
  • S Special Rule - The consonants and vowels in the example are NOT exchangeable with any other consonants and vowels
  • H Hasanth - The joiner, e.g. ঙ+্+গ = ঙ্গ, as in মঙ্গল

__CC means that the word ends in two consonants, and CV__ means that the word begins with a consonant followed by a vowel.
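The legend can be made concrete with a small sketch that maps each character of a Bengali word to its legend symbol. This is not code from BnSondhi.pm; the names `classify`, `shape`, `ends_with` and `starts_with` are hypothetical, the consonant range is approximate, and `?` (a symbol not in the legend) marks anything the sketch does not recognise.

```python
# Illustrative sketch only; not taken from Anubadok's BnSondhi.pm.
HASANTA = "\u09cd"  # the joiner (legend symbol H)
# Dependent vowel signs, "kars" (legend symbol K): aa, i, ii, u, uu, ri, e, ai, o, au
KARS = set("\u09be\u09bf\u09c0\u09c1\u09c2\u09c3\u09c7\u09c8\u09cb\u09cc")
# Independent vowels (legend symbol V)
VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994")
# Consonants encoded outside the main U+0995..U+09B9 range
EXTRA_CONSONANTS = set("\u09ce\u09dc\u09dd\u09df")


def classify(ch: str) -> str:
    """Return the legend symbol (C, V, K or H) for one character, or '?'."""
    if ch == HASANTA:
        return "H"
    if ch in KARS:
        return "K"
    if ch in VOWELS:
        return "V"
    # Approximation: treats every code point in this range as a consonant.
    if "\u0995" <= ch <= "\u09b9" or ch in EXTRA_CONSONANTS:
        return "C"
    return "?"


def shape(word: str) -> str:
    """Spell out a word as a string of legend symbols."""
    return "".join(classify(ch) for ch in word)


def ends_with(word: str, pattern: str) -> bool:
    """Check a trailing pattern such as __CC (pass 'CC')."""
    return shape(word).endswith(pattern)


def starts_with(word: str, pattern: str) -> bool:
    """Check a leading pattern such as CV__ (pass 'CV')."""
    return shape(word).startswith(pattern)
```

For মঙ্গল, written as ম, ঙ, hasanta, গ, ল, `shape` returns `CCHCC`. The sketch assumes that two-part vowel signs such as ো are stored as single code points.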

Rules

Verb Rules

(Applies if the length of the first word is >= 3; rules are not exclusive, i.e. one word can qualify for multiple rules)

  • G (C+ো+C)+(তে) = (C+ু+C+তে) e.g. খোল + তে = খুলতে
  • G (া+C+া) + (তে) = (া+C+া+তে) e.g. পাঠা + তে = পাঠাতে
  • G (C+ি)/(C+ে) + (ে+র)/(া+র) = (C+ে+ও+য়+া+র) e.g. নি + ের = নেওয়ার, নে + ার = নেওয়ার
  • G (C+া) + (তে) = (C+ে+তে) e.g. পা + তে = পেতে
  • G (C+ে) + (তে) = (C+ি+তে) e.g. দে + তে = দিতে
  • G (C+C) + (ে+র) = (C+C+া+র) e.g. কর + ের = করার
  • G (__K1) + (K2__) = (__K1__) [needs review]
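The তে rules above can be sketched in a few lines of Python. This is an illustration rather than a port of BnSondhi.pm: the function name `join_te` is hypothetical, each pattern is assumed to match the whole stem, the length condition stated above is not enforced (the listed examples পা and দে are shorter than three characters), and only the first matching rule is applied.

```python
# Illustrative sketch only; not taken from Anubadok's BnSondhi.pm.
TE = "\u09a4\u09c7"  # the suffix "te"
AA_KAR = "\u09be"
I_KAR = "\u09bf"
U_KAR = "\u09c1"
E_KAR = "\u09c7"
O_KAR = "\u09cb"  # assumes the o-kar is stored as one code point


def is_consonant(ch: str) -> bool:
    """Approximate consonant test for a single Bengali character."""
    return "\u0995" <= ch <= "\u09b9" or ch in "\u09ce\u09dc\u09dd\u09df"


def join_te(stem: str) -> str:
    """Join a verb stem with the suffix 'te', changing the stem vowel if needed."""
    if (len(stem) == 3 and is_consonant(stem[0])
            and stem[1] == O_KAR and is_consonant(stem[2])):
        # (C + o-kar + C) + te = (C + u-kar + C + te)
        stem = stem[0] + U_KAR + stem[2]
    elif len(stem) == 2 and is_consonant(stem[0]) and stem[1] == AA_KAR:
        # (C + aa-kar) + te = (C + e-kar + te)
        stem = stem[0] + E_KAR
    elif len(stem) == 2 and is_consonant(stem[0]) and stem[1] == E_KAR:
        # (C + e-kar) + te = (C + i-kar + te)
        stem = stem[0] + I_KAR
    # Any other stem, including (aa-kar + C + aa-kar), is joined unchanged.
    return stem + TE
```

With these assumptions, `join_te` reproduces the four তে examples listed above: খোল gives খুলতে, পা gives পেতে, দে gives দিতে, and পাঠা gives পাঠাতে.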
Preposition Rules

(Applies if the length of the first word is >= 2; rules are not exclusive, i.e. one word can qualify for multiple rules)

  • G (ল+ে+C) -> (ল+ি+C) e.g. লেখ -> লিখ
  • G (এ+ই) + (টি) = (এটি) e.g. এই + টি = এটি
  • S (_+ক+ে) + (ে+র) = (_+র) e.g. আমাকে + ের = আমার
  • G (C+C) + (ৈ+র) = (C+C+া+র) e.g. কর + এর = করার
  • G (C/V/H/K + C) + (তে) = (C/V/H/K + C + ে) [needs review]
  • S (ং) + (ে+র) = (ং+এ+র) e.g. সং + ের = সংএর
  • G (__K1) + (K2__) = (__K1__) [needs review]
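As a rough illustration of how the ের rules above might interact, the following sketch handles the two special rules and the kar-merging rule. It is not Anubadok's code: the name `add_er` is hypothetical, the order in which the rules are tried is an assumption, and the kar-merging rule is itself marked as needing review.

```python
# Illustrative sketch only; not taken from Anubadok's BnSondhi.pm.
KARS = set("\u09be\u09bf\u09c0\u09c1\u09c2\u09c3\u09c7\u09c8\u09cb\u09cc")
ER = "\u09c7\u09b0"        # the suffix written with e-kar: "er"
KE = "\u0995\u09c7"        # the ending "ke"
ANUSVARA = "\u0982"        # the sign "ng"
RA = "\u09b0"              # the consonant "r"
E_VOWEL = "\u098f"         # independent vowel "e"


def add_er(word: str) -> str:
    """Attach the suffix 'er' to a word (assumed rule order, see lead-in)."""
    if len(word) > 2 and word.endswith(KE):
        # S rule: (_ + ke) + er = (_ + r)
        return word[:-2] + RA
    if word.endswith(ANUSVARA):
        # S rule: the suffix takes the independent vowel after anusvara
        return word + E_VOWEL + RA
    if word and word[-1] in KARS:
        # G rule (__K1) + (K2__) = (__K1__): drop the suffix's leading kar
        return word + ER[1:]
    # No rule matched: plain concatenation
    return word + ER
```

Under these assumptions, `add_er` turns আমাকে into আমার and সং into সংএর, matching the two special-rule examples.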
Main Rules

(Applies if the length of the first word is >= 2; rules are not exclusive, i.e. one word can qualify for multiple rules)

  • G (K+C) + (তে) = (K+C+ে) e.g. মার + তে = মারে [needs review]
  • G (__K1) + (K2__) = (__K1__) [needs review]

Sondhi for progressive tag POS

  • G (K) + (_) = (K)

Basic Verb Sondhi

  • G (_+ি) + (_) = (_+ে+ও+য়+_)

Verb Sondhi for passive sentence

  • G (C/K + C) + (_) = (C/K+C+_), else
  • G (_+ি) + (_) = (_+ে+ও+য়+_)

Verb Sondhi for active sentence

(Applies if the length of the first word is >= 1; rules are not exclusive, i.e. one word can qualify for multiple rules)

  • G (_ে) + (ো) = (_া+ও) e.g. নে + ো = নাও
  • G (C+ে) + (_) = (C+ি+_) e.g. দে -> দি
  • G (ল+ে+C) -> (ল+ি+C) e.g. লেখ -> লিখ [why does Anubadok repeat this rule?]
  • 3rd person, present simple
  • 2nd person, present simple
  • 1st/2nd/3rd person, future simple
  • 1st/2nd/3rd person, past simple
  • 1st person, present simple
  • 1st/2nd/3rd person, present/past continuous
  • 1st/2nd/3rd person, present/past, perfect
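The three active-sentence rules can be sketched as a single function. This is only an illustration: the name `active_sondhi` and the priority given to the ো rule are assumptions, and the sketch does not model the person and tense cases listed above.

```python
# Illustrative sketch only; not taken from Anubadok's BnSondhi.pm.
AA_KAR = "\u09be"
I_KAR = "\u09bf"
E_KAR = "\u09c7"
O_KAR = "\u09cb"      # assumes the o-kar is stored as one code point
O_VOWEL = "\u0993"    # independent vowel "o"
LA = "\u09b2"         # the consonant "l"


def is_consonant(ch: str) -> bool:
    """Approximate consonant test for a single Bengali character."""
    return "\u0995" <= ch <= "\u09b9" or ch in "\u09ce\u09dc\u09dd\u09df"


def active_sondhi(stem: str, suffix: str) -> str:
    """Join a verb stem and a suffix using the active-sentence rules."""
    if stem.endswith(E_KAR) and suffix == O_KAR:
        # (_ + e-kar) + (o-kar) = (_ + aa-kar + o)
        return stem[:-1] + AA_KAR + O_VOWEL
    if len(stem) == 2 and is_consonant(stem[0]) and stem[1] == E_KAR:
        # (C + e-kar) + (_) = (C + i-kar + _)
        stem = stem[0] + I_KAR
    elif (len(stem) >= 3 and stem[0] == LA
            and stem[1] == E_KAR and is_consonant(stem[2])):
        # (l + e-kar + C) -> (l + i-kar + C)
        stem = stem[0] + I_KAR + stem[2:]
    return stem + suffix
```

With these assumptions, joining নে with ো gives নাও, while an empty suffix shows the bare stem changes: দে becomes দি and লেখ becomes লিখ.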

Tag Set

Anubadok uses the Penn Treebank tag set, which is as follows:

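{|class="wikitable"
!Tag !! Gloss !! Example
|-
|<code>CC</code> || Coordinating conjunction ||
|-
|<code>CD</code> || Cardinal number ||
|-
|<code>DT</code> || Determiner ||
|-
|<code>EX</code> || Existential there ||
|-
|<code>FW</code> || Foreign word ||
|-
|<code>IN</code> || Preposition or subordinating conjunction ||
|-
|<code>JJ</code> || Adjective ||
|-
|<code>JJR</code> || Adjective, comparative ||
|-
|<code>JJS</code> || Adjective, superlative ||
|-
|<code>LS</code> || List item marker ||
|-
|<code>MD</code> || Modal ||
|-
|<code>NN</code> || Noun, singular or mass ||
|-
|<code>NNS</code> || Noun, plural ||
|-
|<code>NP</code> || Proper noun, singular ||
|-
|<code>NPS</code> || Proper noun, plural ||
|-
|<code>PDT</code> || Predeterminer ||
|-
|<code>POS</code> || Possessive ending ||
|-
|<code>PP</code> || Personal pronoun ||
|-
|<code>PP$</code> || Possessive pronoun ||
|-
|<code>RB</code> || Adverb ||
|-
|<code>RBR</code> || Adverb, comparative ||
|-
|<code>RBS</code> || Adverb, superlative ||
|-
|<code>RP</code> || Particle ||
|-
|<code>SYM</code> || Symbol ||
|-
|<code>TO</code> || to ||
|-
|<code>UH</code> || Interjection ||
|-
|<code>VB</code> || Verb, base form ||
|-
|<code>VBD</code> || Verb, past tense ||
|-
|<code>VBG</code> || Verb, gerund or present participle ||
|-
|<code>VBN</code> || Verb, past participle ||
|-
|<code>VBP</code> || Verb, non-3rd person singular present ||
|-
|<code>VBZ</code> || Verb, 3rd person singular present ||
|-
|<code>WDT</code> || Wh-determiner ||
|-
|<code>WP</code> || Wh-pronoun ||
|-
|<code>WP$</code> || Possessive wh-pronoun ||
|-
|<code>WRB</code> || Wh-adverb ||
|}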