Difference between revisions of "Bengali and English/BugsAndIssues"

(Made a few suggestions for indicating increased emphasis in English equivalents; these should be checked by a person fluent in Bengali and English)
 
Latest revision as of 00:11, 23 December 2011

Nouns

Only 800 tagged pure nouns from the Anubadok dictionary matched CRBLP's list of the 20K most frequently used words

  • Need to tag more manually (the en-es package has approx. 5K; we need to reach that)
  • Anubadok has about 2000 Nouns in its own list
  • Anubadok has about 2300 Proper Nouns in its own list
  1. Some nouns are always plural or always singular; need to tag those
  2. We are excluding proper nouns for now
  3. For now, we are excluding adjectives that can be used as nouns
  4. We are keeping track of the plural form generation through animacy; this is good, but in the long run we need to come up with something more sophisticated
  5. Some nouns can have hybrid animacy; need to tag those later
  6. Should we tag the subtype of Noun?

Number

  1. মা - মারা , জনক - জনকরা - These generated forms are wrong; we need to add a rule to fix them. Either mark them as irregular and enter them in a separate table, or find an adequate rule for them. The correct forms are মা - মায়েরা, জনক - জনকেরা
  2. There is still some confusion about how to treat definite articles. The indefinite article a/an is translated as একটা/একটি. For the definite article the, number needs to be taken into account. For singular number, we add টা/টি, e.g. বই - বইটা, মানুষ - মানুষটা. But this is only used if the noun has low animacy. We can safely say বইটা (the book), বিড়ালটা (the cat), মানুষটা (the man), পাগলটা (the mad man). But we cannot say রাষ্ট্রপতিটা (gloss: the president); apparently, the affix is dropped as animacy gets higher, so রাষ্ট্রপতি can mean both 'president' and 'the president'. For plural number, things are somewhat similar: adding গুলা/গুলি/গুলো at the end of a noun makes it plural and also carries an implicit 'the'. So, বইগুলো - the books, বিড়ালগুলো - the cats, মানুষগুলো - the men. But we cannot say সন্যাসীগুলো (gloss: the saints). For higher-animacy plurals, রা or গণ is generally used, but these affixes express indefiniteness. For example, সন্যাসীরা/সন্যাসীগণ means 'saints', NOT 'the saints'. This issue needs to be resolved.
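
The two items above can be sketched roughly as follows. This is only an illustration of the behaviour described, not the rules actually used in the package: the `LOW`/`HIGH` animacy labels, the function names and the fallback suffix are assumptions, and the irregular table follows the "separate table" option mentioned in item 1.

```python
# Hypothetical animacy labels; the page does not define a tag set.
LOW, HIGH = "low", "high"

# Irregular plurals kept in a separate table (one of the options in item 1).
IRREGULAR_PLURALS = {"মা": "মায়েরা", "জনক": "জনকেরা"}


def definite_singular(noun, animacy):
    """Low-animacy nouns take টা; high-animacy nouns stay bare (ambiguous)."""
    return noun + "টা" if animacy == LOW else noun


def plural(noun, animacy):
    """Return (plural form, carries an implicit 'the')."""
    if animacy == LOW:
        return noun + "গুলো", True
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun], False
    # Naive fallback; this is the rule that wrongly produced মারা and জনকরা.
    return noun + "রা", False
```

The high-animacy branch always reports an indefinite reading, which is exactly the unresolved issue noted in item 2: there is no form here that yields 'the saints'.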

Pronouns

Adjective

  • Adjectives can have genitive forms, e.g. অল্পের জন্য বেঁচে গেছি। But this happens only when the adjective is used as a noun, so we need to add these adjectives as nouns too

Verb

  • The gerund form of a verb can be used as a noun, so we need to add these gerunds to the noun table and mark them as inanimate.
  • Some verbs have alternate spellings that are equally acceptable, e.g. দেই - দিই for the verb দি - দে. Since both forms are apparently acceptable, the analyzer will need to handle both. Some more examples are উলটা - ওলটা, ঝুলা - ঝোলা, গুছা - গোছা. For now, we will focus on only one of the forms
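
One possible way to accept both spellings during analysis while storing only one is a small variant table. The sketch below is an assumption, not how the dictionary currently works: `VARIANT_TO_CANONICAL` and `canonical_form` are hypothetical names, and the choice of which spelling counts as canonical is arbitrary.

```python
# Hypothetical variant table built from the pairs listed above.
# Which member of each pair is canonical is an arbitrary choice here.
VARIANT_TO_CANONICAL = {
    "দিই": "দেই",
    "ওলটা": "উলটা",
    "ঝোলা": "ঝুলা",
    "গোছা": "গুছা",
}


def canonical_form(form):
    """Map an accepted alternate spelling to the one form the dictionary stores."""
    return VARIANT_TO_CANONICAL.get(form, form)
```

With this, both members of a pair analyze to the same entry, while generation keeps producing only the stored form.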

Adverb

  • We are marking all the adverbs as <adv> and have not marked <cnjadv> properly; this needs to be changed ASAP

Determiner

Enclitic/Proclitic

ও (O)

  • ও (O), when added to past participles, adds the meaning of 'despite' or 'in spite of', e.g. করে - করেও, পড়ে - পড়েও
    • সে পড়ে পাস করতে পারল না - He could not pass by studying. সে পড়েও পাস করতে পারল না - He could not pass, despite studying.
  • ও (O), the same enclitic as above, bears the sense of 'also'/'too' when added to nouns and pronouns
    • বাড়িটা - বাড়িটাও -> সে বাড়িটাও বিক্রি করে দিল - He sold the house too.
    • আমি - আমিও -> সবার সাথে আমিও সেখানে গেলাম - I, too, went there or I, along with others, went there.
  • ও has the same effect on adjectives, adverbs and verbs
    • Verb - সে কাজ করে এবং খায়ও খুব - He works and also eats a lot.
    • Adjective - সে সুন্দরী এবং বুদ্ধিমতিও - She is pretty and intelligent as well.
    • Adverb - তুমি এভাবেও কাজটি করতে পার - You can also do the work in this way.
  • ও, when added after a gerund, has the meaning of 'even' (adverb)
    • সে পড়ারও সময় পেল না - He did not even get the time to read.
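
The senses listed above depend only on the category of the word ও attaches to, so they can be summarised as a lookup. This is a minimal sketch under that assumption; the category names are hypothetical labels, not Apertium tags, and `sense_of_o` is an illustrative name.

```python
# Hypothetical host categories; these are not Apertium tag names.
O_SENSE_BY_HOST = {
    "past_participle": "despite",
    "gerund": "even",
    "noun": "also",
    "pronoun": "also",
    "verb": "also",
    "adjective": "also",
    "adverb": "also",
}


def sense_of_o(host_category):
    """Return the English sense that ও contributes, or None for an unlisted host."""
    return O_SENSE_BY_HOST.get(host_category)
```

For example, the host of পড়েও is a past participle, so the lookup gives 'despite', while the host of আমিও is a pronoun and gives 'also'.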

ই (I)

  • When added after a verb, ই acts as an emphasizer, e.g. করব - করবই: আমি কাজটি করব - I shall do the work; আমি কাজটি করবই - I will do the work / I shall surely do the work. The same holds for the infinitive, e.g. করতে - করতেই: আমি কাজটি করতে গেলাম - I went to do the work; আমি কাজটি করতেই গেলাম - I went only to do the work.
  • Adding ই after a gerund is somewhat cosmetic; nevertheless, it adds emphasis: ওখানে যাওয়াটাই ভুল ছিল - (emph) Going there was a mistake [Can anyone suggest a better translation? :(]
    : How about this? --> Going there was indeed a mistake.
  • ই, added after nouns or pronouns, similarly adds emphasis. রহিমই দোষী - (emph) Rahim is guilty.
    : How about this? --> Rahim is indeed guilty.
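
Because ও and ই attach to the end of an already inflected word, the analyzer has to separate them from the host. The sketch below shows one cautious way to do that, assuming a set of known surface forms is available; `ENCLITICS`, `split_enclitic` and the tag strings are hypothetical, and real analysis would go through the dictionaries rather than a Python set.

```python
# Hypothetical enclitic tags; the page does not say how these are marked.
ENCLITICS = {"ও": "o", "ই": "i"}


def split_enclitic(surface, lexicon):
    """Split a trailing ও/ই only when the remaining stem is a known form.

    Words that merely end in ই or ও (such as বই 'book') are left whole
    because they are found in the lexicon first.
    """
    if surface in lexicon:
        return surface, None
    stem, last = surface[:-1], surface[-1:]
    if last in ENCLITICS and stem in lexicon:
        return stem, ENCLITICS[last]
    return surface, None
```

Checking the whole word first is the design choice that matters here: without it, বই would be analysed as ব plus emphatic ই.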

Misc

  • The word 'কাছ':
    সে আমার কাছে আসল - He came to me / He came near me (Anubadok translates 'He came to me' as সে আমাকেতে আসেছিল, which is wrong ...)
    সে আমার কাছের লোক - He is a close person of mine (The translation is still incorrect, I don't know the exact translation ...)
    : How about this? --> He is a person close to me.
    সে আমার কাছ থেকে বইটা নিল - He took the book from me.
  • Another word 'অল্প':
    সে অল্পে খুশি (রয়েছে) - He is satisfied with less.
    আমি অল্প খাই - I eat less
    আমি অল্পের জন্য কাজটা করতে পারলাম না - I could not do the work for (less, or something) [it's tough doing word-for-word translation :(]