Bengali and English/BugsAndIssues

Nouns

# Only 800 tagged pure nouns from the Anubadok dictionary matched when checked against CRBLP's list of the 20K most frequently used words

  • We need to tag more manually (the en-es package has approx. 5K; we need to reach that number)
  • Anubadok has about 2000 Nouns in its own list
  • Anubadok has about 2300 Proper Nouns in its own list
  1. Some nouns are always plural or always singular; we need to tag those
  2. We are excluding proper nouns for now
  3. We are excluding adjectives that can be used as nouns for now
  4. We currently control plural form generation through animacy. This works, but in the long run we need to come up with something more sophisticated
  5. Some nouns can have hybrid animacy; we need to tag those later
  6. Should we tag the subtype of each noun?

Number

  1. মা - মারা, জনক - জনকরা: these generated forms are wrong. The correct forms are মা - মায়েরা and জনক - জনকেরা. We need to fix this, either by marking such nouns as irregular and listing them in a separate table, or by finding an adequate rule for them
  2. There is still some confusion about how to treat articles. The indefinite articles a and an are translated as একটা/একটি. For the definite article the, number needs to be taken into account. For the singular, we add টা/টি, e.g. বই - বইটা, মানুষ - মানুষটা. But this is only used if the noun has low animacy. We can safely say বইটা (the book), বিড়ালটা (the cat), মানুষটা (the man), পাগলটা (the madman), but we cannot say রাষ্ট্রপতিটা (gloss: the president). Apparently the affix is dropped as animacy gets higher, so রাষ্ট্রপতি can mean both 'president' and 'the president'. For the plural, things are somewhat similar: adding গুলা/গুলি/গুলো at the end of a noun makes it plural and also carries an implicit 'the'. So বইগুলো is 'the books', বিড়ালগুলো is 'the cats', মানুষগুলো is 'the men'. But we cannot say সন্যাসীগুলো (gloss: the saints). For higher-animacy plurals, রা or গণ is generally used, but these affixes express indefiniteness. E.g. সন্যাসীরা/সন্যাসীগণ means 'saints', NOT 'the saints'. This issue needs to be resolved.
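The behaviour described in the two items above can be sketched in Python as follows. This is only an illustration, not existing Apertium or Anubadok code: it assumes a simple two-way animacy flag (`high_animacy`), a hypothetical irregular-plural table (`IRREGULAR_PLURALS`), and an arbitrary choice of টা over টি and গুলো over গুলা/গুলি. All function names are made up for the example.

```python
# Hypothetical table for plurals that plain suffixing gets wrong
# (the two pairs quoted in item 1).
IRREGULAR_PLURALS = {"মা": "মায়েরা", "জনক": "জনকেরা"}


def indefinite_plural(noun: str) -> str:
    """Plural in রা for higher-animacy nouns; the irregular table wins."""
    return IRREGULAR_PLURALS.get(noun, noun + "রা")


def definite_singular(noun: str, high_animacy: bool) -> str:
    """'the X': টা is added only when animacy is low."""
    return noun if high_animacy else noun + "টা"


def definite_plural(noun: str, high_animacy: bool) -> str:
    """'the Xs': গুলো for low animacy. For high animacy only the
    indefinite রা form is available, which is the open issue in item 2."""
    return indefinite_plural(noun) if high_animacy else noun + "গুলো"
```

With these definitions, `definite_singular("বই", False)` gives বইটা, while `definite_singular("রাষ্ট্রপতি", True)` leaves the noun bare. The high-animacy branch of `definite_plural` falls back to the indefinite form, which makes the unresolved definiteness problem visible rather than solving it.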

Pronouns

Adjective

  • Adjectives can have genitive forms, e.g. অল্পের জন্য বেঁচে গেছি। But this happens only when the adjective is used as a noun, so we need to add these adjectives as nouns too

Verb

  • The gerund form of a verb can be used as a noun, so we need to add these gerunds to the noun table and mark them as inanimate.
  • Some verbs have alternate spellings that are equally acceptable, e.g. দেই - দিই for the verb দি - দে. Since both forms are acceptable, the analyser will need to handle both. Some more examples are উলটা - ওলটা, ঝুলা - ঝোলা, গুছা - গোছা। For now we will focus on only one of the forms
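One possible way to accept both spellings during analysis while keeping a single form elsewhere is a variant-to-canonical lookup applied before dictionary lookup. The Python sketch below assumes this approach; the table name `VARIANT_TO_CANONICAL`, the function `canonical_form`, and the choice of which spelling counts as canonical (here simply the first of each pair quoted above) are all assumptions made for illustration.

```python
# Hypothetical mapping from an accepted alternate spelling to the one
# spelling the dictionary stores. Which member of each pair is treated
# as canonical is an arbitrary choice for this sketch.
VARIANT_TO_CANONICAL = {
    "দিই": "দেই",
    "ওলটা": "উলটা",
    "ঝোলা": "ঝুলা",
    "গোছা": "গুছা",
}


def canonical_form(form: str) -> str:
    """Return the stored spelling for a form; unknown forms pass through."""
    return VARIANT_TO_CANONICAL.get(form, form)
```

Applying `canonical_form` to each input token before analysis would let the analyser recognise either spelling, while generation only ever produces the canonical one.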

Adverb

  • We are marking all adverbs as <adv> and have not marked <cnjadv> properly; this needs to be changed ASAP

Determiner

Enclitic/Proclitic

ও (O)

  • ও (O): e.g. করে - করেও, পড়ে - পড়েও. When added to past participles, it adds the meaning 'in spite of'
  • সে পড়ে পাস করতে পারল না - He could not pass by studying. সে পড়েও পাস করতে পারল না - He could not pass in spite of studying.
  • ও (O): the same enclitic as above, when added to nouns and pronouns, bears the sense of 'also'/'too'
  • বাড়িটা - বাড়িটাও -> সে বাড়িটাও বিক্রি করে দিল - He sold the house too.
  • আমি - আমিও -> সবার সাথে আমিও সেখানে গেলাম - I too, along with others, went there (Spectie, is the English translation grammatically correct?)
  • ও has the same effect on adjectives, adverbs and verbs
  • Verb - সে কাজ করে এবং খায়ও খুব - He works and also eats a lot.
  • Adjective - সে সুন্দরী এবং বুদ্ধিমতিও - She is pretty and also intelligent.
  • Adverb - তুমি এভাবেও কাজটি করতে পার - You can also do the work in this way.
  • When added after a gerund, ও has the meaning of 'even' (adverb)
  • সে পড়ারও সময় পেল না - He did not even get the time to read.
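The senses listed above depend only on the category of the word that ও attaches to, so they can be summarised as a lookup table. The Python sketch below is an illustration under that assumption; the category labels and the names `O_SENSE_BY_HOST` and `o_sense` are hypothetical and are not Apertium tag names.

```python
# Sense contributed by the enclitic ও, keyed by the category of its host.
# The category labels are hypothetical, chosen only for this sketch.
O_SENSE_BY_HOST = {
    "past_participle": "in spite of",
    "noun": "also/too",
    "pronoun": "also/too",
    "verb": "also/too",
    "adjective": "also/too",
    "adverb": "also/too",
    "gerund": "even",
}


def o_sense(host_category: str) -> str:
    """Look up the sense of ও for a host category (KeyError if unknown)."""
    return O_SENSE_BY_HOST[host_category]
```

For example, `o_sense("past_participle")` returns `"in spite of"` (as in পড়েও) and `o_sense("gerund")` returns `"even"` (as in পড়ারও).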

ই (I)

  • When added after a verb, ই acts as an emphasizer, e.g. করব - করবই. আমি কাজটি করব - I shall do the work; আমি কাজটি করবই - I will do the work / I shall surely do the work. The same holds for the infinitive, করতে - করতেই, e.g. আমি কাজটি করতে গেলাম - I went to do the work; আমি কাজটি করতেই গেলাম - I went only to do the work.
  • Adding ই after a gerund is somewhat cosmetic; nevertheless it adds emphasis. ওখানে যাওয়াটাই ভুল ছিল - (emph) Going there was a mistake [Can anyone suggest a better translation? :(]
  • ই, added after nouns or pronouns, similarly adds emphasis. রহিমই দোষী - (emph) Rahim is guilty.
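For analysis, the enclitics ও and ই have to be separated from their host, but a word may also simply end in one of these letters (বই, for instance, ends in ই without containing the enclitic). The Python sketch below assumes a lexicon check is used to tell the two cases apart; `split_enclitic`, `ENCLITICS` and the tiny lexicon in the usage note are hypothetical, and stacked suffixes such as টা + ই are not handled.

```python
ENCLITICS = ("ও", "ই")


def split_enclitic(surface: str, lexicon: set) -> tuple:
    """Return (host, enclitic); enclitic is '' when nothing is split off.

    A trailing ও or ই is treated as an enclitic only if the surface form
    itself is not in the lexicon and the remainder is a known form.
    """
    if surface in lexicon:
        return surface, ""
    for clitic in ENCLITICS:
        if surface.endswith(clitic):
            host = surface[: -len(clitic)]
            if host in lexicon:
                return host, clitic
    return surface, ""
```

With a lexicon containing বই, করব and আমি, the form করবই is split into করব + ই and আমিও into আমি + ও, while বই is returned unchanged because it is itself a lexicon entry.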

Misc

  • The word 'কাছ':
সে আমার কাছে আসল - He came to me / He came near me (Anubadok translates 'He came to me' as সে আমাকেতে আসেছিল, which is wrong ...)
সে আমার কাছের লোক - He is a close person of mine (The translation is still incorrect, I don't know the exact translation ...)
সে আমার কাছ থেকে বইটা নিল - He took the book from me.
  • Another word 'অল্প':
সে অল্পে খুশি (রয়েছে) - He is satisfied with less.
আমি অল্প খাই - I eat less
আমি অল্পের জন্য কাজটা করতে পারলাম না - I could not do the work for (less, or something) [it's tough doing word-for-word translation :(]