Difference between revisions of "Bengali and English/BugsAndIssues"

From Apertium
Line 8: Line 8:
:* Anubadok has about 2300 Proper Nouns in its own list
:* Anubadok has about 2300 Proper Nouns in its own list
# Some nouns are always pl or sg, need to tag those
# Some nouns are always pl or sg, need to tag those
<s># We are excluding Proper nouns now</s>
# <s>We are excluding Proper nouns now</s>
# We are excluding adjectives that can be used as nouns, right now
# We are excluding adjectives that can be used as nouns, right now
# We are keeping track the plural form generation through animacy, this is good, but in the long run need to come up with something more sophisticated
# We are keeping track the plural form generation through animacy, this is good, but in the long run need to come up with something more sophisticated

Revision as of 00:15, 2 July 2009

Nouns

  1. Only 800 tagged pure nouns from the Anubadok dictionary matched CRBLP's list of the 20K most frequently used words
  • Need to tag more manually (the en-es package has approx. 5K; we need to reach that)
  • Anubadok has about 2000 Nouns in its own list
  • Anubadok has about 2300 Proper Nouns in its own list
  2. Some nouns are always plural or always singular; we need to tag those
  3. We are excluding proper nouns now
  4. We are excluding adjectives that can be used as nouns, right now
  5. We are keeping track of plural form generation through animacy; this is good, but in the long run we need to come up with something more sophisticated
  6. Some nouns can have hybrid animacy; we need to tag those later
  7. Should we tag the subtype of noun?
  8. মা (mother) - মারা, জনক (father) - জনকরা: these generated plurals are wrong. We need to add a rule to fix that: either mark such nouns as irregular and enter them in a separate table, or find an adequate rule for them
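
The "separate table" option for irregular plurals could look roughly like the following. This is only a minimal sketch in Python, not how the Apertium dictionaries encode it: the names `IRREGULAR_PLURALS`, `ANIMATE_SUFFIX`, `INANIMATE_SUFFIX` and `pluralize` are hypothetical, the inanimate suffix is an assumption, and the plural forms stored in the table are assumed for illustration and should be checked by a native speaker.

```python
# Hypothetical table of irregular plurals, consulted before any rule.
# The plural forms below are ASSUMED for illustration only and need
# to be verified by a native speaker before being used anywhere.
IRREGULAR_PLURALS = {
    "মা": "মায়েরা",
    "জনক": "জনকেরা",
}

# Plain suffixation that yields the wrong outputs quoted above
# (মা + রা = মারা, জনক + রা = জনকরা).
ANIMATE_SUFFIX = "রা"
# Assumed suffix for inanimate nouns; not taken from the page.
INANIMATE_SUFFIX = "গুলো"


def pluralize(noun: str, animate: bool) -> str:
    """Return a plural form, preferring the irregular table over the rule."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    # Fall back to the animacy-based default rule.
    return noun + (ANIMATE_SUFFIX if animate else INANIMATE_SUFFIX)
```

With this arrangement `pluralize("মা", True)` returns the table entry instead of মারা, while nouns that are not in the table still go through the animacy-based rule. The trade-off is that every exception has to be listed by hand, which is why finding a general rule remains the alternative.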

Pronouns

Adjective

Adjectives can have genitive forms, e.g. অল্পের জন্য বেঁচে গেছি। ("I narrowly survived.") But this happens only when the adjective is used as a noun, so we need to add these adjectives as nouns too.

Verb

Adverb

Determiner