Difference between revisions of "Bengali and English/BugsAndIssues"
Darthxaher (talk | contribs) (→Nouns)
Revision as of 13:57, 25 June 2009
Nouns
- Only 800 tagged pure nouns from the Anubadok dictionary matched against CRBLP's list of the 20K most frequently used words
- Need to tag more manually (the en-es package has approx. 5K; we need to reach that)
- Anubadok has about 2000 Nouns in its own list
- Anubadok has about 2300 Proper Nouns in its own list
- Some nouns are always plural or always singular; need to tag those
- We are excluding proper nouns for now
- We are excluding adjectives that can be used as nouns, for now
- We are keeping track of plural form generation through animacy; this is good, but in the long run we need to come up with something more sophisticated
- Some nouns can have hybrid animacy, need to tag those later
- Should we tag the subtype of Noun?
- মা - মারা, জনক - জনকরা: these generated plurals are wrong. Need to add a rule to fix that: either mark them as irregular and enter them in a separate table, or find an adequate rule for them
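
The "separate table for irregulars" option from the last item could look roughly like the Python sketch below. It is only an illustration, not the actual dictionary format: the function name `pluralize`, the two suffix constants, and the contents of `IRREGULAR_PLURALS` are all assumptions, and the irregular forms listed should be verified by a native speaker before being entered anywhere.

```python
# Assumed default suffixes for the animacy-based rule (hypothetical names).
ANIMATE_SUFFIX = "রা"
INANIMATE_SUFFIX = "গুলো"

# Hypothetical irregular table: nouns for which the default animate rule
# produces a wrong form. The plural forms here are illustrative assumptions.
IRREGULAR_PLURALS = {
    "মা": "মায়েরা",
    "জনক": "জনকেরা",
}


def pluralize(noun: str, animate: bool) -> str:
    """Return a plural form, checking the irregular table first."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    # Fall back to the simple animacy-based suffix rule.
    suffix = ANIMATE_SUFFIX if animate else INANIMATE_SUFFIX
    return noun + suffix


if __name__ == "__main__":
    for noun, animate in [("মা", True), ("জনক", True), ("বই", False)]:
        print(noun, "->", pluralize(noun, animate))
```

Looking up the table before applying the suffix rule keeps the animacy rule unchanged for regular nouns; if an adequate general rule is found later, the table entries can simply be removed.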