Bengali and English/Issues
Revision as of 10:05, 26 June 2011
Morphological Analyzer
- Problem analyzing words with the enclitic 'টি': the analyzer turns "বিষয়" into "^বিষয়/বিষয়<n><nt><nn><sg><nom>/বিষয়<n><nt><nn><sg><obj>$" but cannot analyze "বিষয়টি". Similarly, it analyzes "বস্তু" but not "বস্তুটি".
- Some words have entries in bn.dix, yet "lt-proc -a bn-en.automorf.bin" does not analyze them. For example, "সময়" and "জাতীয়" have these entries:
  <e lm="সময়"><i>সময়</i><par n="গড়__n_mf" /></e>
  <e lm="জাতীয়"><i>জাতীয়</i><par n="টক__adj" /></e>
Yet both words still come out as unknown (marked with *):
  echo "সময়" | lt-proc -a bn-en.automorf.bin
  ^সময়/*সময়$
  echo "জাতীয়" | lt-proc -a bn-en.automorf.bin
  ^জাতীয়/*জাতীয়$
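One cause worth ruling out for the unanalyzed entries (a guess, not confirmed on this page): both words contain য়, which Unicode allows to be written either as the single code point U+09DF or as U+09AF followed by the nukta U+09BC. If bn.dix and the input text use different forms, the transducer sees different strings and cannot match them. The Python sketch below is a hypothetical diagnostic: unknown_words pulls out the forms that lt-proc marked with *, and yya_encoding / normalize_yya report and unify the YYA encoding. None of these helpers are part of lttoolbox.

```python
import re
import unicodedata

# Bengali YYA ("য়") has two encodings that render identically.
PRECOMPOSED_YYA = "\u09DF"       # BENGALI LETTER YYA
DECOMPOSED_YYA = "\u09AF\u09BC"  # BENGALI LETTER YA + BENGALI SIGN NUKTA


def unknown_words(stream: str) -> list[str]:
    """Return surface forms that lt-proc marked as unknown ("^form/*form$")."""
    unknown = []
    for unit in re.findall(r"\^(.*?)\$", stream):
        surface, _, analyses = unit.partition("/")
        if analyses.startswith("*"):
            unknown.append(surface)
    return unknown


def yya_encoding(text: str) -> str:
    """Report which encoding(s) of YYA a string uses."""
    pre = PRECOMPOSED_YYA in text
    dec = DECOMPOSED_YYA in text
    if pre and dec:
        return "mixed"
    if pre:
        return "precomposed"
    if dec:
        return "decomposed"
    return "none"


def normalize_yya(text: str) -> str:
    """Rewrite every YYA in decomposed form.

    U+09DF is a Unicode composition exclusion, so NFC (like NFD)
    maps it to U+09AF U+09BC.
    """
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    # A stream in the format lt-proc -a prints for an unknown word.
    sample = "^\u09B8\u09AE\u09DF/*\u09B8\u09AE\u09DF$"
    for word in unknown_words(sample):
        print(word, yya_encoding(word), yya_encoding(normalize_yya(word)))
```

If the lemma in bn.dix and the word in the input report different encodings, normalizing both the dictionary and the input to one form (for example with normalize_yya) before compiling and analyzing should make them comparable.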
Tagset
- Confusion about the animacy tag 'elite' (<el>): what is its exact definition? Are "ক্রেতা", "বিদ্রোহী" and "সহকারী" correct examples of <el>? And are "মেয়র", "ম্যাজিস্ট্রেট" and "উপাচার্য" definitely not <el>?
- Unmatched paradigm: "মামা" and "চাচা" should be <m><hu> with the pardef "ভাই__n_m" or "লোক__n_m", but neither of the two provides the inflections needed for "মামার" or "চাচার".
- What exactly is the difference between <mf><nn> and <nt><nn>? Which properties are exclusive to each?
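If, as in standard Bengali, an -আ-final noun forms its genitive with a bare -র (মামা → মামার, চাচা → চাচার) while a consonant-final noun takes -এর (লোক → লোকের), then "মামা" and "চাচা" probably need a paradigm of their own rather than an existing one. The Python sketch below builds such a pardef with xml.etree.ElementTree. The pardef name "মামা__n_m", the <gen> tag and the tag order are hypothetical, modelled on the analyses shown on this page, and only nominative and genitive singular are covered.

```python
import xml.etree.ElementTree as ET

# Hypothetical suffix table for -আ-final human masculine nouns such as
# "মামা" and "চাচা". <gen> and the tag order are assumptions; other cases
# (objective, plural, ...) are omitted.
A_FINAL_HU_M = [
    ("", ["n", "m", "hu", "sg", "nom"]),   # মামা
    ("র", ["n", "m", "hu", "sg", "gen"]),  # মামার: genitive -র after a vowel
]


def make_pardef(name: str, suffixes: list[tuple[str, list[str]]]) -> ET.Element:
    """Build an lttoolbox-style <pardef> from (surface suffix, tags) pairs."""
    pardef = ET.Element("pardef", n=name)
    for suffix, tags in suffixes:
        pair = ET.SubElement(ET.SubElement(pardef, "e"), "p")
        left = ET.SubElement(pair, "l")
        left.text = suffix or None  # empty <l /> for the bare stem
        right = ET.SubElement(pair, "r")  # lemma side: tags only, no suffix
        for tag in tags:
            ET.SubElement(right, "s", n=tag)
    return pardef


def generate(stem: str, suffixes: list[tuple[str, list[str]]]) -> list[str]:
    """List the surface forms a stem receives from the suffix table."""
    return [stem + suffix for suffix, _ in suffixes]


if __name__ == "__main__":
    pardef = make_pardef("মামা__n_m", A_FINAL_HU_M)  # hypothetical name
    ET.indent(pardef)  # requires Python 3.9+
    print(ET.tostring(pardef, encoding="unicode"))
    print(generate("চাচা", A_FINAL_HU_M))
```

The printed pardef can be compared by hand with "ভাই__n_m" and "লোক__n_m" in bn.dix; an entry such as <e lm="মামা"><i>মামা</i><par n="মামা__n_m" /></e> would then point at it.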