Difference between revisions of "Bengali and English/Updating Bilingual Dictionary"

Revision as of 12:11, 25 August 2009

We are going to add more adjective entries to the bn-en bilingual dictionary. Assuming that we are in the apertium-bn-en folder (check it out from SVN), try this:

lt-expand apertium-bn-en.bn.dix| grep '<adj>' | sed 's/:>:/:/g' | sed 's/:<:/:/g' | cut -f2 -d':' | tee /tmp/foo1 | sed 's/^/^/g' | sed 's/$/$/g' | \
sed 's/$/^.<sent>$/g'  | apertium-pretransfer  | apertium-transfer apertium-bn-en.bn-en.t1x bn-en.t1x.bin bn-en.autobil.bin | \
apertium-interchunk apertium-bn-en.bn-en.t2x bn-en.t2x.bin | apertium-postchunk apertium-bn-en.bn-en.t3x bn-en.t3x.bin  | tee /tmp/foo2 | \
lt-proc -g bn-en.autogen.bin > /tmp/foo3 && paste /tmp/foo1 /tmp/foo2 /tmp/foo3 | egrep -v '\+' | egrep -v '@' | cut -f1 | \
perl -pe 's/<comp>|<sup>//g' | python dev/uniq.py

We used grep '<adj>' to select the adjectives, and perl -pe 's/<comp>|<sup>//g' to remove the comparative and superlative inflection tags from every adjective entry. We then used uniq.py rather than the shell's 'uniq' to keep only the unique entries, because 'uniq' is not fully Unicode compliant.
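The last two steps of the pipeline can be sketched in Python as follows. This is only an illustration, not the actual dev/uniq.py: it assumes the script drops repeated lines while keeping the first occurrence of each, and the function names and the sample entries (including where the inflection tags sit) are made up for the example.

```python
import re

# Same pattern as the perl step: the comparative and superlative tags.
INFLECTION_TAGS = re.compile(r"<comp>|<sup>")


def strip_inflection_tags(entry):
    """Remove <comp> and <sup> from one expanded entry."""
    return INFLECTION_TAGS.sub("", entry)


def unique_entries(lines):
    """Strip inflection tags, then keep the first occurrence of each entry.

    Python 3 strings are Unicode, so entries are compared code point by
    code point (no normalization is applied).
    """
    seen = set()
    result = []
    for line in lines:
        entry = strip_inflection_tags(line.strip())
        if entry and entry not in seen:
            seen.add(entry)
            result.append(entry)
    return result


# Hypothetical sample input; the tag order is an assumption for illustration.
sample = [
    "ভিন্ন<adj><mf>",
    "ভিন্ন<adj><mf><comp>",
    "শিক্ষিত<adj><mf>",
    "ভিন্ন<adj><mf><sup>",
]

for entry in unique_entries(sample):
    print(entry)
```

Unlike the shell's 'uniq', this sketch does not need sorted input, because it remembers every entry it has already seen.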

Assume that this output is saved in the file dev/bdix/adjective.list. Let's take a first look at the file.

চিহ্নিত<adj><mf>
মঞ্চস্থ<adj><mf>
ভিন্ন<adj><mf>
শিক্ষিত<adj><mf>
শাসক<adj><mf>
অন্তর্ভুক্ত<adj><mf>

Now we are going to add the corresponding English entries to this file. After adding them, the file looks like this:

চিহ্নিত<adj><mf>    marked
মঞ্চস্থ<adj><mf>    #
ভিন্ন<adj><mf>    different
শিক্ষিত<adj><mf>    educated
শাসক<adj><mf>    ruler    !
অন্তর্ভুক্ত<adj><mf>    included
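If the annotated list needs to be processed by a script later, a line could be read roughly as in the sketch below. This is a minimal illustration under stated assumptions: the columns are taken to be separated by a tab or by two or more spaces, and the function name parse_line and the dictionary keys are hypothetical. Markers such as # and ! are passed through unchanged, without any interpretation.

```python
import re

# Lemma followed by one or more <tag> symbols, e.g. ভিন্ন<adj><mf>.
ENTRY = re.compile(r"^(?P<lemma>[^<]+)(?P<tags>(?:<[^>]+>)+)$")

# Assumed column separator: a tab, or a run of two or more spaces.
SEPARATOR = re.compile(r"\t+| {2,}")


def parse_line(line):
    """Split one line of the list into lemma, tags and remaining columns.

    Returns None for blank lines or lines whose first column does not
    look like lemma<tag>...<tag>.
    """
    fields = [f for f in SEPARATOR.split(line.strip()) if f]
    if not fields:
        return None
    match = ENTRY.match(fields[0])
    if match is None:
        return None
    return {
        "lemma": match.group("lemma"),
        "tags": re.findall(r"<([^>]+)>", match.group("tags")),
        # English text and any markers, kept exactly as written.
        "columns": fields[1:],
    }


print(parse_line("ভিন্ন<adj><mf>    different"))
print(parse_line("শাসক<adj><mf>    ruler    !"))
```

Splitting on a tab or on wide gaps, rather than on every space, keeps a multi-word English translation together in a single column.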
