Bengali and English/Updating Bilingual Dictionary


We are going to add more adjective entries to the bn-en bidix (bilingual dictionary). Assuming that we are in the apertium-bn-en folder (downloaded from the svn), run the following:

lt-expand apertium-bn-en.bn.dix| grep '<adj>' | sed 's/:>:/:/g' | sed 's/:<:/:/g' | cut -f2 -d':' | tee /tmp/foo1 | sed 's/^/^/g' | sed 's/$/$/g' | \
sed 's/$/^.<sent>$/g'  | apertium-pretransfer  | apertium-transfer apertium-bn-en.bn-en.t1x bn-en.t1x.bin bn-en.autobil.bin | \
apertium-interchunk apertium-bn-en.bn-en.t2x bn-en.t2x.bin | apertium-postchunk apertium-bn-en.bn-en.t3x bn-en.t3x.bin  | tee /tmp/foo2 | \
lt-proc -g bn-en.autogen.bin > /tmp/foo3 && paste /tmp/foo1 /tmp/foo2 /tmp/foo3 | egrep -v '\+' | egrep -v '@' | cut -f1 | \
perl -pe 's/<comp>|<sup>//g' | python dev/uniq.py

We used grep '<adj>' to keep only the adjectives, and perl -pe 's/<comp>|<sup>//g' to remove the comparative and superlative inflection tags from every adjective entry. We then used uniq.py rather than the shell's 'uniq' to filter out duplicate entries, because 'uniq' is not fully Unicode compliant.
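
The last two stages of the pipeline (tag stripping and duplicate removal) can be sketched in Python as below. This is only an illustration, not the actual contents of dev/uniq.py, which are not shown on this page. The function names and the sample entries are made up for the example, and the sketch assumes Python 3, where strings are Unicode.

```python
import re

# Matches the comparative and superlative tags that the perl step removes.
INFLECTION_TAGS = re.compile(r"<comp>|<sup>")


def strip_inflection_tags(entry):
    """Remove <comp> and <sup> from one lexical form (hypothetical helper)."""
    return INFLECTION_TAGS.sub("", entry)


def unique_entries(lines):
    """Return the distinct entries in first-seen order (hypothetical helper).

    Entries are compared as whole Unicode strings, so the result does not
    depend on the locale or on how the input happens to be sorted.
    """
    seen = set()
    result = []
    for line in lines:
        entry = line.rstrip("\n")
        if entry and entry not in seen:
            seen.add(entry)
            result.append(entry)
    return result


# Made-up sample entries, for illustration only.
sample = ["ভালো<adj>\n", "ভালো<adj><comp>\n", "নতুন<adj><sup>\n"]
for entry in unique_entries(strip_inflection_tags(s) for s in sample):
    print(entry)
```

Unlike the shell's 'uniq', which only collapses adjacent duplicate lines, this sketch drops every repeated entry wherever it occurs, so the input does not need to be sorted first.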