Difference between revisions of "User:Ragib06"

From Apertium

Revision as of 16:02, 2 June 2011

I'm Ragib Ahsan from Bangladesh. I'm currently an undergraduate student in the Department of Computer Science and Engineering at Bangladesh University of Engineering and Technology.

I'd like to participate in Google Summer of Code 2011 with Apertium, and I'm interested in adopting the new Bengali-English language pair.



Apertium Bengali-English

Currently, the morphological analyzer is nearly complete, with 68% coverage of wiki. The bilingual dictionary needs a lot more entries, and the transfer system has only a few rules to work with.
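To make the coverage figure concrete, here is a minimal sketch of how such a number could be computed from analyser output. It assumes coverage means the share of tokens that receive at least one analysis, and it relies on the Apertium stream format, where each token appears as `^surface/analysis$` and an unknown word has its analysis prefixed with `*`. The function name and the tags in the sample are made up for illustration, and escaped characters in the stream are not handled.

```python
import re

# One lexical unit in the Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed_text: str) -> float:
    """Return the fraction of tokens that received at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # The analyser marks an unknown word by prefixing its analysis with '*'.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


# Illustrative analyser output; the tags shown are hypothetical.
sample = "^আমি/আমি<prn><p1><sg>$ ^ধান/ধান<n><nn>$ ^xyz/*xyz$"
print(f"{coverage(sample):.0%}")  # two of three tokens are known: 67%
```

Run over a large analysed corpus, the same count would give a figure comparable to the 68% above, provided the corpus and tokenisation match.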

Some example outputs are:

I eat rice -> আমি ধান খাই
I love you -> আমি আপনাকে ভালবাসি

You can find a list of tests here

My project goals are as follows:

  • Completing the monolingual dictionary for Bengali up to a wide coverage (at least 80%) of wiki.
  • Completing the bilingual dictionary with the necessary entries.
  • Writing the transfer rules, which will be a challenging part, as the two languages are not closely related.
  • Finally, performing testvocing to ensure release quality.


Preparing Myself

I've downloaded and installed the "apertium-bn-en" package from the Apertium incubator, and I'm really excited to be playing around with it on my system. I've already gone through the Apertium New Language Pair HOWTO. I also tried to have a look at the Apertium Official Documentation, which seems really complex. I'm discussing various issues with the prospective mentors, Francis Tyers and Abu Zaher. With their help and some exploration of the apertium-bn-en project, I've finally prepared my proposal for this year's GSoC. You can find it here.

I also found the paper on Bengali Morphological Analyzer[1] quite interesting. Last but not least, I'm trying to solve some of the challenges given for this project. Check here.


Community Bonding Period

The community bonding period started right after the announcement of accepted student proposals on April 25. I had my plans for this period. As mentioned in my work plan, I'm exploring the Apertium toolchain to familiarize myself with it. I'm regularly on IRC and getting to know the community better. In the meantime, I'm adding some new bidix entries to reduce the load during the coding phase. I also plan to prepare a test case list for testvocing at the end. I'm putting some remarks here and intend to update them from time to time:

Week 1: April 26 - May 02, 2011

  • Getting familiar with svn
  • After building bn-en with make, the '*.mode' files need to be copied from the 'modes' directory to the system's installation directory, i.e. /usr/share/apertium/modes
  • 240 new adjectives added to apertium-bn-en.bn-en.dix
  • 147 new nouns added to apertium-bn-en.bn-en.dix
  • Planning for generating a test case list
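
The mode-file step from Week 1 can be sketched as a small helper. This is only an illustration: the function name `install_modes` and the checkout path in the usage comment are assumptions, and writing to /usr/share/apertium/modes normally requires root privileges.

```python
import shutil
from pathlib import Path


def install_modes(build_dir: str, target_dir: str = "/usr/share/apertium/modes") -> list:
    """Copy every *.mode file from <build_dir>/modes into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for mode_file in sorted(Path(build_dir, "modes").glob("*.mode")):
        shutil.copy(mode_file, target / mode_file.name)
        copied.append(mode_file.name)
    return copied


# Example call (hypothetical checkout location, run with sufficient privileges):
# install_modes("/home/user/apertium-bn-en")
```

The function returns the names of the files it copied, which makes it easy to confirm that the build actually produced mode files.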

Week 2: May 03 - May 09, 2011

  • 156 new nouns added to apertium-bn-en.bn-en.dix
  • 100 new adjectives added to apertium-bn-en.bn-en.dix
  • Another 100 new adjectives added to apertium-bn-en.bn-en.dix
  • Studying the apertium-doc; currently getting to know the different modules of Apertium
  • Correcting some minor errors in the previous entries and misplaced lemmas (nouns and adjectives)
  • Adding some test cases here
  • Added another 138 adjectives to apertium-bn-en.bn-en.dix with some minor corrections

Week 3: May 10 - May 16, 2011

  • Studying the apertium-doc; currently getting to know the different modules of Apertium
  • 87 new adjectives added to apertium-bn-en.bn-en.dix with minor corrections
  • Studying Bengali verb issues, especially those mentioned in the paper on Bengali Morphological Analyzer[1]


Week 4: May 17 - May 22, 2011

  • Studying the apertium-bn-en a bit more closely
  • Generating some more test cases


Coding Phase

Week 1: May 23 - May 29, 2011

  • 44 adjectives added to bidix with some minor corrections
  • more adjectives added to bidix
  • 67 nouns added to bidix
  • 45 nouns added to bidix with some corrections to previous entries
  • learning monodix basics

Week 2: May 30 - June 5, 2011

  • corrections on previous entries
  • some scripts added to /dev/gsoc2011/adjective & /dev/gsoc2011/noun
  • again corrections based on code review

Code Review

SVN Commits:

  • 29815: OK
  • 30071:

Some confusion, these are not wrong, but need to be rechecked

+    <e><p><l>অবস্থিত<s n="adj"/><s n="mf"/></l><r>placed<s n="adj"/></r></p></e>
+    <e><p><l>সম্মত<s n="adj"/><s n="mf"/></l><r>agreed<s n="adj"/></r></p></e>
+    <e><p><l>দীর্ঘায়িত<s n="adj"/><s n="mf"/></l><r>lengthened<s n="adj"/></r></p></e>
+    <e><p><l>সমাপনী<s n="adj"/><s n="mf"/></l><r>closing<s n="adj"/></r></p></e>
+    <e><p><l>ছাড়<s n="adj"/><s n="mf"/></l><r>concession<s n="adj"/></r></p></e>
+    <e><p><l>নিকটবর্তী<s n="adj"/><s n="mf"/></l><r>near<s n="adj"/><s n="sint"/></r></p></e>
+    <e><p><l>ধর্মীয়<s n="adj"/><s n="mf"/></l><r>religious<s n="adj"/></r></p></e>
+    <e><p><l>বিশিষ্ট<s n="adj"/><s n="mf"/></l><r>special<s n="adj"/></r></p></e>
+    <e><p><l>মঞ্চস্থ<s n="adj"/><s n="mf"/></l><r>staged<s n="adj"/></r></p></e>
+    <e><p><l>নিকট<s n="adj"/><s n="mf"/></l><r>near<s n="adj"/><s n="sint"/></r></p></e>
+    <e><p><l>সরাসরি<s n="adj"/><s n="mf"/></l><r>live<s n="adj"/></r></p></e>

Wrong Meaning

+    <e><p><l>ত্রুটিপূর্ণ<s n="adj"/><s n="mf"/></l><r>defected<s n="adj"/></r></p></e> - corrected to 'defective'
+    <e><p><l>বিরত<s n="adj"/><s n="mf"/></l><r>discontinued<s n="adj"/></r></p></e> - stopped ?
+    <e><p><l>সুনির্দিষ্ট<s n="adj"/><s n="mf"/></l><r>specified<s n="adj"/></r></p></e> - corrected to 'specific'
+    <e><p><l>বিশৃঙ্খল<s n="adj"/><s n="mf"/></l><r>caos<s n="adj"/></r></p></e> - corrected to 'caotic'
+    <e><p><l>গোটা<s n="adj"/><s n="mf"/></l><r>full<s n="adj"/><s n="sint"/></r></p></e> - corrected to 'whole'
+    <e><p><l>আতঙ্কিত<s n="adj"/><s n="mf"/></l><r>panicked<s n="adj"/></r></p></e> - corrected to 'terrified'
+    <e><p><l>সংখ্যালঘু<s n="adj"/><s n="mf"/></l><r>minor<s n="adj"/></r></p></e> - minority ? noun ?
+    <e><p><l>বিশদ<s n="adj"/><s n="mf"/></l><r>details<s n="adj"/></r></p></e> - corrected to 'evident'
+    <e><p><l>চকচকে<s n="adj"/><s n="mf"/></l><r>glitter<s n="adj"/></r></p></e> - corrected to 'shining'
+    <e><p><l>কাল<s n="adj"/><s n="mf"/></l><r>time<s n="adj"/></r></p></e> - tomorrow ? noun ?
+    <e><p><l>বিরক্তিকর<s n="adj"/><s n="mf"/></l><r>disturbing<s n="adj"/></r></p></e> - annoying is a better option - corrected

Wrong Parts of Speech Tagging

Possible Nouns

+    <e><p><l>জনগণ<s n="adj"/><s n="mf"/></l><r>people<s n="adj"/></r></p></e> - removed from adj
+    <e><p><l>শহীদ<s n="adj"/><s n="mf"/></l><r>martyr<s n="adj"/></r></p></e> - removed from adj
+    <e><p><l>বিশ্বাসী<s n="adj"/><s n="mf"/></l><r>believer<s n="adj"/></r></p></e> - removed from adj

Possible Adverbs

+    <e><p><l>নিয়মিতভাবে<s n="adj"/><s n="mf"/></l><r>regularly<s n="adj"/></r></p></e> - removed from adj
+    <e><p><l>নিয়মিত<s n="adj"/><s n="mf"/></l><r>regularly<s n="adj"/></r></p></e> - corrected to 'regular' (adj)

Possible Spelling Mistake, please recheck

+    <e><p><l>পচা<s n="adj"/><s n="mf"/></l><r>rotten<s n="adj"/></r></p></e> - corrected to পঁচা
+    <e><p><l>দূষিত<s n="adj"/><s n="mf"/></l><r>polluted<s n="adj"/></r></p></e> - ??

Capitalization

+    <e><p><l>বাংলাদেশী<s n="adj"/><s n="mf"/></l><r>Bangladeshi<s n="adj"/></r></p></e>
  • 30075:

Wrong Parts of Speech Tagging

Possible Adjective

+    <e><p><l>ভরা<s n="n"/><s n="mf"/><s n="nn"/></l><r>full<s n="n"/></r></p></e> - removed from noun
+    <e><p><l>আণবিক<s n="n"/><s n="mf"/><s n="nn"/></l><r>atomic<s n="n"/></r></p></e> - removed from noun
+    <e><p><l>সাম্রাজ্যবাদী<s n="n"/><s n="mf"/><s n="hu"/></l><r>imperialist<s n="n"/></r></p></e> - isn't noun ?
+    <e><p><l>আহত<s n="n"/><s n="mf"/><s n="nn"/></l><r>wounded<s n="n"/></r></p></e> - removed from noun
+    <e><p><l>হাজির<s n="n"/><s n="mf"/><s n="nn"/></l><r>present<s n="n"/></r></p></e> - removed from noun

Wrong Meaning

+    <e><p><l>নির্দেশ<s n="n"/><s n="mf"/><s n="nn"/></l><r>direction<s n="n"/></r></p></e> - order perhaps? - corrected
+    <e><p><l>সুশাসন<s n="n"/><s n="nt"/><s n="nn"/></l><r>justice<s n="n"/></r></p></e> - good administration ?
+    <e><p><l>বিদেশ<s n="n"/><s n="mf"/><s n="nn"/></l><r>foreign<s n="n"/></r></p></e> - abroad ?
+    <e><p><l>পল্লী<s n="n"/><s n="mf"/><s n="nn"/></l><r>village<s n="n"/></r></p></e> - shouldn't it be 'rural' - isn't 'rural' adj ?
+    <e><p><l>জিজ্ঞেস<s n="n"/><s n="mf"/><s n="nn"/></l><r>ask<s n="n"/></r></p></e>
+    <e><p><l>অনুসরণ<s n="n"/><s n="mf"/><s n="nn"/></l><r>follow<s n="n"/></r></p></e> - corrected to 'following'
+    <e><p><l>মিয়া<s n="n"/><s n="mf"/><s n="nn"/></l><r>Mia<s n="n"/></r></p></e> - ??
+    <e><p><l>অপেক্ষা<s n="n"/><s n="mf"/><s n="nn"/></l><r>wait<s n="n"/></r></p></e> - waiting - corrected

Possible Spelling Mistake

+    <e><p><l>ঝাপসা<s n="n"/><s n="nt"/><s n="nn"/></l><r>blurry<s n="n"/></r></p></e> - ??

Better translation

+    <e><p><l>বাড়িঘর<s n="n"/><s n="mf"/><s n="nn"/></l><r>house<s n="n"/></r></p></e> - household - corrected
+    <e><p><l>জোট<s n="n"/><s n="mf"/><s n="nn"/></l><r>union<s n="n"/></r></p></e> - coalition - corrected

Capitalization

+    <e><p><l>তামিল<s n="n"/><s n="mf"/><s n="nn"/></l><r>tamil<s n="n"/></r></p></e>
+    <e><p><l>ডাচ<s n="n"/><s n="mf"/><s n="nn"/></l><r>dutch<s n="n"/></r></p></e>

Some confusion

+    <e><p><l>রওনা<s n="n"/><s n="mf"/><s n="nn"/></l><r>start<s n="n"/></r></p></e>
+    <e><p><l>বেড়ানো<s n="n"/><s n="mf"/><s n="nn"/></l><r>touring<s n="n"/></r></p></e>
+    <e><p><l>ছিনতাই<s n="n"/><s n="mf"/><s n="nn"/></l><r>snatch<s n="n"/></r></p></e>
+    <e><p><l>অসহযোগ<s n="n"/><s n="nt"/><s n="nn"/></l><r>noncooperation<s n="n"/></r></p></e>
+    <e><p><l>নিন্দা<s n="n"/><s n="nt"/><s n="nn"/></l><r>disrepute<s n="n"/></r></p></e>

Missing Entry

+    <e><p><l>ব্যাট<s n="n"/><s n="mf"/><s n="nn"/></l><r><s n="n"/></r></p></e> - corrected to 'bat'
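
Entries with an empty right-hand side, like the one above, can be caught automatically before committing. The following is a rough sketch using Python's standard `xml.etree.ElementTree`; the function name is made up for illustration, and it assumes plain `<e><p><l>…</l><r>…</r></p></e>` entries without `<b/>` or `<g>` elements inside the lemma.

```python
import xml.etree.ElementTree as ET


def entries_missing_translation(dix_fragment: str) -> list:
    """Return the left-side lemmas of entries whose <r> side has no lemma text."""
    # Wrap the fragment in a root element so several <e> entries parse together.
    root = ET.fromstring(f"<section>{dix_fragment}</section>")
    missing = []
    for entry in root.iter("e"):
        left, right = entry.find("p/l"), entry.find("p/r")
        if left is None or right is None:
            continue
        # .text holds the characters before the first <s/> child, i.e. the lemma.
        if not (right.text or "").strip():
            missing.append((left.text or "").strip())
    return missing


fragment = """
<e><p><l>ব্যাট<s n="n"/><s n="mf"/><s n="nn"/></l><r><s n="n"/></r></p></e>
<e><p><l>জোট<s n="n"/><s n="mf"/><s n="nn"/></l><r>coalition<s n="n"/></r></p></e>
"""
print(entries_missing_translation(fragment))  # ['ব্যাট']
```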
  • 30090:

Confusing entry, please recheck

+    <e><p><l>টের<s n="n"/><s n="mf"/><s n="nn"/></l><r>sensation<s n="n"/></r></p></e>
+    <e><p><l>তৃণমূল<s n="n"/><s n="nt"/><s n="nn"/></l><r>grassroot<s n="n"/></r></p></e>
+    <e><p><l>প্রণয়ন<s n="n"/><s n="mf"/><s n="nn"/></l><r>composition<s n="n"/></r></p></e>
+    <e><p><l>পালন<s n="n"/><s n="mf"/><s n="nn"/></l><r>maintenance<s n="n"/></r></p></e>

Better translation possible?

+    <e><p><l>তোলা<s n="n"/><s n="mf"/><s n="nn"/></l><r>picking<s n="n"/></r></p></e>
+    <e><p><l>আওতা<s n="n"/><s n="mf"/><s n="nn"/></l><r>custody<s n="n"/></r></p></e>
+    <e><p><l>সন্ধান<s n="n"/><s n="mf"/><s n="nn"/></l><r>search<s n="n"/></r></p></e>
+    <e><p><l>অবরোধ<s n="n"/><s n="mf"/><s n="nn"/></l><r>blockage<s n="n"/></r></p></e> - blockade? - corrected
+    <e><p><l>ধাওয়া<s n="n"/><s n="mf"/><s n="nn"/></l><r>gallop<s n="n"/></r></p></e>
+    <e><p><l>ঘরবাড়ি<s n="n"/><s n="mf"/><s n="nn"/></l><r>house<s n="n"/></r></p></e> - household? - corrected
+    <e><p><l>প্রীতি<s n="n"/><s n="mf"/><s n="nn"/></l><r>pleasure<s n="n"/></r></p></e>
+    <e><p><l>মোকাবিলা<s n="n"/><s n="mf"/><s n="nn"/></l><r>face<s n="n"/></r></p></e> - confrontation - corrected
+    <e><p><l>সংস্কার<s n="n"/><s n="mf"/><s n="nn"/></l><r>purification<s n="n"/></r></p></e> - amendment? - corrected
+    <e><p><l>স্থাপন<s n="n"/><s n="mf"/><s n="nn"/></l><r>place<s n="n"/></r></p></e> - placement? - corrected
+    <e><p><l>অকৃত্রিম<s n="n"/><s n="nt"/><s n="nn"/></l><r>real<s n="n"/></r></p></e> - unfeigned?
+    <e><p><l>পরিশোধ<s n="n"/><s n="mf"/><s n="nn"/></l><r>pay<s n="n"/></r></p></e> - payment? - corrected
+    <e><p><l>যোগদান<s n="n"/><s n="mf"/><s n="nn"/></l><r>join<s n="n"/></r></p></e> - joining? - corrected
+    <e><p><l>রক্ষা<s n="n"/><s n="mf"/><s n="nn"/></l><r>protect<s n="n"/></r></p></e> - protection - corrected
+    <e><p><l>রটানো<s n="n"/><s n="mf"/><s n="nn"/></l><r>rumor<s n="n"/></r></p></e> - 

Note: Gerund forms must be matched; e.g. যোগদান should be matched with 'joining', not 'join'.
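
The gerund rule in the note lends itself to a simple candidate check. The sketch below flags noun entries whose English side is a bare verb form. The `BARE_VERBS` set is a hand-picked, hypothetical list used only for illustration; a real check would need a proper English verb lexicon, and since words such as 'pay' can also be nouns, the output should be treated as candidates for manual review rather than definite errors.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample list for illustration only, not a real lexicon.
BARE_VERBS = {"join", "pay", "protect", "follow", "ask"}


def bare_verb_nouns(dix_fragment: str) -> list:
    """Return (Bengali lemma, English lemma) pairs for noun entries glossed with a bare verb."""
    # Wrap the fragment in a root element so several <e> entries parse together.
    root = ET.fromstring(f"<section>{dix_fragment}</section>")
    flagged = []
    for entry in root.iter("e"):
        left, right = entry.find("p/l"), entry.find("p/r")
        if left is None or right is None:
            continue
        english = (right.text or "").strip()
        tags = [s.get("n") for s in right.findall("s")]
        # Only noun-tagged right sides are checked against the verb list.
        if "n" in tags and english in BARE_VERBS:
            flagged.append(((left.text or "").strip(), english))
    return flagged


fragment = """
<e><p><l>যোগদান<s n="n"/><s n="mf"/><s n="nn"/></l><r>join<s n="n"/></r></p></e>
<e><p><l>রক্ষা<s n="n"/><s n="mf"/><s n="nn"/></l><r>protection<s n="n"/></r></p></e>
"""
print(bare_verb_nouns(fragment))  # [('যোগদান', 'join')]
```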

Wrong meaning

+    <e><p><l>দুর্গা<s n="n"/><s n="nt"/><s n="nn"/></l><r>fort<s n="n"/></r></p></e> - corrected to 'Durga'
+    <e><p><l>ছুটা<s n="n"/><s n="mf"/><s n="nn"/></l><r>run<s n="n"/></r></p></e> - 'running' is more appropriate - corrected
+    <e><p><l>তরুণী<s n="n"/><s n="f"/><s n="hu"/></l><r>young<s n="n"/></r></p></e> - young girl/young lady ?
+    <e><p><l>রাজি<s n="n"/><s n="mf"/><s n="nn"/></l><r>agreement<s n="n"/></r></p></e>
+    <e><p><l>জবানবন্দি<s n="n"/><s n="mf"/><s n="nn"/></l><r>witness<s n="n"/></r></p></e> - witness is for স্বাক্ষী - corrected to 'testimony'
+    <e><p><l>পেয়ার<s n="n"/><s n="mf"/><s n="nn"/></l><r>pair<s n="n"/></r></p></e>
+    <e><p><l>সহসভাপতি<s n="n"/><s n="mf"/><s n="hu"/></l><r>vice<s n="n"/></r></p></e> - vice chairman, perhaps? - already corrected to 'vice president'
+    <e><p><l>সংযম<s n="n"/><s n="nt"/><s n="nn"/></l><r>restrained<s n="n"/></r></p></e> - restrain - already corrected
+    <e><p><l>আটক<s n="n"/><s n="mf"/><s n="nn"/></l><r>imprisoned<s n="n"/></r></p></e> - corrected to 'captive'

Possible Adjective

+    <e><p><l>জড়ো<s n="n"/><s n="mf"/><s n="nn"/></l><r>collected<s n="n"/></r></p></e> - removed from noun
+    <e><p><l>গোছানো<s n="n"/><s n="mf"/><s n="nn"/></l><r>organised<s n="n"/></r></p></e> - removed from noun
+    <e><p><l>অন্তর্বর্তীকালীন<s n="n"/><s n="nt"/><s n="nn"/></l><r>interum<s n="n"/></r></p></e> - removed from noun
+    <e><p><l>বিপুল<s n="n"/><s n="mf"/><s n="nn"/></l><r>many<s n="n"/></r></p></e> - removed from noun
+    <e><p><l>মোটামুটি<s n="n"/><s n="nt"/><s n="nn"/></l><r>moderate<s n="n"/></r></p></e> - removed from noun
+    <e><p><l>অর্ধেক<s n="n"/><s n="mf"/><s n="nn"/></l><r>half<s n="n"/></r></p></e> - removed from noun
+    <e><p><l>সামান্য<s n="n"/><s n="mf"/><s n="nn"/></l><r>few<s n="n"/></r></p></e> - removed from noun
+    <e><p><l>অক্ষম<s n="n"/><s n="nt"/><s n="nn"/></l><r>unable<s n="n"/></r></p></e> - removed from noun
+    <e><p><l>প্রতিবন্ধী<s n="n"/><s n="mf"/><s n="hu"/></l><r>handicapped<s n="n"/></r></p></e> - removed from noun
+    <e><p><l>প্রচুর<s n="n"/><s n="nt"/><s n="nn"/></l><r>many<s n="n"/></r></p></e> - removed from noun
+    <e><p><l>অধীর<s n="n"/><s n="nt"/><s n="nn"/></l><r>eager<s n="n"/></r></p></e> - removed from noun

Capitalization

+    <e><p><l>বাঙালি<s n="n"/><s n="mf"/><s n="hu"/></l><r>bangalee<s n="n"/></r></p></e>
  • 30095:
  • 30104:
  • 30105:
  • 30108:
  • 30109:
  • 30111:
  • 30161:
  • 30162:
  • 30214:
  • 30286:
  • 30338:
  • 30409:
  • 30419:
  • 30453: