Difference between revisions of "User:Gourab337/GSoC2021-Workplan-Control"
Latest revision as of 18:37, 3 August 2021
Workplan

| Week | Dates | Goals: Bidix (excluding proper names) | Goals: Coverage | Goals: WER | Goals: Monolingual dictionaries | Goals: Bilingual dictionary / repository | Fulfilled: ben monodix (excl. proper names) | Fulfilled: Bidix (excl. proper names) | Fulfilled: Non-WP coverage (%) | Fulfilled: WP coverage (%) | Fulfilled: WER (%) | Fulfilled: Testvoc (clean %) / Manual disamb. (words) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 06/13/2021 | 500 | | | apertium-ben:<br>Main paradigms: n, adj, vblex, vbser, adv, pr, post, cnjcoo, cnjsub, cnjadv, det, num, prn<br>Add/check words: pr, post, cnjcoo, cnjsub, cnjadv, det, num, prn | pr, post, cnjcoo, cnjsub, cnjsub, num, det, prn | 6603 | 756 | | hin-ben: ~33.3%<br>ben-hin: ~20.4%<br>ben: ~67.9% | | |
| 2 | 06/20/2021 | 500 | | | | Preparing scripts for adding words from the available free data into the dictionaries | 6637 | 895 | | hin-ben: ~40.1%<br>ben-hin: ~29.7%<br>ben: ~67.7% | | |
| 3 | 06/27/2021 | 500 | | | | Key transfer rules hin > ben to avoid #<br>Eventually: the same for ben > hin<br>Manual disambiguation of Hindi texts | 6640 | 931 | | hin-ben: ~39.5%<br>ben-hin: ~34.0%<br>ben: ~69.9% | | |
| 4 | 07/04/2021 | 500 | | | | Manual disambiguation of Hindi texts | 6687 | 917 | | hin-ben: ~44.5%<br>ben-hin: ~39.3%<br>ben: ~70.0% | | |
| 5 | 07/11/2021 | 800 | | | apertium-ben:<br>ordinals<br>Manually adding the most frequent names (150), adjectives (100), verbs (50) | ordinals<br>Most frequent names (150), adjectives (100), verbs (50)<br>Word selection rules | 6764 | 1136 | | hin-ben: ~63.2%<br>ben-hin: ~43.4%<br>ben: ~71.0% | | |
| 6 | 07/18/2021 | 5000 | | | Adding words from available data | Adding words from available data<br>Word selection rules | 6984 | 1328 | | hin-ben: ~65.5%<br>ben-hin: ~47.6%<br>ben: ~71.8% | | |
| 7 | 07/25/2021 | 10000 | ~80% | hin - ben ~50% | Adding words from available data | Adding words from available data<br>Word selection rules | 7075 | 1670 | | hin-ben: ~67.6%<br>ben-hin: ~49.6%<br>ben: ~72.0% | | |
| 8 | 08/01/2021 | 10100 | | | Morphological disambiguation rules for Hindi | Transfer rules<br>Testvoc: closed categories, adv | 7078 | 1718 | | hin-ben: ~67.8%<br>ben-hin: ~49.7%<br>ben: ~72.0% | | |
| 9 | 08/08/2021 | 10200 | | | Morphological disambiguation rules for Hindi | Transfer rules<br>Testvoc: adj | | | | | | |
| 10 | 08/15/2021 | 10300 | | | Morphological disambiguation rules for Hindi | Transfer rules<br>Testvoc: n | | | | | | |
| 11 | 08/22/2021 | 10400 | ~80% | hin - ben ~65% | Morphological disambiguation rules for Hindi | Transfer rules<br>Testvoc: vblex | | | | | | |
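
The workplan tracks coverage and WER but does not say how the figures were computed. The following is a minimal sketch under two assumptions that are not stated on this page: that coverage is the naive share of tokens receiving at least one analysis (unknown words carry a `*` in Apertium stream format), and that WER is the word-level edit distance divided by the reference length. The function names `naive_coverage` and `word_error_rate` are hypothetical.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (simplified: escaped characters inside a unit are not handled)
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def naive_coverage(analysed_text: str) -> float:
    """Percentage of lexical units that received at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # The analyser marks an unknown word by prefixing its analysis with '*'.
    known = sum(1 for _, analyses in units if not analyses.startswith("*"))
    return 100.0 * known / len(units)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance as a percentage of reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] = edit distance between the ref words seen so far and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1      # ref word missing from hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return 100.0 * previous[-1] / len(ref)


if __name__ == "__main__":
    # Made-up analyser output: three known units and one unknown unit.
    sample = "^ghar/ghar<n><sg>$ ^xyz/*xyz$ ^hai/hona<vbser>$ ^aur/aur<cnjcoo>$"
    print(f"coverage: {naive_coverage(sample):.1f}%")
    print(f"WER: {word_error_rate('a b c d', 'a x c'):.1f}%")
```

The sample string is invented for illustration and is not taken from the hin-ben corpora; real figures would come from running the analyser over the test corpus and comparing translations against a post-edited reference.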