Difference between revisions of "User:Kamush/GSoC2021ProgresReport"
Revision as of 10:23, 12 July 2021
Progress Report
Time Period | Goal | Bidix (kaz-uzb) | Coverage (kaz-uzb) | WER / PER (kaz-uzb) | WER / PER (uzb-kaz) | Details/Comments
---|---|---|---|---|---|---
Community Bonding Period, May 17-June 5 | | 426 (+426) | 43.80% | - | - |
Week 1, June 6-12 | Make Uzbek better | 2220 (+1794) | 52.11% | - | - |
Week 2, June 13-19 | Expand bilingual dictionary | 5262 (+3042) | 77.03% | 74.77% / 67.57% | 64.23% / 54.37% |
Week 3, June 20-26 | More on .dix and .lrx | 8543 (+3281) | 81.55% | 74.77% / 67.57% | 64.23% / 54.37% |
Week 4, June 27-July 3 | Focus on transfer rules | 9432 (+889) | 81.85% | 74.77% / 67.57% | 64.23% / 54.37% |
Week 5, July 4-10 | Test translator and expand more | 11008 (+1576) | 82.81% | 74.77% / 67.57% | 64.23% / 54.37% |
Week 6, July 11-17 | Focus more on transfer rules | - | - | - | - | -
Week 7, July 18-24 | Test the kaz-uzb translator | - | - | - | - | -
Week 8, July 25-31 | Focus on transfer rules | - | - | - | - | -
Week 9, August 1-7 | Focus on testvoc | - | - | - | - | -
Week 10, August 8-14 | Finalize work | - | - | - | - | -
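The bracketed figures in the Bidix column are the differences between consecutive cumulative totals. A minimal sketch of that arithmetic, assuming the dictionary started empty before the Community Bonding Period; the names `BIDIX_TOTALS` and `weekly_gains` are illustrative and not part of any Apertium tool:

```python
# Cumulative bidix totals copied from the Bidix column of the progress table.
BIDIX_TOTALS = [
    ("Community Bonding Period", 426),
    ("Week 1", 2220),
    ("Week 2", 5262),
    ("Week 3", 8543),
    ("Week 4", 9432),
    ("Week 5", 11008),
]


def weekly_gains(totals):
    """Return (label, gain) pairs, where gain = total - previous total."""
    gains = []
    previous = 0  # assumption: the dictionary starts empty
    for label, total in totals:
        gains.append((label, total - previous))
        previous = total
    return gains


if __name__ == "__main__":
    for label, gain in weekly_gains(BIDIX_TOTALS):
        print(f"{label}: +{gain}")
```

Running it reproduces the increments shown in the table, which makes it easy to spot a mistyped total when a new week is added.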
TODO
- Writing lexical selection rules for uzb-kaz
- Transfer rules
- Testvoc
ONGOING
- Lexical selection rules for kaz-uzb
- Translating a large Kazakh text into Uzbek
  - For better WER/PER calculation
  - For checking transfer rules
  - Chose the Nur-Sultan (capital city) article from Kazakh Wikipedia for that.
  - Extracted 112 sentences from the Nur-Sultan article.
- Collecting more bidix
DONE (NOTES)
- Made a script to calculate DixCount, Coverage, WER/PER at once
- Calculating WER/PER:
  - Apertium-eval-translator:
    - https://github.com/apertium/apertium-eval-translator
    - apertium-eval-translator -ref uzb.txt -test kaz-uzb.txt
  - Parallel text:
    - JaM Story:
    - “Azamat va Oygul” in our case;
    - kaz-uzb/texts/[kaz|uzb].txt
  - Astana article from Kazakh Wiki
- Calculated dix Coverage:
  - (kaz-uzb/texts): bash ../../coverage-ltproc-new.sh ../docs/kaz-wiki.txt ../kaz-uzb.automorf.bin
  - coverage: 26752095 / 32305875 (~0.82808761564266561423)
  - remaining unknown forms: 5553780
  - kaz-wiki.txt Sun Jul 11 11:55:25 CEST 2021
- Counting Dix elements:
  - Apertium-Eval: dixcounter.py:
    - python3 ../dixcounter.py apertium-kaz-uzb.kaz-uzb.dix
  - July 09: 11008 dix elements before deduplication.
- Translating kaz-uig.dix into kaz-uzb
- Translating kaz-kaa.dix into kaz-uzb.dix
  - Removing entries that were already produced by crossdic
  - Changing each Karakalpak translation into an Uzbek one by looking at both the Kazakh and the Karakalpak word
- Translating kaz-tur.dix into kaz-uzb.dix
  - Removing entries that were already produced by crossdic
  - Changing each Turkish translation into an Uzbek one by looking at both the Kazakh and the Turkish word
  - Added 3200 more words from this.
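The three measurements in the notes above (dix element count, coverage, WER/PER) can be sketched together in a few lines of Python. This is not the script mentioned in the notes and does not reproduce the exact behaviour of dixcounter.py, coverage-ltproc-new.sh, or apertium-eval-translator; it is a rough illustration under these assumptions: dix entries are `<e>` elements directly inside `<section>`, WER is word-level edit distance divided by reference length, and PER uses one common bag-of-words definition. All function names and the sample entries are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two illustrative bidix entries; not taken from the real kaz-uzb dictionary.
SAMPLE_DIX = """<dictionary><section id="main" type="standard">
<e><p><l>үй<s n="n"/></l><r>uy<s n="n"/></r></p></e>
<e><p><l>су<s n="n"/></l><r>suv<s n="n"/></r></p></e>
</section></dictionary>"""


def count_dix_entries(dix_xml):
    """Count <e> elements that sit directly inside a <section>."""
    root = ET.fromstring(dix_xml)
    return len(root.findall("./section/e"))


def coverage(known, total):
    """Share of corpus tokens that received at least one analysis."""
    return known / total


def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate: edit distance divided by reference length."""
    ref_words, hyp_words = ref.split(), hyp.split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def per(ref, hyp):
    """Position-independent error rate: word order is ignored."""
    ref_words, hyp_words = ref.split(), hyp.split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    matches = sum((Counter(ref_words) & Counter(hyp_words)).values())
    return (max(len(ref_words), len(hyp_words)) - matches) / len(ref_words)


if __name__ == "__main__":
    print("dix entries:", count_dix_entries(SAMPLE_DIX))
    # Token counts taken from the coverage run recorded in the notes.
    print(f"coverage: {coverage(26752095, 32305875):.2%}")
    print("WER:", wer("a b c d", "a c b d"), "PER:", per("a b c d", "a c b d"))
```

With the figures from the coverage run, `coverage(26752095, 32305875)` gives roughly 0.8281, and the difference between the two counts is the 5553780 unknown forms listed above. Because PER ignores word order, it is never higher than WER for the same pair of texts, which matches the reported kaz-uzb figures (74.77% WER against 67.57% PER).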