Difference between revisions of "User:Kamush/GSoC2021ProgresReport"

Line 17: Line 17:
 
May 17-June 5
 
May 17-June 5
 
|
 
|
* Installing Apertium
+
* Installed Apertium
 
* Initialize kaz-uzb pair
 
* Initialize kaz-uzb pair
 
* Collect data in both languages
 
* Collect data in both languages
Line 25: Line 25:
 
| -
 
| -
 
|
 
|
* Installing Apertium and necessary tools;
+
* Installed Apertium and necessary tools;
  +
* Cloned apertium-kaz and apertium-uzb, initialized the kaz-uzb pair;
* Send the first PR that can translate a small sample text;
 
  +
* Translated a small sample text;
* Extract Uzbek and Kazakh wiki corpus;
 
* Collect Uzbek and Kazakh web(non-wiki) corpus;
+
* Extracted Uzbek and Kazakh wiki corpus;
* Collect Kazakh-Uzbek dictionary and parallel corpora;
+
* Collected Kazakh-Uzbek dictionary and parallel corpora;
 
|-
 
|-
 
|Week 1
 
|Week 1
Line 39: Line 39:
 
| -
 
| -
 
|
 
|
* Go through all Uzbek stems in uzb.lexc;
+
* Went through all Uzbek and Kazakh stems;
  +
* Initialized the pair with apertium-recursive;
* Clean(deduplicate) and correct uzb stems;
 
  +
* Collected dictionaries from other pairs for crossdic;
* Improve Uzbek lexicon;
 
  +
* Obtained crossdic results in two ways.
 
|-
 
|-
 
|Week 2
 
|Week 2
Line 51: Line 52:
 
| -
 
| -
 
|
 
|
* Start adding bilingual dictionary elements;
+
* Started adding bilingual dictionary elements;
 
|-
 
|-
 
|Week 3
 
|Week 3
Line 61: Line 62:
 
| -
 
| -
 
|
 
|
* Expand bilingual dictionary;
+
* Expanded bilingual dictionary;
* Lexical selection rules;
+
* Started writing sample lexical selection rules;
 
|-
 
|-
 
|Week 4
 
|Week 4
Line 72: Line 73:
 
| -
 
| -
 
|
 
|
* Expand bilingual dictionary;
+
* Expanded the bilingual dictionary further;
* Lexical selection rules;
 
 
|-
 
|-
 
|Week 5
 
|Week 5
Line 83: Line 83:
 
| -
 
| -
 
|
 
|
 
* Expanded bilingual dictionary;
* Test the kaz-uzb translator;
 
 
* Collected texts for lexical selection rules, tried a small script;
* Expand the Uzbek lexicon with missing words;
 
  +
* Translated a large Kazakh text into Uzbek for better WER/PER calculation.
* Expand bilingual dictionary;
 
* Expand lexical selection rules;
 
 
|-
 
|-
 
|Week 6
 
|Week 6
Line 95: Line 94:
 
| -
 
| -
 
| -
 
| -
|
+
| -
* Work more on transfer rules;
 
* More bilingual dictionary;
 
* More lexical selection rules;
 
*
 
 
|-
 
|-
 
|Week 7
 
|Week 7
Line 108: Line 103:
 
| -
 
| -
 
| -
 
| -
|
+
| -
* Test the kaz-uzb translator;
 
* Extend the Uzbek lexicon with missing words;
 
* Extend the Kazakh lexicon with missing words;
 
* Extend bilingual dictionary;
 
* Add more lexical selection rules;
 
 
|-
 
|-
 
|Week 8
 
|Week 8
Line 122: Line 112:
 
| -
 
| -
 
| -
 
| -
|
+
| -
* Add words, rules;
 
* Work on transfer rules;
 
* Start the testvoc;
 
 
|-
 
|-
 
|Week 9
 
|Week 9
Line 134: Line 121:
 
| -
 
| -
 
| -
 
| -
|
+
| -
* Add words, rules;
 
* Transfer rules kaz-uzb;
 
* Testvoc kaz-uzb
 
 
|-
 
|-
 
|Week 10
 
|Week 10
Line 146: Line 130:
 
| -
 
| -
 
| -
 
| -
|
+
| -
* Test the kaz-uzb translator;
 
* Check the transfer rules;
 
* Check the testvoc;
 
* Write the final report;
 
 
|-
 
|-
 
|}
 
|}

Revision as of 10:51, 11 July 2021

Progress Report

Time Period | Goal | Bidix (kaz-uzb) | Coverage (kaz-uzb) | WER,PER (kaz-uzb) | WER,PER (uzb-kaz) | Details/Comments
Community Bonding Period

May 17-June 5

  • Installed Apertium
  • Initialize kaz-uzb pair
  • Collect data in both languages
- - - -
  • Installed Apertium and necessary tools;
  • Cloned apertium-kaz and apertium-uzb, initialized the kaz-uzb pair;
  • Translated a small sample text;
  • Extracted Uzbek and Kazakh wiki corpus;
  • Collected Kazakh-Uzbek dictionary and parallel corpora;
Week 1

June 6-12

Make Uzbek better - - - -
  • Went through all Uzbek and Kazakh stems;
  • Initialized the pair with apertium-recursive;
  • Collected dictionaries from other pairs for crossdic;
  • Obtained crossdic results in two ways.
Week 2

June 13-19

Expand bilingual dictionary - - - -
  • Started adding bilingual dictionary elements;
Week 3

June 20-26

More on .dix and .lrx - - - -
  • Expanded bilingual dictionary;
  • Started writing sample lexical selection rules;
Week 4

June 27-July 3

Focus on transfer rules - - - -
  • Expanded the bilingual dictionary further;
Week 5

July 4-10

Test translator and expand more - - - -
  • Expanded bilingual dictionary;
  • Collected texts for lexical selection rules, tried a small script;
  • Translated a large Kazakh text into Uzbek for better WER/PER calculation.
Week 6

July 11-17

Focus more on transfer rules - - - - -
Week 7

July 18-24

Test the kaz-uzb translator - - - - -
Week 8

July 25-31

Focus on transfer rules - - - - -
Week 9

August 1-7

Focus on testvoc - - - - -
Week 10

August 8-14

Finalize work - - - - -
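
The WER,PER columns above are still empty. As a rough illustration of what those figures measure, here is a minimal Python sketch that compares a machine translation against a post-edited reference. It is not the script used in this project and not apertium-eval-translator; the function names `wer` and `per` are hypothetical, tokenization is plain whitespace splitting, and PER uses one common formulation (bag-of-words matches, ignoring word order).

```python
from collections import Counter


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Row-by-row Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def per(reference: str, hypothesis: str) -> float:
    """Position-independent error rate (assumed formulation, see lead-in)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Words shared by both sides, counted with multiplicity, order ignored.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)
```

Because PER ignores word order, it is never higher than WER for the same pair of texts, so the gap between the two gives a rough idea of how many errors are reordering rather than wrong word choices.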