Difference between revisions of "Ces-Rus/Workplan"

From Apertium
! Objective
! Measures/targets
! Comments
|-
| 0
| <s>Find resources for improving the bilingual dictionary. Work on expanding the bilingual dictionary's coverage. Study Czech grammar. Write Bash scripts for easy alteration, compiling, and measurement. Parse UD-Czech (script already exists).</s>
|
<s>* Create a text corpus for various testing phases and progress measurements: 8 basic texts (200 words), 4 for each category (Wikipedia, News, chat/blogs/forums) (300 words), 6 advanced texts (500 words)
* Expand the bidix until it is 45% clean in testvoc for all categories</s>
| This will primarily involve preparing materials for working on the translator. Also, although I already have a solid foundation in Czech grammar, I would still like to improve upon it before I begin.
|-
| 1
| Work on expanding the bilingual dictionary, and the monolingual dictionaries where necessary. Create a new tagger for Czech. Test the tagger.
|
<s>* Expand the bidix until it is 60% clean in testvoc for all categories
* Improve bidix coverage to 50%
* Achieve a WER < 20% for 1 basic text</s>
| The bilingual dictionary currently has too few entries. In addition, Apertium needs a new POS tagger for Czech, as the current one leads to avoidable translation errors.
|-
| 2
* Improve bidix coverage to 60%
* Achieve a WER < 20% for 2 basic texts
| Expanding the bilingual dictionary and improving lexical selection will be ongoing tasks, as these are currently responsible for the majority of errors.
|-
| 3
* Improve bidix coverage to 65%
* Achieve a WER < 15% for 2 basic texts
| For transfer rules, it will be important here to focus on the transfer of grammatical tags, and specifically on the treatment of reflexive verbs (i.e. verbs which are reflexive in Czech but not in Russian). This could potentially be solved with a specific tag in the dictionary.
|-
| 4
* Improve bidix coverage to 70%
* Achieve a WER < 10% on 2 basic texts
| Prepositions often have multiple possible translations. The subject is omitted more often in Czech than in Russian. The ancillary cases, while a minor point, make translations more fluent.
|-
|5
* Achieve WER < 10% on 1 basic text
* Achieve WER < 20% on 1 advanced text
| Many common constructions are currently absent from the dictionary and the transfer rules. Since there is not a huge amount to cover, at this point I would like to assess and discuss possibly working on the Rus -> Ces direction. Because this is uncertain, the following plans assume continued work on the Ces -> Rus direction.
|-
|6
* Improve bidix coverage to 80%
* Achieve WER < 20% on texts from Wikipedia (4 texts)
| This week will be based on evaluating the translator's grammatical performance as a whole. I will focus solely on closing gaps in the transfer rules.
|-
|7
|
* Achieve WER < 20% on texts from News (4 texts)
| It is important to further test the translator on specific topics (''academic writing, literature, law, etc.'') and assess its capabilities in each of these areas individually. Coverage is one of the main goals, and testing on specific topics can reveal both lexical and grammatical deficiencies in the translator. Effort should be focused on the areas that will most benefit potential users.
|-
|8
* Improve bidix coverage to 85%
* Achieve WER < 20% on texts from online chat/blogs/forums (4 texts)
| Having identified what to work on, this week I will continue to develop the dictionaries with terms common to the selected topics. I suspect there will also be room to expand the transfer rules to handle overlooked grammatical errors that arise during testing.
|-
|9
|
* Achieve WER < 15% on texts from all categories
| Here I will identify which areas of the dictionaries and transfer rules still need to be developed. The changes to implement will largely depend on the evaluation.
|-
|10
|
* Achieve WER < 15% on 2 advanced texts
| Testing begins, along with work on identifying the key areas where the translator needs improvement.
|-
|11
* Improve bidix coverage to 90%
* Achieve WER < 10% on 2 advanced texts
| This week is devoted to fixing bugs and the key problems and weak points identified in the previous week.
|-
|12
|
* Achieve WER < 10% on all previous advanced texts and 1 new advanced text (6 texts)
| Similar to the previous week. The last month focuses on using time as efficiently as possible to produce a quality finished product.
|-
|13
|
Done!
| Final week. Everything should already be ready.
|-
|}

Revision as of 05:47, 5 June 2017

Timeline

Week Dates Objective Measures/targets
0 now - 30/05 Find resources for improving the bilingual dictionary. Work on expanding the bilingual dictionary's coverage. Study Czech grammar. Write Bash scripts for easy alteration, compiling, and measurement. Parse UD-Czech (script already exists).
  • Create a text corpus for various testing phases and progress measurements: 8 basic texts (200 words), 4 for each category (Wikipedia, News, chat/blogs/forums) (300 words), 6 advanced texts (500 words)
  • Expand the bidix until it is 45% clean in testvoc for all categories
1 30/05 - 04/06 Work on expanding the bilingual dictionary, and the monolingual dictionaries where necessary. Create a new tagger for Czech. Test the tagger.
  • Expand the bidix until it is 60% clean in testvoc for all categories
  • Improve bidix coverage to 50%
  • Achieve a WER < 20% for 1 basic text
2 05/06 - 11/06 Continue to expand the bilingual dictionary where needed (a constant task), and work on lexical selection. Work on rudimentary transfer rules, and begin on transfer rules for verbs.
  • Expand the bidix until it is 80% clean in testvoc for all categories
  • Improve bidix coverage to 60%
  • Achieve a WER < 20% for 2 basic texts
3 12/06 - 18/06 Continue work on transfer rules for verbs.
  • Expand the bidix until it is 100% clean in testvoc for all categories
  • Improve bidix coverage to 65%
  • Achieve a WER < 15% for 2 basic texts
4 19/06 - 25/06 Work on specific transfer rules for prepositions and subject addition/placement. Create rules for the ancillary Russian cases in the monolingual dictionary (second prepositional (locative), partitive, etc.).
  • Improve bidix coverage to 70%
  • Achieve a WER < 10% on 2 basic texts
5 26/06 - 02/07 Write dictionary entries and transfer rules for specific grammatical constructions such as aby, kdyby, to... ani..., etc. Start/finish evaluation #1.
  • Checkpoint: Measure the project's progress and discuss the feasibility of working on Rus -> Ces. Do a final check of the transfer rules written in weeks 3-5. Test on texts and try to "break" the translator.
  • Improve bidix coverage to 75%
  • Achieve WER < 10% on 1 basic text
  • Achieve WER < 20% on 1 advanced text
6 03/07 - 09/07 Add new transfer rules and fix the transfer-rule issues identified in the previous week. Begin testing on thematic texts.
  • Improve bidix coverage to 80%
  • Achieve WER < 20% on texts from Wikipedia (4 texts)
7 10/07 - 16/07 Test the translator in these different thematic areas. Identify key places for improvement and begin working on them. Compile key terms for each topic.
  • Achieve WER < 20% on texts from News (4 texts)
8 17/07 - 23/07 Work on expanding the dictionaries in the previously identified areas. Use transfer rules to solve grammatical issues that arise in the given thematic areas.
  • Improve bidix coverage to 85%
  • Achieve WER < 20% on texts from online chat/blogs/forums (4 texts)
9 24/07 - 30/07 Test performance increases in the selected topic areas. Ascertain what still needs to be improved. Work on fixes for issues. Start/finish evaluation #2.
  • Achieve WER < 15% on texts from all categories
10 31/07 - 06/08 Test the performance of the translator as a whole. Identify problematic areas.
  • Achieve WER < 15% on 2 advanced texts
11 07/08 - 13/08 Bug fixes and correcting the most problematic areas
  • Improve bidix coverage to 90%
  • Achieve WER < 10% on 2 advanced texts
12 14/08 - 20/08 Documentation (will try to do this gradually), final testing and bug fixes
  • Achieve WER < 10% on all previous advanced texts and 1 new advanced text (6 texts)
13 21/08 - 29/08 Work on final evaluations and other bureaucratic necessities.

Done!
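The coverage targets measure how much of a corpus the translator's dictionaries recognise. The plan does not define the exact procedure, so here is a minimal Python sketch of one common naive measure: the share of tokens that receive at least one analysis. It assumes the text has already been run through the Czech analyser (for example `lt-proc ces-rus.automorf.bin < corpus.txt`, where the file name is only illustrative) and that unknown words appear in the stream format as `^surface/*surface$`. Escaped `^`, `$` or `/` inside tokens are not handled. A bidix-specific variant could instead count `@` marks in the full pipeline output.

```python
import re

# One lexical unit in Apertium stream format, e.g. ^kočka/kočka<n><f><sg><nom>$
# Simplification: escaped \^ or \$ inside a token are not handled.
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def naive_coverage(stream: str) -> float:
    """Return the share of lexical units that received at least one analysis."""
    units = LEXICAL_UNIT.findall(stream)
    if not units:
        return 0.0
    # The analyser writes an unknown word as ^surface/*surface$,
    # i.e. its only "analysis" starts with an asterisk.
    unknown = sum(
        1 for unit in units
        if any(analysis.startswith("*") for analysis in unit.split("/")[1:])
    )
    return 1 - unknown / len(units)


if __name__ == "__main__":
    # Hypothetical analyser output (tags are illustrative): one of four tokens is unknown.
    sample = ("^kočka/kočka<n><f><sg><nom>$ ^spí/spát<vblex><pri><p3><sg>$ "
              "^xyz/*xyz$ ^./.<sent>$")
    print(f"coverage: {naive_coverage(sample):.1%}")
```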
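Most targets in the timeline are word error rate (WER) thresholds. WER is the word-level edit distance (substitutions, insertions, and deletions) between the translator's output and a reference translation, divided by the number of words in the reference. In Apertium work the reference is typically a post-edited version of the output, and the `apertium-eval-translator` tool is the usual way to compute it. The Python sketch below only illustrates the arithmetic: it splits on whitespace and does no punctuation or case normalisation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            curr[j] = min(prev[j] + 1,      # delete ref_word
                          curr[j - 1] + 1,  # insert hyp_word
                          substitution)     # substitute or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical raw MT output vs. its post-edited reference: 3 substitutions in 4 words.
    mt_output = "я читать интересный книга"
    post_edited = "я читаю интересную книгу"
    print(f"WER: {word_error_rate(post_edited, mt_output):.0%}")  # WER: 75%
```

Under this definition, "WER < 20% for 1 basic text" means that fewer than one word in five of the reference needs an edit.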