Difference between revisions of "Crimean Tatar and Turkish/Work plan"

From Apertium
Line 3: Line 3:
 
* '''a bidirectional Crimean Tatar-Turkish translator for translating Wikipedia articles''', with:
 
* '''a bidirectional Crimean Tatar-Turkish translator for translating Wikipedia articles''', with:
 
** >90% [[Calculating coverage|bidix-trimmed coverage]] on both Wikipedias,
 
** >90% [[Calculating coverage|bidix-trimmed coverage]] on both Wikipedias,
** single-stem-per-lexicon-testvoc and [[Testvoc#Corpus testvoc|Wikipedia-corpus-testvoc]] clean in both directions,
+
** [[Testvoc#Corpus testvoc|Wikipedia-corpus-testvoc]] and single-stem-per-lexicon-testvoc clean in both directions,
 
** [[WER]] < 25% in both directions.
 
** [[WER]] < 25% in both directions.
   
 
{|class=wikitable
 
{|class=wikitable
 
|-
 
|-
!rowspan="2"| Week !!rowspan="2"| Dates !!colspan="2"| Target !! !!colspan="2"| Achieved !!rowspan="2"| Evaluation !!rowspan="2"| Notes
+
!rowspan="2"| Week !!rowspan="2"| Dates !!colspan="3"| Target !! !!colspan="3"| Achieved !!rowspan="2"| Evaluation
 
|-
 
|-
! crh-tur cov. !! tur-crh cov. !! !! crh-tur cov. !! tur-crh cov.
+
! crh-tur cov. !! tur-crh cov. !! testvoc !! !! crh-tur cov. !! tur-crh cov. !! testvoc
 
|-
 
|-
 
| 1 || 07/06&mdash;11/06
 
| 1 || 07/06&mdash;11/06
| 65% || 65% || || || || ||
+
| 65% || 65% || pronouns || || || || ||
 
|-
 
|-
 
|-
 
|-
 
| 12 || 21/08&mdash;27/08
 
| 12 || 21/08&mdash;27/08
| 90% || 90% || || || || ||
+
| 90% || 90% || all categories || || || || ||
   
 
|}
 
|}
   
  +
Requirements for testvoc in week 1:
  +
  +
# all pronouns from Wikipedia corpora are translated without debug symbols
  +
# all pronouns that the transducers generate are translated without debug symbols (less important; work on this only once 1 is done)
  +
  +
To achieve 1:
  +
  +
* analyse corpora with crh-morph/tur-morph mode
  +
* grep pronouns
  +
* make sure they pass through the rest of the pipeline without getting @ or #
  +
  +
To achieve 2:
  +
  +
* in 'Root' lexicon of the .lexc files, comment out everything except Pronouns
  +
* generate pronouns with <code>hfst-fst2strings crh/tur.automorf.hfst</code>
  +
* make sure they pass through the rest of the pipeline without getting @ or #
  +
  +
We don't want to spend too much time on forms that the transducers probably over-generate, which is why we focus on 1 first.
  +
  +
----
  +
  +
The plan below might change later.
   
 
{|class=wikitable
 
{|class=wikitable

Revision as of 00:56, 7 June 2017

What selimcan expects:

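  • a bidirectional Crimean Tatar-Turkish translator for translating Wikipedia articles, with:
      ◦ >90% bidix-trimmed coverage on both Wikipedias,
      ◦ Wikipedia-corpus-testvoc and single-stem-per-lexicon-testvoc clean in both directions,
      ◦ WER < 25% in both directions.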
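{|class=wikitable
|-
!rowspan="2"| Week !!rowspan="2"| Dates !!colspan="3"| Target !! !!colspan="3"| Achieved !!rowspan="2"| Evaluation
|-
! crh-tur cov. !! tur-crh cov. !! testvoc !! !! crh-tur cov. !! tur-crh cov. !! testvoc
|-
| 1 || 07/06&mdash;11/06
| 65% || 65% || pronouns || || || || ||
|-
| 12 || 21/08&mdash;27/08
| 90% || 90% || all categories || || || || ||
|}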
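
The coverage targets in the table can be checked with a small script. The following is a minimal sketch of naive coverage (the share of tokens that receive at least one analysis), not the project's actual coverage script. It assumes the analyser output is in Apertium stream format, where an unknown word is written as <code>^word/*word$</code>; the function name <code>coverage</code> is hypothetical.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LU_RE = re.compile(r"\^([^/^$]*)/([^$]*)\$")


def coverage(analysed_text):
    """Return the fraction of lexical units that have a known analysis.

    Assumption: unknown words are marked with a leading '*' in the
    analysis part, e.g. ^xyz/*xyz$.
    """
    units = LU_RE.findall(analysed_text)
    if not units:
        return 0.0
    known = sum(1 for _, analyses in units if not analyses.startswith("*"))
    return known / len(units)
```

Run it over the analysed corpus for each direction and compare the result with the week's target.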

Requirements for testvoc in week 1:

  1. all pronouns from Wikipedia corpora are translated without debug symbols
  2. all pronouns that the transducers generate are translated without debug symbols (less important; work on this only once 1 is done)

To achieve 1:

  • analyse corpora with crh-morph/tur-morph mode
  • grep pronouns
  • make sure they pass through the rest of the pipeline without getting @ or #
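
The three steps above can be sketched in Python. This is only a sketch under stated assumptions, not the project's actual tooling: it assumes the morphological analyser output is in Apertium stream format (<code>^surface/analysis$</code>) and that pronoun analyses carry a <code>&lt;prn&gt;</code> tag. The function names and the sample tags are made up for illustration.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LU_RE = re.compile(r"\^([^/^$]*)/([^$]*)\$")

# A debug marker is an @ or # directly in front of a word character.
DEBUG_RE = re.compile(r"(?<!\w)[@#]\w")


def pronoun_forms(analysed_text):
    """Return surface forms with at least one <prn> analysis (the 'grep' step)."""
    forms = []
    for surface, analyses in LU_RE.findall(analysed_text):
        if any("<prn>" in analysis for analysis in analyses.split("/")):
            forms.append(surface)
    return forms


def has_debug_symbol(translated_line):
    """Return True if the translated line contains an @ or # marker.

    This is a crude check: literal @ or # at the start of a word in the
    source text would also be flagged.
    """
    return DEBUG_RE.search(translated_line) is not None


# Hypothetical analyser output; the tag sequences are illustrative only.
SAMPLE = "^men/men<prn><pers><p1><sg><nom>$ ^keldim/kel<v><iv><past><p1><sg>$"
```

The surface forms returned by <code>pronoun_forms(SAMPLE)</code> would then be translated with the full pipeline, and every output line for which <code>has_debug_symbol</code> returns true points at a form that still needs work.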

To achieve 2:

  • in 'Root' lexicon of the .lexc files, comment out everything except Pronouns
  • generate pronouns with hfst-fst2strings crh/tur.automorf.hfst
  • make sure they pass through the rest of the pipeline without getting @ or #
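
The generated forms need to be turned into input for the rest of the pipeline. Below is a minimal sketch of that conversion. It assumes each line printed for the analyser is a <code>surface:analysis</code> pair, that neither side contains a further colon, and that pronoun analyses carry a <code>&lt;prn&gt;</code> tag. The function name <code>to_lexical_units</code> is hypothetical, and how the units are fed to the later pipeline stages is left open.

```python
def to_lexical_units(pair_lines):
    """Turn 'surface:analysis' lines into ^analysis$ lexical units.

    Only analyses containing <prn> are kept, and duplicates are dropped
    while preserving the original order.
    """
    seen = set()
    units = []
    for line in pair_lines:
        line = line.strip()
        if not line:
            continue
        # Assumption: the first colon separates surface form and analysis.
        _surface, sep, analysis = line.partition(":")
        if not sep or "<prn>" not in analysis:
            continue
        if analysis not in seen:
            seen.add(analysis)
            units.append("^" + analysis + "$")
    return units
```

Filtering on the tag here is a shortcut for experimenting; commenting out the other entries in the 'Root' lexicon, as described above, avoids generating the unwanted forms in the first place.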

We don't want to spend too much time on forms that the transducers probably over-generate, which is why we focus on 1 first.


The plan below might change later.

Week Dates Coverage Achieved Evaluation
3 22nd May — 28th May 40% 43.9%
* Add all non-inflecting words
* Finish challenge text (no *,#)
* Do baseline evaluation (WER)
Official start
4 29th May — 4th June 40%
* Break
5 5th June — 11th June 65%
* ?
6 12th June — 18th June 70%
* ?
* ?
7 19th June — 25th June 80%
Phase 1 evaluation
Deliverable: All closed classes + numerals testvoc clean
8 26th June — 2nd July 84%
* ?
* ?
9 3rd July — 9th July 82%
* ?
10 10th July — 16th July 84%
* ?
* ?
11 17th July — 23rd July 86%
Phase 2 evaluation
Deliverable: Nouns, adjectives testvoc clean
* ?
12 24th July — 30th July 88%
* ?
13 1st August — 6th August 89%
* ?
14 7th August — 13th August 90%
* ?
15 14th August — 20th August 91%
* ?
16 21st August — 27th August 92%
Final evaluation
Final deliverable: Full MT system, testvoc clean.
* Evaluation
* Write paper
17 28th August — 3rd September
* Write paper
18 4th September — 6th September
* Write paper
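
The plan calls for a baseline evaluation using WER (word error rate). As a rough illustration of the metric, here is a minimal sketch that computes the word-level edit distance between a reference translation and the system output, divided by the reference length. It assumes whitespace tokenisation and is not the evaluation tool the project uses; the function name <code>word_error_rate</code> is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, a four-word reference whose translation has one substituted word and one missing word gives a WER of 2/4 = 0.5.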