English and Kazakh/Work plan (GSOC 2014)
< English and Kazakh
Jump to navigation
Jump to search
Revision as of 14:50, 23 April 2014 by Francis Tyers (talk | contribs)
Corpora
Downloads
- SETimes: http://nlp.ffzg.hr/data/corpora/setimes/setimes.en-tr.txt.tgz
- EuroParl: http://www.statmt.org/europarl/v7/es-en.tgz
- NewsCommentary: http://www.statmt.org/wmt13/training-parallel-nc-v8.tgz
- Wikipedia: http://dumps.wikimedia.org/enwiki/20140402/enwiki-20140402-pages-articles.xml.bz2
Setting up corpora
Coverage targets
Date | Corpus | Target reached |
Notes | |||
---|---|---|---|---|---|---|
SETimes | EuroParl | Wikipedia | NewsCommentary | |||
23-04-2014 | 72.90% | ? | ? | ? | √ | Initial value |
30-04-2014 | ||||||
07-05-2014 | ||||||
22-08-2014 | 90.00% | 90.00% | 90.00% | 90.00% | Final target |