User:Ilnar.salimzyan/GSoC2014

Post-application period

work on the 'James and Mary' translation
- ~~get rid of the debugging symbols~~
- get the baseline WER
get permission to use one of the modern government-funded Tatar-Russian dictionaries under a free license and digitize it, or fall back to one of the public-domain dictionaries and scan that
read documentation on chunking-based transfer and papers describing other Apertium pairs for distant languages
- ~~Chunking~~, ~~Chunking: A full example~~, sme-nob paper, eus-eng paper, eng-kaz paper.
acceptance tests for an Apertium MT system are: regression tests on the wiki, a corpus test (WER and number of [*@#] errors) and testvoc. Unit testing an Apertium MT system means testing its modules (modes). Figure out how to unit test each module.
- one should be able to run the tests without an internet connection. Keeping a copy of the 'regression tests' HTML page in /dev solves this problem, but it doesn't allow adding new tests while offline. One way to deal with that is to keep a local copy of the regression tests in wiki format, so that if you add new tests while flying over the Atlantic, you can copy-paste them to the pair's wiki page later.
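
The two corpus-test numbers mentioned above (WER and the number of [*@#] errors) could be computed offline with something like the sketch below. It is only an illustration, not the script the pair actually uses: the function names `word_error_rate` and `count_marked_words` are made up here, tokenization is plain whitespace splitting, and a marked word is assumed to be any token that starts with `*` (unknown word), `@` (missing in the bilingual dictionary) or `#` (generation error).

```python
import re


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper: tokens are split on whitespace, and
    substitutions, insertions and deletions all cost 1.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = distance between the first i-1 reference words
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Assumption: a marker counts only at the start of a whitespace-delimited token.
MARKED_WORD = re.compile(r"(?<!\S)[*@#]\S+")


def count_marked_words(text: str) -> dict:
    """Count tokens starting with '*', '@' or '#' in translated output."""
    counts = {"*": 0, "@": 0, "#": 0}
    for match in MARKED_WORD.finditer(text):
        counts[match.group()[0]] += 1
    return counts
```

Running `word_error_rate` on the post-edited reference and the raw system output for the 'James and Mary' text would give the baseline figure, and `count_marked_words` on the same output gives the error counts to track between commits.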

Deliverables 0:

testvoc script(s) which don't take forever to run (consider footnote #5 in the proposal)
OCR'd public domain dictionary
parallel corpus in /corpa is expanded with texts representing domains the system could potentially be applied to (500 sentences?)