Difference between revisions of "User:Ilnar.salimzyan/GSoC2014/Application"

Line 11: Line 11:
* acceptance tests for an Apertium MT system are: regression tests on the wiki, a corpus test (WER and number of [*@#] errors) and testvoc. Unit testing an Apertium MT system means testing its modules (modes). Figure out how to unit test each module.
** one should be able to run the tests without an internet connection. Keeping a copy of the 'regression tests' HTML page in /dev solves this problem, but it doesn't allow adding new tests while offline. One way to deal with that is to keep a local copy of the regression tests in wiki format, so that if you add new tests while flying over the Atlantic, you can copy-paste them to the wiki page of the pair later.

== Community-bonding period ==

'''Deliverables 0:'''

# testvoc script(s) that don't take forever to run (consider footnote #5 in the proposal)
# OCR'd public-domain dictionary
# parallel corpus in /corpa expanded with texts representing domains the system could potentially be applied to (500 sentences?)


[[Category:GSoC_2014_Student_proposals|Ilnar.salimzyan]]

Revision as of 23:44, 24 April 2014

You can find my proposal for GSoC 2014 here:

Post-application period

  • work on the 'James and Mary' translation
    • get rid of the debugging symbols
    • get the baseline WER
  • get permission to use one of the modern government-funded Tatar-Russian dictionaries under a free license and digitize it, or fall back to one of the dictionaries in the public domain and scan that
  • read the documentation on chunking-based transfer and papers describing other Apertium pairs for distant languages
  • acceptance tests for an Apertium MT system are: regression tests on the wiki, a corpus test (WER and number of [*@#] errors) and testvoc. Unit testing an Apertium MT system means testing its modules (modes). Figure out how to unit test each module.
    • one should be able to run the tests without an internet connection. Keeping a copy of the 'regression tests' HTML page in /dev solves this problem, but it doesn't allow adding new tests while offline. One way to deal with that is to keep a local copy of the regression tests in wiki format, so that if you add new tests while flying over the Atlantic, you can copy-paste them to the wiki page of the pair later.
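
A rough sketch of how the corpus test and the offline regression tests could fit together, not the actual Apertium tooling. It assumes the local wiki-format file holds lines shaped like `* {{test|tat|source|expected}}` (the template name and field order are an assumption here), computes WER as word-level edit distance divided by reference length, and counts the [*@#] symbols in a translation. The function names are hypothetical.

```python
import re

# Assumed shape of a wiki-format test line: * {{test|tat|source|expected}}
TEST_LINE = re.compile(r"\{\{test\|([^|}]*)\|([^|}]*)\|([^|}]*)\}\}")
# The debugging symbols counted by the corpus test.
DEBUG_MARK = re.compile(r"[*@#]")


def parse_wiki_tests(wiki_text):
    """Return (lang, source, expected) triples found in wiki-format text."""
    return [tuple(part.strip() for part in match.groups())
            for match in TEST_LINE.finditer(wiki_text)]


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        # Convention chosen for this sketch: empty reference scores 0 or 1.
        return 0.0 if not hyp else 1.0
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1] / len(ref)


def count_debug_marks(translation):
    """Count occurrences of *, @ and # in a translated sentence."""
    return len(DEBUG_MARK.findall(translation))
```

Since the tests are read from a plain local file, new ones can be appended offline and pasted to the pair's wiki page later; the translation step itself (running the pair's mode on each source sentence) is left out of the sketch.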

Community-bonding period

Deliverables 0:

  1. testvoc script(s) that don't take forever to run (consider footnote #5 in the proposal)
  2. OCR'd public-domain dictionary
  3. parallel corpus in /corpa expanded with texts representing domains the system could potentially be applied to (500 sentences?)