Difference between revisions of "Talk:Apertium-quality"
Revision as of 07:27, 17 June 2011
Menu
Getting Started
Technical Documentation
Notes
Community Bonding Period
Week 1 — 25th April
- Must demonstrate that setuptools can allow a prefix-based installation for non-root users before end of bonding period
- Emailed Francis a written proof of setuptools adequately meeting expectations and requirements.
Week 2 — 2nd May
- Converted LaTeX source to Wikimedia format, and placed below this section for annotation.
- Completed example regtest.py
- Added Installation and Usage pages, uploaded initial files.
Week 3 — 9th May
- Fixed a Python regression-related bug in regtest.py
- Fixed a personal regression in setup.py
- Plan to add autogen.sh for config
- Consider using virtualenv for rootless installations
- Fixed installation instructions
- SVN and git now synched
Coding Period
Week 1 — 23rd May
- Completed autogen.sh
Todo
Tests and stats
Monolingual corpus
- dicts: Coverage
- rules: Rule counting (CG, apertium-transfer)
- rules: number of rules
- dicts: number of entries (sl mono, sl-tl, tl mono) -- lttoolbox/hfst
- dicts: (monolingual) mean ambiguity
- system: translation speed (per module?)
- dicts: (bilingual) mean fertility -- i.e. number of translations per SL/TL word
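As a rough illustration of the coverage and mean ambiguity figures listed above, here is a minimal sketch that reads analyser output in the Apertium stream format (`^surface/analysis/...$`, with unknown words marked by a leading `*`). The function name is hypothetical, escaped slashes inside a lexical unit are not handled, and averaging ambiguity over known tokens only is an assumption rather than a decision recorded on this page.

```python
import re

# Matches one lexical unit in the Apertium stream format: ^surface/analysis/...$
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def coverage_and_ambiguity(analysed_text):
    """Return (coverage, mean_ambiguity) for analyser output.

    coverage: share of tokens that received at least one analysis.
    mean_ambiguity: average number of analyses per known token
    (assumption: unknown tokens are left out of the average).
    """
    total = known = analyses = 0
    for unit in LEXICAL_UNIT.findall(analysed_text):
        readings = unit.split("/")[1:]  # drop the surface form
        total += 1
        # Unknown words carry a single reading that starts with '*'.
        if not readings or readings[0].startswith("*"):
            continue
        known += 1
        analyses += len(readings)
    coverage = known / total if total else 0.0
    mean_ambiguity = analyses / known if known else 0.0
    return coverage, mean_ambiguity
```

Running it over a whole analysed corpus gives one pair of numbers per run, which is the kind of value the graphs further down would plot over time.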
Tests
- dictionary tests (e.g. hfst-tester)
- regression tests
- pending tests
- testvoc
- generation test
- corpus test
Parallel corpus
- WER, PER, BLEU against reference
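For reference, WER and PER can be sketched in a few lines. This is an illustrative sketch, not the tool's implementation: WER is taken as the word-level edit distance divided by the reference length, and PER uses one common position-independent formulation based on bag-of-words overlap. Both assume whitespace tokenisation and a non-empty reference. BLEU is left out because it needs n-gram counts and a brevity penalty.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def wer(reference, hypothesis):
    """Word error rate: edits needed to turn the hypothesis into the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)


def per(reference, hypothesis):
    """Position-independent error rate: like WER, but ignoring word order."""
    ref, hyp = reference.split(), hypothesis.split()
    # Multiset intersection counts words shared by both sentences.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return 1 - (matches - max(0, len(hyp) - len(ref))) / len(ref)
```

A reordered but otherwise correct translation is penalised by WER and not by PER, which is why the two are usually reported together.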
Graphs
- coverage over time
- number of rules over time
- mean ambiguity over time
- number of dict entries over time
- translation speed over time
- WER/PER/BLEU over time
- percentage of regression tests passed over time
Feature Requests
- Cache the wiki Regression test web page so that we can test when the wiki is offline or when stuck in airports with expensive wifi
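One possible shape for this request, as a sketch only: download the page when the wiki is reachable, save a local copy, and fall back to that copy when the download fails. The function names and the `cache_path` parameter are hypothetical, and the fetcher is passed in so the fallback can be exercised without a network.

```python
import os
import urllib.request


def fetch_url(url):
    """Download a page and return its body as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def load_regression_page(url, cache_path, fetch=fetch_url):
    """Return the page text, falling back to a local copy when offline."""
    try:
        text = fetch(url)
    except OSError:
        # urllib.error.URLError is a subclass of OSError, so network
        # failures land here; fall back to the last saved copy.
        if not os.path.exists(cache_path):
            raise
        with open(cache_path, encoding="utf-8") as cached:
            return cached.read()
    # Refresh the cache on every successful download.
    with open(cache_path, "w", encoding="utf-8") as cached:
        cached.write(text)
    return text
```

Refreshing on every successful download keeps the cached tests as recent as the last online run; a command-line switch to force the cached copy would avoid waiting for a timeout on a slow connection.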
Extensions
Sanity Tests
Simply allow the use of a sanity_tests directory in a dictionary directory: if it is found, run every script in it and store each script's name and return value in quality-stats.xml. This lets the scripts be written in any language, provided they return a non-zero value on error.
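The proposal could be sketched roughly as follows. This is an assumption-laden illustration: the layout of quality-stats.xml is not described here, so the `statistics`, `sanity-tests` and `test` element names and the `returncode` attribute are hypothetical, as is the function name. Scripts are expected to be executable and to pick their own interpreter through a shebang line.

```python
import os
import subprocess
import xml.etree.ElementTree as ET


def run_sanity_tests(dict_dir, stats_path):
    """Run each script in dict_dir/sanity_tests and record its exit status."""
    tests_dir = os.path.join(dict_dir, "sanity_tests")
    if not os.path.isdir(tests_dir):
        return None  # No directory: nothing to run.

    # Element and attribute names below are hypothetical.
    if os.path.exists(stats_path):
        tree = ET.parse(stats_path)
        root = tree.getroot()
    else:
        root = ET.Element("statistics")
        tree = ET.ElementTree(root)

    section = ET.SubElement(root, "sanity-tests")
    for name in sorted(os.listdir(tests_dir)):
        script = os.path.abspath(os.path.join(tests_dir, name))
        if not (os.path.isfile(script) and os.access(script, os.X_OK)):
            continue  # Skip subdirectories and non-executable files.
        # Run from the dictionary directory; a non-zero status means failure.
        result = subprocess.run([script], cwd=dict_dir)
        ET.SubElement(section, "test", name=name,
                      returncode=str(result.returncode))

    tree.write(stats_path, encoding="utf-8", xml_declaration=True)
    return section
```

Recording the raw return value rather than a pass/fail flag leaves room for scripts to signal different failure kinds with different codes.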