Difference between revisions of "Talk:Apertium-quality"
Latest revision as of 18:20, 21 August 2011
Menu
Getting Started
Technical Documentation
Notes
Community Bonding Period
Week 1: 25th April
- Must demonstrate that setuptools can allow a prefix-based installation for non-root users before end of bonding period
- Emailed Francis a written proof of setuptools adequately meeting expectations and requirements.
Week 2: 2nd May
- Converted LaTeX source to MediaWiki format, and placed below this section for annotation.
- Completed example regtest.py
- Added Installation and Usage pages, uploaded initial files.
Week 3: 9th May
- Fixed a Python regression-related bug in regtest.py
- Fixed a personal regression in setup.py
- Plan to add autogen.sh for config
- Consider using virtualenv for rootless installations
- Fixed installation instructions
- SVN and git are now synced
Coding Period
Week 1: 23rd May
- Completed autogen.sh
Todo
Tests and stats
Monolingual corpus
- dicts: Coverage
- rules: Rule counting (CG, apertium-transfer)
- rules: number of rules
- dicts: number of entries (sl mono, sl-tl, tl mono) -- lttoolbox/hfst
- dicts: (monolingual) mean ambiguity
- system: translation speed (per module?)
- dicts: (bilingual) mean fertility -- e.g. number of translations per SL/TL word
- rules: for disambiguation, if there is both CG and apertium-tagger, how much work does CG do and how much does apertium-tagger do? (count LU input to CG, LU output from CG and LU output from apertium-tagger)
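As an illustration of the coverage and mean ambiguity items above, here is a minimal sketch in Python. It assumes the corpus has already been run through the morphological analyser and is in Apertium stream format (`^surface/analysis1/analysis2$`, with unknown words marked as `^word/*word$`). The function name `corpus_stats` is hypothetical, escaped characters are not handled, and mean ambiguity is taken here as analyses per known token, which is one possible definition rather than a settled one.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (simplified: backslash-escaped ^, $ and / are not handled in this sketch)
LU_PATTERN = re.compile(r"\^([^$]*)\$")


def corpus_stats(analysed_text):
    """Return (coverage, mean_ambiguity) for analyser output.

    coverage       = known tokens / all tokens
    mean_ambiguity = analyses of known tokens / known tokens (assumed definition)
    """
    total = known = analyses = 0
    for match in LU_PATTERN.finditer(analysed_text):
        readings = match.group(1).split("/")[1:]  # drop the surface form
        total += 1
        # Unknown words carry a single reading that starts with an asterisk.
        if readings and not readings[0].startswith("*"):
            known += 1
            analyses += len(readings)
    coverage = known / total if total else 0.0
    mean_ambiguity = analyses / known if known else 0.0
    return coverage, mean_ambiguity


if __name__ == "__main__":
    sample = (
        "^the/the<det><def><sp>$ "
        "^books/book<n><pl>/book<vblex><pri><p3><sg>$ "
        "^blorf/*blorf$"
    )
    print(corpus_stats(sample))
```

The same counting loop could be pointed at the output of each disambiguation stage to compare how many readings remain after CG and after apertium-tagger.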
Tests
- dictionary tests (e.g. hfst-tester)
- regression tests
- pending tests
- testvoc
- testvoc+bidixvoc (some language pairs have bilingual dictionaries with more than one translation for a given SL word; at the moment testvoc only ever tests the default translation, whereas testvoc+bidixvoc will test them all)
- generation test
- corpus test
Parallel corpus
- WER, PER, BLEU against reference
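For reference, WER and PER can be computed along the following lines. This is only a sketch: it assumes whitespace tokenisation and a single reference per sentence, the function names are illustrative, and the PER formula used (bag-of-words mismatches over reference length) is one common formulation among several. BLEU is left out because it would normally come from an existing implementation.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def per(reference, hypothesis):
    """Position-independent error rate: like WER, but ignoring word order."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    # Multiset intersection counts words shared regardless of position.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sat on mat the"
    print(wer(reference, hypothesis), per(reference, hypothesis))
```

A reordered hypothesis is penalised by WER but not by PER, which is why reporting both is useful.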
Graphs
- coverage over time
- number of rules over time
- mean ambiguity over time
- number of dict entries over time
- translation speed over time
- WER/PER/BLEU over time
- percentage of regression tests passed over time
Feature Requests
- Cache the wiki Regression test web page so that we can test when the wiki is offline or when stuck in airports with expensive wifi
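One possible shape for this caching, sketched in Python: try the wiki first, save whatever comes back, and fall back to the saved copy when the network is unavailable. The names `load_regression_page`, `fetch_url` and the cache path are hypothetical and not part of the existing code; the `fetch` parameter is there only so the download step can be swapped out.

```python
import os
import urllib.request


def fetch_url(url):
    """Download a page and return its text (assumes UTF-8)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def load_regression_page(url, cache_path, fetch=fetch_url):
    """Return the page text, using the cached copy if the fetch fails."""
    try:
        text = fetch(url)
    except OSError:
        # urllib.error.URLError and socket timeouts are OSError subclasses,
        # so being offline ends up here.
        if not os.path.exists(cache_path):
            raise
        with open(cache_path, encoding="utf-8") as cached:
            return cached.read()
    # Fetch succeeded: refresh the cache for the next offline run.
    with open(cache_path, "w", encoding="utf-8") as cached:
        cached.write(text)
    return text
```

With no cached copy and no network, the original error is re-raised, so a first run still has to happen online.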
Extensions
Sanity Tests
Simply allow the use of a sanity_tests directory in a dictionary directory and, if it is found, run any scripts in it, storing each script's name and return value in quality-stats.xml. The scripts can then be written in any language, provided they return a non-zero value on error.
Possible tests:
- Superblank order test
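A rough sketch of how the sanity test runner could look in Python, assuming the scripts are marked executable and are run from the dictionary directory. The function name and the `sanity-tests`/`test` element and attribute names are made up for illustration; the actual quality-stats.xml layout is not specified here.

```python
import os
import subprocess
import xml.etree.ElementTree as ET


def run_sanity_tests(dict_dir):
    """Run every executable in dict_dir/sanity_tests.

    Returns an XML element recording each script's name and return code,
    or None if the directory does not exist. Element and attribute names
    are hypothetical.
    """
    dict_dir = os.path.abspath(dict_dir)
    tests_dir = os.path.join(dict_dir, "sanity_tests")
    if not os.path.isdir(tests_dir):
        return None
    root = ET.Element("sanity-tests")
    for name in sorted(os.listdir(tests_dir)):
        script = os.path.join(tests_dir, name)
        if not (os.path.isfile(script) and os.access(script, os.X_OK)):
            continue  # skip subdirectories and non-executable files
        # A non-zero return code signals a failed test.
        returncode = subprocess.call([script], cwd=dict_dir)
        ET.SubElement(root, "test", name=name, returncode=str(returncode))
    return root


if __name__ == "__main__":
    element = run_sanity_tests(".")
    if element is not None:
        print(ET.tostring(element, encoding="unicode"))
```

The returned element could then be appended to whatever tree is written out as quality-stats.xml.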