= Menu =

==== Getting Started ====

* [[Quality_control_framework/Installation|Installation]]
* [[Quality_control_framework/Usage|Usage]]

==== Technical Documentation ====

* [[Quality_control_framework/Proposal|Proposal]]
* [[Quality_control_framework/XML_Schema|XML Schema]]
= Notes =
== Community Bonding Period ==

=== Week 1 — 25th April ===

* Must demonstrate that setuptools can allow a prefix-based installation for non-root users before the end of the bonding period.
* Emailed Francis written proof that setuptools adequately meets the expectations and requirements.

=== Week 2 — 2nd May ===

* Converted the LaTeX source to MediaWiki format and placed it below this section for annotation.
* Completed the example regtest.py.
* Added the Installation and Usage pages; uploaded the initial files.

=== Week 3 — 9th May ===

* Fixed a Python regression-related bug in regtest.py.
* Fixed a personal regression in setup.py.
* Plan to add autogen.sh for configuration.
* Consider using virtualenv for rootless installations.
* Fixed the installation instructions.
* SVN and git are now synced.

== Coding Period ==

=== Week 1 — 23rd May ===

* Completed autogen.sh.
== Todo ==

=== Tests and stats ===

==== Monolingual corpus ====
* dicts: Coverage (see the sketch after this list)
* rules: Rule counting (CG, apertium-transfer)
* rules: number of rules
* dicts: number of entries (sl mono, sl-tl, tl mono) -- lttoolbox/hfst
* dicts: (monolingual) mean ambiguity
* system: translation speed (per module?)
* dicts: (bilingual) mean fertility -- e.g. number of translations per SL/TL word
* rules: for disambiguation, if there are both CG and the apertium tagger, how much work does CG do and how much does apertium-tagger do? (count the LUs input to CG, the LUs output from CG and the LUs output from apertium-tagger)
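A rough sketch of how the coverage and mean-ambiguity figures could be computed from analyser output. It assumes the corpus has already been run through the morphological analyser and is in the usual Apertium stream format, where each lexical unit looks like <code>^surface/analysis1/analysis2$</code> and unknown words are marked with <code>*</code>; the script and its details are illustrative, not part of apertium-quality.

<pre>
import re
import sys

# Lexical units in the Apertium stream format: ^surface/analysis1/analysis2$
LU_RE = re.compile(r'\^([^/^$]+)/([^$]+)\$')

def coverage_and_ambiguity(analysed_text):
    """Return (coverage, mean ambiguity) for analyser output in stream format."""
    total = 0      # all lexical units seen
    known = 0      # units with at least one real analysis
    analyses = 0   # total analyses over the known units
    for match in LU_RE.finditer(analysed_text):
        total += 1
        readings = match.group(2).split('/')
        # Unknown words come back as ^word/*word$ -- they count against coverage only.
        if readings[0].startswith('*'):
            continue
        known += 1
        analyses += len(readings)
    if total == 0:
        return 0.0, 0.0
    return known / total, (analyses / known if known else 0.0)

if __name__ == '__main__':
    cov, amb = coverage_and_ambiguity(sys.stdin.read())
    print('coverage: %.2f%%  mean ambiguity: %.2f' % (cov * 100, amb))
</pre>

It could then be run over a corpus with something like <code>apertium-destxt < corpus.txt | lt-proc xx.automorf.bin | python coverage.py</code>; the exact pipeline depends on the pair.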
==== Tests ====

* dictionary tests (e.g. hfst-tester)
* regression tests
* pending tests
* testvoc (see the sketch after this list)
* testvoc+bidixvoc (some language pairs have bilingual dictionaries with more than one translation for a given SL word; at the moment testvoc only ever tests the default translation, whereas testvoc+bidixvoc would test them all)
* generation test
* corpus test
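As a rough illustration of the testvoc item, a sketch that scans translated output for the usual Apertium debug markers: <code>#</code> for forms the generator could not produce and <code>@</code> for lexical-transfer failures. It assumes the expanded dictionary has already been pushed through the pair's translation pipeline (e.g. <code>lt-expand</code> plus the pair's translation mode); the script itself is illustrative only.

<pre>
import sys

# Apertium debug markers that indicate a testvoc failure:
#   '#' -- the generator could not produce a surface form
#   '@' -- lexical transfer failed (no bilingual dictionary entry)
BAD_MARKERS = ('#', '@')

def check_testvoc(lines):
    """Yield (line number, line) for every translated line containing a debug marker."""
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in BAD_MARKERS):
            yield number, line.rstrip('\n')

if __name__ == '__main__':
    failures = list(check_testvoc(sys.stdin))
    for number, line in failures:
        print('line %d: %s' % (number, line))
    print('%d problem(s) found' % len(failures))
    sys.exit(1 if failures else 0)
</pre>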
==== Parallel corpus ====

* WER, PER, BLEU against reference (see the sketch below)
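For WER and PER, something along the lines of the sketch below could work; BLEU is more involved and is probably better taken from an existing implementation such as NLTK's. This is one common formulation, and it assumes whitespace-tokenised hypothesis and reference sentences.

<pre>
from collections import Counter

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming Levenshtein distance over words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1] / len(ref) if ref else 0.0

def per(reference, hypothesis):
    """Position-independent error rate: like WER, but ignoring word order."""
    ref, hyp = reference.split(), hypothesis.split()
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref) if ref else 0.0

if __name__ == '__main__':
    print(wer('this is a test', 'this is test'))    # 0.25
    print(per('this is a test', 'a test this is'))  # 0.0
</pre>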
==== Graphs ====

* coverage over time (see the plotting sketch after this list)
* number of rules over time
* mean ambiguity over time
* number of dict entries over time
* translation speed over time
* WER/PER/BLEU over time
* percentage of regression tests passed over time
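As an illustration of the kind of graph intended, a minimal matplotlib sketch that plots coverage over time. The data points are made up purely for illustration; in practice they would presumably be read from the stored statistics history.

<pre>
from datetime import date
import matplotlib.pyplot as plt

# Illustrative data points only -- real values would come from stored statistics.
history = [
    (date(2011, 5, 23), 71.2),
    (date(2011, 6, 6), 74.8),
    (date(2011, 6, 20), 78.1),
    (date(2011, 7, 4), 80.5),
]

dates = [point[0] for point in history]
coverage = [point[1] for point in history]

plt.plot(dates, coverage, marker='o')
plt.xlabel('date')
plt.ylabel('coverage (%)')
plt.title('Coverage over time')
plt.gcf().autofmt_xdate()  # tilt the date labels so they do not overlap
plt.savefig('coverage-over-time.png')
</pre>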
== Feature Requests ==

* Cache the wiki Regression test web page so that we can test when the wiki is offline or when stuck in airports with expensive wifi (see the sketch below)
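A minimal sketch of what that cache could look like: fetch the page while online and save a copy on disk, then fall back to that copy when the network is unavailable. The URL and cache path below are placeholders, not necessarily what apertium-quality would use.

<pre>
import os
import urllib.request

def fetch_with_cache(url, cache_path):
    """Fetch url, caching the body on disk; fall back to the cache when offline."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            body = response.read()
        os.makedirs(os.path.dirname(cache_path), exist_ok=True)
        with open(cache_path, 'wb') as cache:
            cache.write(body)
        return body
    except OSError:
        # Offline (or the wifi is too expensive to bother): use the cached copy if we have one.
        if os.path.exists(cache_path):
            with open(cache_path, 'rb') as cache:
                return cache.read()
        raise

if __name__ == '__main__':
    # Placeholder URL -- the real target would be the pair's regression-test page on the wiki.
    page = fetch_with_cache('https://wiki.apertium.org/wiki/Apertium-quality',
                            os.path.expanduser('~/.apertium-quality/regression-cache.html'))
    print('%d bytes' % len(page))
</pre>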
== Extensions ==

=== Sanity Tests ===

Simply allow the use of a sanity_tests directory inside a dictionary directory and, if one is found, run any scripts found there, storing their names and return values in the quality-stats.xml. This allows the scripts to be written in any language, provided they return a non-zero exit status on error. A rough sketch of such a runner follows the list below.

Possible tests:

* Superblank order test
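A minimal sketch of what that runner could look like, under the assumptions above: every executable file in <code>sanity_tests/</code> is run, and its name and exit status are recorded as XML elements. The element and attribute names are invented for illustration (the real ones belong to the quality-stats.xml schema), and for simplicity this writes a standalone file rather than merging into existing statistics.

<pre>
import os
import subprocess
import sys
import xml.etree.ElementTree as ET

def run_sanity_tests(dictionary_dir, stats_path=None):
    """Run every executable in <dictionary_dir>/sanity_tests and record the results."""
    if stats_path is None:
        stats_path = os.path.join(dictionary_dir, 'quality-stats.xml')
    tests_dir = os.path.join(dictionary_dir, 'sanity_tests')
    results = ET.Element('sanity-tests')  # element name is illustrative only
    if os.path.isdir(tests_dir):
        for name in sorted(os.listdir(tests_dir)):
            path = os.path.join(tests_dir, name)
            if not (os.path.isfile(path) and os.access(path, os.X_OK)):
                continue
            # A non-zero exit status means the test failed, whatever language it is written in.
            status = subprocess.run([os.path.abspath(path)], cwd=dictionary_dir).returncode
            ET.SubElement(results, 'test', name=name, value=str(status))
    ET.ElementTree(results).write(stats_path, encoding='utf-8', xml_declaration=True)

if __name__ == '__main__':
    run_sanity_tests(sys.argv[1] if len(sys.argv) > 1 else '.')
</pre>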