Talk:Apertium-quality

From Apertium

Latest revision as of 18:20, 21 August 2011

Menu

Getting Started

  • [[Quality_control_framework/Installation|Installation]]
  • [[Quality_control_framework/Usage|Usage]]

Technical Documentation

  • [[Quality_control_framework/Proposal|Proposal]]
  • [[Quality_control_framework/XML_Schema|XML Schema]]

Notes

Community Bonding Period

Week 1 — 25th April

  • Must demonstrate that setuptools can allow a prefix-based installation for non-root users before end of bonding period
  • Emailed Francis written proof that setuptools adequately meets the expectations and requirements.

Week 2 — 2nd May

  • Converted LaTeX source to MediaWiki format, and placed it below this section for annotation.
  • Completed example regtest.py
  • Added Installation and Usage pages, uploaded initial files.

Week 3 — 9th May

  • Fixed a Python regression-related bug in regtest.py
  • Fixed a personal regression in setup.py
  • Plan to add autogen.sh for config
  • Consider using virtualenv for rootless installations
  • Fixed installation instructions
  • SVN and git now synched

Coding Period

Week 1 — 23rd May

  • Completed autogen.sh

Todo

Tests and stats

Monolingual corpus

  • dicts: Coverage
  • rules: Rule counting (CG, apertium-transfer)
  • rules: number of rules
  • dicts: number of entries (sl mono, sl-tl, tl mono) -- lttoolbox/hfst
  • dicts: (monolingual) mean ambiguity
  • system: translation speed (per module?)
  • dicts: (bilingual) mean fertility -- e.g. number of translations per SL/TL word
  • rules: for disambiguation, if there is CG + apertium-tagger, how much work does CG do and how much does apertium-tagger do? (count LU input to CG, LU output from CG and LU output from apertium-tagger)
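
As an illustration of the coverage and mean-ambiguity items above, here is a minimal sketch that reads analyser output in the Apertium stream format (`^surface/analysis1/analysis2$`, with unknown words shown as `^word/*word$`). The function name `analysis_stats` is hypothetical, mean ambiguity is assumed here to mean analyses per known word, and escaped characters such as `\/` are not handled, so treat it as a starting point rather than the framework's actual method.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (simplification: escaped characters such as \/ or \^ are not handled)
LU_PATTERN = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def analysis_stats(stream_text):
    """Return (coverage, mean_ambiguity) for analyser output in stream format.

    coverage       = known lexical units / all lexical units
    mean_ambiguity = analyses per known lexical unit (an assumed definition)
    """
    total = known = analyses = 0
    for match in LU_PATTERN.finditer(stream_text):
        total += 1
        readings = match.group(2).split("/")
        # The analyser marks an unknown word with a leading '*', e.g. ^foo/*foo$
        if readings[0].startswith("*"):
            continue
        known += 1
        analyses += len(readings)
    coverage = known / total if total else 0.0
    mean_ambiguity = analyses / known if known else 0.0
    return coverage, mean_ambiguity
```

The input would typically be the output of the monolingual analyser run over a corpus; recording the two numbers per revision would also feed the "over time" graphs listed further down.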

Tests

  • dictionary tests (e.g. hfst-tester)
  • regression tests
  • pending tests
  • testvoc
  • testvoc+bidixvoc: some language pairs have bilingual dictionaries with more than one translation for a given SL word. At the moment testvoc only ever tests the default translation; testvoc+bidixvoc will test them all.
  • generation test
  • corpus test

Parallel corpus

  • WER, PER, BLEU against reference
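
For reference, WER and PER against a single reference translation can be sketched as below. This assumes whitespace tokenisation and one reference per sentence; the function names are illustrative and not part of any existing Apertium tool, and BLEU is left out because it needs n-gram counting and a brevity penalty.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def per(reference, hypothesis):
    """Position-independent error rate: like WER, but word order is ignored."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    # Multiset intersection counts the words shared by both sentences
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)
```

Because PER ignores order, a hypothesis that contains the right words in the wrong positions scores 0 on PER but above 0 on WER, which is why the two are usually reported together.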

Graphs

  • coverage over time
  • number of rules over time
  • mean ambiguity over time
  • number of dict entries over time
  • translation speed over time
  • WER/PER/BLEU over time
  • percentage of regression tests passed over time

Feature Requests

  • Cache the wiki Regression test web page so that we can test when the wiki is offline or when stuck in airports with expensive wifi
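
One possible shape for this request, as a hedged sketch: try the network first, refresh a local copy on success, and fall back to that copy when the download fails. The name `fetch_with_cache` and its `fetch` parameter are hypothetical, and the cache location is left to the caller.

```python
import os
import urllib.request


def fetch_with_cache(url, cache_path, fetch=None):
    """Return the page text, falling back to a cached copy when offline.

    `fetch` is an optional callable taking a URL and returning text; it
    exists so the download step can be swapped out (an assumption of this
    sketch, not an existing API).
    """
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u, timeout=10) as response:
                return response.read().decode("utf-8")
    try:
        text = fetch(url)
    except OSError:
        # urllib.error.URLError and socket timeouts are subclasses of OSError
        if not os.path.exists(cache_path):
            raise
        with open(cache_path, encoding="utf-8") as cached:
            return cached.read()
    # Refresh the cache after every successful download
    with open(cache_path, "w", encoding="utf-8") as cached:
        cached.write(text)
    return text
```

With this shape, the regression tests would run against the last successfully downloaded copy of the wiki page whenever the wiki cannot be reached.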

Extensions

Sanity Tests

Simply allow the use of a sanity_tests directory in a dictionary directory and, if one is found, run any scripts in it, storing each script's name and return value in quality-stats.xml. This allows the scripts to be written in any language, provided they return a non-zero value on error.

Possible tests:

  • Superblank order test
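
The sanity-test mechanism described in this section could look roughly like the following sketch. The function name `run_sanity_tests` and the XML element and attribute names (`sanity-tests`, `test`, `name`, `returncode`) are assumptions made for illustration; merging the result into quality-stats.xml is left out because that depends on the schema.

```python
import os
import subprocess
import xml.etree.ElementTree as ET


def run_sanity_tests(dict_dir):
    """Run every executable in <dict_dir>/sanity_tests and collect results.

    Returns an XML element with one child per script, recording the script
    name and its return code (element names are hypothetical).
    """
    root = ET.Element("sanity-tests")
    base = os.path.abspath(dict_dir)
    test_dir = os.path.join(base, "sanity_tests")
    if not os.path.isdir(test_dir):
        return root  # no sanity_tests directory: nothing to run
    for name in sorted(os.listdir(test_dir)):
        path = os.path.join(test_dir, name)
        # Only run regular files marked executable; the script's own
        # shebang line decides which language it is written in
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue
        result = subprocess.run([path], cwd=base,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        ET.SubElement(root, "test", name=name,
                      returncode=str(result.returncode))
    return root
```

Relying on the executable bit and the return code keeps the runner language-agnostic, which matches the intent that a non-zero return value is the only contract a script has to honour.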