Difference between revisions of "Talk:Apertium-quality"

From Apertium
 
(10 intermediate revisions by 4 users not shown)
Line 1: Line 1:
= Menu =
==== Getting Started ====
* [[Quality_control_framework/Installation|Installation]]
* [[Quality_control_framework/Usage|Usage]]


==== Technical Documentation ====
* [[Quality_control_framework/Proposal|Proposal]]
* [[Quality_control_framework/XML_Schema|XML Schema]]


= Notes =
Line 29: Line 32:


== Todo ==
# Complete the todo.


===Tests and stats===


;Monolingual corpus
====Monolingual corpus====


* dicts: Coverage
Line 40: Line 40:
* rules: number of rules
* dicts: number of entries (sl mono, sl-tl, tl mono) -- lttoolbox/hfst
* dicts: mean ambiguity
* dicts: (monolingual) mean ambiguity
* system: translation speed (per module?)
* dicts: (bilingual) mean fertility -- e.g. number of translations per SL/TL word
* system: testvoc
* rules: for disambiguation, if there is cg + apertium-tagger, how much work does CG do and how much does apertium-tagger do? (count LU input to CG, LU output from CG and LU output from apertium-tagger)
* system: generation test
* system: corpus test


;Tests
====Tests====


* dictionary tests (e.g. hfst-tester)
* regression tests
* pending tests
* testvoc
* testvoc+bidixvoc (some language pairs have bilingual dictionaries with more than one translation for a given SL word; at the moment testvoc only ever tests the default translation, whereas testvoc+bidixvoc will test them all)
* generation test
* corpus test


;Parallel corpus
====Parallel corpus====


* WER, PER, BLEU against reference


;Graphs
====Graphs====


* coverage over time
Line 65: Line 68:
* WER/PER/BLEU over time
* percentage of regression tests passed over time

== Feature Requests ==
* Cache the wiki Regression test web page so that we can test when the wiki is offline or when stuck in airports with expensive wifi

== Extensions ==
=== Sanity Tests ===
Simply allow the use of a sanity_tests directory in a dictionary directory and, if it is found, run any scripts in it, storing each script's name and return value in quality-stats.xml. This allows the scripts to be written in any language, provided they return a non-zero value on error.

Possible tests:

* Superblank order test

Latest revision as of 18:20, 21 August 2011

Menu

Getting Started

Technical Documentation

Notes

Community Bonding Period

Week 1 — 25th April

  • Must demonstrate that setuptools can allow a prefix-based installation for non-root users before end of bonding period
  • Emailed Francis a written proof of setuptools adequately meeting expectations and requirements.

Week 2 — 2nd May

  • Converted LaTeX source to MediaWiki format, and placed it below this section for annotation.
  • Completed example regtest.py
  • Added Installation and Usage pages, uploaded initial files.

Week 3 — 9th May

  • Fixed a Python regression-related bug in regtest.py
  • Fixed a personal regression in setup.py
  • Plan to add autogen.sh for config
  • Consider using virtualenv for rootless installations
  • Fixed installation instructions
  • SVN and git are now in sync

Coding Period

Week 1 — 23rd May

  • Completed autogen.sh

Todo

Tests and stats

Monolingual corpus

  • dicts: Coverage
  • rules: Rule counting (CG, apertium-transfer)
  • rules: number of rules
  • dicts: number of entries (sl mono, sl-tl, tl mono) -- lttoolbox/hfst
  • dicts: (monolingual) mean ambiguity
  • system: translation speed (per module?)
  • dicts: (bilingual) mean fertility -- e.g. number of translations per SL/TL word
  • rules: for disambiguation, if there is cg + apertium-tagger, how much work does CG do and how much does apertium-tagger do? (count LU input to CG, LU output from CG and LU output from apertium-tagger)
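
As a rough illustration of the coverage and mean ambiguity items above, here is a minimal sketch that reads analyser output in the Apertium stream format (`^surface/reading1/reading2$`, with unknown words marked as `^word/*word$`). The function name `analysis_stats` is made up for this note, and the sketch assumes the input contains no escaped `^`, `/` or `$` characters; it is not the framework's actual implementation.

```python
import re

# One lexical unit: ^surface/reading1/reading2$
# Assumption: no escaped ^, / or $ inside the unit.
LU_RE = re.compile(r"\^([^/^$]*)/([^^$]*)\$")


def analysis_stats(stream):
    """Return (coverage, mean_ambiguity) for analyser output.

    coverage       = known tokens / all tokens
    mean_ambiguity = readings per known token
    """
    total = known = readings_seen = 0
    for match in LU_RE.finditer(stream):
        readings = match.group(2).split("/")
        total += 1
        # The analyser marks unknown words with a leading asterisk.
        if readings[0].startswith("*"):
            continue
        known += 1
        readings_seen += len(readings)
    coverage = known / total if total else 0.0
    mean_ambiguity = readings_seen / known if known else 0.0
    return coverage, mean_ambiguity
```

Whether mean ambiguity should be averaged over known tokens only (as here) or over all tokens is a design choice that the list above leaves open.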

Tests

  • dictionary tests (e.g. hfst-tester)
  • regression tests
  • pending tests
  • testvoc
  • testvoc+bidixvoc (some language pairs have bilingual dictionaries with more than one translation for a given SL word; at the moment testvoc only ever tests the default translation, whereas testvoc+bidixvoc will test them all)
  • generation test
  • corpus test

Parallel corpus

  • WER, PER, BLEU against reference
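
For reference, a small sketch of how WER and PER could be computed for one sentence pair. It assumes whitespace tokenisation, takes WER as word-level edit distance divided by reference length, and uses one common bag-of-words definition of PER; other definitions exist, and the function names are invented for this note. BLEU is left out because it needs n-gram counts over a whole corpus.

```python
from collections import Counter


def word_edit_distance(ref, hyp):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    return word_edit_distance(ref, hyp) / len(ref)


def per(reference, hypothesis):
    """Position-independent error rate (word order is ignored)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    # Multiset intersection counts the words shared by both sentences.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)
```

A reordered but otherwise correct hypothesis gets a non-zero WER and a PER of zero, which is why reporting both is useful.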

Graphs

  • coverage over time
  • number of rules over time
  • mean ambiguity over time
  • number of dict entries over time
  • translation speed over time
  • WER/PER/BLEU over time
  • percentage of regression tests passed over time

Feature Requests

  • Cache the wiki Regression test web page so that we can test when the wiki is offline or when stuck in airports with expensive wifi
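
One possible shape for this request, sketched under assumptions: the page is fetched over HTTP, a copy is written to a local cache file on every successful fetch, and the cached copy is used when the network is unavailable. `fetch_with_cache` and its parameters are hypothetical names, not part of the existing code.

```python
import urllib.request
from pathlib import Path


def _download(url):
    """Fetch the raw bytes of a page (raises OSError subclasses on failure)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def fetch_with_cache(url, cache_path, fetch=None):
    """Return the page at `url`, falling back to a cached copy when offline.

    `fetch` is an injectable downloader (hypothetical hook, handy for tests).
    """
    cache = Path(cache_path)
    fetch = fetch or _download
    try:
        data = fetch(url)
    except OSError:
        # urllib.error.URLError and socket timeouts are OSError subclasses.
        if cache.exists():
            return cache.read_bytes()
        raise
    cache.write_bytes(data)  # refresh the cache after every successful fetch
    return data
```

The cache is only consulted after a failed fetch, so online runs always see the current wiki page.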

Extensions

Sanity Tests

Simply allow the use of a sanity_tests directory in a dictionary directory and, if it is found, run any scripts in it, storing each script's name and return value in quality-stats.xml. This allows the scripts to be written in any language, provided they return a non-zero value on error.

Possible tests:

  • Superblank order test
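
The sanity-test idea above could be sketched roughly as follows. This assumes a POSIX system where the scripts are executable files, and the element and attribute names (`sanity-tests`, `test`, `name`, `returncode`) are placeholders, since the layout of quality-stats.xml is not specified in these notes.

```python
import os
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path


def run_sanity_tests(dict_dir):
    """Run every executable in <dict_dir>/sanity_tests and record the results.

    Returns an XML element with one <test> child per script.
    Element and attribute names are placeholders, not the real schema.
    """
    root = ET.Element("sanity-tests")
    tests_dir = Path(dict_dir).resolve() / "sanity_tests"
    if not tests_dir.is_dir():
        return root  # no sanity_tests directory: nothing to run
    for script in sorted(tests_dir.iterdir()):
        if not (script.is_file() and os.access(script, os.X_OK)):
            continue  # skip READMEs, data files and other non-executables
        result = subprocess.run(
            [str(script)],
            cwd=str(tests_dir.parent),  # run from the dictionary directory
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        ET.SubElement(root, "test", name=script.name,
                      returncode=str(result.returncode))
    return root
```

Because only the exit status is recorded, a script in any language works as long as it exits non-zero on failure. The returned element could then be appended to the tree that is written out as quality-stats.xml.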