Talk:Apertium-quality

From Apertium

Latest revision as of 18:20, 21 August 2011

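Menu

Getting Started

  • Installation (Quality_control_framework/Installation)
  • Usage (Quality_control_framework/Usage)

Technical Documentation

  • Proposal (Quality_control_framework/Proposal)
  • XML Schema (Quality_control_framework/XML_Schema)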

Notes

Community Bonding Period

Week 1 — 25th April

  • Must demonstrate that setuptools allows a prefix-based installation for non-root users before the end of the bonding period
  • Emailed Francis a written proof that setuptools meets the expectations and requirements.

Week 2 — 2nd May

  • Converted the LaTeX source to MediaWiki markup and placed it below this section for annotation.
  • Completed example regtest.py
  • Added Installation and Usage pages, uploaded initial files.

Week 3 — 9th May

  • Fixed a Python regression-related bug in regtest.py
  • Fixed a personal regression in setup.py
  • Plan to add autogen.sh for config
  • Consider using virtualenv for rootless installations
  • Fixed installation instructions
  • SVN and git now synched

Coding Period

Week 1 — 23rd May

  • Completed autogen.sh

Todo

Tests and stats

Monolingual corpus

  • dicts: Coverage
  • rules: Rule counting (CG, apertium-transfer)
  • rules: number of rules
  • dicts: number of entries (sl mono, sl-tl, tl mono) -- lttoolbox/hfst
  • dicts: (monolingual) mean ambiguity
  • system: translation speed (per module?)
  • dicts: (bilingual) mean fertility -- e.g. number of translations per SL/TL word
  • rules: for disambiguation, if there is CG + apertium-tagger, how much work does CG do and how much does apertium-tagger do? (count LU input to CG, LU output from CG and LU output from apertium-tagger)
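
The coverage and mean ambiguity items above could be computed from analyser output along the following lines. This is only a sketch under stated assumptions: it expects text in the Apertium stream format (`^surface/analysis1/analysis2$`, with unknown words marked by a leading `*`), it ignores escaped characters, and it takes mean ambiguity to be the average number of analyses per known word. The function name `analysis_stats` and the sample sentence are made up for illustration.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification: escaped characters (e.g. "\/" or "\$") are not handled.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def analysis_stats(stream_text):
    """Return (coverage, mean_ambiguity) for analyser output in stream format."""
    total = known = analyses = 0
    for match in LEXICAL_UNIT.finditer(stream_text):
        surface, *readings = match.group(1).split("/")
        total += 1
        # An unknown word carries a single reading that starts with "*".
        if readings and not readings[0].startswith("*"):
            known += 1
            analyses += len(readings)
    coverage = known / total if total else 0.0
    # Assumption: ambiguity is averaged over known words only.
    mean_ambiguity = analyses / known if known else 0.0
    return coverage, mean_ambiguity


# Illustrative input, not taken from a real dictionary.
sample = (
    "^the/the<det><def><sp>$ "
    "^wound/wound<n><sg>/wind<vblex><past>/wind<vblex><pp>$ "
    "^zzz/*zzz$"
)
coverage, ambiguity = analysis_stats(sample)
print(f"coverage={coverage:.2f} mean ambiguity={ambiguity:.2f}")
```

The same counting approach would probably extend to the CG versus apertium-tagger item: run it on the stream before CG, after CG and after the tagger, then compare the number of analyses at each stage.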

Tests

  • dictionary tests (e.g. hfst-tester)
  • regression tests
  • pending tests
  • testvoc
  • testvoc+bidixvoc (some language pairs have bilingual dictionaries with more than one translation for a given SL word; at the moment testvoc only ever tests the default translation, whereas testvoc+bidixvoc will test them all)
  • generation test
  • corpus test

Parallel corpus

  • WER, PER, BLEU against reference
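
As a rough illustration of the first two metrics, here is a sketch of WER and PER for a single sentence pair. It assumes whitespace tokenisation and uses one common formulation of PER (bag-of-words matches, penalising surplus hypothesis words); the framework itself may define them differently. BLEU is left out because it needs n-gram precision and a brevity penalty over a whole corpus. The function names are invented for this example.

```python
from collections import Counter


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of row i, previous[j] is the distance between
    # ref[:i-1] and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1] / len(ref)


def position_independent_error_rate(reference, hypothesis):
    """Like WER, but word order is ignored (bag-of-words comparison)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Multiset intersection counts words that appear in both sentences.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    surplus = max(0, len(hyp) - len(ref))
    return 1 - (matches - surplus) / len(ref)


reference = "the cat sat on the mat"
hypothesis = "the cat sat on mat"
print(word_error_rate(reference, hypothesis))
print(position_independent_error_rate(reference, hypothesis))
```

Because PER ignores order, a hypothesis that only reorders the reference words scores zero on PER while still being penalised by WER.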

Graphs

  • coverage over time
  • number of rules over time
  • mean ambiguity over time
  • number of dict entries over time
  • translation speed over time
  • WER/PER/BLEU over time
  • percentage of regression tests passed over time

Feature Requests

  • Cache the wiki Regression test web page so that we can test when the wiki is offline or when stuck in airports with expensive wifi
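
One possible shape for this request, sketched under assumptions: try the network first, refresh a local copy on success, and fall back to that copy when the download fails. The names `fetch_url` and `load_page`, the cache location and the URL in the usage note are all hypothetical; nothing here reflects how the framework actually fetches the page.

```python
import urllib.request
from pathlib import Path


def fetch_url(url, timeout=10):
    """Download the raw bytes at url."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def load_page(url, cache_path, fetch=fetch_url):
    """Return the page from the network, or from the cache if that fails."""
    cache_file = Path(cache_path)
    try:
        content = fetch(url)
    except OSError:
        # urllib.error.URLError and socket timeouts are OSError subclasses.
        if not cache_file.exists():
            raise
        return cache_file.read_bytes()
    # Refresh the cache after every successful download.
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_bytes(content)
    return content
```

A call might look like `load_page("http://wiki.example.org/Regression_tests", "cache/regression.html")`, where both arguments are placeholders. Passing the downloader in as `fetch` keeps the fallback logic testable without network access.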

Extensions

Sanity Tests

Simply allow a sanity_tests directory inside a dictionary directory. If it is found, run every script in it and store each script's name and return value in quality-stats.xml. The scripts can then be written in any language, provided they return a non-zero value on error.

Possible tests:

  • Superblank order test
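
The proposal could be sketched roughly as follows. The element and attribute names (`sanity-tests`, `test`, `name`, `returncode`) are placeholders, since the real quality-stats.xml schema is not described here, and the sketch assumes every file in sanity_tests is executable and starts with a shebang line.

```python
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path


def run_sanity_tests(dictionary_dir):
    """Run every file in <dictionary_dir>/sanity_tests; return {name: return code}."""
    tests_dir = Path(dictionary_dir) / "sanity_tests"
    results = {}
    if not tests_dir.is_dir():
        return results
    for script in sorted(tests_dir.iterdir()):
        if script.is_file():
            # The shebang line picks the interpreter, so any language works.
            # An absolute path is used because cwd is changed for the child.
            completed = subprocess.run([str(script.resolve())], cwd=dictionary_dir)
            results[script.name] = completed.returncode
    return results


def results_to_xml(results):
    """Build a hypothetical <sanity-tests> element from the collected results."""
    root = ET.Element("sanity-tests")
    for name, code in results.items():
        ET.SubElement(root, "test", name=name, returncode=str(code))
    return root
```

The resulting element could then be appended to the existing statistics tree before it is written out; a non-zero `returncode` marks a failed test.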