Notes

Community Bonding Period

Week 1 — 25th April

Must demonstrate, before the end of the bonding period, that setuptools supports a prefix-based installation for non-root users
Emailed Francis a written demonstration that setuptools meets the expectations and requirements.
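Here is a sketch of what a prefix-based installation means in practice. It assumes the standard posix_prefix install scheme, which is what setup.py install --prefix=PREFIX uses on POSIX systems. Under that scheme, pure-Python modules end up in a site-packages directory under PREFIX, and a non-root user then needs that directory on PYTHONPATH. The helper name prefix_site_packages is hypothetical. It only asks the standard-library sysconfig module where that directory would be.

```python
import os
import sysconfig


def prefix_site_packages(prefix: str) -> str:
    """Return the pure-Python library directory for an install under `prefix`.

    Hypothetical helper; assumes the standard 'posix_prefix' install scheme
    (the layout used by `setup.py install --prefix=...` on POSIX systems).
    """
    prefix = os.path.expanduser(prefix)
    return sysconfig.get_path(
        "purelib",
        scheme="posix_prefix",
        vars={"base": prefix, "platbase": prefix},  # override the system prefix
    )


if __name__ == "__main__":
    # Print the line a non-root user might add to their shell profile.
    print("export PYTHONPATH=" + prefix_site_packages("~/local"))
```

For example, prefix_site_packages("~/local") gives a path of the form <home>/local/lib/python3.X/site-packages. The version directory depends on the interpreter.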

Week 2 — 2nd May

Converted the LaTeX source to MediaWiki format and placed it below this section for annotation.
Completed an example regtest.py
Added Installation and Usage pages, uploaded initial files.

Week 3 — 9th May

Fixed a Python regression-related bug in regtest.py
Fixed a regression I had introduced in setup.py
Plan to add an autogen.sh for configuration
Consider using virtualenv for rootless installations
Fixed installation instructions
SVN and git repositories are now in sync
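The standard-library venv module offers one way to try the virtualenv idea. It is the built-in counterpart of the third-party virtualenv tool and has been available since Python 3.3. It creates an isolated environment in any user-writable directory, so no root access is needed. This is a minimal sketch. The helper name make_user_env is hypothetical, and so is the idea of installing apertium-quality into such an environment: the notes do not settle it.

```python
import os
import sys
import venv


def make_user_env(path: str, with_pip: bool = True) -> str:
    """Create a virtual environment at `path` and return its interpreter path.

    Hypothetical helper: everything is written under `path`, so a non-root
    user can install packages there without touching system directories.
    """
    path = os.path.abspath(os.path.expanduser(path))
    venv.create(path, with_pip=with_pip)
    # venv puts interpreters in bin/ on POSIX and Scripts\ on Windows.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return os.path.join(path, bin_dir, exe)
```

Calling make_user_env("~/aq-env") and then running the returned interpreter with -m pip install . from the source checkout would install into that environment only. The sketch uses venv rather than virtualenv purely because venv needs no extra dependency.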

Coding Period

Week 1 — 23rd May

Completed autogen.sh

Todo

Tests and stats

Monolingual corpus

dicts: Coverage
rules: Rule counting (CG, apertium-transfer)
rules: number of rules
dicts: number of entries (sl mono, sl-tl, tl mono) -- lttoolbox/hfst
dicts: (monolingual) mean ambiguity
system: translation speed (per module?)
dicts: (bilingual) mean fertility -- i.e. the average number of translations per SL/TL word
rules: for disambiguation, if there is both CG and apertium-tagger, how much work does CG do and how much does apertium-tagger do? (count the LUs input to CG, the LUs output from CG and the LUs output from apertium-tagger)
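Two of the dictionary statistics above, coverage and mean ambiguity, can be read straight off the analyser's output on the monolingual corpus. This is a minimal sketch. It assumes the usual Apertium stream format, in which each lexical unit is written ^surface/analysis1/analysis2$ and unknown words carry a *-prefixed analysis (as in ^blorp/*blorp$). Here mean ambiguity is averaged over known words only; counting unknowns as 1 would be an equally defensible choice. The function names are hypothetical and the escape handling is simplified.

```python
import re
from typing import List, Tuple

# One lexical unit in the Apertium stream: ^surface/analysis1/analysis2$
LU_RE = re.compile(r"\^((?:\\.|[^$\\])*)\$")
# Split on '/' unless it is preceded by a backslash (simplified escaping).
SLASH_RE = re.compile(r"(?<!\\)/")


def parse_lus(stream: str) -> List[Tuple[str, List[str]]]:
    """Return (surface form, list of analyses) for each lexical unit."""
    units = []
    for match in LU_RE.finditer(stream):
        surface, *analyses = SLASH_RE.split(match.group(1))
        units.append((surface, analyses))
    return units


def coverage_and_ambiguity(stream: str) -> Tuple[float, float]:
    """Coverage: share of LUs with at least one known analysis.
    Mean ambiguity: average number of analyses per known LU.
    Unknown words are those whose analysis starts with '*'."""
    units = parse_lus(stream)
    known = [analyses for _, analyses in units
             if analyses and not analyses[0].startswith("*")]
    coverage = len(known) / len(units) if units else 0.0
    ambiguity = sum(len(a) for a in known) / len(known) if known else 0.0
    return coverage, ambiguity
```

For the stream ^the/the<det><def><sp>$ ^dog/dog<n><sg>/dog<vblex><inf>$ ^blorp/*blorp$, coverage_and_ambiguity returns a coverage of 2/3 and a mean ambiguity of 1.5. In practice, the input would be the output of lt-proc run over the corpus.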

Tests

dictionary tests (e.g. hfst-tester)
regression tests
pending tests
testvoc
testvoc+bidixvoc (some language pairs have bilingual dictionaries with more than one translation for a given SL word; at the moment testvoc only ever tests the default translation, whereas testvoc+bidixvoc will test them all)
generation test
corpus test

Parallel corpus

WER, PER and BLEU against a reference translation
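WER and PER can be computed without external tools. The sketch below works on whitespace-tokenised text. WER is the word-level Levenshtein distance divided by the reference length. PER uses one common position-independent definition, 1 - (matches - max(0, |hyp| - |ref|)) / |ref|, where matches counts shared words irrespective of order (other definitions exist). BLEU is left out, because an established implementation such as sacreBLEU is a safer choice than hand-rolled n-gram code.

```python
from collections import Counter


def wer(hyp: str, ref: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    h, r = hyp.split(), ref.split()
    # prev[j] = edit distance between the first i-1 hyp words and first j ref words
    prev = list(range(len(r) + 1))
    for i, hw in enumerate(h, start=1):
        cur = [i] + [0] * len(r)
        for j, rw in enumerate(r, start=1):
            cur[j] = min(
                prev[j] + 1,               # extra hypothesis word (insertion)
                cur[j - 1] + 1,            # missing reference word (deletion)
                prev[j - 1] + (hw != rw),  # substitution, or free match
            )
        prev = cur
    return prev[-1] / max(len(r), 1)  # guard against an empty reference


def per(hyp: str, ref: str) -> float:
    """Position-independent error rate (one common definition)."""
    h, r = hyp.split(), ref.split()
    matches = sum((Counter(h) & Counter(r)).values())  # order-free overlap
    return 1 - (matches - max(0, len(h) - len(r))) / max(len(r), 1)
```

For example, against the reference "the cat sat", the hypothesis "cat the sat" has a WER of 2/3 but a PER of 0, since PER ignores word order.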

Graphs

coverage over time
number of rules over time
mean ambiguity over time
number of dict entries over time
translation speed over time
WER/PER/BLEU over time
percentage of regression tests passed over time

Feature Requests

Cache the wiki regression test page so that the tests can still be run when the wiki is offline or when stuck in an airport with expensive wifi
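Here is a minimal sketch of the cache, assuming the regression test page is fetched over HTTP. Each successful download refreshes a local copy, and any network failure falls back to that copy. The helper name and cache path are hypothetical. On a MediaWiki site, adding action=raw to the page's index.php URL returns the wikitext rather than rendered HTML.

```python
import os
import urllib.request


def fetch_with_cache(url: str, cache_path: str, timeout: float = 10.0) -> str:
    """Return the page text from `url`, refreshing the cache at `cache_path`.

    If the page cannot be fetched (wiki offline, no network), fall back to
    the last cached copy; re-raise the error if no cached copy exists yet.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            text = response.read().decode("utf-8")
    except OSError:  # urllib.error.URLError and socket timeouts are OSErrors
        if os.path.exists(cache_path):
            with open(cache_path, encoding="utf-8") as cached:
                return cached.read()
        raise
    with open(cache_path, "w", encoding="utf-8") as cached:
        cached.write(text)
    return text
```

Keeping the cache next to the tests means an airport run uses whatever version of the page was last seen online.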

Extensions

Sanity Tests

Simply allow a sanity_tests directory inside a dictionary directory: if one is found, run every script in it and store each script's name and return value in quality-stats.xml. This lets the scripts be written in any language, provided they return a non-zero value on error.
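Here is a sketch of the proposed runner. It assumes a script counts as a test when it is an executable file in sanity_tests (its shebang line picks the language) and that only the exit status matters. The element names sanity-tests and test, and their attributes, are hypothetical; the actual quality-stats.xml layout may differ.

```python
import os
import subprocess
import xml.etree.ElementTree as ET


def run_sanity_tests(dict_dir: str) -> ET.Element:
    """Run every executable in <dict_dir>/sanity_tests and record results.

    The <sanity-tests>/<test> element names are hypothetical.
    """
    root = ET.Element("sanity-tests")
    test_dir = os.path.join(dict_dir, "sanity_tests")
    if not os.path.isdir(test_dir):
        return root  # no sanity_tests directory: nothing to run
    for name in sorted(os.listdir(test_dir)):
        path = os.path.join(test_dir, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue  # skip data files and non-executable helpers
        # Scripts may be in any language; only the exit status is recorded.
        result = subprocess.run([path], cwd=dict_dir)
        ET.SubElement(root, "test", name=name, returncode=str(result.returncode))
    return root
```

The returned element can then be appended to the root of an existing quality-stats.xml tree before that tree is written back out.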

Possible tests:

Superblank order test
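One script following that contract could be a superblank order test. Superblanks are the formatting that the deformatter wraps in unescaped square brackets, such as [<b>]. The test would compare the superblanks in the source text with those in the translation and exit with 1 if they are not the same list in the same order. This is a sketch under two assumptions: every superblank must come through exactly once, and escape handling is simplified. The file layout and function names are hypothetical.

```python
import re
import sys
from typing import List

# A superblank: text in unescaped square brackets, e.g. "[<b>]".
# Simplified: nested brackets and other edge cases are not handled.
SUPERBLANK_RE = re.compile(r"(?<!\\)\[(?:\\.|[^\]\\])*\]")


def superblanks(text: str) -> List[str]:
    """Return the superblanks in `text`, in order of appearance."""
    return SUPERBLANK_RE.findall(text)


def superblanks_in_order(source: str, output: str) -> bool:
    """True if the output carries the source's superblanks in the same order."""
    return superblanks(source) == superblanks(output)


def main(argv: List[str]) -> int:
    """Exit status for the sanity_tests contract: 0 = pass, 1 = fail."""
    source_path, output_path = argv
    with open(source_path, encoding="utf-8") as src, \
            open(output_path, encoding="utf-8") as out:
        return 0 if superblanks_in_order(src.read(), out.read()) else 1


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1:]))
```

Run as a script with the source file and the translated file as arguments, it fits the sanity_tests convention of signalling failure through a non-zero exit status.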

Talk:Apertium-quality
