Revision as of 07:27, 17 June 2011

Notes

Community Bonding Period

Week 1 — 25th April

Must demonstrate, before the end of the bonding period, that setuptools allows a prefix-based installation for non-root users.
Emailed Francis written proof that setuptools meets the expectations and requirements.

Week 2 — 2nd May

Converted LaTeX source to MediaWiki format and placed it below this section for annotation.
Completed example regtest.py
Added Installation and Usage pages, uploaded initial files.

Week 3 — 9th May

Fixed a Python regression-related bug in regtest.py
Fixed a personal regression in setup.py
Plan to add autogen.sh for config
Consider using virtualenv for rootless installations
Fixed installation instructions
SVN and git now synched

Coding Period

Week 1 — 23rd May

Completed autogen.sh

Todo

Tests and stats

Monolingual corpus

dicts: Coverage
rules: Rule counting (CG, apertium-transfer)
rules: number of rules
dicts: number of entries (sl mono, sl-tl, tl mono) -- lttoolbox/hfst
dicts: (monolingual) mean ambiguity
system: translation speed (per module?)
dicts: (bilingual) mean fertility -- e.g. number of translations per SL/TL word
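
A minimal sketch of how three of the statistics above could be computed. The notes do not define them precisely, so the definitions here are assumptions: coverage is the fraction of corpus tokens found in the dictionary, mean ambiguity is the average number of analyses per analysed surface form, and mean fertility is the average number of distinct translations per source-language word. All function names and input shapes are hypothetical.

```python
from collections import defaultdict


def coverage(tokens, known_words):
    """Fraction of corpus tokens found in the dictionary (assumed definition)."""
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)


def mean_ambiguity(analyses):
    """Mean number of analyses per surface form that has at least one analysis.

    `analyses` maps a surface form to a list of its analyses.
    """
    counts = [len(a) for a in analyses.values() if a]
    return sum(counts) / len(counts) if counts else 0.0


def mean_fertility(bilingual_pairs):
    """Mean number of distinct translations per source-language word.

    `bilingual_pairs` is an iterable of (sl_word, tl_word) tuples.
    """
    translations = defaultdict(set)
    for sl_word, tl_word in bilingual_pairs:
        translations[sl_word].add(tl_word)
    if not translations:
        return 0.0
    return sum(len(t) for t in translations.values()) / len(translations)
```

Swapping the tuple order in `bilingual_pairs` would give the fertility in the TL-to-SL direction.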

Tests

dictionary tests (e.g. hfst-tester)
regression tests
pending tests
testvoc
generation test
corpus test

Parallel corpus

WER, PER, BLEU against reference
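
As a sketch of the first two metrics: WER is taken here as the word-level Levenshtein distance divided by the reference length, and PER as the same idea with word order ignored (a bag-of-words comparison). These are common textbook definitions, not necessarily the ones the project will adopt; the function names are hypothetical, and BLEU is left out because it needs n-gram counting and a brevity penalty.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate: edit distance normalised by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def per(ref, hyp):
    """Position-independent error rate: like WER, but ignoring word order."""
    if not ref:
        raise ValueError("reference must not be empty")
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)
```

A hypothesis that contains the right words in the wrong order is penalised by WER but not by PER, which is why the two are usually reported together.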

Graphs

coverage over time
number of rules over time
mean ambiguity over time
number of dict entries over time
translation speed over time
WER/PER/BLEU over time
percentage of regression tests passed over time

Feature Requests

Cache the wiki Regression test web page so that we can test when the wiki is offline or when stuck in airports with expensive wifi
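
One possible shape for this, as a sketch only: fetch the page, save a copy on success, and fall back to the saved copy when the network request fails. The function names, the cache path argument and the injectable `fetch` parameter are all hypothetical; only the standard-library calls are real.

```python
import os
import urllib.request


def fetch_url(url, timeout=10):
    """Download a page and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def cached_fetch(url, cache_path, fetch=fetch_url):
    """Return the page at `url`, falling back to a local copy when offline.

    On success the cache file is refreshed. On a network error the cached
    copy is returned if one exists; otherwise the error is re-raised.
    """
    try:
        data = fetch(url)
    except OSError:  # urllib.error.URLError and socket timeouts are OSErrors
        if os.path.exists(cache_path):
            with open(cache_path, "rb") as cache_file:
                return cache_file.read()
        raise
    with open(cache_path, "wb") as cache_file:
        cache_file.write(data)
    return data
```

Passing the fetcher as a parameter keeps the fallback logic testable without network access.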

Extensions

Sanity Tests

Simply allow the use of a sanity_tests directory in a dictionary directory. If it is found, run every script in it and store each script's name and return value in quality-stats.xml. The scripts can then be written in any language, provided they return a non-zero value on error.
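
A minimal sketch of the idea, assuming the scripts are executable files with a shebang line. The layout of quality-stats.xml is not described in these notes, so the `sanity-tests` and `test` element names and the `returncode` attribute are placeholders, and the function writes to whatever path it is given rather than merging into an existing stats file.

```python
import os
import subprocess
import xml.etree.ElementTree as ET


def run_sanity_tests(dict_dir, stats_path):
    """Run each executable in <dict_dir>/sanity_tests and record the results.

    Returns a list of (script_name, return_code) tuples. If the directory
    does not exist, nothing is run or written and an empty list is returned.
    """
    dict_dir = os.path.abspath(dict_dir)
    tests_dir = os.path.join(dict_dir, "sanity_tests")
    if not os.path.isdir(tests_dir):
        return []

    results = []
    root = ET.Element("sanity-tests")  # placeholder element name
    for name in sorted(os.listdir(tests_dir)):
        path = os.path.join(tests_dir, name)
        # Skip subdirectories and files without the executable bit.
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue
        completed = subprocess.run([path], cwd=dict_dir)
        results.append((name, completed.returncode))
        ET.SubElement(root, "test", name=name,
                      returncode=str(completed.returncode))

    ET.ElementTree(root).write(stats_path, encoding="utf-8",
                               xml_declaration=True)
    return results
```

Because only the exit status is inspected, a shell, Python or Perl script is treated the same way: zero means the check passed, anything else is recorded as a failure.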


Difference between revisions of "Talk:Apertium-quality"


