Difference between revisions of "Talk:Apertium-quality"

Revision as of 09:05, 8 June 2011

Menu

Getting Started

Technical Documentation

Notes

Community Bonding Period

Week 1 — 25th April

  • Must demonstrate that setuptools allows a prefix-based installation for non-root users before the end of the bonding period.
  • Emailed Francis a written proof of setuptools adequately meeting expectations and requirements.

Week 2 — 2nd May

  • Converted the LaTeX source to MediaWiki format and placed it below this section for annotation.
  • Completed example regtest.py
  • Added Installation and Usage pages, uploaded initial files.

Week 3 — 9th May

  • Fixed a Python regression-related bug in regtest.py
  • Fixed a personal regression in setup.py
  • Plan to add autogen.sh for config
  • Consider using virtualenv for rootless installations
  • Fixed installation instructions
  • SVN and git are now in sync

Coding Period

Week 1 — 23rd May

  • Completed autogen.sh

Todo

  1. Complete the todo.


Tests and stats

Monolingual corpus
  • dicts: coverage
  • rules: Rule counting (CG, apertium-transfer)
  • rules: number of rules
  • dicts: number of entries (sl mono, sl-tl, tl mono) -- lttoolbox/hfst
  • dicts: (monolingual) mean ambiguity
  • system: translation speed (per module?)
  • dicts: (bilingual) mean fertility -- e.g. number of translations per SL/TL word
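
A minimal sketch of how the coverage, mean ambiguity and mean fertility figures listed above could be computed. It assumes the analyser output is in the Apertium stream format (^surface/analysis1/analysis2$, with unknown words marked as ^surface/*surface$) and ignores escaped characters. The function names are hypothetical, and mean ambiguity is taken here as analyses per known token, which is only one possible definition.

```python
import re
from collections import defaultdict

# One lexical unit in the Apertium stream format: ^surface/analysis1/analysis2$
# Assumption: escaped characters such as \/ or \$ do not occur in the input.
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage_and_ambiguity(analysed_text):
    """Return (coverage, mean ambiguity) for analyser output.

    coverage  = known tokens / all tokens
    ambiguity = analyses / known tokens (an assumed definition)
    """
    total = known = analyses = 0
    for match in LEXICAL_UNIT.finditer(analysed_text):
        readings = match.group(2).split("/")
        total += 1
        if readings[0].startswith("*"):
            continue  # unknown word: counts against coverage only
        known += 1
        analyses += len(readings)
    coverage = known / total if total else 0.0
    ambiguity = analyses / known if known else 0.0
    return coverage, ambiguity


def mean_fertility(pairs):
    """Mean number of distinct translations per source-language word.

    `pairs` is an iterable of (source_word, target_word) tuples, assumed
    to have been extracted from the bilingual dictionary beforehand.
    """
    translations = defaultdict(set)
    for source, target in pairs:
        translations[source].add(target)
    if not translations:
        return 0.0
    return sum(len(t) for t in translations.values()) / len(translations)


sample = ("^the/the<det><def><sp>$ "
          "^books/book<n><pl>/book<vblex><pri><p3><sg>$ "
          "^foo/*foo$")
print(coverage_and_ambiguity(sample))
print(mean_fertility([("house", "casa"), ("house", "hogar"), ("dog", "perro")]))
```

Swapping the (source, target) order in the pairs gives the fertility in the other translation direction.
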
Tests
  • dictionary tests (e.g. hfst-tester)
  • regression tests
  • pending tests
  • testvoc
  • generation test
  • corpus test
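
As an illustration of the regression tests item above, here is a small sketch that compares translator output against expected output and reports the percentage passed. It is not the behaviour of regtest.py: the `translate` callable, the case format and the function names are all assumptions, and the translator is a stub so that the example runs without an Apertium installation.

```python
def run_regression_tests(translate, cases):
    """Run each (source, expected) case through `translate`.

    `translate` is a hypothetical callable mapping a source string to a
    translated string; in practice it would wrap a call to the pipeline.
    Returns a list of (source, expected, actual, passed) tuples.
    """
    results = []
    for source, expected in cases:
        actual = translate(source)
        results.append((source, expected, actual, actual == expected))
    return results


def pass_percentage(results):
    """Percentage of regression tests whose output matched exactly."""
    if not results:
        return 0.0
    passed = sum(1 for result in results if result[3])
    return 100.0 * passed / len(results)


# Stub translator standing in for the real system (assumption).
stub_table = {"hello": "hola", "dog": "perro", "cat": "gato"}
cases = [("hello", "hola"), ("dog", "perro"), ("cat", "gata"), ("bird", "pájaro")]

results = run_regression_tests(lambda s: stub_table.get(s, "*" + s), cases)
for source, expected, actual, passed in results:
    print("PASS" if passed else "FAIL", source, expected, actual)
print(pass_percentage(results))
```

Recording the percentage per revision would give the data for the regression graph listed under Graphs.
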
Parallel corpus
  • WER, PER, BLEU against reference
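
A minimal sketch of WER and PER against a single reference, assuming whitespace tokenisation. WER is the word-level edit distance divided by the reference length. PER uses one common bag-of-words formulation that ignores word order. BLEU is left out here because it needs n-gram counting and a brevity penalty. The function names are hypothetical.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def _tokens(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return ref, hyp


def wer(reference, hypothesis):
    """Word error rate: edit distance normalised by reference length."""
    ref, hyp = _tokens(reference, hypothesis)
    return edit_distance(ref, hyp) / len(ref)


def per(reference, hypothesis):
    """Position-independent error rate (assumed bag-of-words definition):
    (max(len(ref), len(hyp)) - matched words) / len(ref)."""
    ref, hyp = _tokens(reference, hypothesis)
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


reference = "the cat sat on the mat"
hypothesis = "the cat on the mat sat"
print(wer(reference, hypothesis))
print(per(reference, hypothesis))
```

The example pair contains the same words in a different order, so it is penalised by WER but not by PER, which is the reason for reporting both.
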
Graphs
  • coverage over time
  • number of rules over time
  • mean ambiguity over time
  • number of dict entries over time
  • translation speed over time
  • WER/PER/BLEU over time
  • percentage of regression tests passed over time