Difference between revisions of "Apertium-quality/Quickstart"


Revision as of 19:22, 8 February 2014


The Apertium Quality Control Framework, or `apertium-quality`, is a framework and toolkit for unit testing Apertium dictionaries, and to a lesser extent, HFST finite state transducers, and recording detailed statistics for later analysis. The toolkit portion of the project consists of tools such as regression testing, coverage testing, vocabulary testing and statistics storage, while the framework is robust enough to allow extensive modification through standard interfaces and detailed XML schemas.


You must have, at the very least, the following applications and libraries installed:

  • Autotools
  • Python >= 3.1
  • Subversion
  • Apertium >= 3.2 (or SVN version)

Assuming Debian or Ubuntu, you can most likely get these by running:

apt-get install automake autoconf subversion apertium python3 python3-lxml

A typical use case/walkthrough — apertium-mt-he

Getting the goodies

Installing apertium-quality

First of all, we need to actually install `apertium-quality`, so in the terminal, enter:

svn co https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-tools/apertium-quality

It may ask you to accept a security certificate. Feel free to accept permanently. Once it has downloaded, cd into the directory and run the usual commands:

cd apertium-quality
./autogen.sh && make && make install && make install-nltk

If you see any "errors" relating to libyaml, ignore them. They aren't real errors: the build simply can't find the C libyaml libraries and falls back to the pure Python ones.

Add `--prefix` with whatever install location you please, and if Python 3 isn't detected for some reason, prefix the `./autogen.sh` command with `PYTHON=/path/to/python3`.

If you installed to a prefix, then before running any of the tools you must export the line that the Makefile prints after you build the module; otherwise Python won't be able to find the module.

PLEASE NOTE: if you cannot install lxml, you may come across a few bugs, as lxml is preferred over the built-in ElementTree library for compatibility and speed reasons. If you do come across any bugs however, please report them!

Getting apertium-mt-he

`apertium-mt-he` is a dictionary pair that was being developed during the GSoC period while I was coding `apertium-quality`, so why not try it out and see how it's going?

Let's download it:

svn co https://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-mt-he
cd apertium-mt-he

Assuming that the aq tools were installed correctly and are visible on your `$PATH`, we can begin testing. If they aren't, set up your `$PATH` correctly first.

First of all, we should pick a few revisions to test so that we get some pretty stats out of it. Today the log shows this for me:

$ svn log
r33535 | unhammer | 2011-08-21 00:03:20 +1000 (Sun, 21 Aug 2011) | 1 line

new summary, only 51 verb _forms_ missing :)
r33534 | unhammer | 2011-08-20 23:57:00 +1000 (Sat, 20 Aug 2011) | 1 line

minor typo
r33525 | n0nick | 2011-08-20 19:13:12 +1000 (Sat, 20 Aug 2011) | 1 line

adding more missing verbs to he.dix
r33524 | n0nick | 2011-08-20 19:04:58 +1000 (Sat, 20 Aug 2011) | 1 line

adding some missing verbs
r33523 | n0nick | 2011-08-20 18:43:58 +1000 (Sat, 20 Aug 2011) | 1 line

Let's start by checking out an earlier revision and work our way forward:

svn up -r 33402

Herein lies the testing

So you've successfully gotten up to this point? Wonderful! Let's start doing some testing. But first, a few things:

  1. Never do testing on a dictionary without committing your changes first. The tools most likely won't let you, in order to guarantee the integrity of the data.
  2. Unless you want to save statistics, don't use `-X`. Test first, save statistics later.

First, let's compile the dictionary.

./autogen.sh && make

If you have installed Apertium to a prefix, make sure you prefix the `./autogen.sh` command above with `PKG_CONFIG_PATH=/path/to/lib/pkgconfig`!

Ambiguity testing

This kind of test allows you to see the naive ambiguity of your dictionaries by counting the forms of each word, then averaging the result. We do this by running `aq-ambtest` on each dix. So let's do it!

aq-ambtest apertium-mt-he.he.dix -X
aq-ambtest apertium-mt-he.mt.dix -X
aq-ambtest apertium-mt-he.mt-he.dix -X
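To make "naive ambiguity" a bit more concrete, here is a minimal Python sketch of the calculation as described above: count the forms listed for each word, then average. It is an illustration only and not the code `aq-ambtest` actually runs; the function name and the toy lexicon are made up for the example.

```python
def naive_ambiguity(lexicon):
    """Average number of forms per word.

    `lexicon` maps each word to the list of forms found for it.
    This mirrors the prose description only; it is an assumption
    about the calculation, not aq-ambtest's real implementation.
    """
    if not lexicon:
        return 0.0
    total_forms = sum(len(forms) for forms in lexicon.values())
    return total_forms / len(lexicon)


# Toy data with placeholder names (not taken from any real dictionary).
toy_lexicon = {
    "word1": ["form1a", "form1b"],
    "word2": ["form2a", "form2b", "form2c"],
    "word3": ["form3a"],
}

print(naive_ambiguity(toy_lexicon))
```

A value close to 1 would mean almost every word has a single form; higher values mean more ambiguity on average.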

See that dangling `-X` on the end of each command? That's the flag to save statistics. By default it saves to a file called `quality-stats.xml`. If you put a filename straight after the `-X`, it'll save to that file instead. By putting `-X` at the very end, we can just use the default, and we will! All tests use `-X` for saving stats, so it's easy to remember.
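If you are wondering why the position of `-X` matters, the behaviour described here matches what Python's `argparse` does for an option with an optional value. The sketch below is an assumption about how such a flag could be declared, not a copy of the aq source; the program name and the `statistics` destination are hypothetical.

```python
import argparse


def build_parser():
    # Hypothetical parser: the real aq-ambtest options may be declared differently.
    parser = argparse.ArgumentParser(prog="aq-ambtest-sketch")
    parser.add_argument("dictionary")
    # nargs="?" makes the value optional: a bare -X uses `const`,
    # and leaving -X out altogether uses `default`.
    parser.add_argument("-X", dest="statistics", nargs="?",
                        const="quality-stats.xml", default=None)
    return parser


parser = build_parser()
bare = parser.parse_args(["apertium-mt-he.he.dix", "-X"])
named = parser.parse_args(["apertium-mt-he.he.dix", "-X", "my-stats.xml"])
absent = parser.parse_args(["apertium-mt-he.he.dix"])

print(bare.statistics, named.statistics, absent.statistics)
```

With a parser like this, a `-X` placed in front of the dictionary would most likely grab the dictionary path as its filename, which is why keeping it at the end is the safe habit.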

Regression testing

Dictionaries are complex beings. Sometimes you want to write some tests to prove you haven't broken something, or you've met a milestone. Luckily, this dictionary has quite a few regression tests already written for our testing pleasure, so let's have a look at it then!

aq-regtest -d . mt-he http://wiki.apertium.org/wiki/Special:Export/Maltese_and_Hebrew/Regression_tests -X
aq-regtest -d . mt-he http://wiki.apertium.org/wiki/Special:Export/Maltese_and_Hebrew/Pending_tests -X

You should get a lot of output saying WORKS everywhere for the first one, with 24/24 passes. 100%, yay! Pending, however, is a bit more disappointing. 4/33? How disheartening. No matter, maybe it gets better in the future!
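Conceptually, a regression test is just a list of inputs with expected outputs, each compared against what the translator produces. Here is a rough Python sketch of that idea, assuming one expected translation per test. The `translate` callable, the `FAIL` label and the toy strings are all made up; only the `WORKS` label and the passes/total summary come from the real output.

```python
def run_regression(cases, translate):
    """Compare translate(source) with the expected text for each case.

    Returns (passes, total). `translate` is any callable taking a string;
    in this sketch it stands in for the real translation pipeline.
    """
    passes = 0
    for source, expected in cases:
        actual = translate(source)
        if actual == expected:
            passes += 1
            print("WORKS\t{}".format(source))
        else:
            # "FAIL" is a placeholder label, not necessarily what aq-regtest prints.
            print("FAIL\t{} (expected {!r}, got {!r})".format(source, expected, actual))
    return passes, len(cases)


# Toy stand-in for the translator: a lookup table with made-up strings.
fake_pipeline = {"input one": "output one", "input two": "wrong output"}
cases = [("input one", "output one"), ("input two", "output two")]

passes, total = run_regression(cases, fake_pipeline.get)
print("{}/{} passes".format(passes, total))
```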

Wikipedia corpus extractor

So in order to do some of the tests like generation testing or coverage testing, we need corpora, right? Have no fear, for `aq-wikicrp` is here! Let us get a Maltese wikipedia dump and make a lovely little corpus with it, but first, we must get a sentence tokeniser compatible with our version of Python.

For information on how to generate your own tokenisers, check out dev/punktgen.py in the apertium-quality svn repo.

And now, assuming Python 3.2, we run the following:

wget http://dumps.wikimedia.org/mtwiki/latest/mtwiki-latest-pages-articles.xml.bz2 && bunzip2 mtwiki-latest-pages-articles.xml.bz2
aq-wikicrp mtwiki-latest-pages-articles.xml mt.wikipedia.crp.txt -t ./maltese-3.2.pickle
aq-wikicrp -x mtwiki-latest-pages-articles.xml mt.wikipedia.crp.xml -t ./maltese-3.2.pickle

Have a look at both of the output files. One of them is purely plain text, with a sentence per line. The other file is an XML corpus, separated into "entries". The XML format allows for some more advanced parsing and searching, but is so far hardly used with this toolkit, other than coverage testing, where both formats are supported. Feel free to use whichever you're more comfortable with.

Coverage testing

Coverage testing does what you'd expect: it tests the dictionary for coverage. Using our newly created corpora, we can test the coverage! Feel free to use either one, but be consistent; only use one of them.

aq-covtest mt.wikipedia.crp.txt mt-he.automorf.bin -X
# OR
aq-covtest mt.wikipedia.crp.xml mt-he.automorf.bin -X

This command should run fairly quickly, and give you the number of tokenised words, the coverage percentage and the top unknown words. It also displays the speed at which it processed the whole corpus, and it's all stored nice and cosy in the stats file.
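For the curious, the numbers reported here can be approximated by hand. In the Apertium stream format an analyser prints each token as `^surface/analysis$`, and an unknown word gets an analysis starting with `*`. The Python sketch below counts those markers in already-analysed text. It is only an illustration under that assumption and may differ from what `aq-covtest` does internally; the function name and sample text are invented.

```python
import re
from collections import Counter

# One lexical unit: ^surface/analyses$ (Apertium stream format).
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed_text, top=3):
    """Return (token count, coverage percentage, most common unknown words)."""
    known = 0
    unknown = Counter()
    for surface, analyses in LEXICAL_UNIT.findall(analysed_text):
        if analyses.startswith("*"):
            unknown[surface] += 1  # the analyser did not recognise this word
        else:
            known += 1
    total = known + sum(unknown.values())
    percent = 100.0 * known / total if total else 0.0
    return total, percent, unknown.most_common(top)


# Made-up analyser output, for illustration only.
sample = "^a/a<det>$ ^foo/*foo$ ^b/b<n>$ ^foo/*foo$"
print(coverage(sample))
```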

Dictionary testing

There are a few other tests that can be done on the dictionaries that don't really need their own command, so these functions are in `aq-dixtest`. They include counting entries and counting rules, but will soon be extended to count CG rules and transfer rules. It can be run like so:

aq-dixtest mt-he -X

This will show the count of rules per file, and the entry count per file. It also shows the totals and unique entries.

Vocabulary testing

This test lets you see how your dictionary's transfer rules are working, and gives you some useful output. By default it saves the output in `voctest.txt`, so it's worth having a look at that file to find transfer bugs and whatnot.

aq-voctest mt-he -X

You should get some pretty line counts, # counts and @ counts. Sweet!
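In Apertium output, `@` conventionally marks a word that was not found in the bilingual dictionary and `#` marks a form the generator could not produce, so these counts tell you roughly how many lines hit each kind of problem. A small Python sketch of that kind of tally follows. The line layout is invented for the example and is not the actual `voctest.txt` format.

```python
def count_markers(lines):
    """Return (non-empty lines, lines containing '#', lines containing '@')."""
    total = hash_lines = at_lines = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        total += 1
        if "#" in line:
            hash_lines += 1
        if "@" in line:
            at_lines += 1
    return total, hash_lines, at_lines


# Made-up sample lines; the real voctest.txt layout may differ.
sample_lines = [
    "word1 -> ok1",
    "word2 -> #word2",
    "word3 -> @word3",
    "",
    "word4 -> ok4",
]
print(count_markers(sample_lines))
```

To try it on real output you could pass an open file instead of the list, for example `count_markers(open("voctest.txt"))`.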

Generation testing

Generation testing isn't working at the moment sadly, so stay tuned.

Morph testing

Morph testing isn't supported by the language we're using, but it is as simple to run as regression testing. One simply runs a configuration file like:

aq-morftest tests.yaml -X

This command will do very similar things to the regression test, but for HFST dictionaries. It allows you to test morphology in both directions to find bugs and regressions. Pretty schmick.

What now?

Now we've run pretty much every test. What now? Didn't you say something about pretty graphs? Yes, I did! But first, we need more statistics, and who really wants to type all of these commands all of the time unless testing for output?

Introducing aq-autotest

`aq-autotest` takes an .aqx file as input. It is an XML configuration file, defined at http://github.com/bbqsrc/apertium-schemas in the aqx.rnc file. For this walkthrough, here's a ready-made configuration file just for you!

<config xmlns="http://apertium.org/xml/quality/config/0.1">
        <command>make clean</command>
        <corpus language="mt-he" path="mt.wikipedia.crp.txt"/>
        <test language="mt-he" path="http://wiki.apertium.org/wiki/Special:Export/Maltese_and_Hebrew/Regression_tests"/>
        <test language="mt-he" path="http://wiki.apertium.org/wiki/Special:Export/Maltese_and_Hebrew/Pending_tests"/>
</config>

If you have installed Apertium to a prefix, make sure you prepend ./autogen in the above file with PKG_CONFIG_PATH=/path/to/lib/pkgconfig!

For this example, simply save the file in the current directory as `quality.aqx`. Before running it, let's delete `quality-stats.xml` so we don't duplicate data unnecessarily, and feed the aqx file to `aq-autotest` like so:

aq-autotest quality.aqx -X

This has much less output than the other frontends, due to its primary purpose being to generate statistics as easily as possible.

To keep this short, here is a simple one-liner that generates the statistics for the remaining revisions in a timely fashion:

for i in 33402 33403 33404 33405 33406 33421 33423 33424 33425 33426 33427 33471 33523 33524 33525 33534 33535 33681 33682 33683; do svn up -r $i; aq-autotest quality.aqx -X; done

You might notice that some of the testing fails and bails out. This is normal. It means that revision of the dictionary was buggy and won't compile correctly, so is simply skipped. As it should be.
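If you would rather know which revisions were skipped, the same loop can be written as a short Python sketch. This is a variant, not a part of the toolkit: unlike the one-liner, it does not run `aq-autotest` when `svn up` itself fails, and it assumes both commands signal failure with a non-zero exit code. The function name and the injectable `run` parameter are made up for the example.

```python
import subprocess


def run_revisions(revisions, run=subprocess.call):
    """Update to each revision, run aq-autotest, and return the revisions that failed.

    `run` takes an argument list and returns an exit code; it defaults to
    subprocess.call and can be swapped for a stub when experimenting.
    """
    skipped = []
    for rev in revisions:
        if run(["svn", "up", "-r", str(rev)]) != 0:
            skipped.append(rev)
            continue
        # Assumption: a non-zero exit code means the revision did not build or test.
        if run(["aq-autotest", "quality.aqx", "-X"]) != 0:
            skipped.append(rev)
    return skipped
```

Calling `run_revisions([33402, 33403, 33404])` in the dictionary directory would then return the list of revisions that bailed out.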

So, once that finishes running, hurray! We have enough stats to do something fun. What's that, you ask?

Welcome to pretty graphs — aq-htmlgen

We can generate the preliminary version of some pretty graphs with a very simple command:

aq-htmlgen quality-stats.xml out

This will output a bunch of JS, CSS, and HTML files to the directory specified, which here is `out`. Open `index.html` in `out` and enjoy the graphs! They are a bit limited right now, but can be easily improved with a drop-in replacement of `raphael_linegraph.js`, which is on the list of things to do!


I hope the walkthrough wasn't too tedious and highlights how using this framework can improve your testing life. Developer documentation will come very soon so if you wish to extend or improve the framework, you will be able to do it with ease.

Documentation about the configuration formats and XML schemas can be found on the wiki at http://wiki.apertium.org/wiki/apertium-quality.