Apertium-quality/Quickstart

Introduction

The Apertium Quality Control Framework, or `apertium-quality`, is a framework and toolkit for unit testing Apertium dictionaries (and, to a lesser extent, HFST finite state transducers) and for recording detailed statistics for later analysis. The toolkit portion of the project consists of tools for regression testing, coverage testing, vocabulary testing and statistics storage, while the framework is robust enough to allow extensive modification through standard interfaces and detailed XML schemas.

Prerequisites

You must have, at the very least, the following applications and libraries installed:

  • Autotools
  • Python >= 3.1
  • Git
  • Apertium >= 3.2

Assuming Debian or Ubuntu, you can most likely get these by running:

apt-get install automake autoconf git apertium python3 python3-lxml

A typical use case/walkthrough — apertium-mt-he

Getting the goodies

Installing apertium-quality

First of all, we need to actually install `apertium-quality`, so in the terminal, enter:

git clone https://github.com/apertium/apertium-quality

If it asks you to accept a security certificate, feel free to accept it permanently. Once it has downloaded, cd into the directory and run the usual commands:

cd apertium-quality
./autogen.sh && make && sudo make install && sudo make install-nltk

If you see any "errors" relating to libyaml, ignore them. They're not real errors: it just means the C libyaml library wasn't found, so the pure Python implementation is used instead.

Add `--prefix` wherever you please, and if Python 3 isn't detected for some reason, prefix the `./autogen.sh` command with `PYTHON=/path/to/python3`.
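
For example, a build into a prefix under your home directory might look like the following. The prefix path is only an illustration, and with a home-directory prefix you don't need sudo:

PYTHON=/path/to/python3 ./autogen.sh --prefix=$HOME/apertium && make && make install && make install-nltk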

If you installed to a prefix, you must also export the line the Makefile prints after building the module before running any of the tools; otherwise Python won't be able to find the module.
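
The exact line depends on your prefix and Python version, so copy it from the Makefile output rather than from here; it will look roughly like this (the path below is purely illustrative):

export PYTHONPATH="$HOME/apertium/lib/python3.2/site-packages:$PYTHONPATH"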

PLEASE NOTE: if you cannot install lxml, you may come across a few bugs, as lxml is preferred over the built-in ElementTree library for compatibility and speed reasons. If you do come across any bugs, however, please report them!
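
A quick way to check whether lxml is visible to your Python 3 is simply to try importing it (this is just a sanity check, not part of the toolkit):

python3 -c 'import lxml.etree; print(lxml.etree.LXML_VERSION)'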

Getting apertium-mt-he

`apertium-mt-he` (these days hosted as `apertium-mlt-heb`) is a language pair that was being developed during the GSoC period while I was coding `apertium-quality`, so why not try it out and see how it's going?

Let's download it:

git clone https://github.com/apertium/apertium-mlt-heb
cd apertium-mlt-heb

Assuming the aq tools were installed correctly and are visible on your `$PATH`, we can begin testing; otherwise, set up your `$PATH` first.
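
If you installed to a prefix, that usually just means adding its bin directory to your `$PATH`; the prefix path here is only an example:

export PATH="$HOME/apertium/bin:$PATH"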

First of all, we should pick a few revisions to test so we can get some pretty stats out of the comparison; the log today shows this for me:

$ git log
commit efc73a5695e3c549ccfa4a66bb6a63d2fd830e8c
Author: Francis M. Tyers <ftyers@prompsit.com>
Date:   Fri Nov 18 12:32:33 2016 +0000

    modes

commit 686db20d4fcb5156e73f4f2c94acdf8ee2a6371a
Author: Francis M. Tyers <ftyers@prompsit.com>
Date:   Fri Nov 18 12:31:35 2016 +0000

    move

commit 8b3078c3d83ed2f51f7e782304851ddb56e03a46
Author: Francis M. Tyers <ftyers@prompsit.com>
Date:   Fri Nov 18 12:30:38 2016 +0000

    move to mlt-heb

commit 386091477c4f1640ad11b70db3ff998234e002e5
Author: Kevin Brubeck Unhammer <unhammer@fsfe.org>
Date:   Mon Jan 4 10:34:00 2016 +0000

    tolbox→toolbox

Let's start by checking out an earlier revision and work our way forward:

git checkout a3f1b90732c3d155b2db0ae1211288d51d998c77

Herein lies the testing

So you've successfully gotten up to this point? Wonderful! Let's start doing some testing. But first, a few things:

  1. Never run tests on a dictionary without committing your changes first. The tools most likely won't let you anyway, in order to guarantee the integrity of the data.
  2. Unless you want to save statistics, don't use -X. Test first, save statistics later.

First, let's compile the dictionary.

./autogen.sh && make

If you have installed Apertium to a prefix, make sure you prefix `./autogen.sh` in the above command with PKG_CONFIG_PATH=/path/to/lib/pkgconfig!
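
In other words, something like this, where the prefix path is wherever you installed Apertium:

PKG_CONFIG_PATH=/path/to/lib/pkgconfig ./autogen.sh && make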

Ambiguity testing

This kind of test allows you to see the naive ambiguity of your dictionaries by counting the forms of each word, then averaging the result. We do this by running `aq-ambtest` on each dix. So let's do it!

aq-ambtest apertium-mt-he.he.dix -X
aq-ambtest apertium-mt-he.mt.dix -X
aq-ambtest apertium-mt-he.mt-he.dix -X

See that dangling `-X` on the end? That's the flag to save statistics. By default it saves to a file called `quality-stats.xml`; if you add a filename after the `-X`, it'll save to that file instead. By leaving the filename off, we just use the default, and we will! All tests use `-X` for saving stats, so it's easy to remember.
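
For instance, to save the ambiguity stats into a separate file instead of the default (the filename here is just an example):

aq-ambtest apertium-mt-he.he.dix -X my-ambiguity-stats.xml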

Regression testing

Dictionaries are complex beings. Sometimes you want to write some tests to prove you haven't broken something, or that you've met a milestone. Luckily, this pair has quite a few regression tests already written for our testing pleasure, so let's have a look at them!

aq-regtest -d . mt-he http://wiki.apertium.org/wiki/Special:Export/Maltese_and_Hebrew/Regression_tests -X
aq-regtest -d . mt-he http://wiki.apertium.org/wiki/Special:Export/Maltese_and_Hebrew/Pending_tests -X

You should get a lot of output saying WORKS everywhere for the first one, with 24/24 passes. 100%, yay! Pending, however, is a bit more disappointing. 4/33? How disheartening. No matter, maybe it gets better in the future!

Wikipedia corpus extractor

So in order to do some of the tests, like generation testing or coverage testing, we need corpora, right? Have no fear, for `aq-wikicrp` is here! Let's get a Maltese Wikipedia dump and make a lovely little corpus with it, but first we must get a sentence tokeniser compatible with our version of Python.

For information on how to generate your own tokenisers, check out dev/punktgen.py in the apertium-quality git repo.

And now, assuming Python 3.2, we run the following:

wget http://dumps.wikimedia.org/mtwiki/latest/mtwiki-latest-pages-articles.xml.bz2 && bunzip2 mtwiki-latest-pages-articles.xml.bz2
aq-wikicrp mtwiki-latest-pages-articles.xml mt.wikipedia.crp.txt -t ./maltese-3.2.pickle
aq-wikicrp -x mtwiki-latest-pages-articles.xml mt.wikipedia.crp.xml -t ./maltese-3.2.pickle

Have a look at both of the output files. One is plain text, with one sentence per line. The other is an XML corpus, separated into "entries". The XML format allows for some more advanced parsing and searching, but so far it is hardly used by this toolkit, other than in coverage testing, where both formats are supported. Feel free to use whichever you're more comfortable with.
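
An easy way to compare them is to peek at the first few lines of each file:

head -n 5 mt.wikipedia.crp.txt
head -n 5 mt.wikipedia.crp.xml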

Coverage testing

Coverage testing does what you'd expect: it tests the dictionary's coverage over a corpus. Using our newly created corpora, we can test the coverage! Feel free to use either one, but be consistent; only use one of them.

aq-covtest mt.wikipedia.crp.txt mt-he.automorf.bin -X
# OR
aq-covtest mt.wikipedia.crp.xml mt-he.automorf.bin -X

This command should run fairly quickly, and give you the number of tokenised words, the coverage percentage and the top unknown words. It also displays the speed at which it processed the whole corpus, and it's all stored nice and cosy in the stats file.

Dictionary testing

There are a few other tests that can be done on the dictionaries that don't really need their own command, so these functions live in `aq-dixtest`. Currently they include counting entries and counting transfer rules, but they might be extended to count CG rules as well. It can be run like so:

aq-dixtest mt-he -X

This will show the count of rules per file, and the entry count per file. It also shows the totals and unique entries.

Vocabulary testing

See also: Testvoc

This test allows you to see how well your dictionary's transfer rules are working, and gives some useful output. By default it will save the output in voctest.txt, so it's worth having a look at that file to find transfer bugs and whatnot.

aq-voctest mt-he -X

You should get some pretty line counts, # counts and @ counts. Sweet!
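
If you want to poke at the saved output yourself, a couple of quick checks over the default `voctest.txt` give similar numbers (note that `grep -c` counts matching lines rather than individual symbols):

wc -l voctest.txt
grep -c '#' voctest.txt
grep -c '@' voctest.txt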

Generation testing

Generation testing isn't working at the moment sadly, so stay tuned.

Morph testing

Morph testing isn't supported by the language pair we're using, but it is as simple to run as regression testing. You simply run it on a configuration file like so:

aq-morftest tests.yaml -X

This command will do very similar things to the regression test, but for HFST dictionaries. It allows you to test morphology in both directions to find bugs and regressions. Pretty schmick.
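
For reference, the configuration is a YAML file; a minimal sketch, assuming the common Config/Tests layout used by HFST morphology test tools, might look like the following. The transducer paths, tags and forms are purely illustrative, so replace them with your own:

Config:
  hfst:
    Gen: ../src/generator.hfst     # path to your generator transducer (illustrative)
    Morph: ../src/analyser.hfst    # path to your analyser transducer (illustrative)
Tests:
  Noun tests:
    house+N+Sg: house              # analysis on the left, expected surface form on the right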

What now?

Now we've run pretty much every test. What now? Didn't you say something about pretty graphs? Yes, I did! But first, we need more statistics, and who really wants to type all of these commands every single time unless they're testing for output?

Introducing aq-autotest

`aq-autotest` takes an .aqx file as input. It is an XML configuration file, defined at http://github.com/bbqsrc/apertium-schemas in the aqx.rnc file. For this walkthrough, here's a ready-made configuration file just for you!

<config xmlns="http://apertium.org/xml/quality/config/0.1">
    <commands>
        <command>./autogen.sh</command>
        <command>make clean</command>
        <command>make</command>
    </commands>
    <coverage>
        <corpus language="mt-he" path="mt.wikipedia.crp.txt"/>
    </coverage>
    <regression>
        <test language="mt-he" path="http://wiki.apertium.org/wiki/Special:Export/Maltese_and_Hebrew/Regression_tests"/>
        <test language="mt-he" path="http://wiki.apertium.org/wiki/Special:Export/Maltese_and_Hebrew/Pending_tests"/>
    </regression>
</config>

If you have installed Apertium to a prefix, make sure you prefix the `./autogen.sh` command in the above file with PKG_CONFIG_PATH=/path/to/lib/pkgconfig!
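
That is, assuming the <command> entries are run through a shell that understands the usual VAR=value prefix, the first command element would become something like:

<command>PKG_CONFIG_PATH=/path/to/lib/pkgconfig ./autogen.sh</command>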

For this example, simply save the file in the current directory as `quality.aqx`. Before running it, let's delete `quality-stats.xml` so we don't duplicate data unnecessarily:
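
rm -f quality-stats.xml

Now feed the aqx file to `aq-autotest` like so: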

aq-autotest quality.aqx -X

This has much less output than the other frontends, due to its primary purpose being to generate statistics as easily as possible.

Now, to keep this short, rather than repeating those steps by hand you can walk forward through the remaining revisions and re-run the tests at each one. A simple loop over every commit since the revision we checked out earlier does the job (adjust the branch name if yours isn't master):

for i in $(git rev-list --reverse a3f1b90732c3d155b2db0ae1211288d51d998c77..master); do git checkout $i; aq-autotest quality.aqx -X; done

You might notice that some of the runs fail and bail out. This is normal: it means that revision of the dictionary was buggy and won't compile correctly, so it is simply skipped. As it should be.

So, once that finishes running, hurray! We have enough stats to do something fun. What's that, you ask?

Welcome to pretty graphs — aq-htmlgen

We can generate the preliminary version of some pretty graphs with a very simple command:

aq-htmlgen quality-stats.xml out

This will output a bunch of JS, CSS, and HTML files to the directory specified, which here is `out`. Open `index.html` in `out` and enjoy the graphs! They are a bit limited right now, but can be easily improved with a drop-in replacement of raphael_linegraph.js which is on the list of things to do!

Conclusion

I hope the walkthrough wasn't too tedious and that it highlights how using this framework can improve your testing life. Developer documentation will come very soon, so if you wish to extend or improve the framework, you will be able to do so with ease.

Documentation about the configuration formats and XML schemas can be found on the wiki at http://wiki.apertium.org/wiki/apertium-quality.