User:Ilnar.salimzyan/On testing
Test-driven language pair development or on testing strategy in Apertium
========================================================================

Some terminology
----------------

Acceptance tests define when a language pair or a particular sub-module (like a morphological transducer or CG) is done. That is, they define the requirements for what you are going to develop, and they are written in the process of communicating with "stakeholders"/mentors or anyone funding the development. Therefore it's good if anything from this category is wiki-based and easy to edit.

Unit tests are written by programmers for programmers. They describe how the system works and what the structure and behavior of the code is.

Integration tests, as the name suggests, test whether components (in our case, modules like morphological transducers, disambiguators, lexical selection rules and transfer rules) are successfully integrated into a system (= a language pair). In the case of a language pair, you can think of the acceptance tests for that pair as integration tests, since they test how the modules of the pair integrate into a complete machine translation system.

Overview
--------

Testing an Apertium MT system splits into acceptance tests and unit tests, applied to the output of each module:

* Morphological transducers
  * Acceptance: recall or coverage; precision; number of stems; Regression–Pending tests on the wiki; corpus test (upper bound for WER, upper bounds for [*@#] errors)
  * Unit: morphophonology tests; morphotactics tests
* Constraint Grammar
  * Acceptance: ambiguity rate before & after; precision; number of rules; testvoc (has to be clean)
  * Unit: INPUT/OUTPUT comments for each rule
* Lexical selection
  * Acceptance: ambiguity rate before & after; precision; number of lrx rules; number of stems in the bidix <-> ambiguity rate before and after
  * Unit: INPUT/OUTPUT comments for each rule
* Transfer rules
  * Acceptance: wiki tests for phrases and sentences (gisting evaluation); "testvoc-lite" tests for single words
  * Unit: INPUT/OUTPUT comments in the headers of rules
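The INPUT/OUTPUT comments that serve as unit tests are simply kept next to each rule in the rule file itself. A hypothetical example for a transfer rule (the words, tags and rule body are invented for illustration; they are not from an actual pair):

```xml
<!-- INPUT:  ^алма<n><nom>$ -->
<!-- OUTPUT: ^яблоко<n><nt><sg><nom>$ -->
<rule comment="noun in nominative">
  ...
</rule>
```

Keeping the expected input and output right above the rule documents its intent and gives a test runner something concrete to replay.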
Cheat sheet for available testing commands
In apertium-tat-rus (monodirectional pair)
Command | When |
---|---|
./qa t1x | apertium-tat-rus.tat-rus.t1x changed |
./qa t2x | apertium-tat-rus.tat-rus.t2x changed |
./qa t3x | apertium-tat-rus.tat-rus.t3x changed |
./qa t4x | apertium-tat-rus.tat-rus.t4x changed |
./qa (or make unit_tests) | Before committing |
./qa corp | Corpus test ('./qa' will do this) |
./wiki-tests.sh Regression tat rus [update] | 'update' if the Tatar and Russian/Regression tests page changed |
./qa testvoc reg | Currently local tests in testvoc/lite/regression.txt ('./qa' will do this) |
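The corpus test above boils down, in part, to counting how many tokens the analyser fails to recognise: in the Apertium stream format, unknown words come out as `^word/*word$`. A minimal self-contained sketch of that count (the file name and sample data are invented; a real run would first pipe the corpus through the pair's analysis mode):

```shell
# Count total and unknown tokens in saved Apertium analysis output.
# Unknown tokens carry the '*' mark: ^word/*word$. Sample data is invented.
cat > sample-analysis.txt <<'EOF'
^алма/алма<n><nom>$ ^китап/китап<n><nom>$ ^фывап/*фывап$
EOF
total=$(grep -o '\^[^$]*\$' sample-analysis.txt | wc -l)
unknown=$(grep -o '\^[^$]*\$' sample-analysis.txt | grep -c '/\*')
echo "tokens: $total, unknown: $unknown"
# naive coverage = (total - unknown) / total
```

The same pipeline, pointed at `@` and `#` marks instead of `*`, gives the upper bounds for generation errors mentioned in the overview.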
In apertium-kaz-tat (bidirectional pair)
Command | When |
---|---|
./qa kaz-tat-t1x | apertium-kaz-tat.kaz-tat.t1x changed |
./qa | Before committing |
./qa kaz-tat-corp | Corpus test in kaz>tat direction ('./qa' will do this) |
./qa tat-kaz-corp | Corpus test in tat>kaz direction ('./qa' will do this) |
./wiki-tests.sh Regression tat kaz [update] | 'update' if the Kazakh and Tatar/Regression tests page changed |
In apertium-kaz
Command | When |
---|---|
./wiki-tests.sh Regression kaz kaz [update] | Before committing |
In other words, './qa' is supposed to run the essential tests at each commit/build. Currently it runs lightweight versions of the acceptance tests for the machine translator, but it should probably run all the unit tests ('./qa t1x' etc.) as well, since they run fast.
My plan is to have a './qa full' target that runs all tests (including testvoc-lite) once in a while.
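A minimal sketch of how such a wrapper could be structured (the test names and commands below are stand-ins, not the real './qa' targets): each test group runs, failures are reported, and an overall status accumulates for the exit code.

```shell
#!/bin/sh
# Hypothetical skeleton for './qa': run every test group, report,
# and keep an overall status for the script's exit code.
status=0
run() {
    name=$1; shift          # $1 = label, remaining args = command
    if "$@" >/dev/null 2>&1; then
        echo "PASS $name"
    else
        echo "FAIL $name"
        status=1
    fi
}
run unit-t1x   true     # stand-in for the t1x unit tests
run corpus     true     # stand-in for './qa corp'
run testvoc-lt false    # a deliberately failing stand-in
echo "overall status: $status"
# a real script would finish with: exit $status
```

This also covers the first wishlist item: callers (and CI) can branch on the exit status instead of reading the log.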
Wishlist
- make './qa' return 0 or 1 depending on whether all tests pass or not
- automate yaml tests for the morphological analyzer
- new TRmorph-like options for yaml tests
- move tests from apertium-tat-rus/testvoc/lite/regression.txt onto the wiki as well (another template(s) will be needed for that)
- consider moving yaml tests for the morphological analyzer onto the wiki (as a way of allowing native speakers to specify the paradigms in their language)
- like seen on Apertium-kaz/Regression tests, but we will need a way to specify that we want to test the output of the morphological analyzer alone, not of the morphological analyzer plus the tagger. Something like this: {{test|morf|Жапония|^Жапония<np><top><nom>$}} vs {{test|tagger|Жапония ұлттық футбол құрамасы.|^Жапония<np><top><nom>$ ^ұлттық<adj>$ ^футбол<n><attr>$ ^құрама<n><px3sp><nom>$^..<sent>$}}
- 'make unit_tests', 'make acceptance_tests' (and possibly 'make slow_acceptance_tests') targets in apertium-tat-bak's Makefile
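For the yaml tests mentioned above, a morph-test-style file might look roughly like the following. The paths, section names and exact schema here are a sketch from memory, not authoritative — check the test framework's own documentation before relying on it:

```yaml
Config:
  hfst:
    Gen: ../apertium-kaz.kaz.gen.hfst
    Morph: ../apertium-kaz.kaz.mor.hfst

Tests:
  Proper nouns, nominative:
    Жапония<np><top><nom>: Жапония
```

Each entry pairs an analysis with the surface form it should generate (and, run in the other direction, the form the analyser should accept), which is what makes these files attractive for native speakers to edit on the wiki.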
Other ideas
What to test | How to test |
---|---|
apertium-kaz-tat.kaz-tat.dix (or, rather, the lexical content of all dictionaries) | look up lemmas in the "pan-turkic" dictionary |
<selimcan> I was thinking of at least keeping a list of lexicons which stems can continue with (i.e. directly) separate in lexc
<selimcan> giving examples for each
<selimcan> I mean, right before the stems section, a list of N1, N2, N-RUS, V-TV, etc
<selimcan> with a short comment and examples for each
<selimcan> Lexicon : Description : Example
<selimcan> N1 : common nouns : бақша
<selimcan> N5 : nouns loaned from Russian (often don't obey the synharmonism laws, that's why they should be kept separate) : актив
<selimcan> N-COMPOUND-PX : compound nouns with 3p possessive at the last noun
<selimcan> firespeaker, you know, like we do for adjectives, but for all lexicons we have
<selimcan> That kind of comments for all lexicons (stems can link to) we have, and in one place, so that whoever is adding stems to a lexicon doesn't have to look at the entire morphology description in lexc
<selimcan> Plus a full paradigm of one example word linking to that lexicon in apertium-foo/tests/morphotactics or somewhere else (useful for testvoc and potentially for automatically guessing the paradigm)

Or even a small decision tree. E.g. for tat.lexc:

(pos? (Verb (Transitive? (yes (stem? (infinitive?))) (no (stem? infinitive?)))))
(Adjective Comparison levels? ...)
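The comment table proposed in the log could itself be checked mechanically: pull the lexicon names out of the '! Lexicon : Description : Example' comment block and verify that each one is actually defined below. A self-contained sketch (the file contents and the comment convention are invented for illustration):

```shell
#!/bin/sh
# Sketch: cross-check a "Lexicon : Description : Example" comment table
# against the LEXICON definitions in the same lexc file.
cat > sample.lexc <<'EOF'
! Lexicon : Description : Example
! N1 : common nouns : бақша
! N5 : nouns loaned from Russian : актив
LEXICON N1
LEXICON N5
EOF
missing=0
# Take names from comment lines like "! N1 : ..." (header line is skipped
# because "Lexicon" is not all-caps); check each has a LEXICON definition.
for lex in $(awk -F' : ' '/^! [A-Z][A-Z0-9-]* :/ {sub(/^! /,"",$1); print $1}' sample.lexc); do
    grep -q "^LEXICON $lex\$" sample.lexc || { echo "missing: $lex"; missing=1; }
done
echo "missing=$missing"
```

The same extraction could also feed the "full paradigm per lexicon" idea: generate one example word per listed lexicon and diff the output against a stored paradigm file.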