User:Popcorndude/Unit-Testing
Proposed file structure for a unit-testing framework.
TODO
- testvoc?
- wiki format? Template:Test, Template:TransferTest, and plain text seem to be the current options (with the last doubtless hiding many complexities)
- Template:TransferTest is only used by eng-kir while Template:Test appears on almost 200 pages
- do we want wiki format for more than just full-pipeline tests?
Unit-Testing Specification
`make test` will check for a file named `tests/apertium-[name].tests.yaml`, which can include other files in the same directory.

At the top level of a test file, the key `include` holds a list of included files (paths given relative to the directory of the current file). All other top-level keys are names of tests.
Each test consists of a collection of string pairs, which can be written as

```yaml
left: right
```

or placed in a TSV file and referenced with `tsv-file: [path]`. Multiple valid outputs can be written as a list

```yaml
left:
  - right1
  - right2
```

or by having more than two columns in the TSV file.
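As a rough sketch of how such a TSV file might be read (the function name and details are illustrative, not part of the spec), each row maps the first column to a list of accepted outputs:

```python
import csv

def load_tsv_tests(path):
    """Read a tab-separated test file: column 1 is the left side,
    columns 2+ are the accepted right-side outputs."""
    tests = []
    with open(path, newline='', encoding='utf-8') as f:
        for row in csv.reader(f, delimiter='\t'):
            if not row:
                continue  # skip blank lines
            left, rights = row[0], row[1:]
            tests.append((left, rights))
    return tests
```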
Each test also specifies what mode to run. The direction can be either `lr` (left side is input) or `rl` (right side is input); bidirectional tests may be created by listing both.

```yaml
[direction]:
  mode: [mode-name]
  match: [match-mode]
  options: [option-string]
  stream-type: [type]
```

Options for `match-mode` are documented below. The default value, if not specified, is `one-of`.

`option-string` will be passed to the `apertium` executable. It defaults to `-f none`.

Options for `stream-type` are documented below. This will usually be inferred by the test runner and so need not be specified.

If no settings other than `mode` are specified, this can be abbreviated to `[direction]: [mode-name]`.
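For instance (using a hypothetical `eng-spa` mode), the following two settings are equivalent:

```yaml
# full form, with only mode specified
lr:
  mode: eng-spa

# abbreviated form
lr: eng-spa
```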
Matching Modes
| mode name | description | reversal |
|---|---|---|
| `exact` | pipeline output and right side must be identical | pipeline output and left side must be identical |
| `one-of` | pipeline output must appear on the right side | pipeline output of one of the right sides must match the left side |
| `include` | right side must appear in the pipeline output | |
| `exclude` | right side must not appear in the pipeline output | |
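As a rough sketch (the function name is mine, not from the spec), the four matching modes could be checked like this, where `output` is the pipeline output and `expected` is the list of right-side values:

```python
def check_match(mode, output, expected):
    """Return True if the pipeline output satisfies the test.

    mode     -- one of 'exact', 'one-of', 'include', 'exclude'
    output   -- the string produced by the pipeline
    expected -- the list of right-side values from the test file
    """
    if mode == 'exact':
        # output must be identical to the (single) expected value
        return output == expected[0]
    elif mode == 'one-of':
        # output must appear among the listed alternatives
        return output in expected
    elif mode == 'include':
        # every expected value must occur somewhere in the output
        return all(e in output for e in expected)
    elif mode == 'exclude':
        # no expected value may occur in the output
        return not any(e in output for e in expected)
    raise ValueError(f'unknown match mode: {mode}')
```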
Stream Types
| stream type | description |
|---|---|
| `text` | values are compared as strings |
| `readings` | values with the same surface or source (preceding the first `/`) are compared based on the remaining readings (unordered) |
| `no-src` | like `readings`, but without the surface form |
| `anaphora` | like `readings`, but the last reading is matched separately |
If the value in the test file contains any character in `^/$`, interpretation will default to `readings`; otherwise `text` will be used. LUs not delimited by `^$` will be split on spaces.

`*` will match arbitrary readings and `?` will match a single arbitrary reading. These can be escaped with `\`.

Under `readings`, LUs in corresponding positions will be considered non-matching if they have different first segments.
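A simplified sketch of the `readings` comparison (all names are illustrative; `\`-escaping of wildcards is not handled here):

```python
import re

LU_RE = re.compile(r'\^([^$]*)\$')

def parse_lus(text):
    """Split a stream fragment into (surface, readings) pairs.
    If the text contains ^...$ delimiters, use those;
    otherwise split on spaces."""
    chunks = LU_RE.findall(text) if '^' in text else text.split()
    lus = []
    for chunk in chunks:
        surface, _, readings = chunk.partition('/')
        lus.append((surface, readings.split('/') if readings else []))
    return lus

def readings_equal(a, b):
    """Compare two LU lists: positions match if the first segment
    (surface) is the same and the readings agree, order-insensitively.
    '*' matches any readings; '?' matches exactly one reading."""
    if len(a) != len(b):
        return False
    for (surf_a, rd_a), (surf_b, rd_b) in zip(a, b):
        if surf_a != surf_b:
            return False  # different first segments never match
        if rd_a == ['*'] or rd_b == ['*']:
            continue  # wildcard: any readings accepted
        if rd_a == ['?'] or rd_b == ['?']:
            if len(rd_a) == 1 and len(rd_b) == 1:
                continue  # single arbitrary reading
            return False
        if sorted(rd_a) != sorted(rd_b):
            return False  # readings differ (order ignored)
    return True
```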
Unit-Testing Example Files
apertium-eng-spa/tests/tests.yaml
```yaml
include:
  - other_file_1.yaml
  - other_file_2.yaml
"possession":
  lr: eng-spa
  rl: spa-eng
  "the cat's box": "la caja del gato"
  "my sister's socks": "los calcetines de mi hermana"
"noun/verb disam":
  lr:
    mode: eng-spa-tagger
    match: exact
  "the cat's box": "the/the<det><def><sp> cat's/cat<n><sg>+'s<gen> box/box<n><sg>"
"de":
  lr:
    mode: spa-eng
    match: one-of
  "el gato de mi hermana":
    - "my sister's cat"
    - "the cat of my sister"
```
apertium-eng/tests/tests.yaml
```yaml
"past tense":
  lr:
    mode: eng-morph
    match: include
  rl:
    mode: eng-gener
    match: exact
  tsv-file: past-tense-tests.tsv
"disam":
  lr:
    mode: eng-tagger
    match: exclude
  "to be purple": "to/ be/be<vbser><imp> purple/"
```
apertium-eng/tests/past-tense-tests.tsv
```
sang	sing<vblex><past>
jumped	jump<vblex><past>
```
Annotated Unit-Testing Example Files
apertium-eng-spa/tests/tests.yaml
```yaml
include:        # run the tests in these files as well
  - other_file_1.yaml
  - other_file_2.yaml
"possession":   # test named "possession"
  lr: eng-spa   # left side | apertium eng-spa => right side
  rl: spa-eng   # right side | apertium spa-eng => left side
  "the cat's box": "la caja del gato"
  "my sister's socks": "los calcetines de mi hermana"
"noun/verb disam":  # test named "noun/verb disam"
  lr:               # only has 1 direction
    mode: eng-spa-tagger
    match: exact    # output must match what we've written below exactly
                    # since there's only one thing on the right, this is
                    # equivalent to the default behavior,
                    # so we're just being explicit
  "the cat's box": "^the/the<det><def><sp>$ ^cat's/cat<n><sg>+'s<gen>$ ^box/box<n><sg>$"
```
apertium-eng/tests/tests.yaml
```yaml
"past tense":
  lr:
    mode: eng-morph
    match: include  # the right side of the test must appear in the output
                    # but the test will still pass if other things appear as well
  rl:
    mode: eng-gener
    match: exact
  tsv-file: past-tense-tests.tsv  # read the test data from a tab-separated list
"disam":
  lr:
    mode: eng-tagger
    match: exclude  # the output can contain other things, but must not contain
                    # the readings listed
  "to be purple": "^to/$ ^be/be<vbser><imp>$ ^purple/$"
"multiwords":
  lr: eng-morph
  # we need ^ and $ here because there are spaces
  "good morning": "^good morning/good morning<ij>$"
```
Corpus Regression Testing
The test runner can be run in either static mode (which functions as a test that can pass or fail) or in interactive mode (which updates the data to reflect the state of the translator).
The test runner will by default check for a file named `tests/apertium-[name].regressions.yaml`. This file will contain one or more entries of the form

```yaml
[name]:
  mode: [mode-name]
  input: [file-name]
```
where `name` is the name of this corpus, `mode-name` names a pipeline mode (usually `abc-xyz` or `xyz-abc`), and the value of `input:` is a text file in which each line contains an input sentence.
The mode will be read from `modes.xml` and each step will be named in the same fashion as `gendebug="yes"`: using `debug-suff` if it is present, otherwise trying to guess a standard suffix, and finally falling back to `NAMEME`. If more than one step has the same debug suffix, they will be numbered sequentially.
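One way the sequential numbering of duplicate suffixes could work (a sketch under my own assumptions; the suffix names below are illustrative, and the exact numbering scheme is not fixed by the spec):

```python
from collections import Counter

def number_duplicates(suffixes):
    """Give each pipeline step a unique name: if a debug suffix occurs
    more than once, append a sequential index to every occurrence."""
    totals = Counter(suffixes)
    seen = Counter()
    names = []
    for s in suffixes:
        if totals[s] > 1:
            seen[s] += 1
            names.append(f'{s}{seen[s]}')  # e.g. autobil1, autobil2
        else:
            names.append(s)
    return names
```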
For each step, the test runner will check for files named `[name].[step-name].expected.txt` and `[name].[step-name].gold.txt` in the same directory as the input file. `expected.txt` is assumed to be the output of a previous run, and `gold.txt` is assumed to be the ideal output. `gold.txt` can contain multiple ideal outputs for each line, separated by tabs.
In static mode, if the output of a step does not appear in either `expected.txt` or `gold.txt`, the test fails.
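The static check for one step might amount to the following sketch (helper name mine; lines are compared positionally under the file layout described above):

```python
def static_check(output_lines, expected_lines, gold_lines):
    """Return the 0-based indices of lines that fail the static check.

    A line passes if it equals the corresponding line of expected.txt
    or any of the tab-separated alternatives on the corresponding
    line of gold.txt."""
    failures = []
    for i, out in enumerate(output_lines):
        expected = expected_lines[i] if i < len(expected_lines) else None
        gold = gold_lines[i].split('\t') if i < len(gold_lines) else []
        if out != expected and out not in gold:
            failures.append(i)
    return failures
```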
In interactive mode, differences between the output and the files will be presented to the user, who will have the option to add the output to either file.