Difference between revisions of "User:Popcorndude/Unit-Testing"
Revision as of 15:23, 24 February 2021
Proposed file structure for unit testing framework.
Specification
make test will check for a file named tests/tests.yaml, which can include other files in the same directory.
At the top level of a test file, the include key is a list of included files (paths given relative to the directory of the current file). All other keys are names of tests.
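As a rough illustration of the top-level layout described above, the following Python sketch separates the include list from the test definitions once a file has been parsed into a dictionary. The function name split_top_level is hypothetical, and the sketch assumes the YAML has already been loaded by some parser; it is not taken from any existing implementation.

```python
from pathlib import Path


def split_top_level(data, current_file):
    """Split a parsed test file into (included paths, tests).

    `data` is assumed to be the dictionary produced by a YAML parser,
    and `current_file` the path of the file it was read from.
    """
    base = Path(current_file).parent
    # Included paths are resolved relative to the directory of the current file.
    includes = [base / name for name in data.get("include", [])]
    # Every other top-level key is the name of a test.
    tests = {name: body for name, body in data.items() if name != "include"}
    return includes, tests
```

A runner could call this on tests/tests.yaml and then repeat the process for each returned path.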
Each test consists of a collection of pairs of strings, which can be written as left: right or placed in a TSV file and referenced with tsv-file: [path].
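Reading the pairs from a TSV file might look like the sketch below. It assumes one pair per line with the left side in the first column and the right side in the second, as in the example file later on this page; the function name parse_pairs is hypothetical.

```python
import csv
import io


def parse_pairs(tsv_text):
    """Return a list of (left, right) pairs from tab-separated text."""
    # QUOTE_NONE keeps quote characters literal, since Apertium stream
    # format may legitimately contain them.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    # Lines with fewer than two columns (e.g. blank lines) are skipped.
    return [(row[0], row[1]) for row in reader if len(row) >= 2]
```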
Each test also specifies what mode to run. This specification can be either lr (left side is input) or rl (right side is input). Bidirectional tests may be created by listing both.
[direction]:
  mode: [mode-name]
  match: [match-mode]
  options: [option-string]
match-mode can be one of exact (output must be exactly as written in the test), include (the values in the test must be present in the output), or exclude (the values in the test must not be present in the output). This defaults to exact if not specified.
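One possible reading of the three match modes is sketched below in Python. It is only an interpretation: it assumes that include and exclude compare readings unit by unit, that a unit written with a slash starts with its surface form, and that a unit without a slash lists readings only. The names readings and matches are hypothetical, and the behaviour on anything other than readings is still an open question (see the TODO list).

```python
import re

# A lexical unit in Apertium stream format: ^ ... $
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def readings(stream):
    """Return one set of readings per lexical unit in `stream`."""
    units = []
    for body in LEXICAL_UNIT.findall(stream):
        parts = body.split("/")
        if len(parts) > 1:
            parts = parts[1:]  # assumption: the first field is the surface form
        units.append({part for part in parts if part})
    return units


def matches(expected, actual, mode="exact"):
    """Check `actual` output against `expected` under the given match mode."""
    if mode == "exact":
        return expected == actual
    exp, act = readings(expected), readings(actual)
    if len(exp) != len(act):
        return False
    if mode == "include":
        # every listed reading must be present; extra readings are allowed
        return all(e <= a for e, a in zip(exp, act))
    if mode == "exclude":
        # none of the listed readings may be present
        return all(not (e & a) for e, a in zip(exp, act))
    raise ValueError(f"unknown match mode: {mode}")
```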
option-string will be passed to the apertium executable. It defaults to -f none.
If neither match nor options is specified, this can be abbreviated to [direction]: [mode-name].
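The abbreviation and the defaults could be handled by expanding every direction entry to its full form before running anything, as in this Python sketch. The function names are hypothetical, and the argument order in build_command (options first, then the mode name) is an assumption about how the runner would invoke apertium rather than something this page specifies.

```python
import shlex

DEFAULT_MATCH = "exact"
DEFAULT_OPTIONS = "-f none"


def normalize_direction(spec):
    """Expand a direction entry (string or mapping) to its full form."""
    if isinstance(spec, str):
        # abbreviated form: [direction]: [mode-name]
        spec = {"mode": spec}
    return {
        "mode": spec["mode"],
        "match": spec.get("match", DEFAULT_MATCH),
        "options": spec.get("options", DEFAULT_OPTIONS),
    }


def build_command(spec):
    """Build an argument list for the apertium executable (assumed layout)."""
    full = normalize_direction(spec)
    return ["apertium", *shlex.split(full["options"]), full["mode"]]
```

With this, lr: eng-spa and the explicit form with mode: eng-spa, match: exact, options: -f none are treated identically.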
TODO
- how do include and exclude work on things other than possible readings?
- testvoc?
- regressions on a corpus
- wiki format? Template:Test, Template:TransferTest, and plain text seem to be the current options (with the last doubtless hiding many complexities)
  - Template:TransferTest is only used by eng-kir while Template:Test appears on almost 200 pages
  - do we want wiki format for more than just full-pipeline tests?
- have input map to multiple outputs
Example Files
apertium-eng-spa/tests/tests.yaml
include:
  - other_file_1.yaml
  - other_file_2.yaml
"possession":
  lr: eng-spa
  rl: spa-eng
  "the cat's box": "la caja del gato"
  "my sister's socks": "los calcetines de mi hermana"
"noun/verb disam":
  lr:
    mode: eng-spa-tagger
    match: exact
  "the cat's box": "^the/the<det><def><sp>$ ^cat's/cat<n><sg>+'s<gen>$ ^box/box<n><sg>$"
apertium-eng/tests/tests.yaml
"past tense":
  lr:
    mode: eng-morph
    match: include
  rl:
    mode: eng-gener
    match: exact
  tsv-file: past-tense-tests.tsv
"disam":
  lr:
    mode: eng-tagger
    match: exclude
  "to be purple": "^to/$ ^be/be<vbser><imp>$ ^purple/$"
apertium-eng/tests/past-tense-tests.tsv
sang	^sing<vblex><past>$
jumped	^jump<vblex><past>$
Annotated Example Files
apertium-eng-spa/tests/tests.yaml
include:  # run the tests in these files as well
  - other_file_1.yaml
  - other_file_2.yaml
"possession":  # test named "possession"
  lr: eng-spa  # left side | apertium eng-spa => right side
  rl: spa-eng  # right side | apertium spa-eng => left side
  "the cat's box": "la caja del gato"
  "my sister's socks": "los calcetines de mi hermana"
"noun/verb disam":  # test named "noun/verb disam"
  lr:  # only has 1 direction
    mode: eng-spa-tagger
    match: exact  # output must match what we've written below exactly
                  # this is the default, but we're being explicit
  "the cat's box": "^the/the<det><def><sp>$ ^cat's/cat<n><sg>+'s<gen>$ ^box/box<n><sg>$"
apertium-eng/tests/tests.yaml
"past tense":
  lr:
    mode: eng-morph
    match: include  # the right side of the test must appear in the output
                    # but the test will still pass if other things appear as well
  rl:
    mode: eng-gener
    match: exact
  tsv-file: past-tense-tests.tsv  # read the test data from a tab-separated list
"disam":
  lr:
    mode: eng-tagger
    match: exclude  # the output can contain other things, but must not contain
                    # the readings listed
  "to be purple": "^to/$ ^be/be<vbser><imp>$ ^purple/$"