User:Popcorndude/Unit-Testing

From Apertium

Revision as of 15:23, 24 February 2021

Proposed file structure for a unit-testing framework.

Specification

Running make test will look for a file named tests/tests.yaml, which can include other files in the same directory.

At the top level of a test file, the include key holds a list of files to include (paths are relative to the directory of the current file). Every other top-level key is the name of a test.
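A minimal Python sketch of how a runner might gather tests across include lists. It assumes each YAML file has already been parsed into a dict by some loader callback (hypothetical here; PyYAML's yaml.safe_load is one option), resolves each include relative to the including file's directory, and skips files it has already visited so that include cycles terminate. The proposal does not say what happens when two files define a test with the same name; this sketch simply lets the including file's own tests win.

```python
from __future__ import annotations

import posixpath
from typing import Callable


def collect_tests(path: str, load: Callable[[str], dict],
                  seen: set[str] | None = None) -> dict:
    """Gather all tests reachable from `path`, following `include` lists.

    `load` is a hypothetical callback returning a file's parsed YAML as a dict.
    Included paths are resolved relative to the including file's directory;
    files already visited are skipped so include cycles terminate.
    """
    seen = set() if seen is None else seen
    path = posixpath.normpath(path)
    if path in seen:
        return {}
    seen.add(path)

    data = load(path)
    base = posixpath.dirname(path)
    tests: dict = {}
    # Included files first, so this file's own tests win on a name clash
    # (an assumption; the proposal does not specify clash handling).
    for included in data.get("include", []):
        tests.update(collect_tests(posixpath.join(base, included), load, seen))
    for name, body in data.items():
        if name != "include":
            tests[name] = body
    return tests
```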

Each test consists of pairs of strings, written either inline as left: right or placed in a TSV file and referenced with tsv-file: [path].
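One way to collect a test's pairs, sketched in Python. It assumes (from the examples, not from the spec) that lr, rl, and tsv-file are the only reserved keys inside a test, and that every TSV row has exactly two tab-separated columns. Opening the tsv-file path is left to the caller; any iterable of lines works, since csv.reader accepts one.

```python
from __future__ import annotations

import csv
from typing import Iterable, Iterator

# Keys with a special meaning inside a test (an assumption based on the
# examples); every other key is a left: right pair.
RESERVED = {"lr", "rl", "tsv-file"}


def iter_pairs(test: dict,
               tsv_lines: Iterable[str] | None = None) -> Iterator[tuple[str, str]]:
    """Yield (left, right) pairs written inline and, optionally, from a TSV."""
    for left, right in test.items():
        if left not in RESERVED:
            yield left, right
    if tsv_lines is None:
        return
    # QUOTE_NONE keeps quote characters as data instead of CSV quoting.
    for row in csv.reader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 tab-separated columns, got {row!r}")
        yield row[0], row[1]
```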

Each test also specifies which mode to run, keyed by direction: lr (the left side is the input and the right side the expected output) or rl (the right side is the input and the left side the expected output). Listing both creates a bidirectional test.
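A small Python sketch of how a runner might turn one pair into an (input, expected output) tuple for each listed direction. The function names are hypothetical.

```python
from __future__ import annotations


def directions(test: dict) -> list[str]:
    """Directions a test runs in; listing both lr and rl makes it bidirectional."""
    return [d for d in ("lr", "rl") if d in test]


def orient(direction: str, left: str, right: str) -> tuple[str, str]:
    """Return (input, expected_output) for one pair in the given direction."""
    if direction == "lr":
        return left, right  # left side is the input
    if direction == "rl":
        return right, left  # right side is the input
    raise ValueError(f"unknown direction: {direction!r}")
```

For the "possession" example below, the rl direction would feed "la caja del gato" to spa-eng and expect "the cat's box" back.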

[direction]:
  mode: [mode-name]
  match: [match-mode]
  options: [option-string]

match-mode is one of exact (output must be exactly as written in the test), include (the values in the test must be present in the output), or exclude (the values in the test must not be present in the output). If not specified, it defaults to exact.
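The three match modes could be checked roughly as follows. This Python sketch only covers possible readings (the open question in the TODO list below), and it assumes that expected and actual output contain the same number of ^...$ lexical units, lined up by position. A unit with a slash is read as surface/reading/reading...; a unit without one (such as ^sing<vblex><past>$) is treated as a single reading. Escaped characters in the stream format are ignored for simplicity.

```python
from __future__ import annotations

import re

# A lexical unit in the stream format: ^surface/reading1/reading2$
LU_RE = re.compile(r"\^(.*?)\$")


def readings(stream: str) -> list[list[str]]:
    """Return the list of readings for each lexical unit, in order."""
    units = []
    for body in LU_RE.findall(stream):
        fields = body.split("/")
        # Drop the surface form if present; empty readings (e.g. ^to/$) vanish.
        found = fields[1:] if len(fields) > 1 else fields
        units.append([r for r in found if r])
    return units


def matches(match: str, expected: str, actual: str) -> bool:
    """Check `actual` output against `expected` under one match mode."""
    if match == "exact":
        return actual.strip() == expected.strip()
    if match not in ("include", "exclude"):
        raise ValueError(f"unknown match mode: {match!r}")
    want, got = readings(expected), readings(actual)
    if len(want) != len(got):
        return False  # assumption: units must line up one-to-one
    for wanted, produced in zip(want, got):
        present = [r in produced for r in wanted]
        if match == "include" and not all(present):
            return False
        if match == "exclude" and any(present):
            return False
    return True
```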

option-string will be passed to the apertium executable. It defaults to -f none.
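A Python sketch of how the option string might be passed through, assuming the runner invokes apertium as apertium [options] mode and feeds the test input on stdin. Whether the runner should also point apertium at the local modes (for example with its -d option) is not settled by this proposal; the function names here are hypothetical.

```python
from __future__ import annotations

import shlex
import subprocess

DEFAULT_OPTIONS = "-f none"


def apertium_argv(mode: str, options: str = DEFAULT_OPTIONS) -> list[str]:
    """Build the command line: option string first, then the mode name."""
    return ["apertium", *shlex.split(options), mode]


def run_mode(mode: str, text: str, options: str = DEFAULT_OPTIONS) -> str:
    """Run one mode on `text` and return its stdout (needs apertium on PATH)."""
    result = subprocess.run(apertium_argv(mode, options), input=text,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Splitting with shlex means an option string is tokenized the way a shell would, so quoted arguments survive intact.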

If neither match nor options is specified, this can be abbreviated to [direction]: [mode-name].
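Expanding the abbreviated form into the full one might look like this Python sketch; the defaults come from the specification above, and the function name is hypothetical.

```python
from __future__ import annotations

# Defaults taken from the specification above.
DEFAULTS = {"match": "exact", "options": "-f none"}


def normalize_direction(spec: str | dict) -> dict:
    """Expand `lr: eng-spa` shorthand into the full mode/match/options form."""
    if isinstance(spec, str):
        spec = {"mode": spec}
    if "mode" not in spec:
        raise ValueError(f"direction spec has no mode: {spec!r}")
    if spec.get("match", "exact") not in ("exact", "include", "exclude"):
        raise ValueError(f"unknown match mode: {spec['match']!r}")
    # Explicit keys override the defaults.
    return {**DEFAULTS, **spec}
```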

TODO

  • do include and exclude work on things other than possible readings, and if so, how?
  • testvoc?
  • regressions on a corpus
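  • wiki format? Template:Test, Template:TransferTest, and plain text seem to be the current options (with the last doubtless hiding many complexities)
    • Template:TransferTest is only used by eng-kir, while Template:Test appears on almost 200 pages
    • do we want wiki format for more than just full-pipeline tests?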
  • have input map to multiple outputs

Example Files

apertium-eng-spa/tests/tests.yaml

include:
  - other_file_1.yaml
  - other_file_2.yaml
"possession":
  lr: eng-spa
  rl: spa-eng
  "the cat's box": "la caja del gato"
  "my sister's socks": "los calcetines de mi hermana"
"noun/verb disam":
  lr:
    mode: eng-spa-tagger
    match: exact
  "the cat's box": "^the/the<det><def><sp>$ ^cat's/cat<n><sg>+'s<gen>$ ^box/box<n><sg>$"

apertium-eng/tests/tests.yaml

"past tense":
  lr:
    mode: eng-morph
    match: include
  rl:
    mode: eng-gener
    match: exact
  tsv-file: past-tense-tests.tsv
"disam":
  lr:
    mode: eng-tagger
    match: exclude
  "to be purple": "^to/$ ^be/be<vbser><imp>$ ^purple/$"

apertium-eng/tests/past-tense-tests.tsv

sang	^sing<vblex><past>$
jumped	^jump<vblex><past>$

Annotated Example Files

apertium-eng-spa/tests/tests.yaml

include:     # run the tests in these files as well
  - other_file_1.yaml
  - other_file_2.yaml
"possession":       # test named "possession"
  lr: eng-spa       # left side  | apertium eng-spa => right side
  rl: spa-eng       # right side | apertium spa-eng => left side
  "the cat's box": "la caja del gato"
  "my sister's socks": "los calcetines de mi hermana"
"noun/verb disam":  # test named "noun/verb disam"
  lr:               # only has 1 direction
    mode: eng-spa-tagger
    match: exact    # output must match what we've written below exactly
                    # this is the default, but we're being explicit
  "the cat's box": "^the/the<det><def><sp>$ ^cat's/cat<n><sg>+'s<gen>$ ^box/box<n><sg>$"

apertium-eng/tests/tests.yaml

"past tense":
  lr:
    mode: eng-morph
    match: include   # the right side of the test must appear in the output
                     # but the test will still pass if other things appear as well
  rl:
    mode: eng-gener
    match: exact
  tsv-file: past-tense-tests.tsv  # read the test data from a tab-separated list
"disam":
  lr:
    mode: eng-tagger
    match: exclude   # the output can contain other things, but must not contain
                     # the readings listed
  "to be purple": "^to/$ ^be/be<vbser><imp>$ ^purple/$"