Difference between revisions of "User:Popcorndude/Unit-Testing"


Revision as of 15:37, 24 February 2021

Proposed file structure for unit testing framework.

Specification

make test will check for a file named tests/tests.yaml which can include other files in the same directory.

At the top level of a test file, include is a list of included files (paths given relative to the directory of the current file). All other keys are names of tests.
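As a rough illustration of the relative-path rule, the following Python sketch resolves include entries against the directory of the file that lists them. The function name resolve_includes is hypothetical and not part of any existing Apertium tool; the file names are taken from the example files on this page.

```python
from pathlib import Path

def resolve_includes(current_file, includes):
    """Resolve each included path against the directory of current_file."""
    base = Path(current_file).parent
    return [base / name for name in includes]

# File names borrowed from the example on this page.
paths = resolve_includes(
    "apertium-eng-spa/tests/tests.yaml",
    ["other_file_1.yaml", "other_file_2.yaml"],
)
print([p.as_posix() for p in paths])
```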

Each test consists of a collection of pairs of strings which can be written as left: right or placed in a TSV file and referenced with tsv-file: [path]. Multiple valid outputs can be written as a list:

left:
  - right1
  - right2

or by having more than 2 columns in the TSV file.
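One possible way to bring both notations to a common shape is sketched below in Python: every left string maps to a list of right strings, whether the data came from a TSV row with extra columns or from a YAML value that is a string or a list. The helper names read_tsv_pairs and normalize_rights are hypothetical, and the second sample row is only illustrative (it reuses a pair from the example files on this page).

```python
import csv
import io

def read_tsv_pairs(handle):
    """Map each left string to the list of right strings on the same row."""
    pairs = {}
    # QUOTE_NONE keeps quote characters in the test strings untouched.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or not row[0]:
            continue  # skip blank lines
        rights = [cell for cell in row[1:] if cell]
        pairs.setdefault(row[0], []).extend(rights)
    return pairs

def normalize_rights(value):
    """Accept either a single string or a list of strings from the YAML file."""
    return [value] if isinstance(value, str) else list(value)

sample = io.StringIO(
    "sang\t^sing<vblex><past>$\n"
    "el gato de mi hermana\tmy sister's cat\tthe cat of my sister\n"
)
pairs = read_tsv_pairs(sample)
```

With this shape, a 1:1 test is simply the case where every list has exactly one element.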

Each test also specifies what mode to run. This specification can be either lr (left side is input) or rl (right side is input). Bidirectional tests may be created by listing both.

[direction]:
  mode: [mode-name]
  match: [match-mode]
  options: [option-string]

match-mode can be one of exact (output must be exactly as written in the test), include (the values in the test must be present in the output), exclude (the values in the test must not be present in the output), or one-of (the output must be among the listed options). If not specified, this defaults to one-of, which is equivalent to exact if the tests are 1:1.
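The four modes could be compared along the lines of the Python sketch below. This is a simplification under stated assumptions, not the framework's behaviour: include and exclude are treated as plain substring checks, exact is assumed to require a single expected value, and no attempt is made to parse the ^...$ stream format, so how these modes should treat individual readings remains open. The function name matches is hypothetical.

```python
def matches(match_mode, expected, output):
    """Return True if output satisfies match_mode against the expected values.

    expected is a list of strings. Substring containment for include and
    exclude is an assumption of this sketch.
    """
    if match_mode == "exact":
        # Assumption: exact only makes sense with one expected value.
        return len(expected) == 1 and output == expected[0]
    if match_mode == "one-of":
        return output in expected
    if match_mode == "include":
        return all(value in output for value in expected)
    if match_mode == "exclude":
        return not any(value in output for value in expected)
    raise ValueError(f"unknown match mode: {match_mode}")

options = ["my sister's cat", "the cat of my sister"]
print(matches("one-of", options, "my sister's cat"))
print(matches("exact", ["la caja del gato"], "la caja del gato"))
```

For a 1:1 test the one-of and exact branches accept the same outputs, which is why one-of can serve as the default.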

option-string will be passed to the apertium executable. It defaults to -f none.
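A minimal Python sketch of how the option string might be combined with the mode name into a command line follows. It only builds the argument list and does not run anything; placing the options before the mode name and feeding the test input on standard input are assumptions of this sketch, and build_command is a hypothetical helper.

```python
import shlex

def build_command(mode, options="-f none"):
    """Build an argument list for the apertium executable (not executed here)."""
    # shlex.split turns the option string into separate arguments.
    return ["apertium", *shlex.split(options), mode]

print(build_command("eng-spa"))  # ['apertium', '-f', 'none', 'eng-spa']
```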

If neither match nor options is specified, this can be abbreviated to [direction]: [mode-name].
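The abbreviation can be expanded mechanically. The Python sketch below turns either form into a full dictionary, filling in the defaults stated above (one-of and -f none); the name normalize_direction and the dictionary layout are assumptions made for illustration.

```python
DEFAULTS = {"match": "one-of", "options": "-f none"}

def normalize_direction(spec):
    """Expand '[direction]: [mode-name]' into the full mapping form."""
    if isinstance(spec, str):
        spec = {"mode": spec}
    if "mode" not in spec:
        raise ValueError("a direction needs a mode")
    # Explicit keys in spec override the defaults.
    return {**DEFAULTS, **spec}

short = normalize_direction("eng-spa")
full = normalize_direction({"mode": "eng-morph", "match": "include"})
```

Here short and full have the same three keys, so later code does not need to care which notation the test author used.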

TODO

  • how do include and exclude work on things other than possible readings?
  • do include and exclude work on things other than possible readings?
  • testvoc?
  • regressions on a corpus
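  • wiki format? Template:Test, Template:TransferTest, and plain text seem to be the current options (with the last doubtless hiding many complexities)
      • Template:TransferTest is only used by eng-kir while Template:Test appears on almost 200 pages
      • do we want wiki format for more than just full-pipeline tests?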
  • can 1:n tests have rl?

Example Files

apertium-eng-spa/tests/tests.yaml

include:
  - other_file_1.yaml
  - other_file_2.yaml
"possession":
  lr: eng-spa
  rl: spa-eng
  "the cat's box": "la caja del gato"
  "my sister's socks": "los calcetines de mi hermana"
"noun/verb disam":
  lr:
    mode: eng-spa-tagger
    match: exact
  "the cat's box": "^the/the<det><def><sp>$ ^cat's/cat<n><sg>+'s<gen>$ ^box/box<n><sg>$"
"de":
  lr:
    mode: spa-eng
    match: one-of
  "el gato de mi hermana":
    - "my sister's cat"
    - "the cat of my sister"

apertium-eng/tests/tests.yaml

"past tense":
  lr:
    mode: eng-morph
    match: include
  rl:
    mode: eng-gener
    match: exact
  tsv-file: past-tense-tests.tsv
"disam":
  lr:
    mode: eng-tagger
    match: exclude
  "to be purple": "^to/$ ^be/be<vbser><imp>$ ^purple/$"

apertium-eng/tests/past-tense-tests.tsv

sang	^sing<vblex><past>$
jumped	^jump<vblex><past>$

Annotated Example Files

apertium-eng-spa/tests/tests.yaml

include:     # run the tests in these files as well
  - other_file_1.yaml
  - other_file_2.yaml
"possession":       # test named "possession"
  lr: eng-spa       # left side  | apertium eng-spa => right side
  rl: spa-eng       # right side | apertium spa-eng => left side
  "the cat's box": "la caja del gato"
  "my sister's socks": "los calcetines de mi hermana"
"noun/verb disam":  # test named "noun/verb disam"
  lr:               # only has 1 direction
    mode: eng-spa-tagger
    match: exact    # output must match what we've written below exactly
                    # the default (one-of) is equivalent for 1:1 tests,
                    # but we're being explicit
  "the cat's box": "^the/the<det><def><sp>$ ^cat's/cat<n><sg>+'s<gen>$ ^box/box<n><sg>$"

apertium-eng/tests/tests.yaml

"past tense":
  lr:
    mode: eng-morph
    match: include   # the right side of the test must appear in the output
                     # but the test will still pass if other things appear as well
  rl:
    mode: eng-gener
    match: exact
  tsv-file: past-tense-tests.tsv  # read the test data from a tab-separated list
"disam":
  lr:
    mode: eng-tagger
    match: exclude   # the output can contain other things, but must not contain
                     # the readings listed
  "to be purple": "^to/$ ^be/be<vbser><imp>$ ^purple/$"