Difference between revisions of "User:Popcorndude/Unit-Testing"

From Apertium

Latest revision as of 16:40, 2 March 2021

Proposed file structure for unit testing framework.

Unit-Testing Specification

make test will check for a file named tests/tests.yaml which can include other files in the same directory.

At the top level of a test file, include is a list of included files (paths given relative to the directory of the current file). All other keys are names of tests.
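The include rule could be resolved roughly as follows. This is only a sketch: `collect_tests` is a made-up name, `load` stands for any hypothetical function that parses one YAML file into a dictionary, and the guard against circular includes is an assumption that the proposal does not mention.

```python
import os

def collect_tests(path, load, seen=None):
    """Return every named test reachable from `path`, following `include` lists."""
    seen = set() if seen is None else seen
    path = os.path.normpath(path)
    if path in seen:          # assumption: silently skip circular includes
        return {}
    seen.add(path)
    data = dict(load(path))   # shallow copy so the loaded data is not modified
    tests = {}
    for included in data.pop("include", []):
        # include paths are relative to the directory of the current file
        child = os.path.join(os.path.dirname(path), included)
        tests.update(collect_tests(child, load, seen))
    tests.update(data)        # all remaining top-level keys are test names
    return tests
```

With this ordering, a test defined in the including file overrides an included test of the same name; that choice is arbitrary and would need to be settled by the specification.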

Each test consists of a collection of pairs of strings which can be written as left: right or placed in a TSV file and referenced with tsv-file: [path]. Multiple valid outputs can be written as a list:

left:
  - right1
  - right2

or by having more than 2 columns in the TSV file.
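As an illustration of the TSV form, the sketch below reads tab-separated text into a mapping from each left side to its list of right sides. The function name `read_tsv_pairs` is hypothetical, and merging repeated left sides into one list is an assumption.

```python
def read_tsv_pairs(text):
    """Map each left-hand string to the list of right-hand strings on its row."""
    pairs = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        left, *rights = line.split("\t")
        # columns after the first are all valid outputs for `left`
        pairs.setdefault(left, []).extend(rights)
    return pairs
```

A row with three columns therefore yields two valid outputs for its left side, which mirrors the YAML list form shown above.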

Each test also specifies what mode to run. This specification can be either lr (left side is input) or rl (right side is input). Bidirectional tests may be created by listing both.

[direction]:
  mode: [mode-name]
  match: [match-mode]
  options: [option-string]
  stream-type: [type]

Options for match-mode are documented below. The default value if not specified is one-of.

option-string will be passed to the apertium executable. It defaults to -f none.

Options for stream-type are documented below. This will usually be inferred by the test runner and so need not be specified.

If no settings other than mode are specified, this can be abbreviated to [direction]: [mode-name].
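The abbreviation and the defaults could be handled in one normalization step. The sketch below assumes the direction has already been parsed from YAML into either a string or a dictionary; `normalize_direction` and `DEFAULTS` are illustrative names only.

```python
# Defaults taken from the text above; stream-type is left out because
# the runner is expected to infer it when it is not given.
DEFAULTS = {"match": "one-of", "options": "-f none"}

def normalize_direction(spec):
    """Expand `lr: eng-spa` into the full form and fill in default settings."""
    if isinstance(spec, str):
        spec = {"mode": spec}      # abbreviated form: only the mode name
    if "mode" not in spec:
        raise ValueError("a direction must name a mode")
    return {**DEFAULTS, **spec}    # explicit settings override the defaults
```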

Matching Modes

{| class="wikitable"
! mode name !! description !! reversal
|-
| <code>exact</code> || pipeline output and right side must be identical || pipeline output and left side must be identical
|-
| <code>one-of</code> || pipeline output must appear on the right side || pipeline output of one of the right sides must match the left side
|-
| <code>include</code> || right side must appear in the pipeline output ||
|-
| <code>exclude</code> || right side must not appear in the pipeline output ||
|}
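For the forward (lr) direction, the four modes might be checked as in the sketch below. It treats all values as plain text, so include and exclude are approximated by substring tests, and it assumes that exact requires a single right-hand value and that include and exclude apply to every listed value. None of these details are fixed by the proposal, and `matches` is a hypothetical name.

```python
def matches(mode, output, rights):
    """Check one pipeline output against the right-hand values of a test."""
    if mode == "exact":
        return rights == [output]                  # exactly one value, identical
    if mode == "one-of":
        return output in rights                    # output is among the options
    if mode == "include":
        return all(r in output for r in rights)    # every value is present
    if mode == "exclude":
        return all(r not in output for r in rights)  # no value is present
    raise ValueError(f"unknown match mode: {mode}")
```

With a single right-hand value, one-of and exact give the same result, which is why one-of works as the default.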

Stream Types

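{| class="wikitable"
! stream type !! description
|-
| <code>text</code> || values are compared as strings
|-
| <code>readings</code> || values with the same surface or source (preceding the first <code>/</code>) are compared based on remaining readings (unordered)
|-
| <code>no-src</code> || like <code>readings</code>, but without surface form
|-
| <code>anaphora</code> || like <code>readings</code>, but the last reading is matched separately
|}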

If the value in the test file contains any character in ^/$, interpretation will default to readings. Otherwise text will be used. LUs not delimited by ^$ will be split on spaces.
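The inference rule and the splitting of lexical units (LUs) can be sketched as below. The function names are hypothetical, and the sketch ignores backslash escapes, which a real implementation would need to handle.

```python
import re

def infer_stream_type(value):
    """Default to `readings` when the value contains any of ^ / $."""
    return "readings" if any(ch in value for ch in "^/$") else "text"

# Either a ^...$ delimited unit (which may contain spaces) or a bare
# run of non-space characters.
LU_PATTERN = re.compile(r"\^(.*?)\$|(\S+)")

def split_lus(value):
    """Split a value into LUs, stripping the ^ and $ delimiters if present."""
    return [m.group(1) if m.group(1) is not None else m.group(2)
            for m in LU_PATTERN.finditer(value)]
```

This is why the multiword example needs explicit delimiters: without ^ and $, "good morning/good morning<ij>" would be split at each space.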

* will match arbitrary readings and ? will match a single arbitrary reading. These can be escaped with \.

Under readings, LUs in corresponding positions will be considered non-matching if they have different first segments.
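A minimal sketch of the readings comparison for a single pair of LUs follows. It assumes the delimiters have already been stripped, splits on every / without honouring escapes, and does not implement the * and ? wildcards; `parse_lu` and `lus_equal` are illustrative names.

```python
def parse_lu(lu):
    """Split an LU into its first segment and an unordered set of readings."""
    first, *rest = lu.split("/")
    return first, frozenset(r for r in rest if r)  # drop empty readings

def lus_equal(a, b):
    """LUs match when the first segments agree and the reading sets are equal."""
    return parse_lu(a) == parse_lu(b)
```

Because the readings are held in a set, their order in the test file does not matter, while a different first segment always makes the comparison fail.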

Unit-Testing Example Files

apertium-eng-spa/tests/tests.yaml

include:
  - other_file_1.yaml
  - other_file_2.yaml
"possession":
  lr: eng-spa
  rl: spa-eng
  "the cat's box": "la caja del gato"
  "my sister's socks": "los calcetines de mi hermana"
"noun/verb disam":
  lr:
    mode: eng-spa-tagger
    match: exact
  "the cat's box": "the/the<det><def><sp> cat's/cat<n><sg>+'s<gen> box/box<n><sg>"
"de":
  lr:
    mode: spa-eng
    match: one-of
  "el gato de mi hermana":
    - "my sister's cat"
    - "the cat of my sister"

apertium-eng/tests/tests.yaml

"past tense":
  lr:
    mode: eng-morph
    match: include
  rl:
    mode: eng-gener
    match: exact
  tsv-file: past-tense-tests.tsv
"disam":
  lr:
    mode: eng-tagger
    match: exclude
  "to be purple": "to/ be/be<vbser><imp> purple/"

apertium-eng/tests/past-tense-tests.tsv

sang	sing<vblex><past>
jumped	jump<vblex><past>

Annotated Unit-Testing Example Files

apertium-eng-spa/tests/tests.yaml

include:     # run the tests in these files as well
  - other_file_1.yaml
  - other_file_2.yaml
"possession":       # test named "possession"
  lr: eng-spa       # left side  | apertium eng-spa => right side
  rl: spa-eng       # right side | apertium spa-eng => left side
  "the cat's box": "la caja del gato"
  "my sister's socks": "los calcetines de mi hermana"
"noun/verb disam":  # test named "noun/verb disam"
  lr:               # only has 1 direction
    mode: eng-spa-tagger
    match: exact    # output must match what we've written below exactly
                    # since there's only one thing on the right, this is
                    # equivalent to the default behavior,
                    # so we're just being explicit
  "the cat's box": "^the/the<det><def><sp>$ ^cat's/cat<n><sg>+'s<gen>$ ^box/box<n><sg>$"

apertium-eng/tests/tests.yaml

"past tense":
  lr:
    mode: eng-morph
    match: include   # the right side of the test must appear in the output
                     # but the test will still pass if other things appear as well
  rl:
    mode: eng-gener
    match: exact
  tsv-file: past-tense-tests.tsv  # read the test data from a tab-separated list
"disam":
  lr:
    mode: eng-tagger
    match: exclude   # the output can contain other things, but must not contain
                     # the readings listed
  "to be purple": "^to/$ ^be/be<vbser><imp>$ ^purple/$"
"multiwords":
  lr: eng-morph
                     # we need ^ and $ here because there are spaces
  "good morning": "^good morning/good morning<ij>$"

Corpus Regression Testing

The test runner can be run in either static mode (which functions as a test that can pass or fail) or in dynamic mode (an interactive mode which updates the data to reflect the state of the translator).

The test runner will by default check for a file named tests/regressions.yaml. This file will contain one or more entries of the form

[name]:
  mode: [mode-name]
  input: [file-name]

Where name is the name of this corpus, mode-name names a pipeline mode (usually abc-xyz or xyz-abc), and the value of input: is a text file where each line contains an input sentence.

The mode will be read from modes.xml and each step will be named in the same fashion as gendebug="yes". That is, the runner will use debug-suff if present, otherwise try to guess a standard suffix, and finally fall back to NAMEME. If more than one step has the same debug suffix, they will be numbered sequentially.
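The naming rule might look like the sketch below. Each step is represented as a (program, debug-suff) pair that is assumed to have been extracted from modes.xml already. The `STANDARD_SUFFIXES` table is a hypothetical stand-in for the real guessing logic, and appending 1, 2, ... to repeated names is only one possible reading of "numbered sequentially".

```python
from collections import Counter

# Hypothetical guesses; the actual table used for gendebug may differ.
STANDARD_SUFFIXES = {"lt-proc": "morph", "apertium-tagger": "tagger"}

def name_steps(steps):
    """Name each (program, debug_suff) step, numbering repeated names."""
    base = [suff or STANDARD_SUFFIXES.get(prog, "NAMEME") for prog, suff in steps]
    totals = Counter(base)   # how often each name occurs overall
    seen = Counter()         # how often each name has occurred so far
    names = []
    for name in base:
        if totals[name] > 1:
            seen[name] += 1
            names.append(f"{name}{seen[name]}")
        else:
            names.append(name)
    return names
```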

For each step, the test runner will check for files named [name].[step-name].expected.txt and [name].[step-name].gold.txt in the same directory as the input file.

expected.txt is assumed to be the output of a previous run and gold.txt is assumed to be the ideal output. gold.txt can contain multiple ideal outputs for each line, separated by tabs.

In static mode, if the output of a step does not appear in either expected.txt or gold.txt, the test fails.
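The static check for one step could be sketched as follows, assuming that the output, expected.txt and gold.txt are aligned line by line with the input file (the proposal does not say this explicitly). `failing_lines` is a hypothetical name, and the three arguments are lists of lines that have already been read from disk.

```python
def failing_lines(outputs, expected, gold):
    """Return indices of output lines found in neither expected nor gold."""
    failures = []
    for i, out in enumerate(outputs):
        accepted = set()
        if i < len(expected):
            accepted.add(expected[i])             # output of a previous run
        if i < len(gold):
            accepted.update(gold[i].split("\t"))  # tab-separated ideal outputs
        if out not in accepted:
            failures.append(i)
    return failures
```

The step passes in static mode when the returned list is empty; in dynamic mode the same indices would be the lines shown to the user.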

In dynamic mode, differences between the output and the files will be presented to the user, who will have the option to add the output to either file.

See https://github.com/TinoDidriksen/regtest/wiki for images of what the workflow in dynamic mode might look like. A command line interface may also be available.

Repository Structure

If the test runner does not find a directory named tests containing the expected YAML files, it will guess that the tests live in tests-[name] (for example tests-eng) and offer to clone that repository if it exists.

Scraping Tests

Standard scripts will be available to construct test files from wiki pages and comments in source files.

Existing wiki tests will be reformatted to use the Template:TransferTest template (since Template:Test doesn't specify the target language and I plan to double check all the formatting anyway to catch formatting errors or plaintext tests).

sentence 1(abc) → sentence 2(xyz)

{{TransferTest|abc|xyz|sentence 1|sentence 2}}

Will be converted to

"[nearest header]":
  lr: abc-xyz
  "sentence 1": "sentence 2"

Templates for other sorts of tests are discouraged, but can be added if desired.
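The wiki conversion described above could be sketched with two regular expressions. The code assumes one-line templates with exactly four arguments and section headers written with = signs. The name `scrape_wiki`, the fallback test name "untitled", and the handling of several language pairs under one header (the first pair wins) are assumptions.

```python
import re

TEMPLATE = re.compile(
    r"\{\{TransferTest\|([^|{}]*)\|([^|{}]*)\|([^|{}]*)\|([^|{}]*)\}\}")
HEADER = re.compile(r"^=+\s*(.*?)\s*=+\s*$")

def scrape_wiki(text):
    """Build {test name: {"lr": mode, input: output, ...}} from wiki text."""
    tests = {}
    header = "untitled"  # hypothetical name for tests before any header
    for line in text.splitlines():
        found = HEADER.match(line)
        if found:
            header = found.group(1)   # nearest preceding header names the test
            continue
        for src, trg, left, right in TEMPLATE.findall(line):
            entry = tests.setdefault(header, {"lr": f"{src}-{trg}"})
            entry[left] = right
    return tests
```

The resulting dictionary has the same shape as the YAML shown above and could be written out with any YAML serializer.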

The equivalent source file comment to the above will be

# TEST abc-xyz: sentence 1 => sentence 2

Or, in XML:

<!-- TEST abc-xyz: sentence 1 => sentence 2
 -->
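Both comment forms could be picked up by a single pattern, as in the sketch below. It assumes each test sits on one line after a # or <!-- comment opener; `scrape_comments` is a hypothetical name and the returned tuple layout is arbitrary.

```python
import re

# Comment opener, the word TEST, a mode name, then "input => output".
TEST_COMMENT = re.compile(
    r"(?:#|<!--)\s*TEST\s+(\S+):\s*(.*?)\s*=>\s*(.*?)\s*(?:-->)?\s*$")

def scrape_comments(source):
    """Return (mode, input, output) for every TEST comment in `source`."""
    found = []
    for line in source.splitlines():
        match = TEST_COMMENT.search(line)
        if match:
            found.append(match.groups())
    return found
```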

Testvoc

This project could probably be expanded to include trimmed testvoc, though I currently don't understand the existing code well enough to propose anything concrete.

Conversion Process

Any existing tests which have text as input will be converted to regression tests: their expected outputs will be included as gold outputs, and the expected.txt files will be filled in with the current output of the pipeline.

Any tests which take stream format as input will be converted to unit tests.