User:Ilnar.salimzyan/On testing

From Apertium

Revision as of 04:19, 8 March 2015

Test-driven language pair development, or: on testing strategy in Apertium
==========================================================================

Some terminology
----------------

# Acceptance tests define when a language pair or a particular sub-module (like
# a morphological transducer or a CG) is done. That is, they define the
# requirements for what you are going to develop, and they are written while
# communicating with "stakeholders"/mentors or anyone funding the
# development.

# Unit tests are written by programmers for programmers. They describe how the
# system works and what the structure and behavior of the code are.

# Integration tests, as the name suggests, test whether components (in our case,
# modules such as morphological transducers, disambiguators, lexical selection
# rules and transfer rules) are successfully integrated into a system (here, a
# language pair). For a language pair, you can think of its acceptance tests as
# integration tests, since they test how the modules of the pair integrate into
# a complete machine translation system.
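
One common way to keep acceptance tests for a whole pair is as source sentences with expected translations. These are split into regression tests, which already pass and must not break, and pending tests, which are not expected to pass yet. The Python sketch below is minimal and rests on several assumptions. The `translate` callable is supplied by you, for example a wrapper that pipes text through `apertium -d . foo-bar`. The names `AcceptanceCase` and `run_acceptance` are hypothetical and are not part of any Apertium tool.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class AcceptanceCase:
    """One acceptance test: a source sentence and the translation we expect."""
    source: str
    expected: str
    pending: bool = False  # pending tests are allowed to fail for now


def run_acceptance(cases: Iterable[AcceptanceCase],
                   translate: Callable[[str], str]) -> dict:
    """Run every case through `translate` and sort the outcomes.

    Returns a dict with three lists:
      - "regressions": non-pending cases that fail, as (case, actual) pairs
      - "fixed": pending cases that now pass (candidates to become regression tests)
      - "still_pending": pending cases that still fail
    """
    report = {"regressions": [], "fixed": [], "still_pending": []}
    for case in cases:
        actual = translate(case.source).strip()
        passed = actual == case.expected.strip()
        if case.pending:
            report["fixed" if passed else "still_pending"].append(case)
        elif not passed:
            report["regressions"].append((case, actual))
    return report


if __name__ == "__main__":
    # Toy stand-in for a real pipeline (assumption: not actual Apertium output).
    toy = {"kitap": "book", "bala": "child"}
    cases = [AcceptanceCase("kitap", "book"),
             AcceptanceCase("bala", "kid"),
             AcceptanceCase("bala", "child", pending=True)]
    report = run_acceptance(cases, lambda s: toy.get(s, s))
    print(len(report["regressions"]), len(report["fixed"]))  # prints: 1 1
```

In a real setup, a non-empty "regressions" list would fail the build. The "fixed" list shows which pending tests can be promoted.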

Overview
--------

                    Testing an Apertium MT system
                            /         \
                           /           \
                          /             \
                     Acceptance       Unit tests
                       tests               |
                       /             testing the output
                      /              of each module
                     /                     |
      * Regression-Pending tests      Morphological--Acceptance: * recall or coverage            
              on the wiki              transducers               * precision
      * Corpus test                        |      \              * # of stems
         * upper bound for WER             |       --Unit: * morphophonology
         * upper bounds for [*@#]          |               * morphotactics
                 errors              ConstraintGr----Acceptance: * ambig. rate before&after 
      * Testvoc (has to be clean)          |   \                 * precision
                                           |    \                * # of rules
                  +                        |     ----Unit: * INPUT/OUTPUT comments for each
                                           |                      rule
               numbers:                  Lexical-----Acceptance: * ambig. rate before&after 
      * of stems in the bidix           selection                * precision
      * of lrx rules <-> ambiguity         |    \               * # of rules
         rate before and after             |     ----Unit: * INPUT/OUTPUT comments for each
      * transfer rules                     |                      rule
                                       Transfer------Acceptance: * wiki tests for phrases and
                 +                            \                       sentences
                                               \                  * "testvoc-lite" tests for           
        (gisting evaluation)                    \                      single-words
                                                 ----Unit: * INPUT/OUTPUT comments in the headers
                                                                   of rules
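
Several of the numbers in the overview above can be computed with a few lines of code. The Python sketch below makes the following assumptions:

* Input is the Apertium stream format `^surface/reading1/reading2$`.
* The analyser marks an unanalysed word as `^word/*word$`.
* In translation output, `*` prefixes unknown words, `@` prefixes words missing from the bilingual dictionary, and `#` prefixes words the generator could not produce.

The sketch ignores escaped characters and superblanks, and all function names are illustrative. `coverage` here is naive token coverage, not precision or recall against a gold standard. For the "before & after" ambiguity rate, run `ambiguity_rate` on the analyser output and again on the disambiguated output.

```python
import re

# One lexical unit in the stream: ^surface/reading1/reading2$
# Simplification (assumption): escaped \^ \/ \$ and [superblanks] are not handled.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def lexical_units(stream: str) -> list[tuple[str, list[str]]]:
    """Split a stream into (surface form, list of readings) pairs."""
    units = []
    for body in LEXICAL_UNIT.findall(stream):
        surface, *readings = body.split("/")
        units.append((surface, readings))
    return units


def is_unknown(readings: list[str]) -> bool:
    """The analyser marks words it cannot analyse as ^word/*word$."""
    return not readings or readings[0].startswith("*")


def coverage(stream: str) -> float:
    """Share of tokens that received at least one analysis."""
    units = lexical_units(stream)
    if not units:
        return 0.0
    known = sum(1 for _, readings in units if not is_unknown(readings))
    return known / len(units)


def ambiguity_rate(stream: str) -> float:
    """Average number of readings per analysed token."""
    counts = [len(r) for _, r in lexical_units(stream) if not is_unknown(r)]
    return sum(counts) / len(counts) if counts else 0.0


def count_marks(translation: str) -> dict[str, int]:
    """Count words flagged with * (unknown), @ (no bidix entry) or # (not generated)."""
    marks = {"*": 0, "@": 0, "#": 0}
    for word in translation.split():
        if word[0] in marks:
            marks[word[0]] += 1
    return marks


def word_error_rate(hypothesis: str, reference: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    # At the start of iteration i, prev[j] is the edit distance between
    # the first i-1 hypothesis words and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(prev[j] + 1,            # extra word in hypothesis
                         cur[j - 1] + 1,         # word missing from hypothesis
                         prev[j - 1] + (h != r))  # substitution or match
        prev = cur
    return prev[-1] / len(ref) if ref else float(len(hyp) > 0)
```

For example, on `^kitap/kitap<n>/kitap<n><attr>$ ^xyz/*xyz$`, `coverage` is 0.5 and `ambiguity_rate` is 2.0. The acceptance criteria in the overview then become thresholds, such as an upper bound on `word_error_rate` or on each count returned by `count_marks`.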
<selimcan> I was thinking of at least keeping a list of lexicons which stems can continue with
           (i.e. directly) separate in lexc
<selimcan> giving examples for each
<selimcan> I mean, right before the stems section, a litst of N1, N2, N-RUS, V-TV, etc
<selimcan> *list
<selimcan> with short comment and examples for each
<selimcan> Lexicon : Description : Example 
<selimcan>  N1 :  common nouns : бақша
<selimcan>  N5 : nouns loaned from Russian (often don't obey the synharmonism laws, that's why
            they should be kept separate) : актив
<selimcan>  N-COMPUND-PX : compound nouns with 3p possessive at the last noun
<selimcan> firespeaker, you know, like we do for adjectives, but for only lexicons we have
<selimcan> err, "for all lexicons" I mean
<selimcan> That kind of comments for all lexicons (stems can link to) we have, and in one place,
           so that whoever is adding stems to the lexicon doesn't have to look at the entire morphology
           description in lexc
<selimcan> :(
<selimcan> Plus a full paradigm of one example linking to that lexicon in apertium-foo/tests/morphotactics or somewhere else
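
The convention proposed in the log above can also be checked mechanically: document, in one place, every lexicon that stems can continue to, with a description and an example. The Python sketch below makes the following assumptions:

* Stem entries look like `upper:lower Continuation ; ! comment` in lexc, with one entry per line.
* The documentation table uses the `Lexicon : Description : Example` lines from the log.
* The caller names which LEXICON blocks hold stems.

Escaped spaces (`% `) inside forms are not handled. The function names and the table format are hypothetical, not an existing Apertium tool.

```python
import re

COMMENT = re.compile(r"(?<!%)!.*")  # lexc comments start with an unescaped '!'


def documented_lexicons(table: str) -> dict[str, str]:
    """Parse 'Lexicon : Description : Example' lines into {lexicon: example}."""
    docs = {}
    for line in table.splitlines():
        parts = [p.strip() for p in line.split(":")]
        if len(parts) >= 2 and parts[0] and parts[0] != "Lexicon":
            docs[parts[0]] = parts[-1] if len(parts) >= 3 else ""
    return docs


def continuation_classes(lexc: str, stem_lexicons: set[str]) -> set[str]:
    """Collect continuation classes used by entries inside the given LEXICONs."""
    used, current = set(), None
    for raw in lexc.splitlines():
        line = COMMENT.sub("", raw).strip()
        if not line:
            continue
        if line.startswith("LEXICON"):
            parts = line.split()
            current = parts[1] if len(parts) > 1 else None
        elif current in stem_lexicons and line.endswith(";"):
            tokens = line[:-1].split()
            if tokens:
                used.add(tokens[-1])  # last token before ';' is the continuation
    return used


def check_documentation(lexc: str, table: str, stem_lexicons: set[str]) -> dict:
    """Report lexicons used by stems but undocumented, and vice versa."""
    docs = documented_lexicons(table)
    used = continuation_classes(lexc, stem_lexicons)
    return {
        "undocumented": sorted(used - set(docs)),
        "unused": sorted(set(docs) - used),
        "no_example": sorted(name for name, ex in docs.items() if not ex),
    }
```

Each documented lexicon could then be paired with a full-paradigm test for its example stem, for instance under `apertium-foo/tests/morphotactics` as suggested in the log.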