Lint

From Apertium
Revision as of 13:46, 18 August 2016 by Schindler

About

This page contains a log of all the different errors that the lint is designed to handle. As its development progresses, updates about the different releases will also be posted here.

Documentation about how the lint works, including its technical details, will also be provided here.

Trunk : https://sourceforge.net/p/apertium/svn/HEAD/tree/trunk/apertium-tools/apertium-lint/

Development : https://gitlab.com/jpsinghgoud/apertium-lint

Monodix

List of issues in monodixes

  • Redundant Entries
<pardef n="di/e__vblex"><e><par n="liv/e__vblex"/></e></pardef>
  • Maintain consistency in the data present in the <r> tag in pardef entries.
<e><p><l>as</l><r>o<s n="prn"/><s n="tn"/><s n="f"/><s n="pl"/></r></p></e>
<e><p><l>a</l><r>abz<s n="prn"/><s n="tn"/><s n="f"/><s n="sg"/></r></p></e>
  • Paradigm names should use two underscores ("__") instead of one ("_").
<pardef n="outr/o__prn">
<pardef n="/o_meu__prn">
  • Repeated tag entries : Main Section.
<e><p><l>y</l><r>y<s n="adj"/><s n="adj"/></r></p></e>
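A minimal sketch of the repeated-tag check for one entry, written in Python with the standard xml.etree.ElementTree module. It assumes the entry is available as an XML string; repeated_tags is a hypothetical name, not part of apertium-lint, and any tag that may legitimately repeat would need a whitelist, which is not shown.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def repeated_tags(entry_xml):
    """Return (side, tag) pairs for every <s> tag that occurs more than once on one side of an <e>."""
    entry = ET.fromstring(entry_xml)
    problems = []
    for side in ("l", "r"):
        for part in entry.iter(side):
            # Count the n attribute of every <s/> on this side of the pair.
            counts = Counter(s.get("n") for s in part.iter("s"))
            problems.extend((side, tag) for tag, count in counts.items() if count > 1)
    return problems


print(repeated_tags('<e><p><l>y</l><r>y<s n="adj"/><s n="adj"/></r></p></e>'))
# prints [('r', 'adj')]
```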
  • Repeated tag entries : Pardef.
<pardef n="br/other__n">
  <e>       <p><l>other</l>     <r>other<s n="n"/><s n="sg"/></r></p></e>
  <e>       <p><l>other</l>     <r>other<s n="sg"/><s n="n"/></r></p></e>
  <e r="LR"><p><l>ethren</l>    <r>other<s n="n"/><s n="pl"/></r></p></e>
  <e>       <p><l>others</l>    <r>other<s n="n"/><s n="pl"/></r></p><par n="gen__apos"/></e>
</pardef>
  • Repeated entries in the dictionary.
<e lm="house"><i>house</i><par n="house__n"/></e>
<e lm="house"><i>house</i><par n="house__n"/></e>
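Repeated entries can be found by reducing each <e> to a hashable key and remembering which keys have already been seen. The sketch below (Python, hypothetical helper names) assumes the <section> is available as an XML string and ignores surrounding whitespace in text, so pretty-printed and compact copies of the same entry compare equal.

```python
import xml.etree.ElementTree as ET


def entry_key(elem, include_tail=False):
    """Hashable representation of an element and its children, in document order.

    The tail of the top-level <e> is excluded, but tails of inner elements are kept,
    because text such as "minhas" in <l>as<b/>minhas</l> is stored as the tail of <b/>.
    """
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(entry_key(child, include_tail=True) for child in elem),
        (elem.tail or "").strip() if include_tail else "",
    )


def duplicate_entries(section_xml):
    """Return the lm attribute of every <e> that repeats an earlier entry of the section."""
    section = ET.fromstring(section_xml)
    seen = set()
    duplicates = []
    for entry in section.findall("e"):
        key = entry_key(entry)
        if key in seen:
            duplicates.append(entry.get("lm"))
        seen.add(key)
    return duplicates


section = """<section id="main" type="standard">
<e lm="house"><i>house</i><par n="house__n"/></e>
<e lm="house"><i>house</i><par n="house__n"/></e>
<e lm="home"><i>home</i><par n="house__n"/></e>
</section>"""
print(duplicate_entries(section))  # prints ['house']
```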
  • The "cm" entry should have only the comma and nothing else.
<e><re>,</re><p><l></l><r><s n="cm"/></r></p></e> 
  • Recursive paradigms. (Still to be decided)
<pardef n="up__n"><e><i></i><par n="house__n"/></e></pardef>
<pardef n="house__n"><e><i></i><par n="xyz__n"/></e></pardef>
<pardef n="xyz__n"><e><i></i><par n="up__n"/></e></pardef>
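If recursive paradigms are treated as an error, detecting them is a cycle search in the graph whose nodes are pardefs and whose edges are <par n="..."/> references. A sketch using depth-first search, assuming the whole dictionary is one XML string; find_paradigm_cycle is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def find_paradigm_cycle(dix_xml):
    """Return one cycle of paradigm names (first name repeated at the end), or None."""
    root = ET.fromstring(dix_xml)
    # Edges: pardef -> every paradigm it references through <par n="..."/>.
    graph = {p.get("n"): [par.get("n") for par in p.iter("par")] for p in root.iter("pardef")}
    state = dict.fromkeys(graph, "unvisited")
    path = []

    def visit(name):
        state[name] = "in-progress"
        path.append(name)
        for target in graph[name]:
            if target not in graph:
                continue  # undefined paradigm: a different error, not a cycle
            if state[target] == "in-progress":
                # target is already on the current DFS path, so we came back to it.
                return path[path.index(target):] + [target]
            if state[target] == "unvisited":
                cycle = visit(target)
                if cycle:
                    return cycle
        path.pop()
        state[name] = "done"
        return None

    for name in graph:
        if state[name] == "unvisited":
            cycle = visit(name)
            if cycle:
                return cycle
    return None


cyclic = """<dictionary><pardefs>
<pardef n="up__n"><e><i></i><par n="house__n"/></e></pardef>
<pardef n="house__n"><e><i></i><par n="xyz__n"/></e></pardef>
<pardef n="xyz__n"><e><i></i><par n="up__n"/></e></pardef>
</pardefs></dictionary>"""
print(find_paradigm_cycle(cyclic))  # prints ['up__n', 'house__n', 'xyz__n', 'up__n']
```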
  • Incorrect direction restriction in the r attribute.
<e r="XYZ">
  • Unused paradigms (a <pardef> that is never referenced by any <par>)
  • Unwanted (repeated) tags in entries.
<l>as<b/>minhas<b/></l><r>o<b/>meu<s n="prn"/>
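The unused-paradigms check listed above reduces to a set difference between the pardef names and the names referenced by <par>. In this minimal sketch (hypothetical function name), a paradigm referenced only from another unused paradigm still counts as used; a stricter version would start from the entries in the sections and follow references from there.

```python
import xml.etree.ElementTree as ET


def unused_paradigms(dix_xml):
    """Return the names of <pardef>s that no <par> anywhere in the file refers to."""
    root = ET.fromstring(dix_xml)
    referenced = {par.get("n") for par in root.iter("par")}
    return [p.get("n") for p in root.iter("pardef") if p.get("n") not in referenced]


dix = """<dictionary>
<pardefs>
<pardef n="house__n"><e><p><l></l><r><s n="n"/><s n="sg"/></r></p></e></pardef>
<pardef n="liv/e__vblex"><e><p><l>e</l><r>e<s n="vblex"/><s n="inf"/></r></p></e></pardef>
</pardefs>
<section id="main" type="standard">
<e lm="house"><i>house</i><par n="house__n"/></e>
</section>
</dictionary>"""
print(unused_paradigms(dix))  # prints ['liv/e__vblex']
```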
  • Whitespace should be denoted using <b/>, not a literal space.
<e lm="Middle Ages"><i>Middle Ages</i><par n="house__n"/><par n="gen__apos"/></e>
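A sketch of the whitespace check: it joins the text inside every <i>, <l> and <r> (where <b/> contributes no characters) and reports any literal space. The lm attribute is not checked, since it legitimately contains spaces. The same function would also catch the bidix rule on blank spaces further down this page. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def literal_spaces(dix_xml):
    """Return (element, text) pairs where <i>, <l> or <r> contains a literal space instead of <b/>."""
    root = ET.fromstring(dix_xml)
    hits = []
    for tag in ("i", "l", "r"):
        for elem in root.iter(tag):
            # itertext() joins the element's own text with the text and tails of
            # nested elements (<b/>, <s/>, <g>), but not the element's own tail.
            text = "".join(elem.itertext())
            if " " in text:
                hits.append((tag, text))
    return hits


section = """<section id="main" type="standard">
<e lm="Middle Ages"><i>Middle Ages</i><par n="house__n"/><par n="gen__apos"/></e>
<e lm="back out"><i>back</i><par n="accept__vblex"/><p><l><b/>out</l><r><g><b/>out</g></r></p></e>
</section>"""
print(literal_spaces(section))  # prints [('i', 'Middle Ages')]
```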

Bidix

  • The bidix should not have sdefs that are not in the monodix or lexc file.
  • When a word pair is not grammatically equivalent, all the grammatical symbols must be specified in the same order as in the monolingual dictionaries. (Under Review)
<e><p><l>Londono<s n="np"/><s n="top"/></l><r>London<s n="np"/><s n="top"/></r></p></e>


<pardef n="Barcelona__np">
    <e><p><l></l><r><s n="top"/><s n="np"/><s n="sg"/></r></p></e>
</pardef>
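For the first bidix check (sdefs that are not in the monodix), a minimal sketch compares sets of <sdef n="..."/> names. It only reads .dix files; the Multichar_Symbols of a lexc file would need a separate parser, which is assumed away here. Function names and the sample sdef xyz are illustrative.

```python
import xml.etree.ElementTree as ET


def sdef_names(dix_xml):
    """Collect the n attribute of every <sdef> in a dictionary."""
    return {sdef.get("n") for sdef in ET.fromstring(dix_xml).iter("sdef")}


def extra_bidix_sdefs(bidix_xml, *monodix_xmls):
    """Return sdefs declared in the bidix that none of the given monodixes declare."""
    known = set()
    for monodix_xml in monodix_xmls:
        known |= sdef_names(monodix_xml)
    return sorted(sdef_names(bidix_xml) - known)


bidix = '<dictionary><sdefs><sdef n="np"/><sdef n="top"/><sdef n="xyz"/></sdefs></dictionary>'
eo = '<dictionary><sdefs><sdef n="np"/><sdef n="top"/></sdefs></dictionary>'
en = '<dictionary><sdefs><sdef n="np"/><sdef n="top"/><sdef n="sg"/></sdefs></dictionary>'
print(extra_bidix_sdefs(bidix, eo, en))  # prints ['xyz']
```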
  • Blank spaces should always be denoted using <b/>.
<e><p><l>Sunsistemo<s n="np"/><s n="top"/></l><r>Solar System<s n="np"/><s n="top"/></r></p></e>
  • Only LR or RL can be used as the value of r in <e r="XY">
<e r="XY"><p><l>ĥina<s n="adj"/></l><r>Chinese<s n="adj"/></r></p></e>
  • Multiwords with inner inflection consist of a word that can inflect followed by an invariable element. For these entries, the inflection paradigm must be specified just after the word that inflects, and the invariable part must be marked with the <g> element on the right side. If the <g> tag is present in the monolingual dictionary, it should also be present in the bilingual dictionary.
<e><p><l>eltiriĝi<s n="vblex"/></l><r>back<g><b/>out</g><s n="vblex"/></r></p></e>

<e lm="back out"><i>back</i><par n="accept__vblex"/><p><l><b/>out</l><r><g><b/>out</g></r></p></e>

Transfer

  • In transfer files, one common error is referring, for instance in a <clip>, to an attribute in part="" that does not exist. The lint detects and reports such issues:
<clip pos="1" side="tl" part="xyz"/>

Problem : Check that the attribute listed in part="xyz" is defined using a <def-attr>.
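A sketch of this check: collect the names of all <def-attr>s and report any part="" value in a <clip> that is neither one of them nor a built-in part. The BUILTIN_PARTS set is an assumption about which parts need no definition and should be checked against the transfer DTD before relying on it; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: parts that are available without a <def-attr>.
BUILTIN_PARTS = {"lem", "lemh", "lemq", "lemcase", "whole", "tags"}


def undefined_clip_parts(transfer_xml):
    """Return part="" values used in <clip> that are neither built in nor defined by a <def-attr>."""
    root = ET.fromstring(transfer_xml)
    defined = {attr.get("n") for attr in root.iter("def-attr")} | BUILTIN_PARTS
    return sorted({clip.get("part", "") for clip in root.iter("clip")} - defined)


t1x = """<transfer>
<section-def-attrs>
<def-attr n="a_nom"><attr-item tags="n"/><attr-item tags="np"/></def-attr>
</section-def-attrs>
<section-rules><rule><pattern><pattern-item n="nom"/></pattern><action>
<out><lu>
<clip pos="1" side="tl" part="lem"/>
<clip pos="1" side="tl" part="a_nom"/>
<clip pos="1" side="tl" part="xyz"/>
</lu></out>
</action></rule></section-rules>
</transfer>"""
print(undefined_clip_parts(t1x))  # prints ['xyz']
```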

  • Unused def-cats. Some redundant def-cats may be present in the transfer rules but never actually used. The lint can now detect and report such def-cats.
  • Repeated cat-items in def-cats :
<def-cat n="adj">
      <cat-item tags="adj"/>
      <cat-item tags="adj.comp"/> 
      <cat-item tags="adj.comp"/> 
</def-cat>
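Repeated items can be counted per definition. The sketch keys each item on its tags and lemma attributes, so the same (hypothetical) function covers repeated cat-items in def-cats and, further below, repeated attr-items in def-attrs.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def repeated_items(transfer_xml, parent_tag, item_tag):
    """Return (definition name, tags, lemma) for items listed more than once in one definition."""
    root = ET.fromstring(transfer_xml)
    report = []
    for parent in root.iter(parent_tag):
        counts = Counter((item.get("tags"), item.get("lemma")) for item in parent.findall(item_tag))
        report.extend(
            (parent.get("n"), tags, lemma)
            for (tags, lemma), count in counts.items()
            if count > 1
        )
    return report


t1x = """<transfer>
<section-def-cats>
<def-cat n="adj">
  <cat-item tags="adj"/>
  <cat-item tags="adj.comp"/>
  <cat-item tags="adj.comp"/>
</def-cat>
</section-def-cats>
<section-def-attrs>
<def-attr n="temps">
  <attr-item tags="cni"/>
  <attr-item tags="fti"/>
  <attr-item tags="ifi"/>
  <attr-item tags="fti"/>
</def-attr>
</section-def-attrs>
</transfer>"""
print(repeated_items(t1x, "def-cat", "cat-item"))    # prints [('adj', 'adj.comp', None)]
print(repeated_items(t1x, "def-attr", "attr-item"))  # prints [('temps', 'fti', None)]
```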
  • Conflicting cat-items. This detects and reports if the same cat-item has been used in two or more def-cats :
<def-cat n="nounx">
      <cat-item tags="np.*"/>
</def-cat>
    
<def-cat n="nouny">
      <cat-item tags="np.*"/>
</def-cat>
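A sketch of the conflicting cat-items check, mapping every (tags, lemma) pair to the def-cats that contain it. It only finds exact string matches: overlapping patterns such as "np.*" and "np.top" are not detected. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def conflicting_cat_items(transfer_xml):
    """Map each (tags, lemma) cat-item to the def-cats sharing it, when there is more than one."""
    root = ET.fromstring(transfer_xml)
    owners = defaultdict(list)
    for cat in root.iter("def-cat"):
        for item in cat.findall("cat-item"):
            key = (item.get("tags"), item.get("lemma"))
            if cat.get("n") not in owners[key]:
                owners[key].append(cat.get("n"))
    return {key: cats for key, cats in owners.items() if len(cats) > 1}


t1x = """<transfer><section-def-cats>
<def-cat n="nounx"><cat-item tags="np.*"/></def-cat>
<def-cat n="nouny"><cat-item tags="np.*"/></def-cat>
<def-cat n="adj"><cat-item tags="adj"/></def-cat>
</section-def-cats></transfer>"""
print(conflicting_cat_items(t1x))  # prints {('np.*', None): ['nounx', 'nouny']}
```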
  • Repeated attr-items in def-attr :
<def-attr n="temps">
      <attr-item tags="cni"/>
      <attr-item tags="fti"/>  
      <attr-item tags="ifi"/>
      <attr-item tags="fti"/>
</def-attr>
  • Checking for valid position. Every pos="xyz" is checked to make sure that xyz is no greater than the number of <pattern-item> elements in the rule's <pattern>:
<pattern>
	<pattern-item n="on"/>
	<pattern-item n="num"/>
</pattern>
.....
<with-param pos="3"/>
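A sketch of the position check, assuming pos values inside a <rule> always refer to that rule's pattern items. Macros are skipped on purpose, because inside a <def-macro> pos refers to the macro's parameters instead. The function name and the macro f_xyz are hypothetical.

```python
import xml.etree.ElementTree as ET


def invalid_positions(transfer_xml):
    """Return (rule comment, element, pos) for pos values outside 1..len(pattern) in each rule."""
    root = ET.fromstring(transfer_xml)
    problems = []
    for rule in root.iter("rule"):
        n_items = len(rule.findall("pattern/pattern-item"))
        for elem in rule.iter():
            pos = elem.get("pos")
            if pos is None:
                continue
            if not (pos.isdigit() and 1 <= int(pos) <= n_items):
                problems.append((rule.get("comment"), elem.tag, pos))
    return problems


t1x = """<transfer><section-rules>
<rule comment="on num">
<pattern><pattern-item n="on"/><pattern-item n="num"/></pattern>
<action><call-macro n="f_xyz"><with-param pos="1"/><with-param pos="3"/></call-macro></action>
</rule>
</section-rules></transfer>"""
print(invalid_positions(t1x))  # prints [('on num', 'with-param', '3')]
```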
  • Enforce using a <b/> between consecutive <lu> tags.
<lu>
	<clip pos="4" side="tl" part="lemh"/>
</lu>

<lu>
        <lit v="se"/>
        <lit-tag v="prn.enc.ref.p3.mf.sp"/>
        <clip pos="4" side="tl" part="lemq"/>
</lu>
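A sketch that flags an <lu> immediately followed by another <lu> under the same parent, with no <b/> in between. It does not try to tell deliberate cases apart; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def adjacent_lus(transfer_xml):
    """Return (parent tag, index) wherever an <lu> is immediately followed by another <lu>."""
    root = ET.fromstring(transfer_xml)
    problems = []
    for parent in root.iter():
        children = list(parent)
        for i, (first, second) in enumerate(zip(children, children[1:])):
            if first.tag == "lu" and second.tag == "lu":
                problems.append((parent.tag, i))
    return problems


out = """<out>
<lu><clip pos="4" side="tl" part="lemh"/></lu>
<lu><lit v="se"/><lit-tag v="prn.enc.ref.p3.mf.sp"/><clip pos="4" side="tl" part="lemq"/></lu>
</out>"""
print(adjacent_lus(out))  # prints [('out', 0)]
```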
  • Enforce side. In <out>, the side attribute can take either "sl" or "tl". With the lint in place, the user can enforce side="tl" in all <out> tags.
<out>
        <lu>
                <clip pos="1" side="sl" part="lem"/>
        </lu>
</out>
  • Checking the validity of <equal>. Another common error in transfer files is trying to compare an attribute to a value it cannot take :
<equal>
        <clip pos="1" side="tl" part="a_nom"/>
        <lit-tag v="n.acr"/>
</equal>  
  • XSD Validation for transfer files takes care of issues such as :
    • Repeated def-cats
    • Repeated def-attrs
    • Repeated def-list
    • Valid side : 'sl' or 'tl'
    • Repeated def-macro


In the <equal> example above, the attribute a_nom is defined to match only "n" and "np" in its attr-item definitions, so comparing it to the lit-tag "n.acr" can never succeed.
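The <equal> check can be sketched by looking up the attr-item values of the clipped attribute and testing whether the lit-tag is one of them. This assumes exact string comparison (no wildcard attr-items) and only handles an <equal> whose direct children include a <clip> and a <lit-tag>; clips of parts with no <def-attr>, such as lem, are skipped. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def impossible_equals(transfer_xml):
    """Return (part, value) for <equal> tests comparing a clip to a lit-tag its def-attr never yields."""
    root = ET.fromstring(transfer_xml)
    attr_values = {
        attr.get("n"): {item.get("tags") for item in attr.findall("attr-item")}
        for attr in root.iter("def-attr")
    }
    problems = []
    for equal in root.iter("equal"):
        clip, lit_tag = equal.find("clip"), equal.find("lit-tag")
        if clip is None or lit_tag is None:
            continue  # only clip-versus-lit-tag comparisons are checked
        values = attr_values.get(clip.get("part"))
        if values is not None and lit_tag.get("v") not in values:
            problems.append((clip.get("part"), lit_tag.get("v")))
    return problems


t1x = """<transfer>
<section-def-attrs>
<def-attr n="a_nom"><attr-item tags="n"/><attr-item tags="np"/></def-attr>
</section-def-attrs>
<section-rules><rule><pattern><pattern-item n="nom"/></pattern><action>
<choose><when><test>
<equal><clip pos="1" side="tl" part="a_nom"/><lit-tag v="n.acr"/></equal>
</test></when></choose>
</action></rule></section-rules>
</transfer>"""
print(impossible_equals(t1x))  # prints [('a_nom', 'n.acr')]
```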


In the Pipeline
  • In this example, the user tried setting a non-lemma clip to a lit instead of lit-tag:
        <let><clip pos="2" part="art"/><lit v="def"/></let>

<lit> makes sense for part="lem", "lemh" or "lemq", but most likely not for anything else.

  • In this example, the user set the definiteness to singular, which doesn't make much sense:
        <let><clip pos="2" part="art"/><lit-tag v="sg"/></let>

(It could be a hacky way of deleting the definiteness and appending a number, but then the code would be clearer if the number were appended using <concat> and the definiteness deleted as a separate step.)
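A sketch for the first example above: report a <let> that assigns a <lit> to a clip whose part is not lem, lemh or lemq. The second example (a tag value that does not belong to the attribute) would need the def-attr values instead. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

LEMMA_PARTS = {"lem", "lemh", "lemq"}


def suspicious_lets(transfer_xml):
    """Return (part, value) for <let> statements that assign a <lit> to a non-lemma clip."""
    root = ET.fromstring(transfer_xml)
    problems = []
    for let in root.iter("let"):
        children = list(let)
        if len(children) != 2:
            continue
        target, value = children
        if target.tag == "clip" and value.tag == "lit" and target.get("part") not in LEMMA_PARTS:
            problems.append((target.get("part"), value.get("v")))
    return problems


action = """<action>
<let><clip pos="2" part="art"/><lit v="def"/></let>
<let><clip pos="1" part="lem"/><lit v="house"/></let>
</action>"""
print(suspicious_lets(action))  # prints [('art', 'def')]
```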

Modes

  • Install attribute : In modes files, the install attribute can only take the values 'yes' or 'no'.
<mode name="fr-eo" install="xyz">
  • Repeated programs : Detects and reports repeated programs in modes.
<program name="apertium-transfer">
<file name="apertium-eo-fr.fr-eo.t1x"/>
<file name="fr-eo.t1x.bin"/>
<file name="fr-eo.autobil.bin"/>
</program>

<program name="apertium-transfer">
<file name="apertium-eo-fr.fr-eo.t1x"/>
<file name="fr-eo.t1x.bin"/>
<file name="fr-eo.autobil.bin"/>
</program>
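A sketch of the repeated-programs check: within each <mode>, two programs count as repeats when they have the same name and the same list of files. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def repeated_programs(modes_xml):
    """Return (mode name, program name) for each program that repeats an earlier one in its mode."""
    root = ET.fromstring(modes_xml)
    problems = []
    for mode in root.iter("mode"):
        seen = set()
        for program in mode.iter("program"):
            key = (program.get("name"), tuple(f.get("name") for f in program.findall("file")))
            if key in seen:
                problems.append((mode.get("name"), program.get("name")))
            seen.add(key)
    return problems


modes = """<modes><mode name="fr-eo" install="yes"><pipeline>
<program name="apertium-transfer">
<file name="apertium-eo-fr.fr-eo.t1x"/><file name="fr-eo.t1x.bin"/><file name="fr-eo.autobil.bin"/>
</program>
<program name="apertium-transfer">
<file name="apertium-eo-fr.fr-eo.t1x"/><file name="fr-eo.t1x.bin"/><file name="fr-eo.autobil.bin"/>
</program>
</pipeline></mode></modes>"""
print(repeated_programs(modes))  # prints [('fr-eo', 'apertium-transfer')]
```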
  • Program validation : Every modes file consists of several programs, each identified by its name. A wrongly named program can otherwise pass unnoticed:
<program name="apertium-ppretransfer"/>
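A sketch of program-name validation against a list of known commands. The KNOWN_PROGRAMS set below is illustrative and deliberately incomplete (a real list would need to cover every tool a pair can use), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative, incomplete list of commands that appear in Apertium pipelines.
KNOWN_PROGRAMS = {
    "lt-proc", "cg-proc", "apertium-tagger", "apertium-pretransfer",
    "apertium-transfer", "apertium-interchunk", "apertium-postchunk",
}


def unknown_programs(modes_xml):
    """Return the command of every <program> whose name is not in KNOWN_PROGRAMS."""
    root = ET.fromstring(modes_xml)
    unknown = []
    for program in root.iter("program"):
        # The name can carry flags (e.g. "apertium-tagger -g $2"); keep only the command.
        tokens = program.get("name", "").split()
        command = tokens[0] if tokens else ""
        if command not in KNOWN_PROGRAMS:
            unknown.append(command)
    return unknown


modes = """<modes><mode name="fr-eo" install="yes"><pipeline>
<program name="apertium-ppretransfer"/>
<program name="apertium-tagger -g $2"><file name="fr-eo.prob"/></program>
</pipeline></mode></modes>"""
print(unknown_programs(modes))  # prints ['apertium-ppretransfer']
```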
  • Enforce specific rules : This function enforces certain rules specific to given programs. The list is not exhaustive, and new rules relating to programs (flags and attributes) will be added here.
    • Enforcing the use of apertium-tagger with "-g $2"
  • Install switch : If a mode is defined with install="no", its name should have an appropriate suffix such as -morph, -interchunk, etc.
  • Locate file : Assuming modes.xml is in the same directory as the other files of the language pair, this function checks each file defined in a program and prompts if it cannot be found.
<program name="apertium-interchunk">
        <file name="apertium-eo-fr.fr-eo.antaux1_t2x"/>
        <file name="xyz.bin"/>
</program>
  • Empty programs : Not so much a risk, but this function prompts if a given program does not have any file associated with it.
<program name="apertium-pretransfer"/>

Tagger

Others

Consistency of {Multichar_Symbols/sdefs/LISTs/SETs}
  • This may be a larger task split into dozens of checks, but IMO it is the source of most problems: