Difference between revisions of "Lint"

From Apertium
Revision as of 20:16, 16 June 2016

About

This page logs the errors that the lint is designed to handle. As its development progresses, updates about the different releases will be posted here.

Documentation of how the lint works, along with its technical details, will also be kept here.

Monodix

List of issues in monodixes

  • Redundant Entries
<pardef n="di/e__vblex"><e><par n="liv/e__vblex"/></e></pardef>
  • Maintain consistency in the data present in the <r> tag in pardef entries.
<e><p><l>as</l><r>o<s n="prn"/><s n="tn"/><s n="f"/><s n="pl"/></r></p></e>
<e><p><l>a</l><r>abz<s n="prn"/><s n="tn"/><s n="f"/><s n="sg"/></r></p></e>
  • Paradigm names should use two underscores ("__") instead of one ("_").
<pardef n="outr/o__prn">
<pardef n="/o_meu__prn">
  • Repeated tag entries: main section.
<e><p><l>y</l><r>y<s n="adj"/><s n="adj"/></r></p></e>
  • Repeated tag entries: pardef.
<pardef n="br/other__n">
  <e>       <p><l>other</l>     <r>other<s n="n"/><s n="sg"/></r></p></e>
  <e>       <p><l>other</l>     <r>other<s n="sg"/><s n="n"/></r></p></e>
  <e r="LR"><p><l>ethren</l>    <r>other<s n="n"/><s n="pl"/></r></p></e>
  <e>       <p><l>others</l>    <r>other<s n="n"/><s n="pl"/></r></p><par n="gen__apos"/></e>
</pardef>
  • Repeated entries in the dictionary.
<e lm="house"><i>house</i><par n="house__n"/></e>
<e lm="house"><i>house</i><par n="house__n"/></e>
  • The "cm" entry should have only the comma and nothing else.
<e><re>,</re><p><l></l><r><s n="cm"/></r></p></e> 
  • Recursive paradigms. (Still to be decided)
<pardef n="up__n"><e><i></i><par n="house__n"/></e></pardef>
<pardef n="house__n"><e><i></i><par n="xyz__n"/></e></pardef>
<pardef n="xyz__n"><e><i></i><par n="up__n"/></e></pardef>
  • Incorrect transfer direction.
<e r="XYZ">
  • Unused paradigms
  • Unwanted (repeated) <b/> tags in entries.
<l>as<b/>minhas<b/></l><r>o<b/>meu<s n="prn"/>
  • White spaces should be denoted using <b/>.
<e lm="Middle Ages"><i>Middle Ages</i><par n="house__n"/><par n="gen__apos"/></e>
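Three of the monodix checks above (repeated tags, repeated entries, recursive paradigms) can be sketched with Python's standard XML parser. This is only an illustration, not the lint's actual implementation: the function names and the SAMPLE dictionary are hypothetical, and the sample simply reuses the entries quoted in the list.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature monodix built from the examples in the list above.
SAMPLE = """<dictionary>
  <pardefs>
    <pardef n="up__n"><e><i></i><par n="house__n"/></e></pardef>
    <pardef n="house__n"><e><i></i><par n="xyz__n"/></e></pardef>
    <pardef n="xyz__n"><e><i></i><par n="up__n"/></e></pardef>
  </pardefs>
  <section id="main" type="standard">
    <e><p><l>y</l><r>y<s n="adj"/><s n="adj"/></r></p></e>
    <e lm="house"><i>house</i><par n="house__n"/></e>
    <e lm="house"><i>house</i><par n="house__n"/></e>
  </section>
</dictionary>"""

def repeated_symbols(root):
    """Yield (entry, tag) for each <r> that lists the same <s n="..."/> twice."""
    for e in root.iter("e"):
        for r in e.iter("r"):
            counts = Counter(s.get("n") for s in r.findall("s"))
            for tag, n in counts.items():
                if n > 1:
                    yield e, tag

def duplicate_entries(root):
    """Return the serialized section entries that occur more than once."""
    counts = Counter(
        ET.tostring(e, encoding="unicode").strip()
        for section in root.iter("section")
        for e in section.findall("e")
    )
    return [xml for xml, n in counts.items() if n > 1]

def paradigm_cycles(root):
    """Return pardef names that can reach themselves through <par> references."""
    graph = {
        p.get("n"): {par.get("n") for par in p.iter("par")}
        for p in root.iter("pardef")
    }

    def reaches_itself(start):
        seen, stack = set(), list(graph.get(start, ()))
        while stack:
            node = stack.pop()
            if node == start:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(graph.get(node, ()))
        return False

    return sorted(name for name in graph if reaches_itself(name))

root = ET.fromstring(SAMPLE)
repeated = [tag for _, tag in repeated_symbols(root)]
duplicates = duplicate_entries(root)
cycles = paradigm_cycles(root)
```

Comparing serialized entries only catches character-for-character duplicates; entries that differ in whitespace or attribute order would need a normalization step first.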

Bidix

  • The bidix should not have sdefs that are not in the monodix or lexc file.
  • When a word pair is not grammatically equivalent, all the grammatical symbols must be specified, in the same order as they are specified in the monolingual dictionaries.
<e><p><l>Londono<s n="np"/><s n="top"/></l><r>London<s n="np"/><s n="top"/></r></p></e>


<pardef n="Barcelona__np">
    <e><p><l></l><r><s n="top"/><s n="np"/><s n="sg"/></r></p></e>
</pardef>
  • Blank spaces should always be denoted using <b/>.
<e><p><l>Sunsistemo<s n="np"/><s n="top"/></l><r>Solar System<s n="np"/><s n="top"/></r></p></e>
  • Only LR or RL can be used in <e r="XY">
<e r="XY"><p><l>ĥina<s n="adj"/></l><r>Chinese<s n="adj"/></r></p></e>
  • Multiwords with inner inflection consist of a word that can inflect followed by an invariable element. For these entries the inflection paradigm must be specified just after the word that inflects. The invariable part must be marked with the <g> element on the right side. If the <g> tag is present in the monolingual dictionary, it should also be present in the bilingual dictionary.
<e><p><l>eltiriĝi<s n="vblex"/></l><r>back<g><b/>out</g><s n="vblex"/></r></p></e>

<e lm="back out"><i>back</i><par n="accept__vblex"/><p><l><b/>out</l><r><g><b/>out</g></r></p></e>
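As a rough sketch of how two of the bidix checks could work (the direction restriction and literal spaces), the following Python walks over the entries and reports problems. The names lint_bidix and side_text and the SAMPLE section are made up for illustration; the sample reuses the entries quoted above plus one corrected variant.

```python
import xml.etree.ElementTree as ET

ALLOWED_DIRECTIONS = {"LR", "RL"}

# Hypothetical bidix fragment: entry 1 has a bad direction, entry 2 a literal
# space, entry 3 is the corrected form using <b/>.
SAMPLE = """<section id="main" type="standard">
  <e r="XY"><p><l>ĥina<s n="adj"/></l><r>Chinese<s n="adj"/></r></p></e>
  <e><p><l>Sunsistemo<s n="np"/></l><r>Solar System<s n="np"/></r></p></e>
  <e><p><l>Sunsistemo<s n="np"/></l><r>Solar<b/>System<s n="np"/></r></p></e>
</section>"""

def side_text(side):
    """Collect the literal text of an <l> or <r>, including text after child tags."""
    return (side.text or "") + "".join(child.tail or "" for child in side)

def lint_bidix(root):
    """Return (entry_number, message) pairs for the two checks."""
    problems = []
    for index, e in enumerate(root.iter("e"), start=1):
        direction = e.get("r")
        if direction is not None and direction not in ALLOWED_DIRECTIONS:
            problems.append((index, f'invalid direction r="{direction}"'))
        for side in e.iter():
            if side.tag in ("l", "r") and " " in side_text(side).strip():
                problems.append((index, f"literal space in <{side.tag}>"))
    return problems

problems = lint_bidix(ET.fromstring(SAMPLE))
```

The same direction check applies unchanged to the monodix r attribute.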

Transfer

  1. Issue in: <clip>
     Problem: Check that the attribute listed in part="" is defined using a <def-attr>.


In this example, the user tried setting a non-lemma clip to a lit instead of lit-tag:

        <let><clip pos="2" part="art"/><lit v="def"/></let>

lit makes sense for part="lem", "lemh" or "lemq", but most likely not for anything else.


In this example, the user set the definiteness to singular, which doesn't make much sense:

        <let><clip pos="2" part="art"/><lit-tag v="sg"/></let>

(it could be a hacky way of deleting the definiteness and appending a number, but then the code would be clearer if we first append definiteness using concat, then delete it)
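The three transfer checks discussed above could look roughly like this in Python. It is a sketch under assumptions: the set of predefined parts (lem, lemh, lemq, whole, tags) is taken to need no <def-attr>, lit-tag values are compared literally against the tags of each <attr-item>, and lint_transfer and SAMPLE are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Assumption: these parts are predefined and never declared with <def-attr>.
BUILTIN_PARTS = {"lem", "lemh", "lemq", "whole", "tags"}
LEMMA_PARTS = {"lem", "lemh", "lemq"}

# Hypothetical transfer fragment containing the two examples above.
SAMPLE = """<transfer>
  <section-def-attrs>
    <def-attr n="art"><attr-item tags="def"/><attr-item tags="ind"/></def-attr>
  </section-def-attrs>
  <section-rules>
    <rule><action>
      <let><clip pos="2" part="art"/><lit v="def"/></let>
      <let><clip pos="2" part="art"/><lit-tag v="sg"/></let>
      <let><clip pos="1" part="nbr"/><lit-tag v="sg"/></let>
      <let><clip pos="2" part="art"/><lit-tag v="def"/></let>
    </action></rule>
  </section-rules>
</transfer>"""

def lint_transfer(root):
    """Return a list of messages for suspicious <clip> and <let> usage."""
    # Map each declared attribute name to the set of tags it may take.
    attrs = {
        d.get("n"): {item.get("tags") for item in d.findall("attr-item")}
        for d in root.iter("def-attr")
    }
    problems = []
    for clip in root.iter("clip"):
        part = clip.get("part")
        if part not in attrs and part not in BUILTIN_PARTS:
            problems.append(f'undefined part="{part}"')
    for let in root.iter("let"):
        children = list(let)
        if len(children) != 2 or children[0].tag != "clip":
            continue
        target, value = children
        part, v = target.get("part"), value.get("v")
        if value.tag == "lit" and part not in LEMMA_PARTS:
            problems.append(f'<lit> assigned to part="{part}"')
        elif value.tag == "lit-tag" and part in attrs and v not in attrs[part]:
            problems.append(f'lit-tag "{v}" is not a value of "{part}"')
    return problems

problems = lint_transfer(ET.fromstring(SAMPLE))
```

Since the second and third findings may be deliberate, a lint would probably report them as warnings rather than errors.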

Modes

  • Don't use the program name "lrx-proc" without the -m option.
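A possible way to check this, assuming the usual modes.xml layout of <mode> elements containing <program name="..."> elements, and assuming -m appears as a separate argument in the name attribute. The function name lrx_without_m and the SAMPLE file are hypothetical.

```python
import shlex
import xml.etree.ElementTree as ET

# Hypothetical modes file: the first mode calls lrx-proc without -m.
SAMPLE = """<modes>
  <mode name="eo-en" install="yes">
    <pipeline>
      <program name="lt-proc"><file name="eo-en.automorf.bin"/></program>
      <program name="lrx-proc"><file name="eo-en.autolex.bin"/></program>
    </pipeline>
  </mode>
  <mode name="en-eo" install="yes">
    <pipeline>
      <program name="lrx-proc -m"><file name="en-eo.autolex.bin"/></program>
    </pipeline>
  </mode>
</modes>"""

def lrx_without_m(root):
    """Return the names of modes that call lrx-proc without the -m flag."""
    offenders = []
    for mode in root.iter("mode"):
        for program in mode.iter("program"):
            # The name attribute holds the command plus its options.
            argv = shlex.split(program.get("name", ""))
            if argv and argv[0] == "lrx-proc" and "-m" not in argv[1:]:
                offenders.append(mode.get("name"))
    return offenders

offenders = lrx_without_m(ET.fromstring(SAMPLE))
```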

Tagger

Others

Consistency of {Multichar_Symbols/sdefs/LISTs/SETs}
  • This may be a larger task, split into dozens of checks, but IMO it is the source of most problems: