Paradigm chopper

From Apertium

Revision as of 11:41, 26 March 2011

Paradigm chopper is a Python script that removes redundant (duplicate) paradigm definitions from dictionaries and fixes the references to them. For example, given a dictionary containing:

  <pardef n="car__n">
    <e><p><l/><r><s n="n"/><s n="sg"/></r></p></e>
    <e><p><l>s</l><r><s n="n"/><s n="pl"/></r></p></e>
  </pardef>
  <pardef n="tree__n">
    <e><p><l/><r><s n="n"/><s n="sg"/></r></p></e>
    <e><p><l>s</l><r><s n="n"/><s n="pl"/></r></p></e>
  </pardef>

it would remove the tree__n paradigm and make every element in the main section that refers to tree__n refer to car__n instead. Currently, if two paradigms are identical, the script keeps the one with the shorter name.
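The merging step can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is not the code of `paradigm-chopper.py`. It is a minimal illustration that assumes a `.dix` file with a single `<pardefs>` element under the root, and the helper names (`canon`, `chop_once`, `chop`) are hypothetical. Two paradigms count as identical when their `<e>` entries match after ignoring indentation and attribute order. Unlike the behaviour described above, the sketch also redirects `<par>` references inside other paradigms. Because redirecting those references can make further paradigms identical, it repeats until nothing changes. Ties in name length are broken alphabetically, which is an assumption.

```python
import sys
import xml.etree.ElementTree as ET
from collections import defaultdict


def canon(elem):
    """Hashable, whitespace-insensitive form of an element and its subtree.

    Indentation between tags is ignored and attribute order does not matter.
    """
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(canon(child) for child in elem),
        (elem.tail or "").strip(),
    )


def chop_once(root):
    """Merge identical <pardef>s in one pass; return {removed_name: kept_name}."""
    pardefs = root.find("pardefs")
    if pardefs is None:
        return {}
    # Group paradigms by their <e> entries, ignoring the paradigm's own name.
    groups = defaultdict(list)
    for pardef in pardefs.findall("pardef"):
        groups[tuple(canon(e) for e in pardef)].append(pardef)

    rename = {}
    for members in groups.values():
        # Keep the shortest name; ties broken alphabetically (assumption).
        keep = min(members, key=lambda p: (len(p.get("n")), p.get("n")))
        for p in members:
            if p is not keep:
                rename[p.get("n")] = keep.get("n")
                pardefs.remove(p)

    # Redirect every <par n="..."/> reference, in sections and in other pardefs.
    for par in root.iter("par"):
        if par.get("n") in rename:
            par.set("n", rename[par.get("n")])
    return rename


def chop(root):
    """Repeat chop_once until no more paradigms can be merged."""
    merged = {}
    while True:
        rename = chop_once(root)
        if not rename:
            return merged
        # Re-point earlier merges whose kept paradigm was merged in this pass.
        for old in list(merged):
            merged[old] = rename.get(merged[old], merged[old])
        merged.update(rename)


if __name__ == "__main__" and len(sys.argv) > 1:
    tree = ET.parse(sys.argv[1])
    for old, new in chop(tree.getroot()).items():
        print(f"{old} -> {new}", file=sys.stderr)  # progress report on stderr
    tree.write(sys.stdout, encoding="unicode")
```

Usage would be something like `python chop_sketch.py apertium-xx.xx.dix > newdix` (hypothetical file names). ElementTree discards XML comments when parsing, so the output of this sketch is not a faithful copy of the input dictionary.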

Usage

To use it, do:

$ python paradigm-chopper.py <dix> > newdix

This prints messages on stderr about what it is doing (with large dictionaries it may take some time) and writes the new dictionary to newdix. You will need to copy in the header yourself (including the sdef entries). Once you have the new dictionary, use lt-expand to sanity-check it:

$ lt-expand <olddix> > olddix.exp
$ lt-expand <newdix> > newdix.exp
$ diff -Naur olddix.exp newdix.exp

If there are any differences, please send the dictionary files to Fran.
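The `diff` check above compares the two expansions line by line in order. If the line order of the expansions might differ (for example, because entries were also reordered), an order-insensitive comparison avoids spurious differences. The following is a hypothetical helper, not part of the Apertium tools. It treats each `.exp` file as a multiset of lines and reports the lines lost from and gained in the new expansion.

```python
import sys
from collections import Counter


def compare_expansions(old_lines, new_lines):
    """Return (lost, gained): lines only in the old / only in the new expansion.

    Uses multiset difference, so line order is ignored but duplicates count.
    """
    old = Counter(line.rstrip("\n") for line in old_lines)
    new = Counter(line.rstrip("\n") for line in new_lines)
    lost = sorted((old - new).elements())
    gained = sorted((new - old).elements())
    return lost, gained


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], encoding="utf-8") as f_old, \
         open(sys.argv[2], encoding="utf-8") as f_new:
        lost, gained = compare_expansions(f_old, f_new)
    for line in lost:
        print("-" + line)
    for line in gained:
        print("+" + line)
    sys.exit(1 if lost or gained else 0)  # non-zero exit when they differ
```

Run it as, for example, `python compare_exp.py olddix.exp newdix.exp` (hypothetical script name). An exit status of 0 means both expansions contain exactly the same lines.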