Paradigm chopper

Paradigm chopper is a Python script that removes redundant paradigm definitions from dictionaries and fixes the references to them. For example, if you have a dictionary like this:

  <pardef n="car__n">
    <e><p><l/><r><s n="n"/><s n="sg"/></r></p></e>
    <e><p><l>s</l><r><s n="n"/><s n="pl"/></r></p></e>
  </pardef>
  <pardef n="tree__n">
    <e><p><l/><r><s n="n"/><s n="sg"/></r></p></e>
    <e><p><l>s</l><r><s n="n"/><s n="pl"/></r></p></e>
  </pardef>

it would remove the tree__n paradigm and make every entry in the main section that refers to it refer to car__n instead. At present, when two paradigms are identical, the script keeps the one with the shortest name.
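The merging step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of paradigm-chopper.py itself: the function name chop_paradigms is hypothetical, paradigms are compared by the serialised text of their entries, and ties between equally short names are broken alphabetically.

```python
import xml.etree.ElementTree as ET


def chop_paradigms(root):
    """Merge duplicate <pardef> elements in place; return {removed: kept}.

    Hypothetical helper for illustration, not the real script's API.
    """
    pardefs_parent = root.find("pardefs")
    groups = {}
    for pardef in pardefs_parent.findall("pardef"):
        # The serialised entries (the name is excluded) form the comparison key.
        key = "".join(
            ET.tostring(entry, encoding="unicode").strip() for entry in pardef
        )
        groups.setdefault(key, []).append(pardef)

    renames = {}
    for duplicates in groups.values():
        # Shortest name wins; the alphabetical tie-break is an assumption.
        duplicates.sort(key=lambda p: (len(p.get("n")), p.get("n")))
        keeper = duplicates[0]
        for extra in duplicates[1:]:
            renames[extra.get("n")] = keeper.get("n")
            pardefs_parent.remove(extra)

    # Repoint every <par n="..."/> reference at the surviving paradigm.
    for par in root.iter("par"):
        par.set("n", renames.get(par.get("n"), par.get("n")))
    return renames


# A tiny dictionary built from the example above, plus one main-section entry.
SAMPLE = """<dictionary>
  <pardefs>
    <pardef n="car__n">
      <e><p><l/><r><s n="n"/><s n="sg"/></r></p></e>
      <e><p><l>s</l><r><s n="n"/><s n="pl"/></r></p></e>
    </pardef>
    <pardef n="tree__n">
      <e><p><l/><r><s n="n"/><s n="sg"/></r></p></e>
      <e><p><l>s</l><r><s n="n"/><s n="pl"/></r></p></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="tree"><i>tree</i><par n="tree__n"/></e>
  </section>
</dictionary>"""

root = ET.fromstring(SAMPLE)
renames = chop_paradigms(root)
print(renames)
```

After the call, the sample tree contains only the car__n paradigm, and the entry for "tree" refers to it.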


To use it, do:

$ python paradigm-chopper.py <dix> > newdix

This will print some output on stderr about what it's doing (with large dictionaries it may take some time) and write the dictionary output to newdix. You will need to copy the header (including the sdef entries) from the original dictionary into newdix. Once that is done, use lt-expand as a sanity check:

$ lt-expand <olddix> > olddix.exp
$ lt-expand <newdix> > newdix.exp
$ diff -Naur olddix.exp newdix.exp

If there are any differences, please send the dictionary files to Fran.
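The sanity check can also be scripted. The sketch below is an illustration only: the helper names expand and compare_expansions are hypothetical, and expand assumes lt-expand is on the PATH and is called with a single dictionary argument, as in the commands above. An empty result from compare_expansions means the two expansions match.

```python
import difflib
import subprocess


def expand(dix_path):
    """Run lt-expand on a dictionary and return its output lines.

    Assumes lt-expand is installed and on the PATH.
    """
    result = subprocess.run(
        ["lt-expand", dix_path], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


def compare_expansions(old_lines, new_lines):
    """Return unified-diff lines; an empty list means no differences."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="olddix.exp",
            tofile="newdix.exp",
            lineterm="",
        )
    )


# In-memory demonstration with made-up expansion lines (no lt-expand needed).
old = ["car:car<n><sg>", "cars:car<n><pl>"]
same = compare_expansions(old, list(old))
changed = compare_expansions(old, ["car:car<n><sg>"])
print("\n".join(changed))
```

With real files, compare_expansions(expand("olddix"), expand("newdix")) would perform the same comparison as the diff command, where olddix and newdix stand for the two dictionary paths.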