Dixtools: Enhance

From Apertium
Revision as of 11:52, 6 October 2019 by Unhammer (talk | contribs)

Dixtools has an enhance option to add new words to a dictionary.

How to use it:

  • java -jar path/to/apertium-dixtools.jar enhance existing_dict.dix name_new_dict.dix

There are three parameters:

   enhance 
       the name of the tool
   existing_dict.dix
       the name of the existing dict (e.g. apertium-es-ca.ca.dix)
   name_new_dict.dix
       the file name under which the new (enhanced) dict is saved
       ALERT: the tool simply overwrites any existing file with that name
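
Because the output file is replaced without warning, it can help to check the two paths before launching the tool. The following is a minimal Python sketch, not part of dixtools: the helper name build_enhance_command is hypothetical, and the only command-line form it relies on is the one shown above.

```python
import os


def build_enhance_command(jar, existing_dict, new_dict):
    """Return the argument list for the enhance tool, refusing risky output paths.

    Hypothetical helper: it only assembles the documented command line and
    checks that running it would not clobber an existing file.
    """
    # Writing the result over the input dictionary would destroy the original.
    if os.path.abspath(existing_dict) == os.path.abspath(new_dict):
        raise ValueError("output path is the same as the input dictionary")
    # The tool overwrites silently, so stop if anything is already there.
    if os.path.exists(new_dict):
        raise FileExistsError(f"{new_dict} already exists and would be overwritten")
    return ["java", "-jar", jar, "enhance", existing_dict, new_dict]
```

The returned list can be passed to subprocess.run without redirecting standard input, so that the interactive session still runs in the terminal.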

You are then dropped into an interactive session. Type the word you want to add, followed by a comma and then a word that already exists in the dictionary and has the paradigm you want to use. Example session:

$ java -jar dist/apertium-dixtools.jar enhance apertium-nno.nno.dix new.dix
Reading file 'apertium-nno.nno.dix'
'enhance' method
Enter the word you want to add to the dictionaries,
followed by a word already in the dictionaries separated by ','

(enter --exit to finish)
ostekaffi
Wrong format
Enter the word you want to add to the dictionaries,
followed by a word already in the dictionaries separated by ','

(enter --exit to finish)
ostekaffi,kaffi
<e>[IVXLCDM\-]+<l></l><r><det><qnt><un><pl></r></e>
<e><i>AP_PAIR_VERSION</i><l></l><r><adv></r></e>
<e r="LR"><i>AP_LANG_VERSION</i><l></l><r>@APERTIUM_AUTO_VERSION@<adv></r></e>
Result  <e><i>ostekaffi</i><par n="ep__n"/></e>
Element added: (<e><i>ostekaffi</i><par n="ep__n"/></e>)

--------------------------------------------------------------------
Enter the word you want to add to the dictionaries,
followed by a word already in the dictionaries separated by ','

(enter --exit to finish)
pianomaskin,maskin
<e>[IVXLCDM\-]+<l></l><r><det><qnt><un><pl></r></e>
<e><i>AP_PAIR_VERSION</i><l></l><r><adv></r></e>
<e r="LR"><i>AP_LANG_VERSION</i><l></l><r>@APERTIUM_AUTO_VERSION@<adv></r></e>
Result  <e><i>pianomaskin</i><par n="så__n"/></e>
Element added: (<e><i>pianomaskin</i><par n="så__n"/></e>)

--------------------------------------------------------------------
Enter the word you want to add to the dictionaries,
followed by a word already in the dictionaries separated by ','

(enter --exit to finish)
--exit
Writing file new.dix
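
The idea behind the session above (look up the model word, then reuse its paradigm for the new word) can be illustrated with a short Python sketch. This is not taken from the dixtools source. It assumes that the model entry sits in a section, that its lemma is stored in an lm attribute (falling back to the text of <i>), and that any ending the lemma has beyond the stem must also appear on the new word. The function name entry_like is hypothetical.

```python
import xml.etree.ElementTree as ET


def entry_like(dix_xml, new_word, model_word):
    """Build an <e> element for new_word that reuses model_word's paradigm."""
    root = ET.fromstring(dix_xml)
    for section in root.iter("section"):
        for e in section.iter("e"):
            par = e.find("par")
            stem = e.findtext("i")
            if par is None or stem is None:
                continue  # not a simple stem + paradigm entry
            # Assumption: the lemma is in the lm attribute, else equals the stem.
            lemma = e.get("lm", stem)
            if lemma != model_word:
                continue
            # Assumption: the part of the lemma after the stem is an ending
            # that the new word has to share with the model word.
            ending = lemma[len(stem):] if lemma.startswith(stem) else ""
            if not new_word.endswith(ending):
                raise ValueError(f"{new_word!r} does not end in {ending!r}")
            new_entry = ET.Element("e")
            ET.SubElement(new_entry, "i").text = new_word[:len(new_word) - len(ending)]
            ET.SubElement(new_entry, "par", dict(par.attrib))
            return new_entry
    raise LookupError(f"{model_word!r} not found in the dictionary")
```

The result can be serialised with ET.tostring(entry, encoding="unicode"). Entries whose <i> contains further markup, or which use more than one paradigm, are outside the scope of this sketch.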


See discussion at http://thread.gmane.org/gmane.comp.nlp.apertium/2946/focus=2987 or https://sourceforge.net/p/apertium/mailman/message/30540141/

Limitations

  • Like other dixtools methods, it parses and rewrites the whole dictionary, which produces quite large diffs if the file doesn't already conform to the dixtools format.
  • dixtools strips all c (comment) attributes, since it doesn't know about them.
  • It probably doesn't work on metadix.
  • Confusingly, it prints some irrelevant lines from the .dix before the pardef it found (see the example session above).
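
Since the rewritten file can differ from the original on almost every line, a textual diff is of little use for checking what was actually added. One workaround is to compare the entries structurally. The Python sketch below is an illustration under assumptions, not a dixtools feature: it only looks at entries inside section elements that have a <par> child, and it identifies each one by the text of <i> plus the paradigm name. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def entry_keys(dix_xml):
    """Return the set of (stem, paradigm name) pairs found in the sections."""
    root = ET.fromstring(dix_xml)
    keys = set()
    for section in root.iter("section"):
        for e in section.iter("e"):
            par = e.find("par")
            if par is not None:
                # An entry without <i> has an empty stem.
                keys.add((e.findtext("i") or "", par.get("n")))
    return keys


def added_entries(old_xml, new_xml):
    """List the (stem, paradigm) pairs present in new_xml but not in old_xml."""
    return sorted(entry_keys(new_xml) - entry_keys(old_xml))
```

Both functions accept the file contents as bytes or text, for instance added_entries(open("apertium-nno.nno.dix", "rb").read(), open("new.dix", "rb").read()). Formatting changes and stripped c attributes do not affect the comparison, because only the stem and the paradigm name are compared.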