User:Dtr5

From Apertium

Unnamed simple dictionary insert

I am programming a small web-based tool for inserting words into Apertium dictionaries as a university project. It should be online soon. It aims to provide a standard and simple way of managing dictionaries.

For now, it can only handle the simplest kind of Apertium entry:

 <e lm="...">
    <i>...</i>
    <par n="..."/>
 </e>

It works like a pastebin. On first connecting to the site, the user is asked to upload an Apertium language pair and, if available, its configuration files. The user then gets a random identifier that can be used to return to this session later.
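A minimal sketch of the pastebin-style sessions described above, assuming each session is simply a folder named after its identifier. The real tool is written in PHP and its storage layout is not described here; the folder layout and the names open_session, resume_session and SESSION_ID are hypothetical. The identifier is checked against a strict pattern before being used as a path, so a crafted id such as "../etc" cannot escape the store.

```python
import re
import secrets
import tempfile
from pathlib import Path

# Hypothetical layout: one folder per session, named after its identifier.
# Identifiers are URL-safe base64, so they are also safe directory names.
SESSION_ID = re.compile(r"[A-Za-z0-9_-]+")


def open_session(store: Path) -> str:
    """Create an empty session folder under `store` and return its random id."""
    while True:
        sid = secrets.token_urlsafe(12)  # 12 random bytes -> 16 characters
        try:
            (store / sid).mkdir(parents=True)  # raises if the id is already taken
            return sid
        except FileExistsError:
            continue  # astronomically unlikely collision: draw a new id


def resume_session(store: Path, sid: str) -> Path:
    """Return the folder holding the uploaded pair and configuration files."""
    if not SESSION_ID.fullmatch(sid) or not (store / sid).is_dir():
        raise KeyError(sid)  # unknown or malformed id (e.g. "../etc")
    return store / sid


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sid = open_session(Path(tmp))
        print(sid, resume_session(Path(tmp), sid))
```

Using mkdir as the "claim" step means two sessions can never end up sharing an identifier, even without a separate registry.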

The user can then insert words into the dictionaries.

Words should be written in their representative form (as defined in the configuration file). The user then gets a list of candidate paradigms for the word, together with some of its inflected forms (those defined in the configuration file).
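One way to find candidate paradigms, sketched under stated assumptions: a paradigm is a candidate if the word ends with the surface suffix that the paradigm attaches for the representative form, and the inflected forms then follow by adding each of the paradigm's suffixes to the remaining stem. This is not necessarily how the tool does it. The representative tags ("n", "sg") stand in for whatever the configuration file defines, the dictionary fragment is made up, and the sketch ignores <i> entries and nested <par> references inside paradigms.

```python
import xml.etree.ElementTree as ET

# Hypothetical monolingual dictionary fragment with two English noun paradigms.
MONODIX = """<dictionary>
  <pardefs>
    <pardef n="house__n">
      <e><p><l></l><r><s n="n"/><s n="sg"/></r></p></e>
      <e><p><l>s</l><r><s n="n"/><s n="pl"/></r></p></e>
    </pardef>
    <pardef n="bab/y__n">
      <e><p><l>y</l><r>y<s n="n"/><s n="sg"/></r></p></e>
      <e><p><l>ies</l><r>y<s n="n"/><s n="pl"/></r></p></e>
    </pardef>
  </pardefs>
</dictionary>"""


def paradigm_entries(root):
    """Map each paradigm name to its (surface suffix, tags) pairs."""
    paradigms = {}
    for pardef in root.iter("pardef"):
        pairs = []
        for p in pardef.iter("p"):
            left, right = p.find("l"), p.find("r")
            suffix = "".join(left.itertext())  # an empty <l></l> yields ""
            tags = tuple(s.get("n") for s in right.iter("s"))
            pairs.append((suffix, tags))
        paradigms[pardef.get("n")] = pairs
    return paradigms


def candidate_paradigms(word, root, rep_tags=("n", "sg")):
    """For each paradigm able to produce `word` as its representative form
    (the surface form carrying `rep_tags`), list the forms it would generate."""
    candidates = {}
    for name, pairs in paradigm_entries(root).items():
        for suffix, tags in pairs:
            if tags == rep_tags and word.endswith(suffix):
                stem = word[: len(word) - len(suffix)]
                candidates[name] = [stem + s for s, _ in pairs]
                break
    return candidates


root = ET.fromstring(MONODIX)
print(candidate_paradigms("baby", root))
# {'house__n': ['baby', 'babys'], 'bab/y__n': ['baby', 'babies']}
```

Both paradigms match "baby", which is why the user has to pick one by looking at the generated forms; "cat" would only match house__n.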

After entering both words, the user can generate the XML nodes that will be inserted into the dictionaries. If either word is already defined in its monolingual dictionary, it is not inserted again. If the translation is already defined in the bilingual dictionary, no changes are made to it.
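The insertion rules in the paragraph above can be sketched as follows, assuming each word is described by a (lemma, stem, paradigm) triple and the bilingual entry carries a single part-of-speech tag on each side. The helper names and the "already defined" tests (same lemma and paradigm; same words and tag in a <p>) are hypothetical simplifications, not the tool's actual checks, and the tool itself does this in PHP and BaseX rather than Python.

```python
import xml.etree.ElementTree as ET


def mono_entry(lemma, stem, paradigm):
    """Build <e lm="lemma"><i>stem</i><par n="paradigm"/></e>."""
    e = ET.Element("e", lm=lemma)
    ET.SubElement(e, "i").text = stem
    ET.SubElement(e, "par", n=paradigm)
    return e


def bi_entry(left, right, tag):
    """Build <e><p><l>left<s n="tag"/></l><r>right<s n="tag"/></r></p></e>."""
    e = ET.Element("e")
    p = ET.SubElement(e, "p")
    for side, word in (("l", left), ("r", right)):
        node = ET.SubElement(p, side)
        node.text = word
        ET.SubElement(node, "s", n=tag)
    return e


def has_entry(monodix, lemma, paradigm):
    """True if an <e lm="lemma"> already points at the same paradigm."""
    for e in monodix.iter("e"):
        par = e.find("par")
        if e.get("lm") == lemma and par is not None and par.get("n") == paradigm:
            return True
    return False


def side_key(node):
    """(text, tags) of an <l> or <r> element, e.g. ("house", ("n",))."""
    return "".join(node.itertext()), tuple(s.get("n") for s in node.iter("s"))


def has_translation(bidix, left, right, tag):
    """True if some <p> already pairs exactly left<tag> with right<tag>."""
    wanted = ((left, (tag,)), (right, (tag,)))
    for p in bidix.iter("p"):
        l, r = p.find("l"), p.find("r")
        if l is not None and r is not None and (side_key(l), side_key(r)) == wanted:
            return True
    return False


def nodes_to_insert(mono1, mono2, bidix, word1, word2, tag):
    """word1/word2 are (lemma, stem, paradigm) triples; only missing nodes are returned."""
    nodes = {}
    for key, mono, (lemma, stem, paradigm) in (("mono1", mono1, word1), ("mono2", mono2, word2)):
        if not has_entry(mono, lemma, paradigm):
            nodes[key] = mono_entry(lemma, stem, paradigm)
    if not has_translation(bidix, word1[0], word2[0], tag):
        nodes["bidix"] = bi_entry(word1[0], word2[0], tag)
    return nodes


en = ET.fromstring('<dictionary><section id="main"><e lm="house"><i>house</i><par n="house__n"/></e></section></dictionary>')
es = ET.fromstring('<dictionary><section id="main"/></dictionary>')
bi = ET.fromstring('<dictionary><section id="main"/></dictionary>')
new = nodes_to_insert(en, es, bi, ("house", "house", "house__n"), ("casa", "cas", "cas/a__n"), "n")
print(sorted(new))  # ['bidix', 'mono2']
```

Here "house" is already in the English dictionary, so only the Spanish entry and the translation are generated.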

Users should be aware that the logic of this step is very simple: it does not check whether the translation node should be marked as secondary or as directional (for example, restricted to one translation direction with r="LR" or r="RL"). Users should be as cautious as when editing the dictionaries directly.

Once the insertions are done, the user can export the dictionaries. This may take a while; when the export finishes, the user gets download links for the updated dictionaries.
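For illustration only, the export step could amount to appending the new <e> nodes to a dictionary section and serializing the result. The tool itself keeps the dictionaries in BaseX, so this sketch swaps in Python's ElementTree; the function name export_dictionary and the assumption that new entries go into <section id="main"> are hypothetical. By default ElementTree discards XML comments, which Apertium dictionaries often contain, so the parser is told to keep them.

```python
import xml.etree.ElementTree as ET


def export_dictionary(dix_text, new_entries, section_id="main"):
    """Append new <e> nodes to <section id=...> and return the updated XML text."""
    # Keep comments: ElementTree drops them unless the tree builder is told not to.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(dix_text, parser=parser)
    section = root.find(f"section[@id='{section_id}']")
    if section is None:
        raise ValueError(f"no <section id={section_id!r}> in dictionary")
    section.extend(new_entries)
    return ET.tostring(root, encoding="unicode")


dix = '<dictionary><section id="main" type="standard"><!-- nouns --></section></dictionary>'
entry = ET.fromstring('<e lm="casa"><i>cas</i><par n="cas/a__n"/></e>')
print(export_dictionary(dix, [entry]))
```

The returned string is what a download link would serve; for large dictionaries, parsing and re-serializing the whole file is one reason exporting can take a while.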

The user can close the session at any time. This frees the identifier and deletes all the dictionaries and configuration files that were uploaded.
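Closing a session can be sketched the same way, again assuming a hypothetical layout where each session's uploads live in a folder named after its identifier (the real PHP tool may store them differently). Validating the identifier before deleting anything matters here, since the folder is removed recursively; close_session and the file name in the demo are made up for illustration.

```python
import re
import shutil
import tempfile
from pathlib import Path

SESSION_ID = re.compile(r"[A-Za-z0-9_-]+")


def close_session(store: Path, sid: str) -> bool:
    """Delete every file uploaded under `sid`; return False if nothing was there.

    Once the folder is gone the identifier no longer resolves and may be reissued."""
    if not SESSION_ID.fullmatch(sid):
        raise ValueError(f"malformed session id: {sid!r}")
    folder = store / sid
    if not folder.is_dir():
        return False
    shutil.rmtree(folder)  # dictionaries and configuration files alike
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "abc123").mkdir()
        (Path(tmp) / "abc123" / "apertium-en-es.en.dix").write_text("<dictionary/>")
        print(close_session(Path(tmp), "abc123"))  # True
```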



The application runs on PHP and uses BaseX as its XML database. It also uses a bit of XSLT.


TODO
  • Insert all kinds of Apertium monolingual nodes.
  • Sorted paradigm list for each word.
  • Better awareness of conflicts in bilingual insertions.
  • Metadix support.