Apertium-html-tools/Paradigm dictionary

Paradigm dictionary mode in HTML-Tools offers bilingual dictionary functionality with paradigms, all based on Apertium data. This is especially useful for language communities where there are few existing resources for learners.

Overview of functionality

Dictionary lookup

  • Searches both languages and displays results in the direction of the chosen pair.
  • Based on Apertium bilingual dictionaries.
  • Uses Apertium monolingual analysers, so any morphological form of a word can be searched; the results show which form the searched form is. (There is a known bug with syncretic forms.)
  • Semantic fuzzy search based on word embeddings.

Paradigm display

  • Generates a paradigm for each word based on its part of speech (POS).
  • Different modes can be selected for paradigm display, e.g., learner vs. linguist. Headings are also localised (or localisable).

Planned features

Example

Poster

A poster showing some of the features as of 2025 Q3.

Installation

Language data, APy, and HTML-Tools requirements

  1. Get APy and HTML-Tools running
    • You'll need a version of APy that supports the billookup and bilsearch modes, currently the embeddings branch (in the master branch as of 2025-10-14).
    • You'll need a version of HTML-Tools that supports paradigm dictionary mode, currently the urum branch.
  2. Make sure your config.ts (in HTML-Tools) is updated to include the following:
    • Mode.Dictionary should be in the enabledModes list (and can be set as the defaultMode).
    • apyURL and htmlUrl will need to be updated to match your setup.
  3. Ensure that at least one language pair has a billookup mode and a bilsearch mode in at least one direction (ideally both):
    • modes.xml will need blocks like those in uum-eng.
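
Step 2 above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the actual HTML-Tools config.ts: the Mode enum and Config interface below are local stand-ins (the real types live in the HTML-Tools source and may differ, for instance in the container used for enabledModes), the two URLs are placeholder assumptions for a local setup, and checkDictionaryConfig is a hypothetical helper that only exists to make the requirements explicit.

```typescript
// Local stand-ins so the sketch compiles on its own. The real Mode and
// Config types in HTML-Tools may have more members and a different shape.
enum Mode {
  Translation = 'translation',
  Dictionary = 'dictionary',
}

interface Config {
  apyURL: string;
  htmlUrl: string;
  enabledModes: Mode[];
  defaultMode: Mode;
}

// Placeholder URLs: replace them with the addresses of your own
// APy server and HTML-Tools deployment.
const config: Config = {
  apyURL: 'http://localhost:2737/',
  htmlUrl: 'http://localhost:8000/',
  enabledModes: [Mode.Translation, Mode.Dictionary],
  defaultMode: Mode.Dictionary,
};

// Hypothetical helper: returns a list of problems, empty if the
// settings named in step 2 look consistent.
function checkDictionaryConfig(c: Config): string[] {
  const problems: string[] = [];
  if (c.enabledModes.indexOf(Mode.Dictionary) === -1) {
    problems.push('Mode.Dictionary is missing from enabledModes');
  }
  if (c.enabledModes.indexOf(c.defaultMode) === -1) {
    problems.push('defaultMode is not one of the enabledModes');
  }
  const isHttpUrl = (u: string): boolean => /^https?:\/\/\S+$/.test(u);
  if (!isHttpUrl(c.apyURL)) {
    problems.push('apyURL does not look like an http(s) URL');
  }
  if (!isHttpUrl(c.htmlUrl)) {
    problems.push('htmlUrl does not look like an http(s) URL');
  }
  return problems;
}

console.log(checkDictionaryConfig(config));
```

In a real deployment only the four settings themselves matter; the check function is just a compact way of stating what step 2 asks for.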

Setting up paradigms

  1. Add a language module to dictionary/langs/
  2. Add a reference to the language file in dictionary/index.ts
  3. Localisation for labels is in strings/pos
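
To give a feel for what steps 1 and 2 involve, here is a purely illustrative sketch. The actual shape of the modules under dictionary/langs/ and of dictionary/index.ts is not documented here, so every name below (ParadigmTable, LanguageModule, demoLang, languages, generationRequests) is hypothetical, as are the placeholder language code and the tag order. The only thing taken from Apertium itself is the stream format for a lexical unit, ^lemma<tag1><tag2>$, which is what a generator would be asked to inflect.

```typescript
// Hypothetical shapes: the real language modules may look quite different.
interface Cell {
  label: string;  // key that would be localised (cf. strings/pos)
  tags: string[]; // Apertium tags contributed by this row or column
}

interface ParadigmTable {
  rows: Cell[];
  cols: Cell[];
}

interface LanguageModule {
  code: string;
  paradigms: { [pos: string]: ParadigmTable };
}

// Step 1 (sketch): a language module describing one noun paradigm.
const demoLang: LanguageModule = {
  code: 'xxx', // placeholder language code
  paradigms: {
    n: {
      rows: [
        { label: 'nom', tags: ['nom'] },
        { label: 'acc', tags: ['acc'] },
      ],
      cols: [
        { label: 'sg', tags: ['sg'] },
        { label: 'pl', tags: ['pl'] },
      ],
    },
  },
};

// Step 2 (sketch): a registry standing in for the reference in index.ts.
const languages: { [code: string]: LanguageModule } = {
  [demoLang.code]: demoLang,
};

// Builds one generation request per table cell. The tag order
// (POS, then column tags, then row tags) is an assumption and is
// language-specific in practice.
function generationRequests(lang: LanguageModule, lemma: string, pos: string): string[][] {
  const table = lang.paradigms[pos];
  if (table === undefined) {
    return [];
  }
  return table.rows.map((row) =>
    table.cols.map((col) => {
      const tags = [pos, ...col.tags, ...row.tags].map((t) => '<' + t + '>').join('');
      return '^' + lemma + tags + '$';
    })
  );
}

console.log(generationRequests(languages['xxx'], 'house', 'n'));
```

The point of the sketch is the division of labour: the language module declares which tag combinations make up a paradigm for each POS, and the labels are kept separate so they can be localised.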

Adding embeddings

Embeddings allow searches to return semantically similar results. This is optional.
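
As a conceptual illustration of what "semantically similar" means here, the sketch below ranks words by cosine similarity between toy vectors. It is not how APy serves embeddings (those are compiled into a transducer, as described in the hints in this section); the vectors, the cosine and nearest functions, and the three-word vocabulary are all made up for the example.

```typescript
type Vector = number[];

// Cosine similarity: 1 for vectors pointing the same way, 0 for orthogonal ones.
function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / Math.sqrt(normA * normB);
}

// Returns the k words whose vectors are closest to the query word's vector.
function nearest(query: string, vectors: { [word: string]: Vector }, k: number): string[] {
  const q = vectors[query];
  if (q === undefined) {
    return [];
  }
  return Object.keys(vectors)
    .filter((w) => w !== query)
    .map((w) => ({ word: w, score: cosine(q, vectors[w]) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.word);
}

// Toy three-dimensional vectors; real embeddings have many more dimensions.
const toy: { [word: string]: Vector } = {
  dog: [0.9, 0.1, 0.0],
  puppy: [0.8, 0.2, 0.1],
  car: [0.0, 0.1, 0.9],
};

console.log(nearest('dog', toy, 1)); // prints [ 'puppy' ]
```

A search for "dog" can thus surface "puppy" even though the two strings share no characters, which is what distinguishes this kind of fuzzy search from spelling-based matching.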

Fuller documentation is still to come. For now, some hints:

  • You'll need a way to generate embeddings, cf. the scripts in uum-eng.
  • You'll need to compile the embeddings into a transducer, cf. uum-eng.
  • You'll need to add a block to the modes.xml file, cf. uum-eng.
  • An APy version that supports embeddings should then be able to find the transducer and serve it.
  • You will probably want to recompile the embeddings transducer every time you update the bilingual dictionary. This should probably be automated in the Makefiles.