Apertium-html-tools/Paradigm dictionary

Paradigm dictionary mode in HTML-Tools offers bilingual dictionary functionality with paradigms, all based on Apertium data. This is especially useful for language communities where there are few existing resources for learners.

Overview of functionality

Dictionary lookup

  • Searches both languages, displays in direction of chosen pair.
  • Based on Apertium bilingual dictionaries.
  • Uses Apertium monolingual analysers, so any inflected form of a word can be searched; the results show which form the search term is. (Might not support syncretism?)
  • Semantic fuzzy search based on word embeddings.

Paradigm display

  • Generates paradigm per word based on POS.
  • Can select different modes for paradigm display, e.g., learner vs. linguist. Headings are also localised (or localisable).
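
As a rough illustration of generating a paradigm per POS, the sketch below builds one generator request per paradigm cell in Apertium stream format (`^lemma<tag><tag>$`). The `Cell` type, the `layouts` table, and the function names are hypothetical and are not taken from HTML-Tools; the tags follow common Apertium conventions for English, and the real tag sets depend on the language module.

```typescript
// Hypothetical types and names for illustration; not the HTML-Tools API.
interface Cell {
  label: string; // heading key for this paradigm cell
  tags: string[]; // tags sent to the generator for this cell
}

// Assumed per-POS layouts; a real language module defines its own.
const layouts: { [pos: string]: Cell[] } = {
  n: [
    { label: 'sg', tags: ['n', 'sg'] },
    { label: 'pl', tags: ['n', 'pl'] },
  ],
  vblex: [
    { label: 'inf', tags: ['vblex', 'inf'] },
    { label: 'past', tags: ['vblex', 'past'] },
    { label: 'pp', tags: ['vblex', 'pp'] },
  ],
};

// Format a lemma and its tags as an Apertium stream-format lexical unit.
function lexicalUnit(lemma: string, tags: string[]): string {
  return '^' + lemma + tags.map((t) => '<' + t + '>').join('') + '$';
}

// Build one generator request per cell of the layout for the given POS.
// An unknown POS yields an empty paradigm.
function paradigmRequests(lemma: string, pos: string): { label: string; unit: string }[] {
  const cells = layouts[pos] || [];
  return cells.map((cell) => ({ label: cell.label, unit: lexicalUnit(lemma, cell.tags) }));
}

for (const row of paradigmRequests('house', 'n')) {
  console.log(row.label + '\t' + row.unit);
}
```

Each request string would then be passed to a morphological generator to obtain the surface form for that cell. Keeping the layout as data rather than code is one way to support several display modes over the same lemma.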

Planned features

Example

Poster

A poster showing some of the features as of 2025 Q3.

Installation

Language data, APy, and HTML-Tools requirements

  1. Get APy and HTML-Tools running
    • You'll need a version of APy that supports billookup and bilsearch modes. This was previously the embeddings branch; as of 2025-10-14 it is in the master branch.
    • You'll need a version of HTML-Tools that supports paradigm dictionary mode, currently the urum branch.
  2. Make sure your config.ts (in HTML-Tools) is updated to include the following:
    • Mode.Dictionary should be in the enabledModes list (and can be set as the defaultMode)
    • apyURL and htmlUrl will need to be updated to match your setup
  3. Ensure that at least one language pair has a billookup mode and bilsearch mode in at least one direction (ideally both):
    • modes.xml will need blocks like those in uum-eng.
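
The config.ts settings from step 2 might look roughly like the sketch below. To keep it self-contained, `Mode` and `ConfigSketch` are local stand-ins rather than the real HTML-Tools types, the URLs are placeholders for a local setup, and `configProblems` is a hypothetical helper that is not part of HTML-Tools. The container type of `enabledModes` in the real config may differ from the plain array assumed here.

```typescript
// Local stand-ins so the sketch compiles on its own; in HTML-Tools the real
// Mode enum and config type come from the project source.
enum Mode {
  Translation = 'translation',
  Analysis = 'analysis',
  Dictionary = 'dictionary',
}

interface ConfigSketch {
  apyURL: string;
  htmlUrl: string;
  defaultMode: Mode;
  enabledModes: Mode[]; // assumed to be an array here
}

// Placeholder URLs: replace with the addresses of your own APy and HTML-Tools.
const config: ConfigSketch = {
  apyURL: 'http://localhost:2737/',
  htmlUrl: 'http://localhost:8000/',
  defaultMode: Mode.Dictionary,
  enabledModes: [Mode.Translation, Mode.Dictionary],
};

// Hypothetical sanity check for the settings listed in step 2.
function configProblems(c: ConfigSketch): string[] {
  const problems: string[] = [];
  if (c.enabledModes.indexOf(Mode.Dictionary) === -1) {
    problems.push('Mode.Dictionary is not in enabledModes');
  }
  if (c.enabledModes.indexOf(c.defaultMode) === -1) {
    problems.push('defaultMode is not in enabledModes');
  }
  if (c.apyURL === '' || c.htmlUrl === '') {
    problems.push('apyURL and htmlUrl must both be set');
  }
  return problems;
}

console.log(configProblems(config));
```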

Setting up paradigms

  1. Add a language module to dictionary/langs/
  2. Add a reference to the language file in dictionary/index.ts
  3. Localisation for labels is in strings/pos
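
The three steps above can be pictured with the sketch below. Every name in it is hypothetical: the real module shape is whatever the existing files under dictionary/langs/ use, "xyz" is a placeholder language code, and the fallback to English labels is an assumption made for the example.

```typescript
// Every name here is hypothetical; check dictionary/langs/ for the real shape.
type DisplayMode = 'learner' | 'linguist';

interface LanguageModule {
  code: string;
  // For each POS tag, the heading keys to show in each display mode.
  headings: { [pos: string]: { learner: string[]; linguist: string[] } };
}

// Step 1: stand-in for one file under dictionary/langs/.
const xyz: LanguageModule = {
  code: 'xyz',
  headings: {
    n: { learner: ['sg', 'pl'], linguist: ['sg', 'pl', 'nom', 'acc'] },
  },
};

// Step 2: stand-in for the reference added in dictionary/index.ts.
const languages: { [code: string]: LanguageModule } = { xyz: xyz };

// Step 3: stand-in for strings/pos, mapping heading keys to labels per locale.
const posStrings: { [locale: string]: { [key: string]: string } } = {
  eng: { sg: 'Singular', pl: 'Plural', nom: 'Nominative', acc: 'Accusative' },
  deu: { sg: 'Singular', pl: 'Plural' },
};

// Look up a label, falling back to English and then to the raw key.
function localise(key: string, locale: string): string {
  const table = posStrings[locale] || {};
  return table[key] || posStrings['eng'][key] || key;
}

// Localised headings for one language, POS, and display mode.
function headingsFor(code: string, pos: string, mode: DisplayMode, locale: string): string[] {
  const lang = languages[code];
  if (!lang || !lang.headings[pos]) return [];
  return lang.headings[pos][mode].map((key) => localise(key, locale));
}

console.log(headingsFor('xyz', 'n', 'linguist', 'deu'));
```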

Adding embeddings

Embeddings allow searches to return semantically similar results. This is optional.

Full documentation is not yet available. For now, some hints:

  • You'll need a way to generate embeddings, cf. the scripts in uum-eng
  • You'll need to compile embeddings into a transducer, cf. uum-eng
  • You'll need to add a block to the modes.xml file, cf. uum-eng
  • An APy version that supports embeddings should then be able to find it and serve it.
  • You will probably want to recompile the embeddings transducer every time you update the bilingual dictionary. This should probably be implemented in Makefiles.
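
To show the general idea behind embedding-based search, the sketch below ranks words by cosine similarity between their vectors. It is not the uum-eng pipeline, which compiles embeddings into a transducer; the vectors are made-up toy values and the function names are hypothetical.

```typescript
interface Embedding {
  word: string;
  vector: number[];
}

// Cosine similarity of two equal-length vectors; 0 if either has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k words most similar to the query, excluding the query itself.
function nearest(query: string, entries: Embedding[], k: number): string[] {
  const matches = entries.filter((e) => e.word === query);
  if (matches.length === 0) return [];
  const queryVector = matches[0].vector;
  const scored = entries
    .filter((e) => e.word !== query)
    .map((e) => ({ word: e.word, score: cosine(queryVector, e.vector) }));
  scored.sort((x, y) => y.score - x.score);
  return scored.slice(0, k).map((s) => s.word);
}

// Toy vectors, invented for this example only.
const toy: Embedding[] = [
  { word: 'dog', vector: [0.9, 0.1, 0.0] },
  { word: 'puppy', vector: [0.8, 0.2, 0.1] },
  { word: 'cat', vector: [0.7, 0.3, 0.0] },
  { word: 'car', vector: [0.0, 0.1, 0.9] },
];

console.log(nearest('dog', toy, 2));
```

Because the neighbours depend on which words are in the dictionary, a precomputed structure like this goes stale when entries are added or removed, which is why recompiling after each bilingual dictionary update is advisable.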