Difference between revisions of "Apertium-html-tools/Paradigm dictionary"

From Apertium
 
(2 intermediate revisions by the same user not shown)
Line 6: Line 6:
* Searches both languages, displays in direction of chosen pair.
* Searches both languages, displays in direction of chosen pair.
* Based on Apertium bilingual dictionaries.
* Based on Apertium bilingual dictionaries.
* Uses Apertium monolingual analysers: can search any morphological form of a word, will show in results which form the searched form is. (Might not support synchronicity?)
* Uses Apertium monolingual analysers: can search any morphological form of a word, will show in results which form the searched form is. (Has [https://github.com/apertium/apertium-html-tools/issues/521 a bug with syncretic forms].)
* Semantic fuzzy search based on word embeddings.
* Semantic fuzzy search based on word embeddings.


Line 15: Line 15:
=== Planned features ===
=== Planned features ===
* There's a [https://github.com/apertium/apertium-html-tools/issues/513 meta-issue on GitHub of planned features].
* There's a [https://github.com/apertium/apertium-html-tools/issues/513 meta-issue on GitHub of planned features].

=== Example ===
* There is a live example available to try at [https://urum.apertium.org/ https://urum.apertium.org/].


== Poster ==
== Poster ==
Line 25: Line 28:
#* You'll need a version of APy that supports billookup and bilsearch modes, <s>currently the [https://github.com/apertium/apertium-apy/tree/embeddings embeddings branch]</s> in master branch as of 2025-10-14.
#* You'll need a version of APy that supports billookup and bilsearch modes, <s>currently the [https://github.com/apertium/apertium-apy/tree/embeddings embeddings branch]</s> in master branch as of 2025-10-14.
#* You'll need a version of HTML-Tools that supports paradigm dictionary mode, currently the [https://github.com/apertium/apertium-html-tools/tree/urum urum branch].
#* You'll need a version of HTML-Tools that supports paradigm dictionary mode, currently the [https://github.com/apertium/apertium-html-tools/tree/urum urum branch].
#* Make sure your <tt>config.ts</tt> is updated to include the following:
# Make sure your <tt>config.ts</tt> (in HTML-Tools) is updated to include the following:
#** <tt>Mode.Dictionary</tt> should be in <tt>enabledModes</tt> list (and can be set as the <tt>defaultMode</tt>)
#* <tt>Mode.Dictionary</tt> should be in <tt>enabledModes</tt> list (and can be set as the <tt>defaultMode</tt>)
#** <tt>apyURL</tt> and <tt>htmlUrl</tt> will need to be updated to match your setup
#* <tt>apyURL</tt> and <tt>htmlUrl</tt> will need to be updated to match your setup
# Ensure that at least one language pair has a <tt>billookup</tt> mode and <tt>bilsearch</tt> mode in at least one direction (ideally both):
# Ensure that at least one language pair has a <tt>billookup</tt> mode and <tt>bilsearch</tt> mode in at least one direction (ideally both):
#* <tt>modes.xml</tt> will need blocks like in [https://github.com/apertium/apertium-uum-eng/blob/01896a5ce3fc58e218c27fe4b97167069683a4a7/modes.xml#L45-L65 uum-eng].
#* <tt>modes.xml</tt> will need blocks like in [https://github.com/apertium/apertium-uum-eng/blob/01896a5ce3fc58e218c27fe4b97167069683a4a7/modes.xml#L45-L65 uum-eng].

Latest revision as of 16:22, 16 October 2025

Paradigm dictionary mode in HTML-Tools offers bilingual dictionary functionality with paradigms, all based on Apertium data. This is especially useful for language communities where there are few existing resources for learners.

Overview of functionality

Dictionary lookup

  • Searches both languages, displays in direction of chosen pair.
  • Based on Apertium bilingual dictionaries.
  • Uses Apertium monolingual analysers: you can search for any morphological form of a word, and the results show which form the searched word is. (Has a bug with syncretic forms: https://github.com/apertium/apertium-html-tools/issues/521.)
  • Semantic fuzzy search based on word embeddings.

Paradigm display

  • Generates paradigm per word based on POS.
  • Can select different modes for paradigm display, e.g., learner vs. linguist. Headings are also localised (or localisable).
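
As a rough illustration of how a per-POS paradigm could be derived from Apertium data, the sketch below treats each cell of a paradigm table as a tag sequence and turns it into a lexical unit in Apertium stream format (^lemma<tag>...$), which a morphological generator could then inflect. The ParadigmCell type, the nounParadigm table with its tags, and the generatorInput function are hypothetical names chosen for illustration; they are not taken from the HTML-Tools code.

```typescript
// Hypothetical shape of one cell in a paradigm table (an assumption,
// not the HTML-Tools data model).
interface ParadigmCell {
  label: string; // key that a localised heading could be looked up by
  tags: string[]; // morphological tags that follow the POS tag
}

// Hypothetical noun paradigm; the tag names are illustrative only.
const nounParadigm: ParadigmCell[] = [
  { label: 'sg.nom', tags: ['sg', 'nom'] },
  { label: 'pl.nom', tags: ['pl', 'nom'] },
];

// Build a lexical unit in Apertium stream format: ^lemma<pos><tag>...$
function generatorInput(lemma: string, pos: string, cell: ParadigmCell): string {
  const tagString = [pos, ...cell.tags].map((t) => `<${t}>`).join('');
  return `^${lemma}${tagString}$`;
}

// One generator input per paradigm cell for the lemma "house".
const inputs = nounParadigm.map((cell) => generatorInput('house', 'n', cell));
```

Each string in inputs would be sent to a generator for the language in question; choosing a different table per display mode (learner vs. linguist) would change which cells are requested.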

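Planned features

  • There's a meta-issue on GitHub of planned features: https://github.com/apertium/apertium-html-tools/issues/513

Example

  • There is a live example available to try at https://urum.apertium.org/.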

Poster

A poster showing some of the features as of 2025 Q3.

Installation

Language data, APy, and HTML-Tools requirements

  1. Get APy and HTML-Tools running.
    • You'll need a version of APy that supports billookup and bilsearch modes. These are in the master branch as of 2025-10-14 (previously the embeddings branch, https://github.com/apertium/apertium-apy/tree/embeddings).
    • You'll need a version of HTML-Tools that supports paradigm dictionary mode, currently the urum branch (https://github.com/apertium/apertium-html-tools/tree/urum).
  2. Make sure your config.ts (in HTML-Tools) is updated to include the following:
    • Mode.Dictionary should be in the enabledModes list (and can be set as the defaultMode).
    • apyURL and htmlUrl will need to be updated to match your setup.
  3. Ensure that at least one language pair has a billookup mode and bilsearch mode in at least one direction (ideally both):
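    • modes.xml will need blocks like those in uum-eng (https://github.com/apertium/apertium-uum-eng/blob/01896a5ce3fc58e218c27fe4b97167069683a4a7/modes.xml#L45-L65).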
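
For step 2, the relevant part of config.ts might look roughly like the sketch below. The Mode object and the ConfigSketch type are local stand-ins declared only so the snippet is self-contained; in HTML-Tools these come from the project's own types, the real config has more fields, and the container type of enabledModes may differ. Both URLs are placeholders for a local setup.

```typescript
// Local stand-in for the Mode type in HTML-Tools (an assumption; use the
// project's own import in a real config.ts).
const Mode = {
  Translation: 'translation',
  Dictionary: 'dictionary',
} as const;
type Mode = (typeof Mode)[keyof typeof Mode];

// Only the fields mentioned in the steps above; the real config has more.
interface ConfigSketch {
  apyURL: string;
  htmlUrl: string;
  defaultMode: Mode;
  enabledModes: Mode[];
}

const config: ConfigSketch = {
  apyURL: 'http://localhost:2737', // placeholder: wherever your APy is served
  htmlUrl: 'http://localhost:8000/', // placeholder: wherever HTML-Tools is served
  defaultMode: Mode.Dictionary, // optional: open in dictionary mode
  enabledModes: [Mode.Translation, Mode.Dictionary], // must include Dictionary
};
```

Setting defaultMode to Mode.Dictionary is optional; the requirement from the steps above is only that Mode.Dictionary appears in enabledModes.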

Setting up paradigms

  1. Add a language module to dictionary/langs/
  2. Add a reference to the language file in dictionary/index.ts
  3. Localisation for labels is in strings/pos

Adding embeddings

Embeddings allow searches to return semantically similar results. This is optional.

Full documentation is not yet available. For now, some hints:

  • You'll need a way to generate embeddings, cf. the scripts in uum-eng
  • You'll need to compile embeddings into a transducer, cf. uum-eng
  • You'll need to add a block to the modes.xml file, cf. uum-eng
  • An APy version that supports embeddings should then be able to find it and serve it.
  • You will probably want to recompile the embeddings transducer every time you update the bilingual dictionary. This should probably be implemented in Makefiles.
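
For intuition only, the sketch below shows the general idea behind embedding-based fuzzy search: each word is a vector, and candidates are ranked by cosine similarity to the query word. This is not how APy serves embeddings (the hints above describe compiling them into a transducer); the toy vectors and the Entry, cosine, and nearest names are invented for illustration.

```typescript
// Toy data model (an assumption): a word with its embedding vector.
interface Entry {
  word: string;
  vector: number[];
}

// Made-up three-dimensional vectors; real embeddings are much larger.
const toy: Entry[] = [
  { word: 'dog', vector: [1, 0.9, 0] },
  { word: 'puppy', vector: [0.9, 1, 0.1] },
  { word: 'car', vector: [0, 0.1, 1] },
];

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

// Cosine similarity; assumes neither vector is all zeros.
function cosine(a: number[], b: number[]): number {
  return dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));
}

// Return the k words most similar to the query word, best first.
function nearest(query: string, entries: Entry[], k: number): string[] {
  const q = entries.find((e) => e.word === query);
  if (q === undefined) return [];
  return entries
    .filter((e) => e.word !== query)
    .map((e) => ({ word: e.word, score: cosine(q.vector, e.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((e) => e.word);
}

const similarToDog = nearest('dog', toy, 1);
```

With these toy vectors, "puppy" ranks above "car" for the query "dog", which is the kind of semantically similar result the embeddings are meant to surface.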