Difference between revisions of "Apertium-html-tools/Paradigm dictionary"

(8 intermediate revisions by the same user not shown)
Line 1: Line 1:
Paradigm dictionary mode in HTML-Tools offers bilingual dictionary functionality with paradigms, all based on Apertium data. This is especially useful for language communities where there are few existing resources for learners.


== Overview of functionalilty ==
== Overview of functionality ==


=== Dictionary lookup ===
* Searches both languages, displays in direction of chosen pair.
* Based on Apertium bilingual dictionaries.
* Uses Apertium monolingual analysers: can search any morphological form of a word, and the results show which form was matched. (Has [https://github.com/apertium/apertium-html-tools/issues/521 a bug with syncretic forms].)
* Semantic fuzzy search based on word embeddings.


Line 11: Line 12:
* Generates a paradigm per word based on POS.
* Can select different modes for paradigm display, e.g., learner vs. linguist. Headings are also localised (or localisable).

=== Planned features ===
* There's a [https://github.com/apertium/apertium-html-tools/issues/513 meta-issue on GitHub of planned features].

=== Example ===
* There is a live example available to try at [https://urum.apertium.org/ https://urum.apertium.org/].


== Poster ==
Line 17: Line 24:


== Installation ==
=== Language data, APy, and HTML-Tools requirements ===
# Get [[APy]] and [[HTML-Tools]] running
#* You'll need a version of APy that supports <tt>billookup</tt> and <tt>bilsearch</tt> modes: the master branch as of 2025-10-14 (formerly the [https://github.com/apertium/apertium-apy/tree/embeddings embeddings branch]).
#* You'll need a version of HTML-Tools that supports paradigm dictionary mode, currently the [https://github.com/apertium/apertium-html-tools/tree/urum urum branch].
# Make sure your <tt>config.ts</tt> (in HTML-Tools) is updated to include the following:
#* <tt>Mode.Dictionary</tt> should be in <tt>enabledModes</tt> list (and can be set as the <tt>defaultMode</tt>)
#* <tt>apyURL</tt> and <tt>htmlUrl</tt> will need to be updated to match your setup
# Ensure that at least one language pair has a <tt>billookup</tt> mode and <tt>bilsearch</tt> mode in at least one direction (ideally both):
#* <tt>modes.xml</tt> will need blocks like in [https://github.com/apertium/apertium-uum-eng/blob/01896a5ce3fc58e218c27fe4b97167069683a4a7/modes.xml#L45-L65 uum-eng].
Line 36: Line 47:
* You'll need to add a block to the <tt>modes.xml</tt> file, cf. [https://github.com/apertium/apertium-uum-eng/blob/01896a5ce3fc58e218c27fe4b97167069683a4a7/modes.xml#L134-L140 uum-eng]
* An APy version that supports embeddings should then be able to find it and serve it.
* You will probably want to recompile the embeddings transducer every time you update the bilingual dictionary. This should probably be implemented in Makefiles.

Latest revision as of 16:22, 16 October 2025

Paradigm dictionary mode in HTML-Tools offers bilingual dictionary functionality with paradigms, all based on Apertium data. This is especially useful for language communities where there are few existing resources for learners.

Overview of functionality

Dictionary lookup

  • Searches both languages, displays in direction of chosen pair.
  • Based on Apertium bilingual dictionaries.
  • Uses Apertium monolingual analysers: can search any morphological form of a word, and the results show which form was matched. (Has a bug with syncretic forms.)
  • Semantic fuzzy search based on word embeddings.

Paradigm display

  • Generates a paradigm per word based on POS.
  • Can select different modes for paradigm display, e.g., learner vs. linguist. Headings are also localised (or localisable).

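Planned features

  • There's a meta-issue on GitHub of planned features: https://github.com/apertium/apertium-html-tools/issues/513

Example

  • There is a live example available to try at https://urum.apertium.org/.

Poster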

A poster showing some of the features as of 2025 Q3.

Installation

Language data, APy, and HTML-Tools requirements

  1. Get APy and HTML-Tools running
    • You'll need a version of APy that supports billookup and bilsearch modes: the master branch as of 2025-10-14 (formerly the embeddings branch).
    • You'll need a version of HTML-Tools that supports paradigm dictionary mode, currently the urum branch.
  2. Make sure your config.ts (in HTML-Tools) is updated to include the following:
    • Mode.Dictionary should be in enabledModes list (and can be set as the defaultMode)
    • apyURL and htmlUrl will need to be updated to match your setup
  3. Ensure that at least one language pair has a billookup mode and bilsearch mode in at least one direction (ideally both):
    • modes.xml will need blocks like in uum-eng.
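
The config.ts step above can be sketched roughly as follows. This is a self-contained illustration, not the actual HTML-Tools file: the Mode enum, the DictionaryConfig interface, the array type for enabledModes, the URLs, and the checkDictionaryConfig helper are all stand-ins invented here. Only the names Mode.Dictionary, enabledModes, defaultMode, apyURL and htmlUrl come from the steps above, so adapt the shape to whatever your checkout of config.ts actually uses.

```typescript
// Hypothetical stand-ins for the real HTML-Tools types (assumption).
enum Mode {
  Translation = 'translation',
  Dictionary = 'dictionary',
}

interface DictionaryConfig {
  htmlUrl: string; // where HTML-Tools itself is served
  apyURL: string; // where your APy instance is reachable
  defaultMode: Mode;
  enabledModes: Mode[]; // assumed to be an array for this sketch
}

// Placeholder values: replace both URLs with the ones for your setup.
const config: DictionaryConfig = {
  htmlUrl: 'http://localhost:8000/',
  apyURL: 'http://localhost:2737/',
  defaultMode: Mode.Dictionary,
  enabledModes: [Mode.Translation, Mode.Dictionary],
};

// Hypothetical helper: lists the problems that would stop dictionary mode
// from appearing, following the checklist in the steps above.
function checkDictionaryConfig(c: DictionaryConfig): string[] {
  const problems: string[] = [];
  if (c.enabledModes.indexOf(Mode.Dictionary) === -1) {
    problems.push('Mode.Dictionary is missing from enabledModes');
  }
  if (c.enabledModes.indexOf(c.defaultMode) === -1) {
    problems.push('defaultMode is not one of the enabledModes');
  }
  if (!/^https?:\/\//.test(c.apyURL)) {
    problems.push('apyURL does not look like an http(s) URL');
  }
  if (!/^https?:\/\//.test(c.htmlUrl)) {
    problems.push('htmlUrl does not look like an http(s) URL');
  }
  return problems;
}
```

Calling checkDictionaryConfig(config) on the sample above returns an empty list; dropping Mode.Dictionary from enabledModes makes it report the first two problems.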

Setting up paradigms

  1. Add a language module to dictionary/langs/
  2. Add a reference to the language file in dictionary/index.ts
  3. Localisation for labels is in strings/pos
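
To give a feel for what a language module and its registration might contain, here is a purely hypothetical sketch. The real module format under dictionary/langs/ is not described on this page, so every name below (ParadigmCell, LanguageModule, languages, generationRequests, the language code xxx, and the tag choices) is an assumption made for illustration. The one thing borrowed from Apertium itself is the stream-format convention of writing a form to generate as ^lemma<tag><tag>$.

```typescript
// Hypothetical shape: one paradigm cell is a label key plus the Apertium
// tags that follow the POS tag for that form.
interface ParadigmCell {
  label: string; // key that would be localised via strings/pos
  tags: string[];
}

// Hypothetical language module: paradigm cells grouped by POS tag.
interface LanguageModule {
  code: string;
  paradigms: { [pos: string]: ParadigmCell[] };
}

// What a file in dictionary/langs/ might export (illustrative tags only).
const exampleLang: LanguageModule = {
  code: 'xxx',
  paradigms: {
    n: [
      { label: 'singular', tags: ['sg'] },
      { label: 'plural', tags: ['pl'] },
    ],
  },
};

// What dictionary/index.ts might do: register modules by language code.
const languages: { [code: string]: LanguageModule } = {
  [exampleLang.code]: exampleLang,
};

// Builds one generator request per paradigm cell for a lemma and POS.
function generationRequests(
  lang: LanguageModule,
  lemma: string,
  pos: string,
): { label: string; request: string }[] {
  const cells = lang.paradigms[pos] || [];
  return cells.map((cell) => ({
    label: cell.label,
    request: '^' + lemma + '<' + pos + '>' + cell.tags.map((t) => '<' + t + '>').join('') + '$',
  }));
}
```

With this sketch, generationRequests(languages['xxx'], 'house', 'n') yields the requests ^house<n><sg>$ and ^house<n><pl>$, one per cell. A POS with no registered paradigm simply yields an empty table.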

Adding embeddings

Embeddings allow searches to return semantically similar results. This is optional.

Fuller documentation is forthcoming. For now, some hints:

  • You'll need a way to generate embeddings, cf. the scripts in uum-eng
  • You'll need to compile embeddings into a transducer, cf. uum-eng
  • You'll need to add a block to the modes.xml file, cf. uum-eng
  • An APy version that supports embeddings should then be able to find it and serve it.
  • You will probably want to recompile the embeddings transducer every time you update the bilingual dictionary. This should probably be implemented in Makefiles.
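
As a rough illustration of what "semantically similar" means for the search described in this section, the sketch below ranks entries by cosine similarity between embedding vectors. It is a generic technique shown with made-up two-dimensional vectors, not a description of how APy or the compiled embeddings transducer works internally; the names Embedding, cosineSimilarity and nearestEntries are invented for this example.

```typescript
type Embedding = number[];

// Cosine similarity: dot product divided by the product of vector lengths.
function cosineSimilarity(a: Embedding, b: Embedding): number {
  if (a.length !== b.length) {
    throw new Error('embedding dimensions differ');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // a zero vector is treated as similar to nothing
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the k lemmas whose vectors are most similar to the query vector.
function nearestEntries(
  query: Embedding,
  entries: { lemma: string; vector: Embedding }[],
  k: number,
): string[] {
  return entries
    .map((e) => ({ lemma: e.lemma, score: cosineSimilarity(query, e.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((e) => e.lemma);
}

// Toy data: the vectors are invented so that "house" and "home" point in
// nearly the same direction while "river" does not.
const toyEntries = [
  { lemma: 'river', vector: [0, 1] },
  { lemma: 'home', vector: [0.9, 0.2] },
  { lemma: 'house', vector: [1, 0.1] },
];
```

Querying with the vector for "house" and k = 2 returns "house" followed by "home", which is the kind of near-synonym hit a semantic fuzzy search is meant to surface.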