Apertium-html-tools/Paradigm dictionary
Paradigm dictionary mode in HTML-Tools offers bilingual dictionary functionality with paradigms, all based on Apertium data. This is especially useful for language communities where there are few existing resources for learners.
Overview of functionality
Dictionary lookup
- Searches both languages and displays results in the direction of the chosen pair.
- Based on Apertium bilingual dictionaries.
- Uses Apertium monolingual analysers, so any morphological form of a word can be searched; the results show which form the searched form is. (There is a known bug with syncretic forms.)
- Semantic fuzzy search based on word embeddings.
Paradigm display
- Generates a paradigm for each word based on its part of speech (POS).
- Different modes can be selected for paradigm display, e.g., learner vs. linguist. Headings are also localised (or localisable).
Planned features
- There's a meta-issue on GitHub of planned features.
Example
- There is a live example available to try at https://urum.apertium.org/.
Poster
Installation
Language data, APy, and HTML-Tools requirements
- Get APy and HTML-Tools running:
- You'll need a version of APy that supports billookup and bilsearch modes; as of 2025-10-14 this is in the master branch (previously the embeddings branch).
- You'll need a version of HTML-Tools that supports paradigm dictionary mode, currently the urum branch.
- Make sure your config.ts (in HTML-Tools) is updated to include the following:
- Mode.Dictionary should be in the enabledModes list (and can be set as the defaultMode)
- apyURL and htmlUrl will need to be updated to match your setup
- Ensure that at least one language pair has a billookup mode and a bilsearch mode in at least one direction (ideally both):
- modes.xml will need blocks like those in uum-eng.
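The config.ts changes listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Mode object and ConfigSketch type are local stand-ins declared so the snippet is self-contained (in HTML-Tools they would come from the project's own types, and the container used for enabledModes may differ), the URLs are placeholders, and checkDictionaryConfig is a hypothetical helper that is not part of HTML-Tools.

```typescript
// Local stand-ins, declared here only so the sketch runs on its own.
// Only Mode.Dictionary, enabledModes, defaultMode, apyURL and htmlUrl
// are named on this page; everything else is an assumption.
const Mode = {
  Translation: 'translation',
  Dictionary: 'dictionary',
} as const;
type Mode = (typeof Mode)[keyof typeof Mode];

type ConfigSketch = {
  htmlUrl: string;
  apyURL: string;
  defaultMode: Mode;
  enabledModes: Mode[];
};

const config: ConfigSketch = {
  htmlUrl: 'https://dictionary.example.org/', // placeholder: where HTML-Tools is served
  apyURL: 'https://dictionary.example.org/apy', // placeholder: where APy is served
  defaultMode: Mode.Dictionary, // optional: open the dictionary by default
  enabledModes: [Mode.Translation, Mode.Dictionary], // must include Mode.Dictionary
};

// Hypothetical sanity check (not part of HTML-Tools): returns a list of
// human-readable problems, empty if the settings above look consistent.
function checkDictionaryConfig(c: ConfigSketch): string[] {
  const problems: string[] = [];
  if (!c.enabledModes.includes(Mode.Dictionary)) {
    problems.push('Mode.Dictionary is missing from enabledModes');
  }
  if (!c.enabledModes.includes(c.defaultMode)) {
    problems.push('defaultMode is not one of enabledModes');
  }
  for (const key of ['apyURL', 'htmlUrl'] as const) {
    // Assumption of this sketch: both URLs are absolute http(s) URLs.
    if (!/^https?:\/\//.test(c[key])) {
      problems.push(`${key} should be an absolute http(s) URL`);
    }
  }
  return problems;
}

console.log(checkDictionaryConfig(config));
```

In a real checkout only the four settings themselves matter; the check function is there just to make the requirements explicit.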
Setting up paradigms
- Add a language module to dictionary/langs/
- Add a reference to the language file in dictionary/index.ts
- Localisation for labels is in strings/pos
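To give a feel for what a language module might contain, here is a minimal sketch. It does not reproduce the actual interface used in the urum branch: the ParadigmCell and ParadigmTable types, the labelKey field, and the generationRequests helper are all hypothetical. The only parts taken from Apertium itself are the stream-format convention of writing a form to generate as ^lemma<tag><tag>$ and the common tag names.

```typescript
// Hypothetical shape of a per-language paradigm definition; the real
// modules under dictionary/langs/ may be organised quite differently.
type ParadigmCell = {
  labelKey: string; // assumed to be looked up in the localised strings (cf. strings/pos)
  tags: string; // Apertium tags for this cell, e.g. '<n><sg><nom>'
};

type ParadigmTable = {
  pos: string; // part-of-speech tag that selects this table
  cells: ParadigmCell[];
};

// Illustrative noun table with four cells (two numbers, two cases).
const nounParadigm: ParadigmTable = {
  pos: 'n',
  cells: [
    { labelKey: 'sg.nom', tags: '<n><sg><nom>' },
    { labelKey: 'sg.acc', tags: '<n><sg><acc>' },
    { labelKey: 'pl.nom', tags: '<n><pl><nom>' },
    { labelKey: 'pl.acc', tags: '<n><pl><acc>' },
  ],
};

// Builds one generator input per cell in Apertium stream format
// (^lemma<tags>$), which a morphological generator can turn into
// surface forms.
function generationRequests(lemma: string, table: ParadigmTable): string[] {
  return table.cells.map((cell) => `^${lemma}${cell.tags}$`);
}

console.log(generationRequests('lemma', nounParadigm));
```

The idea is that the paradigm is chosen by POS, each cell is generated from the lemma plus a tag sequence, and the cell labels are localised separately.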
Adding embeddings
Embeddings allow searches to return semantically similar results. This is optional.
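As a conceptual illustration of what "semantically similar" means here, the sketch below ranks words by cosine similarity between toy vectors. It is not how APy serves embeddings (the hints below describe compiling them into a transducer); the vectors, the words, and the function names are made up for the example.

```typescript
type Embedding = number[];

// Cosine similarity between two vectors of equal length;
// returns 0 if either vector has zero length.
function cosine(a: Embedding, b: Embedding): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Toy three-dimensional vectors; real embeddings have far more dimensions.
const toyVectors: Record<string, Embedding> = {
  house: [0.9, 0.1, 0.0],
  home: [0.8, 0.2, 0.1],
  river: [0.0, 0.9, 0.4],
};

// Returns the k words whose vectors are closest to the query word's vector,
// or an empty list if the query word has no vector.
function nearest(query: string, vectors: Record<string, Embedding>, k: number): string[] {
  const q = vectors[query];
  if (q === undefined) return [];
  return Object.entries(vectors)
    .filter(([word]) => word !== query)
    .map(([word, vec]) => ({ word, score: cosine(q, vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.word);
}

console.log(nearest('house', toyVectors, 1)); // [ 'home' ]
```

A search for one word can thus also surface dictionary entries whose headwords sit nearby in the embedding space, even if the strings do not match.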
Full documentation is not yet available. For now, some hints:
- You'll need a way to generate embeddings, cf. the scripts in uum-eng
- You'll need to compile embeddings into a transducer, cf. uum-eng
- You'll need to add a block to the modes.xml file, cf. uum-eng
- An APy version that supports embeddings should then be able to find it and serve it.
- You will probably want to recompile the embeddings transducer every time you update the bilingual dictionary. This should probably be implemented in Makefiles.