Apertium has moved from SourceForge to GitHub.
If you have any questions, please come and talk to us on #apertium on irc.freenode.net or contact the GitHub migration team.

Dictionary maintenance

Latest revision as of 16:31, 26 September 2016


Including parts of dictionaries

The problem is that some parts that are common to the dictionaries in a pair (for example, the symbol definitions) are not kept in a single file but duplicated across several.

Solutions

  • Use XInclude with xmllint to preprocess two XML files into a single .dix file, then validate and compile that .dix file (the cy-en and en-af pairs do this).
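In those pairs the substitution is done by xmllint, whose --xinclude option replaces each xi:include element with the referenced file. The sketch below shows the same substitution using Python's standard xml.etree.ElementInclude module, so the effect is easy to see without touching the file system. The file names and contents (sdefs.xml, en.dix.xml) and the helper functions are hypothetical; the files are held in memory and served by a custom loader.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical file contents: shared symbol definitions, and a monodix
# that pulls them in with an XInclude directive instead of copying them.
FILES = {
    "sdefs.xml": """<sdefs>
  <sdef n="n"/>
  <sdef n="sg"/>
  <sdef n="pl"/>
</sdefs>""",
    "en.dix.xml": """<dictionary xmlns:xi="http://www.w3.org/2001/XInclude">
  <alphabet>abcdefghijklmnopqrstuvwxyz</alphabet>
  <xi:include href="sdefs.xml"/>
  <section id="main" type="standard"/>
</dictionary>""",
}


def loader(href, parse, encoding=None):
    """Serve included files from FILES instead of reading them from disk."""
    text = FILES[href]
    if parse == "xml":
        return ET.fromstring(text)  # parsed XML includes must be Elements
    return text  # parse="text" includes are returned as plain strings


def expand(name):
    """Return the document `name` with all its XIncludes resolved."""
    root = ET.fromstring(FILES[name])
    ElementInclude.include(root, loader=loader)  # replaces xi:include in place
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(expand("en.dix.xml"))
```

The expanded output is an ordinary .dix document with the <sdefs> block inlined, which can then be validated and compiled as usual.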

Different registers/varieties/standards

Some pairs, e.g. those involving Catalan or Portuguese, support generating a particular standard of a language (e.g. Valencian, Brazilian Portuguese). The way this is done may need to be reviewed.

Metadix

Main article: Metadix

Lextor

Keeping monodix updated

This section is outdated; see Languages and Automatically trimming a monodix.

Problem: managing conflicting edits to the monodix

The number of pairs keeps growing; for example, there are 7 pairs with English, and the English monodix is copied into every one of them.

What happens if developers A and B both edit the English monodix, say in en-fr and en-es respectively? Answer: another developer has to look at both versions, inspect the diff and try to merge them. Most of the time, developer A tells developer B to wait a few minutes or hours, commits, and then tells developer B that he may now copy the updated version and start working.

This is time-consuming. For the near future it is manageable, since there are currently only about half a dozen developers who regularly go on IRC to resolve these issues. In the long term, however, it will become harder and harder. Imagine 20 or 50 pairs with English, and developers who are not willing to wait: each pair would end up with a different version of the monodix.
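The manual step described above, comparing two edited copies against the version they both started from, is essentially a three-way merge. The sketch below illustrates the idea on a deliberately simplified model: it assumes each copy of the monodix has been reduced to a mapping from lemma to paradigm name, whereas a real tool would have to compare whole <e> entries. The function merge_entries and the example entries are hypothetical.

```python
def merge_entries(base, ours, theirs):
    """Three-way merge of two edited copies of the same monodix.

    Each argument maps a lemma to its paradigm name (a simplification:
    a real <e> entry carries more than this). A missing key means the
    lemma is absent from that copy. Returns (merged, conflicts), where
    conflicts lists lemmas both sides changed in different ways; for
    those, the base version is kept so a human can resolve them.
    """
    merged = {}
    conflicts = []
    for lemma in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(lemma), ours.get(lemma), theirs.get(lemma)
        if o == t:        # both copies agree (possibly both deleted it)
            result = o
        elif o == b:      # only "theirs" changed this lemma
            result = t
        elif t == b:      # only "ours" changed this lemma
            result = o
        else:             # both changed it, differently: needs a human
            conflicts.append(lemma)
            result = b
        if result is not None:
            merged[lemma] = result
    return merged, conflicts


if __name__ == "__main__":
    base = {"house": "house__n", "run": "run__vblex"}
    dev_a = {"house": "house__n", "run": "run__vblex", "dog": "dog__n"}
    dev_b = {"house": "house__n", "run": "cut__vblex", "cat": "cat__n"}
    print(merge_entries(base, dev_a, dev_b))
```

With this approach, edits made on only one side (such as A adding "dog" while B adds "cat") are combined automatically, and only lemmas that both developers changed differently are left for manual resolution.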

Issues

  • Language-specific sections of monodix files.

Ideas to solve

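  • Language-specific parts could be split out into separate files and then XIncluded, as several pairs (e.g. cy-en, en-af) already do with the symbol definitions (<sdefs>).
      ◦ A possible solution could be the sort task available in the dixtools package (--Ebenimeli 16:31, 11 July 2007 (BST)).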

Suggestions

  1. Table of contents for paradigms: it would be nice to have a script-generated table of contents listing all the paradigms in a monodix (for example "wo/nen", "k/omen", etc.), grouped into categories: nouns, adjectives, verbs, etc.
  2. Interface for adding new words: it would be nice to have an interface for adding new words. That would attract non-geeks.
  3. Re-ordering of items in the dictionary (pardefs in alphabetical order, sections in alphabetical order, POS order, etc.).
  4. Splitting the monodix data into several files: paradigms and lemmas, or lemmas by category (verbs, nouns, adjectives, etc.).
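Suggestions 1 and 3 could be prototyped with a short script. The sketch below uses Python's standard xml.etree.ElementTree. It assumes paradigm names follow the common "stem/ending__category" convention (e.g. "wo/nen__vblex"), so the category can be read from the part after the last "__"; names without "__" are grouped under "(none)". The function names paradigm_toc and sort_pardefs are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def paradigm_toc(dix_text):
    """Build a table of contents of paradigms, grouped by category.

    Assumes names like "wo/nen__vblex": the text after the last "__"
    is taken as the category. Returns {category: sorted paradigm names}.
    """
    root = ET.fromstring(dix_text)
    toc = defaultdict(list)
    for pardef in root.iter("pardef"):
        name = pardef.get("n", "")
        _, sep, category = name.rpartition("__")
        toc[category if sep else "(none)"].append(name)
    return {cat: sorted(names) for cat, names in sorted(toc.items())}


def sort_pardefs(root):
    """Re-order the <pardef> children of every <pardefs> by name, in place."""
    for pardefs in list(root.iter("pardefs")):
        pardefs[:] = sorted(pardefs, key=lambda p: p.get("n", ""))


if __name__ == "__main__":
    dix = """<dictionary><pardefs>
<pardef n="wo/nen__vblex"/><pardef n="house__n"/><pardef n="k/omen__vblex"/>
</pardefs></dictionary>"""
    for category, names in paradigm_toc(dix).items():
        print(category + ": " + ", ".join(names))
```

Sorting only the <pardef> elements keeps the change small; re-ordering entries inside sections would need a decision about which attribute or lemma to sort on, so it is left out of this sketch.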

See also

  • How to sort a dictionary
