Difference between revisions of "Dictionary maintenance"

(13 intermediate revisions by 6 users not shown)
{{TOCD}}
== The problem ==

== Including parts of dictionaries ==

The problem is that some parts of dictionaries that are shared between the dictionaries in a pair (for example, symbol definitions) are not kept in one file but duplicated across several.

===Solutions===

* Use XInclude and xmllint to preprocess two XML files into a single .dix file, then validate and compile the .dix file (cy-en and en-af use this).
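
The XInclude step can be illustrated with a minimal sketch. The pairs named above use xmllint for this (its <code>--xinclude</code> option expands the includes); the sketch below swaps in Python's standard <code>xml.etree.ElementInclude</code> module only so that it is self-contained. The file names and dictionary contents are hypothetical, not taken from any real pair, and validation and compilation are left out.

```python
import tempfile
from pathlib import Path
from xml.etree import ElementInclude, ElementTree

# Hypothetical shared file: symbol definitions common to both dictionaries.
SYMBOLS_XML = """<sdefs>
  <sdef n="n"/>
  <sdef n="vblex"/>
</sdefs>
"""

# Hypothetical dictionary source; {href} is replaced by the path of the shared file.
SOURCE_XML = """<dictionary xmlns:xi="http://www.w3.org/2001/XInclude">
  <alphabet>abcdefghijklmnopqrstuvwxyz</alphabet>
  <xi:include href="{href}"/>
  <section id="main" type="standard"/>
</dictionary>
"""


def build_dix(workdir: Path) -> ElementTree.Element:
    """Write the two source files, expand the include, and save merged.dix."""
    symbols = workdir / "symbols.xml"
    symbols.write_text(SYMBOLS_XML, encoding="utf-8")
    source = workdir / "source.dix.xml"
    source.write_text(SOURCE_XML.format(href=symbols.as_posix()), encoding="utf-8")

    root = ElementTree.parse(source).getroot()
    # Replace every xi:include element with the root of the file it points to.
    ElementInclude.include(root)
    ElementTree.ElementTree(root).write(
        workdir / "merged.dix", encoding="utf-8", xml_declaration=True
    )
    return root


with tempfile.TemporaryDirectory() as tmp:
    merged = build_dix(Path(tmp))
    print(ElementTree.tostring(merged, encoding="unicode"))
```

The merged file is what would then be validated and compiled; only the small source files need to be edited by hand.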

== Different registers/varieties/standards ==

In some pairs, e.g. those with Catalan or Portuguese, there is support for generating a particular standard of a language (e.g. Valencian, Brazilian Portuguese). The way this is done may need to be looked at.

== Metadix ==
{{main|Metadix}}

== Lextor ==

== Keeping monodix updated ==

'''This section is outdated, see [[Languages]] and [[Automatically trimming a monodix]]'''


The problem of managing conflicting edits to a [[monodix]]:
There are more and more pairs, for example 7 pairs with English, and the en monodix is copied into every pair.


What happens if developer A and developer B both edit the en [[monodix]], say in en-fr and en-es? Answer: another developer has to look at both versions, inspect the diff and try to merge them. Most of the time developer A tells developer B to wait a few minutes or hours, then commits and tells developer B that they may now copy the new version and start working.


That is time-consuming. For the near future it is manageable, since there are only half a dozen developers who regularly go on IRC to resolve these issues. But in the long term it will become harder and harder. Imagine there are 20 or 50 pairs with English, and that not all developers want to wait: there would be several diverging copies of the [[monodix]].
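
As a rough illustration of how two copies drift apart, the following sketch lists the lemmas that appear in one copy of a monodix but not in the other. It assumes that entries are <code><e></code> elements carrying a <code>lm</code> attribute; the two sample copies and their contents are made up for the example, and real merging would also have to compare paradigms and entry bodies.

```python
from xml.etree import ElementTree

# Made-up copies of the "same" en monodix as they might look in two pairs.
EN_FR_COPY = """<dictionary>
  <section id="main" type="standard">
    <e lm="house"/>
    <e lm="cat"/>
  </section>
</dictionary>
"""

EN_ES_COPY = """<dictionary>
  <section id="main" type="standard">
    <e lm="house"/>
    <e lm="dog"/>
  </section>
</dictionary>
"""


def lemmas(dix_xml):
    """Return the set of lemmas (lm attributes) found in a dictionary."""
    root = ElementTree.fromstring(dix_xml)
    return {e.get("lm") for e in root.iter("e") if e.get("lm")}


def compare_copies(first_xml, second_xml):
    """Return (only in first, only in second) as two sorted lists of lemmas."""
    first, second = lemmas(first_xml), lemmas(second_xml)
    return sorted(first - second), sorted(second - first)


only_en_fr, only_en_es = compare_copies(EN_FR_COPY, EN_ES_COPY)
print("only in en-fr copy:", only_en_fr)
print("only in en-es copy:", only_en_es)
```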


===Issues===


* Language-specific sections of monodix files.


=== Ideas to solve ===

* Language-specific parts could be split out into separate files and then XIncluded, as currently happens in several pairs (e.g. cy-en, en-af) with the symbol definitions <sdefs>.
** <font color="green">A possible solution</font> could be the '''[[Sort a dictionary|sort task]]''' available in the '''[[dixtools]]''' package (--[[User:Ebenimeli|Ebenimeli]] 16:31, 11 July 2007 (BST)).

=== Suggestions ===

#Table of contents for paradigms (It would be nice to have a table of contents of paradigms, generated by a script, listing all paradigms in a monodix, for example "wo/nen, k/omen, etc". Words would be grouped into categories: nouns, adjectives, verbs, etc.)
#Interface to add new words (It would be nice to have an interface for adding new words. That would attract non-geeks.)
#Re-ordering of items in the dictionary (pardefs -> alphabetical order, sections -> alphabetical order, POS order etc.)
#Splitting the data of a monodix into several files: paradigms and lemmas, or lemmas according to category (verbs, nouns, adjectives, etc.).
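
The first suggestion could be sketched roughly as follows. The script collects every <code><pardef></code> name in a dictionary and groups the names by category. It assumes the category is whatever follows the last <code>__</code> in the paradigm name; the sample names reuse "wo/nen" and "k/omen" from the suggestion, but the category suffixes and the third paradigm are invented for the example.

```python
from collections import defaultdict
from xml.etree import ElementTree

# Invented sample; only the pardef names matter here.
SAMPLE_DIX = """<dictionary>
  <pardefs>
    <pardef n="wo/nen__vblex"/>
    <pardef n="k/omen__vblex"/>
    <pardef n="hui/s__n"/>
  </pardefs>
</dictionary>
"""


def paradigm_index(dix_xml):
    """Group paradigm names by the category suffix after the last '__'."""
    root = ElementTree.fromstring(dix_xml)
    index = defaultdict(list)
    for pardef in root.iter("pardef"):
        name = pardef.get("n", "")
        _stem, separator, category = name.rpartition("__")
        if not separator:
            # No '__' in the name, so no category can be read from it.
            category = "uncategorised"
        index[category].append(name)
    # Sort categories and the names inside each category alphabetically.
    return {category: sorted(names) for category, names in sorted(index.items())}


for category, names in paradigm_index(SAMPLE_DIX).items():
    print(category + ": " + ", ".join(names))
```

Because the result is already sorted, the same grouping could also serve as a starting point for the alphabetical re-ordering of pardefs mentioned in the third suggestion.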

== See also ==
* [[Sort a dictionary|How to sort a dictionary]]

[[Category:Documentation in English]]

Latest revision as of 15:31, 26 September 2016
