Difference between revisions of "Dictionary maintenance"
Latest revision as of 15:31, 26 September 2016
Including parts of dictionaries
Some parts of a dictionary, such as the symbol definitions, are common to all the dictionaries in a pair, yet they are duplicated across several files instead of being kept in one.
Solutions
- Use XInclude and xmllint to preprocess two XML files into a single .dix file, then validate and compile that .dix file (cy-en and en-af use this approach).
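The preprocessing step can be illustrated with a minimal sketch. The pairs named above use xmllint for this; the code below is a stand-in that uses Python's standard-library xml.etree.ElementInclude to show the same idea, not the build setup those pairs actually have. The file names (symbols.xml, en.dix.xml, en.dix) and the helper expand_xincludes are hypothetical, and validation and compilation of the resulting .dix are left out.

```python
import tempfile
from pathlib import Path
from xml.etree import ElementInclude
from xml.etree import ElementTree as ET


def expand_xincludes(source_path, output_path):
    """Resolve xi:include elements in source_path and write the result to output_path."""
    source_path = Path(source_path)
    tree = ET.parse(str(source_path))

    def load_relative(href, parse, encoding=None):
        # Resolve hrefs relative to the including file, not the working directory.
        target = source_path.parent / href
        if parse == "xml":
            return ET.parse(str(target)).getroot()
        return target.read_text(encoding=encoding or "utf-8")

    # Replaces each xi:include child with the content it points to.
    ElementInclude.include(tree.getroot(), loader=load_relative)
    tree.write(str(output_path), encoding="utf-8", xml_declaration=True)
    return tree


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        # Hypothetical shared file holding the symbol definitions.
        (tmp / "symbols.xml").write_text(
            '<sdefs><sdef n="n"/><sdef n="vblex"/></sdefs>', encoding="utf-8")
        # Hypothetical per-pair source that pulls the shared symbols in.
        (tmp / "en.dix.xml").write_text(
            '<dictionary xmlns:xi="http://www.w3.org/2001/XInclude">'
            '<xi:include href="symbols.xml"/>'
            '<section id="main" type="standard"/>'
            '</dictionary>', encoding="utf-8")
        merged = expand_xincludes(tmp / "en.dix.xml", tmp / "en.dix")
        print([sdef.get("n") for sdef in merged.getroot().iter("sdef")])
```

Keeping the shared symbols in one included file means a new symbol is added once and picked up by every dictionary that includes it the next time the .dix file is regenerated.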
Different registers/varieties/standards
Some pairs, e.g. those involving Catalan or Portuguese, support generating a particular standard of a language (e.g. Valencian or Brazilian Portuguese). The way this is done may need to be reviewed.
Metadix
- Main article: Metadix
Lextor
Keeping monodix updated
This section is outdated; see Languages and Automatically trimming a monodix.
Problem: managing conflicting edits to a monodix
There are more and more pairs, for example 7 pairs with English, and the English monodix is copied into every pair.
What happens if developer A and developer B both edit the English monodix, say one in en-fr and the other in en-es? Answer: another developer has to look at both versions, read the diff and try to merge them. Most of the time developer A asks developer B to wait a few minutes or hours, commits, and then tells developer B that they may now copy the committed version and start working.
That is time-consuming. For the near future it is manageable, since there are currently only half a dozen developers, who regularly go on IRC to resolve these issues. In the long term, though, it will become harder and harder. Imagine 20 or 50 pairs with English, and developers who do not want to wait: there would be many diverging copies of the monodix.
Issues
- Language-specific sections of monodix files.
Ideas to solve
- Language-specific parts could be split out into separate files and then XIncluded, as already happens in several pairs (e.g. cy-en, en-af) with the symbol definitions <sdefs>.
Suggestions
- Table of contents for paradigms: it would be nice to have a script-generated table of contents listing all the paradigms in a monodix, for example "wo/nen, k/omen, etc". Words would be grouped into categories: nouns, adjectives, verbs, etc.
- Interface for adding new words: it would be nice to have an interface for adding new words. That would attract non-geeks.
- Re-ordering of items in the dictionary (pardefs in alphabetical order, sections in alphabetical order, POS order, etc.).
- Splitting the monodix data into several files: paradigms and lemmas, or lemmas by category (verbs, nouns, adjectives, etc.).
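The first suggestion, a generated table of contents for paradigms, could be prototyped along these lines. This is only a sketch under assumptions: it supposes that paradigms are <pardef> elements whose n attribute ends in a "__category" suffix (for example "wo/man__n"), and the sample dictionary and the function name paradigm_index are made up for illustration. Sorting the names within each category also touches on the re-ordering suggestion.

```python
from collections import defaultdict
from xml.etree import ElementTree as ET

# Hypothetical miniature monodix, just enough to exercise the function.
SAMPLE = """<dictionary>
  <pardefs>
    <pardef n="wo/man__n"/>
    <pardef n="house__n"/>
    <pardef n="accept__vblex"/>
    <pardef n="expensive__adj"/>
  </pardefs>
</dictionary>"""


def paradigm_index(dix_xml):
    """Group pardef names by the category suffix that follows '__'."""
    root = ET.fromstring(dix_xml)
    index = defaultdict(list)
    for pardef in root.iter("pardef"):
        name = pardef.get("n", "")
        # Assumed naming convention: "stem/ending__category".
        _, separator, category = name.rpartition("__")
        key = category if separator else "(uncategorised)"
        index[key].append(name)
    # Alphabetical order for both the categories and the names inside them.
    return {key: sorted(names) for key, names in sorted(index.items())}


if __name__ == "__main__":
    for category, names in paradigm_index(SAMPLE).items():
        print(category)
        for name in names:
            print("  " + name)
```

A real script would read the monodix from disk and might map category suffixes to readable headings such as "nouns" or "verbs"; that mapping is left out here because it depends on the symbols each dictionary defines.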