Difference between revisions of "User:Ilnar.salimzyan/Wishlist"
Revision as of 14:53, 8 March 2015
Annotatrix
For Annotatrix: compare my annotation with the annotation done by another user, calculate the inter-annotator agreement, and make merging the two versions easy by highlighting the sentences which differ.
Pan-Turkic-English-Russian dictionary
A Pan-Turkic+English+Russian dictionary which we maintain: a superset of any Turkic-to-Turkic, Turkic-to-English or Turkic-to-Russian bidix, which is also used for testing translators. It would help with the classification of stems (i.e. with being consistent across Turkic pairs). Exporting it into OmegaWiki would be cool as well (although we need much less than what OmegaWiki does).
I would store it in a format as close to the current bidix format as possible, e.g.:
<e> <p> <tat>китап<sdef n="n"/></tat> <kaz>кітап<sdef n="n"/></kaz> <eng>book<sdef n="n"/></eng> <rus>книга<sdef n="n"/><sdef n="nn"/><sdef n="f"/></rus> </p> </e>
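As a rough illustration of the "superset" idea, a Tatar-Kazakh bidix entry could be derived from a multilingual entry like the one above by keeping only the <tat> and <kaz> children and renaming them to <l> and <r>. This is only a sketch: mapping <sdef> to the <s> element used in current bidix entries is an assumption about how such an export might work, not a description of an existing tool.

```xml
<!-- Hypothetical tat-kaz bidix entry derived from the multilingual entry.
     Assumption: <tat> becomes <l>, <kaz> becomes <r>, <sdef n="..."/> becomes <s n="..."/>. -->
<e><p><l>китап<s n="n"/></l><r>кітап<s n="n"/></r></p></e>
```

The same projection with a different pair of language elements would give any of the other bidixes.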
There are some little tricks in pardefs (which you can't have when using a spreadsheet or similar) that are useful when translating into or from Russian or English; they would work for any Turkic language.
The main motivation for such a dictionary is that we keep everything in one place so that we can control things like "мүмкін емес". The reason it appeared in kaz.lexc was that eng-kaz.dix contained it. If there is a better way to handle it, we could agree upon one and store it in our Pan-Turkic dictionary. You know, one single point of leverage for harnessing the influence of non-Turkic languages :)
A new DTD for .dix files
When creating a DTD for the multilingual dictionary described above, it would be great to allow a <subsection> element which can have a "name" attribute and appears inside a <section> element. The idea is to be able to sort a .dix file automatically without breaking the internal order, which currently happens when the section names are just comments of the form <!-- SECTION: some-noun-category-we-want-to-distinguish -->.
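A minimal sketch of what such a declaration might look like, written as an internal DTD subset so the fragment is self-contained. The content models here are assumptions for illustration only: the real <section> element carries more attributes than shown, and the <e> declaration is a stand-in for the full entry definition.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE section [
  <!-- Assumption: a section may hold plain entries, named subsections, or both -->
  <!ELEMENT section (e | subsection)*>
  <!ATTLIST section id ID #REQUIRED>
  <!-- Proposed: a subsection groups entries and may carry a name -->
  <!ELEMENT subsection (e+)>
  <!ATTLIST subsection name CDATA #IMPLIED>
  <!-- Stand-in for the real entry declaration -->
  <!ELEMENT e ANY>
]>
<section id="main">
  <subsection name="some-noun-category-we-want-to-distinguish">
    <e/>
  </subsection>
</section>
```

With the category name held in an attribute rather than a comment, a sorting tool could reorder the <e> elements inside each <subsection> and leave the grouping itself intact.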