User:Ilnar.salimzyan/Wishlist

From Apertium

Latest revision as of 15:20, 8 March 2015

Annotatrix

For Annotatrix: comparing my annotation with one done by another user, calculating the inter-annotator agreement, and easily merging the two versions by highlighting the sentences that differ.
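Below is a minimal sketch of the comparison step. It assumes both annotations are available as CoNLL-U text with identical tokenisation, and it measures agreement with Cohen's kappa over the UPOS column only. That choice and the helper names parse_conllu, cohen_kappa and compare are illustrative assumptions, not anything Annotatrix provides.

```python
from collections import Counter


def parse_conllu(text):
    """Split CoNLL-U text into sentences of (form, upos) pairs.

    Comment lines, multiword-token ranges (e.g. "1-2") and empty nodes
    (e.g. "1.1") are skipped.
    """
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:  # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        current.append((cols[1], cols[3]))  # FORM and UPOS columns
    if current:
        sentences.append(current)
    return sentences


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label independently.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def compare(mine, theirs):
    """Return (kappa over UPOS, indices of sentences that differ)."""
    a, b = parse_conllu(mine), parse_conllu(theirs)
    if len(a) != len(b) or any(len(x) != len(y) for x, y in zip(a, b)):
        raise ValueError("tokenisation differs; align the annotations first")
    tags_a = [upos for sent in a for _, upos in sent]
    tags_b = [upos for sent in b for _, upos in sent]
    differing = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return cohen_kappa(tags_a, tags_b), differing
```

The differing indices are the sentences a merge view would highlight. Agreement over heads or relations could be computed the same way by reading other columns.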

Pan-Turkic-English-Russian dictionary

A Pan-Turkic+English+Russian dictionary which we maintain (a superset of any Turkic-to-Turkic, Turkic-to-English or Turkic-to-Russian bidix, also used for testing translators). It would help with classifying stems consistently across Turkic pairs. Exporting it into OmegaWiki would be cool as well (although we need much less than what OmegaWiki does).
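To make the point about consistent stem classification concrete, here is a sketch that flags entries whose Turkic sides disagree on their first tag. It assumes one element per language inside <p> (tat, kaz, and so on) and bidix-style <s n="..."/> tags. The language list TURKIC and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical choice: which Turkic languages to compare; extend as needed.
TURKIC = ("tat", "kaz")


def first_tag(side):
    """Return the value of the first <s n="..."/> in a language element, or None."""
    s = side.find("s")
    return s.get("n") if s is not None else None


def pos_mismatches(dictionary_xml, langs=TURKIC):
    """Yield (lemmas, tags) for entries whose Turkic sides disagree on the first tag."""
    root = ET.fromstring(dictionary_xml)
    for e in root.iter("e"):
        p = e.find("p")
        if p is None:
            continue
        sides = {}
        for lang in langs:
            side = p.find(lang)
            if side is not None:
                sides[lang] = side
        tags = {lang: first_tag(side) for lang, side in sides.items()}
        if len(set(tags.values())) > 1:  # more than one distinct tag means a mismatch
            yield {lang: side.text for lang, side in sides.items()}, tags
```

Run over the whole dictionary, this would list stems whose category differs between, say, Tatar and Kazakh. Such differences are hard to spot when the information is spread over separate bidix files.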

I would store it in a format as close to the current bidix format as possible, e.g.:

<e>
  <p>
    <tat>китап<s n="n"/></tat>
    <kaz>кітап<s n="n"/></kaz>
    <eng>book<s n="n"/></eng>
    <rus>книга<s n="n"/><s n="nn"/><s n="f"/></rus>
  </p>
</e>
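The multilingual entry is a superset of the pairwise ones, so a bidix entry for any pair can be derived mechanically. The following is a minimal sketch using Python's xml.etree.ElementTree. It assumes the entry layout above with <s n="..."/> tags, and the function name project is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def project(entry, left, right):
    """Turn a multilingual <e> into a bidix-style <e> for one language pair.

    `left` and `right` are element names such as "tat" or "rus"; their text
    and child tags (e.g. <s n="n"/>, <b/>) are copied into <l> and <r>.
    """
    p = entry.find("p")
    src = p.find(left) if p is not None else None
    tgt = p.find(right) if p is not None else None
    if src is None or tgt is None:
        raise KeyError(f"entry has no {left!r}/{right!r} side")
    new_e = ET.Element("e")
    new_p = ET.SubElement(new_e, "p")
    for side_name, side in (("l", src), ("r", tgt)):
        side_el = ET.SubElement(new_p, side_name)
        side_el.text = side.text
        # deepcopy keeps each child's tail, e.g. the text after a <b/> blank
        side_el.extend(copy.deepcopy(list(side)))
    return new_e


ENTRY = ET.fromstring("""<e>
  <p>
    <tat>китап<s n="n"/></tat>
    <kaz>кітап<s n="n"/></kaz>
    <eng>book<s n="n"/></eng>
    <rus>книга<s n="n"/><s n="nn"/><s n="f"/></rus>
  </p>
</e>""")

print(ET.tostring(project(ENTRY, "tat", "rus"), encoding="unicode"))
```

Which language becomes <l> and which becomes <r> is just the argument order, so project(ENTRY, "rus", "tat") gives the reverse direction.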

Pardefs allow some little tricks (which you can't have when using a spreadsheet or similar) that are useful when translating into or from Russian or English; they would work for any Turkic language.

The main motivation for such a dictionary is to keep everything in one place, so that we can control things like "мүмкін емес" (Kazakh for "impossible"). The reason it appeared in kaz.lexc was that eng-kaz.dix required it. If there is a better way to handle it, we could agree upon one and store it in our Pan-Turkic dictionary. You know, one single point of leverage for harnessing the influence of non-Turkic languages :)

A new DTD for .dix files

When creating a DTD for the multilingual dictionary described above, it would be great to allow a <subsection> element which can have a "name" attribute and appears inside the <section> element. The idea is to be able to sort a .dix file automatically without breaking the internal order (which currently happens when the section names are just comments of the form <!-- SECTION: some-noun-category-we-want-to-distinguish -->).
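Here is a sketch of why a real element helps. With <subsection name="..."> in place, a sorter can reorder entries inside each subsection and leave the subsections themselves where they are. The sketch assumes the proposed (not yet existing) <subsection> element, and entry_key is a hypothetical sort key over the entry's text. Note also that ElementTree's default parser discards comments altogether, so comment-based section markers would not even survive such a round trip.

```python
import xml.etree.ElementTree as ET


def entry_key(e):
    """Hypothetical sort key: all text inside the entry concatenated."""
    return "".join(e.itertext()).strip()


def sort_subsections(section):
    """Sort the <e> children of every <subsection>, leaving subsection order intact."""
    for sub in section.findall("subsection"):
        entries = sub.findall("e")
        for e in entries:
            sub.remove(e)
        # Re-append in sorted order; the subsection's name attribute is untouched.
        sub.extend(sorted(entries, key=entry_key))
    return section
```

Whitespace between entries is not normalised here. A pretty-printer such as ET.indent (Python 3.9+) could be run afterwards.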

The "Use/MT" attribute should probably also be codified in the DTD, in case we ever want to export entries from the multilingual dictionary into a monolingual dictionary (lexc or dix, doesn't matter).