Transliteration

From Apertium

Latest revision as of 12:19, 26 September 2016

You can do transliteration in lttoolbox using a .dix file and lt-proc -t. This is especially useful for languages with different writing systems where you want to transliterate unknown words into the other writing system. For example:

$ echo "njenje džedže dzedze" | lt-proc -t sh-mk.translit.bin 
њење џеџе ѕеѕе

Note that transliteration works the same way as analysis, by left-to-right longest match (LRLM). This means the strings to transliterate can be as long as you like (for example the digraphs nj, lj, dž and dz), and the longest matching string is always used.
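To make the matching rule concrete, here is a minimal Python sketch of LRLM over a plain lookup table. It is not lttoolbox's implementation, which compiles the .dix into a finite-state transducer. SH_MK and transliterate_lrlm are hypothetical names, and the table holds only the letters needed for the example above.

```python
# Minimal sketch of left-to-right longest-match (LRLM) transliteration.
# SH_MK is a hypothetical subset of the sh-mk pairs from the .dix file.
SH_MK = {
    "nj": "њ", "lj": "љ", "dž": "џ", "dz": "ѕ",
    "n": "н", "j": "ј", "e": "е", "d": "д",
    "ž": "ж", "z": "з", "l": "л",
}


def transliterate_lrlm(text: str, table: dict[str, str]) -> str:
    """Scan left to right; at each position use the longest key that matches."""
    longest = max(map(len, table))
    out = []
    i = 0
    while i < len(text):
        # Try candidate lengths from the longest down to a single character.
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No entry starts at this position: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(transliterate_lrlm("njenje džedže dzedze", SH_MK))  # њење џеџе ѕеѕе
```

With only the single-letter entries n and j, the input nj would come out as нј in this sketch; the two-letter entry wins because the longest candidate is tried first at each position. Characters with no entry, such as the spaces, are copied through unchanged.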

Full example

<dictionary>
  <alphabet>АБВГДЃЕЖЗЅИЈКЛЉМНЊОПРСТЌУФХЦЧЏШабвгдѓежзѕијклљмнњопрстќуфхцчџшABCČĆDDžĐEFGHIJKLLjMNNjOPRSŠTUVZŽabcčćddžđefghijklljmnnjoprsštuvzžwWxXyY</alphabet>
  <sdefs/>

  <section id="main" type="inconditional">
    <e><p><l>nj</l><r>њ</r></p></e>
    <e><p><l>lj</l><r>љ</r></p></e>
    <e><p><l>dž</l><r>џ</r></p></e>
    <e><p><l>dz</l><r>ѕ</r></p></e>
    <e><p><l>a</l><r>а</r></p></e>
    <e><p><l>b</l><r>б</r></p></e>
    <e><p><l>c</l><r>ц</r></p></e>
    <e><p><l>d</l><r>д</r></p></e>
    <e><p><l>e</l><r>е</r></p></e>
    <e><p><l>f</l><r>ф</r></p></e>
    <e><p><l>g</l><r>г</r></p></e>
    <e><p><l>h</l><r>х</r></p></e>
    <e><p><l>i</l><r>и</r></p></e>
    <e><p><l>j</l><r>ј</r></p></e>
    <e><p><l>k</l><r>к</r></p></e>
    <e><p><l>l</l><r>л</r></p></e>
    <e><p><l>m</l><r>м</r></p></e>
    <e><p><l>n</l><r>н</r></p></e>
    <e><p><l>o</l><r>о</r></p></e>
    <e><p><l>p</l><r>п</r></p></e>
    <e><p><l>r</l><r>р</r></p></e>
    <e><p><l>s</l><r>с</r></p></e>
    <e><p><l>t</l><r>т</r></p></e>
    <e><p><l>u</l><r>у</r></p></e>
    <e><p><l>v</l><r>в</r></p></e>
    <e><p><l>z</l><r>з</r></p></e>
    <e><p><l>ž</l><r>ж</r></p></e>
    <e><p><l>š</l><r>ш</r></p></e>
    <e><p><l>č</l><r>ч</r></p></e>
    <e><p><l>đ</l><r>ѓ</r></p></e>
    <e><p><l>ć</l><r>ќ</r></p></e>
  </section>
</dictionary>
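Writing one <e> line per letter by hand is repetitive. Below is a minimal sketch, assuming the letter pairs are kept in a Python list. PAIRS and dix_section are hypothetical names (not part of lttoolbox), and the list holds only the first few pairs of the dictionary above. It prints a <section> in the same shape as the one in the file.

```python
# Hypothetical helper: build the <section> of a transliteration .dix
# from a list of (Latin, Cyrillic) pairs, so the table lives in one place.
from xml.sax.saxutils import escape

# Hypothetical subset of the pairs used in the dictionary above.
PAIRS = [
    ("nj", "њ"), ("lj", "љ"), ("dž", "џ"), ("dz", "ѕ"),
    ("a", "а"), ("b", "б"), ("c", "ц"),
]


def dix_section(pairs, section_id="main"):
    """Return the section as text; escape() guards against &, < and >."""
    lines = [f'  <section id="{section_id}" type="inconditional">']
    for left, right in pairs:
        lines.append(f"    <e><p><l>{escape(left)}</l><r>{escape(right)}</r></p></e>")
    lines.append("  </section>")
    return "\n".join(lines)


print(dix_section(PAIRS))
```

The output can be pasted between <sdefs/> and </dictionary>; type="inconditional" is copied verbatim from the file above.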
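Compile
Save the dictionary above as sh-mk.translit.dix and compile it in the lr direction, so that the Latin <l> side is read and the Cyrillic <r> side is written:
$ lt-comp lr sh-mk.translit.dix sh-mk.translit.bin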
Use
$ echo "Dobrodošli na Wikipediju srpskohrvatskog jezika, slobodnu enciklopediju koju svako može uređivati." | lt-proc -t sh-mk.translit.bin
Добродошли на Wикипедију српскохрватског језика, слободну енциклопедију коју свако може уреѓивати.
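The dictionary lists only lowercase pairs, yet the output above keeps the capitals (D becomes Д) and copies W, which has no entry, unchanged. The Python sketch below imitates that observed behaviour on top of longest-match lookup. The case rule it uses (look up the lowercased input, then uppercase the first output letter if the first matched input letter was uppercase) is an assumption for illustration, not a description of how lttoolbox handles case. SH_MK and transliterate are hypothetical names.

```python
# Sketch: longest-match transliteration with a simple case rule.
# Assumption: lowercase the input for lookup, then restore the capital
# of the first matched letter. Not taken from lttoolbox's source.
SH_MK = {
    "nj": "њ", "lj": "љ", "dž": "џ", "dz": "ѕ",
    "a": "а", "b": "б", "c": "ц", "d": "д", "e": "е", "f": "ф",
    "g": "г", "h": "х", "i": "и", "j": "ј", "k": "к", "l": "л",
    "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "t": "т", "u": "у", "v": "в", "z": "з", "ž": "ж", "š": "ш",
    "č": "ч", "đ": "ѓ", "ć": "ќ",
}


def transliterate(text: str, table: dict[str, str]) -> str:
    longest = max(map(len, table))
    out, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            target = table.get(chunk.lower())
            if target is not None:
                # Assumed rule: keep the capitalisation of the first input letter.
                if chunk[0].isupper():
                    target = target[0].upper() + target[1:]
                out.append(target)
                i += size
                break
        else:
            # No entry (e.g. "W", punctuation): copy unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(transliterate("Dobrodošli na Wikipediju srpskohrvatskog jezika, "
                    "slobodnu enciklopediju koju svako može uređivati.", SH_MK))
```

Run on the sentence above, this sketch returns the same Cyrillic text as lt-proc. It also maps the capitalised digraph in Njegoš to Његош, because the lookup is done on the lowercased chunk "nj".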