Transliteration

From Apertium
Revision as of 12:29, 2 February 2014 by Tachyons

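You can do transliteration in lttoolbox by compiling a .dix file and running lt-proc with the -t option. This is especially useful for language pairs with different writing systems, where unknown words should be transliterated into the target script rather than left as they are. For example, converting Serbo-Croatian Latin script to Macedonian Cyrillic: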

$ echo "njenje džedže dzedze" | lt-proc -t sh-mk.translit.bin 
њење џеџе ѕеѕе

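Note that transliteration works like analysis: the input is processed left to right, and at each position the longest matching entry is used (left-to-right longest match, LRLM). Entries can therefore be strings of any length, and the longest one that matches always wins. For example, "nj" is transliterated as a single letter (њ) rather than as "n" followed by "j" (нј).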
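To make the matching rule concrete, here is a minimal Python sketch of LRLM over a plain string-to-string table. It is only an illustration. lttoolbox compiles the .dix into a finite-state transducer rather than looping over a dictionary. The names TABLE and transliterate_lrlm are hypothetical, and TABLE holds just the entries needed for the example above.

```python
# Minimal sketch of left-to-right longest-match (LRLM) transliteration.
# TABLE is a hand-picked excerpt of the sh-mk entries, for illustration only.
TABLE = {
    "nj": "њ", "lj": "љ", "dž": "џ", "dz": "ѕ",
    "d": "д", "e": "е", "j": "ј", "l": "л", "n": "н", "z": "з", "ž": "ж",
}


def transliterate_lrlm(text: str, table: dict[str, str]) -> str:
    """Replace, at each position, the longest key of `table` found there.

    Characters that start no key are copied through unchanged.
    """
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No entry matches here: pass the character through.
            out.append(text[i])
            i += 1
    return "".join(out)


print(transliterate_lrlm("njenje džedže dzedze", TABLE))  # → њење џеџе ѕеѕе
```

Because the two-letter key "nj" is tried before "n", the input "nj" becomes "њ" instead of "нј". An entry of any length takes priority over its shorter prefixes in the same way.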

Full example

<dictionary>
  <alphabet>АБВГДЃЕЖЗЅИЈКЛЉМНЊОПРСТЌУФХЦЧЏШабвгдѓежзѕијклљмнњопрстќуфхцчџшABCČĆDDžĐEFGHIJKLLjMNNjOPRSŠTUVZŽabcčćddžđefghijklljmnnjoprsštuvzžwWxXyY</alphabet>
  <sdefs/>

  <section id="main" type="inconditional">
    <e><p><l>nj</l><r>њ</r></p></e>
    <e><p><l>lj</l><r>љ</r></p></e>
    <e><p><l>dž</l><r>џ</r></p></e>
    <e><p><l>dz</l><r>ѕ</r></p></e>
    <e><p><l>a</l><r>а</r></p></e>
    <e><p><l>b</l><r>б</r></p></e>
    <e><p><l>c</l><r>ц</r></p></e>
    <e><p><l>d</l><r>д</r></p></e>
    <e><p><l>e</l><r>е</r></p></e>
    <e><p><l>f</l><r>ф</r></p></e>
    <e><p><l>g</l><r>г</r></p></e>
    <e><p><l>h</l><r>х</r></p></e>
    <e><p><l>i</l><r>и</r></p></e>
    <e><p><l>j</l><r>ј</r></p></e>
    <e><p><l>k</l><r>к</r></p></e>
    <e><p><l>l</l><r>л</r></p></e>
    <e><p><l>m</l><r>м</r></p></e>
    <e><p><l>n</l><r>н</r></p></e>
    <e><p><l>o</l><r>о</r></p></e>
    <e><p><l>p</l><r>п</r></p></e>
    <e><p><l>r</l><r>р</r></p></e>
    <e><p><l>s</l><r>с</r></p></e>
    <e><p><l>t</l><r>т</r></p></e>
    <e><p><l>u</l><r>у</r></p></e>
    <e><p><l>v</l><r>в</r></p></e>
    <e><p><l>z</l><r>з</r></p></e>
    <e><p><l>ž</l><r>ж</r></p></e>
    <e><p><l>š</l><r>ш</r></p></e>
    <e><p><l>č</l><r>ч</r></p></e>
    <e><p><l>đ</l><r>ѓ</r></p></e>
    <e><p><l>ć</l><r>ќ</r></p></e>
  </section>
</dictionary>
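Writing one <e> line per letter by hand quickly becomes tedious. The following minimal sketch assumes Python 3.9+ and uses a hypothetical helper, make_translit_dix, to render the same structure from a mapping table. Filling <alphabet> with every character in the table plus its uppercase form is an assumption intended to mirror the hand-written alphabet above. Letters with no entry that should still be listed (such as w, x and y above) are passed in separately.

```python
from xml.sax.saxutils import escape


def make_translit_dix(table: dict[str, str], extra_letters: str = "") -> str:
    """Render a source->target table as a transliteration .dix (hypothetical helper).

    <alphabet> is filled with every character used in the table, plus its
    uppercase form (an assumption), plus `extra_letters` (e.g. "wWxXyY").
    """
    letters = set(extra_letters)
    for src, tgt in table.items():
        for ch in src + tgt:
            letters.add(ch)
            letters.add(ch.upper())
    lines = [
        "<dictionary>",
        f"  <alphabet>{escape(''.join(sorted(letters)))}</alphabet>",
        "  <sdefs/>",
        "",
        '  <section id="main" type="inconditional">',
    ]
    # One <e><p><l>...</l><r>...</r></p></e> entry per table row, in table order;
    # escape() protects &, < and > inside entries.
    for src, tgt in table.items():
        lines.append(f"    <e><p><l>{escape(src)}</l><r>{escape(tgt)}</r></p></e>")
    lines += ["  </section>", "</dictionary>"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Small excerpt of the table above; the .dix text is printed to stdout.
    print(make_translit_dix({"nj": "њ", "lj": "љ", "a": "а", "b": "б"}, "wWxXyY"))
```

You can save the printed text as a .dix file and compile it with lt-comp in the usual way.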
Compile
$ lt-comp lr test.dix sh-mk.translit.bin
Use
$ echo "Dobrodošli na Wikipediju srpskohrvatskog jezika, slobodnu enciklopediju koju svako može uređivati." | lt-proc -t sh-mk.translit.bin
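Добродошли на Wикипедију српскохрватског језика, слободну енциклопедију коју свако може уреѓивати.

Note that "W", which has no entry in the dictionary, is copied through unchanged. Uppercase letters such as "D" still come out as uppercase Cyrillic, even though the dictionary lists only lowercase entries.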