ACX format


Revision as of 12:40, 9 September 2012

The ACX format is used to describe equivalent characters in monolingual dictionaries (monodices). If a language has more than one way of writing a character, for example Romanian ș (s with comma below) and ş (s with cedilla), you can use an ACX file to define them as equivalent.

It can also be used for languages in which the apostrophe is grammatically significant (e.g. Catalan), so that several different apostrophe variants are accepted during analysis. The format is defined in the file acx.rng, which can be found in both the lttoolbox and apertium modules in SVN.

The character equivalence, "B and C are equivalent to A", is expressed as follows:

  <char value="A">
    <equiv-char value="B"/>
    <equiv-char value="C"/>
  </char>

Example file

The following is the file apertium-es-ro.ro.acx from apertium-es-ro:

<?xml version="1.0"?>
<analysis-chars>
  <!-- Make apostrophe variants equal ' -->
  <char value="'">
    <equiv-char value="’"/>
    <equiv-char value="ʼ"/>
  </char>

  <!-- Legacy values for characters with comma -->
  <char value="ț">
    <equiv-char value="ţ"/>
  </char>
  <char value="Ț">
    <equiv-char value="Ţ"/>
  </char>
  <char value="ș">
    <equiv-char value="ş"/>
  </char>
  <char value="Ș">
    <equiv-char value="Ş"/>
  </char>

  <!-- Orthographic variant -->
  <char value="â">
    <equiv-char value="î"/>
  </char>
</analysis-chars>
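
To make the meaning of such a file concrete, the following Python sketch reads a small ACX fragment and lists the surface spellings that would be accepted for a dictionary entry. It illustrates the idea only and is not how lttoolbox implements it: the function names `load_equivalences` and `accepted_variants` are hypothetical, and the sketch assumes that equivalences apply in one direction, so that an equivalent character is accepted wherever the dictionary contains the `char` value, but not the reverse.

```python
import itertools
import xml.etree.ElementTree as ET

# A shortened ACX fragment, taken from the example file above.
ACX_SAMPLE = """<analysis-chars>
  <char value="ș">
    <equiv-char value="ş"/>
  </char>
  <char value="â">
    <equiv-char value="î"/>
  </char>
</analysis-chars>
"""


def load_equivalences(acx_text):
    """Map each <char> value to the list of its <equiv-char> values."""
    root = ET.fromstring(acx_text)
    table = {}
    for char in root.findall("char"):
        equivalents = [e.get("value") for e in char.findall("equiv-char")]
        table[char.get("value")] = equivalents
    return table


def accepted_variants(word, table):
    """Return every spelling of `word` obtained by optionally replacing
    each character with one of its declared equivalents (assumption:
    one-directional, from the <char> value to its equivalents)."""
    options = [[ch] + table.get(ch, []) for ch in word]
    return {"".join(combo) for combo in itertools.product(*options)}


EQUIVALENCES = load_equivalences(ACX_SAMPLE)
VARIANTS = accepted_variants("și", EQUIVALENCES)
print(sorted(VARIANTS))
```

Under these assumptions, a dictionary entry written with ș is also matched by input typed with the legacy ş, while an entry that contains only characters without declared equivalents keeps a single spelling.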


Compilation

lttoolbox

The program lt-comp takes the ACX file as an additional, final argument, for example:

$ lt-comp lr apertium-es-ro.ro.dix ro-es.automorf.bin apertium-es-ro.ro.acx 
apostrophes@postblank 104 134
final@inconditional 24 479
main@standard 43130 81174

HFST

$ hfst-expand-equivalences -a apertium-es-ro.ro.acx romanian.hfst -o romanian-acx.hfst