ACX format
Latest revision as of 01:17, 24 March 2018
The '''ACX format''' is used for describing equivalent characters in [[monodix|monodices]]. If a language has more than one way of writing a character, for example Romanian ș (comma below) and ş (cedilla), you can use an ACX file to define them as ''equivalent''.
It can also be used in languages where the apostrophe is grammatically important (e.g. Catalan) to make sure that several different variants are accepted for analysis. The format is defined in the file <code>acx.rng</code>, which can be found in both the <code>lttoolbox</code> and <code>apertium</code> modules on [https://github.com/apertium GitHub].
The character equivalence, "B and C are equivalent to A", is expressed as follows:
<pre>
<char value="A">
  <equiv-char value="B"/>
  <equiv-char value="C"/>
</char>
</pre>
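To inspect the equivalences from a script, the structure above can be read with any XML parser. The following is a minimal sketch using Python's standard library; the function name <code>parse_acx</code> and the dictionary layout are illustrative assumptions and are not part of lttoolbox.

```python
import xml.etree.ElementTree as ET

# The "B and C are equivalent to A" snippet, wrapped in the root element
# that ACX files use.
ACX_SNIPPET = """<analysis-chars>
  <char value="A">
    <equiv-char value="B"/>
    <equiv-char value="C"/>
  </char>
</analysis-chars>"""


def parse_acx(xml_text):
    """Return {canonical_char: [equivalent_chars, ...]} from ACX XML text.

    Hypothetical helper for illustration; lttoolbox has its own reader.
    """
    root = ET.fromstring(xml_text)
    table = {}
    for char in root.iter("char"):
        equivalents = [e.get("value") for e in char.iter("equiv-char")]
        # Merge repeated <char> entries for the same character.
        table.setdefault(char.get("value"), []).extend(equivalents)
    return table


if __name__ == "__main__":
    print(parse_acx(ACX_SNIPPET))  # {'A': ['B', 'C']}
```

For a file on disk, <code>ET.parse(path).getroot()</code> could be used in place of <code>ET.fromstring</code>.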
==Example file==

The file <code>apertium-es-ro.ro.acx</code> from <code>apertium-es-ro</code>:

<pre>
<?xml version="1.0"?>
<analysis-chars>
  <!-- Make apostrophe variants equal ' -->
  <char value="'">
    <equiv-char value="’"/>
    <equiv-char value="ʼ"/>
  </char>
  <!-- Legacy values for characters with comma -->
  <char value="ț">
    <equiv-char value="ţ"/>
  </char>
  <char value="Ț">
    <equiv-char value="Ţ"/>
  </char>
  <char value="ș">
    <equiv-char value="ş"/>
  </char>
  <char value="Ș">
    <equiv-char value="Ş"/>
  </char>
  <!-- Orthographic variant -->
  <char value="â">
    <equiv-char value="î"/>
  </char>
</analysis-chars>
</pre>
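As a rough illustration of what these equivalences mean for analysis, the sketch below lists every spelling of a dictionary form that would be accepted. It assumes that equivalences apply in one direction only, from the character written in the dictionary to its accepted variants. The real tools work on the compiled transducer rather than on strings, so this shows the effect and not the implementation. The name <code>accepted_variants</code> is hypothetical.

```python
from itertools import product

# Equivalences copied from the example file: dictionary character -> variants.
# Escapes are used so the comma-below and cedilla forms cannot be confused.
EQUIVALENTS = {
    "'": ["\u2019", "\u02bc"],   # ' -> right single quote, modifier apostrophe
    "\u021b": ["\u0163"],        # t with comma below -> t with cedilla
    "\u021a": ["\u0162"],        # T with comma below -> T with cedilla
    "\u0219": ["\u015f"],        # s with comma below -> s with cedilla
    "\u0218": ["\u015e"],        # S with comma below -> S with cedilla
    "\u00e2": ["\u00ee"],        # a with circumflex -> i with circumflex
}


def accepted_variants(word, equivalents=EQUIVALENTS):
    """Return the set of spellings obtained by swapping in equivalent characters."""
    # For each position: the original character plus any declared equivalents.
    choices = [[ch] + equivalents.get(ch, []) for ch in word]
    return {"".join(combo) for combo in product(*choices)}


if __name__ == "__main__":
    # A word containing both s and t with comma below: four spellings.
    for variant in sorted(accepted_variants("\u0219tiin\u021b\u0103")):
        print(variant)
```

The number of variants grows exponentially with the number of affected characters in a word, which is one reason to treat this as a demonstration only.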
==Compilation==

===lttoolbox===

The program [[lt-comp]] accepts the ACX file as an additional, final argument, for example:

<pre>
$ lt-comp lr apertium-es-ro.ro.dix ro-es.automorf.bin apertium-es-ro.ro.acx
apostrophes@postblank 104 134
final@inconditional 24 479
main@standard 43130 81174
</pre>
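When the compilation is driven from a build script, the ACX file can be treated as an optional trailing argument. The sketch below only assembles the command line in the positional form shown above; the helper name <code>lt_comp_command</code> is an assumption for illustration, and no options beyond those in the example are used.

```python
import shlex


def lt_comp_command(direction, dix, output, acx=None):
    """Build the lt-comp argument list, appending the ACX file when given.

    Hypothetical helper; it mirrors the positional form used on this page.
    """
    args = ["lt-comp", direction, dix, output]
    if acx is not None:
        args.append(acx)
    return args


if __name__ == "__main__":
    cmd = lt_comp_command(
        "lr",
        "apertium-es-ro.ro.dix",
        "ro-es.automorf.bin",
        "apertium-es-ro.ro.acx",
    )
    print(shlex.join(cmd))
```

The resulting list can be passed to <code>subprocess.run(cmd, check=True)</code> on a system where <code>lt-comp</code> is installed and on the <code>PATH</code>.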
===HFST===

<pre>
$ hfst-expand-equivalences -a apertium-es-ro.ro.acx romanian.hfst -o romanian-acx.hfst
</pre>