Difference between revisions of "ATT format"
Latest revision as of 21:24, 13 March 2017
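'''ATT format''' (AT&T format) is a plain-text transducer format. Each transition is written on its own line as four tab-separated columns: source state, target state, input symbol and output symbol. A line containing only a state number marks that state as final.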
Both lttoolbox and HFST can read ATT format as input to compile dictionaries (<code>lt-comp</code>, <code>hfst-txt2fst</code>), and print compiled dictionaries to ATT format (<code>lt-print</code>, <code>hfst-fst2txt</code>).
==Example==
Say we want to represent the following transducer:
We can do it thusly:
<pre>
$ cat test.dix
<dictionary>
  <alphabet>abcdefghijklmnopqrstuvwxyz</alphabet>
  <sdefs>
    <sdef n="n"/>
  </sdefs>
  <section id="main" type="standard">
    <e><p><l>test</l><r>foo</r></p></e>
  </section>
</dictionary>
$ lt-comp lr test.dix test.bin
main@standard 5 4
$ lt-print test.bin
0	1	t	f
1	2	e	o
2	3	s	o
3	4	t	ε
4
</pre>
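To make the column layout concrete, here is a minimal Python sketch that reads ATT text into transitions and final states. It is not part of lttoolbox or HFST: the names <code>Transition</code>, <code>parse_att</code> and <code>EXAMPLE</code> are hypothetical, and splitting on any whitespace is a simplifying assumption (real files are tab-separated, and tools may use special tokens for symbols such as spaces).

```python
from typing import NamedTuple


class Transition(NamedTuple):
    """One transition line of an ATT file (hypothetical helper type)."""
    source: int
    target: int
    input_symbol: str
    output_symbol: str
    weight: float


def parse_att(text):
    """Parse ATT text into (transitions, finals).

    Assumption: fields are split on any whitespace, and a missing
    weight column is treated as 0.0.
    """
    transitions = []
    finals = {}  # final state -> weight
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) <= 2:
            # Final-state line: state number, optionally followed by a weight.
            finals[int(fields[0])] = float(fields[1]) if len(fields) == 2 else 0.0
        else:
            # Transition line: source, target, input, output, optional weight.
            weight = float(fields[4]) if len(fields) > 4 else 0.0
            transitions.append(
                Transition(int(fields[0]), int(fields[1]), fields[2], fields[3], weight)
            )
    return transitions, finals


# The lt-print output for the test/foo transducer, written with explicit tabs.
EXAMPLE = "0\t1\tt\tf\n1\t2\te\to\n2\t3\ts\to\n3\t4\tt\tε\n4\n"

transitions, finals = parse_att(EXAMPLE)
for t in transitions:
    print(t.source, t.target, t.input_symbol, t.output_symbol)
```

Reading the input symbols of the four transitions in order spells ''test'', and the output symbols spell ''foo'' followed by ε (the empty symbol).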
==Weights==
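AT&T format supports weights, for example to encode likelihoods. A weight is written as an extra column at the end of a transition line, or after the state number on a final-state line. By default, the bigger (heavier) the weight, the worse the path, so weights act as penalties. For example: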
<pre>
0	1	c	c	1.000000
0	2	d	d	2.000000
1	3	a	a	0.000000
2	4	o	o	0.000000
3	5	t	t	0.000000
4	5	g	g	0.000000
5	6	s	s	10.000000
5	0.000000
6	0.000000
</pre>
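This transducer gives weight 1 to ''cat'', weight 2 to ''dog'', and an additional 10 for being a plural (''cats'', ''dogs'').

Weights are commonly estimated from probabilities, e.g. by taking the negative logarithm, -log(p), so that less probable analyses get heavier weights.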
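The arithmetic can be sketched in Python as follows. This assumes that weights are summed along a path (the usual penalty convention, not stated explicitly above) and that the transducer is deterministic on its input side, which holds for this small example. The names <code>TRANSITIONS</code>, <code>FINALS</code>, <code>path_weight</code> and <code>penalty</code> are hypothetical and do not belong to any tool.

```python
import math

# Transitions of the weighted cat/dog transducer:
# (source, target, input, output, weight)
TRANSITIONS = [
    (0, 1, "c", "c", 1.0),
    (0, 2, "d", "d", 2.0),
    (1, 3, "a", "a", 0.0),
    (2, 4, "o", "o", 0.0),
    (3, 5, "t", "t", 0.0),
    (4, 5, "g", "g", 0.0),
    (5, 6, "s", "s", 10.0),
]
FINALS = {5: 0.0, 6: 0.0}  # final state -> final weight


def path_weight(word):
    """Sum the weights along the path reading `word` from state 0.

    Returns None if the word is not accepted. Assumes at most one
    matching transition per state and input symbol.
    """
    state, total = 0, 0.0
    for symbol in word:
        for src, dst, inp, _out, weight in TRANSITIONS:
            if src == state and inp == symbol:
                state, total = dst, total + weight
                break
        else:
            return None  # no transition for this symbol
    if state not in FINALS:
        return None  # ended in a non-final state
    return total + FINALS[state]


def penalty(probability):
    """Turn a probability into a weight using -log(p)."""
    return -math.log(probability)


for word in ("cat", "dog", "cats", "dogs"):
    print(word, path_weight(word))
```

With these assumptions ''cats'' costs 1 + 10 and ''dogs'' costs 2 + 10, and <code>penalty</code> assigns a larger weight to a smaller probability.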