ATT format

From Apertium
Latest revision as of 21:24, 13 March 2017

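ATT format (AT&T format) is a plain-text, tab-separated format for describing finite-state transducers. Each transition is written on one line with four columns: source state, target state, input symbol and output symbol. An optional fifth column holds a weight. A line containing only a state number (optionally followed by a weight) marks that state as final.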

Both lttoolbox and HFST can read ATT format as input to compile dictionaries (lt-comp, hfst-txt2fst), and print compiled dictionaries to ATT format (lt-print, hfst-fst2txt).

Example

Say we want to represent the following transducer:

[Figure: Test att.png]

We can write it as a dictionary, compile it with lt-comp, and print the result in ATT format with lt-print:

$ cat test.dix 
<dictionary>
  <alphabet>abcdefghijklmnopqrstuvwxyz</alphabet>
  <sdefs>
    <sdef n="n"/>
  </sdefs>
  <section id="main" type="standard">
    <e><p><l>test</l><r>foo</r></p></e>
  </section>
</dictionary>


$ lt-comp lr test.dix test.bin
main@standard 5 4


$ lt-print test.bin 
0	1	t	f	
1	2	e	o	
2	3	s	o	
3	4	t	ε	
4
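
To make the column layout concrete, the following is a minimal sketch of a reader for the lt-print output above. The names `parse_att`, `Transition`, `SAMPLE` and `EPSILON` are hypothetical and are not part of lttoolbox or HFST. The sketch assumes that epsilon is spelled ε as in the output above, and that a trailing tab on a line can be ignored.

```python
from typing import NamedTuple

EPSILON = "ε"  # epsilon as printed in the lt-print output above

class Transition(NamedTuple):
    source: int
    target: int
    input: str
    output: str
    weight: float

def parse_att(text):
    """Parse ATT text into (list of transitions, dict of final state -> weight)."""
    transitions = []
    finals = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        # A trailing tab leaves an empty last field; drop it.
        while fields and fields[-1] == "":
            fields.pop()
        if len(fields) in (1, 2):
            # Final state, with an optional weight.
            weight = float(fields[1]) if len(fields) == 2 else 0.0
            finals[int(fields[0])] = weight
        elif len(fields) in (4, 5):
            # Transition, with an optional weight in the fifth column.
            weight = float(fields[4]) if len(fields) == 5 else 0.0
            transitions.append(
                Transition(int(fields[0]), int(fields[1]), fields[2], fields[3], weight)
            )
        else:
            raise ValueError("unexpected ATT line: %r" % line)
    return transitions, finals

# The lt-print output from the example, including its trailing tabs.
SAMPLE = "0\t1\tt\tf\t\n1\t2\te\to\t\n2\t3\ts\to\t\n3\t4\tt\tε\t\n4\n"

transitions, finals = parse_att(SAMPLE)
# This transducer is a single path listed in order, so joining the symbols
# of consecutive lines spells out both sides of the entry.
surface = "".join(t.input for t in transitions if t.input != EPSILON)
analysis = "".join(t.output for t in transitions if t.output != EPSILON)
print(surface, "->", analysis)  # prints "test -> foo"
```

Joining the symbols in file order only works here because the example is a single unbranched path; a general transducer has to be followed arc by arc from the start state.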


Weights

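AT&T format also supports weights, for example to estimate likelihoods. A weight is written as an extra last column on a transition line, or after the state number on a final-state line. By default, weights are interpreted as penalties: the bigger (heavier) the weight, the worse the path. For example: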

0	1	c	c	1.000000
0	2	d	d	2.000000
1	3	a	a	0.000000
2	4	o	o	0.000000
3	5	t	t	0.000000
4	5	g	g	0.000000
5	6	s	s	10.000000
5	0.000000
6	0.000000

The transducer above gives a weight of 1 to cat and 2 to dog, with an additional penalty of 10 for being a plural. Weights are commonly estimated from probabilities, e.g. by taking the negative logarithm, -log().
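
As an illustration of how these weights add up, here is a minimal sketch that enumerates every accepted string of the weighted transducer above and sums the weights along its path. The names `accepted_paths`, `weight_from_probability` and `WEIGHTED_ATT` are hypothetical. The sketch assumes that state 0 is the start state, that the transducer has no cycles, and that every line carries an explicit weight, as in the example.

```python
import math

# The weighted example above, written with explicit tabs.
WEIGHTED_ATT = "\n".join([
    "0\t1\tc\tc\t1.000000",
    "0\t2\td\td\t2.000000",
    "1\t3\ta\ta\t0.000000",
    "2\t4\to\to\t0.000000",
    "3\t5\tt\tt\t0.000000",
    "4\t5\tg\tg\t0.000000",
    "5\t6\ts\ts\t10.000000",
    "5\t0.000000",
    "6\t0.000000",
])

def accepted_paths(att_text, start=0):
    """Map each accepted input string to the total weight of its path.

    Assumes an acyclic transducer whose lines all carry a weight.
    """
    arcs = {}    # source state -> list of (target state, input symbol, weight)
    finals = {}  # final state -> final weight
    for line in att_text.splitlines():
        fields = line.split("\t")
        if len(fields) == 2:
            finals[int(fields[0])] = float(fields[1])
        elif len(fields) == 5:
            src, dst, isym, _osym, weight = fields
            arcs.setdefault(int(src), []).append((int(dst), isym, float(weight)))
    results = {}
    stack = [(start, "", 0.0)]
    while stack:
        state, string, weight = stack.pop()
        if state in finals:
            # Total weight = sum of arc weights plus the final-state weight.
            results[string] = weight + finals[state]
        for dst, sym, arc_weight in arcs.get(state, []):
            stack.append((dst, string + sym, weight + arc_weight))
    return results

def weight_from_probability(p):
    """Turn a probability into a penalty-style weight using -log."""
    return -math.log(p)

paths = accepted_paths(WEIGHTED_ATT)
for word in sorted(paths):
    print(word, paths[word])
```

With -log weights, adding the weights along a path corresponds to multiplying the underlying probabilities, since -log(p*q) = -log(p) + -log(q). A probability of 1 maps to weight 0 and smaller probabilities map to larger penalties.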


See also