ATT format

ATT format is a tab-separated text format for describing transducers. Each transition is written on its own line in four columns: source state, target state, input symbol and output symbol.
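To make the column layout concrete, here is a minimal Python sketch of a line parser. It is not part of lttoolbox or HFST; the names `Transition`, `FinalState` and `parse_att_line` are made up for illustration. It assumes that a line with four fields (plus an optional fifth weight field) is a transition, that a line with one or two fields marks a final state, and that trailing empty fields can be ignored.

```python
from typing import NamedTuple, Optional, Union


class Transition(NamedTuple):
    source: int
    target: int
    input_symbol: str
    output_symbol: str
    weight: Optional[float]  # None when the line carries no weight


class FinalState(NamedTuple):
    state: int
    weight: Optional[float]


def parse_att_line(line: str) -> Union[Transition, FinalState]:
    """Parse one tab-separated ATT line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    # Assumption: a trailing tab just leaves an empty last field.
    while fields and fields[-1] == "":
        fields.pop()
    if len(fields) in (4, 5):
        weight = float(fields[4]) if len(fields) == 5 else None
        return Transition(int(fields[0]), int(fields[1]),
                          fields[2], fields[3], weight)
    if len(fields) in (1, 2):
        weight = float(fields[1]) if len(fields) == 2 else None
        return FinalState(int(fields[0]), weight)
    raise ValueError(f"not an ATT line: {line!r}")


print(parse_att_line("0\t1\tt\tf"))
print(parse_att_line("5\t0.000000"))
```

Using two small record types keeps transitions and final states apart, which is usually the first distinction any consumer of the format needs to make.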

Both lttoolbox and HFST can read ATT format as input to compile dictionaries (lt-comp, hfst-txt2fst), and print compiled dictionaries to ATT format (lt-print, hfst-fst2txt).


Say we want to represent the following transducer:

(Figure: Test att.png)

We can do it thusly:

$ cat test.dix 
    <sdef n="n"/>
  <section id="main" type="standard">

$ lt-comp lr test.dix test.bin
main@standard 5 4

$ lt-print test.bin 
0	1	t	f	
1	2	e	o	
2	3	s	o	
3	4	t	ε	
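The printed transitions can be read back as a single path through the transducer. The following Python sketch is only an illustration, not how lttoolbox works internally: the helper `read_path` is hypothetical, and it assumes that the start state is 0, that each state has at most one outgoing transition, that there are no cycles, and that `ε` marks an empty symbol.

```python
# The four transitions printed by lt-print above, tab-separated.
ATT = "\n".join([
    "0\t1\tt\tf\t",
    "1\t2\te\to\t",
    "2\t3\ts\to\t",
    "3\t4\tt\tε\t",
])


def read_path(att_text, epsilon="ε"):
    """Follow the single path from state 0 (hypothetical helper)."""
    arcs = {}  # source state -> (target state, input symbol, output symbol)
    for line in att_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip blank lines and final-state lines
        source, target, isym, osym = fields[:4]
        arcs[int(source)] = (int(target), isym, osym)

    state, input_side, output_side = 0, [], []
    while state in arcs:
        state, isym, osym = arcs[state]
        if isym != epsilon:
            input_side.append(isym)
        if osym != epsilon:
            output_side.append(osym)
    return "".join(input_side), "".join(output_side)


print(read_path(ATT))  # ('test', 'foo')
```

Reading the third column along the path gives `test`, and reading the fourth column, skipping `ε`, gives `foo`.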


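AT&T format supports "weights", for example to estimate likelihoods. They are written as an optional fifth column on transition lines and as a second column on final-state lines. The default interpretation is that the bigger (heavier) the weight, the worse the path is, i.e. weights act as penalties. For example: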

0	1	c	c	1.000000
0	2	d	d	2.000000
1	3	a	a	0.000000
2	4	o	o	0.000000
3	5	t	t	0.000000
4	5	g	g	0.000000
5	6	s	s	10.000000
5	0.000000
6	0.000000

This would be appropriate to give weight 1 to cat, weight 2 to dog, and an additional 10 pounds for being a plural. Weights are commonly estimated from probabilities, e.g. by taking -log() of the probability.
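The arithmetic can be checked with a short Python sketch. It is an illustration only: `parse`, `path_weight` and `probability_to_weight` are hypothetical helpers, the start state is assumed to be 0, and weights are assumed to add up along a path (the usual penalty-style reading, where the final-state weight is added at the end).

```python
import math

# The weighted transducer from the example above, tab-separated.
ATT = "\n".join([
    "0\t1\tc\tc\t1.000000",
    "0\t2\td\td\t2.000000",
    "1\t3\ta\ta\t0.000000",
    "2\t4\to\to\t0.000000",
    "3\t5\tt\tt\t0.000000",
    "4\t5\tg\tg\t0.000000",
    "5\t6\ts\ts\t10.000000",
    "5\t0.000000",
    "6\t0.000000",
])


def parse(att_text):
    """Split ATT text into weighted arcs and final states."""
    arcs, finals = {}, {}
    for line in att_text.splitlines():
        fields = line.split("\t")
        if len(fields) == 5:    # source, target, input, output, weight
            arcs[(int(fields[0]), fields[2])] = (int(fields[1]), float(fields[4]))
        elif len(fields) == 2:  # final state, weight
            finals[int(fields[0])] = float(fields[1])
    return arcs, finals


def path_weight(word, arcs, finals):
    """Sum the weights along the path for word; None if not accepted."""
    state, total = 0, 0.0
    for symbol in word:
        if (state, symbol) not in arcs:
            return None
        state, weight = arcs[(state, symbol)]
        total += weight
    if state not in finals:
        return None
    return total + finals[state]


def probability_to_weight(p):
    """Turn a probability into a penalty: likelier means lighter."""
    return -math.log(p)


arcs, finals = parse(ATT)
for word in ("cat", "cats", "dog", "dogs"):
    print(word, path_weight(word, arcs, finals))
# cat 1.0
# cats 11.0
# dog 2.0
# dogs 12.0
```

With -log(), a probability of 1 maps to weight 0 and smaller probabilities map to larger weights, which matches the "heavier is worse" interpretation.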

See also