Difference between revisions of "Post-generator"

From Apertium

Revision as of 12:10, 31 December 2014

Many languages use a post-generator, a finite-state transducer (FST) that fixes minor orthographic issues in generated text. The FST is in lttoolbox format and is run by lt-proc with the -p or --post-generation switch. One example of such an issue is the choice between "a" and "an" in English: the English generator outputs ~a, and the post-generation FST changes that to a or an depending on the following word.
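
To make the a/an behaviour concrete, here is a rough Python sketch of what the rule does to a piece of text. It is only an illustration: lt-proc does this with a compiled FST, not with a regular expression, and the function name post_generate and the plain a/e/i/o/u vowel test are assumptions made for this example.

```python
import re

# Vowel set used for the illustration; an assumption, not a complete
# description of when English uses "an".
VOWELS = "aeiou"

# "~a", the whitespace after it, and (via lookahead) the first character
# of the following word. The lookahead does not consume that character.
_MARKED_ARTICLE = re.compile(r"~a(\s+)(?=(\w))")


def post_generate(text):
    """Replace each marked article ~a with "a" or "an" (illustrative only)."""

    def choose_article(match):
        next_char = match.group(2).lower()
        article = "an" if next_char in VOWELS else "a"
        # Keep the original whitespace between the article and the word.
        return article + match.group(1)

    return _MARKED_ARTICLE.sub(choose_article, text)


print(post_generate("~a apple and ~a pear"))
```

A ~a with no following word is left untouched by this sketch.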

The source dictionary is typically named something like apertium-cat.post-cat.dix, while the compiled file gets a name like spa-cat.autopgen.bin.

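Here's a minimal example that turns ~a into an before a vowel. In the entry, <a/> stands for the ~ mark and <b/> for a blank, and the vocals paradigm lists the vowels that may follow: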

<?xml version="1.0" encoding="UTF-8"?>
<dictionary>
  <alphabet/>
  <sdefs>
    <sdef n="n" c="Noun"/>
  </sdefs>
  <pardefs>
    <pardef n="vocals">
      <e>
        <i>a</i>
      </e>
      <e>
        <i>e</i>
      </e>
      <e>
        <i>i</i>
      </e>
      <e>
        <i>o</i>
      </e>
      <e>
        <i>u</i>
      </e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e>
      <p>
        <l><a/>a<b/></l>
        <r>an<b/></r>
      </p>
      <par n="vocals"/>
    </e>
  </section>
</dictionary>
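
Because the entry in the main section depends on the vocals paradigm being defined, it can be handy to check such references before compiling. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; it is not part of lttoolbox, and the function name undefined_paradigms and the shortened dictionary string are made up for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the dictionary above (only two vowels), for illustration.
DIX = """<dictionary>
  <alphabet/>
  <pardefs>
    <pardef n="vocals">
      <e><i>a</i></e>
      <e><i>e</i></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e>
      <p><l><a/>a<b/></l><r>an<b/></r></p>
      <par n="vocals"/>
    </e>
  </section>
</dictionary>"""


def undefined_paradigms(dix_xml):
    """Return the names used in <par n="..."/> that have no matching <pardef>."""
    root = ET.fromstring(dix_xml)
    defined = {pardef.get("n") for pardef in root.iter("pardef")}
    used = {par.get("n") for par in root.iter("par")}
    return used - defined


# An empty set means every referenced paradigm is defined.
print(undefined_paradigms(DIX))
```

This only checks that paradigm names line up; it says nothing about whether the dictionary compiles or behaves as intended.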