Difference between revisions of "Post-generator"
Revision as of 12:09, 31 December 2014
Many languages use a post-generator FST (finite-state transducer) to fix minor orthographical issues. This FST is in lttoolbox format and is run by lt-proc with the -p or --post-generation switch. One such issue is the choice between "a" and "an" in English: the English generator outputs ~a, and the post-generation FST changes that to "a" or "an" depending on the following word.
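The behaviour described above can be imitated in a few lines of Python. This is only a toy model of the "a"/"an" case, not how lt-proc works internally: the function name post_generate and the regular expression are made up for illustration, only lowercase vowels are checked, and dropping the ~ when no rule applies is an assumption based on the statement that ~a ends up as either "a" or "an".

```python
import re

# Lowercase vowels only; a simplification chosen for this sketch.
VOWELS = "aeiou"

# "~a" plus one blank, but only when the next character is a vowel.
# The lookahead keeps the vowel itself out of the replaced span.
AN_RULE = re.compile(r"~a (?=[" + VOWELS + r"])")


def post_generate(text: str) -> str:
    """Toy model of the a/an fix; hypothetical helper, not part of lttoolbox."""
    # Step 1: the marked article becomes "an" before a vowel.
    text = AN_RULE.sub("an ", text)
    # Step 2 (assumption): any mark left over is simply dropped,
    # so "~a" before a consonant surfaces as plain "a".
    return text.replace("~", "")


if __name__ == "__main__":
    print(post_generate("I ate ~a orange and ~a banana"))
```

The lookahead is used so that the following word is inspected but left untouched, which mirrors the idea that the decision depends on the next word while only the article changes.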
The source dictionary is typically named something like apertium-cat.post-cat.dix, while the compiled file gets a name like spa-cat.autopgen.bin.
Here's a minimal example:
<?xml version="1.0" encoding="UTF-8"?>
<dictionary>
  <alphabet/>
  <sdefs>
    <sdef n="n" c="Noun"/>
  </sdefs>
  <pardefs>
    <pardef n="vocals">
      <e>
        <i>a</i>
      </e>
      <e>
        <i>e</i>
      </e>
      <e>
        <i>i</i>
      </e>
      <e>
        <i>o</i>
      </e>
      <e>
        <i>u</i>
      </e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e>
      <p>
        <l><a/>a<b/></l>
        <r>an<b/></r>
      </p>
      <par n="vocals"/>
    </e>
  </section>
</dictionary>
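To see which strings the single entry in the example stands for, the following sketch expands it by hand using Python's standard xml.etree module. It is an illustration under stated assumptions, not a description of what the lttoolbox compiler does: the helper names flatten and expand are hypothetical, the embedded dictionary is a trimmed copy of the example (the empty alphabet and the sdefs are left out), and the code only handles this exact layout, one pair followed by one paradigm reference. Here <a/> is rendered as the ~ mark and <b/> as a single blank.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example dictionary (alphabet and sdefs omitted).
DIX = """<dictionary>
  <pardefs>
    <pardef n="vocals">
      <e><i>a</i></e>
      <e><i>e</i></e>
      <e><i>i</i></e>
      <e><i>o</i></e>
      <e><i>u</i></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e>
      <p>
        <l><a/>a<b/></l>
        <r>an<b/></r>
      </p>
      <par n="vocals"/>
    </e>
  </section>
</dictionary>"""

# How the empty marker elements are shown in this sketch.
MARKS = {"a": "~", "b": " "}


def flatten(elem):
    """Join the text inside elem, rendering <a/> and <b/> as characters."""
    out = elem.text or ""
    for child in elem:
        out += MARKS.get(child.tag, "") + (child.tail or "")
    return out


def expand(dix_xml):
    """Return (input, output) string pairs for entries shaped <p> + <par>."""
    root = ET.fromstring(dix_xml)
    # Each paradigm becomes a list of the strings in its <i> elements.
    pardefs = {
        p.get("n"): [flatten(e.find("i")) for e in p.findall("e")]
        for p in root.iter("pardef")
    }
    pairs = []
    for entry in root.find("section").findall("e"):
        left = flatten(entry.find("p/l"))
        right = flatten(entry.find("p/r"))
        # <i> means the same text on both sides, so append it to each.
        for same in pardefs[entry.find("par").get("n")]:
            pairs.append((left + same, right + same))
    return pairs


if __name__ == "__main__":
    for source, target in expand(DIX):
        print(repr(source), "->", repr(target))
```

Each of the five pairs has the form "~a " plus a vowel on the input side and "an " plus the same vowel on the output side, which is the "an before a vowel" rule spelled out one vowel at a time.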