Ideas for Google Summer of Code/Flag diacritics in lttoolbox

From Apertium
Revision as of 11:49, 13 March 2010 by Francis Tyers

Flag diacritics are a mechanism used in the HFST tools that lets the writer of a transducer exclude impossible analyses at run-time, in cases where removing them from the transducer itself would make it much larger.

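This would allow us to handle languages with prefix inflection or circumfix inflection cleanly. In the proposed dictionary format below, a flag named ge records whether the ge- prefix is present, so that the prefix can only combine with the past tense: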

<dictionary>
  <alphabet/>
  <sdefs>
    <sdef n="verb"/>
    <sdef n="pres"/>
    <sdef n="past"/>
  </sdefs>
  <fdefs>
    <def-flag n="ge" c="ge- prefix">
      <flag-value n="0" c="ge- prefix not present"/>
      <flag-value n="1" c="ge- prefix present"/>
    </def-flag>
  </fdefs>
  <pardefs>
    <pardef n="ge__prefix">
      <e><p><l/><r/></p><f n="ge" v="0"/></e>
      <e><p><l>ge</l><r/></p><f n="ge" v="1"/></e>
    </pardef>
    <pardef n="breek__vblex">
      <e><p><l/><r><s n="verb"/><s n="pres"/></r></p><f n="ge" v="0"/></e>
      <e><p><l/><r><s n="verb"/><s n="past"/></r></p><f n="ge" v="1"/></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="breek"><par n="ge__prefix"/><i>breek</i><par n="breek__vblex"/></e>
  </section>
</dictionary>

Normal lt-expand output for this dictionary, ignoring the flags, would look like:

breek:breek<verb><pres>
breek:breek<verb><past>
gebreek:breek<verb><pres>
gebreek:breek<verb><past>

But with flag diacritics it would look like:

breek:breek<verb><pres>
gebreek:breek<verb><past>
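
The filtering above can be sketched in a few lines of Python. This is only an illustration, not how lttoolbox or HFST implement it. The data layout, the function names `flags_agree` and `expand`, and the rule that a path is rejected when the same flag is given two different values are all assumptions, since the proposal does not spell out the exact semantics of the `<f>` element.

```python
from itertools import product

# Each entry is (surface, lexical, flag), where flag is a (name, value)
# pair or None. The lists mirror the pardefs in the example dictionary;
# this layout is hypothetical, not lttoolbox's internal representation.
GE_PREFIX = [("", "", ("ge", "0")), ("ge", "", ("ge", "1"))]
STEM = [("breek", "breek", None)]
BREEK_VBLEX = [
    ("", "<verb><pres>", ("ge", "0")),
    ("", "<verb><past>", ("ge", "1")),
]


def flags_agree(path):
    """Return True if no flag is given two different values along the path."""
    seen = {}
    for _surface, _lexical, flag in path:
        if flag is None:
            continue
        name, value = flag
        # setdefault stores the first value seen and returns it afterwards.
        if seen.setdefault(name, value) != value:
            return False
    return True


def expand(parts, use_flags):
    """List surface:lexical pairs, optionally dropping inconsistent paths."""
    results = []
    for path in product(*parts):
        if use_flags and not flags_agree(path):
            continue
        surface = "".join(p[0] for p in path)
        lexical = "".join(p[1] for p in path)
        results.append(f"{surface}:{lexical}")
    return results


if __name__ == "__main__":
    for line in expand([GE_PREFIX, STEM, BREEK_VBLEX], use_flags=True):
        print(line)
```

Calling `expand` with `use_flags=False` lists all four combinations, while `use_flags=True` keeps only the two paths whose flag values agree. The transducer still contains every path; the check happens while the paths are walked, which is the point of doing it at run-time.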

Further reading

  • Beesley and Karttunen (2003) "Finite State Morphology" (CSLI Publications) ch. 8 "Flag diacritics"