Ideas for Google Summer of Code/Flag diacritics in lttoolbox
Revision as of 21:56, 17 March 2010 by Francis Tyers
Flag diacritics are a mechanism used in the HFST tools that lets the writer of a transducer exclude impossible analyses at run time, in cases where removing those analyses from the transducer itself would require duplicating large parts of it and make its size explode.
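Supporting them in lttoolbox would let us cleanly handle languages with prefix or circumfix inflection. In the following dictionary, which sketches one possible syntax (a <cdefs> section declaring the constraints and a <c> element attaching one to each paradigm entry), the ge- prefix is required for the past-tense analysis and ruled out for the present tense.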
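As background, the run-time behaviour of flag diacritics, as described in chapter 8 of Beesley and Karttunen, can be sketched in a few lines of Python. This is a minimal illustration, not HFST's implementation: it uses the Xerox-style @OP.FEATURE.VALUE@ notation, covers only the P (set), C (clear), R (require), D (disallow) and U (unify) operations, and omits N (negative set). The function names and the feature GE with values 0 and 1 in the usage lines are hypothetical.

```python
import re
from typing import Optional

# Xerox/HFST-style flag symbols: @OP.FEATURE.VALUE@ or @OP.FEATURE@
FLAG_RE = re.compile(r"^@([PRDCU])\.([^.@]+)(?:\.([^.@]+))?@$")


def check_flag(op: str, feature: str, value: Optional[str], state: dict) -> bool:
    """Apply one flag to the feature store in place; return False if the path is blocked."""
    if op in ("P", "U") and value is None:
        raise ValueError(f"@{op}.{feature}@ needs a value")
    current = state.get(feature)
    if op == "P":  # positive set: overwrite, always succeeds
        state[feature] = value
        return True
    if op == "C":  # clear: unset the feature, always succeeds
        state.pop(feature, None)
        return True
    if op == "R":  # require: feature is set (to this value, if one is given)
        return current is not None if value is None else current == value
    if op == "D":  # disallow: feature is unset (or not this value, if one is given)
        return current is None if value is None else current != value
    # op == "U", unify: succeeds if unset (and then sets it) or already equal
    if current is None:
        state[feature] = value
        return True
    return current == value


def surface_if_valid(path: list[str]) -> Optional[str]:
    """Walk one path; return its symbols without flags, or None if any flag fails."""
    state: dict = {}
    output = []
    for symbol in path:
        match = FLAG_RE.match(symbol)
        if match is None:
            output.append(symbol)  # ordinary symbol, copied to the output
        elif not check_flag(match.group(1), match.group(2), match.group(3), state):
            return None  # incompatible flags: this path is rejected at run time
    return "".join(output)


# Hypothetical encoding of the ge- example with a feature called GE
print(surface_if_valid(["@U.GE.1@", "ge", "breek", "@U.GE.1@"]))  # gebreek
print(surface_if_valid(["@U.GE.0@", "breek", "@U.GE.1@"]))  # None
```

The flags never appear in the output; they only decide whether a path is kept. This is why a single transducer can contain the incompatible paths without anyone seeing them.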
<dictionary>
  <alphabet/>
  <sdefs>
    <sdef n="verb"/>
    <sdef n="pres"/>
    <sdef n="past"/>
  </sdefs>
  <!-- Proposed: constraint (flag) definitions -->
  <cdefs>
    <cdef n="ge_0" c="ge- prefix not present"/>
    <cdef n="ge_1" c="ge- prefix present"/>
  </cdefs>
  <pardefs>
    <!-- Optional ge- prefix; each entry records whether it was taken -->
    <pardef n="ge__prefix">
      <e><p><l/><r/></p><c n="ge_0"/></e>
      <e><p><l>ge</l><r/></p><c n="ge_1"/></e>
    </pardef>
    <!-- Present tense requires no ge-, past tense requires ge- -->
    <pardef n="breek__vblex">
      <e><p><l/><r><s n="verb"/><s n="pres"/></r></p><c n="ge_0"/></e>
      <e><p><l/><r><s n="verb"/><s n="past"/></r></p><c n="ge_1"/></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="breek"><par n="ge__prefix"/><i>breek</i><par n="breek__vblex"/></e>
  </section>
</dictionary>
Without flag diacritics, i.e. ignoring the <c> elements, lt-expand would list every combination of prefix and ending:
breek:breek<verb><pres>
breek:breek<verb><past>
gebreek:breek<verb><pres>
gebreek:breek<verb><past>
With flag diacritics, the paths whose constraints disagree are discarded, leaving only:
breek:breek<verb><pres>
gebreek:breek<verb><past>
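The intended filtering can be modelled in a few lines of Python, assuming that each <c n="..."/> behaves like a unification flag: a path through the ge__prefix and breek__vblex paradigms survives only if both of its entries name the same constraint. The data structures and the expand function are hypothetical and only mirror the dictionary above; they do not call lttoolbox.

```python
from itertools import product

# (surface prefix, constraint) pairs, mirroring the ge__prefix paradigm
PREFIXES = [("", "ge_0"), ("ge", "ge_1")]
# (analysis tags, constraint) pairs, mirroring the breek__vblex paradigm
SUFFIXES = [("<verb><pres>", "ge_0"), ("<verb><past>", "ge_1")]
STEM = "breek"


def expand(use_flags: bool) -> list[str]:
    """List surface:analysis pairs, optionally dropping paths whose constraints disagree."""
    entries = []
    for (prefix, c_prefix), (tags, c_suffix) in product(PREFIXES, SUFFIXES):
        if use_flags and c_prefix != c_suffix:
            continue  # the constraint check fails, so this path is never output
        entries.append(f"{prefix}{STEM}:{STEM}{tags}")
    return entries


print(expand(use_flags=False))
print(expand(use_flags=True))
```

The first call lists all four prefix/ending combinations in the same order as the unfiltered lt-expand output; the second keeps only breek:breek<verb><pres> and gebreek:breek<verb><past>. In a real transducer the check would happen while traversing the single shared path through the stem, rather than by enumerating combinations as this sketch does.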
See also
Further reading
- Beesley and Karttunen (2003) "Finite State Morphology" (CSLI), ch. 8 "Flag diacritics"