Ideas for Google Summer of Code/Flag diacritics in lttoolbox

From Apertium

Flag diacritics are a mechanism used in the HFST tools that lets the writer of a transducer exclude impossible analyses at run time, in cases where removing them from the transducer itself would make it grow prohibitively large. Supporting them in lttoolbox would allow us to handle languages with prefix inflection or circumfix inflection cleanly.
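
As an illustration of the run-time filtering described above, here is a minimal Python sketch of how flag diacritics can prune paths. It assumes HFST/xfst-style flag symbols of the form @OP.FEATURE.VALUE@ and covers only the P (set), R (require), D (disallow) and C (clear) operations. The feature name GE, the value ON and the function names accepts and surface are invented for this example, and the code is not how HFST or lttoolbox implement the check.

```python
import re

# Flag symbols look like @OP.FEATURE.VALUE@ or @OP.FEATURE@ (value optional).
FLAG = re.compile(r"@([PRDC])\.([^.@]+)(?:\.([^.@]+))?@")


def accepts(path):
    """Return True if the flags met along a path never contradict each other.

    `path` is a list of symbols; anything that is not a flag is ignored here.
    """
    state = {}  # feature name -> value currently set
    for symbol in path:
        m = FLAG.fullmatch(symbol)
        if m is None:
            continue  # ordinary symbol, no effect on the flag state
        op, feat, val = m.groups()
        if op == "P":  # positive set: remember the value
            state[feat] = val
        elif op == "C":  # clear: forget the feature
            state.pop(feat, None)
        elif op == "R":  # require: feature must be set (to `val`, if given)
            if val is None:
                if feat not in state:
                    return False
            elif state.get(feat) != val:
                return False
        elif op == "D":  # disallow: feature must not be set (to `val`, if given)
            if val is None:
                if feat in state:
                    return False
            elif state.get(feat) == val:
                return False
    return True


def surface(path):
    """Concatenate the non-flag symbols of a path."""
    return "".join(s for s in path if FLAG.fullmatch(s) is None)
```

With these definitions, a path such as `["@P.GE.ON@", "ge", "breek", "@R.GE.ON@"]` is accepted, while `["breek", "@R.GE.ON@"]` is rejected because the required feature was never set. The transducer keeps a single copy of the stem, and the incompatible prefix and suffix combinations are discarded only when a path is followed.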


  • Add support for flag diacritics to the .dix format.
  • Add support for flag diacritics to lttoolbox.
  • Write a dictionary which demonstrates the use of flag diacritics (e.g. for Kurdish, Persian, Tajik, or some other language).

Coding challenge

  • Write a dictionary in the lexc formalism which uses flag diacritics to treat a particular linguistic feature (e.g. verb prefixes in Indo-Iranian languages).

Frequently asked questions

Format ideas

    <sdef n="verb"/>
    <sdef n="pres"/>
    <sdef n="past"/>
    <cdef n="ge:0" c="ge- prefix not present"/>
    <cdef n="ge:1" c="ge- prefix present"/>
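    <!-- Prefix paradigm: each entry is tagged with the constraint value it carries -->
    <pardef n="ge__prefix">
      <e><p><l/><r/></p><c n="ge:0"/></e>
      <e><p><l>ge</l><r/></p><c n="ge:1"/></e>
    </pardef>
    <!-- Verb endings: each is intended to combine only with a prefix entry carrying the same ge value -->
    <pardef n="breek__vblex">
      <e><p><l/><r><s n="verb"/><s n="pres"/></r></p><c n="ge:0"/></e>
      <e><p><l/><r><s n="verb"/><s n="past"/></r></p><c n="ge:1"/></e>
    </pardef>
  <section id="main" type="standard">
    <e lm="breek"><par n="ge__prefix"/><i>breek</i><par n="breek__vblex"/></e>
  </section>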
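
To make the intended effect of the constraints concrete, the following Python sketch expands the breek entry by hand, with and without constraint checking. It is only a sketch under stated assumptions: the two paradigms are transcribed manually rather than parsed from a .dix file, two entries are assumed to be compatible when constraints with the same name carry the same value, and the names compatible and expand are hypothetical. The surface:lexical layout is loosely modelled on lt-expand output and is not taken from a real run.

```python
from itertools import product

# (surface, lexical, constraint) triples, transcribed by hand from the
# example paradigms above; nothing here is read from a real .dix file.
GE_PREFIX = [("", "", "ge:0"), ("ge", "", "ge:1")]
BREEK_VBLEX = [("", "<verb><pres>", "ge:0"), ("", "<verb><past>", "ge:1")]


def compatible(a, b):
    """Assumed semantics: same constraint name must mean same value."""
    name_a, value_a = a.split(":")
    name_b, value_b = b.split(":")
    return name_a != name_b or value_a == value_b


def expand(stem, prefixes, suffixes, check_constraints=True):
    """Return surface:lexical strings for every prefix + stem + suffix pairing."""
    lines = []
    for (p_surf, p_lex, p_con), (s_surf, s_lex, s_con) in product(prefixes, suffixes):
        if check_constraints and not compatible(p_con, s_con):
            continue  # skip pairings whose constraints contradict each other
        lines.append(f"{p_surf}{stem}{s_surf}:{p_lex}{stem}{s_lex}")
    return lines


if __name__ == "__main__":
    print("\n".join(expand("breek", GE_PREFIX, BREEK_VBLEX)))
```

Under these assumptions, checking the constraints keeps only `breek:breek<verb><pres>` and `gebreek:breek<verb><past>`, whereas passing `check_constraints=False` also yields the two unwanted pairings, `breek:breek<verb><past>` and `gebreek:breek<verb><pres>`.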

Normal lt-expand output of this would look like:


But if you showed the constraints, it would look like:


See also

Further reading

  • Beesley and Karttunen (2003) "Finite State Morphology" (CSLI Publications) ch. 8 "Flag diacritics"