Ideas for Google Summer of Code/Flag diacritics in lttoolbox
Revision as of 13:44, 5 March 2012 by Jacob Nordfalk
Flag diacritics are a mechanism used in the HFST tools that lets the writer of a transducer exclude impossible analyses at run time, in cases where removing them from the transducer itself would make it explode in size. Supporting them in lttoolbox would let us handle languages with prefix or circumfix inflection cleanly.
Some work on flag diacritics has already been done in lttoolbox-java.
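To make the run-time check concrete, here is a minimal sketch in Python of how a path through a transducer could be accepted or rejected by its flags. It assumes the `@OP.FEATURE.VALUE@` notation used by the HFST and Xerox tools and covers only a simplified subset of the operators (P = set, R = require, D = disallow, U = unify, C = clear). The function name `accept` and the example paths are hypothetical, and this is not how HFST or lttoolbox-java actually implement the check.

```python
import re

# Simplified flag syntax: @OP.FEATURE@ or @OP.FEATURE.VALUE@
FLAG = re.compile(r"^@([PRDUC])\.([^.@]+)(?:\.([^.@]+))?@$")


def accept(path):
    """Return the surface string if the flags on `path` are consistent, else None."""
    features = {}  # current value of each feature; absent means unset
    surface = []
    for symbol in path:
        match = FLAG.match(symbol)
        if match is None:
            surface.append(symbol)  # ordinary symbol, not a flag
            continue
        op, feat, val = match.groups()
        current = features.get(feat)
        if op == "P":  # set the feature unconditionally
            features[feat] = val
        elif op == "C":  # clear the feature
            features.pop(feat, None)
        elif op == "R":  # require: feature must be set (to val, if given)
            if current is None or (val is not None and current != val):
                return None
        elif op == "D":  # disallow: fail if set (to val, if given)
            if current is not None and (val is None or current == val):
                return None
        elif op == "U":  # unify: set if unset, otherwise values must match
            if current is None:
                features[feat] = val
            elif current != val:
                return None
    return "".join(surface)


# The prefix sets ge=1, and the past-tense ending requires ge=1.
print(accept(["@P.ge.1@", "ge", "breek", "@R.ge.1@"]))  # gebreek
print(accept(["@P.ge.0@", "breek", "@R.ge.1@"]))        # None
```

The point of the design is that the prefix and the ending stay in a single shared network, and the mismatch is only detected while walking the path, so the stem lexicon does not need to be duplicated once per prefix class.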
Objectives
- Add support for flag diacritics to the .dix format.
- Add support for flag diacritics to lttoolbox.
- Write a dictionary which demonstrates the use of flag diacritics (e.g. for Kurdish, Persian, Tajik, or some other language)
Coding challenge
- Write a dictionary in the lexc formalism which uses flag diacritics to treat a particular linguistic feature (e.g. verb prefixes in Indo-Iranian languages).
Frequently asked questions
Format ideas
<dictionary>
  <alphabet/>
  <sdefs>
    <sdef n="verb"/>
    <sdef n="pres"/>
    <sdef n="past"/>
  </sdefs>
  <cdefs>
    <cdef n="ge:0" c="ge- prefix not present"/>
    <cdef n="ge:1" c="ge- prefix present"/>
  </cdefs>
  <pardefs>
    <pardef n="ge__prefix">
      <e><p><l/><r/></p><c n="ge:0"/></e>
      <e><p><l>ge</l><r/></p><c n="ge:1"/></e>
    </pardef>
    <pardef n="breek__vblex">
      <e><p><l/><r><s n="verb"/><s n="pres"/></r></p><c n="ge:0"/></e>
      <e><p><l/><r><s n="verb"/><s n="past"/></r></p><c n="ge:1"/></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="breek"><par n="ge__prefix"/><i>breek</i><par n="breek__vblex"/></e>
  </section>
</dictionary>
The normal lt-expand output for this dictionary would look like:
breek:breek<verb><pres>
gebreek:breek<verb><past>
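If the constraints were shown, the full expansion would look like this; only the entries whose constraints all agree (all ge:0 or all ge:1) appear in the normal output: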
breek[ge:0][ge:0]:breek[ge:0]<verb><pres>[ge:0]
breek[ge:0][ge:1]:breek[ge:0]<verb><past>[ge:1]
gebreek[ge:1][ge:0]:breek[ge:1]<verb><pres>[ge:0]
gebreek[ge:1][ge:1]:breek[ge:1]<verb><past>[ge:1]
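The filtering step between the two listings can be sketched in Python as follows. This is only an illustration of the proposed semantics, under the assumption that a constraint `name:value` is satisfied when every constraint with the same name along a path carries the same value. The data structures and the function names (`consistent`, `expand`) are hypothetical and do not correspond to anything in lttoolbox.

```python
from itertools import product

# Each piece is (left side, right side, constraint), mirroring the <e> entries above.
PREFIXES = [("", "", "ge:0"), ("ge", "", "ge:1")]
STEM = "breek"
ENDINGS = [("", "<verb><pres>", "ge:0"), ("", "<verb><past>", "ge:1")]


def consistent(constraints):
    """True if no constraint name occurs with two different values."""
    seen = {}
    for constraint in constraints:
        name, value = constraint.split(":")
        # setdefault stores the first value seen and returns it afterwards
        if seen.setdefault(name, value) != value:
            return False
    return True


def expand(check=True):
    """Expand prefix + stem + ending, optionally dropping inconsistent paths."""
    entries = []
    for (pre_l, pre_r, pre_c), (end_l, end_r, end_c) in product(PREFIXES, ENDINGS):
        if check and not consistent([pre_c, end_c]):
            continue
        entries.append(f"{pre_l}{STEM}{end_l}:{pre_r}{STEM}{end_r}")
    return entries


for line in expand():
    print(line)
# breek:breek<verb><pres>
# gebreek:breek<verb><past>
```

Calling `expand(check=False)` yields all four combinations, corresponding to the listing with the constraints shown.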
See also
Further reading
- Beesley and Karttunen (2003) "Finite State Morphology" (CSLI Publications), ch. 8 "Flag diacritics"