Difference between revisions of "Ideas for Google Summer of Code/Flag diacritics in lttoolbox"
Line 1: | Line 1: | ||
− | Flag diacritics are a method used in the [[HFST]] tools to allow the writer of a transducer to exclude impossible analyses at run-time, where removing them from the transducer would explode its size. |
+ | Flag diacritics are a method used in the [[HFST]] tools to allow the writer of a transducer to exclude impossible analyses at run-time, where removing them from the transducer would explode its size. This would allow us to nicely handle languages with prefix inflection, or with circumfix inflection |
+ | ==Objectives== |
||
− | This would allow us to nicely handle languages with prefix inflection, or with circumfix inflection |
||
+ | |||
+ | * Add support for flag diacritics to the <code>.dix</code> format. |
||
+ | * Add support for flag diacritics to [[lttoolbox]] |
||
+ | * Write a dictionary which demonstrates the use of flag diacritics (e.g. for Kurdish, Persian, Tajik, or some other language) |
||
+ | |||
+ | ==Coding challenge== |
||
+ | |||
+ | ==Frequently asked questions== |
||
+ | |||
+ | ==Format ideas== |
||
<pre> |
<pre> |
||
Line 12: | Line 22: | ||
</sdefs> |
</sdefs> |
||
<cdefs> |
<cdefs> |
||
− | <cdef n=" |
+ | <cdef n="ge:0" c="ge- prefix not present"/> |
− | <cdef n=" |
+ | <cdef n="ge:1" c="ge- prefix present"/> |
</cdefs> |
</cdefs> |
||
<pardefs> |
<pardefs> |
||
<pardef n="ge__prefix"> |
<pardef n="ge__prefix"> |
||
− | <e><p><l></l><r/></r></p><c n=" |
+ | <e><p><l></l><r/></r></p><c n="ge:0"/></e> |
− | <e><p><l>ge</l><r/></r></p><c n=" |
+ | <e><p><l>ge</l><r/></r></p><c n="ge:1"</e> |
</pardef> |
</pardef> |
||
<pardef n="breek__vblex"> |
<pardef n="breek__vblex"> |
||
− | <e><p><l/><r><s n="verb"/><s n="pres"/></r></p><c n=" |
+ | <e><p><l/><r><s n="verb"/><s n="pres"/></r></p><c n="ge:0"/></e> |
− | <e><p><l/><r><s n="verb"/><s n="past"/></r></p><c n=" |
+ | <e><p><l/><r><s n="verb"/><s n="past"/></r></p><c n="ge:1"/></e> |
</pardef> |
</pardef> |
||
</pardefs> |
</pardefs> |
||
Line 41: | Line 51: | ||
<pre> |
<pre> |
||
− | breek[ |
+ | breek[ge:0][ge:0]:breek[ge:0]<verb><pres>[ge:0] |
− | breek[ |
+ | breek[ge:0][ge:1]:breek[ge:0]<verb><past>[ge:1] |
− | gebreek[ |
+ | gebreek[ge:1][ge:0]:breek[ge:1]<verb><pres>[ge:0] |
− | gebreek[ |
+ | gebreek[ge:1][ge:1]:breek[ge:1]<verb><past>[ge:1] |
</pre> |
</pre> |
||
− | |||
− | |||
==See also== |
==See also== |
Revision as of 15:18, 4 March 2012
Flag diacritics are a mechanism used in the HFST tools that lets the writer of a transducer exclude impossible analyses at run-time, in cases where removing them from the transducer itself would make it explode in size. Supporting them would let us handle languages with prefix or circumfix inflection cleanly.
Objectives
- Add support for flag diacritics to the .dix format.
- Add support for flag diacritics to lttoolbox.
- Write a dictionary which demonstrates the use of flag diacritics (e.g. for Kurdish, Persian, Tajik, or some other language)
Coding challenge
Frequently asked questions
Format ideas
<dictionary>
  <alphabet/>
  <sdefs>
    <sdef n="verb"/>
    <sdef n="pres"/>
    <sdef n="past"/>
  </sdefs>
  <cdefs>
    <cdef n="ge:0" c="ge- prefix not present"/>
    <cdef n="ge:1" c="ge- prefix present"/>
  </cdefs>
  <pardefs>
    <pardef n="ge__prefix">
      <e><p><l/><r/></p><c n="ge:0"/></e>
      <e><p><l>ge</l><r/></p><c n="ge:1"/></e>
    </pardef>
    <pardef n="breek__vblex">
      <e><p><l/><r><s n="verb"/><s n="pres"/></r></p><c n="ge:0"/></e>
      <e><p><l/><r><s n="verb"/><s n="past"/></r></p><c n="ge:1"/></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="breek"><par n="ge__prefix"/><i>breek</i><par n="breek__vblex"/></e>
  </section>
</dictionary>
The normal lt-expand output for this dictionary would look like:
breek:breek<verb><pres>
gebreek:breek<verb><past>
But if you showed the constraints, it would look like:
breek[ge:0][ge:0]:breek[ge:0]<verb><pres>[ge:0]
breek[ge:0][ge:1]:breek[ge:0]<verb><past>[ge:1]
gebreek[ge:1][ge:0]:breek[ge:1]<verb><pres>[ge:0]
gebreek[ge:1][ge:1]:breek[ge:1]<verb><past>[ge:1]
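The relationship between the constrained expansion and the normal one can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how lttoolbox would implement it: it treats every `[feature:value]` marker as a flag, keeps a path only if all markers for the same feature carry the same value, and then strips the markers. The function names (`is_consistent`, `strip_flags`, `filter_expansions`) are hypothetical, and the check is a simplified agreement test rather than the full set of flag operations described in the literature.

```python
import re

# Matches one constraint marker such as "[ge:0]" (hypothetical notation
# taken from the constrained expansion shown on this page).
FLAG = re.compile(r"\[([^:\[\]]+):([^\[\]]+)\]")


def is_consistent(path: str) -> bool:
    """Return True if every marker for a given feature has the same value."""
    seen = {}
    for feature, value in FLAG.findall(path):
        # setdefault stores the first value seen and returns it afterwards,
        # so any later disagreement is detected here.
        if seen.setdefault(feature, value) != value:
            return False
    return True


def strip_flags(path: str) -> str:
    """Remove all constraint markers, leaving the plain surface:analysis pair."""
    return FLAG.sub("", path)


def filter_expansions(lines):
    """Keep only consistent paths and return them without markers."""
    return [strip_flags(line) for line in lines if is_consistent(line)]


CONSTRAINED = [
    "breek[ge:0][ge:0]:breek[ge:0]<verb><pres>[ge:0]",
    "breek[ge:0][ge:1]:breek[ge:0]<verb><past>[ge:1]",
    "gebreek[ge:1][ge:0]:breek[ge:1]<verb><pres>[ge:0]",
    "gebreek[ge:1][ge:1]:breek[ge:1]<verb><past>[ge:1]",
]

if __name__ == "__main__":
    for line in filter_expansions(CONSTRAINED):
        print(line)
```

Run on the four constrained lines, the sketch drops the two paths where the prefix and the paradigm disagree about `ge` and leaves the same two pairs as the normal lt-expand output. In a real implementation the check would presumably happen while traversing the transducer rather than as a filter over expanded strings, which is the point of doing it at run-time.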
See also
Further reading
- Karttunen and Beesley (2002) "Finite State Morphology" (CSLI) ch. 8 "Flag diacritics"