Difference between revisions of "Ideas for Google Summer of Code/Flag diacritics in lttoolbox"

From Apertium

Revision as of 15:18, 4 March 2012

Flag diacritics are a mechanism used in the HFST tools that lets the writer of a transducer exclude impossible analyses at run time, in cases where removing them from the transducer itself would make it explode in size. Supporting them in lttoolbox would allow us to handle languages with prefix or circumfix inflection cleanly.
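
As a rough illustration of the run-time filtering idea, the sketch below walks a path of symbols and applies HFST-style flag symbols of the form @OP.FEATURE.VALUE@. It is a simplified model, not the HFST implementation: only the P (set), R (require), D (disallow), U (unify) and C (clear) operations are handled, negative values are ignored, and the function name accept and the feature name GE are made up for the example. Symbols are whole strings here for brevity.

```python
import re

# HFST-style flag symbol: @OP.FEATURE@ or @OP.FEATURE.VALUE@
FLAG = re.compile(r"^@([PRDUC])\.([^.@]+)(?:\.([^.@]+))?@$")


def accept(path):
    """Return the surface string if every flag check passes, else None.

    Simplified model (assumption): no negative values, no N operation.
    """
    state = {}    # feature name -> current value
    surface = []  # ordinary (non-flag) symbols seen so far
    for symbol in path:
        m = FLAG.match(symbol)
        if m is None:
            surface.append(symbol)
            continue
        op, feat, val = m.groups()
        cur = state.get(feat)
        if op == "P":    # set the feature
            state[feat] = val
        elif op == "C":  # clear the feature
            state.pop(feat, None)
        elif op == "R":  # require: feature set (and equal to val, if given)
            if cur is None or (val is not None and cur != val):
                return None
        elif op == "D":  # disallow: fail if set (to val, if given)
            if cur is not None and (val is None or cur == val):
                return None
        elif op == "U":  # unify: set if unset, otherwise values must agree
            if cur is None:
                state[feat] = val
            elif cur != val:
                return None
    return "".join(surface)


# The prefix sets GE, and the ending checks it.
good = accept(["@P.GE.1@", "ge", "breek", "@R.GE.1@"])
bad = accept(["@P.GE.0@", "breek", "@R.GE.1@"])
```

In this model the transducer keeps both paths, and the mismatching one is discarded only when the flags are evaluated during lookup.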

Objectives

  • Add support for flag diacritics to the .dix format.
  • Add support for flag diacritics to lttoolbox.
  • Write a dictionary which demonstrates the use of flag diacritics (e.g. for Kurdish, Persian, Tajik, or some other language).

Coding challenge

Frequently asked questions

Format ideas

<dictionary>
  <alphabet/>
  <sdefs>
    <sdef n="verb"/>
    <sdef n="pres"/>
    <sdef n="past"/>
  </sdefs>
  <cdefs>
    <cdef n="ge:0" c="ge- prefix not present"/>
    <cdef n="ge:1" c="ge- prefix present"/>
  </cdefs>
  <pardefs>
    <pardef n="ge__prefix">
      <e><p><l/><r/></p><c n="ge:0"/></e>
      <e><p><l>ge</l><r/></p><c n="ge:1"/></e>
    </pardef>
    <pardef n="breek__vblex">
      <e><p><l/><r><s n="verb"/><s n="pres"/></r></p><c n="ge:0"/></e>
      <e><p><l/><r><s n="verb"/><s n="past"/></r></p><c n="ge:1"/></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="breek"><par n="ge__prefix"/><i>breek</i><par n="breek__vblex"/></e>
  </section>
</dictionary>

The normal lt-expand output for this dictionary would look like:

breek:breek<verb><pres>
gebreek:breek<verb><past>

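If the constraints were shown, all four combinations would appear. Only the paths whose constraints agree (the first and the last) survive in the normal output: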

breek[ge:0][ge:0]:breek[ge:0]<verb><pres>[ge:0]
breek[ge:0][ge:1]:breek[ge:0]<verb><past>[ge:1]
gebreek[ge:1][ge:0]:breek[ge:1]<verb><pres>[ge:0]
gebreek[ge:1][ge:1]:breek[ge:1]<verb><past>[ge:1]
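
The two listings above can be reproduced with a small sketch. It hard-codes the two paradigms from the example dictionary as Python tuples, combines every prefix entry with every ending, and either prints the constraints or drops the combinations whose constraints disagree. Reading "ge:0" as feature ge with value 0 is an assumption about the proposed notation, and the names expand and consistent are hypothetical; nothing here reflects how lttoolbox itself would implement this.

```python
from itertools import product

# Each entry is (left side, right side, constraint), mirroring the pardefs.
GE_PREFIX = [("", "", "ge:0"), ("ge", "", "ge:1")]
BREEK_VBLEX = [("", "<verb><pres>", "ge:0"), ("", "<verb><past>", "ge:1")]


def consistent(constraints):
    """True if no feature is given two different values (assumed semantics)."""
    seen = {}
    for constraint in constraints:
        feature, value = constraint.split(":", 1)
        if seen.setdefault(feature, value) != value:
            return False
    return True


def expand(stem, prefixes, endings, show_constraints=False):
    """List the expansions of prefix + stem + ending.

    With show_constraints=True every combination is listed with its
    constraints; otherwise inconsistent combinations are filtered out.
    """
    lines = []
    for (pl, pr, pc), (el, er, ec) in product(prefixes, endings):
        if show_constraints:
            lines.append(
                f"{pl}{stem}{el}[{pc}][{ec}]:{pr}{stem}[{pc}]{er}[{ec}]"
            )
        elif consistent([pc, ec]):
            lines.append(f"{pl}{stem}{el}:{pr}{stem}{er}")
    return lines


normal = expand("breek", GE_PREFIX, BREEK_VBLEX)
verbose = expand("breek", GE_PREFIX, BREEK_VBLEX, show_constraints=True)
print("\n".join(normal))
print("\n".join(verbose))
```

The point of the design is that the dictionary stays small (two prefix entries plus two endings) while the filter removes the two combinations that would otherwise over-generate.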

See also

Further reading

  • Karttunen and Beesley (2002) "Finite State Morphology" (CSLI) ch. 8 "Flag diacritics"