Difference between revisions of "Ideas for Google Summer of Code/Flag diacritics in lttoolbox"

From Apertium
 
(6 intermediate revisions by 2 users not shown)
Line 3: Line 3:
 
Flag diacritics are a mechanism used in the [[HFST]] tools that lets the writer of a transducer exclude impossible analyses at run-time, in cases where removing them from the transducer itself would explode its size. Supporting them in lttoolbox would let us handle languages with prefix or circumfix inflection cleanly.
 
   
  +
Some work on [[Flag diacritics]] has already been done in [[lttoolbox-java]].
==Objectives==
 
  +
  +
==Tasks==
   
 
* Add support for flag diacritics to the <code>.dix</code> format.
 
* Add support for flag diacritics to [[lttoolbox]]
+
* Add support for flag diacritics to [[lttoolbox]] (<code>lt-comp</code>, <code>lt-proc</code> and <code>lt-expand</code>)
* Write a dictionary which demonstrates the use of flag diacritics (e.g. for Kurdish, Persian, Tajik, or some other language)
+
* Write a dictionary which demonstrates the use of flag diacritics (e.g. for Armenian, Kurdish, Persian, Tajik, or some other language)
   
 
==Coding challenge==
 
Line 15: Line 17:
 
==Frequently asked questions==
 
   
  +
* none yet, ''[[contact|ask us]] something!'' :)
==Format ideas==
 
 
<pre>
<dictionary>
  <alphabet/>
  <sdefs>
    <sdef n="verb"/>
    <sdef n="pres"/>
    <sdef n="past"/>
  </sdefs>
  <cdefs>
    <cdef n="ge:0" c="ge- prefix not present"/>
    <cdef n="ge:1" c="ge- prefix present"/>
  </cdefs>
  <pardefs>
    <pardef n="ge__prefix">
      <e><p><l/><r/></p><c n="ge:0"/></e>
      <e><p><l>ge</l><r/></p><c n="ge:1"/></e>
    </pardef>
    <pardef n="breek__vblex">
      <e><p><l/><r><s n="verb"/><s n="pres"/></r></p><c n="ge:0"/></e>
      <e><p><l/><r><s n="verb"/><s n="past"/></r></p><c n="ge:1"/></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="breek"><par n="ge__prefix"/><i>breek</i><par n="breek__vblex"/></e>
  </section>
</dictionary>
</pre>
 
 
Normal <code>lt-expand</code> output of this would look like:
 
 
<pre>
breek:breek<verb><pres>
gebreek:breek<verb><past>
</pre>
 
 
But if you showed the constraints, it would look like:
 
 
<pre>
breek[ge:0][ge:0]:breek[ge:0]<verb><pres>[ge:0]
breek[ge:0][ge:1]:breek[ge:0]<verb><past>[ge:1]
gebreek[ge:1][ge:0]:breek[ge:1]<verb><pres>[ge:0]
gebreek[ge:1][ge:1]:breek[ge:1]<verb><past>[ge:1]
</pre>
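
The relationship between the two listings can be sketched in a few lines of Python. This is only an illustration of the idea, not how <code>lt-expand</code> or <code>lt-proc</code> work or would work: the function names (<code>is_consistent</code>, <code>strip_constraints</code>, <code>filter_paths</code>) are hypothetical, and the sketch assumes that a path is valid exactly when every constraint name (here <code>ge</code>) carries a single value along the whole path.

```python
import re

# Matches one constraint in the bracket notation used above, e.g. [ge:1].
# The notation itself is only the proposal from this page, not an existing format.
CONSTRAINT = re.compile(r"\[([^\[\]:]+):([^\[\]:]+)\]")


def is_consistent(path: str) -> bool:
    """Return True if no constraint name occurs with two different values."""
    seen = {}
    for name, value in CONSTRAINT.findall(path):
        # setdefault stores the first value seen for this name and returns it
        # on later occurrences, so a mismatch means the path is contradictory.
        if seen.setdefault(name, value) != value:
            return False
    return True


def strip_constraints(path: str) -> str:
    """Remove all constraint markers, leaving the plain surface:lexical pair."""
    return CONSTRAINT.sub("", path)


def filter_paths(paths):
    """Keep only consistent paths and print-ready strings without constraints."""
    return [strip_constraints(p) for p in paths if is_consistent(p)]


# The four constraint-annotated paths from the listing above.
expanded = [
    "breek[ge:0][ge:0]:breek[ge:0]<verb><pres>[ge:0]",
    "breek[ge:0][ge:1]:breek[ge:0]<verb><past>[ge:1]",
    "gebreek[ge:1][ge:0]:breek[ge:1]<verb><pres>[ge:0]",
    "gebreek[ge:1][ge:1]:breek[ge:1]<verb><past>[ge:1]",
]

for line in filter_paths(expanded):
    print(line)
```

Under that assumption, the two paths that mix <code>ge:0</code> and <code>ge:1</code> are discarded and the remaining two reduce to the normal output shown earlier. A real implementation would presumably check constraints while traversing the transducer rather than after expanding every path.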
 
   
 
==See also==
 
Line 71: Line 29:
   
 
[[Category:Ideas for Google Summer of Code|Flag diacritics in lttoolbox]]
 
  +
[[Category:Flag diacritics]]

Latest revision as of 06:40, 20 October 2014

Flag diacritics are a mechanism used in the HFST tools that lets the writer of a transducer exclude impossible analyses at run-time, in cases where removing them from the transducer itself would explode its size. Supporting them in lttoolbox would let us handle languages with prefix or circumfix inflection cleanly.

Some work on flag diacritics has already been done in lttoolbox-java.

Tasks

  • Add support for flag diacritics to the .dix format.
  • Add support for flag diacritics to lttoolbox (lt-comp, lt-proc and lt-expand)
  • Write a dictionary which demonstrates the use of flag diacritics (e.g. for Armenian, Kurdish, Persian, Tajik, or some other language)

Coding challenge

  • Write a dictionary in the lexc formalism which uses flag diacritics to treat a particular linguistic feature (e.g. verb prefixes in Indo-Iranian languages).

Frequently asked questions

  • none yet, ask us something! :)

See also

Further reading

  • Karttunen and Beesley (2002) "Finite State Morphology" (CSLI) ch. 8 "Flag diacritics"