Development ideas for dictionary format

From Apertium

Revision as of 11:21, 27 December 2011

The idea of this page is to collect ideas for how to expand the Apertium .dix format so that it could be a drop-in replacement for lexc. It already has several advantages over lexc: convenient validation, a more restrictive syntax, and support for multiword queues. The problem is that it either lacks some useful features that lexc has or supports them only awkwardly.

Archiphonemes

Perhaps use XML entities?

The option of just using <s> is pretty much out:

<e><p><l><s n="pron"/></l><r><s n="L"/><s n="A"/><s n="G"/><s n="I"/></r></p><par n="CASE"/></e>

for the lexc entry

%<pron%>:%>%{L%}%{I%}%{K%}%{I%} CASE ;

Something like:

<e><p><l><s n="pron"/></l><r>&L;&A;&G;&I;</r></p><par n="CASE"/></e>

Might be liveable? These would then be converted by the compiler into {L}{A}{G}{I} tags?
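To make the proposed conversion concrete, here is a minimal Python sketch of what the compiler step might do. The function name `entities_to_archiphonemes` is hypothetical and not part of lttoolbox, and the rule that only single-uppercase-letter entities count as archiphonemes is an assumption made for illustration.

```python
import re

# Assumption: archiphoneme entities are exactly one uppercase ASCII letter,
# e.g. &L; or &A;. Standard XML entities such as &amp; are left untouched.
ARCHIPHONEME_ENTITY = re.compile(r"&([A-Z]);")


def entities_to_archiphonemes(text: str) -> str:
    """Rewrite each single-letter entity reference (&L;) as a {L} symbol.

    Hypothetical helper for illustration; not an lttoolbox function.
    """
    return ARCHIPHONEME_ENTITY.sub(r"{\1}", text)


print(entities_to_archiphonemes("&L;&A;&G;&I;"))  # prints {L}{A}{G}{I}
```

In a real compiler the entities would more likely be declared in the DTD and expanded by the XML parser, so this string-level rewrite only illustrates the intended mapping.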

Morpheme boundary

Current tags:

  • <a> = "alarm"
  • <s> = "symbol"
  • <b> = "blank"
  • <j> = "join"
  • <g> = "group"

Ideally the new morpheme-boundary tag would also be a single letter.

Available (the letters not listed above and not used by <e>, <i>, <l>, <p> or <r>): c d f h k m n o q t u v w x y z
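The list of available letters can be checked with a short Python sketch. It assumes that the taken single-letter element names are the five tags listed above plus `<e>`, `<i>`, `<l>`, `<p>` and `<r>`, which .dix already uses for entries, identity, left, pair and right; the function name `available_letters` is hypothetical.

```python
import string

# Single-letter element names assumed to be in use already:
# the tags listed above (a, s, b, j, g) plus e, i, l, p, r.
TAKEN = set("asbjg") | set("eilpr")


def available_letters(taken: set[str]) -> list[str]:
    """Return the lowercase ASCII letters that are not in `taken`, in order."""
    return [c for c in string.ascii_lowercase if c not in taken]


print(" ".join(available_letters(TAKEN)))
# prints c d f h k m n o q t u v w x y z
```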

Flags

Phonology

Further reading

  • Anssi Yli-Jyrä (2011) "Explorations on Positionwise Flag Diacritics in Finite-State Morphology". NODALIDA
    • This paper adds flag diacritics for implementing morphophonology in a single-tape finite-state transducer (like lttoolbox, with no intersect/compose).