Development ideas for dictionary format
Revision as of 11:21, 27 December 2011
This page collects ideas for extending the Apertium .dix format so that it could serve as a drop-in replacement for lexc. The format already has several advantages over lexc: convenient validation, a more restrictive syntax, and support for multiword queues. The problem is that it does not support some useful lexc features, or supports them only awkwardly.
Archiphonemes
Perhaps use entities?
The option of just using <s> is pretty much out, e.g.
<e><p><l><s n="pron"/></l><r><s n="L"/><s n="A"/><s n="G"/><s n="I"/></r></p><par n="CASE"/></e>
for
%<pron%>:%>%{L%}%{I%}%{K%}%{I%} CASE ;
Something like:
<e><p><l><s n="pron"/></l><r>&L;&A;&G;&I;</r></p><par n="CASE"/></e>
might be liveable? The compiler would then convert these into {L}{A}{G}{I} tags?
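As a rough illustration of the entity idea above, the conversion step could look something like the following Python sketch. It is not how lttoolbox works; the function name `expand_archiphonemes` and the convention that a single uppercase-letter entity such as `&L;` stands for the archiphoneme `{L}` are assumptions made purely for illustration.

```python
import re

# Assumed convention (not part of the current .dix format): an entity made
# of one uppercase letter, e.g. &L;, denotes the archiphoneme {L}.
ENTITY = re.compile(r"&([A-Z]);")


def expand_archiphonemes(text: str) -> str:
    """Rewrite each single-letter entity &X; as the archiphoneme symbol {X}."""
    return ENTITY.sub(r"{\1}", text)


entry = '<e><p><l><s n="pron"/></l><r>&L;&A;&G;&I;</r></p><par n="CASE"/></e>'
print(expand_archiphonemes(entry))
```

Standard XML entities such as `&amp;` are left alone by this pattern, since they are not a single uppercase letter.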
Morpheme boundary
Current tags:
- <a> = "alarm"
- <s> = "symbol"
- <b> = "blank"
- <j> = "join"
- <g> = "group"
It is desirable that the morpheme boundary tag also be a single letter.
Available: c d f h k m n o q t u v w x y z
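The list of available letters can be rechecked with a few lines of Python. This sketch assumes that the single-letter element names already taken are the five tags listed above plus e, i, l, p and r (entry, identity, left, pair, right); the names `USED` and `available_letters` are made up for this example.

```python
import string

# Single-letter element names assumed to be in use already:
# a, s, b, j, g (listed above) plus e, i, l, p, r.
USED = set("asbjg") | set("eilpr")


def available_letters(used=USED):
    """Return the lowercase letters not yet used as element names."""
    return [c for c in string.ascii_lowercase if c not in used]


print(" ".join(available_letters()))
```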
Flags
Phonology
Further reading
- Anssi Yli-Jyrä (2011). "Explorations on Positionwise Flag Diacritics in Finite-State Morphology". NODALIDA.
- This paper adds flag diacritics for implementing morphophonology in a single-tape finite-state transducer (as in lttoolbox, with no intersect/compose).