Development ideas for dictionary format
Revision as of 11:16, 27 December 2011 by Francis Tyers
The idea of this page is to collect ideas for how to expand the Apertium .dix format so that it could be a drop-in replacement for lexc. The .dix format already has several advantages over lexc: convenient validation, a more restrictive syntax, and support for multiword queues. The problem is that it either does not support some useful lexc features, or supports them only awkwardly.
Archiphonemes
Perhaps use entities? The option of just using <s> is pretty much out, because entries become too verbose:
<e><p><l><s n="pron"/></l><r><s n="L"/><s n="A"/><s n="G"/><s n="I"/></r></p><par n="CASE"/></e>
as the counterpart of a lexc entry like:
%<pron%>:%>%{L%}%{I%}%{K%}%{I%} CASE ;
Something like:
<e><p><l><s n="pron"/></l><r>&L;&A;&G;&I;</r></p><par n="CASE"/></e>
Might be liveable? Perhaps the compiler would then convert these into {L}{A}{G}{I} tags?
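One way to try out the entity idea without touching the compiler is to rely on ordinary XML internal entities, which the XML parser expands before the entry is read. The sketch below is only an illustration: the DOCTYPE block and the helper name right_side are assumptions made for this example and are not part of the current .dix format, and it says nothing about how the compiler would treat the resulting {L}{A}{G}{I} text.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry: the <!ENTITY ...> declarations are an assumption for
# this sketch, not something the .dix format defines today.
ENTRY = """<!DOCTYPE e [
  <!ENTITY L "{L}">
  <!ENTITY A "{A}">
  <!ENTITY G "{G}">
  <!ENTITY I "{I}">
]>
<e><p><l><s n="pron"/></l><r>&L;&A;&G;&I;</r></p><par n="CASE"/></e>"""


def right_side(entry_xml: str) -> str:
    """Return the text of <r> after the parser has expanded the entities."""
    entry = ET.fromstring(entry_xml)
    right = entry.find("./p/r")
    return right.text or ""


if __name__ == "__main__":
    print(right_side(ENTRY))
```

With this approach the archiphoneme symbols arrive as plain text inside <r>, so the compiler would still need a rule for reading {L} as a single symbol rather than three characters.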
Morpheme boundary
Current tags:
<a> = "alarm"
<s> = "symbol"
<b> = "blank"
<j> = "join"
<g> = "group"
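It's desirable that the morpheme boundary tag also be a single letter.
Available (the letters e, i, l, p and r are already taken by <e>, <i>, <l>, <p> and <r>): c d f h k m n o q t u v w x y z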
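The list of available letters can be double-checked with a small set difference. This is a minimal sketch: the set of used letters combines the tags listed above with <e>, <i>, <l>, <p> and <r>, which is an assumption about which single-letter elements count as taken.

```python
import string

# Single-letter element names assumed to be in use already:
# a, s, b, j, g from the list above, plus e, i, l, p, r.
USED = set("asbjg") | set("eilpr")


def available_letters(used: set[str]) -> list[str]:
    """Return the lowercase ASCII letters not yet used as a tag name."""
    return [c for c in string.ascii_lowercase if c not in used]


if __name__ == "__main__":
    print(" ".join(available_letters(USED)))
```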