Replacement for flag diacritics

From Apertium
Revision as of 16:02, 14 June 2014

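People like to use flag diacritics to encode morphotactic restrictions, i.e. constraints on which morphemes may combine within a word. But they have drawbacks. They are hard to read, and because they are only interpreted at lookup time rather than being part of the network itself, they get in the way of ordinary finite-state operations on the transducer.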

The alternative: use distinct symbols with well-defined behaviours, and enforce the restrictions with ordinary finite-state operations instead.

We already have < and > for morphological tags, and { and } for archiphonemes and morphological features. We add a third type of symbol, delimited by [ and ], for modelling morphotactic restrictions.
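
Because each kind of symbol has its own delimiters, its role can be read off its spelling. As a hypothetical illustration (a Python sketch, not part of Apertium or HFST), an analysis string can be split into symbols and classified by delimiter:

```python
import re

# Hypothetical helper illustrating the delimiter conventions:
#   <...>  morphological tag
#   {...}  archiphoneme / morphological feature
#   [...]  morphotactic restriction symbol
# Any other character is treated as a single ordinary symbol.
SYMBOL_RE = re.compile(r"<[^<>]+>|\{[^{}]+\}|\[[^\[\]]+\]|.")

KINDS = {"<": "tag", "{": "archiphoneme", "[": "restriction"}


def tokenise(analysis: str) -> list[tuple[str, str]]:
    """Return (symbol, kind) pairs for every symbol in `analysis`."""
    result = []
    for match in SYMBOL_RE.finditer(analysis):
        sym = match.group(0)
        # Multi-character symbols are classified by their opening delimiter;
        # single characters are ordinary symbols.
        kind = KINDS.get(sym[0], "char") if len(sym) > 1 else "char"
        result.append((sym, kind))
    return result


if __name__ == "__main__":
    for sym, kind in tokenise("bil<v><tv><prog>[-aor]+i<cop><aor>[+aor]<p1><sg>"):
        print(f"{sym:8} {kind}")
```

In an actual lexicon these symbols are simply declared in Multichar_Symbols; the regular expression above only assumes that delimiters are never nested.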

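Example

The lexicon, test.lexc, models a toy Turkish verb, bil- 'know'. The aorist verb ending and the aorist copula both carry [+aor], and the progressive carries [-aor]. The lexicon lets the aorist verb form continue to the copula as well; the two-level constraint test-const.twol then removes every word in which [+aor] occurs more than once.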

Multichar_Symbols

%<v%> %<cop%> %<tv%> %<aor%> %<prog%> %<p1%> %<p3%> %<sg%>

%[%-aor%] %[%+aor%]

%+

LEXICON Root

Verbs ; 

LEXICON PERS

%<p1%>%<sg%>:im # ;
%<p3%>%<sg%>: # ;

LEXICON COP

%+i%<cop%>%<aor%>%[%+aor%]: PERS ;

LEXICON V-TV 

%<v%>%<tv%>%<aor%>%[%+aor%]:ir PERS ;
%<v%>%<tv%>%<aor%>%[%+aor%]:ir COP ;
%<v%>%<tv%>%<prog%>%[%-aor%]:iyor COP ;

LEXICON Verbs 

bil:bil V-TV ; ! ""
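
To see what the constraint has to remove, it helps to list every path through the lexicon before any constraint applies. The Python sketch below is a simplified, hypothetical model of the continuation classes above (not how hfst-lexc works internally): each LEXICON maps to (upper, lower, continuation) entries, and "#" ends the word.

```python
# Simplified model of test.lexc: LEXICON name -> list of
# (upper, lower, continuation) entries; "#" ends the word.
LEXICONS = {
    "Root": [("", "", "Verbs")],
    "Verbs": [("bil", "bil", "V-TV")],
    "V-TV": [
        ("<v><tv><aor>[+aor]", "ir", "PERS"),
        ("<v><tv><aor>[+aor]", "ir", "COP"),
        ("<v><tv><prog>[-aor]", "iyor", "COP"),
    ],
    "COP": [("+i<cop><aor>[+aor]", "", "PERS")],
    "PERS": [
        ("<p1><sg>", "im", "#"),
        ("<p3><sg>", "", "#"),
    ],
}


def paths(lexicon="Root", upper="", lower=""):
    """Yield every (analysis, surface) pair reachable from `lexicon`."""
    if lexicon == "#":
        yield upper, lower
        return
    for up, low, cont in LEXICONS[lexicon]:
        yield from paths(cont, upper + up, lower + low)


if __name__ == "__main__":
    for analysis, surface in paths():
        # Mark the paths that the twol constraint is meant to remove.
        mark = "  <- two [+aor]" if analysis.count("[+aor]") > 1 else ""
        print(f"{surface}:{analysis}{mark}")
```

The unconstrained lexicon yields six paths, two of which (aorist verb followed by the aorist copula) contain two [+aor] symbols.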

The constraint, test-const.twol:

Alphabet

 a b c d e f g h i j k l m n o p q r s t u v w x y z 

 %<v%> %<tv%> %<prog%> %<aor%> %<p1%> %<p2%> %<p3%> %<sg%> %<cop%>

 %[%+aor%]:0  %[%-aor%]:0  ;

Sets 

Verb = %<v%> ;

Rules 

"No consecutive [+aor] tags"
%[%+aor%]:0 /<= %[%+aor%]:0 :* _ ; 
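
The rule prohibits [+aor]:0 whenever an earlier [+aor]:0 occurs before it in the word, with any material in between, so at most one [+aor] survives per word; the Alphabet pairs map every [...] symbol to 0. The Python sketch below imitates that effect with a two-state automaton over the symbols of each analysis. It is an illustration assuming analyses are plain strings, not a reimplementation of hfst-twolc or hfst-compose-intersect.

```python
import re

# Tokeniser for tags <...>, restriction symbols [...] and single characters.
SYMBOL_RE = re.compile(r"<[^<>]+>|\[[^\[\]]+\]|.")

# The six (surface, analysis) paths produced by the unconstrained lexicon.
LEXICON_OUTPUT = [
    ("bilirim", "bil<v><tv><aor>[+aor]<p1><sg>"),
    ("bilir", "bil<v><tv><aor>[+aor]<p3><sg>"),
    ("bilirim", "bil<v><tv><aor>[+aor]+i<cop><aor>[+aor]<p1><sg>"),
    ("bilir", "bil<v><tv><aor>[+aor]+i<cop><aor>[+aor]<p3><sg>"),
    ("biliyorim", "bil<v><tv><prog>[-aor]+i<cop><aor>[+aor]<p1><sg>"),
    ("biliyor", "bil<v><tv><prog>[-aor]+i<cop><aor>[+aor]<p3><sg>"),
]


def allowed(analysis: str) -> bool:
    """Accept iff no [+aor] follows an earlier [+aor]."""
    state = 0  # 0: no [+aor] seen yet; 1: one [+aor] seen
    for sym in SYMBOL_RE.findall(analysis):
        if sym == "[+aor]":
            if state == 1:
                return False  # second [+aor]: the rule prohibits it
            state = 1
    return True


def delete_restrictions(analysis: str) -> str:
    """Map every [...] symbol to 0, like the [+aor]:0 and [-aor]:0 pairs."""
    return re.sub(r"\[[^\[\]]+\]", "", analysis)


def apply_constraint(pairs):
    """Keep allowed (surface, analysis) pairs with [...] symbols removed."""
    return [(s, delete_restrictions(a)) for s, a in pairs if allowed(a)]


if __name__ == "__main__":
    for surface, analysis in apply_constraint(LEXICON_OUTPUT):
        print(f"{surface}:{analysis}")
```

The four surviving pairs are the same as those in the hfst-fst2strings output, although possibly in a different order.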
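
Compile the lexicon and invert it (so that the analyses, which carry the [...] symbols, are on the output side), compile the constraint, and combine the two with hfst-compose-intersect:

$ hfst-lexc test.lexc | hfst-invert -o test.hfst
$ hfst-twolc test-const.twol -o const.hfst
$ hfst-compose-intersect -1 test.hfst -2 const.hfst | hfst-fst2strings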

biliyorim:bil<v><tv><prog>+i<cop><aor><p1><sg>
biliyor:bil<v><tv><prog>+i<cop><aor><p3><sg>
bilirim:bil<v><tv><aor><p1><sg>
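bilir:bil<v><tv><aor><p3><sg>

Only four of the six paths through the lexicon survive. The two in which the aorist verb form is followed by the aorist copula contain two [+aor] symbols and are removed by the constraint, and the [...] symbols have been deleted from the remaining analyses.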