Replacement for flag diacritics

From Apertium
Jump to navigation Jump to search

People like to use flag diacritics for stuff. But they are bad because they are ugly and get in the way of stuff.

Alternative: Use symbols and finite-state operations!

We have < and > for morphological tags, and { and } for archiphonemes and morphological features. We add a new type of symbol with [ and ] for modelling morphotactic restrictions.

Example

Multichar_Symbols

%<v%> %<cop%> %<tv%> %<aor%> %<prog%> %<p1% %<p3%> %<sg%>

%[%-aor%] %[%+aor%]

%+

LEXICON Root

Verbs ; 

LEXICON PERS

%<p1%>%<sg%>:im # ;
%<p3%>%<sg%>: # ;

LEXICON COP

%+i%<cop%>%<aor%>%[%+aor%]: PERS ;

LEXICON V-TV 

%<v%>%<tv%>%<aor%>%[%+aor%]:ir PERS ;
%<v%>%<tv%>%<aor%>%[%+aor%]:ir COP ;
%<v%>%<tv%>%<prog%>%[%-aor%]:iyor COP ;

LEXICON Verbs 

bil:bil V-TV ; ! ""
Alphabet

 a b c d e f g h i j k l m n o p q r s t u v w x y z 

 %<v%> %<tv%> %<prog%> %<aor%> %<p1%> %<p2%> %<p3%> %<sg%> %<cop%>

 %[%+aor%]:0  %[%-aor%]:0  ;

Sets 

Verb = %<v%> ;

Rules 

"No consecutive [+aor] tags"
%[%+aor%]:0 /<= %[%+aor%]:0 :* _ ; 
$ hfst-lexc test.lexc | hfst-invert -o test.hfst
$ hfst-twolc test-const.twol -o const.hfst
$ hfst-compose-intersect -1 test.hfst -2 const.hfst | hfst-fst2strings 

biliyorim:bil<v><tv><prog>+i<cop><aor><p1><sg>
biliyor:bil<v><tv><prog>+i<cop><aor><p3><sg>
bilirim:bil<v><tv><aor><p1><sg>
bilir:bil<v><tv><aor><p3><sg>