Replacement for flag diacritics

From Apertium

People like to use flag diacritics to enforce morphotactic restrictions. But they are ugly, they clutter the lexicon, and they get in the way of other processing.

Alternative: Use distinct symbols with well-defined behaviour, and restrict them with ordinary finite-state operations!

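We have < and > for morphological tags, and { and } for archiphonemes and morphological features. We add a new type of symbol with [ and ] for modelling morphotactic restrictions. In the examples below these symbols are written on the analysis side in lexc, constrained by a twolc rule, and paired with 0 in the twolc alphabet so that they are deleted from the final analyses.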
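
As a rough illustration of the three symbol classes, the following Python sketch splits a string into symbols and labels each one by its delimiters. The function name `classify_symbols`, the regular expression and the label strings are assumptions made for this example; they are not part of lexc, twolc or HFST.

```python
import re

# One alternative per delimiter pair; anything else is a single plain character.
SYMBOL = re.compile(r"<[^<>]*>|\{[^{}]*\}|\[[^\[\]]*\]|.")

# Hypothetical labels, keyed by the opening delimiter.
KINDS = {"<": "tag", "{": "archiphoneme/feature", "[": "restriction"}


def classify_symbols(text):
    """Return (symbol, kind) pairs for every symbol in text."""
    pairs = []
    for symbol in SYMBOL.findall(text):
        # A delimited symbol is longer than one character; a lone "<" is plain.
        kind = KINDS.get(symbol[0], "plain") if len(symbol) > 1 else "plain"
        pairs.append((symbol, kind))
    return pairs


if __name__ == "__main__":
    for symbol, kind in classify_symbols("bil<v><tv><aor>[+aor]"):
        print(symbol, kind)
```

Only the symbols labelled as restrictions are meant to be removed before the analysis reaches the user; tags stay in the output.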

Examples

Turkish

Multichar_Symbols

%<v%> %<cop%> %<tv%> %<aor%> %<prog%> %<p1%> %<p3%> %<sg%>

%[%-aor%] %[%+aor%]

%+

LEXICON Root

Verbs ; 

LEXICON PERS

%<p1%>%<sg%>:im # ;
%<p3%>%<sg%>: # ;

LEXICON COP

%+i%<cop%>%<aor%>%[%+aor%]: PERS ;

LEXICON V-TV 

%<v%>%<tv%>%<aor%>%[%+aor%]:ir PERS ;
%<v%>%<tv%>%<aor%>%[%+aor%]:ir COP ;
%<v%>%<tv%>%<prog%>%[%-aor%]:iyor COP ;

LEXICON Verbs 

bil:bil V-TV ; ! ""

$ cat test-const.twol

Alphabet

 a b c d e f g h i j k l m n o p q r s t u v w x y z 

 %<v%> %<tv%> %<prog%> %<aor%> %<p1%> %<p2%> %<p3%> %<sg%> %<cop%>

 %[%+aor%]:0  %[%-aor%]:0  ;

Sets 

Verb = %<v%> ;

Rules

! Forbid a [+aor] anywhere after an earlier [+aor] in the same string.
"No consecutive [+aor] tags"
%[%+aor%]:0 /<= %[%+aor%]:0 :* _ ;

$ hfst-lexc test.lexc | hfst-invert -o test.hfst
$ hfst-twolc test-const.twol -o const.hfst
$ hfst-compose-intersect -1 test.hfst -2 const.hfst | hfst-fst2strings 

biliyorim:bil<v><tv><prog>+i<cop><aor><p1><sg>
biliyor:bil<v><tv><prog>+i<cop><aor><p3><sg>
bilirim:bil<v><tv><aor><p1><sg>
bilir:bil<v><tv><aor><p3><sg>
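
To see why only four of the six paths through the lexicon survive, the filtering can be sketched in plain Python. This is a toy model and not how HFST works internally: the `LEXICONS` table is a hand transcription of the lexc entries above, and the helper names (`paths`, `allowed`, `strip_restrictions`, `generate`) are invented for this example.

```python
# Each entry is (analysis side, surface side, continuation lexicon).
# "#" marks the end of a word, as in lexc.
LEXICONS = {
    "Root": [("", "", "Verbs")],
    "Verbs": [("bil", "bil", "V-TV")],
    "V-TV": [
        ("<v><tv><aor>[+aor]", "ir", "PERS"),
        ("<v><tv><aor>[+aor]", "ir", "COP"),
        ("<v><tv><prog>[-aor]", "iyor", "COP"),
    ],
    "COP": [("+i<cop><aor>[+aor]", "", "PERS")],
    "PERS": [("<p1><sg>", "im", "#"), ("<p3><sg>", "", "#")],
}


def paths(lexicon="Root", analysis="", surface=""):
    """Yield every (analysis, surface) pair the toy lexicon can generate."""
    if lexicon == "#":
        yield analysis, surface
        return
    for upper, lower, continuation in LEXICONS[lexicon]:
        yield from paths(continuation, analysis + upper, surface + lower)


def allowed(analysis):
    """Mimic the rule: no [+aor] may follow an earlier [+aor]."""
    return analysis.count("[+aor]") <= 1


def strip_restrictions(analysis):
    """Mimic the [..]:0 alphabet pairs by deleting the restriction symbols."""
    return analysis.replace("[+aor]", "").replace("[-aor]", "")


def generate():
    """Return surface:analysis strings for the paths that pass the rule."""
    return sorted(
        f"{surface}:{strip_restrictions(analysis)}"
        for analysis, surface in paths()
        if allowed(analysis)
    )


if __name__ == "__main__":
    print("\n".join(generate()))
```

The two paths that go from the aorist entry in V-TV into COP carry two [+aor] symbols and are dropped; the remaining four correspond to the strings printed by hfst-fst2strings above.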

Persian

Multichar_Symbols

%<v%> %<tv%> %<pri%> %<cni%> %<prs%> %<p1%> %<p3%> %<sg%>

%[%-prs%] %[%+prs%] %[%-cni%] %[%+cni%]


%+

LEXICON Root

Prefix ; 

LEXICON Prefix

%[%+prs%]%[%-cni%]:be Verbs ; 
%[%-prs%]%[%+cni%]:mi Verbs ; 

%[%-prs%]%[%-cni%]: Verbs ;

LEXICON PERS

%<p1%>%<sg%>:im # ;
%<p3%>%<sg%>: # ;

LEXICON V-TV 

%<v%>%<tv%>%<cni%>%[%+cni%]%[%-prs%]: PERS ;
%<v%>%<tv%>%<prs%>%[%-cni%]%[%+prs%]: PERS ;
%<v%>%<tv%>%<pri%>%[%-cni%]%[%-prs%]: PERS ;

LEXICON Verbs

kardan:kard V-TV ; ! ""

$ cat prefix-const.twol

Alphabet

 a b c d e f g h i j k l m n o p q r s t u v w x y z 

 %<v%> %<tv%> %<p1%> %<p3%> %<sg%> %<pri%> %<prs%> %<cni%>

 %[%+prs%]:0  %[%-prs%]:0  %[%+cni%]:0  %[%-cni%]:0 

;

Sets 

Verb = %<v%> ;

Rules

! Forbid each Tx anywhere after its paired Ty: a [+x] after a [-x], and a [-x] after a [+x].
"Match prefixes"
Tx:0 /<= Ty:0 :* _ ;
   where
         Tx in ( %[%+cni%] %[%+prs%] %[%-cni%] %[%-prs%] )
         Ty in ( %[%-cni%] %[%-prs%] %[%+cni%] %[%+prs%] )  matched ;

$ hfst-lexc prefix.lexc | hfst-invert -o prefix.hfst
$ hfst-twolc prefix-const.twol -o prefix-const.hfst
$ hfst-compose-intersect -1 prefix.hfst -2 prefix-const.hfst | hfst-fst2strings 

bekardim:kardan<v><tv><prs><p1><sg>
bekard:kardan<v><tv><prs><p3><sg>
kardim:kardan<v><tv><pri><p1><sg>
kard:kardan<v><tv><pri><p3><sg>
mikardim:kardan<v><tv><cni><p1><sg>
mikard:kardan<v><tv><cni><p3><sg>
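
The effect of the matched where-clause can be sketched in Python as a list of (Tx, Ty) pairs, where Tx is rejected if it appears anywhere after Ty. This is only a model of the rule's intent under that reading: the names `FORBIDDEN`, `violates` and `surviving` are invented for this example, and the prefix and tense strings are transcribed by hand from the lexc file above.

```python
# Pair the two lists position by position, as "matched" does in the rule.
TX = ["[+cni]", "[+prs]", "[-cni]", "[-prs]"]
TY = ["[-cni]", "[-prs]", "[+cni]", "[+prs]"]
FORBIDDEN = list(zip(TX, TY))

# (analysis side, surface side) of the three Prefix entries.
PREFIXES = [("[+prs][-cni]", "be"), ("[-prs][+cni]", "mi"), ("[-prs][-cni]", "")]
# Analysis side of the three V-TV entries, without <v><tv>.
TENSES = ["<cni>[+cni][-prs]", "<prs>[-cni][+prs]", "<pri>[-cni][-prs]"]


def violates(analysis):
    """True if some Tx occurs after its paired Ty."""
    for tx, ty in FORBIDDEN:
        first_ty = analysis.find(ty)
        if first_ty != -1 and tx in analysis[first_ty + len(ty):]:
            return True
    return False


def surviving():
    """Return (prefix surface, tense tag) for every combination that passes."""
    kept = []
    for prefix, surface in PREFIXES:
        for tense in TENSES:
            analysis = prefix + "kardan<v><tv>" + tense
            if not violates(analysis):
                kept.append((surface, tense[:5]))  # tense[:5] is e.g. "<cni>"
    return kept


if __name__ == "__main__":
    for surface, tag in surviving():
        print(repr(surface), tag)
```

Of the nine prefix and tense combinations, only the three whose restriction symbols agree pass the check, which matches the pairing of be- with <prs>, mi- with <cni>, and no prefix with <pri> in the output above.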