Replacement for flag diacritics
Flag diacritics are a popular way to enforce morphotactic restrictions, but they clutter the lexicon and get in the way of other processing.
Alternative: use distinct symbols with well-defined behaviour, and enforce the restrictions with ordinary finite-state operations.
We have < and > for morphological tags, and { and } for archiphonemes and morphological features. We add a new type of symbol with [ and ] for modelling morphotactic restrictions.
Examples
Turkish
Multichar_Symbols

%<v%> %<cop%> %<tv%> %<aor%> %<prog%> %<p1%> %<p3%> %<sg%>
%[%-aor%] %[%+aor%]
%+ ;

LEXICON Root

Verbs ;

LEXICON PERS

%<p1%>%<sg%>:im # ;
%<p3%>%<sg%>: # ;

LEXICON COP

%+i%<cop%>%<aor%>%[%+aor%]: PERS ;

LEXICON V-TV

%<v%>%<tv%>%<aor%>%[%+aor%]:ir PERS ;
%<v%>%<tv%>%<aor%>%[%+aor%]:ir COP ;
%<v%>%<tv%>%<prog%>%[%-aor%]:iyor COP ;

LEXICON Verbs

bil:bil V-TV ; ! ""
Alphabet
 a b c d e f g h i j k l m n o p q r s t u v w x y z
 %<v%> %<tv%> %<prog%> %<aor%> %<p1%> %<p2%> %<p3%> %<sg%> %<cop%>
 %[%+aor%]:0 %[%-aor%]:0 ;

Sets

Verb = %<v%> ;

Rules

"No consecutive [+aor] tags"
%[%+aor%]:0 /<= %[%+aor%]:0 :* _ ;
$ hfst-lexc test.lexc | hfst-invert -o test.hfst
$ hfst-twolc test-const.twol -o const.hfst
$ hfst-compose-intersect -1 test.hfst -2 const.hfst | hfst-fst2strings
biliyorim:bil<v><tv><prog>+i<cop><aor><p1><sg>
biliyor:bil<v><tv><prog>+i<cop><aor><p3><sg>
bilirim:bil<v><tv><aor><p1><sg>
bilir:bil<v><tv><aor><p3><sg>
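To see what the "No consecutive [+aor] tags" rule does to the lexicon paths, here is a minimal sketch in plain Python. It does not use HFST; the helper names (`violates_no_repeat`, `strip_restrictions`) are made up for illustration, and the six paths are written out by hand from the Turkish lexicon above. The rule's left context `%[%+aor%]:0 :*` matches a [+aor] anywhere earlier in the string, so the sketch simply rejects any path containing the tag more than once, then deletes the bracket symbols as the `:0` pairs do.

```python
import re

# Matches restriction symbols such as [+aor] or [-aor] (hypothetical helper pattern).
RESTRICTION = re.compile(r"\[[+-][a-z]+\]")

def violates_no_repeat(path, tag="[+aor]"):
    """True if `tag` occurs after an earlier occurrence of itself."""
    return RESTRICTION.findall(path).count(tag) > 1

def strip_restrictions(path):
    """Delete the bracket symbols, mirroring the [..]:0 pairs in the twol alphabet."""
    return RESTRICTION.sub("", path)

# Analysis-side paths generated by the lexicon before any constraint applies.
paths = [
    "bil<v><tv><aor>[+aor]<p1><sg>",
    "bil<v><tv><aor>[+aor]<p3><sg>",
    "bil<v><tv><aor>[+aor]+i<cop><aor>[+aor]<p1><sg>",
    "bil<v><tv><aor>[+aor]+i<cop><aor>[+aor]<p3><sg>",
    "bil<v><tv><prog>[-aor]+i<cop><aor>[+aor]<p1><sg>",
    "bil<v><tv><prog>[-aor]+i<cop><aor>[+aor]<p3><sg>",
]

accepted = [strip_restrictions(p) for p in paths if not violates_no_repeat(p)]

for analysis in accepted:
    print(analysis)
```

The four analyses that survive are the same ones that appear on the right-hand side of the hfst-fst2strings output; the two aorist-plus-copula paths are dropped.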
Persian
Multichar_Symbols

%<v%> %<tv%> %<pri%> %<cni%> %<prs%> %<p1%> %<p3%> %<sg%>
%[%-prs%] %[%+prs%] %[%-cni%] %[%+cni%]
%+

LEXICON Root

Prefix ;

LEXICON Prefix

%[%+prs%]%[%-cni%]:be Verbs ;
%[%-prs%]%[%+cni%]:mi Verbs ;
%[%-prs%]%[%-cni%]: Verbs ;

LEXICON PERS

%<p1%>%<sg%>:im # ;
%<p3%>%<sg%>: # ;

LEXICON V-TV

%<v%>%<tv%>%<cni%>%[%+cni%]%[%-prs%]: PERS ;
%<v%>%<tv%>%<prs%>%[%-cni%]%[%+prs%]: PERS ;
%<v%>%<tv%>%<pri%>%[%-cni%]%[%-prs%]: PERS ;

LEXICON Verbs

kardan:kard V-TV ; ! ""
$ cat prefix-const.twol
Alphabet
 a b c d e f g h i j k l m n o p q r s t u v w x y z
 %<v%> %<tv%> %<p1%> %<p3%> %<sg%> %<pri%> %<prs%> %<cni%>
 %[%+prs%]:0 %[%-prs%]:0 %[%+cni%]:0 %[%-cni%]:0 ;

Sets

Verb = %<v%> ;

Rules

"Match prefixes"
Tx:0 /<= Ty:0 :* _ ;
     where Tx in ( %[%+cni%] %[%+prs%] %[%-cni%] %[%-prs%] )
           Ty in ( %[%-cni%] %[%-prs%] %[%+cni%] %[%+prs%] )
     matched ;
$ hfst-lexc prefix.lexc | hfst-invert -o prefix.hfst
$ hfst-twolc prefix-const.twol -o prefix-const.hfst
$ hfst-compose-intersect -1 prefix.hfst -2 prefix-const.hfst | hfst-fst2strings
bekardim:kardan<v><tv><prs><p1><sg>
bekard:kardan<v><tv><prs><p3><sg>
kardim:kardan<v><tv><pri><p1><sg>
kard:kardan<v><tv><pri><p3><sg>
mikardim:kardan<v><tv><cni><p1><sg>
mikard:kardan<v><tv><cni><p3><sg>
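The "Match prefixes" rule pairs each restriction symbol with its opposite polarity (`matched`), so a path is rejected as soon as the same feature shows up with both + and -. The following plain-Python sketch, which does not call HFST and whose names (`consistent`, `surviving`, the two dictionaries) are invented for illustration, checks every prefix/tense combination from the Persian lexicon above under that reading of the rule.

```python
import re

# Captures the polarity and feature name of a restriction symbol, e.g. ("+", "prs").
TAG = re.compile(r"\[([+-])([a-z]+)\]")

def consistent(path):
    """True if no feature occurs with both polarities anywhere in the path."""
    seen = {}
    for sign, feature in TAG.findall(path):
        # Remember the first polarity seen for a feature; any later mismatch fails.
        if seen.setdefault(feature, sign) != sign:
            return False
    return True

# Surface prefix -> restriction symbols it carries (from LEXICON Prefix).
prefixes = {"be": "[+prs][-cni]", "mi": "[-prs][+cni]", "": "[-prs][-cni]"}
# Tense tag -> restriction symbols it carries (from LEXICON V-TV).
tenses = {"<cni>": "[+cni][-prs]", "<prs>": "[-cni][+prs]", "<pri>": "[-cni][-prs]"}

def surviving():
    """Return the (prefix, tense) pairs whose combined path passes the check."""
    pairs = []
    for prefix, prefix_tags in prefixes.items():
        for tense, tense_tags in tenses.items():
            path = prefix_tags + "kardan<v><tv>" + tense + tense_tags
            if consistent(path):
                pairs.append((prefix, tense))
    return pairs

for prefix, tense in surviving():
    print(repr(prefix), tense)
```

Of the nine possible combinations only three pass: be- with <prs>, mi- with <cni>, and the empty prefix with <pri>, which agrees with the forms listed in the hfst-fst2strings output.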