Difference between revisions of "Replacement for flag diacritics"
Jump to navigation
Jump to search
Line 112: | Line 112: | ||
</pre> |
</pre> |
||
<pre> |
|||
$ cat prefix-const.twol |
$ cat prefix-const.twol |
||
Revision as of 18:25, 14 June 2014
People like to use flag diacritics for stuff. But they are bad because they are ugly and get in the way of stuff.
Alternative: Use distinct symbols with well defined behaviours and finite-state operations!
We have <
and >
for morphological tags, and {
and }
for archiphonemes and morphological features. We add a new type of symbol with [
and ]
for modelling morphotactic restrictions.
Examples
Turkish
Multichar_Symbols %<v%> %<cop%> %<tv%> %<aor%> %<prog%> %<p1%> %<p3%> %<sg%> %[%-aor%] %[%+aor%] %+ ; LEXICON Root Verbs ; LEXICON PERS %<p1%>%<sg%>:im # ; %<p3%>%<sg%>: # ; LEXICON COP %+i%<cop%>%<aor%>%[%+aor%]: PERS ; LEXICON V-TV %<v%>%<tv%>%<aor%>%[%+aor%]:ir PERS ; %<v%>%<tv%>%<aor%>%[%+aor%]:ir COP ; %<v%>%<tv%>%<prog%>%[%-aor%]:iyor COP ; LEXICON Verbs bil:bil V-TV ; ! ""
Alphabet a b c d e f g h i j k l m n o p q r s t u v w x y z %<v%> %<tv%> %<prog%> %<aor%> %<p1%> %<p2%> %<p3%> %<sg%> %<cop%> %[%+aor%]:0 %[%-aor%]:0 ; Sets Verb = %<v%> ; Rules "No consecutive [+aor] tags" %[%+aor%]:0 /<= %[%+aor%]:0 :* _ ;
$ hfst-lexc test.lexc | hfst-invert -o test.hfst $ hfst-twolc test-const.twol -o const.hfst $ hfst-compose-intersect -1 test.hfst -2 const.hfst | hfst-fst2strings biliyorim:bil<v><tv><prog>+i<cop><aor><p1><sg> biliyor:bil<v><tv><prog>+i<cop><aor><p3><sg> bilirim:bil<v><tv><aor><p1><sg> bilir:bil<v><tv><aor><p3><sg>
Persian
Multichar_Symbols %<v%> %<tv%> %<pri%> %<cni%> %<prs%> %<p1%> %<p3%> %<sg%> %[%-prs%] %[%+prs%] %[%-cni%] %[%+cni%] %+ LEXICON Root Prefix ; LEXICON Prefix %[%+prs%]%[%-cni%]:be Verbs ; %[%-prs%]%[%+cni%]:mi Verbs ; %[%-prs%]%[%-cni%]: Verbs ; LEXICON PERS %<p1%>%<sg%>:im # ; %<p3%>%<sg%>: # ; LEXICON V-TV %<v%>%<tv%>%<cni%>%[%+cni%]%[%-prs%]: PERS ; %<v%>%<tv%>%<prs%>%[%-cni%]%[%+prs%]: PERS ; %<v%>%<tv%>%<pri%>%[%-cni%]%[%-prs%]: PERS ; LEXICON Verbs kardan:kard V-TV ; ! ""
$ cat prefix-const.twol Alphabet a b c d e f g h i j k l m n o p q r s t u v w x y z %<v%> %<tv%> %<p1%> %<p3%> %<sg%> %<pri%> %<prs%> %<cni%> %[%+prs%]:0 %[%-prs%]:0 %[%+cni%]:0 %[%-cni%]:0 ; Sets Verb = %<v%> ; Rules "Match prefixes" Tx:0 /<= Ty:0 :* _ ; where Tx in ( %[%+cni%] %[%+prs%] %[%-cni%] %[%-prs%] ) Ty in ( %[%-cni%] %[%-prs%] %[%+cni%] %[%+prs%] ) matched ;
$ hfst-lexc prefix.lexc | hfst-invert -o prefix.hfst $ hfst-twolc prefix-const.twol -o prefix-const.hfst $ hfst-compose-intersect -1 prefix.hfst -2 prefix-const.hfst | hfst-fst2strings bekardim:kardan<v><tv><prs><p1><sg> bekard:kardan<v><tv><prs><p3><sg> kardim:kardan<v><tv><pri><p1><sg> kard:kardan<v><tv><pri><p3><sg> mikardim:kardan<v><tv><cni><p1><sg> mikard:kardan<v><tv><cni><p3><sg>