Replacement for flag diacritics
Latest revision as of 22:04, 7 February 2017
[[Flag diacritics]] are a popular way to model morphotactic restrictions, but they are ugly and tend to get in the way of other finite-state processing.
Alternative: use distinct symbols with well-defined behaviour, and enforce the restrictions with ordinary finite-state operations!
We have <code><</code> and <code>></code> for morphological tags, and <code>{</code> and <code>}</code> for [[archiphonemes]] and morphological features. We add a new type of symbol with <code>[</code> and <code>]</code> for modelling morphotactic restrictions.
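The three bracket types make each symbol class recognisable from its first character alone. As a rough illustration only (plain Python, not part of HFST; the function name `classify` and the `KINDS` labels are made up for this sketch), an analysis string can be split into symbols and each symbol labelled by its bracket type:

```python
import re

# One alternative per bracketed symbol class; anything else is one plain character.
SYMBOL = re.compile(r"<[^<>]*>|\{[^{}]*\}|\[[^\[\]]*\]|.")

# Hypothetical labels for this sketch, keyed by the opening bracket.
KINDS = {"<": "tag", "{": "archiphoneme/feature", "[": "restriction"}

def classify(analysis):
    """Split a string into (symbol, kind) pairs based on the bracket type."""
    pairs = []
    for match in SYMBOL.finditer(analysis):
        symbol = match.group(0)
        # Single characters are ordinary letters, even if they look like a bracket.
        kind = KINDS.get(symbol[0], "char") if len(symbol) > 1 else "char"
        pairs.append((symbol, kind))
    return pairs

if __name__ == "__main__":
    for symbol, kind in classify("bil<v><tv><aor>[+aor]"):
        print(symbol, kind)
```

Because the restriction symbols form their own class, a later step can constrain or delete them without touching tags or archiphonemes.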
==Examples==
===Turkish===
<pre>
! test.lexc: toy Turkish verb lexicon
Multichar_Symbols

%<v%> %<cop%> %<tv%> %<aor%> %<prog%> %<p1%> %<p3%> %<sg%>
%[%-aor%] %[%+aor%]
%+ ;

LEXICON Root

Verbs ;

LEXICON PERS

%<p1%>%<sg%>:im # ;
%<p3%>%<sg%>: # ;

LEXICON COP

! the copula is always aorist, so it carries [+aor]
%+i%<cop%>%<aor%>%[%+aor%]: PERS ;

LEXICON V-TV

! every tense entry records whether it is aorist with [+aor] or [-aor]
%<v%>%<tv%>%<aor%>%[%+aor%]:ir PERS ;
%<v%>%<tv%>%<aor%>%[%+aor%]:ir COP ;
%<v%>%<tv%>%<prog%>%[%-aor%]:iyor COP ;

LEXICON Verbs

bil:bil V-TV ; ! ""
</pre>
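<pre>
! test-const.twol: constraints over the restriction symbols
Alphabet

a b c d e f g h i j k l m n o p q r s t u v w x y z
%<v%> %<tv%> %<prog%> %<aor%> %<p1%> %<p2%> %<p3%> %<sg%> %<cop%>
%[%+aor%]:0 %[%-aor%]:0 ; ! restriction symbols are always realised as 0

Sets

Verb = %<v%> ;

Rules

"No consecutive [+aor] tags"
! [+aor] may not occur anywhere after an earlier [+aor]
%[%+aor%]:0 /<= %[%+aor%]:0 :* _ ;
</pre>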
<pre>
$ hfst-lexc test.lexc | hfst-invert -o test.hfst
$ hfst-twolc test-const.twol -o const.hfst
$ hfst-compose-intersect -1 test.hfst -2 const.hfst | hfst-fst2strings
biliyorim:bil<v><tv><prog>+i<cop><aor><p1><sg>
biliyor:bil<v><tv><prog>+i<cop><aor><p3><sg>
bilirim:bil<v><tv><aor><p1><sg>
bilir:bil<v><tv><aor><p3><sg>
</pre>
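To make the effect of the Turkish rule easier to follow, here is a minimal sketch in plain Python. It does not use HFST at all: the continuation lexicons are transcribed by hand from the lexc file, and the helper names (`paths`, `allowed`, `strip_restrictions`, `analyses`) are invented for illustration. It enumerates every path through the lexicon, discards paths with a second <code>[+aor]</code>, and then deletes the restriction symbols, which is roughly what the rule and the <code>:0</code> alphabet pairs do together.

```python
# Continuation lexicons transcribed by hand from the lexc file:
# each entry is (analysis side, surface side, next lexicon); "#" ends the word.
LEXICONS = {
    "Root":  [("", "", "Verbs")],
    "Verbs": [("bil", "bil", "V-TV")],
    "V-TV":  [("<v><tv><aor>[+aor]", "ir", "PERS"),
              ("<v><tv><aor>[+aor]", "ir", "COP"),
              ("<v><tv><prog>[-aor]", "iyor", "COP")],
    "COP":   [("+i<cop><aor>[+aor]", "", "PERS")],
    "PERS":  [("<p1><sg>", "im", "#"),
              ("<p3><sg>", "", "#")],
}

def paths(lexicon="Root", analysis="", surface=""):
    """Yield every (surface, analysis) pair licensed by the lexicons alone."""
    if lexicon == "#":
        yield surface, analysis
        return
    for upper, lower, nxt in LEXICONS[lexicon]:
        yield from paths(nxt, analysis + upper, surface + lower)

def allowed(analysis):
    """Mimic the rule: no [+aor] may follow an earlier [+aor]."""
    return analysis.count("[+aor]") <= 1

def strip_restrictions(analysis):
    """Mimic the [..]:0 alphabet pairs by deleting the restriction symbols."""
    return analysis.replace("[+aor]", "").replace("[-aor]", "")

def analyses():
    """Return the surviving surface:analysis strings, sorted."""
    return sorted(f"{s}:{strip_restrictions(a)}" for s, a in paths() if allowed(a))

if __name__ == "__main__":
    for line in analyses():
        print(line)
```

Of the six paths through the lexicon, the two that attach the copula to an aorist stem are filtered out, leaving the same four forms as the <code>hfst-fst2strings</code> output.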
===Persian===
<pre>
! prefix.lexc: toy Persian verb lexicon with tense prefixes
Multichar_Symbols

%<v%> %<tv%> %<pri%> %<cni%> %<prs%> %<p1%> %<p3%> %<sg%>
%[%-prs%] %[%+prs%] %[%-cni%] %[%+cni%]
%+

LEXICON Root

Prefix ;

LEXICON Prefix

! each prefix records which tense it is compatible with
%[%+prs%]%[%-cni%]:be Verbs ;
%[%-prs%]%[%+cni%]:mi Verbs ;
%[%-prs%]%[%-cni%]: Verbs ;

LEXICON PERS

%<p1%>%<sg%>:im # ;
%<p3%>%<sg%>: # ;

LEXICON V-TV

! each tense tag repeats the restriction symbols its prefix must carry
%<v%>%<tv%>%<cni%>%[%+cni%]%[%-prs%]: PERS ;
%<v%>%<tv%>%<prs%>%[%-cni%]%[%+prs%]: PERS ;
%<v%>%<tv%>%<pri%>%[%-cni%]%[%-prs%]: PERS ;

LEXICON Verbs

kardan:kard V-TV ; ! ""
</pre>
<pre>
$ cat prefix-const.twol
Alphabet

a b c d e f g h i j k l m n o p q r s t u v w x y z
%<v%> %<tv%> %<p1%> %<p3%> %<sg%> %<pri%> %<prs%> %<cni%>
%[%+prs%]:0 %[%-prs%]:0 %[%+cni%]:0 %[%-cni%]:0 ;

Sets

Verb = %<v%> ;

Rules

"Match prefixes"
! a restriction symbol may not follow the opposite value of the same feature
Tx:0 /<= Ty:0 :* _ ;
     where
          Tx in ( %[%+cni%] %[%+prs%] %[%-cni%] %[%-prs%] )
          Ty in ( %[%-cni%] %[%-prs%] %[%+cni%] %[%+prs%] ) matched ;
</pre>
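The "Match prefixes" rule pairs each restriction symbol with its opposite and forbids the second from following the first. One way to read it, sketched in plain Python (again an illustration rather than HFST code; `consistent` and `RESTRICTION` are hypothetical names, and the check is generalised to any feature name, whereas the rule lists only `cni` and `prs`), is as a consistency check: a word is rejected as soon as one feature shows up with both signs.

```python
import re

# Matches restriction symbols such as [+prs] or [-cni].
RESTRICTION = re.compile(r"\[([+-])(\w+)\]")

def consistent(analysis):
    """Return False if some feature occurs with both + and - values."""
    seen = {}
    for sign, feature in RESTRICTION.findall(analysis):
        # Remember the first sign seen for a feature; any later mismatch is a conflict.
        if seen.setdefault(feature, sign) != sign:
            return False
    return True

if __name__ == "__main__":
    # Prefix "be" combined with the <prs> entry: both sides agree.
    print(consistent("[+prs][-cni]kardan<v><tv><prs>[-cni][+prs]<p1><sg>"))
    # Prefix "be" combined with the <cni> entry: the values clash.
    print(consistent("[+prs][-cni]kardan<v><tv><cni>[+cni][-prs]<p1><sg>"))
```

Each prefix in the lexicon agrees with exactly one of the three tense entries, so only one tense survives per prefix.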
<pre>
$ hfst-lexc prefix.lexc | hfst-invert -o prefix.hfst
$ hfst-twolc prefix-const.twol -o prefix-const.hfst
$ hfst-compose-intersect -1 prefix.hfst -2 prefix-const.hfst | hfst-fst2strings
bekardim:kardan<v><tv><prs><p1><sg>
bekard:kardan<v><tv><prs><p3><sg>
kardim:kardan<v><tv><pri><p1><sg>
kard:kardan<v><tv><pri><p3><sg>
mikardim:kardan<v><tv><cni><p1><sg>
mikard:kardan<v><tv><cni><p3><sg>
</pre>
== See also ==
* [[Morphotactic constraints with twol]]