Replacement for flag diacritics
Firespeaker (talk | contribs) |
== See also ==
* [[Morphotactic constraints with twol]]

[[Category:Development]]
[[Category:Flag diacritics]]
Latest revision as of 22:04, 7 February 2017
People like to use [[flag diacritics]] to model morphotactic restrictions, but they are ugly and tend to get in the way.
Alternative: use distinct symbols with well-defined behaviours, together with ordinary finite-state operations!
We have <code><</code> and <code>></code> for morphological tags, and <code>{</code> and <code>}</code> for [[archiphonemes]] and morphological features. We add a new type of symbol with <code>[</code> and <code>]</code> for modelling morphotactic restrictions.
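The three bracket conventions can be told apart mechanically. The following is only an illustrative Python sketch, not part of any Apertium or HFST tool: the names <code>SYMBOL</code>, <code>symbols</code> and <code>kind</code> are made up here, and the archiphoneme <code>{I}</code> in the usage line is a hypothetical example.

```python
import re

# One multichar symbol per bracket convention, or else a single character.
SYMBOL = re.compile(r"<[^<>]+>|\{[^{}]+\}|\[[^\[\]]+\]|.")

def symbols(string):
    """Split a lexical-side string into its symbols."""
    return SYMBOL.findall(string)

def kind(symbol):
    """Name the class a symbol belongs to, judging by its brackets only."""
    if len(symbol) > 1 and symbol[0] == "<":
        return "tag"
    if len(symbol) > 1 and symbol[0] == "{":
        return "archiphoneme or feature"
    if len(symbol) > 1 and symbol[0] == "[":
        return "morphotactic constraint"
    return "character"

# Hypothetical lexical string mixing all three symbol types.
classified = [(s, kind(s)) for s in symbols("bil{I}<v>[+aor]")]
```

Here <code>classified</code> pairs each symbol with its class, so the <code>[</code>...<code>]</code> symbols can be singled out without touching tags or archiphonemes.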
==Examples==
===Turkish===
<pre>
Multichar_Symbols

%<v%> %<cop%> %<tv%> %<aor%> %<prog%> %<p1%> %<p3%> %<sg%>

! Morphotactic constraint symbols
%[%-aor%] %[%+aor%]

%+ ;

LEXICON Root

Verbs ;

LEXICON PERS

%<p1%>%<sg%>:im # ;
%<p3%>%<sg%>: # ;

LEXICON COP

%+i%<cop%>%<aor%>%[%+aor%]: PERS ;

LEXICON V-TV

%<v%>%<tv%>%<aor%>%[%+aor%]:ir PERS ;
%<v%>%<tv%>%<aor%>%[%+aor%]:ir COP ;
%<v%>%<tv%>%<prog%>%[%-aor%]:iyor COP ;

LEXICON Verbs

bil:bil V-TV ; ! ""
</pre>

<pre>
Alphabet

a b c d e f g h i j k l m n o p q r s t u v w x y z

%<v%> %<tv%> %<prog%> %<aor%> %<p1%> %<p2%> %<p3%> %<sg%> %<cop%>

! Constraint symbols are always realised as zero
%[%+aor%]:0 %[%-aor%]:0 ;

Sets

Verb = %<v%> ;

Rules

! A [+aor] may not occur anywhere after an earlier [+aor]
"No consecutive [+aor] tags"
%[%+aor%]:0 /<= %[%+aor%]:0 :* _ ;
</pre>

<pre>
$ hfst-lexc test.lexc | hfst-invert -o test.hfst
$ hfst-twolc test-const.twol -o const.hfst
$ hfst-compose-intersect -1 test.hfst -2 const.hfst | hfst-fst2strings
biliyorim:bil<v><tv><prog>+i<cop><aor><p1><sg>
biliyor:bil<v><tv><prog>+i<cop><aor><p3><sg>
bilirim:bil<v><tv><aor><p1><sg>
bilir:bil<v><tv><aor><p3><sg>
</pre>
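To see why the aorist-plus-copula path is missing from the output, the effect of the rule can be imitated outside HFST. This is a rough Python sketch under stated assumptions: the helper names (<code>symbols</code>, <code>allowed</code>, <code>strip_constraints</code>) are invented for illustration, the path list is written out by hand from the lexc above (first person only), and nothing here reflects how hfst-twolc works internally.

```python
import re

# Multichar symbols (<tag> and [constraint]) or single characters.
SYMBOL = re.compile(r"<[^<>]+>|\[[^\[\]]+\]|.")

def symbols(path):
    return SYMBOL.findall(path)

def allowed(path):
    """Imitate the rule: no [+aor] may follow an earlier [+aor]."""
    return symbols(path).count("[+aor]") <= 1

def strip_constraints(path):
    """Imitate the [..]:0 pairs: constraint symbols surface as nothing."""
    return "".join(s for s in symbols(path) if not s.startswith("["))

# Lexical sides licensed by the lexc alone, before the constraint applies.
paths = [
    "bil<v><tv><aor>[+aor]<p1><sg>",                    # V-TV aorist, then PERS
    "bil<v><tv><aor>[+aor]+i<cop><aor>[+aor]<p1><sg>",  # V-TV aorist, then COP
    "bil<v><tv><prog>[-aor]+i<cop><aor>[+aor]<p1><sg>", # V-TV progressive, then COP
]

kept = [strip_constraints(p) for p in paths if allowed(p)]
```

The second path carries two <code>[+aor]</code> symbols and is filtered out; the two that remain correspond to the analyses of ''bilirim'' and ''biliyorim'' in the output above.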
===Persian===
<pre>
Multichar_Symbols

%<v%> %<tv%> %<pri%> %<cni%> %<prs%> %<p1%> %<p3%> %<sg%>

%[%-prs%] %[%+prs%] %[%-cni%] %[%+cni%]

%+

LEXICON Root

Prefix ;

LEXICON Prefix

%[%+prs%]%[%-cni%]:be Verbs ;
%[%-prs%]%[%+cni%]:mi Verbs ;
%[%-prs%]%[%-cni%]: Verbs ;

LEXICON PERS

%<p1%>%<sg%>:im # ;
%<p3%>%<sg%>: # ;

LEXICON V-TV

%<v%>%<tv%>%<cni%>%[%+cni%]%[%-prs%]: PERS ;
%<v%>%<tv%>%<prs%>%[%-cni%]%[%+prs%]: PERS ;
%<v%>%<tv%>%<pri%>%[%-cni%]%[%-prs%]: PERS ;

LEXICON Verbs

kardan:kard V-TV ; ! ""
</pre>

<pre>
$ cat prefix-const.twol
Alphabet

a b c d e f g h i j k l m n o p q r s t u v w x y z

%<v%> %<tv%> %<p1%> %<p3%> %<sg%> %<pri%> %<prs%> %<cni%>

%[%+prs%]:0 %[%-prs%]:0 %[%+cni%]:0 %[%-cni%]:0 ;

Sets

Verb = %<v%> ;

Rules

"Match prefixes"
Tx:0 /<= Ty:0 :* _ ;
    where Tx in ( %[%+cni%] %[%+prs%] %[%-cni%] %[%-prs%] )
          Ty in ( %[%-cni%] %[%-prs%] %[%+cni%] %[%+prs%] ) matched ;
</pre>

<pre>
$ hfst-lexc prefix.lexc | hfst-invert -o prefix.hfst
$ hfst-twolc prefix-const.twol -o prefix-const.hfst
$ hfst-compose-intersect -1 prefix.hfst -2 prefix-const.hfst | hfst-fst2strings
bekardim:kardan<v><tv><prs><p1><sg>
bekard:kardan<v><tv><prs><p3><sg>
kardim:kardan<v><tv><pri><p1><sg>
kard:kardan<v><tv><pri><p3><sg>
mikardim:kardan<v><tv><cni><p1><sg>
mikard:kardan<v><tv><cni><p3><sg>
</pre>
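The <code>matched</code> rule pairs each constraint symbol with its opposite polarity and forbids a symbol from appearing anywhere after its opposite. The Python sketch below imitates that filtering on hand-built lexical strings. It is an illustration only: <code>CONFLICT</code>, <code>allowed</code>, <code>PREFIXES</code> and <code>TENSES</code> are names invented here, the strings are copied from the lexc above (third person singular only), and the code makes no claim about how hfst-twolc compiles the rule.

```python
import re

# Multichar symbols (<tag> and [constraint]) or single characters.
SYMBOL = re.compile(r"<[^<>]+>|\[[^\[\]]+\]|.")

# Each constraint symbol and the symbol it may not follow (the matched pairs).
CONFLICT = {
    "[+cni]": "[-cni]",
    "[+prs]": "[-prs]",
    "[-cni]": "[+cni]",
    "[-prs]": "[+prs]",
}

def allowed(path):
    """Reject a path where a constraint symbol follows its opposite."""
    seen = set()
    for sym in SYMBOL.findall(path):
        if CONFLICT.get(sym) in seen:
            return False
        if sym in CONFLICT:
            seen.add(sym)
    return True

# Lexical sides of the Prefix and V-TV entries in the lexc above.
PREFIXES = ["[+prs][-cni]", "[-prs][+cni]", "[-prs][-cni]"]
TENSES = [
    "<v><tv><cni>[+cni][-prs]",
    "<v><tv><prs>[-cni][+prs]",
    "<v><tv><pri>[-cni][-prs]",
]

# The lexc alone licenses every prefix with every tense: nine paths.
paths = [p + "kardan" + t + "<p3><sg>" for p in PREFIXES for t in TENSES]
kept = [p for p in paths if allowed(p)]
```

Of the nine combinations only three survive, one per prefix, which lines up with the three third-person forms ''bekard'', ''mikard'' and ''kard'' in the output above.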