Difference between revisions of "Replacement for flag diacritics"

From Apertium

Revision as of 15:44, 14 June 2014

People like to use flag diacritics to model morphotactic restrictions, but they are ugly and get in the way.

Alternative: Use symbols and finite-state operations!

We have < and > for morphological tags, and { and } for archiphonemes and morphological features. We add a new type of symbol with [ and ] for modelling morphotactic restrictions.

Example

! test.lexc: lexicon with restriction symbols on the analysis side

Multichar_Symbols

%<v%> %<cop%> %<tv%> %<aor%> %<prog%> %<p1%> %<p3%> %<sg%>

%[%-aor%] %[%+aor%]

%+

LEXICON Root

Verbs ; 

LEXICON PERS

%<p1%>%<sg%>:im # ;
%<p3%>%<sg%>: # ;

LEXICON COP

%+i%<cop%>%<aor%>%[%+aor%]: PERS ;

LEXICON V-TV 

%<v%>%<tv%>%<aor%>%[%+aor%]:ar PERS ;
%<v%>%<tv%>%<aor%>%[%+aor%]:ar COP ;
%<v%>%<tv%>%<prog%>%[%-aor%]:iyor COP ;

LEXICON Verbs 

bil:bil V-TV ; ! ""

! test-const.twol: constraints over the restriction symbols

Alphabet

b i l m i y o r u m 

%<v%> %<tv%> %<prog%> %<aor%> %<p1%> %<p2%> %<p3%> %<sg%> %<cop%>

%[%+aor%]:0  %[%-aor%]:0 

;

Sets 

Verb = %<v%> ;

Rules 

"No consecutive +aor tags"
%[%+aor%]:0 /<= %[%+aor%]:0 :* _ ; 
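
To see what the rule above does, here is a rough Python sketch that works on plain strings rather than transducers. It is only an approximation: the function names are made up for illustration, and a regular expression stands in for the left context `%[%+aor%]:0 :* _`. An analysis is rejected if one `[+aor]` is preceded anywhere by another `[+aor]`. The surviving analyses then lose their bracket symbols, which is what the `:0` pairs in the Alphabet do.

```python
import re

# Hypothetical helpers; they imitate the twol rule on plain strings.
# "[+aor] ... [+aor]" mirrors the context "%[%+aor%]:0 :* _".
TWO_AOR = re.compile(r"\[\+aor\].*\[\+aor\]")
RESTRICTION = re.compile(r"\[[+-]aor\]")

def violates_rule(analysis: str) -> bool:
    """True if a [+aor] symbol has another [+aor] somewhere to its left."""
    return TWO_AOR.search(analysis) is not None

def strip_restrictions(analysis: str) -> str:
    """Delete the bracket symbols, like the X:0 pairs in the Alphabet."""
    return RESTRICTION.sub("", analysis)

candidates = [
    "bil<v><tv><aor>[+aor]<p1><sg>",
    "bil<v><tv><aor>[+aor]+i<cop><aor>[+aor]<p1><sg>",   # two [+aor]: rejected
    "bil<v><tv><prog>[-aor]+i<cop><aor>[+aor]<p1><sg>",
]

kept = [strip_restrictions(a) for a in candidates if not violates_rule(a)]
for analysis in kept:
    print(analysis)
```

The aorist stem followed by the aorist copula is the only candidate that gets filtered out.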
$ hfst-lexc test.lexc | hfst-invert -o test.hfst
$ hfst-twolc test-const.twol -o const.hfst
$ hfst-compose-intersect -1 test.hfst -2 const.hfst | hfst-fst2strings 

biliyorim:bil<v><tv><prog>+i<cop><aor><p1><sg>
biliyor:bil<v><tv><prog>+i<cop><aor><p3><sg>
bilarim:bil<v><tv><aor><p1><sg>
bilar:bil<v><tv><aor><p3><sg>
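
As a sanity check on the output above, the following Python sketch walks the continuation classes of the example lexicon by hand and applies the same restriction. It does not use HFST at all. The dictionary layout and the function names are assumptions made for illustration; only the entries themselves are taken from the lexc file.

```python
import re

# Continuation classes from the example: (analysis, surface, next class).
# None as the next class plays the role of "#".
LEXICON = {
    "Root": [("", "", "Verbs")],
    "Verbs": [("bil", "bil", "V-TV")],
    "V-TV": [
        ("<v><tv><aor>[+aor]", "ar", "PERS"),
        ("<v><tv><aor>[+aor]", "ar", "COP"),
        ("<v><tv><prog>[-aor]", "iyor", "COP"),
    ],
    "COP": [("+i<cop><aor>[+aor]", "", "PERS")],
    "PERS": [("<p1><sg>", "im", None), ("<p3><sg>", "", None)],
}

def paths(cls="Root"):
    """Yield every (analysis, surface) pair reachable from a class."""
    for analysis, surface, nxt in LEXICON[cls]:
        if nxt is None:
            yield analysis, surface
        else:
            for rest_analysis, rest_surface in paths(nxt):
                yield analysis + rest_analysis, surface + rest_surface

def allowed(analysis):
    """At most one [+aor] symbol may occur in an analysis."""
    return analysis.count("[+aor]") <= 1

def strip_restrictions(analysis):
    """Remove the bracket symbols from the analysis."""
    return re.sub(r"\[[+-]aor\]", "", analysis)

all_paths = list(paths())
pairs = [f"{s}:{strip_restrictions(a)}" for a, s in all_paths if allowed(a)]
for pair in pairs:
    print(pair)
```

Of the six paths through the lexicon, the two that combine the aorist stem with the aorist copula are dropped, leaving the four forms listed above.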