
Lexc and flag diacritics for prefix tagging

From Apertium
Revision as of 14:43, 7 November 2019 by TommiPirinen


This page describes an approach that uses lexc and flag diacritics to move the analysis tags to the end of the analysis (suffixing style) for prefixing languages.

The most straightforward way to tag a prefixing language in lexc would be as follows:

Multichar_Symbols
%<n%>
%<sg%> 
%<pl%>

LEXICON Root
0 Prefixes ;


LEXICON Prefixes
%<sg%>:0 NounRoots ;
%<pl%>:pl NounRoots ;

LEXICON NounRoots

%<n%>root1:0root1 # ;
%<n%>root2:0root2 # ;
%<n%>root3:0root3 # ;

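This creates the surface forms {root1, root2, root3, plroot1, plroot2, plroot3} with the analyses {%<sg%>%<n%>root1, %<sg%>%<n%>root2, ...}. Because the tags come before the lemma, these analyses do not work in the later stages of the Apertium pipeline, which expect the lemma first and the tags after it.

To make the analyses look like those of a typical European suffixing language, we need to move the tags to the end. One way to do this in lexc is with flag diacritics: a @P flag records which prefix or root was seen, and a matching @R flag at the end lets through only the path that outputs the corresponding tag.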

Multichar_Symbols
%<n%>
%<sg%>
%<pl%>

@P.NTAG.ON@
@R.NTAG.ON@
@P.SG.ON@
@R.SG.ON@
@P.PL.ON@
@R.PL.ON@

LEXICON Root
0 Prefixes ;


LEXICON Prefixes
@P.SG.ON@ NounRoots ;
@P.PL.ON@:@P.PL.ON@pl NounRoots ;

LEXICON NounRoots

@P.NTAG.ON@root1:@P.NTAG.ON@root1 ENDLEX1 ;
@P.NTAG.ON@root2:@P.NTAG.ON@root2 ENDLEX1 ;
@P.NTAG.ON@root3:@P.NTAG.ON@root3 ENDLEX1 ;

LEXICON ENDLEX1

@R.NTAG.ON@%<n%>:@R.NTAG.ON@ ENDLEX2 ;

LEXICON ENDLEX2

@R.SG.ON@%<sg%>:@R.SG.ON@ ENDLEX3 ;
@R.PL.ON@%<pl%>:@R.PL.ON@ ENDLEX3 ;

LEXICON ENDLEX3

# ;
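
The effect of the @P and @R flags can be checked outside lexc. The following is a minimal Python sketch, not part of HFST or Apertium. It assumes the simplified semantics that @P.F.V@ sets feature F to value V and that @R.F.V@ passes only if F currently has value V, and it hand-copies the lexicons above as (flag, analysis side, surface side) triples. The names flags_ok and pairs are illustrative only.

```python
from itertools import product

def flags_ok(flags):
    """Apply P (set) and R (require) flags in order; False if an R fails."""
    state = {}
    for flag in flags:
        op, feature, value = flag.strip("@").split(".")
        if op == "P":
            state[feature] = value
        elif op == "R" and state.get(feature) != value:
            return False
    return True

# Each entry: (flag diacritic, analysis-side string, surface-side string).
PREFIXES = [("@P.SG.ON@", "", ""), ("@P.PL.ON@", "", "pl")]
NOUN_ROOTS = [("@P.NTAG.ON@", r, r) for r in ("root1", "root2", "root3")]
ENDLEX1 = [("@R.NTAG.ON@", "<n>", "")]
ENDLEX2 = [("@R.SG.ON@", "<sg>", ""), ("@R.PL.ON@", "<pl>", "")]

# Enumerate every path through the lexicons and keep the consistent ones.
pairs = []
for path in product(PREFIXES, NOUN_ROOTS, ENDLEX1, ENDLEX2):
    if flags_ok([flag for flag, _, _ in path]):
        analysis = "".join(upper for _, upper, _ in path)
        surface = "".join(lower for _, _, lower in path)
        pairs.append((analysis, surface))

for analysis, surface in pairs:
    print(analysis, surface)
```

Of the twelve candidate paths, six survive: root1<n><sg> is paired with root1, root1<n><pl> with plroot1, and likewise for the other roots. Paths that combine the pl prefix with the <sg> tag, or the bare form with <pl>, are rejected by the @R flags.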

If some tags are optional, you need to add a path without the tag, guarded by a @D flag, which blocks that path whenever the corresponding feature has been set:

Multichar_Symbols

...
@D.NTAG@
...

LEXICON Verbs

foo ENDLEX1 ;

LEXICON ENDLEX1

@D.NTAG@ ENDLEX2 ;
@R.NTAG.ON@%<n%>:@R.NTAG.ON@ ENDLEX2 ;
...
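
To see why the @D path is needed, the two branches of ENDLEX1 can be tried against a root that sets NTAG and one that does not. This is a small Python sketch under assumed, simplified semantics (@P.F.V@ sets F, @R.F.V@ requires F to equal V, @D.F@ requires F to be unset). It is not HFST code, and flags_ok is a hypothetical helper.

```python
def flags_ok(flags):
    """Walk the flags left to right and stop at the first one that fails."""
    state = {}
    for flag in flags:
        op, feature, *rest = flag.strip("@").split(".")
        value = rest[0] if rest else None
        if op == "P":                       # set feature to value
            state[feature] = value
        elif op == "R" and state.get(feature) != value:
            return False                    # required value is not set
        elif op == "D" and value is None and feature in state:
            return False                    # feature must be unset
    return True

# A noun root sets NTAG, so only the tagged branch of ENDLEX1 is open to it.
noun_tagged = flags_ok(["@P.NTAG.ON@", "@R.NTAG.ON@"])
noun_untagged = flags_ok(["@P.NTAG.ON@", "@D.NTAG@"])

# A verb such as foo never sets NTAG, so only the untagged branch is open.
verb_tagged = flags_ok(["@R.NTAG.ON@"])
verb_untagged = flags_ok(["@D.NTAG@"])

print(noun_tagged, noun_untagged, verb_tagged, verb_untagged)
```

Without the @D.NTAG@ entry, foo would have no way through ENDLEX1 at all, because the @R.NTAG.ON@ entry fails for any path that never set NTAG.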

If several multivalued, optional tags are in complementary distribution, it may be easier to handle the whole set of tags in a single ENDLEX lexicon, like so:

Multichar_Symbols

@P.SUBJ.SG1@ @P.SUBJ.SG2@ ...
@R.SUBJ.SG1@ @R.SUBJ.SG2@ ...
@D.SUBJ@

LEXICON VerbPrefixes

@P.SUBJ.SG1@:@P.SUBJ.SG1@sg1 Verbs ;
@P.SUBJ.SG2@:@P.SUBJ.SG2@sg2 Verbs ;
...

LEXICON ENDLEX2

@D.SUBJ@ ENDLEX3 ;
@R.SUBJ.SG1@%<sg1%>:@R.SUBJ.SG1@ ENDLEX3 ;
...
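
The single-lexicon layout can be sketched in Python in the same spirit. The code below is an illustration rather than HFST behaviour verbatim: it assumes one feature SUBJ with the values SG1 and SG2, an ENDLEX2 that tests them with @R flags, a verb stem foo, and a hypothetical prefixless path to exercise the @D.SUBJ@ branch.

```python
from itertools import product

def flags_ok(flags):
    """Check P (set), R (require value) and D (disallow) flags in order."""
    state = {}
    for flag in flags:
        if flag is None:                    # entry without a flag diacritic
            continue
        op, feature, *rest = flag.strip("@").split(".")
        value = rest[0] if rest else None
        if op == "P":
            state[feature] = value
        elif op == "R" and state.get(feature) != value:
            return False
        elif op == "D" and feature in state and value in (None, state[feature]):
            return False
    return True

# Each entry: (flag diacritic or None, analysis side, surface side).
VERB_PREFIXES = [
    (None, "", ""),                         # hypothetical: no subject prefix
    ("@P.SUBJ.SG1@", "", "sg1"),
    ("@P.SUBJ.SG2@", "", "sg2"),
]
VERBS = [(None, "foo", "foo")]
ENDLEX2 = [
    ("@D.SUBJ@", "", ""),                   # no subject tag at all
    ("@R.SUBJ.SG1@", "<sg1>", ""),
    ("@R.SUBJ.SG2@", "<sg2>", ""),
]

forms = []
for path in product(VERB_PREFIXES, VERBS, ENDLEX2):
    if flags_ok([flag for flag, _, _ in path]):
        analysis = "".join(upper for _, upper, _ in path)
        surface = "".join(lower for _, _, lower in path)
        forms.append((analysis, surface))

for analysis, surface in forms:
    print(analysis, surface)
```

Because SUBJ can hold only one value at a time, at most one entry of ENDLEX2 matches any given path, so a single lexicon covers the whole set of mutually exclusive subject tags.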