Conjoined lexical units

From Apertium
Latest revision as of 08:31, 6 April 2021

A single reading of a surface form can correspond to multiple lexical units. When this happens, the lexical units are connected with +.

In the analyser, the conjoined whole is considered one lexical unit (all of it is within the same ^…$ in the output of the analyser), while during transfer they are considered separate (a single + will give two ^…$ ^…$).

Note also: This is not the same as ambiguity – one surface form can have several readings, and each reading may be a simple or conjoined lexical unit.
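The three notions above (one unit in the analyser, separate units in transfer, and ambiguity as a separate matter) can be sketched in a few lines of Python. This is only an illustration, not Apertium code: the function names are made up here, the second input is a hypothetical ambiguous analysis, and escaping of special characters in the stream format is ignored.

```python
def parse_unit(unit):
    """Split '^surface/reading1/reading2$' into (surface, [readings])."""
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("expected a ^...$ unit")
    surface, *readings = unit[1:-1].split("/")
    return surface, readings


def split_conjoined(reading):
    """Split one reading on '+' into its lexical units (no escape handling)."""
    return reading.split("+")


def to_transfer_units(reading):
    """Render one reading the way transfer sees it: one ^...$ per lexical unit."""
    return " ".join("^" + lu + "$" for lu in split_conjoined(reading))


# One reading, which happens to be conjoined.
surface, readings = parse_unit("^dog's/dog<n><sg>+'s<gen>$")
print(surface, readings)
print(to_transfer_units(readings[0]))

# Hypothetical ambiguous input: two readings, each of them conjoined.
_, ambiguous = parse_unit("^dog's/dog<n><sg>+'s<gen>/dog<n><sg>+be<vbser>$")
for r in ambiguous:
    print(split_conjoined(r))
```

Splitting on / gives the readings (ambiguity); splitting a single reading on + gives its lexical units (conjoining). The two are independent of each other.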

For some phenomena, the join is explicitly marked in the dictionary. An example is Catalan determiners, where a surface form like pel has a reading analysed as per<pr> + el<det>; this is encoded in the same way as the "English Possessives" example below. But + is also used for dynamic compound analysis in lttoolbox. There, a full path in the dictionary can be marked with a hidden tag saying "I can be a non-final compound part" or "I can be a final compound part", while the full path of the final analysis is not itself in the dictionary. For example, if vier and letter have analyses in the dictionary that carry the non-final hidden tag, and woorden has one with the final tag, then vierletterwoorden will be analysed as a compound of conjoined lexical units. See Compounds for more information on this.
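The idea behind dynamic compound analysis can be sketched as follows. This is a toy model, not how lttoolbox is implemented: the two dictionaries stand in for paths carrying the non-final and final hidden tags, and the analyses and tags given for vier, letter and woorden are illustrative assumptions.

```python
# Hypothetical toy dictionaries: surface form -> analysis.
# NON_FINAL models entries tagged "I can be a non-final compound part",
# FINAL models entries tagged "I can be a final compound part".
NON_FINAL = {"vier": "vier<num>", "letter": "letter<n><sg>"}
FINAL = {"woorden": "woord<n><pl>"}


def analyse_compound(surface, parts=()):
    """Return every analysis of surface as one or more non-final parts
    followed by exactly one final part, joined with '+'."""
    results = []
    # A final part may close the compound only after at least one non-final part.
    if parts and surface in FINAL:
        results.append("+".join(parts + (FINAL[surface],)))
    # Try every proper prefix as the next non-final part.
    for i in range(1, len(surface)):
        prefix = surface[:i]
        if prefix in NON_FINAL:
            results.extend(
                analyse_compound(surface[i:], parts + (NON_FINAL[prefix],))
            )
    return results


print(analyse_compound("vierletterwoorden"))
```

None of the three parts needs a dictionary entry for the whole word; the conjoined analysis is assembled at analysis time from the tagged parts.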

Minimal Example: English Possessives

An example of where conjoined lexical units might be useful is English plurals and possessives:

^dog/dog<n><sg>$
^dogs/dog<n><pl>$
^dog's/dog<n><sg>+'s<gen>$
^dogs'/dog<n><pl>+'s<gen>$

In monodix this is written with <j/>:

<pardef n="dog__n">
  <e><p> <l></l>    <r><s n="n"/><s n="sg"/></r>                   </p></e>
  <e><p> <l>s</l>   <r><s n="n"/><s n="pl"/></r>                   </p></e>
  <e><p> <l>'s</l>  <r><s n="n"/><s n="sg"/><j/>'s<s n="gen"/></r> </p></e>
  <e><p> <l>s'</l>  <r><s n="n"/><s n="pl"/><j/>'s<s n="gen"/></r> </p></e>
</pardef>

<e lm="dog"><i>dog</i><par n="dog__n"/></e>

In lexc this is written with %+:

LEXICON NounInfl
%<n%>%<sg%>:   # ;
%<n%>%<pl%>:s  # ;
%<n%>%<sg%>%+'s%<gen%>:'s  # ;
%<n%>%<pl%>%+'s%<gen%>:s'  # ;

LEXICON NounRoot
dog:dog NounInfl ;

In lexd this is written with +:

LEXICON NounNumPos
<n><sg>:
<n><pl>:s
<n><sg>+'s<gen>:'s
<n><pl>+'s<gen>:s'

LEXICON NounRoot
dog:dog

PATTERNS
NounRoot NounNumPos

More Involved Example: Chukchi Incorporation

Chukchi can incorporate nouns into verbs. A simplified example is given below:

LEXICON VerbRoot
амэчатык:амэчат
анӈатык:анӈат

LEXICON NounRoot
варат
ватап

PATTERN VerbStem
VerbRoot
NounRoot [<n><incorp>+:>{ы}] VerbRoot

PATTERNS
VerbStem [<v>:]

This generates the following forms:

^амэчат/амэчатык<v>$
^анӈат/анӈатык<v>$
^варат>{ы}амэчат/варат<n><incorp>+амэчатык<v>$
^варат>{ы}анӈат/варат<n><incorp>+анӈатык<v>$
^ватап>{ы}амэчат/ватап<n><incorp>+амэчатык<v>$
^ватап>{ы}анӈат/ватап<n><incorp>+анӈатык<v>$
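As a cross-check on the list above, the same six forms can be enumerated by hand in Python. This is just a sketch of what the patterns expand to, not a reimplementation of lexd; the variable and function names are made up for this example.

```python
from itertools import product

# (lemma, surface stem) pairs, copied from LEXICON VerbRoot above.
VERB_ROOTS = [("амэчатык", "амэчат"), ("анӈатык", "анӈат")]
# Noun roots, copied from LEXICON NounRoot above.
NOUN_ROOTS = ["варат", "ватап"]


def generate():
    forms = []
    # First VerbStem alternative: a bare verb root.
    for lemma, stem in VERB_ROOTS:
        forms.append(f"^{stem}/{lemma}<v>$")
    # Second alternative: noun root, then the incorporation join, then verb root.
    # The surface side gets '>{ы}', the analysis side gets '<n><incorp>+'.
    for noun, (lemma, stem) in product(NOUN_ROOTS, VERB_ROOTS):
        forms.append(f"^{noun}>{{ы}}{stem}/{noun}<n><incorp>+{lemma}<v>$")
    return forms


for form in generate():
    print(form)
```

Each incorporated form contains exactly one +, so transfer would see two lexical units: the incorporated noun and the verb.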

Equivalent lexc:

LEXICON Root
NounIncorp ;
VerbRoot ;

LEXICON NounIncorp
варат:варат NounIncorpInfl ;
ватап:ватап NounIncorpInfl ;

LEXICON NounIncorpInfl
%<n%>%<incorp%>%+:%>%{ы%} VerbRoot ;

LEXICON VerbRoot
амэчатык:амэчат VerbInfl ;
анӈатык:анӈат VerbInfl ;

LEXICON VerbInfl
%<v%>: # ;

Roughly equivalent monodix (replacing >{ы} with ы, since those symbols are only meaningful when composing with twol):

<pardef n="noun_root">
  <e><i>варат</i></e>
  <e><i>ватап</i></e>
</pardef>
<pardef n="verb_root">
  <e><p> <l>амэчат</l> <r>амэчатык</r> </p></e>
  <e><p> <l>анӈат</l>  <r>анӈатык</r>  </p></e>
</pardef>
<pardef n="verb_infl">
  <e><p> <l></l> <r><s n="v"/></r> </p></e>
</pardef>
<pardef n="noun_incorp_infl">
  <e><p> <l>ы</l> <r><s n="n"/><s n="incorp"/><j/></r> </p></e>
</pardef>

<e> <par n="verb_root"/> <par n="verb_infl"/> </e>
<e> <par n="noun_root"/> <par n="noun_incorp_infl"/> <par n="verb_root"/> <par n="verb_infl"/> </e>

See Also