Conjoined lexical units

From Apertium

Latest revision as of 08:31, 6 April 2021

A single reading of a surface form can correspond to multiple lexical units. When this happens, the lexical units are connected with +.

In the analyser, the conjoined whole is considered one lexical unit (all of it is within the same ^…$ in the output of the analyser), while during transfer they are considered separate (a single + will give two ^…$ ^…$).
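
The one-unit-versus-two distinction can be made concrete with a minimal Python sketch. It is an illustration only, not the code Apertium itself runs: the function name split_conjoined is hypothetical, and the sketch assumes that the unit carries a single analysis with no surface form and that a literal plus sign would be escaped as \+.

```python
import re


def split_conjoined(unit: str) -> list[str]:
    """Split one ^...$ unit into one ^...$ unit per conjoined part.

    Hypothetical helper for illustration; assumes a single analysis
    (no surface form, no ambiguity) and backslash-escaped literal '+'.
    """
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError(f"not a lexical unit: {unit!r}")
    body = unit[1:-1]
    # Split on every '+' that is not preceded by a backslash.
    parts = re.split(r"(?<!\\)\+", body)
    return [f"^{part}$" for part in parts]


if __name__ == "__main__":
    # One unit from the analyser becomes two units for transfer.
    print(" ".join(split_conjoined("^dog<n><sg>+'s<gen>$")))
```

A unit without + comes back unchanged as a one-element list, so the same function can be applied to every unit in a sentence.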

Note also: This is not the same as ambiguity – one surface form can have several readings, and each reading may be a simple or conjoined lexical unit.
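
The difference between ambiguity and conjoining shows up in which separator is involved: readings are separated by /, while the lexical units inside one reading are joined by +. The Python sketch below separates the two levels. The function name parse_unit is hypothetical, and the second reading in the usage example (the copula reading of dog's, with its tags) is an assumed analysis chosen for illustration, not one taken from a real dictionary.

```python
import re


def parse_unit(unit: str) -> tuple[str, list[list[str]]]:
    """Return (surface form, readings), each reading a list of lexical units.

    Hypothetical helper; assumes '/' and '+' are backslash-escaped
    when they occur literally inside a lemma.
    """
    body = unit.strip()[1:-1]  # drop the enclosing ^ and $
    surface, *readings = re.split(r"(?<!\\)/", body)
    return surface, [re.split(r"(?<!\\)\+", reading) for reading in readings]


if __name__ == "__main__":
    # Assumed example: two readings, each a conjoined lexical unit.
    surface, readings = parse_unit(
        "^dog's/dog<n><sg>+'s<gen>/dog<n><sg>+be<vbser><pri><p3><sg>$"
    )
    print(surface, len(readings), [len(r) for r in readings])
```

Here the outer list length is the number of readings (ambiguity) and each inner list length is the number of conjoined lexical units in that reading.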

For some phenomena, the join is marked explicitly in the dictionary. An example is Catalan determiners, where a surface form like pel has a reading analysed as per<pr> + el<det>; this is represented in the same way as in the "English Possessives" example below. But + is also used for dynamic compound analysis in lttoolbox. There, a full path in the dictionary can be marked with a hidden tag saying "I can be a non-final compound part" or "I can be a final compound part", while the full path of the compound analysis itself is not in the dictionary. For example, if vier and letter have analyses in the dictionary that carry the non-final hidden tag, and woorden has one that carries the final tag, then vierletterwoorden will be analysed as a compound of conjoined lexical units. See Compounds for more information.
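
The idea behind dynamic compounding can be sketched in a few lines of Python. This is not lttoolbox's algorithm: the two dictionaries stand in for paths carrying the hidden non-final and final tags, the function name analyse_compound is hypothetical, and the analyses given for vier, letter and woorden are assumed for illustration.

```python
def analyse_compound(surface, non_final, final):
    """List analyses of surface as non-final part(s) followed by a final part.

    non_final and final map surface strings to analyses; both are
    stand-ins for dictionary paths marked with the hidden compound tags.
    """
    results = []

    def walk(pos, parts):
        rest = surface[pos:]
        # Require at least one non-final part, so this is a real compound.
        if parts and rest in final:
            results.append("+".join(parts + [final[rest]]))
        for end in range(pos + 1, len(surface)):
            piece = surface[pos:end]
            if piece in non_final:
                walk(end, parts + [non_final[piece]])

    walk(0, [])
    return results


if __name__ == "__main__":
    # Assumed analyses, for illustration only.
    non_final = {"vier": "vier<num>", "letter": "letter<n>"}
    final = {"woorden": "woord<n><pl>"}
    print(analyse_compound("vierletterwoorden", non_final, final))
```

None of the three parts knows about the others, yet the whole word receives a single reading made of three conjoined lexical units.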

Minimal Example: English Possessives

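Conjoined lexical units are useful for English possessives, where the possessive marker is analysed as a separate lexical unit joined to the noun. The analyses below show singular and plural forms with and without it: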

^dog/dog<n><sg>$
^dogs/dog<n><pl>$
^dog's/dog<n><sg>+'s<gen>$
^dogs'/dog<n><pl>+'s<gen>$

In monodix this is written with <j/>:

<pardef n="dog__n">
  <e><p> <l></l>    <r><s n="n"/><s n="sg"/></r>                   </p></e>
  <e><p> <l>s</l>   <r><s n="n"/><s n="pl"/></r>                   </p></e>
  <e><p> <l>'s</l>  <r><s n="n"/><s n="sg"/><j/>'s<s n="gen"/></r> </p></e>
  <e><p> <l>s'</l>  <r><s n="n"/><s n="pl"/><j/>'s<s n="gen"/></r> </p></e>
</pardef>

<e lm="dog"><i>dog</i><par n="dog__n"/></e>
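
One way to see how <j/> ends up as + in the analysis is to expand the paradigm by hand. The Python sketch below does this with the standard library's xml.etree.ElementTree. The helper names render and expand are hypothetical, and the sketch handles only <s> and <j/>, not the full monodix format.

```python
import xml.etree.ElementTree as ET

PARDEF = """<pardef n="dog__n">
  <e><p> <l></l>    <r><s n="n"/><s n="sg"/></r>                   </p></e>
  <e><p> <l>s</l>   <r><s n="n"/><s n="pl"/></r>                   </p></e>
  <e><p> <l>'s</l>  <r><s n="n"/><s n="sg"/><j/>'s<s n="gen"/></r> </p></e>
  <e><p> <l>s'</l>  <r><s n="n"/><s n="pl"/><j/>'s<s n="gen"/></r> </p></e>
</pardef>"""


def render(side):
    """Flatten an <l> or <r> element: <s n="x"/> gives <x>, <j/> gives +."""
    out = side.text or ""
    for child in side:
        if child.tag == "s":
            out += f"<{child.get('n')}>"
        elif child.tag == "j":
            out += "+"
        out += child.tail or ""  # literal text after the child, e.g. 's
    return out


def expand(stem, pardef_xml):
    """Pair every surface form of the paradigm with its analysis."""
    pardef = ET.fromstring(pardef_xml)
    return [
        f"^{stem}{render(pair.find('l'))}/{stem}{render(pair.find('r'))}$"
        for pair in pardef.iter("p")
    ]


if __name__ == "__main__":
    for form in expand("dog", PARDEF):
        print(form)
```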

In lexc this is written with %+:

LEXICON NounInfl
%<n%>%<sg%>:   # ;
%<n%>%<pl%>:s  # ;
%<n%>%<sg%>%+'s%<gen%>:'s  # ;
%<n%>%<pl%>%+'s%<gen%>:s'  # ;

LEXICON NounRoot
dog:dog NounInfl ;

In lexd this is written with +:

LEXICON NounNumPos
<n><sg>:
<n><pl>:s
<n><sg>+'s<gen>:'s
<n><pl>+'s<gen>:s'

LEXICON NounRoot
dog:dog

PATTERNS
NounRoot NounNumPos

More Involved Example: Chukchi Incorporation

Chukchi can incorporate nouns into verbs. A simplified example is given below:

LEXICON VerbRoot
амэчатык:амэчат
анӈатык:анӈат

LEXICON NounRoot
варат
ватап

PATTERN VerbStem
VerbRoot
NounRoot [<n><incorp>+:>{ы}] VerbRoot

PATTERNS
VerbStem [<v>:]

This generates the following forms:

^амэчат/амэчатык<v>$
^анӈат/анӈатык<v>$
^варат>{ы}амэчат/варат<n><incorp>+амэчатык<v>$
^варат>{ы}анӈат/варат<n><incorp>+анӈатык<v>$
^ватап>{ы}амэчат/ватап<n><incorp>+амэчатык<v>$
^ватап>{ы}анӈат/ватап<n><incorp>+анӈатык<v>$
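
The six forms are the two bare verb roots plus every combination of a noun root with a verb root. The Python sketch below mimics that expansion with itertools.product. It only illustrates the combinatorics of the pattern and is not how lexd compiles it; all names in it are hypothetical.

```python
from itertools import product

# Each entry is an (analysis side, surface side) pair.
VERB_ROOTS = [("амэчатык", "амэчат"), ("анӈатык", "анӈат")]
NOUN_ROOTS = [("варат", "варат"), ("ватап", "ватап")]
INCORP = ("<n><incorp>+", ">{ы}")  # the join appears only on the analysis side
VERB_TAG = ("<v>", "")


def join(*pairs):
    """Concatenate both sides of the given pairs into ^surface/analysis$."""
    analysis = "".join(a for a, _ in pairs)
    surface = "".join(s for _, s in pairs)
    return f"^{surface}/{analysis}$"


def generate():
    """Bare verbs first, then every noun root incorporated into every verb root."""
    forms = [join(verb, VERB_TAG) for verb in VERB_ROOTS]
    forms += [
        join(noun, INCORP, verb, VERB_TAG)
        for noun, verb in product(NOUN_ROOTS, VERB_ROOTS)
    ]
    return forms


if __name__ == "__main__":
    for form in generate():
        print(form)
```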

Equivalent lexc:

LEXICON Root
NounIncorp ;
VerbRoot ;

LEXICON NounIncorp
варат:варат NounIncorpInfl ;
ватап:ватап NounIncorpInfl ;

LEXICON NounIncorpInfl
%<n%>%<incorp%>%+:%>%{ы%} VerbRoot ;

LEXICON VerbRoot
амэчатык:амэчат VerbInfl ;
анӈатык:анӈат VerbInfl ;

LEXICON VerbInfl
%<v%>: # ;
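
In the lexc version the same forms arise from following continuation classes from Root until the end marker # is reached. A rough Python sketch of that walk is given below. The table mirrors the lexicons above with the % escapes removed; the function name walk and the table layout are hypothetical and do not reflect how a lexc compiler stores its data.

```python
# Each entry: (upper/analysis side, lower/surface side, continuation lexicon).
LEXICONS = {
    "Root": [("", "", "NounIncorp"), ("", "", "VerbRoot")],
    "NounIncorp": [
        ("варат", "варат", "NounIncorpInfl"),
        ("ватап", "ватап", "NounIncorpInfl"),
    ],
    "NounIncorpInfl": [("<n><incorp>+", ">{ы}", "VerbRoot")],
    "VerbRoot": [
        ("амэчатык", "амэчат", "VerbInfl"),
        ("анӈатык", "анӈат", "VerbInfl"),
    ],
    "VerbInfl": [("<v>", "", "#")],
}


def walk(name="Root", upper="", lower=""):
    """Yield ^surface/analysis$ for every path from name to the end marker."""
    if name == "#":
        yield f"^{lower}/{upper}$"
        return
    for up, low, continuation in LEXICONS[name]:
        yield from walk(continuation, upper + up, lower + low)


if __name__ == "__main__":
    for form in walk():
        print(form)
```

Because VerbRoot is reachable both directly from Root and through NounIncorpInfl, the verb entries are written once and shared by the plain and the incorporating paths.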

Roughly equivalent monodix (replacing >{ы} with ы, since those symbols are meant for composing with Twol):

<pardef n="noun_root">
  <e><i>варат</i></e>
  <e><i>ватап</i></e>
</pardef>
<pardef n="verb_root">
  <e><p> <l>амэчат</l> <r>амэчатык</r> </p></e>
  <e><p> <l>анӈат</l>  <r>анӈатык</r>  </p></e>
</pardef>
<pardef n="verb_infl">
  <e><p> <l></l> <r><s n="v"/></r> </p></e>
</pardef>
<pardef n="noun_incorp_infl">
  <e><p> <l>ы</l> <r><s n="n"/><s n="incorp"/><j/></r> </p></e>
</pardef>

<e> <par n="verb_root"/> <par n="verb_infl"/> </e>
<e> <par n="noun_root"/> <par n="noun_incorp_infl"/> <par n="verb_root"/> <par n="verb_infl"/> </e>

See Also

* Compounds
* Apertium stream format