Conjoined lexical units
Latest revision as of 08:31, 6 April 2021
A single reading of a surface form can correspond to multiple lexical units. When this happens, the lexical units are connected with +.
In the analyser, the conjoined whole is treated as one lexical unit (all of it sits within the same ^…$ in the analyser's output), while during transfer the parts are treated as separate lexical units (a reading with a single + yields two units, ^…$ ^…$).
Note also: This is not the same as ambiguity – one surface form can have several readings, and each reading may be a simple or conjoined lexical unit.
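The distinction between readings (separated by /) and conjoined lexical units (joined by +) can be sketched in a few lines of Python. This is only an illustration of the stream format described above, not the code the Apertium pipeline uses; the function names are made up for this example, and escaping is handled only in the simplest case (a single backslash before / or +).

```python
import re
from typing import List, Tuple


def split_readings(unit: str) -> Tuple[str, List[str]]:
    """Split '^surface/reading1/reading2$' into the surface form and its readings."""
    unit = unit.strip()
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("expected a lexical unit of the form ^...$")
    # Readings are separated by unescaped '/'.
    surface, *readings = re.split(r"(?<!\\)/", unit[1:-1])
    return surface, readings


def split_conjoined(reading: str) -> List[str]:
    """Split one reading into its conjoined parts at each unescaped '+'."""
    return re.split(r"(?<!\\)\+", reading)


def to_separate_units(reading: str) -> str:
    """Render a conjoined reading as separate ^...$ units, as seen during transfer."""
    return " ".join(f"^{part}$" for part in split_conjoined(reading))


surface, readings = split_readings("^dog's/dog<n><sg>+'s<gen>$")
print(surface)                         # dog's
print(readings)                        # one reading, which happens to be conjoined
print(to_separate_units(readings[0]))  # ^dog<n><sg>$ ^'s<gen>$
```

A surface form with two readings would give a two-element list from split_readings, and each element could independently be simple or conjoined.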
For some phenomena, the join is marked explicitly in the dictionary. An example is Catalan determiners, where a surface form like pel has a reading analysed as per<pr> + el<det>; this is represented in the same way as the "English Possessives" example below. But + is also used for dynamic compound analysis in lttoolbox. There, a full path in the dictionary can be marked with a hidden tag saying "I can be a non-final compound part" or "I can be a final compound part", while the full path of the resulting compound analysis is not itself in the dictionary. For example, if vier and letter have analyses in the dictionary that carry the non-final hidden tag, and woorden has one with the final tag, then vierletterwoorden will be analysed as a compound of conjoined lexical units. See Compounds for more information on this.
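The idea behind dynamic compound analysis can be illustrated with a toy segmenter. This is a rough sketch of the principle only, not how lttoolbox implements it: the dictionary, the analyses and tags given for each word (vier<num> and so on), and the two boolean flags standing in for the hidden tags are all assumptions made up for the example.

```python
from typing import Dict, List, Tuple

# surface form -> (analysis, may be a non-final part, may be a final part)
# The analyses here are illustrative placeholders, not real dictionary entries.
TOY_DICT: Dict[str, Tuple[str, bool, bool]] = {
    "vier": ("vier<num>", True, False),
    "letter": ("letter<n><sg>", True, False),
    "woorden": ("woord<n><pl>", False, True),
}


def analyse_compound(surface: str) -> List[str]:
    """Return every way to read `surface` as non-final parts plus one final part."""
    results: List[str] = []

    def walk(rest: str, parts: List[str]) -> None:
        for form, (analysis, nonfinal, final) in TOY_DICT.items():
            if not rest.startswith(form):
                continue
            remaining = rest[len(form):]
            if remaining == "":
                # Last part: must be allowed as final, and must follow
                # at least one non-final part to count as a compound.
                if final and parts:
                    results.append("+".join(parts + [analysis]))
            elif nonfinal:
                walk(remaining, parts + [analysis])

    walk(surface, [])
    return results


print(analyse_compound("vierletterwoorden"))
```

The compound as a whole never appears in TOY_DICT; its analysis is assembled from the parts and joined with +, which is the point made in the paragraph above.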
Minimal Example: English Possessives
An example of where conjoined lexical units might be useful is English plurals and possessives:
^dog/dog<n><sg>$
^dogs/dog<n><pl>$
^dog's/dog<n><sg>+'s<gen>$
^dogs'/dog<n><pl>+'s<gen>$
In monodix this is written with <j/>:
<pardef n="dog__n">
  <e><p> <l></l>   <r><s n="n"/><s n="sg"/></r> </p></e>
  <e><p> <l>s</l>  <r><s n="n"/><s n="pl"/></r> </p></e>
  <e><p> <l>'s</l> <r><s n="n"/><s n="sg"/><j/>'s<s n="gen"/></r> </p></e>
  <e><p> <l>s'</l> <r><s n="n"/><s n="pl"/><j/>'s<s n="gen"/></r> </p></e>
</pardef>
<e lm="dog"><i>dog</i><par n="dog__n"/></e>
In lexc this is written with %+:
LEXICON NounInfl
%<n%>%<sg%>: # ;
%<n%>%<pl%>:s # ;
%<n%>%<sg%>%+'s%<gen%>:'s # ;
%<n%>%<pl%>%+'s%<gen%>:s' # ;
LEXICON NounRoot
dog:dog NounInfl ;
In lexd this is written with +:
LEXICON NounNumPos
<sg>:
<pl>:s
<sg>+'s<gen>:'s
<pl>+'s<gen>:s'
LEXICON NounRoot
dog:dog
PATTERNS
NounRoot NounNumPos
More Involved Example: Chukchi Incorporation
Chukchi can incorporate nouns into verbs. A simplified example is given below:
LEXICON VerbRoot
амэчатык:амэчат
анӈатык:анӈат
LEXICON NounRoot
варат
ватап
PATTERN VerbStem
VerbRoot
NounRoot [<n><incorp>+:>{ы}] VerbRoot
PATTERNS
VerbStem [<v>:]
This generates the following forms:
^амэчат/амэчатык<v>$
^анӈат/анӈатык<v>$
^варат>{ы}амэчат/варат<n><incorp>+амэчатык<v>$
^варат>{ы}анӈат/варат<n><incorp>+анӈатык<v>$
^ватап>{ы}амэчат/ватап<n><incorp>+амэчатык<v>$
^ватап>{ы}анӈат/ватап<n><incorp>+анӈатык<v>$
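The way the two alternatives of the VerbStem pattern combine can be traced with a small Python sketch. It mimics the pattern by hand and is not how lexd itself works; the constant and function names are made up for the example.

```python
from itertools import product

# (lemma, stem) pairs, mirroring the lemma:stem entries of LEXICON VerbRoot
VERB_ROOTS = [("амэчатык", "амэчат"), ("анӈатык", "анӈат")]
# Noun roots have identical analysis and surface sides
NOUN_ROOTS = ["варат", "ватап"]
# The anonymous lexicon [<n><incorp>+:>{ы}] as (analysis side, surface side)
INCORP = ("<n><incorp>+", ">{ы}")


def verb_stems():
    """Yield (analysis, surface) pairs for both alternatives of VerbStem."""
    # Alternative 1: a bare verb root
    for lemma, stem in VERB_ROOTS:
        yield lemma, stem
    # Alternative 2: an incorporated noun, the join, then a verb root
    for noun, (lemma, stem) in product(NOUN_ROOTS, VERB_ROOTS):
        yield noun + INCORP[0] + lemma, noun + INCORP[1] + stem


def forms():
    """Apply the final [<v>:] step and format each pair as ^surface/analysis$."""
    return [f"^{surface}/{analysis}<v>$" for analysis, surface in verb_stems()]


for form in forms():
    print(form)
```

Two verb roots on their own plus two noun roots times two verb roots give the six forms listed above, with + appearing only on the analysis side.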
Equivalent lexc:
LEXICON Root
NounIncorp ;
VerbRoot ;
LEXICON NounIncorp
варат:варат NounIncorpInfl ;
ватап:ватап NounIncorpInfl ;
LEXICON NounIncorpInfl
%<n%>%<incorp%>%+:%>%{ы%} VerbRoot ;
LEXICON VerbRoot
амэчатык:амэчат VerbInfl ;
анӈатык:анӈат VerbInfl ;
LEXICON VerbInfl
%<v%>: # ;
Roughly equivalent monodix (replacing >{ы} with ы, since the >{ы} symbols are only meaningful when composing with Twol):
<pardef n="noun_root">
  <e><i>варат</i></e>
  <e><i>ватап</i></e>
</pardef>
<pardef n="verb_root">
  <e><p> <l>амэчат</l> <r>амэчатык</r> </p></e>
  <e><p> <l>анӈат</l>  <r>анӈатык</r>  </p></e>
</pardef>
<pardef n="verb_infl">
  <e><p> <l></l> <r><s n="v"/></r> </p></e>
</pardef>
<pardef n="noun_incorp_infl">
  <e><p> <l>ы</l> <r><s n="n"/><s n="incorp"/><j/></r> </p></e>
</pardef>
<e> <par n="verb_root"/> <par n="verb_infl"/> </e>
<e> <par n="noun_root"/> <par n="noun_incorp_infl"/> <par n="verb_root"/> <par n="verb_infl"/> </e>

