Difference between revisions of "Northern Sámi and Norwegian/Derivations"
Revision as of 07:41, 25 August 2014
This page describes the general mechanism for handling derivations in sme-nob, then summarises the derivations handled (and how they are dealt with).
Derivations: general rules and exceptions
Sámi has a lot of derivation rules. Sometimes the derived words have lexicalised translations in Bokmål, like ráhkisvuohta→kjærlighet; these we treat as exceptions which have to be specified in bidix. Other times we can use a general rule, like lohkagohten→begynte.1SG å lese.
We have two strategies for handling the rule/exception situation.
- For the situation where we have many exceptions, we let the analysis be eg. geavaheaddjiid/geavahit<V><TV><Der2><Actor><N><Pl> and from here there are two paths:
  - either this specific analysis is in bidix, here translating into bruker<n><m><pl>, or
  - we have to use a transfer rule, in this case translating into de som bruker
- For the situation where we have few exceptions, we use dev/xfst2apertium.relabel to split the analysis into two lexical units. Two lexical units can't be specified in bidix, so here
  - exceptions have to be added to the .lexc file as if they were lexicalised, so they remain one lexical unit
  - while general transfer rules now match a pattern of two lexical units
The former is used for most derivations, the latter currently only for Der_goahti.
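The second strategy can be pictured with a minimal Python sketch. This is not the content of dev/xfst2apertium.relabel; the tag spelling `<Der_goahti>` in the input and the function name `split_goahti` are assumptions made only for illustration. The output shape follows the lohkagohten example used on this page.

```python
# Hypothetical tag spelling: we assume the analyser marks the derivation as <Der_goahti>.
SPLIT_TAG = "<Der_goahti>"


def split_goahti(analysis: str) -> str:
    """Turn 'lemma<...><Der_goahti><...>' into 'lemma<...>+goahti<...>'."""
    head, sep, tail = analysis.partition(SPLIT_TAG)
    if not sep:
        # No goahti derivation: the analysis stays a single lexical unit.
        return analysis
    # The verb keeps the tags before the derivation; goahti takes the inflection.
    return f"{head}+goahti{tail}"


print(split_goahti("lohkat<V><TV><Der3><Der_goahti><V><Ind><Prt><Sg1>"))
```

After the split, bidix only ever sees lohkat and goahti as separate units, which is why exceptions for this derivation have to live in the .lexc file instead.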
More detailed: Deverbal nouns
Sámi verbs can turn into nouns by various processes. We want to be able to put this explicitly into the bidix (eg. sometimes the nob noun is not even based on the nob verb), but if it's not in bidix we want to be able to fall back on a construction using the verb, so
- from geavaheaddjiid/geavahit<V><TV><Der2><Actor><N>
- a fallback is => de som bruker<vblex>
- but bidix can also specify => bruker<n><m>
With the following bidix entries we specify that we want bruker<n><m> in the above example:
    <e><p><l>geavahit<s n="V"/><s n="TV"/></l><r>bruke<s n="vblex"/><s n="pers"/></r></p><par n="__verb"/></e>
    <e><p><l>geavahit<s n="V"/><s n="TV"/><s n="Der2"/><s n="Actor"/><s n="N"/></l><r>bruker<s n="n"/><s n="m"/></r></p><par n="__n"/></e>
while if the second bidix line isn't there, we get the fallback. Transfer rules can now do the fallback if eg.
<equal><clip side="tl" part="pos" ...><lit-tag v="n"/></equal> <equal><clip side="sl" part="pos" ...><lit-tag v="V"/></equal>
Similar specifications and fallbacks can be applied to other derivations.
Of course, if geavaheaddjiid is fully specified in the morphology, that should be selected over the derivation analysis (by CG or by weighting the HFST).
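The specify-or-fall-back behaviour can be sketched in Python as a longest-prefix lookup. This is only a toy model under stated assumptions: the dictionaries `VERB_ONLY` and `WITH_NOUN` and the function `translate` are hypothetical stand-ins, and real bidix matching is done by the Apertium tools, not by code like this.

```python
# Toy stand-in for bidix: (sme lemma, tag prefix) -> (nob lemma, nob tags).
VERB_ONLY = {
    ("geavahit", ("V", "TV")): ("bruke", ("vblex", "pers")),
}
WITH_NOUN = dict(VERB_ONLY)
WITH_NOUN[("geavahit", ("V", "TV", "Der2", "Actor", "N"))] = ("bruker", ("n", "m"))


def translate(lemma, tags, bidix):
    """Return (nob lemma, nob tags, needs_fallback) for the longest matching prefix."""
    for end in range(len(tags), 0, -1):
        hit = bidix.get((lemma, tuple(tags[:end])))
        if hit is not None:
            # Derivation tags left unmatched mean transfer has to build the fallback.
            needs_fallback = any(t.startswith("Der") for t in tags[end:])
            return hit[0], hit[1], needs_fallback
    return None


reading = ["V", "TV", "Der2", "Actor", "N", "Pl"]
print(translate("geavahit", reading, WITH_NOUN))  # explicit noun entry wins
print(translate("geavahit", reading, VERB_ONLY))  # only the verb matches
```

With the noun entry present the derived reading is translated directly; without it only the verb entry matches and the leftover derivation tags signal that a transfer rule has to produce the "de som bruker" construction.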
The transfer for deverbal Actor nouns has slightly more complexity: It seems from most examples that the present tense verb form alone gives a better translation of Actor nouns iff there is no lemq to the verb and it is singular indefinite, so the above would give "bruker" (if singular, indefinite), while ovddasvástideaddji/Ovddasvástidit<V><IV><Der2><Actor><N><Sg><Nom><@SUBJ→>$ (which in nob has a lemq, "# ansvar for") should give "Den som har ansvar (for)".
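The choice described above can be written down as a small Python sketch. The function `render_actor` and its parameters are hypothetical, and treating every case other than "singular indefinite without lemq" as a "den/de som ..." construction is an assumption drawn from the examples on this page, not a description of the actual transfer rules.

```python
def render_actor(present_form, lemq="", number="sg", definite=False):
    """Render a deverbal Actor noun in Bokmål.

    present_form: present tense of the nob verb, e.g. 'bruker' or 'har'
    lemq: invariable tail of a multiword verb, e.g. 'ansvar (for)'
    """
    if not lemq and number == "sg" and not definite:
        # The bare present-tense form doubles as the actor noun.
        return present_form
    pronoun = "den" if number == "sg" else "de"
    # Relative-clause construction; an empty lemq is simply left out.
    return " ".join(part for part in (pronoun, "som", present_form, lemq) if part)


print(render_actor("bruker"))
print(render_actor("bruker", number="pl"))
print(render_actor("har", lemq="ansvar (for)"))
```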
Removing unhandled derivations from the analyser
Any derivations that are not handled we remove from the analyser with a twol negation rule in dev/xfst2apertium.useless.twol:
UnhandledDerivations /<= _ ; ! fail if analysis contains a tag from the set UnhandledDerivations
Derivations of derivations are removed with this rule:
Derivation /<= Derivation+ PartOfSpeech+ _ ;
since double derivations are not handled either unless there are explicit transfer rules for them. This makes the lexicon a lot easier to handle for testvoc. Summary of fallbacks below contains the list of derivations that are and aren't handled.
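As a rough illustration of what the two removal rules do, here is a Python sketch that works on tag lists instead of transducer paths. Everything in it is an assumption for illustration: the set `UNHANDLED` is a small sample taken from the TODO list on this page, the underscore tag spelling follows the summary list, and only one flattening exception (the one described in the note that follows) is modelled, in simplified form.

```python
UNHANDLED = {"Der_asti", "Der_easti", "Der_supmi"}  # sample from the TODO list
POS = {"N", "V", "A", "Adv", "Pron"}


def is_double_derivation(tags):
    """True if a Der* tag appears after an earlier derivation has closed with a PoS tag."""
    seen_der = False
    pos_after_der = False
    for tag in tags:
        if tag.startswith("Der"):
            if pos_after_der:
                return True
            seen_der = True
        elif tag in POS and seen_der:
            pos_after_der = True
    return False


def flatten_passl_n(tags):
    """Drop the inner V in '... Der_PassL V Der3 Der_n ...' (simplified context check)."""
    for i in range(len(tags) - 3):
        if tags[i] == "Der_PassL" and tags[i + 1:i + 4] == ["V", "Der3", "Der_n"]:
            return tags[:i + 1] + tags[i + 2:]
    return tags


def keep(tags):
    """Decide whether a reading (a list of tags) survives in the analyser."""
    if any(tag in UNHANDLED for tag in tags):
        return False
    return not is_double_derivation(flatten_passl_n(tags))


print(keep(["V", "TV", "Der2", "Der_PassL", "V", "Ind", "Prs", "Sg3"]))
print(keep(["V", "TV", "Der1", "Der_st", "V", "Der2", "Der_eapmi", "N", "Sg", "Nom"]))
print(keep(["V", "TV", "Der2", "Der_PassL", "V", "Der3", "Der_n", "N", "Sg", "Nom"]))
```

The real rules are two-level constraints compiled into the analyser, so nothing is filtered at runtime; the sketch only shows which readings end up present or absent.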
Note: certain double derivations are handled; these are "flattened" by removing the inner PoS tag:
"Allow +V+Der2+Der/PassL+V+Der3+Der/n+N (removing inner PoS tag)" %+V:0 <=> %+V Transitivity %+Der2 %+Der/PassL _ %+Der3 %+Der/n %+N ;
Summary of fallbacks
Derivations that do not appear in this list should be removed from the analyser before release.
Note that derivations of derivations have to be treated as a new type (eg. geavahuvvomis.V.Der2.Der_PassL.V.Der3.Der_n.N could not be handled by a combination of rules for V.Der2.Der_PassL.V and V.Der3.Der_n.N, but needs a mechanism of its own).
- N.Der1.Der_Dimin.N
  - N->N (diminutive), passes through as if nothing happened (although we could add "lille" or something)
- A.Der3.Der_vuohta.N
  - Adj->N, gets adj.def, ("grunn"->"det grunne")
  - typical override: "fattig"->"fattigdom"
- A.Der2.Der_at.Adv
  - Adj->Adv, gets adj.posi.nt.sg.ind ("vid" -> "vidt")
- V.Der3.Der_goahti.V
  - turns into two words, eg. ^lohkagohten/lohkat<V><TV><Der3>+goahti<V><Ind><Prt><Sg1>$
- V.Der2.Der_PassL.V
  - V->V (passive), add the "pass" tag, picked up by t1x verb rule
- V.Der1.Der_j.V.Der2.Der_PassL.V
  - V->V (passive), add the "pass" tag, picked up by t1x verb rule
  - TODO: what does Der_j add to the meaning? (ignored for now)
  - Double derivation, has exception in dev/xfst2apertium.useless.twol (inner PoS tag removed in order to "flatten" it)
- V.Der2.Der_PassL.V.Der3.Der_n.N
  - V->N (via passive), for now just outputs the plain passive infinitive (ideally should be able to enter noun phrase rules)
  - Double derivation, has exception in dev/xfst2apertium.useless.twol (inner PoS tag removed in order to "flatten" it)
- V.Der_PassS.V
  - V->V (passive), add the "pass" tag, picked up by t1x verb rule
- V.Der1.Der2.Der_halla.V
  - V->V (passive), add the "pass" tag, picked up by t1x verb rule. Verb is tagged <ill-av> since the agent is illative with halla-verbs
- V.Der1.Der_h.V
  - V->V (causative), add the "caus" tag, picked up by t1x verb rule
- V.Der2.Der_ahtti.V
  - V->V (causative), add the "caus" tag, picked up by t1x verb rule
- V.Der1.Der_d.V
  - V->V (reflexive), add the "ref" tag, picked up by t1x verb rule
- V.Der2.Der_alla.V
  - V->V (reflexive), add the "ref" tag, picked up by t1x verb rule
- V.Der1.Der_st.V
  - V->V (diminutive), passes through as if nothing happened (although we could add the adverb "litt"?)
- V.Der3.Der_n.N
  - V->N, gets adj.pprs
- V.Der1.Der2.Der_las
  - V->Adj, gets adj.pprs ("gi"->"givende")
  - typical override: gi->generøs
- V.Der2.Actor.N
  - V->N (actor), gets vblex.pres.m (could tag TD instead of pres, and select between pres and pret based on earlier finite verb, TODO)
- V.Der1.Der_j.Der2.Actor.N
  - V->N (actor), as above
  - TODO: what does Der_j add to the meaning? (ignored for now)
- V.Der2.Der_eapmi.N
  - V->N (action/process), gets vblex.inf.nt
  - typical override: "seile"->"seiling"
- V.Der3.Der_muš.N
  - V->N (action/process), gets vblex.inf.nt
 
Still TODO
- V.Der2.Der_adda.N.PrfPrc.Actio
- V.Der3.Der_amoš.N
- V.Der2.Der_asti.V
- V.Der2.Der_eamoš.N
- V.Der2.Der_easti.V
- N.Der3.Der_geahtes.???
- N.Der1.Der2.Der_heapmi.A
- A.Der1.Der_huhtti.V
- N.Der1.Der2.Der_huvva.V
- V.Der1.Der_j.V
- V.Der1.Der_l.V
- N.Der1.Der_laš.A
- Pron.Der1.Der_lágan.A
- V.Der1.Der_meahttun.A
- V.Der1.Der2.Der_stuvva.V
- V.Der3.Der_supmi.N
- V.Der3.Der_upmi.???
- N.Der_viđá.Adv
- N.SgCmp.Der3.Der_veara.A
and any combinations not mentioned here.
Derivation tags and their meanings
Note: we should change the tags below, because /. cannot be used in tags in apertium. But then we should change the CG as well...
There are also derivations of derivations:
"<geavaheaddjis>"
...
          "geavvat" V* IV* Der1 Der/h V* TV Der2 Actor N Sg Acc PxSg3
For transfer purposes it might be simplest to treat these "flatly" as if they were single derivations (ie. Der1_Der_h_V_TV_Der2).
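A minimal Python sketch of how such a flat name could be derived from a CG reading. The function `flat_derivation_tag` is hypothetical, and the choices to strip the `*` marks, replace `/` with `_`, and span from the first to the last `Der` tag are assumptions that happen to reproduce the name suggested above.

```python
def flat_derivation_tag(tags):
    """Join everything from the first to the last Der* tag into one flat name."""
    # Normalise: drop the '*' marks and avoid '/' inside tag names.
    clean = [tag.rstrip("*").replace("/", "_") for tag in tags]
    der_positions = [i for i, tag in enumerate(clean) if tag.startswith("Der")]
    if not der_positions:
        return ""  # not a derived reading
    return "_".join(clean[der_positions[0]:der_positions[-1] + 1])


reading = "V* IV* Der1 Der/h V* TV Der2 Actor N Sg Acc PxSg3".split()
print(flat_derivation_tag(reading))  # Der1_Der_h_V_TV_Der2
```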
| Tag | Type | Example | in Bokmål | 
|---|---|---|---|
| Der/Dimin | N→N[diminutive] | mánáš "mánná" N Der1 Der/Dimin N Sg Nom | barn→lite barn | 
| Der1 Der/st | V→V[diminutive] | attestit "addit" V TV Der1 Der/st V Inf | gi→gi litt | 
| Der/st | Diminutive V→V | oainnestit, várástit "várát" V TV Der1 Der/st V | se→skimte (add "litt"?) | 
| Der/adda | V→N.PrfPrc.Actio | bassaladdan "bassalit" V* TV Der2 Der/adda | →vaske tøy (bassat=vaske) | 
| Der/ahtti | V→V | vajálduhttit "vajálduvvat" V* IV* Der2 Der/ahtti V TV | →overse/glemme | 
| Der/alla | suffix | bázáhallan "bázihit" V* TV Der2 Der/alla V Actio | → | 
| Der/amoš | suffix | muitalamoš "muitalit" V TV Der3 Der/amoš N Sg Nom | fortelle→ | 
| Der/asti | suffix | muitalastit "muitalit" V TV Der2 Der/asti V Inf | fortelle→ | 
| Der/at | Adj→Adv | viidát "viiddis" A* Der2 Der/at Adv | vid→vidt | 
| Der/d | V→V[refl] | basadit "bassat" V TV Der1 Der/d V | vaske→vaske seg | 
| Der/eaddji | V→N.Actor | muitaleaddji "muitalit" V TV Der2 Actor N Sg Nom | fortelle→forteller | 
| Der/eamoš | suffix | muitaleamoš "muitalit" V* TV Der3 Der/eamoš | fortelle→ | 
| Der/eapmi | V→N | deaivvadeapmi "deaivvadit" V IV Der2 Der/eapmi N Sg Nom | møte(V)→møte(N), feire→feiring | 
| Der/easti | suffix | muitaleastit "muitalit" V TV Der2 Der/easti V Inf | fortelle → | 
| Der/geahtes | suffix | eaiggátkeahtes "eaiggát" N* Der3 Der/geahtes | eier → | 
| Der/goahti | V→VInchoative | boradišgohten "boradit" V TV Der3 Der/goahti V Ind Prt Sg1 | spise → jeg begynte å spise | 
| Der/h | suffix | geavaheaddji "geavvat" V* IV* Der1 Der/h V* TV Der2 Actor; orrohit "orrot" V* IV Der1 Der/h V | heve seg→ ; bli/synes→ | 
| Der/halla | V→V[recip] | gulahallat "gullat" V* TV Der1 Der2 Der/halla | høre→forstå hverandre («høre hverandre»?) | 
| Der/heapmi | suffix | čađaheapmi "čađđa" N* Der1 Der2 Der/heapmi A | → | 
| Der/huhtti | suffix | muosehuhttit "muoseheapme" A* Der1 Der/huhtti V* TV | urolig→ | 
| Der/huvva | suffix | čađahuvvo "čađđa" N* Der1 Der2 Der/huvva V IV Imprt Prs ConNegII | → | 
| Der/j | suffix | sáddejuvvot "sáddet" V* TV Der1 Der/j V* Der2 Der/PassL V | sende→ | 
| Der1 Der/l | V→V[subitive] | borralit "borralit" V TV Der1 Der/l V | spise→spise (i hast) | 
| Der/l | ???? | ohcalit "ohcat" V* TV Der1 Der/l V | lete→savne/lengte etter | 
| Der/las | V→Adj | addálas "addit" V TV Der1 Der2 Der/las A | gi→generøs | 
| Der/laš | N→Adj | dábálaš "dáhpi" N Der1 Der/laš A Sg Nom | skikk→vanlig | 
| Der/lágan | suffix | earálágan "eará" Pron Indef Sg Gen Der1 Der/lágan A | annen/andre→ | 
| Der/meahttun | V→Adj[Neg] | jáhkkemeahttun "jáhkkit" V TV Der1 Der/meahttun A Sg Nom | tro/anta→utrolig | 
| Der/muš | suffix | ??? "juhkat" V TV Der3 Der/muš N Sg Nom | drikke→ | 
| Der/n | suffix | oažžun "oažžut" V* TV Der3 Der/n N | få→? | 
| Der/stuvva | suffix | fuolastuvvat "fuollat" V* TV Der1 Der2 Der/stuvva V | bry seg om→ | 
| Der/supmi | suffix | čállosupmi "čállit" V* TV Der2 Der/PassL V* Der3 Der/supmi N | skrive/...→ | 
| Der/upmi | suffix | mearkkašupmi "mearkkašit" V* TV Der2 Der/PassL V* Der3 Der/upmi | merke seg→ | 
| Der/viđá | suffix | málestanviđá "málet" V TV Der1 Der/st V Der2 Der/eapmi N SgCmp Der/viđá Adv | male→ | 
| Der/vuohta | Adj→N | ráhkisvuohta "ráhkis" A Der3 Der/vuohta N Sg Nom | kjær→kjærlighet | 
| Der/veara | N→Adj | mearkkašanveara "mearkkašeapmi" N SgCmp Der3 Der/veara A | merknad→markert? | 

