Difference between revisions of "Northern Sámi and Norwegian/Derivations"

From Apertium
 
(35 intermediate revisions by the same user not shown)
Line 1: Line 1:
This page describes the general mechanism for handling [[derivations]] in sme-nob, then summarises the derivations handled (and how they are dealt with).

==Derivations: general rules and exceptions==
==Derivations: general rules and exceptions==
Sámi has a lot of derivation rules. There are various strategies used for translating these:
Sámi has a lot of derivation rules. Sometimes the derived words have lexicalised translations in Bokmål, like ''ráhkisvuohta→kjærlighet''; we treat these as '''exceptions''', which have to be specified in bidix. Other times we can use a '''general rule''', like ''lohkagohten→begynte.1SG å lese''.


# We can lexicalise the sme derivation – this is the easiest and safest approach, and it leads to the best translations
We have two strategies for handling the rule/exception situation.
# We can use a transfer rule which might add some periphrasis and/or change the form of the main word
#* ''geavahit.V.Der/Actor.N.Pl'' → ''de som bruker''
#* ''lohkat.V.Der/goahti.1SG.Pret'' → ''begynte å lese''
# We can put the full specific analysis in bidix to override the general translation, like translating ''geavahit.V.Der/Actor.N.Pl'' into <code>bruker<n><m><pl></code>
#* DEPRECATED – we should move away from this method, it leads to too much transfer complexity, and might not even work if the bidix pardef also has the full path (making the bidix entry ambiguous). Much better to just lexicalise.
# We can tag-relabel so that the derivation looks like a compound
#* DEPRECATED – we used to do this with Der/goahti (lohkagohten→lese+begynte→begynte å lese), but it works just fine with the regular transfer method, so no need to have yet another method.


== Removing unhandled derivations from the analyser ==
# For the situation where we have many exceptions, we let the analysis be eg. <code>geavaheaddjiid/geavahit<V><TV><Der2><Actor><N><Pl></code> and from here there are two paths
Any derivations that are not handled we remove from the analyser with a twol negation rule in rm-deriv-cmp.twol
## either this specific analysis is in bidix, here translating into <code>bruker<n><m><pl></code>, or
## we have to use a transfer rule, in this case translating into <code>de som bruker</code>
# For the situation where we have few exceptions, we use <code>dev/xfst2apertium.relabel</code> to split the analysis into two lexical units. Two lexical units can't be specified in bidix, so here
## exceptions have to be added to the .lexc file as if they were lexicalised, so they remain one lexical unit
## while general transfer rules now match a pattern of two lexical units


The former is used for most derivations, the latter currently only for Der_goahti.

===More detailed: Deverbal nouns===
Sámi verbs can turn into nouns. We want to be able to put this explicitly into the bidix (eg. sometimes the nob noun is not even based on the nob verb), but if it's not in bidix we want to be able to fall back on a construction using the verb, so

* from <code>geavaheaddjiid/geavahit<V><TV><Der2><Actor><N></code>
* with fallback <code>=> de som bruker<vblex></code> (or something)
* bidix specified <code>=> bruker<n><m></code>

With the following bidix entries we specify that we want <code>bruker<n><m></code> in the above example:
<pre>
<pre>
UnhandledDerivations /<= _ ; ! fail if analysis contains a tag from the set UnhandledDerivations
<e><p><l>geavahit<s n="V"/><s n="TV"/></l><r>bruke<s n="vblex"/><s n="pers"/></r></p><par n="__verb"/></e>
<e><p><l>geavahit<s n="V"/><s n="TV"/><s n="Der2"/><s n="Actor"/><s n="N"/></l><r>bruker<s n="n"/><s n="m"/></r></p><par n="__n"/></e>
</pre>
</pre>


Derivations of derivations are removed with this rule (if these are ever needed, we should just lexicalise):
while if the second bidix line isn't there, we get the fallback. Transfer rules can now check

<pre>
<pre>
Derivation /<= Derivation+ PartOfSpeech+ _ ;
<equal><clip side="tl" part="pos" ...><lit-tag v="N"/></equal>
<equal><clip side="sl" part="pos" ...><lit-tag v="V"/></equal>
</pre>
</pre>


since double derivations are not handled either unless there are explicit transfer rules for them. This makes the lexicon a lot easier to handle for testvoc. [[Northern Sámi and Norwegian/Derivations#Summary of fallbacks|Summary of fallbacks]] below contains the list of derivations that are and aren't handled.
The same specification/fallback might be applied with other Derivations.


==Summary of fallbacks==
This transfer rule is still TODO. It seems from most examples that the present tense verb form alone gives a better translation iff there is no lemh to the verb and it is singular, so the above would give "bruker" (if singular), while <code>ovddasvástideaddji/Ovddasvástidit<V><IV><Der2><Actor><N><Sg><Nom><@SUBJ→>$</code> (which in nob has a lemh, "# ansvar for") should give "Den som har ansvar (for)".


(Tags used here are a bit outdated.)
===Other fallbacks===
* Der/at, adj->adv, gets adj.posi.nt.sg.ind ("vid" => "vidt")
* Der/vuohta, adj->n, gets adj...def, eg. "grunn" -> "det grunne" (bidix overrides "fattig" to "fattigdom")
* Der/st, v->v_diminutive, passes through as if nothing happened (although we could add the adverb "litt"?)


* N.Der_Dimin.N
==Derivation tags and their meanings==
** N->N (diminutive), passes through as if nothing happened (although we could add "lille" or something)


* A.Der_vuohta.N
Note: We should change the tags below, because <code>/</code> cannot be used in tags in Apertium. But then we would have to change the CG as well...
** Adj->N, gets adj.def, ("grunn"->"det grunne")
** typical override: "fattig"->"fattigdom"
* A.Der_at.Adv
** Adj->Adv, gets adj.posi.nt.sg.ind ("vid" -> "vidt")


* V.Der_goahti.V
There are also derivations of derivations:
** turns into two words, eg. <code>^lohkagohten/lohkat<V><TV><Der3>+goahti<V><Ind><Prt><Sg1>$</code>

* V.Der_PassL.V
** V->V (passive), add the "pass" tag, picked up by t1x verb rule
* V.Der_j.V.Der_PassL.V
** V->V (passive), add the "pass" tag, picked up by t1x verb rule
** TODO: what does Der_j add to the meaning? (ignored for now)
** Double derivation, has exception in dev/xfst2apertium.useless.twol (inner PoS tag removed in order to "flatten" it)
* V.Der_PassL.V.Der_n.N
** V->N (via passive), for now just outputs the plain passive infinitive (ideally should be able to enter noun phrase rules)
** Double derivation, has exception in dev/xfst2apertium.useless.twol (inner PoS tag removed in order to "flatten" it)
* V.Der_PassS.V
** V->V (passive), add the "pass" tag, picked up by t1x verb rule
* V.Der2.Der_halla.V
** V->V (passive), add the "pass" tag, picked up by t1x verb rule. Verb is tagged <code><ill-av></code> since the agent is illative with halla-verbs
* V.Der_h.V
** V->V (causative), add the "caus" tag, picked up by t1x verb rule
* V.Der_ahtti.V
** V->V (causative), add the "caus" tag, picked up by t1x verb rule
* V.Der_d.V
** V->V (reflexive), add the "ref" tag, picked up by t1x verb rule
* V.Der_alla.V
** V->V (reflexive), add the "ref" tag, picked up by t1x verb rule

* V.Der_st.V
** V->V (diminutive), passes through as if nothing happened (although we could add the adverb "litt"?)

* V.Der_n.N
** V->N, gets adj.pprs
* V.Der2.Der_las
** V->Adj, gets adj.pprs ("gi"->"givende")
** typical override: gi->generøs

* V.Actor.N
** V->N (actor), gets vblex.pres.m (could tag TD instead of pres, and select between pres and pret based on earlier finite verb, TODO)
* V.Der_j.Actor.N
** V->N (actor), as above
** TODO: what does Der/j add to the meaning? (ignored for now)
* V.Der_eapmi.N
** V->N (action/process), gets vblex.inf.nt
** typical override: "seile"->"seiling"
* V.Der_muš.N
** V->N (action/process), gets vblex.inf.nt

===Der-tag hitparade===
(counts of analyses, not forms, so probably a bit skewed, but gives an idea)
<pre>
<pre>
165550 <der_nomact>
"<geavaheaddjis>"
103985 <der_nomag>
...
60906 <der_h>
"geavvat" V* IV* Der1 Der/h V* TV Der2 Actor N Sg Acc PxSg3
50394 <der_laš>
49664 <der_dimin>
40246 <der_vuohta>
34868 <der_d>
29189 <der_passs>
23625 <der_passl>
17455 <der_st>
11718 <der_at>
11292 <der_heapmi>
10369 <der_alla>
5793 <der_l>
3542 <der_t>
3459 <der_ahtti>
3199 <der_muš>
2917 <der_halla>
2009 <der_huvva>
1355 <der_meahttun>
1208 <der_lágan>
978 <der_stuvva>
925 <der_las>
844 <der_a>
777 <der_upmi>
471 <der_saš>
421 <der_huhtti>
230 <der_adda>
150 <der_asti>
126 <der_veara>
44 <der_geahtes>
31 <der_easti>
22 <der_keahtta>
18 <der_adv>
16 <der_nammasaš>
8 <der_jagáš>
8 <der_ár>
4 <der_stávval>
4 <der_lágaš>
4 <der_dáfot>
3 <der_eamoš>
</pre>
</pre>
==Derivation tags and their meanings==
For transfer purposes it might be simplest to treat these "flatly" as if they were single derivations (ie. Der1_Der_h_V_TV_Der2).

Note: We should change the tags below, because <code>/</code> cannot be used in tags in Apertium. But then we would have to change the CG as well...


{|class=wikitable
{|class=wikitable
! Tag !! Type !! Example !! in Bokmål
! Tag !! Type !! Example !! in Bokmål
|-
|-
|<code>Der/Dimin</code> || Diminutive || mánáš "mánná" N Der1 Der/Dimin N Sg Nom || barn→lite barn
|<code>Der/Dimin</code> || <code>N→N[diminutive]</code> || mánáš "mánná" N Der1 Der/Dimin N Sg Nom || barn→lite barn
|-
|-
|<code>Der/1 Der/st</code> || Diminutive verb || attestit "addit" V TV Der1 Der/st V Inf || gi→gi litt
|<code>Der/1 Der/st</code> || <code>V→V[diminutive]</code> || attestit "addit" V TV Der1 Der/st V Inf || gi→gi litt
|-
|<code>Der/st</code> || Diminutive <code>V→V</code> || oainnestit, várástit "várát" V TV Der1 Der/st V || se→skimte (add "litt"?)
|-
|-
|<code>Der/adda</code> || <code>V→N/PrfPrc/Actio</code> || bassaladdan "bassalit" V* TV Der2 Der/adda || →vaske tøy (bassat=vaske)
|<code>Der/adda</code> || <code>V→N.PrfPrc.Actio</code> || bassaladdan "bassalit" V* TV Der2 Der/adda || →vaske tøy (bassat=vaske)
|-
|-
|<code>Der/ahtti</code> || <code>V→V</code>|| vajálduhttit "vajálduvvat" V* IV* Der2 Der/ahtti V TV || →overse/glemme
|<code>Der/ahtti</code> || <code>V→V</code>|| vajálduhttit "vajálduvvat" V* IV* Der2 Der/ahtti V TV || →overse/glemme
Line 78: Line 162:
|<code>Der/eamoš</code> || suffix || muitaleamoš "muitalit" V* TV Der3 Der/eamoš || fortelle→
|<code>Der/eamoš</code> || suffix || muitaleamoš "muitalit" V* TV Der3 Der/eamoš || fortelle→
|-
|-
|<code>Der/eapmi</code> || <code>V→N</code> || deaivvadeapmi "deaivvadit" V IV Der2 Der/eapmi N Sg Nom || møte(V) → møte(N)
|<code>Der/eapmi</code> || <code>V→N</code> || deaivvadeapmi "deaivvadit" V IV Der2 Der/eapmi N Sg Nom || møte(V)→møte(N), feire→feiring
|-
|-
|<code>Der/easti</code> || suffix || muitaleastit "muitalit" V TV Der2 Der/easti V Inf || fortelle →
|<code>Der/easti</code> || suffix || muitaleastit "muitalit" V TV Der2 Der/easti V Inf || fortelle →
Line 102: Line 186:
|<code>Der/l</code> || ???? || ohcalit "ohcat" V* TV Der1 Der/l V || lete→savne/lengte etter
|<code>Der/l</code> || ???? || ohcalit "ohcat" V* TV Der1 Der/l V || lete→savne/lengte etter
|-
|-
|<code>Der/las</code> || suffix || lotnolas "lotnut" V* TV Der1 Der2 Der/las A || betale→
|<code>Der/las</code> || <code>V→Adj</code> || addálas "addit" V TV Der1 Der2 Der/las A || gi→generøs
|-
|-
|<code>Der/laš</code> || <code>N→Adj</code> || dábálaš "dáhpi" N Der1 Der/laš A Sg Nom || skikk→vanlig
|<code>Der/laš</code> || <code>N→Adj</code> || dábálaš "dáhpi" N Der1 Der/laš A Sg Nom || skikk→vanlig
Line 113: Line 197:
|-
|-
|<code>Der/n</code> || suffix || oažžun "oažžut" V* TV Der3 Der/n N || få→?
|<code>Der/n</code> || suffix || oažžun "oažžut" V* TV Der3 Der/n N || få→?
|-
|<code>Der/st</code> || Diminutive <code>V→V</code> || oainnestit, várástit "várát" V TV Der1 Der/st V || se→skimte (add "litt"?)
|-
|-
|<code>Der/stuvva</code> || suffix || fuolastuvvat "fuollat" V* TV Der1 Der2 Der/stuvva V || bry seg om→
|<code>Der/stuvva</code> || suffix || fuolastuvvat "fuollat" V* TV Der1 Der2 Der/stuvva V || bry seg om→
Line 125: Line 207:
|-
|-
|<code>Der/vuohta</code> || <code>Adj→N</code> || ráhkisvuohta "ráhkis" A Der3 Der/vuohta N Sg Nom || kjær→kjærlighet
|<code>Der/vuohta</code> || <code>Adj→N</code> || ráhkisvuohta "ráhkis" A Der3 Der/vuohta N Sg Nom || kjær→kjærlighet
|-
|<code>Der/veara</code> || <code>N→Adj</code> || mearkkašanveara "mearkkašeapmi" N SgCmp Der3 Der/veara A || merknad→markert?
|-
|-
|}
|}

Latest revision as of 10:53, 16 April 2015

This page describes the general mechanism for handling derivations in sme-nob, then summarises the derivations handled (and how they are dealt with).

Derivations: general rules and exceptions

Sámi has a lot of derivation rules. There are various strategies used for translating these:

  1. We can lexicalise the sme derivation – this is the easiest and safest approach, and it leads to the best translations
  2. We can use a transfer rule which might add some periphrasis and/or change the form of the main word
    • geavahit.V.Der/Actor.N.Pl → de som bruker
    • lohkat.V.Der/goahti.1SG.Pret → begynte å lese
  3. We can put the full specific analysis in bidix to override the general translation, like translating geavahit.V.Der/Actor.N.Pl into bruker<n><m><pl>
    • DEPRECATED – we should move away from this method, it leads to too much transfer complexity, and might not even work if the bidix pardef also has the full path (making the bidix entry ambiguous). Much better to just lexicalise.
  4. We can tag-relabel so that the derivation looks like a compound
    • DEPRECATED – we used to do this with Der/goahti (lohkagohten→lese+begynte→begynte å lese), but it works just fine with the regular transfer method, so no need to have yet another method.
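
The order of preference between strategies 1 and 2 above can be sketched in Python. This is only an illustration, not how the pair is implemented: the real data lives in the bidix and transfer files, and the dictionaries, the function name and the derivation labels used here are all hypothetical.

```python
# Hypothetical toy data; the real entries live in the bidix and t1x files.
LEXICALISED = {
    "ráhkisvuohta": "kjærlighet",  # derived word with its own Bokmål translation
}

# Bokmål forms of the underived verb, needed by the periphrastic fallback.
BASE_VERBS = {
    "geavahit": {"inf": "bruke", "pres": "bruker"},
    "lohkat": {"inf": "lese", "pres": "leser"},
}


def translate(surface, lemma, derivation):
    """Prefer a lexicalised entry; otherwise fall back on a periphrastic rule."""
    if surface in LEXICALISED:
        return LEXICALISED[surface]
    verb = BASE_VERBS.get(lemma)
    if verb is None:
        return None
    # The derivation labels below are made up for this sketch.
    if derivation == "Der/Actor.Pl":
        return "de som " + verb["pres"]
    if derivation == "Der/goahti.Sg1.Prt":
        return "begynte å " + verb["inf"]
    return None


print(translate("geavaheaddjiid", "geavahit", "Der/Actor.Pl"))
print(translate("lohkagohten", "lohkat", "Der/goahti.Sg1.Prt"))
```

A lexicalised entry always wins, which is why adding one is the safest way to override an awkward periphrasis.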

Removing unhandled derivations from the analyser

Any derivations that are not handled are removed from the analyser with a twol negation rule in rm-deriv-cmp.twol:

UnhandledDerivations /<= _ ; ! fail if analysis contains a tag from the set UnhandledDerivations

Derivations of derivations are removed with this rule (if these are ever needed, we should just lexicalise):

Derivation /<= Derivation+ PartOfSpeech+ _ ;

since double derivations are not handled either unless there are explicit transfer rules for them. This makes the lexicon a lot easier to handle for testvoc. Summary of fallbacks below contains the list of derivations that are and aren't handled.
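
The effect of the two rules can be approximated in Python as a filter over tag sequences. This is a rough sketch under assumptions: the contents of the three sets are invented for the example (the real sets are defined in the twol file), and the function names are hypothetical.

```python
# Assumed, illustrative set contents; the real sets are defined in the twol file.
POS = {"V", "N", "A", "Adv"}
DERIVATION = {"Der/j", "Der/h", "Der/PassL", "Der/n", "Der/eamoš"}
UNHANDLED = {"Der/eamoš"}


def is_double_derivation(tags):
    """True if a derivation tag is directly preceded by Derivation+ PartOfSpeech+."""
    for i, tag in enumerate(tags):
        if tag not in DERIVATION:
            continue
        j = i - 1
        pos_count = 0
        # Walk left over the run of part-of-speech tags ...
        while j >= 0 and tags[j] in POS:
            pos_count += 1
            j -= 1
        der_count = 0
        # ... then over the run of derivation tags before it.
        while j >= 0 and tags[j] in DERIVATION:
            der_count += 1
            j -= 1
        if pos_count >= 1 and der_count >= 1:
            return True
    return False


def keep_analysis(tags):
    """Drop analyses with an unhandled tag or a derivation of a derivation."""
    if any(tag in UNHANDLED for tag in tags):
        return False
    return not is_double_derivation(tags)
```

With these sets, `["V", "TV", "Der/PassL", "V"]` is kept, while `["V", "TV", "Der/j", "V", "Der/PassL", "V"]` is dropped as a double derivation.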

Summary of fallbacks

(Tags used here are a bit outdated.)

  • N.Der_Dimin.N
    • N->N (diminutive), passes through as if nothing happened (although we could add "lille" or something)
  • A.Der_vuohta.N
    • Adj->N, gets adj.def, ("grunn"->"det grunne")
    • typical override: "fattig"->"fattigdom"
  • A.Der_at.Adv
    • Adj->Adv, gets adj.posi.nt.sg.ind ("vid" -> "vidt")
  • V.Der_goahti.V
    • turns into two words, eg. ^lohkagohten/lohkat<V><TV><Der3>+goahti<V><Ind><Prt><Sg1>$
  • V.Der_PassL.V
    • V->V (passive), add the "pass" tag, picked up by t1x verb rule
  • V.Der_j.V.Der_PassL.V
    • V->V (passive), add the "pass" tag, picked up by t1x verb rule
    • TODO: what does Der_j add to the meaning? (ignored for now)
    • Double derivation, has exception in dev/xfst2apertium.useless.twol (inner PoS tag removed in order to "flatten" it)
  • V.Der_PassL.V.Der_n.N
    • V->N (via passive), for now just outputs the plain passive infinitive (ideally should be able to enter noun phrase rules)
    • Double derivation, has exception in dev/xfst2apertium.useless.twol (inner PoS tag removed in order to "flatten" it)
  • V.Der_PassS.V
    • V->V (passive), add the "pass" tag, picked up by t1x verb rule
  • V.Der2.Der_halla.V
    • V->V (passive), add the "pass" tag, picked up by t1x verb rule. Verb is tagged <ill-av> since the agent is illative with halla-verbs
  • V.Der_h.V
    • V->V (causative), add the "caus" tag, picked up by t1x verb rule
  • V.Der_ahtti.V
    • V->V (causative), add the "caus" tag, picked up by t1x verb rule
  • V.Der_d.V
    • V->V (reflexive), add the "ref" tag, picked up by t1x verb rule
  • V.Der_alla.V
    • V->V (reflexive), add the "ref" tag, picked up by t1x verb rule
  • V.Der_st.V
    • V->V (diminutive), passes through as if nothing happened (although we could add the adverb "litt"?)
  • V.Der_n.N
    • V->N, gets adj.pprs
  • V.Der2.Der_las
    • V->Adj, gets adj.pprs ("gi"->"givende")
    • typical override: gi->generøs
  • V.Actor.N
    • V->N (actor), gets vblex.pres.m (could tag TD instead of pres, and select between pres and pret based on earlier finite verb, TODO)
  • V.Der_j.Actor.N
    • V->N (actor), as above
    • TODO: what does Der/j add to the meaning? (ignored for now)
  • V.Der_eapmi.N
    • V->N (action/process), gets vblex.inf.nt
    • typical override: "seile"->"seiling"
  • V.Der_muš.N
    • V->N (action/process), gets vblex.inf.nt
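
The verb-to-verb fallbacks in the list above amount to a lookup from derivation to the extra tags that the t1x verb rule picks up. A minimal Python sketch, assuming the flattened tag names from the list; the dictionary and function names are hypothetical and the real logic lives in the transfer rules:

```python
# Extra target-side tags per verb derivation, taken from the list above.
VERB_FALLBACK_TAGS = {
    "Der_PassL": ["pass"],
    "Der_PassS": ["pass"],
    "Der_halla": ["pass", "ill-av"],  # agent is illative with halla-verbs
    "Der_h": ["caus"],
    "Der_ahtti": ["caus"],
    "Der_d": ["ref"],
    "Der_alla": ["ref"],
}


def fallback_tags(base_tags, derivation):
    """Return the verb's tags plus any tags the derivation adds."""
    extra = VERB_FALLBACK_TAGS.get(derivation, [])
    return list(base_tags) + extra


print(fallback_tags(["vblex"], "Der_h"))
print(fallback_tags(["vblex"], "Der_st"))
```

Derivations that are absent from the table, such as Der_st, pass through unchanged, matching the "passes through as if nothing happened" entries.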

Der-tag hitparade

(counts of analyses, not forms, so probably a bit skewed, but gives an idea)

165550 <der_nomact>
103985 <der_nomag>
60906 <der_h>
50394 <der_laš>
49664 <der_dimin>
40246 <der_vuohta>
34868 <der_d>
29189 <der_passs>
23625 <der_passl>
17455 <der_st>
11718 <der_at>
11292 <der_heapmi>
10369 <der_alla>
5793 <der_l>
3542 <der_t>
3459 <der_ahtti>
3199 <der_muš>
2917 <der_halla>
2009 <der_huvva>
1355 <der_meahttun>
1208 <der_lágan>
978 <der_stuvva>
925 <der_las>
844 <der_a>
777 <der_upmi>
471 <der_saš>
421 <der_huhtti>
230 <der_adda>
150 <der_asti>
126 <der_veara>
44 <der_geahtes>
31 <der_easti>
22 <der_keahtta>
18 <der_adv>
16 <der_nammasaš>
8 <der_jagáš>
8 <der_ár>
4 <der_stávval>
4 <der_lágaš>
4 <der_dáfot>
3 <der_eamoš>
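
How the counts above were produced is not recorded on this page. One way to get a comparable list, assuming one analysis string per input item with tags written as `<der_...>`, is sketched below; the function name and the sample analyses are made up for the example.

```python
import re
from collections import Counter

# Matches tags such as <der_h> or <der_laš> and captures the tag name.
DER_TAG = re.compile(r"<(der_[^>]+)>")


def count_der_tags(analyses):
    """Count every der_ tag over a sequence of analysis strings."""
    counts = Counter()
    for analysis in analyses:
        counts.update(DER_TAG.findall(analysis))
    return counts


# Hypothetical sample input, not real analyser output.
sample = [
    "geavvat<v><iv><der_h><v><tv><der_nomag><n><sg><nom>",
    "orrot<v><iv><der_h><v><inf>",
    "ráhkis<adj><der_vuohta><n><sg><nom>",
]
for tag, n in count_der_tags(sample).most_common():
    print(n, "<" + tag + ">")
```

Like the list above, this counts analyses rather than surface forms, so an ambiguous form contributes once per analysis.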

Derivation tags and their meanings

Note: We should change the tags below, because / cannot be used in tags in Apertium. But then we would have to change the CG as well...
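
A renaming of this kind could look like the Python sketch below. The lowercase, underscore-separated form is inferred from tags such as `<der_passl>` in the hitparade; the function name is hypothetical, and the actual relabelling may differ (`<der_nomag>`, for example, does not follow this pattern).

```python
def apertium_safe(tag):
    """Replace the slash in a tag such as Der/PassL and lowercase the result."""
    return tag.replace("/", "_").lower()


for tag in ["Der/h", "Der/PassL", "Der/laš"]:
    print(tag, "->", apertium_safe(tag))
```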

{|class=wikitable
! Tag !! Type !! Example !! in Bokmål
|-
|<code>Der/Dimin</code> || <code>N→N[diminutive]</code> || mánáš "mánná" N Der1 Der/Dimin N Sg Nom || barn→lite barn
|-
|<code>Der/1 Der/st</code> || <code>V→V[diminutive]</code> || attestit "addit" V TV Der1 Der/st V Inf || gi→gi litt
|-
|<code>Der/st</code> || Diminutive <code>V→V</code> || oainnestit, várástit "várát" V TV Der1 Der/st V || se→skimte (add "litt"?)
|-
|<code>Der/adda</code> || <code>V→N.PrfPrc.Actio</code> || bassaladdan "bassalit" V* TV Der2 Der/adda || →vaske tøy (bassat=vaske)
|-
|<code>Der/ahtti</code> || <code>V→V</code> || vajálduhttit "vajálduvvat" V* IV* Der2 Der/ahtti V TV || →overse/glemme
|-
|<code>Der/alla</code> || suffix || bázáhallan "bázihit" V* TV Der2 Der/alla V Actio ||
|-
|<code>Der/amoš</code> || suffix || muitalamoš "muitalit" V TV Der3 Der/amoš N Sg Nom || fortelle→
|-
|<code>Der/asti</code> || suffix || muitalastit "muitalit" V TV Der2 Der/asti V Inf || fortelle→
|-
|<code>Der/at</code> || <code>Adj→Adv</code> || viidát "viiddis" A* Der2 Der/at Adv || vid→vidt
|-
|<code>Der/d</code> || <code>V→V[refl]</code> || basadit "bassat" V TV Der1 Der/d V || vaske→vaske seg
|-
|<code>Der/eaddji</code> || <code>V→N.Actor</code> || muitaleaddji "muitalit" V TV Der2 Actor N Sg Nom || fortelle→forteller
|-
|<code>Der/eamoš</code> || suffix || muitaleamoš "muitalit" V* TV Der3 Der/eamoš || fortelle→
|-
|<code>Der/eapmi</code> || <code>V→N</code> || deaivvadeapmi "deaivvadit" V IV Der2 Der/eapmi N Sg Nom || møte(V)→møte(N), feire→feiring
|-
|<code>Der/easti</code> || suffix || muitaleastit "muitalit" V TV Der2 Der/easti V Inf || fortelle →
|-
|<code>Der/geahtes</code> || suffix || eaiggátkeahtes "eaiggát" N* Der3 Der/geahtes || eier →
|-
|<code>Der/goahti</code> || <code>V→V</code> Inchoative || boradišgohten "boradit" V TV Der3 Der/goahti V Ind Prt Sg1 || spise → jeg begynte å spise
|-
|<code>Der/h</code> || suffix || geavaheaddji "geavvat" V* IV* Der1 Der/h V* TV Der2 Actor; orrohit "orrot" V* IV Der1 Der/h V || heve seg→ ; bli/synes→
|-
|<code>Der/halla</code> || <code>V→V[recip]</code> || gulahallat "gullat" V* TV Der1 Der2 Der/halla || høre→forstå hverandre («høre hverandre»?)
|-
|<code>Der/heapmi</code> || suffix || čađaheapmi "čađđa" N* Der1 Der2 Der/heapmi A ||
|-
|<code>Der/huhtti</code> || suffix || muosehuhttit "muoseheapme" A* Der1 Der/huhtti V* TV || urolig→
|-
|<code>Der/huvva</code> || suffix || čađahuvvo "čađđa" N* Der1 Der2 Der/huvva V IV Imprt Prs ConNegII ||
|-
|<code>Der/j</code> || suffix || sáddejuvvot "sáddet" V* TV Der1 Der/j V* Der2 Der/PassL V || sende→
|-
|<code>Der1 Der/l</code> || <code>V→V[subitive]</code> || borralit "borralit" V TV Der1 Der/l V || spise→spise (i hast)
|-
|<code>Der/l</code> || ???? || ohcalit "ohcat" V* TV Der1 Der/l V || lete→savne/lengte etter
|-
|<code>Der/las</code> || <code>V→Adj</code> || addálas "addit" V TV Der1 Der2 Der/las A || gi→generøs
|-
|<code>Der/laš</code> || <code>N→Adj</code> || dábálaš "dáhpi" N Der1 Der/laš A Sg Nom || skikk→vanlig
|-
|<code>Der/lágan</code> || suffix || earálágan "eará" Pron Indef Sg Gen Der1 Der/lágan A || annen/andre→
|-
|<code>Der/meahttun</code> || <code>V→Adj[Neg]</code> || jáhkkemeahttun "jáhkkit" V TV Der1 Der/meahttun A Sg Nom || tro/anta→utrolig
|-
|<code>Der/muš</code> || suffix || ??? "juhkat" V TV Der3 Der/muš N Sg Nom || drikke→
|-
|<code>Der/n</code> || suffix || oažžun "oažžut" V* TV Der3 Der/n N || få→?
|-
|<code>Der/stuvva</code> || suffix || fuolastuvvat "fuollat" V* TV Der1 Der2 Der/stuvva V || bry seg om→
|-
|<code>Der/supmi</code> || suffix || čállosupmi "čállit" V* TV Der2 Der/PassL V* Der3 Der/supmi N || skrive/...→
|-
|<code>Der/upmi</code> || suffix || mearkkašupmi "mearkkašit" V* TV Der2 Der/PassL V* Der3 Der/upmi || merge seg→
|-
|<code>Der/viđá</code> || suffix || málestanviđá "málet" V TV Der1 Der/st V Der2 Der/eapmi N SgCmp Der/viđá Adv || male→
|-
|<code>Der/vuohta</code> || <code>Adj→N</code> || ráhkisvuohta "ráhkis" A Der3 Der/vuohta N Sg Nom || kjær→kjærlighet
|-
|<code>Der/veara</code> || <code>N→Adj</code> || mearkkašanveara "mearkkašeapmi" N SgCmp Der3 Der/veara A || merknad→markert?
|}