Northern Sámi and Norwegian/bidix

 
Revision as of 09:41, 24 August 2012

The apertium-sme-nob bidix makes heavy use of bidix pardefs. The main uses for these are:

  • To change the tag format from the Giellatekno standard to the Apertium standard
  • To mark certain sme verbs as inherently passive/causative/reflexive
    • these markings in turn trigger certain transfer rules, most of them in the chunker (t1x)
  • To transfer from one part of speech to another


The most complex part of the bidix is probably the verb section. A typical entry looks like this:

<e><p><l>vurket<s n="V"/><s n="TV"/></l><r>oppbevare<s n="vblex"/><s n="pers"/></r></p><par n="__verb"/></e>

where "pers" marks that the agent is typically animate, and __verb handles the changes in tags for person, number and tense. When translating vurken, the tags <V><TV><Ind><Prs><Sg1> are turned into <vblex><pers><pres><sg><p1> by bidix. The transfer rules then distribute the tags <vblex><pres> onto the verb lemma, creating oppbevarer (and perhaps insert a pronoun using the other tags, creating jeg oppbevarer). Additionally, the pardef handles certain derivations: when translating vurkejuvvot, the tags <V><TV><Der3><Der_PassL><V><Inf> turn into <vblex><pers><inf><pass>, and the transfer rules add <vblex><inf><pass> to the verb lemma, creating oppbevares.
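The tag rewriting that __verb performs can be pictured with a small Python sketch. This is only an illustration: the lookup tables and the function name `map_verb_tags` are hypothetical, cover a tiny subset of tags, and are not copied from the real pardef.

```python
# Hypothetical lookup tables: a small illustrative subset, not the real pardef.
POS = {"V": "vblex"}
TENSE = {"Prs": "pres", "Prt": "pret"}
PERSON_NUMBER = {
    "Sg1": ("sg", "p1"), "Sg2": ("sg", "p2"), "Sg3": ("sg", "p3"),
    "Pl1": ("pl", "p1"), "Pl2": ("pl", "p2"), "Pl3": ("pl", "p3"),
}


def map_verb_tags(sme_tags, animacy="pers"):
    """Map Giellatekno-style finite verb tags to Apertium-style tags.

    `animacy` stands for the extra tag written in the <r> side of the
    bidix entry (here "pers").
    """
    out = []
    for tag in sme_tags:
        if tag in POS:
            out += [POS[tag], animacy]
        elif tag in TENSE:
            out.append(TENSE[tag])
        elif tag in PERSON_NUMBER:
            out.extend(PERSON_NUMBER[tag])
        # Transitivity (TV/IV) and mood (Ind) are simply dropped in this sketch.
    return out


print(map_verb_tags(["V", "TV", "Ind", "Prs", "Sg1"]))
# ['vblex', 'pers', 'pres', 'sg', 'p1']
```

The real pardef also covers derivations, other moods and the dual, none of which are modelled here.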


We can also use another pardef which, in addition to the above, adds a causative tag <caus> that is picked up by transfer:

<e><p><l>divuhit<s n="V"/><s n="TV"/></l><r>reparere<s n="vblex"/><s n="pers"/></r></p><par n="caus__verb"/></e>

Here transfer will try to make a causative construction with this verb, by prepending "la" and moving the finite tense tag there, while putting the main verb in the infinitive. Thus given divuhin, tagged <V><TV><Ind><Prt><Sg1>, bidix will output ^reparere<vblex><pers><caus><pret><sg><p1>$, and when transfer sees that the verb is tagged <caus>, it creates ^la<vblex><pret>$ ^reparere<vblex><inf>$ (perhaps also inserting a pronoun as above).
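As a rough illustration of this step, the following Python sketch splits a <caus>-tagged lexical unit into "la" plus an infinitive. The function names and the `TENSES` set are hypothetical; the real logic lives in the transfer rules (t1x), not in Python.

```python
TENSES = {"pres", "pret"}  # assumed subset of finite tense tags


def expand_causative(lemma, tags):
    """Return lexical units for a verb; split <caus> verbs into la + infinitive."""
    if "caus" not in tags:
        return [(lemma, tags)]
    tense = [t for t in tags if t in TENSES]
    # The finite tense moves to "la"; the main verb becomes an infinitive.
    return [("la", ["vblex"] + tense), (lemma, ["vblex", "inf"])]


def render(units):
    """Format lexical units in the ^lemma<tag>...$ stream notation."""
    return " ".join(
        "^" + lemma + "".join(f"<{t}>" for t in tags) + "$"
        for lemma, tags in units
    )


units = expand_causative("reparere", ["vblex", "pers", "caus", "pret", "sg", "p1"])
print(render(units))
# ^la<vblex><pret>$ ^reparere<vblex><inf>$
```

Person and number tags are discarded here for brevity; in the description above they may instead be used to insert a pronoun.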

Similarly, with

<e><p><l>viidánit<s n="V"/><s n="IV"/></l><r>spre<s n="vblex"/><s n="pers"/></r></p><par n="refl__verb"/></e>

we get a reflexive (seg/meg/...) appended by transfer on seeing the <refl> tag added by refl__verb.
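Which reflexive is appended depends on the person and number of the verb. A minimal Python sketch of that choice follows, assuming Apertium-style tags as in the examples above; the function name is hypothetical, and the real transfer rules work on lexical units rather than surface forms.

```python
# Norwegian Bokmål reflexive/object pronouns by (number, person);
# third person is "seg" in both numbers.
REFLEXIVE = {
    ("sg", "p1"): "meg",
    ("sg", "p2"): "deg",
    ("pl", "p1"): "oss",
    ("pl", "p2"): "dere",
}


def reflexive_pronoun(tags):
    """Pick the reflexive for a <refl>-tagged verb, or None if not reflexive."""
    if "refl" not in tags:
        return None
    number = "pl" if "pl" in tags else "sg"
    person = next((t for t in tags if t in ("p1", "p2", "p3")), "p3")
    return REFLEXIVE.get((number, person), "seg")


print(reflexive_pronoun(["vblex", "pers", "refl", "pres", "sg", "p1"]))  # meg
print(reflexive_pronoun(["vblex", "pers", "refl", "pres", "pl", "p3"]))  # seg
```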

With

<e><p><l>suovganit<s n="V"/><s n="IV"/></l><r>slite<s n="vblex"/><s n="pers"/></r></p><par n="pass__verb"/></e>

we get a <pass> tag and a passive construction, with a participle (here: bli slitt). However, with the passive, the predicate can also be an adjective, which we mark like this:

<e><p><l>viessat<s n="V"/><s n="IV"/></l><r>trøtt<s n="adj"/><s n="pers"/></r></p><par n="pass__verb"/></e>

(Other parts of speech for passive predicates are currently marked TODO in bidix.)


It's up to transfer (mainly the chunker, t1x) to make sense of and clean up these tag combinations.

{|class="wikitable sortable"
! Pardef !! Description !! Example !! Notes
|-
| __verb || Regular verb transfer || vurken → (jeg) oppbevarer ||
|-
| pass__verb || sme verb to Norwegian dynamic passive || áibat → bli forsinket || use lemma forsinke in the <r>; this pardef also works with adjectives (e.g. čuččodit translates to <r>stående<s n="adj"/><s n="pers"/></r></p><par n="pass__verb"/>, bli stående)
|-
| pstv__verb || sme verb to Norwegian lexicalised passive || čoggot → samles || use lemma samles in the <r>
|}



PlcSur__np

For every Plc-tagged proper noun lemma in bidix, we need a Sur-tagged entry too. Even though "Hammerfeasta" is never used as a Sur, sme-dis.rle (and thus apertium-sme-nob.sme-nob.rlx) has a rule that can change arbitrary Plc-tagged proper nouns to Sur, so bidix has to be able to handle that.

If the translation is identical no matter whether it's Plc or Sur, we use a pardef:

<e><p><l>Isuzu<s n="N"/><s n="Prop"/></l><r>Isuzu<s n="np"/><s n="top"/></r></p><par n="PlcSur__np"/></e>

If it is not, we add two separate entries:

<e>       <p><l>Ádjáčohkka<s n="N"/><s n="Prop"/><s n="Plc"/></l><r>Emmenesveten<s n="np"/><s n="top"/></r></p><par n="__np"/></e>
<e r="LR"><p><l>Ádjáčohkka<s n="N"/><s n="Prop"/><s n="Sur"/></l><r>Ádjáčohkka<s n="np"/><s n="top"/></r></p><par n="__np"/></e>

(since we should never change surnames in translation).
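One way to check this requirement is to scan the bidix for Plc entries that have no Sur counterpart. The Python sketch below does so with the standard library XML parser. It is a sketch under assumptions: the `<section>` wrapper, the function name, and the Hammerfeasta entry (included only to trigger a warning) are made up for illustration, and multiword lemmas using `<b/>` are not handled.

```python
import xml.etree.ElementTree as ET

# Sample input: the entries from this page plus one hypothetical Plc-only entry.
SAMPLE = """<section>
<e><p><l>Isuzu<s n="N"/><s n="Prop"/></l><r>Isuzu<s n="np"/><s n="top"/></r></p><par n="PlcSur__np"/></e>
<e><p><l>Ádjáčohkka<s n="N"/><s n="Prop"/><s n="Plc"/></l><r>Emmenesveten<s n="np"/><s n="top"/></r></p><par n="__np"/></e>
<e r="LR"><p><l>Ádjáčohkka<s n="N"/><s n="Prop"/><s n="Sur"/></l><r>Ádjáčohkka<s n="np"/><s n="top"/></r></p><par n="__np"/></e>
<e><p><l>Hammerfeasta<s n="N"/><s n="Prop"/><s n="Plc"/></l><r>Hammerfest<s n="np"/><s n="top"/></r></p><par n="__np"/></e>
</section>"""


def plc_without_sur(xml_text):
    """Return proper noun lemmas that have a Plc entry but no Sur entry."""
    root = ET.fromstring(xml_text)
    plc, sur = set(), set()
    for entry in root.iter("e"):
        left = entry.find("p/l")
        if left is None:
            continue
        lemma = left.text or ""
        tags = {s.get("n") for s in left.findall("s")}
        pardefs = {p.get("n") for p in entry.findall("par")}
        if "Prop" not in tags:
            continue
        if "PlcSur__np" in pardefs:
            # This pardef is described above as covering both Plc and Sur.
            plc.add(lemma)
            sur.add(lemma)
        if "Plc" in tags:
            plc.add(lemma)
        if "Sur" in tags:
            sur.add(lemma)
    return sorted(plc - sur)


print(plc_without_sur(SAMPLE))
# ['Hammerfeasta']
```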