Northern Sámi and Norwegian/bidix

From Apertium

Revision as of 09:02, 24 August 2012

The apertium-sme-nob bidix makes heavy use of bidix pardefs. The main uses for these are:

  • To change the tag format from the Giellatekno standard to the apertium standard
  • To mark certain sme verbs as inherently passive/causative/reflexive
    • these markings in turn trigger certain transfer rules, most of them in the chunker (t1x)
  • To translate a word into a different part of speech


The most complex part of the bidix is probably the verb section. A typical verb entry looks like this:

<e><p><l>vurket<s n="V"/><s n="TV"/></l><r>oppbevare<s n="vblex"/><s n="pers"/></r></p><par n="__verb"/></e>

where "pers" marks that the agent is typically animate, and __verb handles the changes in tags for person, number and tense. When translating vurken, bidix turns the tags <V><TV><Ind><Prs><Sg1> into <vblex><pers><pres><sg><p1>. The transfer rules then put the tags <vblex><pres> on the verb lemma, creating oppbevarer, and may insert a pronoun based on the remaining tags, creating jeg oppbevarer.

Additionally, the pardef handles certain derivations. When translating vurkejuvvot, bidix turns the tags <V><TV><Der3><Der_PassL><V><Inf> into <vblex><pers><inf><pass>, and transfer adds <vblex><inf><pass> to the verb lemma, creating oppbevares.
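To make the tag flow concrete, here is a toy Python model of the two steps. It is not how lttoolbox compiles or applies dictionaries. The tag table, the names `bidix_lookup` and `transfer`, and the simplified transfer step are all hypothetical. The table covers only the two examples above.

```python
# Toy model of the __verb pardef: Giellatekno tag tail -> Apertium tag tail.
# Hypothetical table covering only the vurken and vurkejuvvot examples.
VERB_PARDEF = {
    ("Ind", "Prs", "Sg1"): ("pres", "sg", "p1"),
    ("Der3", "Der_PassL", "V", "Inf"): ("inf", "pass"),
}

# The vurket entry: sme lemma + stem tags (left), nob lemma + tags (right).
LEFT = ("vurket", ("V", "TV"))
RIGHT = ("oppbevare", ("vblex", "pers"))

# Tags that (in this sketch) transfer keeps on the verb lemma.
FORM_TAGS = ("pres", "pret", "inf", "pass")


def bidix_lookup(lemma, tags):
    """Return (nob lemma, nob tags) for an analysed sme verb, or None if nothing matches."""
    left_lemma, left_tags = LEFT
    n = len(left_tags)
    if lemma != left_lemma or tuple(tags[:n]) != left_tags:
        return None
    tail = VERB_PARDEF.get(tuple(tags[n:]))
    if tail is None:
        return None
    right_lemma, right_tags = RIGHT
    return right_lemma, right_tags + tail


def transfer(lemma, tags):
    """Simplified transfer: keep <vblex> plus verb-form tags; drop <pers> and person/number."""
    kept = [t for t in tags if t in FORM_TAGS]
    return "^" + lemma + "<vblex>" + "".join(f"<{t}>" for t in kept) + "$"


if __name__ == "__main__":
    for analysis in (["V", "TV", "Ind", "Prs", "Sg1"],
                     ["V", "TV", "Der3", "Der_PassL", "V", "Inf"]):
        print(transfer(*bidix_lookup("vurket", analysis)))
```

The person and number tags that the sketch drops are what real transfer would use to insert a pronoun such as jeg. That step is not modelled here.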


We can also use a different pardef which, in addition to the above, adds a causative tag <caus> that transfer picks up:

<e><p><l>divuhit<s n="V"/><s n="TV"/></l><r>reparere<s n="vblex"/><s n="pers"/></r></p><par n="caus__verb"/></e>

Here transfer tries to build a causative construction with this verb. It prepends "la", moves the finite tense tag onto "la", and makes the main verb infinitive. Thus, given divuhin, tagged <V><TV><Ind><Prt><Sg1>, bidix outputs ^reparere<vblex><pers><caus><pret><sg><p1>$. When transfer sees that the verb is tagged <caus>, it creates ^la<vblex><pret>$ ^reparere<vblex><inf>$ (perhaps also inserting a pronoun as above).
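The causative rewrite can be sketched as a small Python function. This is a toy model, not the actual t1x rule. The `VERB_TAGS` table, the `caus_verb`, `lu` and `transfer_caus` names, and the assumption that caus__verb simply puts <caus> in front of the tags __verb would produce are all hypothetical. The tag order matches the bidix output quoted above.

```python
FINITE = ("pres", "pret")

# Hypothetical __verb tag table; the modelled caus__verb adds <caus> in front.
VERB_TAGS = {
    ("Ind", "Prs", "Sg1"): ("pres", "sg", "p1"),
    ("Ind", "Prt", "Sg1"): ("pret", "sg", "p1"),
}


def caus_verb(tail):
    """Tags the modelled caus__verb pardef emits for a given sme tag tail."""
    return ("caus",) + VERB_TAGS[tuple(tail)]


def lu(lemma, tags):
    """Format a lexical unit in Apertium stream notation: ^lemma<tag>...$"""
    return "^" + lemma + "".join(f"<{t}>" for t in tags) + "$"


def transfer_caus(lemma, tags):
    """If the verb is tagged <caus>, put the finite tense on 'la' and make the main verb infinitive."""
    tense = [t for t in tags if t in FINITE]
    if "caus" not in tags:
        return lu(lemma, ["vblex"] + tense)
    return lu("la", ["vblex"] + tense) + " " + lu(lemma, ["vblex", "inf"])


if __name__ == "__main__":
    bidix_tags = ("vblex", "pers") + caus_verb(["Ind", "Prt", "Sg1"])
    print(lu("reparere", bidix_tags))             # what bidix outputs for divuhin
    print(transfer_caus("reparere", bidix_tags))  # what transfer builds from it
```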



Similarly, with

<e><p><l>viidánit<s n="V"/><s n="IV"/></l><r>spre<s n="vblex"/><s n="pers"/></r></p><par n="refl__verb"/></e>

we get a reflexive pronoun (seg/meg/...), which transfer appends when it sees the "refl" tag added by refl__verb.
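Transfer has to pick the reflexive pronoun that agrees with the subject. Below is a toy Python sketch of that choice. The function name and the tag order (<refl> after <pers>, as with <caus>) are assumptions. The sketch returns the pronoun as a plain word rather than inventing its tags.

```python
# Verb-form tags kept on the lemma in this sketch.
FORM_TAGS = ("pres", "pret", "inf")

# Bokmål reflexive pronouns by (person, number).
REFLEXIVE = {
    ("p1", "sg"): "meg", ("p2", "sg"): "deg", ("p3", "sg"): "seg",
    ("p1", "pl"): "oss", ("p2", "pl"): "dere", ("p3", "pl"): "seg",
}


def transfer_refl(lemma, tags):
    """Return (verb lexical unit, reflexive pronoun or None) for a verb coming out of bidix."""
    verb = "^" + lemma + "<vblex>" + "".join(f"<{t}>" for t in tags if t in FORM_TAGS) + "$"
    if "refl" not in tags:
        return verb, None
    # Non-finite forms carry no person/number; default to p3 sg so they get "seg" (å spre seg).
    person = next((t for t in tags if t in ("p1", "p2", "p3")), "p3")
    number = next((t for t in tags if t in ("sg", "pl")), "sg")
    return verb, REFLEXIVE[(person, number)]
```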

With

<e><p><l>suovganit<s n="V"/><s n="IV"/></l><r>slite<s n="vblex"/><s n="pers"/></r></p><par n="pass__verb"/></e>

we get a "pass" tag (added by pass__verb) and a passive construction with a participle (here: bli slitt). With the passive, however, the predicate may also be an adjective, which we mark like this:
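The choice between a participle and an adjective predicate can be sketched in Python. This is a toy model, not the t1x rule. The function names are hypothetical, and so is the participle tag <pp>, which is an assumption for this sketch. The nob part of speech (vblex or adj) is read from the first tag on the bidix right side.

```python
FINITE = ("pres", "pret")


def lu(lemma, tags):
    """Format a lexical unit in Apertium stream notation: ^lemma<tag>...$"""
    return "^" + lemma + "".join(f"<{t}>" for t in tags) + "$"


def transfer_pass(lemma, tags):
    """Build 'bli' + predicate for a word tagged <pass>.

    The predicate is a participle if the nob side is a verb, otherwise the bare
    adjective. The <pp> participle tag is an assumption of this sketch.
    """
    pos = tags[0]
    tense = [t for t in tags if t in FINITE]
    if "pass" not in tags:
        return lu(lemma, [pos] + tense)
    aux = lu("bli", ["vblex"] + tense)
    if pos == "vblex":
        predicate = lu(lemma, ["vblex", "pp"])   # bli slitt
    else:
        predicate = lu(lemma, [pos])             # bli trøtt
    return aux + " " + predicate
```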

<e><p><l>viessat<s n="V"/><s n="IV"/></l><r>trøtt<s n="adj"/><s n="pers"/></r></p><par n="pass__verb"/></e>

(other parts of speech for the passive predicates are currently TODO-marked in bidix)

The deverbal__n pardef is used to give lemma-specific overrides for the derivations (Der2.Actor, Der3.Der_n) which turn verbs into nouns:

<e><p><l>geavahit<s n="V"/><s n="TV"/></l><r>bruke<s n="vblex"/><s n="pers"/></r></p><par n="__verb"/></e>
<e><p><l>geavahit<s n="V"/><s n="TV"/></l><r>bruker<s n="n"/><s n="m"/></r></p><par n="deverbal__n"/></e>
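The override logic can be sketched as a lookup that prefers a lemma's deverbal__n entry when a verb-to-noun derivation is present. This is a toy Python model. The entry layout and function name are hypothetical. The fallback to the plain __verb entry, leaving the derivation tags for transfer, is also an assumption of this sketch.

```python
# Hypothetical bidix fragment: (sme lemma, sme stem tags, nob lemma, nob tags, pardef).
ENTRIES = [
    ("geavahit", ("V", "TV"), "bruke", ("vblex", "pers"), "__verb"),
    ("geavahit", ("V", "TV"), "bruker", ("n", "m"), "deverbal__n"),
    ("vurket", ("V", "TV"), "oppbevare", ("vblex", "pers"), "__verb"),
]

# Verb-to-noun derivation tag pairs mentioned in the text.
DERIVATIONS = {("Der2", "Actor"), ("Der3", "Der_n")}


def choose_entry(lemma, tags):
    """Return (nob lemma, nob tags, pardef) of the entry used for an analysed sme word.

    A derived noun uses the lemma's deverbal__n entry if there is one; otherwise
    (assumption) the __verb entry is used and transfer sees the derivation tags.
    """
    stem, rest = tuple(tags[:2]), tuple(tags[2:])
    candidates = [e for e in ENTRIES if e[0] == lemma and e[1] == stem]
    if rest[:2] in DERIVATIONS:
        for entry in candidates:
            if entry[4] == "deverbal__n":
                return entry[2], entry[3], entry[4]
    for entry in candidates:
        if entry[4] == "__verb":
            return entry[2], entry[3], entry[4]
    return None
```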

(see Northern_Sámi_and_Norwegian/Derivations)

It's up to transfer (mainly the chunker, t1x) to make sense of and clean up these tag combinations.


PlcSur__np

Every Plc-tagged proper noun lemma in bidix needs a Sur-tagged entry too. Even though "Hammerfeasta" is never used as a surname, sme-dis.rle (and thus apertium-sme-nob.sme-nob.rlx) has a rule that can change any Plc-tagged proper noun to Sur, so bidix has to be able to handle that.

If the translation is the same whether the noun is tagged Plc or Sur, we use the PlcSur__np pardef:

<e><p><l>Isuzu<s n="N"/><s n="Prop"/></l><r>Isuzu<s n="np"/><s n="top"/></r></p><par n="PlcSur__np"/></e>
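In effect, one PlcSur__np entry stands for two analyses with the same translation. Lookup therefore still succeeds after the disambiguator relabels a Plc reading as Sur. Below is a toy Python sketch of that expansion. The function names are hypothetical, and the case/number tags the real pardef also deals with are left out.

```python
def expand_plcsur(sme_lemma, nob_lemma, nob_tags):
    """Model of PlcSur__np: one entry yields a Plc and a Sur analysis with the same translation."""
    return {
        (sme_lemma, ("N", "Prop", sub)): (nob_lemma, nob_tags)
        for sub in ("Plc", "Sur")
    }


BIDIX = expand_plcsur("Isuzu", "Isuzu", ("np", "top"))


def translate(lemma, tags):
    """Look up an analysis; None means bidix has no entry for it."""
    return BIDIX.get((lemma, tuple(tags)))
```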

If the translations differ, we write two separate entries instead:

<e>       <p><l>Ádjáčohkka<s n="N"/><s n="Prop"/><s n="Plc"/></l><r>Emmenesveten<s n="np"/><s n="top"/></r></p><par n="__np"/></e>
<e r="LR"><p><l>Ádjáčohkka<s n="N"/><s n="Prop"/><s n="Sur"/></l><r>Ádjáčohkka<s n="np"/><s n="top"/></r></p><par n="__np"/></e>

(since we should never change surnames in translation).
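The r="LR" attribute restricts an entry to the left-to-right (sme→nob) direction. Below is a toy Python model of direction-restricted lookup using the two Ádjáčohkka entries. The data layout and function name are hypothetical; lttoolbox handles this when it compiles the dictionary for each direction.

```python
# Each entry: (sme analysis, nob analysis, restriction); None means both directions.
ENTRIES = [
    (("Ádjáčohkka", ("N", "Prop", "Plc")), ("Emmenesveten", ("np", "top")), None),
    (("Ádjáčohkka", ("N", "Prop", "Sur")), ("Ádjáčohkka", ("np", "top")), "LR"),
]


def lookup(analysis, direction):
    """direction is "LR" (sme->nob) or "RL" (nob->sme); r="LR" entries are skipped in RL."""
    results = []
    for left, right, restriction in ENTRIES:
        if restriction not in (None, direction):
            continue
        source, target = (left, right) if direction == "LR" else (right, left)
        if source == analysis:
            results.append(target)
    return results
```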