Separable verbs

From Apertium

Latest revision as of 08:11, 2 January 2016

Apertium may have some problems when dealing with separable verbs. Separable verbs are verbs that are formed from a verb stem and a particle. For further information, see the Wikipedia article on separable verbs. These exist in most Germanic languages (Afrikaans, Danish, Dutch, German, Swedish, Norwegian, ...), and also in languages such as Hungarian.

For example, in Afrikaans, the verb "to announce" is "aankondig". The usage is as follows:

  • Sterrekundiges kondig [die ontdekking] aan.
  • Astronomers announce [the discovery].

The stem "kondig" does not mean anything by itself, only in conjunction with the particle "aan". This is not the case for every separable verb, however. The past participle is formed by inserting "ge" between the particle and the stem, for example:

  • Sterrekundiges het [die ontdekking] aangekondig.
  • Astronomers have announced [the discovery].

Currently Apertium has difficulty supporting this kind of feature in the morphological dictionaries.

Sometimes both the verb and the "particle" inflect:

"se quedó sorprendido/a"
she was surprised
quedar = to stay/remain
http://forum.wordreference.com/showthread.php?t=468499

Possible solutions

Several paradigms

Currently in the Afrikaans-English pair, separable verbs are dealt with as follows: Three paradigms are defined for verbs. The first is a list of possible particles/affixes (for example, "aan", "op", "onder", ...), the second is the "ge" past tense marker, the third is the standard verb ending paradigm.

So, for each separable verb, the definition looks something like:

  <e lm="kondig"><par n="attached__particles"/><par n="ge__past"/><i>kondig</i><par n="breek__vblex"/></e>

This allows us to analyse:

  • aankondig (announce)
  • aangekondig (announced)
  • kondig (announce). Note: this analysis is incorrect, since "kondig" means nothing without its particle!
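
The over-generation in the list above can be made concrete with a small sketch. This is plain Python, not Apertium code: it assumes that both the particle paradigm and the "ge" paradigm contain an empty option (which is what the analysis of bare "kondig" implies), it uses a one-item particle list for illustration, and it ignores the verb-ending paradigm entirely.

```python
from itertools import product

def surface_forms(stem, particles=("aan",), ge_options=("", "ge")):
    """Enumerate particle + ge + stem combinations.

    The empty string stands for the 'no particle' / 'no ge' option that
    the paradigms are assumed to contain (an assumption, see lead-in).
    """
    particle_options = ("",) + tuple(particles)
    return [p + ge + stem for p, ge in product(particle_options, ge_options)]

forms = surface_forms("kondig")
print(forms)
```

Because the particle is optional, the bare stem "kondig" (and also "gekondig") comes out alongside the wanted "aankondig" and "aangekondig".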

However, in an example such as the one above, where "aan" is moved after the noun phrase in the sentence, we cannot analyse the verb as a unit. Instead we rely on the fact that "kondig" does not have a meaning without "aan". Unfortunately this is not always the case...

Take, for example, the verbs "onderdruk" and "druk". The former means "to suppress", the latter means "to press" or "to squeeze". So when we try to translate "onderdruk" → "suppress", we instead get "press under" or "squeeze under". This is not a good translation in this instance (although in many cases it can work, viz. "terugkry" → "kry terug" → "get back").

Furthermore, we cannot define "druk" as "suppress" and simply let the particle take care of itself, because "druk" has another meaning.

Infix paradigm

We could also consider using an infix paradigm. This is unclean in a different way from the previous method. For example, we would have a paradigm like ge__pref:

  <pardef n="ge__pref">
    <e lm="ge">
      <p>
        <l>ge</l>
        <r>ge</r>
      </p>
    </e>
    <e>
      <p>
        <l></l>
        <r></r>
      </p>
    </e>
  </pardef>

Note that in this case, we don't have any grammatical symbols on the right side. We then specify a multiword as follows:

  <e lm="wegloop"><i>weg</i><par n="ge__pref"/><i>loop</i><par n="breek__vblex"/></e>

This allows us to analyse:

  • wegloop (run away)
  • weggeloop (ran away)

This does not allow us to analyse "loop" (to run) on its own; we would need a separate entry for that. It also has the downside that both forms need to be specified in the bilingual dictionary, so for example:

  <e><p><l>run<b/>away<s n="vblex"/></l><r>wegloop<s n="vblex"/></r></p></e>
  <e><p><l>run<b/>away<s n="vblex"/><s n="past"/></l><r>weggeloop<s n="vblex"/></r></p></e>

It remains to be seen whether the pay-off here, in the form of better translations, is worth the cost of duplicating entries. Furthermore, this still does not take care of "real separable" verbs.

Marking separable stems

We could mark the lemmata of verbs that can be used in separable contexts, and then use rules to say, for example:

Sterrekundiges kondig [die ontdekking] aan.
               kondig NP               aan.   → announce NP

Sterrekundiges druk   [die ontdekking] onder.
               druk   NP               onder. → suppress NP

Sterrekundiges druk   [die ontdekking].
               druk   NP               ø      → press NP
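
The three rules above amount to a lookup keyed on the stem together with whatever particle (if any) follows the noun phrase. A minimal sketch in Python, using only the pairs given above; the table and function names are hypothetical and nothing here is an Apertium API.

```python
# (stem, separated particle or None) -> English lemma, taken from the rules above.
TRANSLATIONS = {
    ("kondig", "aan"): "announce",
    ("druk", "onder"): "suppress",
    ("druk", None): "press",
}

def translate_verb(stem, particle=None):
    """Return the English lemma for a stem plus optional particle, or None."""
    return TRANSLATIONS.get((stem, particle))

print(translate_verb("druk", "onder"))
print(translate_verb("druk"))
```

Note that ("kondig", None) is deliberately absent, so a bare "kondig" yields no translation rather than a wrong one.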

This could be dealt with either in transfer or in pre-transfer. If it were dealt with in pre-transfer, the input would look like this:

^Sterrekundige<n><pl>$ ^kondig<vblex><pres><sep>$ ^die<det><def><sg>$ ^ontdekking<n><sg>$ ^aan<pr><sep>$^.<sent>$

Upon seeing the <sep> tag on the verb, pre-transfer would consume the following NPs until it reaches either <sent> or another word (an adverb, preposition, or similar) carrying a <sep> tag. On finding this second tag, it would re-order the fragment as follows:

^Sterrekundige<n><pl>$ ^aankondig<vblex><pres>$ ^die<det><def><sg>$ ^ontdekking<n><sg>$^.<sent>$

The affix is put in its proper place before the verb, the <sep> tags are removed, and then the fragment is passed on to transfer.
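
The re-ordering described above can be sketched in a few lines of Python. This is only an illustration of the idea under simplifying assumptions, not how any Apertium module is implemented: the function name is hypothetical, blanks between lexical units are discarded, and a verb whose particle is never found simply loses its <sep> tag.

```python
import re

# One lexical unit is everything between ^ and $ in the stream.
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")

def join_separable(stream):
    """Return the lexical units with each <sep> particle glued onto its verb."""
    units = LEXICAL_UNIT.findall(stream)
    out = []
    i = 0
    while i < len(units):
        unit = units[i]
        if "<vblex>" in unit and "<sep>" in unit:
            # Scan forward until another <sep> unit or the sentence end.
            j = i + 1
            while j < len(units) and "<sep>" not in units[j] and "<sent>" not in units[j]:
                j += 1
            if j < len(units) and "<sep>" in units[j]:
                particle = units[j].split("<", 1)[0]
                out.append(particle + unit.replace("<sep>", ""))
                out.extend(units[i + 1:j])  # the skipped NP material
                i = j + 1                   # drop the particle unit itself
                continue
            unit = unit.replace("<sep>", "")  # no particle: plain verb
        out.append(unit)
        i += 1
    return out

example = ("^Sterrekundige<n><pl>$ ^kondig<vblex><pres><sep>$ ^die<det><def><sg>$ "
           "^ontdekking<n><sg>$ ^aan<pr><sep>$^.<sent>$")
print(join_separable(example))
```

Run on the example stream, the verb unit becomes aankondig<vblex><pres> and the aan<pr><sep> unit disappears, matching the re-ordered fragment shown above.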

Alternative

If this violates LRLM, we can just shove the verb where the preposition is, and deal with the re-ordering in interchunk. Output:

^Sterrekundige<n><pl>$ ^die<det><def><sg>$ ^ontdekking<n><sg>$ ^aankondig<vblex><pres>$^.<sent>$

The verbs would be pushed onto a stack, or added to a queue, depending on how they nest:

„Sie  sagen    den    Zuhörern, dass sie  die Erfindung ansagen, an.“
 They announce to the listeners that they the discovery announce AFF

Sie sagten es den Zuhörern, die etwas anderes ansagten, an.
Sie sagten es den Zuhörern an, die etwas anderes ansagten.
`They announced it to the listeners, who were announcing something else.'

Also we have the problem of embedded prepositions:

 Hulle breek by     die tronk uit.
 They  break by     the jail  out.
`They  break out of the jail.'

In the above example, the final "uit" is the one that should be attached to the verb. So we may have to have the separable affixes in a queue.
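
One way to read the queue suggestion is: collect every particle candidate after the verb, then attach the last one. A minimal Python sketch of that heuristic follows; the function name and the particle set are hypothetical, and working on bare word forms instead of analysed lexical units is a simplification.

```python
from collections import deque

def attach_last_particle(words, verb_index, particles):
    """Glue the last particle candidate after the verb onto the verb."""
    candidates = deque(
        i for i in range(verb_index + 1, len(words)) if words[i] in particles
    )
    if not candidates:
        return list(words)
    last = candidates.pop()  # the final candidate wins
    merged = words[last] + words[verb_index]
    return [merged if i == verb_index else w
            for i, w in enumerate(words) if i != last]

sentence = ["Hulle", "breek", "by", "die", "tronk", "uit"]
print(attach_last_particle(sentence, 1, {"aan", "by", "uit"}))
```

Here "by" stays in place as an ordinary preposition and "uit" is attached, giving "uitbreek".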

Pseudo separable verbs

We call adjective-verb pairs which behave like separable verbs "pseudo separable verbs".

Consider the present tense usage of "bang maak" (scared make):

 Die spook maak   my bang.
 The ghost makes  me scared.
`The ghost scares me.'

And consider the past tense usage:

 Die spook het    my bang gemaak.
 The ghost has    me scared made.
`The ghost scared me.'

Now consider the present tense construction containing "rooi verf" (red paint):

 Hulle verf  die dorp rooi.
 They  paint the town red.
`They  paint the town red.'

The past tense construction of the construction above is:

 Hulle het     die dorp rooi geverf.
 They  have    the town red  painted.
`They  painted the town red.'

Neither "bang maak" nor "rooi verf" is a separable verb complex, yet both display separable verb behaviour. These kinds of verbs must be treated as separable verbs in Afrikaans and as phrasal verbs in English.


Light verbs

The Indo-Iranian languages, both Iranian (such as Tajik, Kurdish and Persian) and Indo-Aryan (such as Hindi and Bengali), have light verbs which could also be dealt with in this way.

Hindi

निकल   गया
nikal  gayā
exit   went
`went out'

निकल   पड़ा
nikal  parā
exit   fell
`departed'

Bengali

সিদ্ধান্ত নে
shiddhānto nē
`decide'

ঘোষণা কর
ghōṣōṇā kor
`declare'

অস্বীকার কর
osvikār kor
`decline'

উৎসর্গ কর
utshorgo kor
`dedicate'

Note: The transliteration follows the NLK (National Library at Kolkata) romanization scheme.

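See also

  • Multiwords (the Nynorsk hack there is for the same kinds of separable verbs)
  • Módulo de procesamiento de expresiones separables
  • Yiddish morphology#Verbs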

Further reading

  • ten Hacken, P. and Bopp, S. (1998) "Separable Verbs in a Reusable Morphological Dictionary for German". Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics. pp. 471-475