Afrikaans and English

From Apertium

Latest revision as of 15:42, 30 November 2010

This page contains some observations and a general "TODO" list and discussion. The examples use the following notation:

# Afrikaans
: lit. English
@ English

* Solutions?

Tagger

A tagger needs to be generated. Currently both en-af.prob and af-en.prob are copies of de-en.prob.

Transfer

Both directions

Synthetic to analytic adjectives

Adjectives that inflect like 'foo', 'fooer', 'fooest' should sometimes be translated as 'foo', 'more foo', 'the most foo', in both directions.
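A minimal sketch of this step, assuming adjectives reach generation as a lemma plus a degree tag ("pos", "comp" or "sup", Apertium-style names assumed here) and assuming a hypothetical list ANALYTIC_LEMMAS of target lemmas that must take the analytic form:

```python
# Hypothetical: target lemmas that should be generated analytically.
ANALYTIC_LEMMAS = {"famous", "interesting"}

SYNTHETIC_SUFFIX = {"comp": "er", "sup": "est"}
ANALYTIC_PREFIX = {"comp": "more ", "sup": "the most "}


def generate_adjective(lemma: str, degree: str) -> str:
    """Return 'foo'/'fooer'/'fooest' or 'foo'/'more foo'/'the most foo'."""
    if degree == "pos":
        return lemma
    if lemma in ANALYTIC_LEMMAS:
        return ANALYTIC_PREFIX[degree] + lemma
    # Naive synthetic form; ignores spelling changes such as big -> bigger.
    return lemma + SYNTHETIC_SUFFIX[degree]
```

For example, generate_adjective("famous", "sup") gives "the most famous", while generate_adjective("tall", "comp") gives "taller". In a real pair the choice would more likely be driven by dictionary entries or a transfer rule than by a list in code.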

Automatically deriving adjectives from past participles

In both English and Afrikaans, many past participles can act as adjectives. If this property is exploited, both the Afrikaans and English dictionaries should shrink.
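A sketch of how this could work, assuming analyses are (lemma, tags) pairs with Apertium-style tags (the names vblex, pp and adj are assumed here): instead of storing a separate adjective entry for every participle, an adjective reading is derived at analysis time from each past-participle reading.

```python
def add_participle_adjectives(surface, analyses):
    """Return the analyses of `surface` plus a derived adjective reading
    for every past-participle (vblex + pp) analysis."""
    derived = []
    for lemma, tags in analyses:
        if tags[:2] == ("vblex", "pp"):
            # The participle form itself becomes the adjective lemma.
            reading = (surface, ("adj",))
            if reading not in analyses and reading not in derived:
                derived.append(reading)
    return list(analyses) + derived
```

With this, "broken" needs only its verb entry and still gets an adjective reading. The sketch over-generates: which participles really behave as adjectives would still need checking.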

Afrikaans to English

SOV to SVO transfer

Example 1
# nie een van hulle die taal     kan praat nie.
: not one of  them  the language can speak not.
  NP                NP           V 

@ [not one of them] [can  speak not] [the language] 
   NP                V                NP
Example 4
# Ek       dink    hulle    sal    hulp waardeer   van  mense  wat   Afrikaans ken
: I        think   they     will   help appreciate of   people who   Afrikaans know

  <prpers> <vblex> <prpers> <vaux> <n>  <vblex>    <pr> <n>    <rel> <n>       <vblex>

  NP       VBLEX   NP       VAUX   NP   VBLEX      PR   NP     REL   NP        VBLEX                      

  I  think they  [will help appreciate] of people [who Afrikaans know]
  
  I  think they  [will appreciate help] of people who Afrikaans know

  I  think they  will appreciate help of people [who know Afrikaans]

@ I think they will appreciate help of people who know Afrikaans.
  • <rel> <n> <vblex> -> <rel> <vblex> <n>
  • <vaux> <n> <vblex> -> <vaux> <vblex> <n>
Example 3
# Sterrekundiges kondig   die ontdekking aan van Gliese 581 c, 'n Aarde-agtige planeet buite   ons sonnestelsel wat  lewe mag     onderhou
: Astronomers    announce the discovery      of  Gliese 581 c, an Earth-like   planet  outside our solar system that life may     sustain
Astronomers    announce the discovery      of  Gliese 581 c, an Earth-like   planet  outside our solar system [that life may     sustain]

@ Astronomers    announce the discovery      of  Gliese 581 c, an Earth-like   planet  outside our solar system [that may  sustain life]
  • <dem><n><vaux><vblex> -> <dem><vaux><vblex><n>
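The three reordering rules from Examples 4 and 3 can be sketched as tag-pattern rewrites over a tagged word sequence. This assumes one tag per word and applies the first matching rule left to right without overlap; a real Apertium pair would express these as structural transfer rules instead.

```python
# (tag pattern, new order of the matched words), taken from the rules above
REORDER_RULES = [
    (("rel", "n", "vblex"), (0, 2, 1)),
    (("vaux", "n", "vblex"), (0, 2, 1)),
    (("dem", "n", "vaux", "vblex"), (0, 2, 3, 1)),
]


def reorder(tokens):
    """tokens: list of (word, tag) pairs; returns the reordered list."""
    out, i = [], 0
    while i < len(tokens):
        for pattern, order in REORDER_RULES:
            window = tokens[i:i + len(pattern)]
            if tuple(tag for _, tag in window) == pattern:
                out.extend(window[k] for k in order)
                i += len(pattern)
                break
        else:  # no rule matched at position i
            out.append(tokens[i])
            i += 1
    return out


def words(tokens):
    return " ".join(word for word, _ in tokens)
```

Feeding in the literal gloss of Example 4, tagged as in its tag line, yields "I think they will appreciate help of people who know Afrikaans", and "that life may sustain" from Example 3 becomes "that may sustain life".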


Example 5
# Sy het   later badkamer  toe    gevlug nadat   sy  hulle sonder  sukses  gevra het   om te bedaar.
: She had [later bathroom  toward fled   after] [she them  without success asked had ] to    calm down.
           ADV   N         PR     VBLEX  PR      PRN PRN   PR      N       V     V

           ADV   VBLEX PR     N          PR      PRN V     PR      N       V     PRN     
: She had [later fled  toward bathroom   after] [she had   without success asked them] to    calm down.

@ She had later fled toward bathroom after she had without success asked them to calm down.

or 

@ She had later fled toward bathroom after she had asked them to calm down without success.

Double negatives

# nie een van hulle die taal kan praat nie.
: not one of them the language can speak not.

@ [not one of them] [can speak] [the language]

One solution:

  • <vaux><vblex>nie --> <vaux><vblex>

Alternatively, simply drop the extra negative at the end of every sentence (i.e. just before the full stop):

  • nie<sent> --> <sent>
Example 1
# Ek is nie so bekend   met   presies hoe die opstelling werk  nie
: I  am not so familiar with  exactly how the setup      works not 

@ I  am not so familiar with  exactly how the setup      works
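A sketch of the "drop the final nie" rule on tokenised Afrikaans. One refinement beyond the rule above is assumed: the final nie is only dropped when an earlier negator in the same sentence already carries the negation, so a sentence with a single nie, such as "Ek weet nie." (I don't know), keeps it. The negator list is illustrative, not exhaustive.

```python
# Illustrative, not exhaustive, list of Afrikaans negators
NEGATORS = {"nie", "geen", "nooit", "niks", "niemand"}
SENTENCE_END = {".", "?", "!"}


def drop_final_nie(tokens):
    """Drop a sentence-final 'nie' if the sentence already has a negator."""
    out, seen_negator = [], False
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        at_end = nxt is None or nxt in SENTENCE_END
        if tok.lower() == "nie" and at_end and seen_negator:
            continue  # concord 'nie': negation is already expressed
        if tok.lower() in NEGATORS:
            seen_negator = True
        elif tok in SENTENCE_END:
            seen_negator = False  # the next sentence starts fresh
        out.append(tok)
    return out
```

Only the concord nie is removed; the first nie still has to be translated as "not".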

Constructions with 'do'

# Nee, ek het  ook  nie       'n idee wat  dit beteken nie

No,  I [have also not]      an idea what it  means   not

No,  I [also have not]      an idea what it  means

No,  I [also do   not have] an idea what it  means

No,  I [also don't    have] an idea what it  means

@ No, I don't have an idea what it means either

Tenses

The ge<verb> construction
Fixed — If broken, report a bug!

The past tense is formed regularly by adding the prefix ge- to the verb's infinitive/present form.

  • Ek breek - I break
  • Ek het gebreek - I broke, I have broken, I had broken
het               gebreek
^het<vaux><pres>$ ^ge<past><prefix>+breek<vblex><inf>$

= breek<vblex><past>

= break<vblex><past>
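This is already fixed in the pair; purely as an illustration, the collapse shown above can be sketched on the Apertium stream format. The parser is deliberately simplified (it ignores escaped characters, superblanks and ambiguous readings), and the rule only inspects two adjacent lexical units.

```python
import re

LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")
TAG = re.compile(r"<([^>]+)>")


def parse(stream):
    """Split '^het<vaux><pres>$ ^ge<past><prefix>+breek<vblex><inf>$' into
    lexical units, each a list of (lemma, [tags]) parts joined by '+'.
    Simplified: no escapes, superblanks or multiple readings."""
    units = []
    for body in LEXICAL_UNIT.findall(stream):
        units.append([(part.split("<", 1)[0], TAG.findall(part))
                      for part in body.split("+")])
    return units


def unparse(unit):
    return "^" + "+".join(lemma + "".join("<%s>" % t for t in tags)
                          for lemma, tags in unit) + "$"


def collapse_ge_past(stream):
    """Rewrite het<vaux> + ge<prefix>+VERB<vblex> as VERB<vblex><past>."""
    units, out, i = parse(stream), [], 0
    while i < len(units):
        unit = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if (unit[0][0] == "het" and "vaux" in unit[0][1] and nxt is not None
                and len(nxt) == 2 and nxt[0][0] == "ge" and "prefix" in nxt[0][1]
                and "vblex" in nxt[1][1]):
            out.append("^%s<vblex><past>$" % nxt[1][0])
            i += 2
        else:
            out.append(unparse(unit))
            i += 1
    return " ".join(out)
```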

Separable verbs

Main article: Separable verbs
  • Ek tree op as verteenwoordiger
  • I act as representative
  tree op
  op+tree
  optree

= act or perform

  • Sterrekundiges kondig [die ontdekking] aan.
  • Astronomers announce [the discovery].
  kondig <NP> aan
  aan+kondig
  aankondig

= announce.
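A sketch of rejoining a stranded particle with its verb, working on plain words. It assumes a hypothetical table COMPOUND_VERBS of particle+verb lemmas (only the two from the examples above) and reuses the particle list given under "Other examples". Checking the joined form against the dictionary keeps prepositional uses of op, aan, etc. from being glued on, though only when the accidental join is not itself a real verb.

```python
# Particles from the list under "Other examples"
PARTICLES = {"terug", "oop", "op", "weg", "aan", "af", "ver", "teë", "uit"}
# Hypothetical dictionary of separable (particle + verb) lemmas
COMPOUND_VERBS = {"optree": "act", "aankondig": "announce"}


def join_separable(words):
    """Attach the first later particle that forms a known compound verb."""
    out = list(words)
    v = 0
    while v < len(out):
        for p in range(v + 1, len(out)):
            joined = out[p] + out[v]
            if out[p] in PARTICLES and joined in COMPOUND_VERBS:
                out[v] = joined  # e.g. 'tree' ... 'op' -> 'optree'
                del out[p]       # remove the stranded particle
                break
        v += 1
    return out
```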


Other examples

e.g. terugkry, oopmaak, weghardloop, teruggebring, aankondig, afkondig, verkondig, opgedaag, aangerand, aangesê, teëgekom, weggekom, uitgeklim

For example:

teruggekry
terug+ge+kry

back+PAST+get

= got back

List of prepositions: terug, oop, op, weg, aan, af, ver, teë, uit, ...

More examples that don't work:

aangery
aan+ge+ry

lit. on+PAST+ride = rode on

= drove

grootgeword
groot+ge+word

lit. big+PAST+become = became big

= grew up
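A sketch of the segmentation step, assuming a hypothetical list of known verb stems. The stem check keeps words such as vergeet (forget) from being split as ver+ge+et. As the examples above show, segmentation is only half the problem: aan+ge+ry segments cleanly but still needs a dictionary entry to come out as "drove", and grootgeword is not found at all because groot is not in the particle list.

```python
# Particles from the list above; the verb stems are a hypothetical sample.
PARTICLES = ["terug", "oop", "op", "weg", "aan", "af", "ver", "teë", "uit"]
VERB_STEMS = {"kry", "bring", "daag", "rand", "sê", "kom", "klim", "ry", "word"}


def segment_past(word):
    """Split a past form like 'teruggekry' into (particle, 'ge', stem),
    or return None if no particle + ge + known stem analysis exists."""
    # Try longer particles first so that 'oop' wins over 'op'.
    for particle in sorted(PARTICLES, key=len, reverse=True):
        if not word.startswith(particle):
            continue
        rest = word[len(particle):]
        if rest.startswith("ge") and rest[2:] in VERB_STEMS:
            return particle, "ge", rest[2:]
    return None
```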

Multi-prepositions

  • om met
  • om uit
  • oor te
  • om in
  • toe om
  • nou dat -> now that
  • uiteindelik -> at last
  • ... list more here ...
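A sketch of greedy longest-match lookup for the items above, using only the two entries that already have translations. In Apertium itself such items would normally be multiword dictionary entries; the table and function here are hypothetical.

```python
# Only the entries from the list above that already have a translation
MULTIWORDS = {
    ("nou", "dat"): "now that",
    ("uiteindelik",): "at last",
}
MAX_LEN = max(len(key) for key in MULTIWORDS)


def translate_multiwords(words):
    """Replace the longest matching multi-word item at each position."""
    out, i = [], 0
    while i < len(words):
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in MULTIWORDS:
                out.append(MULTIWORDS[key])
                i += n
                break
        else:  # nothing matched: copy the word through unchanged
            out.append(words[i])
            i += 1
    return out
```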

Generating correct tense forms

  • Ek gaan ... verbind -> I will connect
  • Ek het gaan verbind -> I have gone to connect

The problem is that we have to wait until interchunk to find out if the verb is next to 'gaan' or not.


English to Afrikaans

Adjective inflection

Fixed — If broken, report a bug!

Attributive = before the noun
Predicative = after the noun

Afrikaans adjectives sometimes change form depending on their position; in the attributive position they often take an inflected form, e.g.[1]

Die blompot is goud. > Die goue   blompot.
The vase    is gold. > The golden vase.

goud -> goue

This    man is     most famous   , this    is the most famous   man
Hierdie man is die meeste beroemd, hierdie is die mees beroemde man

A rule that says:

  • <adj><noun> -> <adj><attr><noun>
  • <noun><adj> -> <noun><adj><pred>
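A sketch of the two rules above, assuming a tagged word sequence with n for nouns and adj for adjectives (Apertium-style names). It simplifies "predicative" to "not directly followed by a noun", which also covers the copula case (Die blompot is goud), and uses a hypothetical table holding only the attributive forms from the examples.

```python
# Attributive forms from the examples above (hypothetical table)
ATTRIBUTIVE_FORM = {"goud": "goue", "beroemd": "beroemde"}


def inflect_adjectives(tokens):
    """tokens: list of (word, tag). Adjectives directly before a noun get
    their attributive form; all others keep the predicative form."""
    out = []
    for i, (word, tag) in enumerate(tokens):
        next_tag = tokens[i + 1][1] if i + 1 < len(tokens) else None
        if tag == "adj" and next_tag == "n":
            word = ATTRIBUTIVE_FORM.get(word, word)
        out.append(word)
    return out
```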

Determiners

His dog is red.
Sy hond is rooi.

This is his dog, the dog is his = 
Dit is sy hond,  die hond is syne.

Verbs

That was nice = Dit was lekker, Dit was lekker gewees

It was a good poem = Dit was 'n goeie gedig, dit was 'n goeie gedig gewees

It will be a good poem = dit sal 'n goeie gedig wees

It would have been a good poem = Dit sou 'n goeie gedig gewees het.

it would have been a good idea = dit sou 'n goeie idee wees, dit sou 'n goeie idee gewees het (both correct)

it would have been a good poem = dit sou 'n goeie gedig gewees het (only this form is correct)

Separable verbs

afskei, oplaai, inkoop, aftel

Ek het  dit op   die kar gelaai.
I  have it  into the car loaded.

I loaded it onto the car

Present progressive

to be + verb gerund -> verb present + tans ("currently")

Note that this is not perfect, because English often uses the present progressive (e.g. I am writing a letter) where Afrikaans would use the present indicative (e.g. I write a letter).

Examples
"The highest point is     currently being debated"
 Die hoogste punt  word   huidiglik       gedebatteer
`The highest point become currently       debated'
"The boy  is being    selfish"
 Die seun is besig om selfsugtig te wees
`The boy  is busy  ** selfish    to be'
"I am eating"
 Ek   eet
`I    eat'
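A sketch of the "to be + gerund" rule on English analyses, assuming Apertium-style tags (vbser for be and ger for the gerund are assumed here). Because of the caveat above, adding tans is left as a switch: "I am eating" should come out as plain Ek eet, not Ek eet tans.

```python
def progressive_to_present(tokens, add_tans=True):
    """tokens: list of (lemma, tags) tuples. Rewrite be + gerund as the
    present tense of the verb, optionally followed by the Afrikaans
    adverb 'tans' (inserted directly as a target-side word)."""
    out, i = [], 0
    while i < len(tokens):
        lemma, tags = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tags[0] == "vbser" and nxt is not None and nxt[1][:2] == ("vblex", "ger"):
            out.append((nxt[0], ("vblex", "pres")))
            if add_tans:
                out.append(("tans", ("adv",)))
            i += 2
        else:
            out.append((lemma, tags))
            i += 1
    return out
```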

Passive sentence structure

Passive
  • [In other countries] [trade unions] are actively suppressed [by political or military regimes]
  • [In ander lande] word [vakbonde] aktief onderdruk [deur politieke en militêre regimes]
*         [trade unions] are    actively suppressed [by regimes]
*          NP            VBSER  ADV      VBLEX+PAST  PR NP
* word    [vakbonde]            aktief   onderdruk  [deur regimes]
* BECOME   NP                   ADV      VBLEX       PR   NP

* NP VBSER ADV VBLEX+PAST PR NP -> VBLEX NP ADV VBLEX PR NP
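A sketch of this chunk-level rule, assuming chunks arrive as (text, tag) pairs with the tag names used in the rule. Following the rule as written, word is given the VBLEX tag; switching to a dedicated "to become" tag would change only that one line. Lexical transfer (suppressed -> onderdruk, etc.) is assumed to happen elsewhere.

```python
PASSIVE = ("NP", "VBSER", "ADV", "VBLEX+PAST", "PR", "NP")


def passive_to_word(chunks):
    """Rewrite NP VBSER ADV VBLEX+PAST PR NP as word NP ADV VBLEX PR NP."""
    tags = tuple(tag for _, tag in chunks)
    n = len(PASSIVE)
    for i in range(len(chunks) - n + 1):
        if tags[i:i + n] == PASSIVE:
            subj, _be, adv, verb, prep, agent = chunks[i:i + n]
            rewritten = [("word", "VBLEX"),  # passive auxiliary replaces 'be'
                         subj, adv, (verb[0], "VBLEX"), prep, agent]
            return chunks[:i] + rewritten + chunks[i + n:]
    return chunks  # pattern not found: leave the sentence untouched
```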

Perhaps we need a separate tag for "to become"?

"word" ("ge-") vb. (copula) become, get (angry, cold, dark, drunk, late, tired); grow (old); go (blind, mad); turn (grey, pale, Democrat); fall (due, dumb, silent, ill, in love); (pass. auxiliary) is, are, (infml.) get;


Active
  • [In other countries political or military regimes] actively suppress [trade unions]
  • [In ander lande] onderdruk [politieke en militêre regimes] [vakbonde] aktief

Double auxiliary verbs (helpwerkwoorde)

<Anrie> "was die kodenaam het gegee" = "was die kodenaam gegee" (no "het")
<Anrie> I think in this instance there is no "het", because we already have "was" 
<Anrie> Thus: I saw him - Ek het hom gesien, but He was seen - Hy was gesien
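A minimal sketch of this observation as a post-transfer clean-up on Afrikaans words, assuming that "het" is redundant whenever "was" has already appeared in the same sentence. That assumption is broader than the single example given, so it would need testing against real text.

```python
SENTENCE_END = {".", "?", "!"}


def drop_redundant_het(words):
    """Drop 'het' when 'was' already occurs earlier in the same sentence."""
    out, seen_was = [], False
    for word in words:
        if word == "het" and seen_was:
            continue  # 'was' already marks the tense
        if word == "was":
            seen_was = True
        elif word in SENTENCE_END:
            seen_was = False
        out.append(word)
    return out
```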

Infinitives

Another preposition: "to" do something = "om" iets "te" doen, so it's not

"sy missie: na behaal", but 

"sy missie: om te behaal" 

(of course, this isn't strictly correct, since the verb should come at
 the end, but I assume we're leaving that type of error for now)

Roadmap

apertium-en-af 0.1

  • 5,000 of the highest frequency words in each dictionary.
  • Rules dealing with basic verb tenses (past, present, future)
  • "cheaty" prepositions using <pr><vblex> -> <vblex> <pr>
  • Basic word re-ordering for simple phrases.
  • Word error rate (WER) ~20%
Aims and uses
  • For a non-native speaker to be able to discern the topic of a general news item.
  • To be able to identify who said what to whom.
  • To be able to determine whether a particular item is interesting enough to be translated properly.
  • Sentences of up to 5 words should be translated reasonably well in both directions.
  • To give better translations than interpret.co.za.

apertium-en-af 0.3

  • Clean up.
  • Arrange the dictionaries more sensibly and check their sections.
Aims and uses
  • No new features.

apertium-en-af 0.5

  • At least 10,000 words in each dictionary.
  • Correct dealing with detachable prepositions.
  • Correct translation of active/passive.
  • Word error rate (WER) ~20% - ~15%
Aims and uses
  • Post-editing a translation made by Apertium should be slightly faster than translating from scratch.
  • Sentences of up to 7 words should be translated reasonably well in both directions.

apertium-en-af 1.0

  • 15,000 of the highest frequency words in each dictionary.
  • Compound noun identification and translation.
Aims and uses
  • Post-editing should be markedly faster than translating from scratch.
  • Sentences of up to 12 words should be translated reasonably well in both directions.

Evaluation material

Main article: Evaluation material for English to Afrikaans

The link above gives Apertium output and post-edited Apertium output that can be used to calculate the word error rate (WER) or position-independent word error rate (PER) for the apertium-en-af pair.
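For reference, a minimal sketch of the two metrics with whitespace tokenisation, using their usual definitions: WER is the word-level edit distance between the Apertium output and its post-edited version, divided by the length of the post-edited text; PER ignores word order and counts only words that are missing or superfluous. The project's own evaluation tooling may tokenise and normalise differently.

```python
from collections import Counter


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def per(reference: str, hypothesis: str) -> float:
    """Position-independent error rate: bag-of-words mismatch / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)
```

Here the post-edited text is passed as reference and the raw Apertium output as hypothesis. A reordering error such as "c b a" for "a b c" costs 2/3 under WER but nothing under PER, which is why the two are usually reported together.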

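Dictionaries

  • Dictionary links at Die Knoop (http://www.dieknoop.co.za/#woordeboeke)
  • Patriot Woordeboek (http://www.dbnl.org/tekst/toit001patr01_01/index.htm)
  • Samuel Murray's motor industry wordlist (http://www.geocities.com/Wellesley/5897/le11.html)

Competing products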

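In each example below, the first line is the Afrikaans source, the second is the competing product's output (shown in red on the original page) and the third is the Apertium output for comparison (shown in green).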

interpret.co.za

Vir elke Engelse woord moet jy een Afrikaanse woord kies en dis nie altyd so duidelik wat om te kies nie.
"for each english word must you one afrikaans word choose and it's not always so clear that/what to select not."
"For each English word you must choose one Afrikaans word and it is not always so clear wat to choose ."
Hierdie man is die meeste beroemd, hierdie is die mees beroemde man
"this man is the most famous , this is the most renowned man ..."
"This man are the most famous, #this are the #most famous man"
Die polisie het hom geboei en agter in die bakkie gesit. By Mimosa is hy laat gaan. Hy het glo nie sy selfoon teruggekry nie. Nel het na bewering ook weggehardloop en in 'n boom geskuil.
"the police have him handcuffed and behind in the bakkie sat down . at mimosa is he allow goes . he has believe not she/his selfoon teruggekry not . nel have allegedly also weggehardloop and in a tree geskuil. ..."
"The police had him handcuffed and after put in the truck. At Mimosa are he allows go. He had believe not got #back #his mobile phone . Nel had allegedly also ran away and hid in a tree."
Mnr. Prince Mbiza van die Mpumalanga-nooddienste was een van die eerste paramedici op die toneel. Hy het gister gesê Matthysen is klaarblyklik dood weens “ernstige kopbeserings”.
"mr . prince mbiza of the mpumalanga-nooddienste has been one of the first paramedici on the scenic . he has yesterday being said matthysen is evidently dead on account of “ernstige kopbeserings”."
"Mr. Prince Mbiza of the Mpumalanga-emergency services were one of the first paramedics on the scene. He had yesterday said Matthysen are evidently dead owing to “serious head injuries”."

Press

  • General press letter for Afrikaans media