Afrikaans and English
Latest revision as of 15:42, 30 November 2010
This file contains some observations and a general "TODO" list / discussion.
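Notation used in the examples below: # marks the Afrikaans source, : the literal (word-for-word) English, @ the intended English, and * possible solutions.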
Tagger
A tagger still needs to be trained. Currently both en-af.prob and af-en.prob are copies of de-en.prob.
Transfer
Both directions
Synthetic to analytic adjectives
Adjectives with synthetic comparison ('foo', 'fooer', 'fooest') sometimes need to be translated with analytic comparison ('foo', 'more foo', 'the most foo'), and this can happen in either direction.
Automatically deriving adjectives from past participles
In both English and Afrikaans, many past participles can act as adjectives. If this property is exploited, both the Afrikaans and English dictionaries should shrink.
Afrikaans to English
SOV to SVO transfer
Example 1
# nie een van hulle die taal kan praat nie.
: not one of them the language can speak not.
  NP NP V
@ [not one of them] [can speak not] [the language]
  NP V NP
Example 4
# Ek dink hulle sal hulp waardeer van mense wat Afrikaans ken
: I think they will help appreciate of people who Afrikaans know
  <prpers> <vblex> <prpers> <vaux> <n> <vblex> <pr> <n> <rel> <n> <vblex>
  NP VBLEX NP VAUX NP VBLEX PR NP REL NP VBLEX
  I think they [will help appreciate] of people [who Afrikaans know]
  I think they [will appreciate help] of people who Afrikaans know
  I think they will appreciate help of people [who know Afrikaans]
@ I think they will appreciate help of people who know Afrikaans.
- <rel> <n> <vblex> -> <rel> <vblex> <n>
- <vaux> <n> <vblex> -> <vaux> <vblex> <n>
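The two reordering rules above can be tried out on the Example 4 sentence with a small sketch. This is only an illustration in Python, not how Apertium transfer rules are written: the `RULES` table, the `reorder` function and the (word, tag) token format are all assumptions made for this example.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (surface form, tag); hypothetical token format

# Hypothetical rule table: a tag pattern and the new order of the matched tokens.
RULES = [
    (("rel", "n", "vblex"), (0, 2, 1)),   # <rel> <n> <vblex> -> <rel> <vblex> <n>
    (("vaux", "n", "vblex"), (0, 2, 1)),  # <vaux> <n> <vblex> -> <vaux> <vblex> <n>
]


def reorder(tokens: List[Token]) -> List[Token]:
    """Scan left to right and apply the first rule whose tag pattern matches."""
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        for pattern, order in RULES:
            window = tokens[i:i + len(pattern)]
            if tuple(tag for _, tag in window) == pattern:
                out.extend(window[j] for j in order)
                i += len(pattern)
                break
        else:
            # No rule matched at this position: copy the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


sentence = [
    ("I", "prpers"), ("think", "vblex"), ("they", "prpers"), ("will", "vaux"),
    ("help", "n"), ("appreciate", "vblex"), ("of", "pr"), ("people", "n"),
    ("who", "rel"), ("Afrikaans", "n"), ("know", "vblex"),
]
print(" ".join(word for word, _ in reorder(sentence)))
```

The sketch matches on flat tags only, so it has the same limitation as the rules themselves: a noun phrase longer than one word between the auxiliary and the verb would not be moved.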
Example 3
# Sterrekundiges kondig die ontdekking aan van Gliese 581 c, 'n Aarde-agtige planeet buite ons sonnestelsel wat lewe mag onderhou
: Astronomers announce the discovery of Gliese 581 c, an Earth-like planet outside our solar system that life may sustain
  Astronomers announce the discovery of Gliese 581 c, an Earth-like planet outside our solar system [that life may sustain]
@ Astronomers announce the discovery of Gliese 581 c, an Earth-like planet outside our solar system [that may sustain life]
- <dem><n><vaux><vblex> -> <dem><vaux><vblex><n>
Example 5
# Sy het later badkamer toe gevlug nadat sy hulle sonder sukses gevra het om te bedaar.
: She had [later bathroom toward fled after] [she them without success asked had ] to calm down.
  ADV N PR VBLEX PR PRN PRN PR N V V
  ADV VBLEX PR N PR PRN V PR N V PRN
: She had [later fled toward bathroom after] [she had without success asked them] to calm down.
@ She had later fled toward bathroom after she had without success asked them to calm down.
or
@ She had later fled toward bathroom after she had asked them to calm down without success.
Double negatives
# nie een van hulle die taal kan praat nie.
: not one of them the language can speak not.
@ [not one of them] [can speak] [the language]
One solution:
- <vaux><vblex>nie --> <vaux><vblex>
Or... basically drop the extra negative at the end of all sentences (well, at the full stop).
- nie<sent> --> <sent>
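A minimal sketch of the second solution (dropping the closing "nie" at the full stop), written in Python over a plain list of tokens. The function name `drop_final_nie` and the token format are assumptions for illustration. The sketch also adds one guard that is not in the rule above: the closing "nie" is only dropped when an earlier "nie" has been seen in the same sentence, so that a sentence with a single "nie" keeps its negation.

```python
SENT_END = {".", "!", "?"}


def drop_final_nie(tokens):
    """Drop a 'nie' directly before sentence-final punctuation,
    but only if the sentence already contains an earlier 'nie'."""
    out = []
    seen_nie = False  # has the current sentence already had a 'nie'?
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok.lower() == "nie":
            if seen_nie and nxt in SENT_END:
                continue  # this is the extra closing negative
            seen_nie = True
        elif tok in SENT_END:
            seen_nie = False  # a new sentence starts after this token
        out.append(tok)
    return out


print(" ".join(drop_final_nie("nie een van hulle die taal kan praat nie .".split())))
```

Whether the guard is wanted depends on where in the pipeline the rule runs; without it the rule is exactly `nie<sent> --> <sent>`.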
Example 1
# Ek is nie so bekend met presies hoe die opstelling werk nie
: I am not so familiar with exactly how the setup works not
@ I am not so familiar with exactly how the setup works
Constructions with 'do'
# Nee, ek het ook nie 'n idee wat dit beteken nie
: No, I [have also not] an idea what it means not
  No, I [also have not] an idea what it means
  No, I [also do not have] an idea what it means
  No, I [also don't have] an idea what it means
@ No, I don't have an idea what it means either
Tenses
The ge<verb> construction
- Fixed — If broken, report a bug!
The past tense is formed regularly by adding the prefix ge- to the verb's infinitive/present form.
- Ek breek - I break
- Ek het gebreek - I broke, I have broken, I had broken
het gebreek
^het<vaux><pres>$ ^ge<past><prefix>+breek<vblex><inf>$
= breek<vblex><past>
= break<vblex><past>
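The merge shown above can be sketched as a regular-expression rewrite over the analysed text. This is a Python illustration only; `PATTERN` and `merge_past` are names invented here, and the sketch assumes that "het" and the ge- participle are adjacent, which is not the case when an object stands between them (e.g. "Ek het hom gesien").

```python
import re

# Matches ^het<vaux><pres>$ followed by ^ge<past><prefix>+LEMMA<vblex><inf>$
PATTERN = re.compile(
    r"\^het<vaux><pres>\$\s+\^ge<past><prefix>\+(?P<lemma>[^<$]+)<vblex><inf>\$"
)


def merge_past(stream: str) -> str:
    """Collapse 'het' + ge-participle into a single past-tense lexical unit."""
    return PATTERN.sub(lambda m: f"^{m.group('lemma')}<vblex><past>$", stream)


print(merge_past("^het<vaux><pres>$ ^ge<past><prefix>+breek<vblex><inf>$"))
```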
Separable verbs
- Main article: Separable verbs
- Ek tree op as verteenwoordiger
- I act as representative
tree op
op+tree
optree
= act or perform
- Sterrekundiges kondig [die ontdekking] aan.
- Astronomers announce [the discovery].
kondig <NP> aan
aan+kondig
aankondig
= announce.
- Other examples
e.g. terugkry, oopmaak, weghardloop, teruggebring, aankondig, afkondig, verkondig, opgedaag, aangerand, aangesê, teëgekom, weggekom, uitgeklim
For example:
teruggekry
terug+ge+kry
back+PAST+get
= got back
List of prepositions: terug, oop, op, weg, aan, af, ver, teë, uit, ...
More examples, where the literal translation does not work:
aangery
aan+ge+ry
lit. on+PAST+ride
= rode on
= drove
grootgeword
groot+ge+word
lit. big+PAST+become
= became big
= grew up
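The decomposition of past participles such as teruggekry can be sketched in Python as follows. `PARTICLES` copies the list of prepositions given above, and `split_separable_participle` is a hypothetical helper name. The sketch does no dictionary lookup, so it will also split words that merely happen to start with one of these strings, and it says nothing about whether the literal translation is usable (see aangery and grootgeword).

```python
# Particles taken from the list above; longer ones are tried first so that
# 'oop' is preferred over 'op' when both could match.
PARTICLES = ["terug", "oop", "op", "weg", "aan", "af", "ver", "teë", "uit"]


def split_separable_participle(word):
    """Return (particle, stem) for words shaped particle+ge+stem, else None."""
    for particle in sorted(PARTICLES, key=len, reverse=True):
        prefix = particle + "ge"
        if word.startswith(prefix) and len(word) > len(prefix):
            return particle, word[len(prefix):]
    return None


for w in ["teruggekry", "weggehardloop", "aangery", "breek"]:
    print(w, split_separable_participle(w))
```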
Multi-prepositions
- om met
- om uit
- oor te
- om in
- toe om
- nou dat -> now that
- uiteindelik -> at last
- ... list more here ...
Generating correct tense forms
- Ek gaan ... verbind -> I will connect
- Ek het gaan verbind -> I have gone to connect
The problem is that we have to wait until interchunk to find out if the verb is next to 'gaan' or not.
English to Afrikaans
Adjective inflection
- Fixed — If broken, report a bug!
Attributive = before noun
Predicative = after noun
Adjectives in Afrikaans sometimes change form depending on their position. Adjectives in the attributive position are often inflected, e.g.[1]
Die blompot is goud. > Die goue blompot.
The vase is gold. > The golden vase.
goud -> goue
This man is most famous, this is the most famous man
Hierdie man is die meeste beroemd, hierdie is die mees beroemde man
Rule that says:
- <adj><noun> -> <adj><attr><noun>
- <noun><adj> -> <noun><adj><pred>
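A rough Python sketch of the position rule. The function name and the (word, tags) token format are assumptions for this example. It simplifies the two rules above into one test: an adjective directly followed by a noun is marked attr, and any other adjective is marked pred, which also covers the case "Die blompot is goud" where the copula stands between noun and adjective.

```python
def mark_adjective_position(tokens):
    """Append 'attr' or 'pred' to the tag list of every adjective.

    tokens: list of (word, tags) pairs, where tags is a list of strings.
    """
    marked = []
    for i, (word, tags) in enumerate(tokens):
        if "adj" in tags:
            next_is_noun = i + 1 < len(tokens) and "noun" in tokens[i + 1][1]
            # Build a new list so the caller's tag lists are not modified.
            tags = tags + ["attr" if next_is_noun else "pred"]
        marked.append((word, tags))
    return marked


attributive = [("the", ["det"]), ("golden", ["adj"]), ("vase", ["noun"])]
predicative = [("the", ["det"]), ("vase", ["noun"]), ("is", ["vbser"]), ("gold", ["adj"])]
print(mark_adjective_position(attributive))
print(mark_adjective_position(predicative))
```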
Determiners
His dog is red. = Sy hond is rooi.
This is his dog, the dog is his = Dit is sy hond, die hond is syne.
Verbs
That was nice = Dit was lekker, Dit was lekker gewees
It was a good poem = Dit was 'n goeie gedig, dit was 'n goeie gedig gewees
It will be a good poem = dit sal 'n goeie gedig wees
It would have been a good poem = Dit sou 'n goeie gedig gewees het.
it would have been a good idea = dit sou 'n goeie idee wees, dit sou 'n goeie idee gewees het (both correct)
it would have been a good poem = dit sou 'n goeie gedig gewees het (only this form is correct)
Separable verbs
afskei, oplaai, inkoop, aftel
Ek het dit op die kar gelaai.
I have it into the car loaded.
I loaded it onto the car
Present progressive
to be + verb gerund -> verb present + tans
Note, this is not perfect because many times English uses the present progressive (e.g. I am writing a letter) where Afrikaans would use the present indicative (e.g. I write a letter).
- Examples
"The highest point is currently being debated"
Die hoogste punt word huidiglik gedebatteer
`The highest point become currently debated'

"The boy is being selfish"
Die seun is besig om selfsugtig te wees
`The boy is busy ** selfish to be'

"I am eating"
Ek eet
`I eat'
Passive sentence structure
- Passive
- [In other countries] [trade unions] are actively suppressed [by political or military regimes]
- [In ander lande] word [vakbonde] aktief onderdruk [deur politieke en militêre regimes]
* [trade unions] are actively suppressed [by regimes]
* NP VBSER ADV VBLEX+PAST PR NP
* word [vakbonde] aktief onderdruk [deur regimes]
* BECOME NP ADV VBLEX PR NP
* NP VBSER ADV VBLEX+PAST PR NP -> VBLEX NP ADV VBLEX PR NP
Perhaps we need a separate tag for "to become"?
"word" ("ge-") vb. (copula) become, get (angry, cold, dark, drunk, late, tired); grow (old); go (blind, mad); turn (grey, pale, Democrat); fall (due, dumb, silent, ill, in love); (pass. auxiliary) is, are, (infml.) get;
- Active
- [In other countries political or military regimes] actively suppress [trade unions]
- [In ander lande] onderdruk [politieke en militêre regimes] [vakbonde] aktief
Double auxiliary verbs (helpwerkwoorde)
<Anrie> "was die kodenaam het gegee" = "was die kodenaam gegee" (no "het")
<Anrie> I think in this instance there is no "het", because we already have "was"
<Anrie> Thus: I saw him - Ek het hom gesien, but He was seen - Hy was gesien
Infinitives
Another preposition: "to" do something = "om" iets "te" doen, thus it is not
"sy missie: na behaal", but
"sy missie: om te behaal"
(Of course, this is not strictly correct, since the verb should come at the end, but I assume we are leaving that type of error for now.)
Roadmap
apertium-en-af 0.1
- 5,000 of the highest frequency words in each dictionary.
- Rules dealing with basic verb tenses (past, present, future)
- "cheaty" prepositions using <pr><vblex> -> <vblex> <pr>
- Basic word re-ordering for simple phrases.
- Word error rate (WER) ~20%
- Aims and uses
- For a non-native speaker to be able to discern the topic of a general news item.
- To be able to identify who said what to whom.
- To be able to tell whether a particular item is interesting enough to be translated properly.
- Sentences of up to 5 words should be translated reasonably well in both directions.
- To give better translations than interpret.co.za.
apertium-en-af 0.3
- Clean up.
- Arrange the dictionaries in a more sane manner, check sections.
- Aims and uses
- No new features.
apertium-en-af 0.5
- At least 10,000 words in each dictionary.
- Correct dealing with detachable prepositions.
- Correct translation of active/passive.
- Word error rate (WER) between ~15% and ~20%
- Aims and uses
- Post-editing translation made by Apertium should be slightly faster than translating from scratch.
- Sentences of up to 7 words should be translated reasonably well in both directions.
apertium-en-af 1.0
- 15,000 of the highest frequency words in each dictionary.
- Compound noun identification and translation.
- Aims and uses
- Post-editing should be markedly faster than translating from scratch.
- Sentences of up to 12 words should be translated reasonably well in both directions.
Evaluation material
- Main article: Evaluation material for English to Afrikaans
The article linked above gives Apertium output and post-edited Apertium output that can be used to calculate the WER or PER for the apertium-en-af pair.
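For reference, WER is usually computed as the word-level edit distance between the system output and the post-edited reference, divided by the number of words in the reference. The Python sketch below assumes plain whitespace tokenisation and equal cost for substitutions, insertions and deletions; the function name is made up for this example, and any evaluation script used for the pair may differ in these details.

```python
def word_error_rate(hypothesis: str, reference: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # reference word missing from output
                            curr[j - 1] + 1,      # extra word in output
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five.
print(word_error_rate("the police have him handcuffed",
                      "the police had him handcuffed"))
```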
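Dictionaries
- Dictionary links at Die Knoop: http://www.dieknoop.co.za/#woordeboeke
- Patriot Woordeboek: http://www.dbnl.org/tekst/toit001patr01_01/index.htm
- Samuel Murray's motor industry wordlist: http://www.geocities.com/Wellesley/5897/le11.html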
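Competing products
In each example below, the Afrikaans source is followed by the interpret.co.za output and then by the Apertium output for comparison (shown in green on the original page).
interpret.co.za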
- Afrikaans-English, English-Afrikaans — word-for-word dictionary lookup (machine translation).
- Vir elke Engelse woord moet jy een Afrikaanse woord kies en dis nie altyd so duidelik wat om te kies nie.
- "for each english word must you one afrikaans word choose and it's not always so clear that/what to select not."
- "For each English word you must choose one Afrikaans word and it is not always so clear wat to choose ."
- Hierdie man is die meeste beroemd, hierdie is die mees beroemde man
- "this man is the most famous , this is the most renowned man ..."
- "This man are the most famous, #this are the #most famous man"
- Die polisie het hom geboei en agter in die bakkie gesit. By Mimosa is hy laat gaan. Hy het glo nie sy selfoon teruggekry nie. Nel het na bewering ook weggehardloop en in 'n boom geskuil.
- "the police have him handcuffed and behind in the bakkie sat down . at mimosa is he allow goes . he has believe not she/his selfoon teruggekry not . nel have allegedly also weggehardloop and in a tree geskuil. ..."
- "The police had him handcuffed and after put in the truck. At Mimosa are he allows go. He had believe not got #back #his mobile phone . Nel had allegedly also ran away and hid in a tree."
- Mnr. Prince Mbiza van die Mpumalanga-nooddienste was een van die eerste paramedici op die toneel. Hy het gister gesê Matthysen is klaarblyklik dood weens “ernstige kopbeserings”.
- "mr . prince mbiza of the mpumalanga-nooddienste has been one of the first paramedici on the scenic . he has yesterday being said matthysen is evidently dead on account of “ernstige kopbeserings”."
- "Mr. Prince Mbiza of the Mpumalanga-emergency services were one of the first paramedics on the scene. He had yesterday said Matthysen are evidently dead owing to “serious head injuries”."
Press
- General press letter for Afrikaans media