Talk:Welsh to English

From Apertium

Incidentally, the subject shift hasn't been done for pronouns yet:

roedden nhw'n hapus -> *were theyin happy - they were happy

And there's a missing space somewhere ....



Revision as of 07:44, 27 June 2008

Note: Comments should not include '=' as it confuses the Wiki templating system (as I just found out myself).

English to Welsh

Macros

This will contain chunks of rules that we need to split out to make them more maintainable

Patterns

Determiner Adjective Noun

When the determiner is indefinite,
  output noun + adjective
When the determiner is definite,
  output determiner + noun + adjective.
Tests

(1) A red cat

   coch cath

(2) The red cat

   Y coch cath
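A minimal Python sketch of the Determiner Adjective Noun rule above. It works on English words and looks them up in a tiny hypothetical table (`BIDIX` is illustrative, not the real dictionary). Mutation is deliberately not handled, so the output only shows word order.

```python
# Hypothetical English -> Welsh lemma table, for illustration only.
BIDIX = {"the": "y", "red": "coch", "cat": "cath"}


def det_adj_noun(det, definiteness, adj, noun):
    """Reorder an English 'det + adj + noun' chunk into Welsh word order.

    definiteness is "def" or "ind". Mutation is NOT handled here.
    """
    words = [BIDIX[noun], BIDIX[adj]]  # most Welsh adjectives follow the noun
    if definiteness == "def":
        words.insert(0, BIDIX[det])    # definite: keep the article in front
    # Indefinite: Welsh has no indefinite article, so 'a' is simply dropped.
    return " ".join(words)


print(det_adj_noun("a", "ind", "red", "cat"))    # cath coch
print(det_adj_noun("the", "def", "red", "cat"))  # y cath coch
```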



Notes for areas to be covered

A sort of scratchpad / todo list, based on things that come up when putting phrases into the testing webform.


Conjunctive genitive

gwallt yr eneth - *hair the girl - the hair of the girl - the girl's hair
llaw y bachgen - *hand the boy - the hand of the boy - the boy's hand

Note that the noun phrase in English is definite - contrast "merch y meddyg" (the doctor's daughter) and "merch meddyg" (a doctor's daughter).

For an English phrase 
of the type "def + noun1 + of + def + noun2"
or of the type "def + noun2 + 's + noun1"
convert in Welsh to "noun1 + def + noun2".
Here, can noun1 be a simple noun, or can it be a noun phrase? For example, "the red cat of the young boy". - Francis Tyers
e.g.
For the pattern det.def + noun1 + of + det.def + noun2:
Output noun1 + det.def + noun2


Yes, as long as you like, e.g.:
cath goch bachgen bach merch ifanc bert rheolwr y banc mawr du
the red cat of the little boy of the pretty young daughter of the manager of the big black bank
It's only the last NP of the sequence that gets the def.det. Donnek


OK, so this requires a three-level rule.
t1x -> t2x SN_(the cat red) of_(of) SN_(the boy little) of_(of) SN_(the daughter young pretty) of_(of) SN_(the manager) of_(of) SN_(the bank big black)
t2x -> t3x SN_(the cat red) SN_(the boy little) SN_(the daughter young pretty) SN_(the manager) SN_(the bank big black)
t3x -> gen (cat red boy little daughter young pretty manager the bank big black)
What I'll do for now is get the chunks working ('SN' for noun phrase, and 'of') for 'noun', 'det noun', 'det adj noun', 'det adj adj noun', 'det adj adj adj noun', etc., and then look at handling the more frequent cases (e.g. the first example). Francis Tyers
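A rough Python model of the second and third stages, assuming t1x has already produced the SN and of chunks written out above. The real stages would be transfer rule files, not Python; the function names only mirror the file extensions. The same code also covers the simple one-genitive case, e.g. SN_(the hair) of_(of) SN_(the girl) -> hair the girl.

```python
def t2x(chunks):
    """Stage 2: drop the 'of' chunks, leaving only the SN (noun phrase) chunks."""
    return [chunk for chunk in chunks if chunk[0] == "SN"]


def t3x(chunks):
    """Stage 3: merge the SN chunks into one 'gen' chunk.

    Only the last noun phrase keeps its definite article.
    """
    words = []
    for i, (_, np_words) in enumerate(chunks):
        is_last = i == len(chunks) - 1
        if not is_last and np_words and np_words[0] == "the":
            np_words = np_words[1:]  # drop the article from non-final NPs
        words.extend(np_words)
    return ("gen", words)


# t1x output for the long example above (adjectives already after the noun).
t1x_output = [
    ("SN", ["the", "cat", "red"]), ("of", ["of"]),
    ("SN", ["the", "boy", "little"]), ("of", ["of"]),
    ("SN", ["the", "daughter", "young", "pretty"]), ("of", ["of"]),
    ("SN", ["the", "manager"]), ("of", ["of"]),
    ("SN", ["the", "bank", "big", "black"]),
]
print(" ".join(t3x(t2x(t1x_output))[1]))
# cat red boy little daughter young pretty manager the bank big black
```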


For a Welsh phrase of the type "!det + noun1 + def + noun2"
convert in English to "def + noun1 + of + def + noun2"
or to "def + noun2 + 's + noun1".


The second noun is probably historically a genitive, but it has lost all case markers. The equivalent in Irish would be:

ceann an chapaill - *head the of-horse (gen) - the head of the horse - the horse's head
ceann capaill - *head of-horse (gen) - the head of a horse - a horse's head
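For the Welsh -> English direction, a small Python sketch of the "!det + noun1 + def + noun2" rule. It assumes the nouns have already been looked up, so each token is an (English lemma, tag) pair, and the tag names are made up for the example. The `style` switch picks between the "of" and "'s" outputs.

```python
def welsh_genitive(tagged, prev_tag=None, style="of"):
    """Render a Welsh 'noun1 + det.def + noun2' sequence in English.

    tagged: three (english_lemma, tag) pairs; tag names are hypothetical.
    prev_tag: tag of the word before noun1; the rule only fires if it is
    not a determiner (the "!det" in the pattern).
    style: "of" -> the noun1 of the noun2, "'s" -> the noun2's noun1
    """
    if prev_tag is not None and prev_tag.startswith("det"):
        return None
    if [tag for _, tag in tagged] != ["n", "det.def", "n"]:
        return None
    noun1, noun2 = tagged[0][0], tagged[2][0]
    if style == "of":
        return f"the {noun1} of the {noun2}"
    return f"the {noun2}'s {noun1}"


llaw_y_bachgen = [("hand", "n"), ("the", "det.def"), ("boy", "n")]
print(welsh_genitive(llaw_y_bachgen))              # the hand of the boy
print(welsh_genitive(llaw_y_bachgen, style="'s"))  # the boy's hand
```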


"was"

"roedd" ([he/she/it] was) is unknown, but I seem to remember adding entries for "to be" to the dixes in the mists of time. Was I dreaming? (roedd <- yr + oedd)

There are entries for 'bod', but 'roedd' doesn't get processed as all of the 'bod' entries start with 'b' (see this link). I will need to fix this in the analyser. If I understand you correctly, 'roedd' is a contraction of 'yr' (determiner ...) + 'oedd' (verb 'bod', past tense ...)? Francis Tyers


Some serious errors have crept into those entries. I've sent an amended version to you by email. You're right - roedd -> yr + oedd, but in the amended version I've sent, I've put (e.g.) "roedd" and "oedd" as alternate forms, because "Roedd" is the spoken form, and even in written Welsh you hardly ever see "Yr oedd" nowadays. Donnek


the boy was in the garden -> *y bachgen bu yn yr ardd - bu'r bachgen yn yr ardd

Almost correct, except for the word order, and the fact that the preterite is being used instead of the imperfect ("roedd y bachgen yn yr ardd"). The preterite needs to be marked as only being used in written Welsh, and to have a lower likelihood than the imperfect. This is too rough a rule, but it would do for the time being.


Marking and word-order

The above brings up a useful point about this. If the standard VSO sequence is changed to SVO (i.e. unchanged from the English standard), this is a marked pattern, conveying a relative clause. In written Welsh, the verb will be preceded by "a" + soft mutation, but in spoken Welsh the "a" usually disappears.

y bachgen [a] fu yn yr ardd ddydd Llun (the boy who was in the garden on Monday)
yr eneth [a] welodd y ci (the girl who saw the dog)

contrast

gwelodd yr eneth y ci (the girl saw the dog)

Hmmm. Relative clauses are going to be difficult.

For Welsh pattern "noun + a + soft-mutated_verb"
output "noun + who/which + verb".
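One way to check the "soft-mutated verb" part of the rule, sketched in Python: undo soft mutation on the word after the noun (and after an optional "a"), and see whether any candidate radical is a known verb form. The mutation table is the standard soft mutation; `VERB_FORMS` is a hypothetical two-entry lexicon. A real system would also need to rule out other causes of soft mutation.

```python
# Soft mutation: radical initial -> mutated initial ('g' disappears).
SOFT_MUTATION = {"p": "b", "t": "d", "c": "g", "b": "f", "d": "dd",
                 "g": "", "m": "f", "ll": "l", "rh": "r"}

# Hypothetical fragment of a Welsh verb-form -> English lookup.
VERB_FORMS = {"gwelodd": "saw", "bu": "was"}


def soft_radicals(word):
    """Every radical form that could have soft-mutated into `word`."""
    candidates = [radical + word[len(mutated):]
                  for radical, mutated in SOFT_MUTATION.items()
                  if mutated and word.startswith(mutated)]
    candidates.append("g" + word)  # the lost-'g' case: welodd <- gwelodd
    return candidates


def relative_clause(noun_en, welsh_tail):
    """noun + [a] + soft-mutated verb -> 'noun who/which verb', else None."""
    if welsh_tail and welsh_tail[0] == "a":  # written Welsh keeps the 'a'
        welsh_tail = welsh_tail[1:]
    if not welsh_tail:
        return None
    for radical in soft_radicals(welsh_tail[0]):
        if radical in VERB_FORMS:
            return f"{noun_en} who/which {VERB_FORMS[radical]}"
    return None  # unmutated or unknown verb: not treated as a relative clause


print(relative_clause("the girl", ["a", "welodd", "y", "ci"]))  # the girl who/which saw
print(relative_clause("the boy", ["fu", "yn", "yr", "ardd"]))   # the boy who/which was
```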


"i" as preposition

Welsh "i" (to) is getting translated as "[f]i" (I, me).

if Welsh "i" occurs immediately after a verb marked as 1p sing
output pronoun 1p sing
otherwise output preposition "to"
This is a good rule for the tagger. - Francis Tyers 12:19, 26 June 2008 (UTC)
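As suggested, this belongs in the tagger; the Python function below only states the condition, using made-up tag names rather than the real tagset.

```python
def reading_for_i(prev_tags):
    """Pick a reading for Welsh 'i' from the tags of the preceding word.

    prev_tags: set of tags on the previous word (tag names are hypothetical).
    """
    if {"vblex", "p1", "sg"} <= prev_tags:
        return "prn"  # 1st person singular pronoun: 'gwelais i' -> 'I saw'
    return "pr"       # otherwise the preposition 'to'


print(reading_for_i({"vblex", "past", "p1", "sg"}))  # prn
print(reading_for_i({"vblex", "inf"}))               # pr
```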


"yn" as "-ing"

For Welsh pattern "yn + verb<vblex><inf>"
output English "verb + ing"

For instance:

yn mynd -> *in go - going
yn gweld -> *in see - seeing
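A minimal Python sketch of the "yn + infinitive" rule, assuming each token is a (Welsh word, English lemma, tags) triple with invented tag names. Appending "ing" to the string only works for regular verbs like go and see; a cleaner route would probably be to emit the English lemma with a progressive/participle tag and let the English generator handle the spelling.

```python
def yn_progressive(tokens):
    """Rewrite Welsh 'yn + verb<vblex><inf>' as English 'verb + ing'.

    tokens: (welsh_word, english_lemma, tags) triples; other tokens pass
    through as their English lemma.
    """
    out, i = [], 0
    while i < len(tokens):
        welsh, english, _ = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if welsh == "yn" and nxt is not None and {"vblex", "inf"} <= nxt[2]:
            out.append(nxt[1] + "ing")  # naive: go -> going, see -> seeing
            i += 2
        else:
            out.append(english)
            i += 1
    return out


print(yn_progressive([("yn", "in", {"pr"}), ("mynd", "go", {"vblex", "inf"})]))
# ['going']
```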

"yn" as stative

For Welsh pattern "yn + adj"
output "adj"

There is a problem here in that this pattern can also be an adverb:

siaradodd yn hapus am ei fywyd - he talked happily about his life
For English pattern "adverb_formed_from_adj + ly"
output Welsh "yn + adj"
This second one will be difficult to do, as the adverbs in the English dictionary are not marked as to whether they are derived from adjectives. - Francis Tyers
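Until the dictionary records that link, one possible workaround is to guess it: strip "-ly" (and turn a final "i" back into "y"), then keep the result only if it is a known adjective. A Python sketch, with `EN_ADJ_TO_CY` as a hypothetical two-entry table. Mutation after "yn" (cf. "yn falch" in the tests) is not handled.

```python
# Hypothetical fragment of an English adjective -> Welsh adjective table.
EN_ADJ_TO_CY = {"happy": "hapus", "slow": "araf"}


def adverb_to_yn_adj(adverb):
    """English 'adj + ly' adverb -> Welsh ['yn', adj], or None.

    The adjective is guessed by stripping '-ly' (and 'i' -> 'y') and is
    only accepted if it is a known adjective.
    """
    if not adverb.endswith("ly"):
        return None
    stem = adverb[:-2]
    candidates = [stem]
    if stem.endswith("i"):
        candidates.append(stem[:-1] + "y")  # happily -> happy
    for candidate in candidates:
        if candidate in EN_ADJ_TO_CY:
            return ["yn", EN_ADJ_TO_CY[candidate]]
    return None  # e.g. 'only', 'family': no known adjective underneath


print(adverb_to_yn_adj("happily"))  # ['yn', 'hapus']
print(adverb_to_yn_adj("only"))     # None
```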

Preferential choice between noun and verbform

atebodd hi'r cwestiwn -> *answered shethe #hold an inquiry - she answered the question

proc returns two analyses: 'cwestiwn' (question), which is correct, and the 1p pl imperative of 'cwestio' (an infrequent verb for 'hold an inquiry'). The 1p pl present would also have been a possibility, and indeed a more likely one. The tagger selects the second of these.

Not sure how widespread this would be, but the tagger should give precedence to the noun choice whenever the verb form is preceded by 'y':

For Welsh pattern "[y | yr | 'r] + [noun | verb]"
output "[y | yr | 'r] + [noun]"

This is not perfect, because "y | yr" can also be an indirect relative clause pronoun before a verb, but it would catch most things until we can resolve the latter point.
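In practice this would be a tagger rule, but the filtering logic can be sketched in Python. The sketch assumes "hi'r" has already been split into "hi" + "'r", and uses invented reading strings. Words with no noun reading are left alone, and, as noted above, it does not handle "y | yr" as a relative pronoun.

```python
ARTICLES = {"y", "yr", "'r"}


def prefer_noun_after_article(tokens):
    """Keep only noun readings for an ambiguous word right after y/yr/'r.

    tokens: (surface, readings) pairs, readings being tag strings such as
    "n.m.sg" or "vblex.imp.p1.pl" (hypothetical format).
    """
    out, prev = [], None
    for surface, readings in tokens:
        if prev in ARTICLES:
            nouns = [r for r in readings if r.split(".")[0] == "n"]
            if nouns:
                readings = nouns
        out.append((surface, readings))
        prev = surface.lower()
    return out


sentence = [("atebodd", ["vblex.past.p3.sg"]), ("hi", ["prn.p3.f.sg"]),
            ("'r", ["det.def"]),
            ("cwestiwn", ["n.m.sg", "vblex.imp.p1.pl", "vblex.pres.p1.pl"])]
print(prefer_noun_after_article(sentence)[3])  # ('cwestiwn', ['n.m.sg'])
```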


Number agreement of verb

roedd y bechgyn yn hapus -> *the boys was in happy - the boys were happy

The "yn" as stative rule (1.3.6 above) would fix "in happy". Today's fixes now move the subject noun phrase to the front (woop woop!). Unfortunately, there is a little wrinkle in Welsh: a plural subject only takes the plural form of the verb if the subject is a pronoun. Thus "roedd" above is the singular, but if we said "roedden nhw'n hapus" (they were happy), we would use the plural.

So we need a rule saying that if the shifted subject phrase is marked plural, the verb it was shifted before needs to have its number changed to plural at or before the postchunk stage. Not sure how to express that in pseudo-code!
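As a stab at the pseudo-code: after the subject shift, look at each verb chunk that directly follows a shifted subject, and if that subject is plural, make the verb plural. In the real pipeline this would presumably read the number tag of the SN chunk in the interchunk or postchunk stage; the Python below only models the logic. The chunk names, the `shifted_subject` flag and the two-entry `BE_PAST` table are hypothetical, and it assumes the "yn" stative rule has already removed "in".

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    kind: str                      # "SN", "verb", "adj", ... (hypothetical names)
    words: list
    number: str = "sg"             # "sg" or "pl"
    shifted_subject: bool = False  # set by the subject-shift rule


BE_PAST = {"sg": "was", "pl": "were"}  # toy English past forms of 'be'


def agree_after_shift(chunks):
    """If a shifted subject is plural, make the verb right after it plural."""
    for subject, verb in zip(chunks, chunks[1:]):
        if (subject.kind == "SN" and subject.shifted_subject
                and subject.number == "pl" and verb.kind == "verb"):
            verb.number = "pl"
    return chunks


def render(chunks):
    """Flatten chunks to English, choosing was/were from the verb's number."""
    words = []
    for chunk in chunks:
        if chunk.kind == "verb" and chunk.words == ["be"]:
            words.append(BE_PAST[chunk.number])
        else:
            words.extend(chunk.words)
    return " ".join(words)


# 'roedd y bechgyn yn hapus' after the subject shift; the Welsh verb is singular.
sentence = [Chunk("SN", ["the", "boys"], "pl", True),
            Chunk("verb", ["be"]),
            Chunk("adj", ["happy"])]
print(render(agree_after_shift(sentence)))  # the boys were happy
```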

Another incidentally: if the noun in the subject phrase is unknown, it gets separated from the det.def:

roedd y cwningod yn hapus -> *the was *cwningod in happy - the rabbits were happy

Regression tests

Treatment of 'is' in present tense.
  • The boy is in the garden. → mae y bachgen yn yr ardd. (note: yr → 'r is an open bug)
  • mae'r bachgen yn yr ardd. → the boy is in the garden.


These are both correct (apart from the 'r), but I thought "regressions" were when you fix something and in the process break something else? Re 'r:
In Welsh pattern "aeiouwy + space + y[r]"
output "aeiouwy + 'r"

Donnek

Yep, so these should be 'regression tests' :)
Yep, I know the pattern; the problem is that the post-generator insists on having a ~ before anything that it deals with. This would mean that we would have to put '~' before every vowel, which would be quite difficult. If we can't fix that, there is another possibility, though: just use a plain transliterator to replace
"aeiouwy + space + ~yr + space" with "aeiouwy'r + space"
Can you think of anything this might catch by accident, or is it a fairly safe search/replace? - Francis Tyers


I would go with this in the meantime - I think it's pretty safe. Note that your rule can act on both 'y' and 'yr'. The system is:
consonant + space + y + space + consonant
consonant + space + yr + space + vowel
vowel + 'r + space + consonant-or-vowel
Donnek
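A plain-text Python sketch of the three-way y / yr / 'r choice listed above, using the "aeiouwy" vowel set from the earlier rule. Like the transliterator idea, it works on finished text rather than on the post-generator's '~' markers. It treats every "y"/"yr" as the article, so the relative-pronoun "y" mentioned earlier would be mishandled.

```python
VOWELS = set("aeiouwy")  # treated as vowels, following the rules above


def fix_articles(sentence):
    """Choose y / yr / 'r for each article in a space-separated sentence.

    vowel + y[r]                  -> vowel'r
    consonant + y[r] + vowel      -> yr
    consonant + y[r] + consonant  -> y
    """
    out = []
    words = sentence.split()
    for i, word in enumerate(words):
        if word.lower() not in ("y", "yr") or i + 1 == len(words):
            out.append(word)
            continue
        if out and out[-1][-1].lower() in VOWELS:
            out[-1] += "'r"  # attach the article to the previous word
            continue
        form = "yr" if words[i + 1][0].lower() in VOWELS else "y"
        out.append(form.capitalize() if word[0].isupper() else form)
    return " ".join(out)


print(fix_articles("mae y bachgen yn yr ardd"))  # mae'r bachgen yn yr ardd
print(fix_articles("gwasgu y botwm"))            # gwasgu'r botwm
```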
No subject shift with imperative
  • gwasgwch y botwm! → squeeze the button!
  • squeeze the button! → gwasgu y botwm! (note: infinitive for imperative is an open bug)
"yn" as stative
  • yn falch -> proud
  • yn hapus -> happy