Talk:Welsh to English/Archive 1


Section numbers from this version.

(1.3.2) "was"

"roedd" ([he/she/it] was) is unknown, but I seem to remember adding entries for "to be" to the dixes in the mists of time. Was I dreaming? (roedd <- yr + oedd)

There are entries for 'bod', but 'roedd' doesn't get processed as all of the 'bod' entries start with 'b' (see this link). I will need to fix this in the analyser. If I understand you correctly, 'roedd' is a contraction of 'yr' (determiner ...) + 'oedd' (verb 'bod', past tense ...)? Francis Tyers


Some serious errors have crept into those entries. I've sent an amended version to you by email. You're right - roedd -> yr + oedd, but in the amended version I've sent, I've put (e.g.) "roedd" and "oedd" as alternate forms, because "Roedd" is the spoken form, and even in written Welsh you hardly ever see "Yr oedd" nowadays. Donnek


The "bod" paradigm should now be all OK. What remains is to choose the restrictions (i.e. which forms we will generate for each set of tags). - Francis Tyers
the boy was in the garden -> *y bachgen bu yn yr ardd - bu'r bachgen yn yr ardd

Almost correct, except for word-order, and the fact that the preterite is being used instead of the imperfect ("roedd y bachgen yn yr ardd"). The preterite needs to be marked as only being used in written Welsh, and to have a lower likelihood than the imperfect. This is too rough a rule, but would do for the time being.

(1.3.4) Preferential choice between noun and verbform

atebodd hi'r cwestiwn -> *answered shethe #hold an inquiry - she answered the question

proc selects 'cwestiwn' (question) - correct - and 1p pl imperative of 'cwestio' (an infrequent verb for 'hold an inquiry'). The 1p pl present would also have been a possibility, and indeed a more likely one. tagger selects the second of these.

Not sure how widespread this would be, but the tagger should give precedence to the noun choice whenever the verb form is preceded by 'y':

For Welsh pattern "{y,yr,'r} + word_tagged_as_either_noun_or_verb"
output "{y,yr,'r} + noun"

This is not perfect, because "y | yr" can also be an indirect relative clause pronoun before a verb, but it would catch most things until we can resolve the latter point.
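
A minimal sketch of the rule above, in Python rather than in Apertium's own rule format. It assumes tokens arrive as (surface form, candidate tags) pairs with the article already split off. The tag names `n` and `vblex` and the function name are illustrative only, and taking the first candidate stands in for whatever the tagger would otherwise choose.

```python
DEFINITE_ARTICLES = {"y", "yr", "'r"}


def prefer_noun_after_article(tokens):
    """Pick one tag per token, forcing the noun reading after y/yr/'r.

    tokens: list of (surface, [candidate_tags]) pairs.
    Returns a list of (surface, chosen_tag) pairs.
    """
    result = []
    previous_surface = None
    for surface, candidates in tokens:
        if previous_surface in DEFINITE_ARTICLES and "n" in candidates:
            chosen = "n"  # the rule: article + ambiguous word -> noun
        else:
            chosen = candidates[0]  # placeholder for the tagger's own choice
        result.append((surface, chosen))
        previous_surface = surface
    return result


# "atebodd hi'r cwestiwn", with "cwestiwn" ambiguous between verb and noun
sentence = [
    ("atebodd", ["vblex"]),
    ("hi", ["prn"]),
    ("'r", ["det"]),
    ("cwestiwn", ["vblex", "n"]),
]
tagged = prefer_noun_after_article(sentence)
```

As noted, this would also (wrongly) force a noun reading after relative "y | yr", so it is only a stopgap.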

gwelodd y dyn y llyfr -> *the man saw the books - the man saw the book

This is similar, but is tricksy because it is superficially correct apart from the plural. But in fact, tagger is reading "llyfr" as pres 3p sing of "llyfru" (to book). Apart from being infrequent, and therefore much less likely to appear ("bwcio" would be the usual word), Eurfa has "llyfra" as the pres 3p sing, so there may be a paradigm problem too. The above rule would throw out the verb in the meantime.


It is currently using the aberth/u__vblex paradigm (see output here). Is this incorrect? - Francis Tyers


The problem is that "aberthu", apart from the 'regular' "abertha" also has a written "aberth". So yes, it probably is incorrect. The problem is that a lot of less common verbs are very rarely inflected. It might have been better to use something like "gwenu" or "siomi". In the meantime, perhaps just changing "aberth" to "abertha" in the pres 3p sing will do. - Donnek

(1.3.6) Number agreement of verb

I added 'rabbits' to the dictionary, but the problem of unknown words and phrase movement is one we're experiencing in Basque too... - Francis Tyers
OK - so it's basically an issue that you can't do much about until the word is logged. Hmm. I suppose that makes sense, since Apertium can't figure out what to do with something until it knows what it should do with it ... In a practical sense, this is going to be problematic if we demo Apertium using unseen text. Is there any way of doing some blind choosing, eg
if this word is
preceded by [y,yr,'r]
we will assume it's a noun
preceded by yn
we will assume it's a verb
unless a verb has been identified in the current phrase
in which case we'll assume it's an adjective
This might break Apertium - I don't know. In theory, though, we might be able to get relative probabilities for particular sequences from a corpus. - Donnek
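
For illustration, the blind guess described above could be written as the following Python sketch. The function name, the tag names and the boolean flag for "a verb has been identified in the current phrase" are assumptions made for the example; nothing here reflects how Apertium handles unknown words.

```python
DEFINITE_ARTICLES = {"y", "yr", "'r"}


def guess_unknown_category(previous_word, verb_seen_in_phrase):
    """Guess a category for an unknown word from its left context.

    Returns "n", "vblex", "adj", or None when no guess applies.
    """
    if previous_word in DEFINITE_ARTICLES:
        return "n"
    if previous_word == "yn":
        # After "yn": a verb, unless the phrase already has one,
        # in which case assume an adjective.
        return "adj" if verb_seen_in_phrase else "vblex"
    return None
```

Even with a category guessed, the word itself would still have no translation, so this could only help with word order.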

I'd be reluctant to add one as we'd not be able to get the translation, on the other hand, it wouldn't cause messing up of word order. It's an open problem, and we're thinking about it :) - Francis Tyers


(1.3.7) Prepositional noun phrase should not be a subject

cerddodd fo i'r dref -> he walked in the town

Fine, except that the preposition "i" should really be glossed as "to" ("yn y dref" would be "in the town")

Contrast:

cerddodd i'r dref -> *the town walked in - [he/she] walked to the town
Welsh pattern "prep + det.def + noun" is never a subject phrase

and therefore the "det.def + noun" section shouldn't be shifted. (I can't think of any exceptions to this, but there may be one.)

There was a rule to do this, which I've commented out. I think there was a reason for it, but I can't recall it now. I've run the regression tests below and it doesn't seem to have broken anything. Regarding the preposition, should I change "i" to be "to" instead of "in"? - Francis Tyers
Re "i", yes, change it to "to". - Donnek
The problem here was that the dictionary only had i'r → yn+yr... I've added i'r → i+yr and now it is picking the right one, although I don't know what will happen in other contexts... - Francis Tyers
Not sure where that would have come from. The only vaguely relevant thing I can think of is "i mewn i" (into). - Donnek


allan i'r cyfarfod -> *the meeting #exit<vblex><pres><p3> in - out to the meeting

This is similar - "in" should be "to", and should be kept with "the meeting".

However, there is another issue here, which is in effect the same as "Preferential choice between noun and verbform" above. In this case, the verb "allanu" (to exit) is being chosen instead of the much more likely "allan" (out).

roedd o ar dy lyfr -> *was of on your books - it was on your book

1.3.9 would deal with "of", and 1.3.6 would deal with "books". Subject shift would then produce a reasonable translation.

However:

roedd ar dy lyfr -> *your #be<vbser><past><p3> on books - (it) was on your book

Omitting the subject pronoun can happen quite frequently in speech if the subject has already been mentioned. The <sg> tag gets lost at interchunk, which means the verb can't be conjugated (this came up somewhere else, but I think it's been taken off the page - maybe it would be better just to mark the issue heading as "addressed" rather than delete it). But there is an additional issue, in that the possessive pronoun is getting treated as the subject and moved separately. So maybe we need a broader rule to say that "prep + det.def/pr.poss/whatever + noun" is an indivisible chunk, and must be dealt with as a block. No part of it would be moved in this case anyway.

Regarding page cleanup, OK. Perhaps having a separate section, and then moving sections down, would be a good idea. - Francis Tyers
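
The "indivisible chunk" idea suggested above could look roughly like this Python sketch. It assumes tokens are (surface, tag) pairs; the tag names `pr`, `det.def`, `det.pos` and `n`, the chunk label `PP` and the function name are all made up for the example.

```python
DETERMINER_TAGS = {"det.def", "det.pos"}


def group_prepositional_phrases(tokens):
    """Group each "prep + determiner + noun" run into one PP chunk.

    tokens: list of (surface, tag) pairs.
    Returns a list of (label, [surfaces]) chunks in the original order.
    """
    chunks = []
    i = 0
    while i < len(tokens):
        window = tokens[i:i + 3]
        if (len(window) == 3
                and window[0][1] == "pr"
                and window[1][1] in DETERMINER_TAGS
                and window[2][1] == "n"):
            # Keep the three words together so no part can be moved alone.
            chunks.append(("PP", [surface for surface, _ in window]))
            i += 3
        else:
            surface, tag = tokens[i]
            chunks.append((tag, [surface]))
            i += 1
    return chunks


# "roedd ar dy lyfr": the possessive stays inside the prepositional chunk
chunks = group_prepositional_phrases(
    [("roedd", "vbser"), ("ar", "pr"), ("dy", "det.pos"), ("lyfr", "n")]
)
```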

It would also be nice in the longer term to fill in the pronoun if it is omitted.

For Welsh pattern "verb + non-subject noun phrase"
output English "verb + pronoun agreeing in number and person + non-subject noun phrase"

The NSNP could be a prepositional phrase (marked by an initial preposition), or an object phrase (marked with initial soft mutation).

(1.3.8) "-ing" as "yn + verb"

For English pattern "subject + verb<vbser> + verb + ing"
output for Welsh "verb<vbser> + subject + yn + verb"

(1.3.9) Inflected verbs not being parsed

aeth -> *aeth - (he/she/it) went

However, "aeth" is listed in cy.dix.xml (line 27491) as past 3p sing in the mynd_vblex paradigm, which is what "mynd" (to go) gets conjugated against (line 54444).

Ah - a bug in the segmentation.

*myndaeth fo -> he went
he went -> *myndaeth fe

The infinitive is getting added to the irregular forms, instead of being replaced by them.

Yep, this is a problem in the paradigm for 'mynd'. I'll need to rewrite it; fortunately it is only used once... New paradigm output here - Francis Tyers
Fine, but the imperative forms also need "mynd" excised. - Donnek
Done. - Francis Tyers

(1.3.17) Infinitive after "yn lle"

For Welsh pattern "yn lle + verb<vblex><inf>"
output English "instead of + verb + ing"
For English pattern "instead of + verb + ing"
output Welsh "yn lle + verb<vblex><inf>"
Would you say "yn lle" is a multi-word preposition? - Francis Tyers
Yes, it is a compound preposition. "yn ei le" - instead of him (lit. in his place), "yn eu lle" - instead of them. But I don't want those included here, because there are places where you might want to translate them "in his place". - Donnek
yn lle mynd dros y ffordd -> *instead of go *dros the road - instead of going over the road
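
As a rough Python sketch of the Welsh-to-English direction of this rule: the input is assumed to be a list of (Welsh surface, tag, English lemma) triples, the tag name `vblex.inf` is illustrative, and `naive_gerund` is a deliberately simplistic stand-in for real English generation (it does not handle consonant doubling, for instance).

```python
def naive_gerund(verb):
    """Very rough English -ing form: drop a single final 'e', add 'ing'."""
    if len(verb) > 2 and verb.endswith("e") and not verb.endswith("ee"):
        return verb[:-1] + "ing"
    return verb + "ing"


def translate_yn_lle(tokens):
    """Apply "yn lle + infinitive" -> "instead of + verb-ing".

    tokens: list of (welsh_surface, tag, english_lemma) triples.
    All other tokens are passed through as their English lemma.
    """
    output = []
    i = 0
    while i < len(tokens):
        if (i + 2 < len(tokens)
                and tokens[i][0] == "yn"
                and tokens[i + 1][0] == "lle"
                and tokens[i + 2][1] == "vblex.inf"):
            output += ["instead", "of", naive_gerund(tokens[i + 2][2])]
            i += 3
        else:
            output.append(tokens[i][2])
            i += 1
    return " ".join(output)


example = translate_yn_lle([
    ("yn", "pr", "in"), ("lle", "n", "place"), ("mynd", "vblex.inf", "go"),
    ("dros", "pr", "over"), ("y", "det", "the"), ("ffordd", "n", "road"),
])
# example == "instead of going over the road"
```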

Incidentally, why is "dros" coming up as unknown here? I remember sweating over putting it in the dictionary (cy.dix line 46972, cy-en.dix line 49) :-)

It is in there, but under the paradigm /tros__pr, which means that it will never get detected... Is 'dros' a mutation of 'tros' or a separate preposition? The same oddness goes for trwy and drwy. - Francis Tyers
Yes, they occur in both mutated and unmutated forms. I would think the mutated forms are more common. Hmm. I didn't realise the paradigms overwrote the cited form like that. In that case, we either need to do some [t,d] substitution, or (perhaps simpler) replicate the entire "tros" paradigm for "dros", replacing the t with d. Same for "trwy, drwy". - Donnek
This seems to be done. - Francis Tyers
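
For the record, the "replicate the paradigm, replacing the t with d" option amounts to something like the Python sketch below. The function name is made up, and only the two citation forms mentioned above are used as data; in practice the full list of paradigm forms would be fed in.

```python
def soft_mutated_copy(forms):
    """Return a copy of the forms with initial 't' replaced by 'd'."""
    return [("d" + form[1:]) if form.startswith("t") else form
            for form in forms]


mutated = soft_mutated_copy(["tros", "trwy"])
# mutated == ["dros", "drwy"]
```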

(1.3.18) "i gyd"

This means "all", and occurs after the noun:

roedd y cwningod i gyd yn ddiogel -> *the rabbits were I joint safe - all the rabbits were safe
For Welsh pattern "det.def + noun + [qualifiers] + i gyd"
output English "all + det.def + [qualifiers] + noun"
Is 'i gyd' considered an adjective or pronoun? (or something else?) :)- Francis Tyers
An adjective, I suppose. Certainly a qualifier of some sort. - Donnek
Making progress. We now get:
roedd y cwningod i gyd yn ddiogel -> *the all rabbits were safe
Just need to massage that slightly. - Donnek
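
The intended reordering could be sketched in Python as follows. It assumes the chunk arrives in Welsh order as (English gloss, tag) pairs, with "i gyd" already recognised as a single token; the tag `all` for that token and the function name are invented for the example.

```python
def front_i_gyd(chunk):
    """Reorder "det.def + noun + [qualifiers] + i gyd" into English order.

    chunk: Welsh-order list of (english_gloss, tag) pairs.
    Returns the English words as a list; chunks that do not match
    the pattern are returned unchanged.
    """
    matches = (len(chunk) >= 3
               and chunk[0][1] == "det.def"
               and chunk[1][1] == "n"
               and chunk[-1][1] == "all")
    if not matches:
        return [gloss for gloss, _ in chunk]
    determiner, noun = chunk[0][0], chunk[1][0]
    qualifiers = [gloss for gloss, _ in chunk[2:-1]]
    return ["all", determiner, *qualifiers, noun]


# "y cwningod i gyd" -> "all the rabbits"
reordered = front_i_gyd([("the", "det.def"), ("rabbits", "n"), ("all", "all")])
```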

(1.3.24) "llawer"

"llawer o" (a lot of, many) seems to be OK. But another rule would be useful to deal with the third coolness sentence:

For Welsh pattern "llawer + adj.comp"
output English "much + adj.comp"
dyn llawer hŷn -> *older many man - a much older man (see 1.3.10 for the "a")
Done. - Francis Tyers

(1.3.25) bod (inf) + subject pronoun + cael (inf)

Should this be mangled into:

be(inf)  + subject pronoun + get(inf)
that     + subject pronoun + had

e.g.

honiadau ei bod hi'n cael perthynas â
his allegations be she getting relation with → his allegations that she had relation with

... also on this subject, we currently don't have a verb for "have" in the bidix; the grammar I have suggests that "cael" might be it in a modal sense. - Francis Tyers

(1.3.26) Subordinate ("reported speech") clauses with "bod" + pronoun

The above (1.3.25) may be worth doing, but it may be better to deal with the more general construction.

In effect, this is the same construction as 1.3.15, but with the noun replaced by a pronoun. However, while in English the pronoun is a subject pronoun, in Welsh it is a possessive pronoun. This means that the "that" word ("bod") gets sandwiched by the two parts of the possessive pronoun (either of which may not appear, depending on style), and is also mutated accordingly.

We're currently calling "ei" and friends possessive determiners, should we change this to possessive pronoun, or does it not make much of a difference? - Francis Tyers
Not much difference I think. Possessive determiner has the benefit that they can be considered to specify the noun in the same way as det.def. - Donnek

Thus, with a noun:

clywodd y dyn bod y trên yn cyrraedd yn hwyr (the man heard that the train was arriving late)

becomes, with a pronoun:

clywodd y dyn ei fod o'n cyrraedd yn hwyr (the man heard that it was arriving late)
For Welsh pattern "[pr.poss] + VBSER.inf_mutated + [pr.subj] + yn + verb"
output English "that + pr.subj + is + verb + -ing"
Hmm, this one is problematic as we throw away the "yn" in stage one transfer in the rule that turns "yn + vblex.inf" → "vblex.ger":
^det<SD><det><pos><sp>{^his<det><pos><2>$}$ ^verbinf<SV><vbser><inf>{^be<vbser><3>$}$ 
^prnsubj<SN><p3><m><sg>{^prpers<prn><subj><2><3><4>$}$ ^verbinf<SV><vblex><ger>{^arrive<vblex><3>$}$
Would it cause any problems if we made this rule:
For Welsh pattern "[pr.poss] + VBSER.inf_mutated + [pr.subj] + verb.ger"
output English "that + pr.subj + is + verb + -ing"
This would give: "the man heard that he is arriving late" (the "he" is an open issue)
- Francis Tyers
Done. - Francis Tyers
I was actually typing in the same suggestion, but you got there first! I don't think you can do much about the "he" without some sort of semantic check, which is not realistic at this stage. The only thing you could do would be to use "he/it", but that looks clumsy. - Donnek
For Welsh pattern "[pr.poss] + VBSER.inf_mutated + [pr.subj] + wedi + verb"
output English "that + pr.subj + have/s + verb.pp"
Done. - Francis Tyers


For Welsh pattern "[pr.poss] + VBSER.inf_mutated + [pr.subj] + am + SM_verb"
output English "that + pr.subj + will + verb"
Done. - Francis Tyers
For Welsh pattern "[pr.poss] + VBSER.inf_mutated + [pr.subj] + ar + SM_verb"
output English "that + pr.subj + is about to + verb"
Done. - Francis Tyers
For Welsh pattern "[pr.poss] + VBSER.inf_mutated + [pr.subj] + newydd + SM_verb"
output English "that + pr.subj + have/s just + verb.pp"

In the above Welsh patterns, at least one of pr.poss and pr.subj must be present.
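
The five patterns above differ only in the marker between the pronoun and the verb, so they can be summarised as a lookup table. The Python sketch below is purely illustrative: the function name and the dictionary of English verb forms are assumptions, and the agreement of "is" and "have/s" with the subject is a simplification added for the example.

```python
def render_bod_clause(marker, subject, verb):
    """Build the English "that ..." clause for one Welsh aspect marker.

    marker:  one of "yn", "wedi", "am", "ar", "newydd"
    subject: English subject pronoun, e.g. "he", "they"
    verb:    dict with English forms under "inf", "ger" and "pp"
    """
    third_singular = subject in {"he", "she", "it"}
    be = "am" if subject == "I" else ("is" if third_singular else "are")
    have = "has" if third_singular else "have"
    templates = {
        "yn": f"that {subject} {be} {verb['ger']}",
        "wedi": f"that {subject} {have} {verb['pp']}",
        "am": f"that {subject} will {verb['inf']}",
        "ar": f"that {subject} {be} about to {verb['inf']}",
        "newydd": f"that {subject} {have} just {verb['pp']}",
    }
    return templates[marker]


GO = {"inf": "go", "ger": "going", "pp": "gone"}
clause = render_bod_clause("wedi", "they", GO)
# clause == "that they have gone"
```

Tense is not handled here: after a past-tense main verb the English clause would usually need "was" or "had" instead.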

mae hi'n dweud eu bod nhw wedi mynd -> *she is saying their that they have gone - fine apart from the redundant possessive
Now gives: she is saying that they have gone - Francis Tyers
Excellent. - Donnek
mae'n amlwg ei fod o'n dweud y gwir -> *is #obvious<adj><sint> his be hesaying the true - it is obvious that he is telling the truth
Now gives: is obvious that he is saying the true (perhaps "dweud y gwir" might be a good multiword verb → "tell the truth"?) - Francis Tyers
That would be a good shortcut. "gwir" is both adj (true) and noun.m (truth), but the second is not in Eurfa! - Donnek


cy-en    mae'n amlwg ei fod o'n dweud y gwir 
         is obvious that he is telling the truth 
dywedodd y bachgen ei bod hi newydd siarad â nhw -> *the boy said his be she new talk with they - the boy said that she had just talked to them
I'm leaving this one ("... newydd ...") for now, as it is harder to do: we don't mark adjective chunks with their lemma. - Francis Tyers