Talk:Welsh to English

Note: Comments should not include '=' as it confuses the Wiki templating system (as I just found out myself)
Note 2: Suggestions for part-of-speech disambiguation should go here.
OK, I'll try, but I'm not entirely sure of the distinction. Some of the stuff at the end of that page, for instance, is covered here. - Donnek
Note 3: Comments should not include the '|' symbol either, at least within double quotes, since it too confuses the wiki.

English to Welsh

Macros

This will contain chunks of rules that we need to split out to make them more maintainable

Patterns

Determiner Adjective Noun

Notes for areas to be covered

A sort of scratchpad / todo list, based on things that come up when putting phrases into the testing webform.


Conjunctive genitive

gwallt yr eneth - *hair the girl - the hair of the girl - the girl's hair
llaw y bachgen - *hand the boy - the hand of the boy - the boy's hand

Note that the noun phrase in English is definite - contrast "merch y meddyg" (the doctor's daughter) and "merch meddyg" (a doctor's daughter).

For an English phrase 
of the type "def + noun1 + of + def + noun2"
or of the type "def + noun2 + 's + noun1"
convert in Welsh to "noun1 + def + noun2".
Can noun1 here only be a simple noun, or can it also be a noun phrase? For example "the red cat of the young boy" - Francis Tyers
e.g.
For the pattern det.def + noun1 + of + det.def + noun2:
Output noun1 + det.def + noun2


Yes, as long as you like, eg,
cath goch bachgen bach merch ifanc bert rheolwr y banc mawr du
the red cat of the little boy of the pretty young daughter of the manager of the big black bank
It's only the last NP of the sequence that gets the def.det. Donnek


Ok, so this requires a three level rule.
t1x -> t2x SN_(the cat red) of_(of) SN_(the boy little) of_(of) SN_(the daughter young pretty) of_(of) SN_(the manager) of_(of) SN_(the bank big black)
t2x -> t3x SN_(the cat red) SN_(the boy little) SN_(the daughter young pretty) SN_(the manager) SN_(the bank big black)
t3x -> gen (cat red boy little daughter young pretty manager the bank big black)
What I'll do for now is get the chunks working ('SN' -- noun phrase, and 'of'), for values of 'noun', 'det noun', 'det adj noun', 'det adj adj noun', 'det adj adj adj noun', etc. Then look at taking care of more frequent cases (e.g. the first example). Francis Tyers
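A rough Python sketch of the end result of the three stages above, assuming the noun phrases have already been chunked and reordered as on the t2x line. The function name and the list-of-tokens representation are hypothetical and are not Apertium's transfer format; English lemmas stand in for the Welsh words.

```python
def chain_genitive(chunks):
    """Join noun-phrase chunks into one genitive chain.

    Each chunk is a list of tokens in noun-adjective order with an optional
    leading "the". Only the last chunk keeps its definite article.
    """
    out = []
    last = len(chunks) - 1
    for i, chunk in enumerate(chunks):
        tokens = list(chunk)
        if i != last and tokens and tokens[0] == "the":
            tokens = tokens[1:]  # drop the article on every NP except the last
        out.extend(tokens)
    return " ".join(out)


chunks = [
    ["the", "cat", "red"],
    ["the", "boy", "little"],
    ["the", "daughter", "young", "pretty"],
    ["the", "manager"],
    ["the", "bank", "big", "black"],
]
print(chain_genitive(chunks))
```

The "of" chunks are assumed to have been dropped already, as in the t2x -> t3x step.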


For a Welsh phrase of the type "!det + noun1 + def + noun2"
convert in English to "def + noun1 + of + def + noun2"
or to "def + noun2 + 's + noun1".


The second noun is probably historically a genitive, but it has lost all case markers. The equivalent in Irish would be:

ceann an chapaill - *head the of-horse (gen) - the head of the horse - the horse's head
ceann capaill - *head of-horse (gen) - the head of a horse - a horse's head


"was"

"roedd" ([he/she/it] was) is unknown, but I seem to remember adding entries for "to be" to the dixes in the mists of time. Was I dreaming? (roedd <- yr + oedd)

There are entries for 'bod', but 'roedd' doesn't get processed as all of the 'bod' entries start with 'b' (see this link). I will need to fix this in the analyser. If I understand you correctly, 'roedd' is a contraction of 'yr' (determiner ...) + 'oedd' (verb 'bod', past tense ...)? Francis Tyers


Some serious errors have crept into those entries. I've sent an amended version to you by email. You're right - roedd -> yr + oedd, but in the amended version I've sent, I've put (e.g.) "roedd" and "oedd" as alternate forms, because "Roedd" is the spoken form, and even in written Welsh you hardly ever see "Yr oedd" nowadays. Donnek


the boy was in the garden -> *y bachgen bu yn yr ardd - bu'r bachgen yn yr ardd

Almost correct, except for word-order, and the fact that the preterite is being used instead of the imperfect ("roedd y bachgen yn yr ardd"). The preterite needs to be marked as only being used in written Welsh, and to have a lower likelihood than the imperfect. This is too rough a rule, but would do for the time being.


Marking and word-order

The above brings up a useful point about this. If the standard VSO sequence is changed to SVO (ie unchanged from the English standard), this is a marked pattern, conveying a relative clause. In written Welsh, the verb will be preceded by "a" + soft mutation, but in spoken Welsh the "a" usually disappears.

y bachgen [a] fu yn yr ardd ddydd Llun (the boy who was in the garden on Monday)
yr eneth [a] welodd y ci (the girl who saw the dog)

contrast

gwelodd yr eneth y ci (the girl saw the dog)

Hmmm. Relative clauses are going to be difficult.

For Welsh pattern "noun + a + soft-mutated_verb"
output English pattern "noun + who/which + verb".
The dictionary only has 'a' down as a co-ordinating conjunction "and". Does it have other meanings? - Francis Tyers
Yes. "a" - relative "who, which" in a relative clause where the subject is the same as that of the main clause, and "a" - interrogative pre-verbal particle (eg a weles ti hwnna? - did you see that?). Both are followed by soft mutation. Note that interrogative "a" is usually omitted in speech, leaving only the mutation. - Donnek

"yn" as stative

For Welsh pattern "yn + adj"
output English "adj"

There is a problem here in that this pattern can also be an adverb:

siaradodd yn hapus am ei fywyd - he talked happily about his life
For English pattern "adverb_formed_from_adj + ly"
output Welsh "yn + adj"
This second one will be difficult to do, as we don't have adverbs in the English dictionary marked as derivatives from adjectives or not. - Francis Tyers


OK. Unfortunately, since "yn + adj" can be either an adj or an adv in Welsh, I don't even mark them separately in Eurfa - perhaps I should. Would one option be to replicate all the Welsh adj entries in Apertium by preceding them with "yn + space", and adding "-ly" to the English side? This would get the EW direction, but I don't know whether it would cause problems on the WE direction. - Donnek

The above rule has been applied (way!), but does not catch mutated adjectives ("yn" causes soft mutation):

*tyfodd fo yn mawr -> he grew big
tyfodd fo yn fawr -> *he grew in *fawr
This was a dictionary error, 'fawr' did not have the initial-m paradigm. Now added. - Francis Tyers
OK - there are a couple of others I've come across: mwy (fwy), bach (fach), gwyn (wyn). There may be a few more. - Donnek
Taken care of the first two. 'gwyn' doesn't seem to appear in the dictionary (only as 'complaint'). Does it inflect at all? - Francis Tyers
LOL! There are some obvious words not in Eurfa, tut tut to me! gwyn (white), *gwen (in practice "wen", fem), gwynion (occasionally, plural), gwynnach (whiter), gwynnaf (whitest). There may be fem comp and super forms too, but we can ignore those. By the way, "da" also has this problem too. - Donnek

 :) -- OK, I've added gwyn/gwynnach/gwynnaf for now. Adding the genders would probably mess up some rules, and these are probably fairly low frequency and can be taken care of later. - Francis Tyers
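To check which adjectives might be missing a mutated form in the dictionary, the soft-mutated forms can be generated mechanically. This is only a sketch in Python: the function name is made up, it assumes lowercase input, and the `limited` flag reflects the assumption that stative "yn" leaves "ll" and "rh" unmutated (as in "yn llai llachar").

```python
# Soft mutation of initial consonants. Digraphs come first so that "ll" and
# "rh" are tested before any single letter.
SOFT = [("ll", "l"), ("rh", "r"), ("p", "b"), ("t", "d"), ("c", "g"),
        ("b", "f"), ("d", "dd"), ("g", ""), ("m", "f")]

# Digraphs that begin with a mutable letter but are not themselves mutated.
UNCHANGED = ("ch", "dd", "ff", "ng", "ph", "th")


def soft_mutate(word, limited=False):
    """Return the soft-mutated form of a lowercase Welsh word."""
    if word.startswith(UNCHANGED):
        return word
    for radical, mutated in SOFT:
        if word.startswith(radical):
            if limited and radical in ("ll", "rh"):
                return word
            return mutated + word[len(radical):]
    return word


for adj in ["mawr", "mwy", "bach", "gwyn", "da"]:
    print(adj, "->", soft_mutate(adj, limited=True))
```

Any adjective whose generated form is not recognised by the analyser is a candidate for the kind of paradigm fix described above.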


We could also extend this to nouns:

roedd hi'n waith anodd -> *was in ~a #difficult<adj><sint> work - (it) was hard work

(though "work" gets lost in the second proc run).

For Welsh pattern "yn + non-place noun-phrase"
output English "noun-phrase"

This is a bit complex. There are two "yn"s in Welsh: "yn" showing state or condition, or extension in time (yn hapus - happy; yn mynd - going), and "yn" the preposition showing location in a specific place (yn y tŷ - in the house (contrast: mewn tŷ - in a house); yn Nolgellau - in Dolgellau). (They are probably related historically.) The stative "yn" soft-mutates nouns and adjectives, but not verbs; the location "yn" nasal-mutates (and changes to "ym" to match an initial "m" in the following noun, eg ym Mangor - in Bangor).

So - as it stands, the above will clash with 1.3.10 (change prep+noun to prep+det.indef+noun), even though "yn" the preposition will never occur before a non-specific noun (it must have specificity), and even though the above is not actually "yn" the preposition (it's "yn" the stative). We can't use the stative soft-mutation to decide, because (a) that doesn't apply to some consonant initials, and (b) other prepositions cause mutation too, and it would be overkill to check for each one. So the easiest thing is to adjust 1.3.10 to exclude "yn" as one of the prepositions that will be caught. Is it easy? I don't know :-)

I've added the yn "stative" to the analyser as well as the yn "preposition", but until we retrain the tagger it will not pick the former. If you could think of any rules that will choose the right one in a given context it would help (for ideas on the kinds of restrictions to these rules, see here and here). - Francis Tyers
The simplest would be:
Welsh word "yn" is a preposition
when it is followed by "det.def" or by a capitalised word
otherwise it is a stative
That may not be perfect, but it is good enough. I'll bear in mind the tagger pages, but it may take a while to get to that stage. - Donnek
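The simple rule above, written out as a Python sketch. The function name and return labels are hypothetical; this is not the tagger's rule format, just the decision logic.

```python
DEF_ARTICLES = {"y", "yr", "'r"}


def classify_yn(next_token):
    """Guess the reading of "yn" from the token that follows it."""
    # Preposition before the definite article or a capitalised word
    # (assumed to be a place name); stative otherwise.
    if next_token.lower() in DEF_ARTICLES or next_token[:1].isupper():
        return "preposition"
    return "stative"


print(classify_yn("yr"))         # yn yr ardd
print(classify_yn("Nolgellau"))  # yn Nolgellau
print(classify_yn("hapus"))      # yn hapus
```

Possessives and other definite qualifiers after the preposition are not covered by this version.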

Preferential choice between noun and verbform

atebodd hi'r cwestiwn -> *answered shethe #hold an inquiry - she answered the question

proc selects 'cwestiwn' (question) - correct - and 1p pl imperative of 'cwestio' (an infrequent verb for 'hold an inquiry'). The 1p pl present would also have been a possibility, and indeed a more likely one. The tagger selects the second of these.

Not sure how widespread this would be, but the tagger should give precedence to the noun choice whenever the verb form is preceded by 'y':

For Welsh pattern "{y,yr,'r} + word_tagged_as_either_noun_or_verb"
output "{y,yr,'r} + noun"

This is not perfect, because "y | yr" can also be an indirect relative clause pronoun before a verb, but it would catch most things until we can resolve the latter point.
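As a sketch of the filter being suggested (Python, with a made-up function name and analyses represented as hypothetical (lemma, tag) pairs rather than real analyser output):

```python
ARTICLES = {"y", "yr", "'r"}


def prefer_noun(prev_token, analyses):
    """After y/yr/'r, keep only the noun readings if there are any."""
    if prev_token.lower() in ARTICLES:
        nouns = [a for a in analyses if a[1] == "n"]
        if nouns:
            return nouns
    # No article, or no noun reading available: leave the ambiguity alone.
    return analyses


readings = [("cwestiwn", "n"), ("cwestio", "vblex")]
print(prefer_noun("'r", readings))
```

Because it only filters when a noun reading exists, the indirect relative "y | yr" before an unambiguous verb is left untouched.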

gwelodd y dyn y llyfr -> *the man saw the books - the man saw the book

This is similar, but is tricksy because it is superficially correct apart from the plural. But in fact, tagger is reading "llyfr" as pres 3p sing of "llyfru" (to book). Apart from being infrequent, and therefore much less likely to appear ("bwcio" would be the usual word), Eurfa has "llyfra" as the pres 3p sing, so there may be a paradigm problem too. The above rule would throw out the verb in the meantime.


It is currently using the aberth/u__vblex paradigm (see output here). Is this incorrect? - Francis Tyers


The problem is that "aberthu", apart from the 'regular' "abertha" also has a written "aberth". So yes, it probably is incorrect. The problem is that a lot of less common verbs are very rarely inflected. It might have been better to use something like "gwenu" or "siomi". In the meantime, perhaps just changing "aberth" to "abertha" in the pres 3p sing will do. - Donnek

Number agreement of verb

I added 'rabbits' to the dictionary, but the problem of unknown words and phrase movement is one we're experiencing in Basque too... - Francis Tyers
OK - so it's basically an issue that you can't do much about until the word is logged. Hmm. I suppose that makes sense, since Apertium can't figure out what to do with something until it knows what it should do with it ... In a practical sense, this is going to be problematic if we demo Apertium using unseen text. Is there any way of doing some blind choosing, eg
if this word is
preceded by [y,yr,'r]
we will assume it's a noun
preceded by yn
we will assume it's a verb
unless a verb has been identified in the current phrase
in which case we'll assume it's an adjective
This might break Apertium - I don't know. In theory, though, we might be able to get relative probabilities for particular sequences from a corpus. - Donnek
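The blind-choice idea above, sketched in Python. The function name, the tag labels and the `verb_seen` flag are all assumptions made for illustration; nothing like this exists in the pipeline at present.

```python
def guess_unknown(prev_token, verb_seen):
    """Guess a part of speech for an unknown word from its left context."""
    prev = prev_token.lower()
    if prev in {"y", "yr", "'r"}:
        return "n"
    if prev == "yn":
        # A second "yn" after a verb in the same phrase suggests an adjective.
        return "adj" if verb_seen else "vblex"
    return None  # no guess; leave the word marked as unknown


print(guess_unknown("y", False))
print(guess_unknown("yn", True))
```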

I'd be reluctant to add one as we'd not be able to get the translation, on the other hand, it wouldn't cause messing up of word order. It's an open problem, and we're thinking about it :) - Francis Tyers


Prepositional noun phrase should not be a subject

cerddodd fo i'r dref -> he walked in the town

Fine, except that the preposition "i" should really be glossed as "to" ("yn y dref" would be "in the town")

Contrast:

cerddodd i'r dref -> *the town walked in - [he/she] walked to the town
Welsh pattern "prep + det.def + noun" is never a subject phrase

and therefore the "det.def + noun" section shouldn't be shifted. (I can't think of any exceptions to this, but there may be one.)

There was a rule to do this, I've commented it out, I think there was a reason for it, but I can't recall now. I've run the regression tests below and it doesn't seem to have broken anything. Regarding the preposition, should I change "i" to be "to" instead of "in" ? - Francis Tyers
Re "i", yes, change it to "to". - Donnek
The problem here was the dictionary only had i'r → yn+yr... I've added i'r → i+yr and now it is picking the right one, although I don't know what will happen for other contexts... - Francis Tyers
Not sure where that would have come from. The only vaguely relevant thing I can think of is "i mewn i" (into). - Donnek


allan i'r cyfarfod -> *the meeting #exit<vblex><pres><p3> in - out to the meeting

This is similar - "in" should be "to", and should be kept with "the meeting".

However, there is another issue here, which is in effect the same as "Preferential choice between noun and verbform" above. In this case, the verb "allanu" (to exit) is being chosen instead of the much more likely "allan" (out).

roedd o ar dy lyfr -> *was of on your books - it was on your book

1.3.9 would deal with "of", and 1.3.6 would deal with "books". Subject shift would then produce a reasonable translation.

However:

roedd ar dy lyfr -> *your #be<vbser><past><p3> on books - (it) was on your book

Omitting the subject pronoun can happen quite frequently in speech if the subject has already been mentioned. The <sg> tag gets lost at interchunk, which means the verb can't be conjugated (this came up somewhere else, but I think it's been taken off the page - maybe it would be better just to mark the issue heading as "addressed" rather than delete it). But there is an additional issue, in that the possessive pronoun is getting treated as the subject and moved separately. So maybe we need a broader rule to say that "prep + det.def/pr.poss/whatever + noun" is an indivisible chunk, and must be dealt with as a block. No part of it would be moved in this case anyway.

Regarding page cleanup, OK. Perhaps having a separate section, and then moving sections down, would be a good idea. - Francis Tyers

It would also be nice in the longer term to fill in the pronoun if it is omitted.

For Welsh pattern "verb + non-subject noun phrase"
output English "verb + pronoun agreeing in number and person + non-subject noun phrase"

The NSNP could be a prepositional phrase (marked by an initial preposition), or an object phrase (marked with initial soft mutation).

"-ing" as "yn + verb"

For English pattern "subject + verb<vbser> + verb + ing"
output for Welsh "verb<vbser> + subject + yn + verb"


Inflected verbs not being parsed

aeth -> *aeth - (he/she/it) went

However, "aeth" is listed in cy.dix.xml (line 27491) as past 3p sing in the mynd_vblex paradigm, which is what "mynd" (to go) gets conjugated against (line 54444).

Ah - a bug in the segmentation.

*myndaeth fo -> he went
he went -> *myndaeth fe

The infinitive is getting added to the irregular forms, instead of being replaced by them.

Yep, this is a problem in the paradigm for 'mynd', I'll need to rewrite it, fortunately it is only used once... New paradigm output here - Francis Tyers
Fine, but the imperative forms also need "mynd" excised. - Donnek
Done. - Francis Tyers

Insert det.indef in prepositional NP

daeth Taid â lamp -> Grandfather came with lamp (preferably "with a lamp")
dychwelodd y rheolwr gyda gŵr tew -> the manager returned with fat man (preferably "with a fat man")
For Welsh pattern "prep + noun"
output English pattern "prep + det.indef + noun"

This should probably be working now. - Francis Tyers

I am tempted to retire this in favour of a broader rule:

For Welsh pattern "non-specific noun.sg"
output English "a + non-specific noun.sg"

"Non-specific" here means a noun that is not qualified by det.def, pr.poss, etc.

daeth car ar hyd y ffordd -> *car came along the road - a car ...
mae'r athro yn licio chwarae gêm o golff -> *the teacher is liking play game he golf - the teacher likes to play a game of golf

1.3.16 would deal with "play". For "gêm o golff" we would need to prevent "*a game of a golf" (which the existing rule would in fact have produced for "o golff"). Perhaps:

For Welsh pattern "non-specific noun + o + non-specific noun"
output English pattern "a + noun + of + noun"

This would need to fire before the revised rule above, or we need some other way of sorting out the possible doubling of "a" (a a game of golf) - LOL - let it happen and then have a rule:

For English pattern "det.indef + det.indef"
output "det.indef"
gwelodd y bachgen gath yn yr ardd -> the boy saw cat in the garden (preferably "a cat")
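The "let it happen and clean up afterwards" option for doubled indefinite articles is easy to sketch. This is Python with a hypothetical function name, working on a plain list of English tokens:

```python
def collapse_indef(tokens):
    """Collapse a run of indefinite articles into a single one."""
    out = []
    for tok in tokens:
        if out and tok in ("a", "an") and out[-1] in ("a", "an"):
            continue  # skip the repeated article
        out.append(tok)
    return out


print(" ".join(collapse_indef("to play a a game of golf".split())))
```

This does not stop "*a game of a golf"; that still needs the "noun + o + noun" rule to fire first.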

Preferential choice between verbforms

bydd y lamp yn rhoi golau -> *are the lamp giving light - the lamp will be giving light (and presumably we could massage this into "the lamp will give light" later, since that would be the more natural English equivalent)

A couple of things here. The most important is that tagger chooses the less frequent imperative out of the imperative/future choice for the verb. Presumably this then means that the subject shift can't take place. But even with the imperative choice, the imperative 2p sing info gets lost between interchunk and postchunk, and replaced with a generic? present which gets output as "are". Odd.

(I'm assuming that "bydd" would get output as "will be", since that would be the correct English tense.)

I fixed this (crudely) by commenting out the imperative for "be" 2pSg (You are!). When I next train the tagger I'll see if I can take care of it there. - Francis Tyers

Another example:

roedd y bechgyn yn gallu croesi dan y ffordd -> *the boys were #be<vbmod><ger># able to #vblex><vblex><pres> under the road - the boys were able to cross under the road

Here, proc decides to ignore "croesi" (cy-en.dix line 6203) as an infinitive in favour of conjugating it as present 2p sing. The infinitive option doesn't even show! Can we add some such rule as:

When you have homographous Welsh verb options <inf> and conjugated_verb
choose <inf> unless verb is followed by pr

This isn't very good, because you could have the conjugated form without a pronoun, but it might deal with this to some extent.

The example above also has some funkiness going on with "gallu" - "were be able" needs to be transformed into "were able". However, I don't know enough about how Apertium treats modal verbs to make a suggestion.

Comparative adjectives with "less/more"

tyfodd y twnnel yn llai llachar -> *the tunnel grew small bright - the tunnel grew less bright
tyfodd y twnnel yn fwy tywyll -> *the tunnel grew big dark - the tunnel grew more dark
For Welsh pattern "fwy/llai + adj"
output English "more/less + adj"
For English pattern "more/less + adj"
output Welsh "fwy/llai + adj"
I'll see if I can copy in a rule from Spanish--English for this :) - Francis Tyers
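A Python sketch of the Welsh to English half of this rule, to show how it stays separate from the synthetic comparatives: "fwy/llai" only become "more/less" when an adjective follows. The function name and the adjective set are hypothetical.

```python
DEGREE = {"mwy": "more", "fwy": "more", "llai": "less"}


def translate_degree(tokens, adjectives):
    """Replace mwy/fwy/llai with more/less when the next token is an adjective."""
    out = list(tokens)
    for i in range(len(out) - 1):
        if out[i] in DEGREE and out[i + 1] in adjectives:
            out[i] = DEGREE[out[i]]
    return out


adjectives = {"llachar", "tywyll"}
print(translate_degree(["yn", "llai", "llachar"], adjectives))
print(translate_degree(["yn", "fwy"], adjectives))  # left alone: "bigger"
```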

Synthetic comparative adjectives

Many of these seem to have faulty dictionary entries:

tyfodd y twnnel yn fwy -> the tunnel grew big (should be "bigger")
tyfodd y twnnel yn llai -> the tunnel grew small (should be "smaller")
tyfodd y twnnel yn hirach -> the tunnel grew long (should be "longer")
tyfodd y twnnel yn uwch -> the tunnel grew high (should be "higher")
Dictionary error in the bidix, now fixed. - Francis Tyers
Cool! - Donnek

Note, also a rule needs to be written for:

the expensive house → y tŷ drud
the more expensive house → y tŷ drutach
the most expensive house → y tŷ drutaf

Verb + preposition

Re "coolness factor" below (woop woop!), we need to cater for verbs such as "ymchwilio" which are followed by a preposition that is different from English, or where there is no preposition in English.

For example:

ymchwilio i - research into, investigate
siarad am - talk about
dweud wrth - say to, tell
gofyn am - ask for

Is there any way to get the verb+prep phrase parsed as a phrase, rather than separately? Perhaps an entry in one of the dictionaries? This would only need to be done for those phrases where the preposition differs in English and Welsh.

Not, for instance for:

neidio dros - jump over
cerdded i - walk to
delio gyda - deal with

where there is a regular correlation between the meanings of the Welsh and English prepositions.

Yes, these are multiword constructions, like for example "He became accustomed to the taste." → "cynefinodd Fe i y blas." (try it in the testing interface). Is there a way of getting a list of these? (Actually there are many I currently need to fix in the bidix/English dict, but if you have a list I can look at them.) At the moment we only seem to have multiword verbs on the English side. - Francis Tyers
I will try to compile a list of the most common, and send it to you tomorrow. - Donnek

Subordinate ("reported speech") clauses with "bod" + noun

Also referring to the cool sentence, we have two sentences as follows:

(1) roedd y Comisiwn yn ymchwilio i'r honiadau - the Commission was investigating the allegations
(2) mae yr AS wedi methu datgan £103,000 o roddion - the MP has failed to declare £103,000 of gifts

Subordinate clauses, like the relative clauses, will be difficult. But a first stab at this might be as follows:

For Welsh pattern "[b/f]od + [det.def] + noun + [qualifiers] + wedi + verb"
output English "that + [det.def] + noun + [qualifiers] + has/have (number agreeing with noun) + verb_past_participle"
clywodd y dyn bod y trên wedi cyrraedd yn hwyr -> *the man heard be the train after arrive #late
This would be:
VBSER(INF) + DEFINITE_NP + wedi + VERB
THAT + DEFINITE_NP + HAS + VERBPP
Where DEFINITE_NP is any noun phrase preceded by the definite article? - Francis Tyers
Actually, thinking about this again, it doesn't have to be definite - it just so happened that in those sentences it was. You could have something like:
clywodd ysbïwr bod y trên .... -> a spy heard that the train ....
So the NP could be "[det.def, rhyw (some), pr.poss] + [adj - eg hen] + noun + [qualifiers - adjectives, demonstratives, etc]", or it could just be "pr.subj" (clywodd fo bod y trên ...). The same applies to the "am" construction below. Another point is that the VBSER can be soft-mutated - "fod" instead of "bod". - Donnek
This rule is broadly working for now. At least it is inserting the 'that', a form of 'have' and changing the verb to a pp. It is not however robust, and seems to me a bit hacky. Could you give some more examples so I can fine tune it? - Francis Tyers
Hacky? Surely not .... I did say that relative and subordinate clauses will be difficult, so we may have to refactor as we go along. An alternative to the above (which would also cover the adjective example below) would be:
For Welsh pattern "[b/f]od + NP + complement"
output English "that + NP + is + complement"
This would give:
clywodd y dyn bod y trên wedi cyrraedd yn hwyr -> *the man heard that the train is after arrive #late
You would then have further rules to transform "is + after + verb" to "has + verbpp", and "is + for + verb" to "will + verb". (Irish and Gaidhlig have a similar construction, by the way, using "ar, air" instead of "wedi", so whatever rule bundle you use here would be transferable to that branch of Celtic too.)
There is also another similar construction using "ar" in place of "wedi" and "am" - this one means "about to":
clywodd y dyn bod y trên ar gyrraedd yn hwyr -> *the man heard be the train on arrive late - the man heard that the train was about to arrive late
So an additional rule "is + on + verb -> was + about to + verb".
Oh, there's another one too, with "newydd", meaning "just now":
clywodd y dyn bod y trên newydd gyrraedd yn hwyr -> *the man heard be the new train arrive late - the man heard that the train had just arrived late
This one has been caught by the adjective rule, but "newydd" belongs to the VP, not the NP in this case, so we'd need some prioritisation.
Other examples:
roedd y bachgen yn dweud bod y tŷ wedi mynd ar werth -> the boy was saying that the house has gone on value
(fine, except that "ar werth" means "on sale" - see 1.3.19 below.)
dywedodd hi bod y trên yn hwyr -> *she said be the train late - she said that the train is late.
You could deal with this one by adding a similar rule:
VBSER(INF) + NP + ADJ
THAT + NP + VBSER + ADJ
dwi'n meddwl bod y glaw wedi stopio -> *dwithinking that the rain has stopped - I think that ....
(We need to get the present tense of "bod" sorted out too)
dywedodd yr eneth bod y siop am agor ar amser -> the girl said that the shop will open on #time
(I'm noticing the lack of adverbs, eg "tomorrow", "today", "afterwards", etc. I suppose the remaining bits of Eurfa need importing at some point.)
Donnek
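The follow-up rules for "is + after/for/on + verb" could look something like the Python sketch below. It works on already-glossed English tokens; the function name and the small participle table are hypothetical stand-ins for what the English generator would supply.

```python
# Illustrative base form -> past participle table.
PAST_PARTICIPLE = {"arrive": "arrived", "stop": "stopped",
                   "go": "gone", "open": "opened"}


def rewrite_aspect(tokens):
    """Rewrite "is after V", "is for V" and "is on V" into English tenses."""
    out, i = [], 0
    while i < len(tokens):
        window = tokens[i:i + 3]
        if (len(window) == 3 and window[0] == "is"
                and window[1] in ("after", "for", "on")
                and window[2] in PAST_PARTICIPLE):
            marker, verb = window[1], window[2]
            if marker == "after":      # wedi
                out += ["has", PAST_PARTICIPLE[verb]]
            elif marker == "for":      # am
                out += ["will", verb]
            else:                      # ar
                out += ["was", "about", "to", verb]
            i += 3
            continue
        out.append(tokens[i])
        i += 1
    return out


sentence = "the man heard that the train is after arrive late"
print(" ".join(rewrite_aspect(sentence.split())))
```

The "newydd" construction is not handled here, since it first needs to be kept out of the adjective rule.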

The above rule would give "the man heard that the train has arrived late" - not perfect, since in English we would use pluperfect rather than perfect here, but a lot better.

We can extend this to another construction:

For Welsh pattern "[b/f]od + [det.def] + noun + [qualifiers] + am + verb"
output English "that + [det.def] + noun + [qualifiers] + will + verb"
clywodd y dyn bod y trên am gyrraedd yn hwyr -> *the man heard be the train for arrive #late

The above rule would give "the man heard that the train will arrive late" - not perfect, since in English we would use conditional rather than future here.

This now seems to be working. - Francis Tyers

These could be improved if it were possible to refer back to the verb of the main clause. Thus where it is past, the subordinate would use pluperfect or conditional; where it is non-past, the subordinate would use perfect or future.

We can probably set this as a variable, but what would be the triggers to set/unset the variable? - Francis Tyers
End of the clause? Full stop or comma, perhaps? - Donnek
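Whatever the triggers end up being, the choice the variable would drive is small. A Python sketch, with a hypothetical function name and tense labels:

```python
def subordinate_aux(main_tense, marker):
    """Pick the English auxiliary for a "bod ... wedi/am + verb" clause."""
    past = main_tense == "past"
    if marker == "wedi":
        return "had" if past else "has"      # pluperfect vs perfect
    if marker == "am":
        return "would" if past else "will"   # conditional vs future
    raise ValueError("unsupported marker: " + marker)


# clywodd y dyn bod y trên wedi cyrraedd -> "... that the train had arrived"
print(subordinate_aux("past", "wedi"))
```

Number agreement ("has/have") is left out of this sketch.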


There are other varieties of subordinate clause that I give other suggestions about.

Incidentally, in the above the det.def should be taken to include other prenominal qualifiers like possessives.

Verbal nouns / Infinitives

roedd y dyn yn gwerthu pethau rhad -> the man was selling cheap things
roedd fo yn palu -> he was digging

Both of these are fine.

Making the verbal noun/infinitive the subject doesn't work quite so well:

roedd gwerthu pethau rhad yn hawdd -> *was sell cheap things easy - selling cheap things was easy
roedd palu yn waith caled -> *was dig in a hard work - digging was hard work

The latter would benefit from the extension of the "yn as stative" rule to nouns, as suggested in 1.3.4 above. But we also need to define the VN as a subject, so that it can be shifted. This is not easy, because the rule may cause problems with other constructions later. But we can take a stab at it.

First, we can use the infinitival form in English - "to sell cheap things was easy" and "to dig was hard work" are equivalent to the above sentences.

For Welsh pattern "verb<vblex><inf>"
output English "to + verb"
For English pattern "to + verb"
output Welsh "verb<vblex><inf>"

This allows the same rule to be used in sentences like:

ceisiodd y dyn agor y bocs -> *the man sought open the box

which should produce "the man tried to open the box". (Can we delete the "seek" entry for "ceisio" until we have refined choices between different entries? The "try" entry is more frequent.)

"seek" entry commented out and replaced with "try". Francis Tyers

Second, we can assume that a verbal noun phrase will occur after an inflected verb (mostly forms of "bod"). So we might try expanding the above to say:

For Welsh pattern "verb_inflected + verb<vblex><inf> + [noun phrase]"
output English "to + verb + [noun phrase] + verb_inflected"

for the first two sentences ("roedd gwerthu pethau rhad" and "roedd palu"), and:

For Welsh pattern "verb_inflected + subject + verb<vblex><inf> + noun phrase"
output English "subject + verb_inflected + to + verb + noun phrase"

for the third ("ceisiodd y dyn agor y bocs").

This is not perfect, and I am not sure how it would cut across the existing rule for subject shift.

There are also interesting issues with nesting of infinitival subject phrases:

roedd gwerthu pethau rhad yn hawdd yn neis -> *was sell cheap things easy nice - to sell cheap things easily was nice
roedd gwerthu pethau rhad yn hawdd yn beth neis -> *was sell cheap things easy in a nice thing - to sell cheap things easily was a nice thing

where I'm not sure how you specify the boundaries of the noun phrase. Any views, or is that too complex for the present iteration?

At the moment to define noun phrases we just define fixed length patterns of tags which are matched in left-to-right, longest-match way. So for example for Welsh to English:
NOUN → SN_(NOUN)
PRNSUBJ → SN_(PRNSUBJ)
DET → SN_(DET)
DET NOUN → SN_(DET NOUN)
NOUN1 NOUN2 → SN_(NOUN2-NOUN1)
NOUN ADJ → SN_(ADJ NOUN)
yn ADJ → SN_(ADJ)
DET NOUN ADJ → SN_(DET ADJ NOUN)
NOUN ADJ1 ADJ2 → SN_(ADJ2 ADJ1 NOUN)
The first is the pattern we detect in Welsh, and the second is the "chunk" that we output in English. Any suggestions on defining more of these (the most frequently occurring), or changing them would be appreciated. - Francis Tyers
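For anyone wanting to experiment with candidate patterns outside Apertium, the left-to-right, longest-match behaviour can be imitated in a few lines of Python. This is only an illustration: the tag names, the rule table layout and the function name are made up, and the words are simply reordered, not translated.

```python
# Each rule: (tag pattern, output order as indices into the matched words).
RULES = [
    (("det", "n", "adj"), (0, 2, 1)),
    (("n", "adj", "adj"), (2, 1, 0)),
    (("det", "n"), (0, 1)),
    (("n", "n"), (1, 0)),
    (("n", "adj"), (1, 0)),
    (("yn", "adj"), (1,)),
    (("n",), (0,)),
    (("prn",), (0,)),
    (("det",), (0,)),
]


def chunk(tagged):
    """Chunk a list of (word, tag) pairs, preferring the longest match."""
    chunks, i = [], 0
    while i < len(tagged):
        best = None
        for pattern, order in RULES:
            window = tagged[i:i + len(pattern)]
            if tuple(tag for _, tag in window) == pattern:
                if best is None or len(pattern) > len(best[0]):
                    best = (pattern, order)
        if best is None:
            chunks.append(("other", [tagged[i][0]]))  # no rule matched
            i += 1
        else:
            pattern, order = best
            chunks.append(("SN", [tagged[i + k][0] for k in order]))
            i += len(pattern)
    return chunks


sentence = [("y", "det"), ("bachgen", "n"), ("bach", "adj"),
            ("yn", "yn"), ("hapus", "adj")]
print(chunk(sentence))
```

Adding a new pattern is then just a new row in the table, which makes it easy to see which rows win on a given phrase.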

Note also in the above that we have the adverb problem from 1.3.4.


Infinitive after "yn lle"

For Welsh pattern "yn lle + verb<vblex><inf>"
output English "instead of + verb + ing"
For English pattern "instead of + verb + ing"
output Welsh "yn lle + verb<vblex><inf>"
Would you say "yn lle" is a multi-word preposition? - Francis Tyers
Yes, it is a compound preposition. "yn ei le" - instead of him (lit. in his place), "yn eu lle" - instead of them. But I don't want those included here, because there are places where you might want to translate them "in his place". - Donnek
yn lle mynd dros y ffordd -> *instead of go *dros the road - instead of going over the road

Incidentally, why is "dros" coming up as unknown here? I remember sweating over putting it in the dictionary (cy.dix line 46972, cy-en.dix line 49) :-)

It is in there, but under the paradigm /tros__pr, which means that it will never get detected. Is 'dros' a mutation of 'tros' or a separate preposition? The same oddness goes for trwy and drwy. - Francis Tyers
Yes, they occur in both mutated and unmutated forms. I would think the mutated forms are more common. Hmm. I didn't realise the paradigms overwrote the cited form like that. In that case, we either need to do some [t,d] substitution, or (perhaps simpler) replicate the entire "tros" paradigm for "dros", replacing the t with d. Same for "trwy, drwy". - Donnek
This seems to be done. - Francis Tyers

"i gyd"

This means "all", and occurs after the noun:

roedd y cwningod i gyd yn ddiogel -> *the rabbits were I joint safe - all the rabbits were safe
For Welsh pattern "det.def + noun + [qualifiers] + i gyd"
output English "all + det.def + [qualifiers] + noun"
Is 'i gyd' considered an adjective or pronoun? (or something else?) :)- Francis Tyers
An adjective, I suppose. Certainly a qualifier of some sort. - Donnek
Making progress. We now get:
roedd y cwningod i gyd yn ddiogel -> *the all rabbits were safe
Just need to massage that slightly. - Donnek

Non-compositional multiword phrases

This section is for phrases that have to be scoped as a whole, rather than broken down to their constituent parts.

ar werth - on sale
hyd yn oed - even (adv)
wrth gwrs - of course (adv)

Superlative adjective + "oll"

y rhai lleiaf oll -> *the some smallest *oll - the smallest ones of all
yn gyntaf oll -> *first *oll - first of all
For Welsh pattern "adj.super + oll"
output English "adj.super + of all"

We need to add "oll" (all) to the dictionary, but this would still be a necessary rule.


"rhai"

y rhai bach -> *the some small - the small ones
rhai mawr -> *some big - big ones

"rhai" (some) can be considered the plural of "un" (one).

For Welsh pattern "rhai + adj"
output English "adj + ones"

This also applies to phrases like "y rhai lleiaf oll" above. We need to convert "oll" first on the basis that it follows an adj, and then we need to convert "rhai" on the basis that it precedes an adj.
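A small Python sketch of why the order of the two conversions matters. The function names are hypothetical, and the input is a half-glossed token list (adjectives already in English, "rhai" and "oll" still in Welsh), which is an assumption made to keep the example short.

```python
def apply_oll(tokens, superlatives):
    """Turn "oll" into "of all" when it follows a superlative adjective."""
    out = []
    for i, tok in enumerate(tokens):
        if tok == "oll" and i > 0 and tokens[i - 1] in superlatives:
            out += ["of", "all"]
        else:
            out.append(tok)
    return out


def apply_rhai(tokens, adjectives):
    """Turn "rhai + adj" into "adj + ones"."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] == "rhai" and i + 1 < len(tokens) and tokens[i + 1] in adjectives:
            out += [tokens[i + 1], "ones"]
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


phrase = ["the", "rhai", "smallest", "oll"]
adjs = {"small", "big", "smallest"}
print(apply_rhai(apply_oll(phrase, {"smallest"}), adjs))
```

Run the other way round, "oll" ends up after "ones" instead of after the adjective, so the first rule no longer matches.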


Dictionary errors (refs to cy-en.dix)

"hefyd" (152) is correctly listed as "also", but is wrongly coming up as "then".

Should be fixed - Francis Tyers

"da" (5318), is correctly listed as "good", but is coming up as unknown.

"sydd" / "sy"

This is a relative present form of "bod" - "who/which is/are". The elided form "sy" is more common in speech. "sydd" is not listed in the dictionary, but "sy" is.

mae'r dyn yn adeiladu gwesty sy'n darparu llawer o ystafelloedd -> *the man is building hotel is provide many of rooms
mae'r blaid yn gwneud rhywbeth sy'n cyfrannu at ennill yr etholiad -> *the party is doing something is contribute towards win the election - ... which contributes towards winning the election
mae'r dafad yn pori yn y maes sy'n cynnig bwyd da -> *the sheep is grazing in the field is offer good food - ... which offers good food

(Note: there are some frustrating shortcomings in the output. If we use a variant of the last sentence:

mae'r defaid yn pori yn y cae sy'n cynnig bwyd da -> *the sheep #be<vbser><pres><p3><pl> grazing in the closes is offer good food

it appears that the conversion can't handle the plural of "sheep", and tagger insists on choosing an inflected verb ("closes", from "cau") instead of the noun (cae - field) - 1.3.5 really needs to be implemented.)

For Welsh pattern "{sydd yn, sy'n} + verb_infin"
output English "that + verb.pres.3p.sing"
This should also broadly be fixed. Can you check the output below:
mae'r dyn yn adeiladu gwesty sy'n darparu llawer o ystafelloedd → the man is building hotel that provides a lot of rooms
mae'r blaid yn gwneud rhywbeth sy'n cyfrannu at ennill yr etholiad → the party is doing something that contributes towards win the election
mae'r dafad yn pori yn y maes sy'n cynnig bwyd da → the sheep is grazing in the field that good food offers
mae'r defaid yn pori yn y cae sy'n cynnig bwyd da → the sheep are grazing in the closes that good food offers.
- Francis Tyers
Terrific. The only thing is that the last two sentences have subject shift, even though "bwyd da" is an object. Would it be possible to ban subject shift after "sy[dd]"? - Donnek

Placeholder

Regression tests

Main article: Welsh to English/Regression tests

Coolness factor

Roedd y Comisiwn yn ymchwilio i'r honiadau bod yr AS wedi methu datgan £103,000 o roddion.
the Commission Was investigating the allegations that the MP has failed declare £103,000 of gifts.
"He was the Commission crookedly ymchwiliad I ' group claims be he drives ACE has failed declare he gifts." (InterTran)
Dywedodd yr heddlu fod y troseddau honedig wedi digwydd rhwng 2003 a 2007 yn Sir Benfro a Sir Gaerfyrddin.
the police Said that the alleged crimes have happened between 2003 and 2007 in Pembrokeshire and Carmarthenshire.
"He said he drives police force be the transgressions alleged has happened between 2003 I go 2007 crookedly Shire ble I go Shire Gaerfyrddin." (InterTran)
Mae'r heddlu hefyd yn ymchwilio i honiadau ei bod hi'n cael perthynas â dyn llawer hŷn.
the police Are also investigating his allegations be she getting relation with older many man.
"He ' is being group police force also crookedly ymchwiliad I claims you go be she ' heartburn have relation he goes tight much hn & #375." (InterTran)