Difference between revisions of "Talk:Welsh to English/Archive 1"
| {{comment|::::Done. - [[User:Francis Tyers|Francis Tyers]]}} | |||
| ==(1.3.14) Verb + preposition== | |||
| Re "coolness factor" below (woop woop!), we need to cater for verbs such as "ymchwilio" which are followed by a preposition that is different from English, or where there is no preposition in English. | |||
| For example: | |||
| ;<s>ymchwilio i - research into, investigate</s> | |||
| ;<s>siarad am - talk about</s> | |||
| ;<s>dweud wrth - say to, tell</s> | |||
| ;<s>gofyn am - ask for</s> | |||
| Is there any way to get the verb+prep phrase parsed as a phrase, rather than separately?  Perhaps an entry in one of the dictionaries?  This would only need to be done for those phrases where the preposition differs in English and Welsh. | |||
| Not, for instance for: | |||
| ;neidio dros - jump over | |||
| ;cerdded i - walk to | |||
| ;delio gyda - deal with | |||
| where there is a regular correlation between the meanings of the Welsh and English prepositions. | |||
| {{comment| | |||
| :: Yes, these are multiword constructions, like for example "He became accustomed to the taste." → "cynefinodd Fe i y blas." (try it in the testing interface). Is there a way of getting a list of these? (Actually, there are many I currently need to fix in the bidix/English dict, but if you have a list I can look at them.) At the moment we only seem to have multiword verbs on the English side. - [[User:Francis Tyers|Francis Tyers]] | |||
| }} | |||
| {{comment|:::I will try to compile a list of the most common, and send it to you tomorrow. - [[User:Donnek|Donnek]]}} | |||
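The request above amounts to a phrase table consulted before word-by-word translation. A minimal sketch in Python (not Apertium dictionary syntax), assuming a hypothetical table `VERB_PREP` and helper `lookup_phrase`; the entries are just the four struck-through pairs listed in this section:

```python
# Hypothetical multiword table: (Welsh verb, Welsh preposition) -> English phrase.
# Only pairs whose preposition does not map regularly onto English need an entry.
VERB_PREP = {
    ("ymchwilio", "i"): "research into",
    ("siarad", "am"): "talk about",
    ("dweud", "wrth"): "tell",
    ("gofyn", "am"): "ask for",
}


def lookup_phrase(tokens, i):
    """Return (english, tokens_consumed) for a listed phrase at tokens[i], else None."""
    if i + 1 < len(tokens):
        hit = VERB_PREP.get((tokens[i], tokens[i + 1]))
        if hit is not None:
            return hit, 2
    return None
```

Regular pairs such as "neidio dros" are deliberately absent, so they fall through to the ordinary single-word entries.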
| ==(1.3.17) Infinitive after "yn lle"== | |||
| :::; roedd y cwningod i gyd yn ddiogel -> *the all rabbits were safe | |||
| :::Just need to massage that slightly. - [[User:Donnek|Donnek]]}} | |||
| ==(1.3.22) Dictionary errors (refs to cy-en.dix and cy.dix)== | |||
| "hefyd" (152) is correctly listed as "also", but is wrongly coming up as "then". | |||
| {{comment|::Should be fixed - [[User:Francis Tyers|Francis Tyers]]}} | |||
| "da" (5318), is correctly listed as "good", but is coming up as unknown. | |||
| {{comment|::Where is it coming up as unknown? - [[User:Francis Tyers|Francis Tyers]]}} | |||
| {{comment|:::In :"mae'r bachgen yn licio'r eneth sy'n dda".  But that is now coming up OK as well. - [[User:Donnek|Donnek]]}} | |||
| <pre> | |||
|     <pardef n="anghydwel/d__vblex"> | |||
|     <e lm="anghydweld"><i>anghydwel</i><par n="anghydwel/d__vblex"/></e> | |||
|     <e lm="cyfweld"><par n="initial-c"/><i>yfwel</i><par n="anghydwel/d__vblex"/></e> | |||
|     <e lm="gweld"><par n="initial-g"/><i>wel</i><par n="anghydwel/d__vblex"/></e> | |||
|     <e lm="rhagweld"><par n="initial-rh"/><i>agwel</i><par n="anghydwel/d__vblex"/></e> | |||
|     <e lm="ymweld"><i>ymwel</i><par n="anghydwel/d__vblex"/></e> | |||
| </pre> | |||
| This paradigm appears to be broken in that some of the <r> sides are of different lengths (they should all be 'd' if /d and 'eld' if /eld) - [[User:Francis Tyers|Francis Tyers]] | |||
| {{comment|::I'm missing something, sorry.  To me they are all segmented /d. - [[User:Donnek|Donnek]] | |||
| }} | |||
| {{comment| | |||
| :::See for example the entries in the paradigm [http://www.nopaste.com/p/as3si2tMT here]. - [[User:Francis Tyers|Francis Tyers]]}} | |||
| ==(1.3.23) "sydd" / "sy"== | |||
| This is a relative present form of "bod" - "who/which is/are".  The elided form "sy" is more common in speech.  "sydd" is not listed in the dictionary, but "sy" is. | |||
| ; mae'r dyn yn adeiladu gwesty sy'n darparu llawer o ystafelloedd -> *the man is building hotel is provide many of rooms  | |||
| ; mae'r blaid yn gwneud rhywbeth sy'n cyfrannu at ennill yr etholiad -> *the party is doing something is contribute towards win the election - ... which contributes towards winning the election | |||
| ; mae'r dafad yn pori yn y maes sy'n cynnig bwyd da -> *the sheep is grazing in the field is offer good food - ... which offers good food | |||
| (Note: there are some frustrating shortcomings in the output.  If we use a variant of the last sentence: | |||
| ; mae'r defaid yn pori yn y cae sy'n cynnig bwyd da -> *the sheep #be<vbser><pres><p3><pl> grazing in the closes is offer good food | |||
| it appears that the conversion can't handle the plural of "sheep", and the tagger insists on choosing an inflected verb ("closes", from "cau") instead of the noun (cae - field) - 1.3.5 really needs to be implemented.) | |||
|  For Welsh pattern "{sydd yn, sy'n} + verb_infin" | |||
|  output English "that + verb.pres.3p.sing" | |||
| {{comment|::This should also broadly be fixed. Can you check the output below: | |||
| :::mae'r dyn yn adeiladu gwesty sy'n darparu llawer o ystafelloedd → the man is building hotel that provides a lot of rooms | |||
| :::mae'r blaid yn gwneud rhywbeth sy'n cyfrannu at ennill yr etholiad → the party is doing something that contributes towards win the election | |||
| :::mae'r dafad yn pori yn y maes sy'n cynnig bwyd da  → the sheep is grazing in the field that good food offers  | |||
| :::mae'r defaid yn pori yn y cae sy'n cynnig bwyd da → the sheep are grazing in the closes that good food offers. | |||
| ::- [[User:Francis Tyers|Francis Tyers]] | |||
| }} | |||
| {{comment| | |||
| :::Terrific.  The only thing is that the last two sentences have subject shift, even though "bwyd da" is an object.  Would it be possible to ban subject shift after "sy[dd]"? Also: | |||
| :::: For Welsh pattern "at + vb.infin" | |||
| :::: output English "towards + verb +-ing" | |||
| :::[[User:Donnek|Donnek]] | |||
| }} | |||
| {{comment|:::::Done. There are a few regressions because of the new tagger, but I'm looking into them. - [[User:Francis Tyers|Francis Tyers]]}} | |||
| ==(1.3.24) "llawer"== | |||
| {{comment|::I'm leaving this one for now "... newydd ..", as it is harder to do as we don't mark adjective chunks with their lemma. - [[User:Francis Tyers|Francis Tyers]]}} | |||
| ==(1.3.27) Subject pronoun + verb (marked construction?) == | |||
| "Fe dagodd ei wraig ar eu gwely yn eu cartref yn Abercynon ym mis Ebrill y llynedd ar ôl iddi ddweud ei bod yn ei adael am ddyn arall." | |||
| Gives: | |||
| :"He his wife choked on their bed in their home in Abercynon in the April last year after to her say his be in his leave for other man." | |||
| The "normal" (VSO) order ("Dagodd fe ei wraig") would give: | |||
| :he Choked his wife | |||
| Should we re-order {{sc|prnsubj + verbcj → verbcj + prnsubj}} in the initial stage in order to normalise the word order? | |||
| {{comment|::Interesting - "fe dagodd ei wraig" is actually ambiguous in Welsh.  It could mean "his wife choked", "he choked his wife", or at a pinch "it was he who choked his wife" (which would have suprasegmental differences).  The news item is carrying over the subject of the previous sentence (the man) into this one, so the second choice is the correct one.  But in isolation we wouldn't know.   | |||
| ::I would hold on your suggestion, because the sentence actually omits the pr.3p.sing which would disambiguate - the "fe" at the beginning is not the (southern) pr.3p.sing - it is a (mostly southern) preverbal affirmative particle (equivalent to the (mostly northern) "mi"), in the same class as preverbal interrogative particle "a" and  preverbal negative particle "ni".  In fact, we need a rule to delete "fe/mi" before a conjugated verb (not perfect, but better than default), and in this case the sentence would have come out as the first choice ("his wife choked").   | |||
| ::This is wrong in this context, but I can't see what Apertium could do about this without some complex inter-sentence parsing (if it's any consolation, a human reading this sentence in isolation might also make the same mistake until he came to the latter part).  | |||
| ::Incidentally, if the pr.poss were omitted from "fe dagodd ei wraig" it would not be ambiguous - "fe dagodd gwraig - a woman choked", but "fe dagodd wraig - (he) choked a woman". | |||
| ::(Another incidentally - I have asked around, and while people do seem to accept that "his wife choked" is a possible interpretation, they discount it as a likely one.  They would either use a different construction for this ("naeth ei wraig dagu"), or expect some additional information before considering it ("tagodd ei wraig ar afal" - his wife choked on an apple).  We can't replicate this feeling in Apertium, however, because the sequence is ubiquitous with other verbs.  In this case, there may be something inherent in the transitive/intransitive meanings that make people interpret the sequence as one rather than the other.  It's certainly an interesting issue - surely a journal article in there somewhere!) | |||
| ::The "i + infin" needs a separate section.   | |||
| ::The last part of the sentence "... ei bod yn ei adael am ddyn arall - ... that she is leaving him for another man" is in fact a special case of 1.3.26, with an object pronoun on the verb.  We can see this if we take out the pr.obj: | |||
| :::; ei bod hi'n gadael am ddyn arall -> that she is leaving for other man  | |||
| ::Correct apart from "other", and that may be fixed if the revised 1.3.10 can be addressed.  But note: | |||
| :::; ei bod yn gadael am ddyn arall -> *his be leaving for other man | |||
| ::where the omission of the pr.subj means that the 1.3.26 rule is not being applied - we need to allow for this.  In informal Welsh, the usual thing is to omit the pr.poss; in formal Welsh the usual thing is to omit the pr.subj.  That may not be easy to handle. | |||
| ::I'm not going to deal with pr.obj in this construction until I have a few more basic constructions flagged - perhaps at the end of the week. - [[User:Donnek|Donnek]] | |||
| }} | |||
| {{comment| | |||
| ::::Aha, thanks. Regarding the pr.obj construction, no problem. - [[User:Francis Tyers|Francis Tyers]] | |||
| }} | |||
| ==(1.3.32) Subordinate "bod + pr" should not apply to inflected forms== | |||
| ; mae o wedi dod -> *that he has come - he has come | |||
| ; mae o am adael -> *that he will leave - he will leave | |||
| {{comment|::I've restricted the "bod" to infinitive, now it gives: | |||
| :::He is after come | |||
| :::He is for leave | |||
| ::This does not seem to be a regression (I checked the tests), just that we haven't had a rule for <code>verbcj{bod} SN{prnsubj} SP verbinf{SV}</code> yet. - [[User:Francis Tyers|Francis Tyers]] | |||
| }} | |||
| {{comment|:::That's right - this is what I would have expected.  We need to develop rules for periphrastic tenses.  - [[User:Donnek|Donnek]]}} | |||
| The rule in 1.3.26 above is being applied too broadly.  As stated there, it should only apply when we have "bod" (ie the infinitive) in the verb place, eg: | |||
| ; ei fod o wedi dod -> that he has come | |||
| ; ei fod o am adael -> that he will leave | |||
| ==(1.3.33) Elided pr.poss after a vowel== | |||
| ; neidiodd Cwchwlin ar ei draed a'i wyneb yn fwgwd o waed, ac â'i gleddyf torrodd bob un o'r pennau sgrechlyd aflafar oddiar eu hysgwyddau | |||
| ; -> *Jumped *Cwchwlin on his #feet<n><sg> and his face in mask of blood, and with'to *gleddyf broke each one of the heads *sgrechlyd *aflafar *oddi on their *hysgwyddau | |||
| ; Cwchwlin jumped to his feet, his face a mask of blood, and with his sword cut each one of the shrieking, harsh heads from their shoulders. (Cwchwlin p110.  Apologies to those of a nervous disposition!) | |||
| The key point here is "â'i" (we'll use a noun that is actually in the dix) | |||
| ; *â ei car -> *with his car | |||
| ; â ei gar -> with his car - with his car | |||
| ; â ei char -> *with his car - with her car | |||
| ; *â'i car -> With his car | |||
| ; â'i gar -> With his car - with his car | |||
| ; â'i char -> *With his car - with her car | |||
| Elided versions of the pr.poss are not being picked up, and no attention is being paid to the mutation of the following noun even with the non-elided pr.poss. | |||
| I'm not sure whether the elided (sometimes called infixed) pr.poss, which occur after vowels, should be entered separately in the dix, or handled with rules. | |||
| {{comment|At the moment I'm handling them in the dictionary, making ad-hoc additions to the "postblank" section (right at the bottom of the file) as I go along. - [[User:Francis Tyers|Francis Tyers]]}} | |||
| ; 1p.sing - 'm | |||
| fy mrawd a'm chwaer (my brother and sister) - see below | |||
| ; 2p.sing - 'th + SM | |||
| gyda'th dad (with your dad) - see below | |||
| ; 3p.sing.m - 'w + SM after "i", 'i + SM elsewhere | |||
| i'w dŷ (to his house), ei fam a'i dad (his mother and father), gyda'i arian (with his money) | |||
| ; 3p.sing.f - 'w + {AM, h before a vowel} after "i", 'i + {AM, h before a vowel} elsewhere | |||
| i'w thŷ (to her house), ei mam a'i thad (her mother and father), gyda'i harian (with her money) | |||
| ; 1p.pl - 'n + h before a vowel | |||
| o'n gwlad (from our country) | |||
| ; 2p.pl - 'ch | |||
| cario'ch pethau (to carry your things) | |||
| ; 3p.pl -  'w after "i", 'u + h before a vowel elsewhere | |||
| i'w tŷ (to their house), eu mam a'u tad (their mother and father), gyda'u harian (with their money) | |||
| Note that this will deal with "ar eu hysgwyddau -> on their *hysgwyddau" in the sentence above.  This works fine at present if we drop the h-mutation: "ar eu ysgwyddau -> on their shoulders". | |||
| Although these elided forms can be used after verbs etc, they are most likely after prepositions.  All except 1p.sing and 2p.sing are used after any vowel, but these two ('m, 'th) can only be used after: | |||
| * a (and) | |||
| * â (with) | |||
| * gyda (with) | |||
| * efo (with) | |||
| * tua (towards) | |||
| * na (than, nor) | |||
| * i (to) | |||
| * o (from) | |||
| There are also forms that are used when the pronoun is a direct object.  In most cases, these are the same as the above, but in 3p we have 's instead of 'w.  There are also slight mutation changes.  These forms are typical of old-fashioned written Welsh, so perhaps we could ignore them for now. | |||
| As regards rules: | |||
|  For Welsh pattern "vowel_!i + 'i + {SM_noun, SM_infin}" | |||
|  output "his {noun, infin}" | |||
|  For Welsh pattern "vowel_i + 'w + {SM_noun, SM_infin}" | |||
|  output "his {noun, infin}" | |||
|  For Welsh pattern "vowel_!i + 'i + {AMH_noun, AMH_infin}" | |||
|  output "her {noun, infin}" | |||
|  For Welsh pattern "vowel_i + 'w + {AMH_noun, AMH_infin}" | |||
|  output "her {noun, infin}" | |||
|  For Welsh pattern "vowel_!i + 'u + {AMH_noun, AMH_infin}" | |||
|  output "their {noun, infin}" | |||
|  For Welsh pattern "vowel_i + 'w + {AMH_noun, AMH_infin}" | |||
|  output "their {noun, infin}" | |||
|  For Welsh pattern "vowel + 'ch + {noun, infin}" | |||
|  output "your {noun, infin}" | |||
|  For Welsh pattern "vowel + 'n + {noun, infin}" | |||
|  output "our {noun, infin}" | |||
| This will clash with cases where "yn" becomes "'n" - you can implement it, or leave it for now. | |||
|  For Welsh pattern "{a, â, gyda, efo, tua, na, i, o} + 'm + {noun, infin}" | |||
|  output "my {noun, infin}" | |||
|  For Welsh pattern "{a, â, gyda, efo, tua, na, i, o} + 'th + {SM_noun, SM_infin}" | |||
|  output "your {noun, infin}" | |||
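The rules above can be read as one lookup on three things: the host word, the elided form, and the mutation seen on the following word. A rough sketch in Python, not Apertium code; `elided_possessive`, the labels "SM"/"AMH"/"none" and the reading of "vowel_i" as "the host is the preposition i" are all assumptions made for illustration. It returns a list because, as written, the rules for 'w + AMH give both "her" and "their".

```python
# Hosts that may carry 'm and 'th, as listed in this section.
HOSTS_1_2_SG = {"a", "â", "gyda", "efo", "tua", "na", "i", "o"}
VOWELS = set("aeiouwyâêîôûŵŷ")


def elided_possessive(host, clitic, mutation):
    """Return candidate English possessives for host + clitic + following word.

    mutation is "SM" (soft), "AMH" (aspirate or h-prefix) or "none".
    An empty list means no rule in this sketch applies.
    """
    host = host.lower()
    if not host or host[-1] not in VOWELS:
        return []  # elided forms only follow a vowel
    after_i = host == "i"
    if clitic == "'m" and host in HOSTS_1_2_SG:
        return ["my"]
    if clitic == "'th" and host in HOSTS_1_2_SG and mutation == "SM":
        return ["your"]
    if clitic == "'ch":
        return ["your"]
    if clitic == "'n":
        return ["our"]  # ignores the clash with elided "yn" noted in this section
    if clitic == "'u" and not after_i:
        return ["their"]
    if clitic == "'i" and not after_i:
        return {"SM": ["his"], "AMH": ["her"]}.get(mutation, [])
    if clitic == "'w" and after_i:
        table = {"SM": ["his"], "AMH": ["her", "their"], "none": ["their"]}
        return table.get(mutation, [])
    return []
```

For instance, `elided_possessive("â", "'i", "SM")` covers "â'i gar" and `elided_possessive("â", "'i", "AMH")` covers "â'i char"; deciding the mutation class of the following word is left to the analyser.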
| Other issues:  | |||
| "oddi ar" (from on - the opposite of "ar", using the ar paradigm) is not in the dix, although "oddi wrth" (from - the opposite of "at", using the wrth paradigm) is. | |||
| I'm not sure why "ar ei draed" (onto his feet) stumbles (if you'll pardon the pun). | |||
| {{comment|We didn't have "feet" in the English dictionary, only foot sg/pl. I made an entry in the bidix that maps 'traed' → foot(pl), and now it seems to work. - [[User:Francis Tyers|Francis Tyers]]}} | |||
| ==(1.3.35) Preverbal particles - interrogative== | |||
|  For Welsh pattern "a + SM_verb.inflected + subject" | |||
|  output English "auxiliary + subject + verb" | |||
| Again, I assume that Apertium can deal with the technicalities of producing the proper English verbform (he went - did he go?; they see - do they see?; etc). | |||
| Note that "a" is often omitted, especially in spoken Welsh, but the soft mutation remains: | |||
| ; a gyrhaeddodd y llythyr? -> *And the letter arrived? - Did the letter arrive? | |||
| ; gyrhaeddodd y llythyr? -> *The letter arrived? - Did the letter arrive? | |||
| Introducing subordinate clauses, "a" means "if, whether": | |||
|  For Welsh pattern "!noun + a + SM_verb.inflected + subject" | |||
|  output English "!noun + if + subject + verb" | |||
| ; ewch i ofyn a fyddai hi'n dod - *Go I ask and she would be coming - go and ask if she will be coming | |||
| Note that we still have the frustrating issue of the tagger choosing the wrong "i" (see: http://wiki.apertium.org/wiki/Welsh_to_English#.22i.22_as_preposition). | |||
| {{comment|::Ok, after a morning of furious hacking, I've fixed the problem in the constraint grammar parser I was using with Apertium. Now this should be selected properly. You can see the file [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-cy-en/apertium-cy-en.cy-en.rlx here]... you might find it interesting, not least because it doesn't require messing about with any XML! :) If you fancy writing rules in that formalism (to improve tagger performance), I can include them directly. I've sent you an email with an example CG for faroese that you might find interesting. - [[User:Francis Tyers|Francis Tyers]]}} | |||
| {{comment|:::That's good news.  I'll do a bit of reading on CG, which looks promising, since it is largely bracketless! - [[User:Donnek|Donnek]]}} | |||
| "oni" (with mixed mutation) is used when a positive answer is expected: | |||
|  For Welsh pattern "oni + MM_verb.inflected + subject" | |||
|  output English "auxiliary + not + subject + verb" | |||
| ; oni phrynodd yr athro'r papur? -> **oni The teacher bought the paper? - didn't the teacher buy the paper? | |||
| "oni" becomes "onid" before a vowel, but not before a vowel "exposed" by the soft mutation of a "g": | |||
| ; onid aeth y bws heibio'r ysgol? -> **onid The bus went #past<pr>the school? - didn't the bus go past the school? | |||
| ; oni welsoch chi? -> **oni Dog saw? - didn't you see? | |||
| LOL!  Subject pronouns really need to be slapped into shape!  That is a very existentialist translation, especially if you add "b" at the beginning. | |||
| {{comment|This seems to be largely done. - [[User:Francis Tyers|Francis Tyers]]}} | |||
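The English side of the interrogative rules in this section ("auxiliary + subject + verb", and "auxiliary + not + subject + verb" for "oni") is do-support. A minimal sketch in Python, assuming a hypothetical helper `english_question` that is handed the bare English infinitive; it does not cover "be" or modal verbs, and it is not how Apertium's transfer rules are written:

```python
def english_question(subject, verb_inf, tense, third_sg=False, negative=False):
    """Build an English yes/no question with do-support.

    tense is "past" or "pres"; verb_inf is the bare English infinitive
    (optionally followed by its object).
    """
    if tense == "past":
        aux = "did"
    elif third_sg:
        aux = "does"
    else:
        aux = "do"
    if negative:
        aux += "n't"  # "oni"/"onid" questions expect a positive answer
    return f"{aux} {subject} {verb_inf}?"
```

So "a gyrhaeddodd y llythyr?" corresponds to `english_question("the letter", "arrive", "past")`, and "oni phrynodd yr athro'r papur?" to the same call with `negative=True` and the appropriate subject and verb.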
| ==(1.3.36) Preverbal particles - affirmative== | |||
|  For Welsh pattern "{fe, mi} + SM_verb.inflected + subject" | |||
|  output English "subject + verb" | |||
| "fe " is mainly southern, and "mi" is mainly northern.  "fe" is the form usually used before impersonals (fe werthwyd y fferm -> *He the farm sold - the farm was sold). | |||
| There is another affirmative particle "y[r]" used before "bod": | |||
| ; yr ydych yn mynd -> *The are going - you are going | |||
| but that is only seen now in older written Welsh.  The elided forms (eg "rydych") are already included in the "bod" paradigm, but to handle older Welsh, where "yr" is not elided, you could also have a rule: | |||
|  For Welsh pattern "y[r] + VBSER.inflected + subject" | |||
|  output English "subject + VBSER" | |||
| Incidentally, we get some oddities for variants of the fragment above, all of which mean "you are going": | |||
| ; yr ydych yn mynd -> *The are going | |||
| There is no attempt to mark the verb as 2p.pl. | |||
| ; yr ydych chi yn mynd -> *The dog are going | |||
| That dog is back again!  "chi" meaning "dog" would never occur in this location, because there is nothing that could give it AM. | |||
| {{comment| | |||
| ::I think it's [http://www.sanfalibero.it/imm/Muttley2.jpg this one]...  | |||
| :::I made a rule: REMOVE N IF (-1 V) (0 N) (0 PrnSubj) (1 Pr); | |||
| ::It fixes that problem, but maybe it will cause others... | |||
| ::- [[User:Francis Tyers|Francis Tyers]]}} | |||
| {{comment|:::LOL - catch the pigeon!  If I interpret that rule correctly, it says "if a cohort contains a noun and a pr.subj, with a verb on the immediate left and a preposition on the immediate right, delete the noun"?  (I see you've added quite a few rules since yesterday.)  I think this should work OK, actually, but what I'm slightly nervous about is that we seem to be ignoring relevant data in the stream itself, namely (a) a 2p.pl verb is likely to be followed by a 2p.pl pr, and (b) an AM noun would only occur in an AM context, of which this is not one.  Capturing (a) especially seems important - I know the constraint grammar (ch 11) did mention weighting choices by probability.  I need to be more familiar with the CG syntax, and the Faroese file takes a lot of concentration!  Talking of which, could you maybe do a comment below each of your Welsh rules as you do them, giving a Welsh example of what the rule acts on?  This would make it easier to review them later. - [[User:Donnek|Donnek]]}} | |||
| {{comment|::::Yep, at the moment I'm just writing them as hacks. Hoping that as I add rules they'll become more clear and I'll be able to unify some. The probability stuff is for accepting prob info from an HMM tagger before (not generating it itself)... And, I'll go through and document the rules later this evening. Also, I can pass you a CG for Norwegian which uses more basic rules (but more). - [[User:Francis Tyers|Francis Tyers]]}} | |||
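For anyone following the reading of <code>REMOVE N IF (-1 V) (0 N) (0 PrnSubj) (1 Pr);</code> given above, here is a small Python simulation of that interpretation. It is only a sketch: cohorts are modelled as lists of tag sets, `remove_noun_reading` is a made-up name, and the behaviour of keeping at least one reading is an assumption about the default behaviour of the CG engine.

```python
def remove_noun_reading(cohorts, i):
    """Simulate REMOVE N IF (-1 V) (0 N) (0 PrnSubj) (1 Pr) at position i.

    cohorts is a list of cohorts; each cohort is a list of readings and
    each reading is a set of tag strings. Returns the readings kept for
    cohort i.
    """
    def has(pos, tag):
        # True if some reading of the cohort at pos carries the tag.
        return 0 <= pos < len(cohorts) and any(tag in r for r in cohorts[pos])

    target = cohorts[i]
    if has(i - 1, "V") and has(i, "N") and has(i, "PrnSubj") and has(i + 1, "Pr"):
        kept = [r for r in target if "N" not in r]
        # Assumed: a cohort is never left without any reading.
        return kept if kept else target
    return target
```

With "ydych chi yn" modelled as a verb cohort, a cohort holding both a noun reading and a subject-pronoun reading, and a preposition cohort, only the pronoun reading of "chi" is left.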
| ; yr ydych chi'n mynd -> *The you are going | |||
| This is the closest translation. | |||
| ; rydych chi yn mynd -> *Dog rust going | |||
| I think that's the name of a 70s album by Genesis .... | |||
| {{comment|::LOL!! :DD - [[User:Francis Tyers|Francis Tyers]]}}  | |||
| ; rydych chi'n mynd -> *You rust going | |||
| I may do, but that is none of Apertium's business .... | |||
| The above two are because cg-proc thinks "rydych" is a form of "rhydu (to rust)" instead of "bod". | |||
| {{comment|::This isn't cg-proc, it's the apertium-tagger. Is there a rule I can try for disambiguating vblex.pres from vblex.prs? E.g. could the construction vblex.prs + prpers + yn + vblex.inf ever occur? - [[User:Francis Tyers|Francis Tyers]]}} | |||
| {{comment|:::Although it could in theory,  in practice it would be extremely unlikely.  The subjunctive is more or less moribund now, except in stock phrases such as "doed a ddelo" (come what may), and in older written (and perhaps spoken) usage.  The most common remnant is, unsurprisingly, for "bod", in the person of "bo" (3p.sing) (although other persons appear occasionally), and even then it could be argued that these are stock phrases, eg "pan fo angen" (as needed).  I think that in the meantime, therefore, your rule would be OK.  I also wonder about going farther, and automatically deleting vblex.prs from a cohort?  This would not do in the longer run, but would cause few problems for 0.1.  Phrases containing "bo" could be added to the multiword phrase section. - [[User:Donnek|Donnek]]}} | |||
| The conclusion seems to be (apart from the "rhydu" problem) that the absence or non-elision of pr.subj can cause problems. | |||
| {{comment|This section now seems to be dealt with. - [[User:Francis Tyers|Francis Tyers]]}} | |||
| ==(1.3.37) infin + object pronoun== | |||
| ; canodd cloch y drws amser cinio ac aeth dy fam i'w ateb a chlywaist ti llais Paul yn dweud nad oeddet ti ddim yn yr ysgol | |||
| ; -> *The bell of the door sang dinner #time<n><sg> and your mother went to its answer and you heard voice *Paul saying *nad you were not in the school | |||
| ; The doorbell rang at dinner-time and your mother went to answer it and you heard Paul's voice saying that you were not in school. | |||
| ("Pan Oeddwn Fachgen", Mihangel Morgan, p71) | |||
| ; yn ei adael am ddyn arall -> *In its leave for an other man - leaving him for another man | |||
| (from 1.3.27 above) | |||
| "i'w ateb" is literally "its answering", "yn ei adael" is literally "in (the state of) his leaving".  The Welsh (indeed, Celtic) construction is to use a possessive pronoun with an infinitive instead of an object pronoun.  This is often the "sandwich" possessive, where the pr.subj follows the infinitive - " i'w ateb fo, yn ei adael fo", but the pr.subj is frequently omitted, as the above examples indicate.  In some cases, speakers omit the possessive and keep the pr.subj, so we get something similar to English - "i ateb fo, yn gadael fo" - but we probably don't need to cater for that at the moment.   | |||
|  For Welsh pattern "pr.poss + verb.infin + [pr.subj]" | |||
|  output English "verb + pr.obj" | |||
| {{comment|::Done, | |||
| :::The bell of the door sang dinner time and your mother went to answer it and you heard the voice of Paul saying *nad you were not in the school | |||
| ::"bell of the door" and "voice of Paul" are a bit weird but intelligible, the loss of the preposition "at" is more problematic for understanding. - [[User:Francis Tyers|Francis Tyers]]}} | |||
|  For Welsh pattern "yn + pr.poss + verb.infin + [pr.subj]" | |||
|  output English "verb + -ing + pr.obj" | |||
| {{comment|::Done, although there is a problem with ei → him/her/it -- is there a way to ascertain the gender using mutation? (like in the â'i gar example). - [[User:Francis Tyers|Francis Tyers]]}} | |||
|  For English pattern "verb.inf + pr.obj" | |||
|  output Welsh "pr.poss + verb.infin + [pr.subj]" | |||
|  For English pattern "verb + -ing + pr.obj" | |||
|  output Welsh "yn + pr.poss + verb.infin + [pr.subj]" | |||
| Possible CG rule to deal with mistagging of "fo" in "i'w ateb fo", etc: | |||
| ; REMOVE V IF (0 V) (0 PrnSubj) (-1 VBlexInf) | |||
| I don't think CG applies to the above, because they involve resequencing instead of tagger choices - let me know if I'm wrong. | |||
| "llais Paul - voice *Paul" works fine if we use a known proper noun: "llais Elin - the voice of Elin". | |||
| ==(1.3.38) Fragility of "bod + noun" (1.3.15)== | |||
| ; Un bore yn yr asembli dywedodd y prifathro bod ymwelwyr tramor wedi dod i'r ysgol. -> one Morning in the *asembli the headmaster said that foreign visitors have come to the school. | |||
| This sentence is minimally adjusted from MM p43, and translates pretty well (though I find the odd capitalisation placement distracting - it would be better to have none). | |||
| However, it breaks fairly easily.  The original sentence had "yn" instead of "wedi", and for this we get: | |||
| ; *one Morning in the *asembli the headmaster said be foreign visitors coming to the school. | |||
| The 1.3.15 rule needs to be expanded along the same lines as 1.3.26: | |||
|  For Welsh pattern "VBSER.inf + [det.def] + noun + [qualifiers] + yn + verb" | |||
|  output English "that + [det.def] + noun + [qualifiers] + is/are + verb + ing" | |||
| The original sentence also had "ymwelwyr o dramor", and this gives: | |||
| ; *one Morning in the *asembli the headmaster said be visitors he foreign after come to the school. | |||
| Possible CG rules: | |||
| ; LIST P3MSg = (p3 m sg) | |||
| ; SET PrnSubjP3MSg = PrnSubj + P3MSg | |||
| ; REMOVE PrnSubjP3MSg IF (0 Pr) (0 PrnSubjP) (-1 N | VblexInf) (1 N) | |||
| I'm not sure how to express "or" in the specs, so I'm using the pipe - please adjust as necessary!  "tramor" needs to be added to the dix as a noun - "overseas" - m. | |||
| {{comment| | |||
| ::I've added this rule, the keyword for 'or' is 'OR'.  | |||
| <pre> | |||
| # ymwelwyr o tramor  | |||
| REMOVE PrnSubj IF (0 Pr)  | |||
|                   (0 PrnSubjP3Sg)  | |||
|                   ((-1 NC) OR (-1 NP) OR (-1 VblexInf)) | |||
|                   ((1 NC) OR (1 NP) OR (1 Adj)); | |||
| </pre> | |||
| ::I also added tramor to the dictionary, and this gives: | |||
| :::one Morning in the *asembli the headmaster said be visitors of overseas coming to the school. | |||
| ::I think the problem here is that "SN SP SN" is not detected as SN, and instead processed as three. I'll look into this. Btw, is there a plural for 'asembli' (I had a look around on the internet but couldn't find it.) - [[User:Francis Tyers|Francis Tyers]] | |||
| }} | |||
| {{comment|::Now we get:  | |||
| :::one Morning in the *asembli the headmaster said that foreign visitors are coming to the school. | |||
| ::- [[User:Francis Tyers|Francis Tyers]] | |||
| }} | |||
| ==(1.3.39) Periphrastic tenses with "bod"== | |||
| Re comments in 1.3.32 above: | |||
|  For Welsh pattern "VBSER.pres + subject + [qualifiers] + yn + verb.infin" | |||
|  output English "subject  + [qualifiers] + is/are + verb + -ing" | |||
| ; mae'r bachgen yn mynd -> The boy is going | |||
| Fine - the equivalent of 1.3.8 above, and it seems to be catered for already. | |||
|  For Welsh pattern "VBSER.pii + subject + [qualifiers] + yn + verb.infin" | |||
|  output English "subject  + [qualifiers] + was/were + verb + -ing" | |||
| ; roedd y bachgen yn mynd -> The boy was going | |||
| Fine again. | |||
|  For Welsh pattern "VBSER.pres + subject + [qualifiers] + wedi + verb.infin" | |||
|  output English "subject  + [qualifiers] + has/have + verb.pp" | |||
| ; mae'r bachgen wedi mynd -> *The boy is after go - the boy has gone | |||
| {{comment|::Done. - [[User:Francis Tyers|Francis Tyers]]}}  | |||
|  For Welsh pattern "VBSER.pii + subject + [qualifiers] + wedi + verb.infin" | |||
|  output English "subject  + [qualifiers] + had + verb.pp" | |||
| ; roedd y bachgen wedi mynd -> *The boy was after go - the boy had gone | |||
| {{comment|::Done. - [[User:Francis Tyers|Francis Tyers]]}}  | |||
|  For Welsh pattern "VBSER.pres + subject + [qualifiers] + am + SM_verb.infin" | |||
|  output English "subject  + [qualifiers] + is going to + verb" | |||
| ; mae'r bachgen am fynd -> *The boy is for go - the boy is going to go | |||
| {{comment|::Done. - [[User:Francis Tyers|Francis Tyers]]}} | |||
|  For Welsh pattern "VBSER.pii + subject + [qualifiers] + am + SM_verb.infin" | |||
|  output English "subject  + [qualifiers] + was going to + verb" | |||
| ; roedd y bachgen am fynd -> *The boy was for go - the boy was going to go | |||
| {{comment|::Done. - [[User:Francis Tyers|Francis Tyers]]}} | |||
| There are other forms, but these will do for the moment. | |||
| Note that the last two of these forms are ubiquitous in Irish English: "he's after buying another one", "he was for going yesterday". | |||
| {{comment|::These now all seem to be done. - [[User:Francis Tyers|Francis Tyers]]}} | |||
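The six patterns in this section reduce to a table keyed on the tense of "bod" and the particle that follows the subject. A sketch in Python under that reading; `PERIPHRASTIC`, `periphrastic` and the "inf"/"ing"/"pp" keys are names invented here, and the English verb forms are supplied by the caller rather than generated:

```python
# (tense of "bod", particle) -> (English auxiliary, form of the main verb)
PERIPHRASTIC = {
    ("pres", "yn"): ("is/are", "ing"),
    ("pii", "yn"): ("was/were", "ing"),
    ("pres", "wedi"): ("has/have", "pp"),
    ("pii", "wedi"): ("had", "pp"),
    ("pres", "am"): ("is/are going to", "inf"),
    ("pii", "am"): ("was/were going to", "inf"),
}


def periphrastic(bod_tense, particle, subject, verb, plural=False):
    """Return the English clause; verb is a dict with keys "inf", "ing", "pp"."""
    aux, form = PERIPHRASTIC[(bod_tense, particle)]
    words = aux.split(" ")
    # Pick the singular or plural alternative where the auxiliary has two.
    if "/" in words[0]:
        singular, plural_form = words[0].split("/")
        words[0] = plural_form if plural else singular
    return " ".join([subject] + words + [verb[form]])
```

For example, with `go = {"inf": "go", "ing": "going", "pp": "gone"}`, the call `periphrastic("pres", "wedi", "the boy", go)` stands for "mae'r bachgen wedi mynd".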
Latest revision as of 14:35, 29 July 2008
Section numbers from this version.
(1.3.2) "was"
"roedd" ([he/she/it] was) is unknown, but I seem to remember adding entries for "to be" to the dixes in the mists of time. Was I dreaming? (roedd <- yr + oedd)
- There are entries for 'bod', but 'roedd' doesn't get processed as all of the 'bod' entries start with 'b' (see this link). I will need to fix this in the analyser. If I understand you correctly, 'roedd' is a contraction of 'yr' (determiner ...) + 'oedd' (verb 'bod', past tense ...)? Francis Tyers
 
- Some serious errors have crept into those entries. I've sent an amended version to you by email. You're right - roedd -> yr + oedd, but in the amended version I've sent, I've put (e.g.) "roedd" and "oedd" as alternate forms, because "Roedd" is the spoken form, and even in written Welsh you hardly ever see "Yr oedd" nowadays. Donnek
 
- The "bod" paradigm should now be all ok, there remains however to choose the restrictions (e.g. which forms we will generate for each set of tags). - Francis Tyers
 
 
- the boy was in the garden -> *y bachgen bu yn yr ardd - bu'r bachgen yn yr ardd
Almost correct, except for word-order, and the fact that the preterite is being used instead of the imperfect ("roedd y bachgen yn yr ardd"). The preterite needs to be marked as only being used in written Welsh, and to have a lower likelihood than the imperfect. This is too rough a rule, but would do for the time being.
(1.3.4) Preferential choice between noun and verbform
- atebodd hi'r cwestiwn -> *answered shethe #hold an inquiry - she answered the question
proc selects 'cwestiwn' (question) - correct - and 1p pl imperative of 'cwestio' (an infrequent verb for 'hold an inquiry'). The 1p pl present would also have been a possibility, and indeed a more likely one. tagger selects the second of these.
Not sure how widespread this would be, but the tagger should give precedence to the noun choice whenever the verb form is preceded by 'y':
For Welsh pattern "{y,yr,'r} + word_tagged_as_either_noun_or_verb"
output "{y,yr,'r} + noun"
This is not perfect, because "y | yr" can also be an indirect relative clause pronoun before a verb, but it would catch most things until we can resolve the latter point.
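A rough sketch of the rule above in Python, assuming each token arrives as a surface form plus a set of candidate tags. The token representation and the names `ARTICLES` and `prefer_noun_after_article` are invented for illustration and are not part of Apertium; the relative-pronoun use of "y | yr" mentioned above is deliberately not handled.

```python
ARTICLES = {"y", "yr", "'r"}  # forms of the definite article named in the rule

def prefer_noun_after_article(tokens):
    """tokens: list of (surface, set_of_tags).
    Returns a new list in which a noun/verb-ambiguous token that directly
    follows y/yr/'r keeps only its noun reading."""
    out = []
    for i, (surface, tags) in enumerate(tokens):
        prev = tokens[i - 1][0].lower() if i > 0 else None
        if prev in ARTICLES and {"n", "vblex"} <= tags:
            tags = {"n"}  # discard the verb reading
        out.append((surface, set(tags)))
    return out

sentence = [("gwelodd", {"vblex"}), ("y", {"det"}), ("dyn", {"n"}),
            ("y", {"det"}), ("llyfr", {"n", "vblex"})]
print(prefer_noun_after_article(sentence))
```

With the sample sentence, only "llyfr" changes: its verb reading is dropped because the preceding token is "y".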
- gwelodd y dyn y llyfr -> *the man saw the books - the man saw the book
This is similar, but is tricksy because it is superficially correct apart from the plural. But in fact, tagger is reading "llyfr" as pres 3p sing of "llyfru" (to book). Apart from being infrequent, and therefore much less likely to appear ("bwcio" would be the usual word), Eurfa has "llyfra" as the pres 3p sing, so there may be a paradigm problem too. The above rule would throw out the verb in the meantime.
- It is currently using the aberth/u__vblex paradigm (see output here). Is this incorrect? - Francis Tyers
 
- The problem is that "aberthu", apart from the 'regular' "abertha" also has a written "aberth". So yes, it probably is incorrect. The problem is that a lot of less common verbs are very rarely inflected. It might have been better to use something like "gwenu" or "siomi". In the meantime, perhaps just changing "aberth" to "abertha" in the pres 3p sing will do. - Donnek
 
 
(1.3.6) Number agreement of verb
- I added 'rabbits' to the dictionary, but the problem of unknown words and phrase movement is one we're experiencing in Basque too... - Francis Tyers
- OK - so it's basically an issue that you can't do much about until the word is logged. Hmm. I suppose that makes sense, since Apertium can't figure out what to do with something until it knows what it should do with it ... In a practical sense, this is going to be problematic if we demo Apertium using unseen text. Is there any way of doing some blind choosing, eg
 
- if this word is
  - preceded by [y,yr,'r]
    - we will assume it's a noun
  - preceded by yn
    - we will assume it's a verb
    - unless a verb has been identified in the current phrase
      - in which case we'll assume it's an adjective
 
 
- This might break Apertium - I don't know. In theory, though, we might be able to get relative probabilities for particular sequences from a corpus. - Donnek
 
I'd be reluctant to add one, as we'd still not be able to get the translation; on the other hand, it wouldn't mess up the word order. It's an open problem, and we're thinking about it :) - Francis Tyers
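For reference, the "blind choosing" idea above could be sketched in Python roughly as follows. This is only an illustration of the proposed heuristic, not something Apertium does; the function name and the tag labels "n", "vblex" and "adj" are assumptions borrowed loosely from the tags used elsewhere on this page.

```python
def guess_unknown_tag(prev_word, verb_seen_in_phrase):
    """Guess a tag for an unknown word from the word before it.
    Returns "n", "vblex", "adj", or None when the heuristic has no opinion."""
    prev = prev_word.lower()
    if prev in {"y", "yr", "'r"}:
        return "n"  # after the article, assume a noun
    if prev == "yn":
        # after "yn": a verb, unless the phrase already has one
        return "adj" if verb_seen_in_phrase else "vblex"
    return None

print(guess_unknown_tag("y", False))   # unknown word after the article
print(guess_unknown_tag("yn", True))   # unknown word after "yn", verb already seen
```

As noted above, a guess like this would only help with word order; the unknown word itself would still be untranslated.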
(1.3.7) Prepositional noun phrase should not be a subject
- cerddodd fo i'r dref -> he walked in the town
Fine, except that the preposition "i" should really be glossed as "to" ("yn y dref" would be "in the town")
Contrast:
- cerddodd i'r dref -> *the town walked in - [he/she] walked to the town
Welsh pattern "prep + det.def + noun" is never a subject phrase
and therefore the "det.def + noun" section shouldn't be shifted. (I can't think of any exceptions to this, but there may be one.)
- There was a rule to do this, I've commented it out, I think there was a reason for it, but I can't recall now. I've run the regression tests below and it doesn't seem to have broken anything. Regarding the preposition, should I change "i" to be "to" instead of "in" ? - Francis Tyers
 
- Re "i", yes, change it to "to". - Donnek
 
 
- The problem here was that the dictionary only had i'r → yn+yr. I've added i'r → i+yr and now it is picking the right one, although I don't know what will happen in other contexts... - Francis Tyers
 
 
 
- Not sure where that would have come from. The only vaguely relevant thing I can think of is "i mewn i" (into). - Donnek
 
 
 
 
- allan i'r cyfarfod -> *the meeting #exit<vblex><pres><p3> in - out to the meeting
This is similar - "in" should be "to", and should be kept with "the meeting".
However, there is another issue here, which is in effect the same as "Preferential choice between noun and verbform" above. In this case, the verb "allanu" (to exit) is being chosen instead of the much more likely "allan" (out).
- roedd o ar dy lyfr -> *was of on your books - it was on your book
1.3.9 would deal with "of", and 1.3.6 would deal with "books". Subject shift would then produce a reasonable translation.
However:
- roedd ar dy lyfr -> *your #be<vbser><past><p3> on books - (it) was on your book
Omitting the subject pronoun can happen quite frequently in speech if the subject has already been mentioned. The <sg> tag gets lost at interchunk, which means the verb can't be conjugated (this came up somewhere else, but I think it's been taken off the page - maybe it would be better just to mark the issue heading as "addressed" rather than delete it). But there is an additional issue, in that the possessive pronoun is getting treated as the subject and moved separately. So maybe we need a broader rule to say that "prep + det.def/pr.poss/whatever + noun" is an indivisible chunk, and must be dealt with as a block. No part of it would be moved in this case anyway.
- Regarding page cleanup, ok. perhaps having a separate section, and then moving sections down would be a good idea. - Francis Tyers
 
It would also be nice in the longer term to fill in the pronoun if it is omitted.
For Welsh pattern "verb + non-subject noun phrase" output English "verb + pronoun agreeing in number and person + non-subject noun phrase"
The NSNP could be a prepositional phrase (marked by an initial preposition), or an object phrase (marked with initial soft mutation).
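The "indivisible chunk" suggestion made earlier in this section could look something like the Python sketch below. It assumes a flat list of (surface, tag) pairs with made-up tag names ("pr", "det", "poss", "n"); the real transfer stages work on chunks in the Apertium stream format, so this only illustrates the grouping idea.

```python
def chunk_prepositional_phrases(tagged):
    """tagged: list of (surface, tag).
    Groups prep + (det | poss) + noun into a single ("...", "PP") item so
    that a later reordering step cannot move part of it on its own."""
    out, i = [], 0
    while i < len(tagged):
        window = tagged[i:i + 3]
        if (len(window) == 3
                and window[0][1] == "pr"
                and window[1][1] in {"det", "poss"}
                and window[2][1] == "n"):
            out.append((" ".join(word for word, _ in window), "PP"))
            i += 3
        else:
            out.append(tagged[i])
            i += 1
    return out

print(chunk_prepositional_phrases(
    [("roedd", "vbser"), ("ar", "pr"), ("dy", "poss"), ("lyfr", "n")]))
```

Once "ar dy lyfr" is a single item, the possessive can no longer be mistaken for a subject and shifted separately.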
(1.3.8) "-ing" as "yn + verb"
For English pattern "subject + verb<vbser> + verb + ing" output for Welsh "verb<vbser> + subject + yn + verb"
(1.3.9) Inflected verbs not being parsed
- aeth -> *aeth - (he/she/it) went
However, "aeth" is listed in cy.dix.xml (line 27491) as past 3p sing in the mynd_vblex paradigm, which is what "mynd" (to go) gets conjugated against (line 54444).
Ah - a bug in the segmentation.
- *myndaeth fo -> he went
- he went -> *myndaeth fe
The infinitive is getting added to the irregular forms, instead of being replaced by them.
- Yep, this is a problem in the paradigm for 'mynd', I'll need to rewrite it, fortunately it is only used once... New paradigm output here - Francis Tyers
 
- Fine, but the imperative forms also need "mynd" excised. - Donnek
 
 
- Done. - Francis Tyers
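To make the segmentation bug concrete: a dictionary entry is expanded by concatenating its invariant part with each ending supplied by the paradigm, so if the entry keeps "mynd" as the invariant part, every irregular form gets "mynd" stuck on the front. The Python below is a simplified model of that concatenation (the function `expand` is hypothetical, not lttoolbox code).

```python
def expand(stem, endings):
    """Concatenate an entry's invariant stem with each paradigm ending."""
    return [stem + ending for ending in endings]

# Stem wrongly kept as "mynd": the past form comes out as "myndaeth".
broken = expand("mynd", ["", "aeth"])

# Empty stem, with the paradigm supplying each whole form.
fixed = expand("", ["mynd", "aeth"])

print(broken)
print(fixed)
```

The same reasoning applies to the imperative forms mentioned above: any form listed in the paradigm has to be a complete word if the invariant part is empty.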
 
 
 
(1.3.14) Verb + preposition
Re "coolness factor" below (woop woop!), we need to cater for verbs such as "ymchwilio" which are followed by a preposition that is different from English, or where there is no preposition in English.
For example:
- ymchwilio i - research into, investigate
- siarad am - talk about
- dweud wrth - say to, tell
- gofyn am - ask for
Is there any way to get the verb+prep phrase parsed as a phrase, rather than separately? Perhaps an entry in one of the dictionaries? This would only need to be done for those phrases where the preposition differs in English and Welsh.
Not, for instance for:
- neidio dros - jump over
- cerdded i - walk to
- delio gyda - deal with
where there is a regular correlation between the meanings of the Welsh and English prepositions.
- Yes, these are multiword constructions, like for example "He became accustomed to the taste." → "cynefinodd Fe i y blas." (try it in the testing interface). Is there a way of getting a list of these? (Actually there are many I currently need to fix in the bidix/English dict, but if you have a list I can look at them.) At the moment we only seem to have multiword verbs on the English side. - Francis Tyers
 
- I will try to compile a list of the most common, and send it to you tomorrow. - Donnek
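As an illustration of what such a list would be used for, here is a small Python sketch: verb + preposition pairs with an irregular English equivalent are looked up as a unit, and everything else falls back to word-by-word translation. The table `VERB_PREP` holds only the pairs quoted above, and the function name and signature are invented for the example.

```python
# Pairs where the Welsh preposition does not map regularly onto English.
VERB_PREP = {
    ("ymchwilio", "i"): "research into",
    ("siarad", "am"): "talk about",
    ("dweud", "wrth"): "tell",
    ("gofyn", "am"): "ask for",
}

def translate_verb_prep(verb, prep, verb_gloss, prep_gloss):
    """Use the multiword entry if one exists; otherwise go word by word."""
    return VERB_PREP.get((verb, prep), f"{verb_gloss} {prep_gloss}")

print(translate_verb_prep("gofyn", "am", "ask", "about"))   # listed pair
print(translate_verb_prep("neidio", "dros", "jump", "over"))  # regular pair
```

Regular pairs such as "neidio dros" never need an entry, which keeps the list short.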
 
 
(1.3.17) Infinitive after "yn lle"
For Welsh pattern "yn lle + verb<vblex><inf>" output English "instead of + verb + ing"
For English pattern "instead of + verb + ing" output Welsh "yn lle + verb<vblex><inf>"
- Would you say "yn lle" is a multi-word preposition? - Francis Tyers
 
- Yes, it is a compound preposition. "yn ei le" - instead of him (lit. in his place), "yn eu lle" - instead of them. But I don't want those included here, because there are places where you might want to translate them "in his place". - Donnek
 
 
- yn lle mynd dros y ffordd -> *instead of go *dros the road - instead of going over the road
Incidentally, why is "dros" coming up as unknown here? I remember sweating over putting it in the dictionary (cy.dix line 46972, cy-en.dix line 49) :-)
- It is in there, but under the paradigm /tros__pr, which means that it will never get detected. Is 'dros' a mutation of 'tros' or a separate preposition? The same oddness goes for trwy and drwy. - Francis Tyers
 
- Yes, they occur in both mutated and unmutated forms. I would think the mutated forms are more common. Hmm. I didn't realise the paradigms overwrote the cited form like that. In that case, we either need to do some [t,d] substitution, or (perhaps simpler) replicate the entire "tros" paradigm for "dros", replacing the t with d. Same for "trwy, drwy". - Donnek
 
 
- This seems to be done. - Francis Tyers
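The "[t,d] substitution" option mentioned above amounts to generating a d-initial copy of every t-initial form. A minimal Python sketch follows; the helper name is made up, and it covers only the t → d case discussed here, not soft mutation in general.

```python
def soft_mutate_t(forms):
    """Return the t -> d counterpart of each form that starts with 't'."""
    return ["d" + form[1:] for form in forms if form.startswith("t")]

print(soft_mutate_t(["tros", "trwy"]))
```

Replicating the whole paradigm by hand, as also suggested above, gives the same set of forms without needing any substitution step.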
 
 
 
(1.3.18) "i gyd"
This means "all", and occurs after the noun:
- roedd y cwningod i gyd yn ddiogel -> *the rabbits were I joint safe - all the rabbits were safe
For Welsh pattern "det.def + noun + [qualifiers] + i gyd" output English "all + det.def + [qualifiers] + noun"
- Is 'i gyd' considered an adjective or pronoun? (or something else?) :) - Francis Tyers
 
- An adjective, I suppose. Certainly a qualifier of some sort. - Donnek
 
 
- Making progress.  We now get:
- roedd y cwningod i gyd yn ddiogel -> *the all rabbits were safe
 
- Just need to massage that slightly. - Donnek
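The remaining "massage" is purely a question of where "all" lands relative to the determiner. A small Python sketch of the target ordering from the rule in this section, working on English glosses only (the function and its parameters are illustrative, not Apertium transfer code):

```python
def render_np(det, noun, qualifiers=(), i_gyd=False):
    """Build the English noun phrase; when "i gyd" was present in the Welsh,
    "all" is placed before the determiner rather than after it."""
    words = (["all"] if i_gyd else []) + [det, *qualifiers, noun]
    return " ".join(words)

print(render_np("the", "rabbits", i_gyd=True))
```

The current output "*the all rabbits" corresponds to inserting "all" after the determiner instead of in front of it.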
 
 
(1.3.22) Dictionary errors (refs to cy-en.dix and cy.dix)
"hefyd" (152) is correctly listed as "also", but is wrongly coming up as "then".
- Should be fixed - Francis Tyers
 
"da" (5318), is correctly listed as "good", but is coming up as unknown.
- Where is it coming up as unknown? - Francis Tyers
 
- In "mae'r bachgen yn licio'r eneth sy'n dda". But that is now coming up OK as well. - Donnek
 
 
    <pardef n="anghydwel/d__vblex">
    <e lm="anghydweld"><i>anghydwel</i><par n="anghydwel/d__vblex"/></e>
    <e lm="cyfweld"><par n="initial-c"/><i>yfwel</i><par n="anghydwel/d__vblex"/></e>
    <e lm="gweld"><par n="initial-g"/><i>wel</i><par n="anghydwel/d__vblex"/></e>
    <e lm="rhagweld"><par n="initial-rh"/><i>agwel</i><par n="anghydwel/d__vblex"/></e>
    <e lm="ymweld"><i>ymwel</i><par n="anghydwel/d__vblex"/></e>
This paradigm appears to be broken in that some of the <r> sides are of different lengths (they should all be 'd' if /d and 'eld' if /eld) - Francis Tyers
- I'm missing something, sorry. To me they are all segmented /d. - Donnek
 
- See for example the entries in the paradigm here. - Francis Tyers
 
 
(1.3.23) "sydd" / "sy"
This is a relative present form of "bod" - "who/which is/are". The elided form "sy" is more common in speech. "sydd" is not listed in the dictionary, but "sy" is.
- mae'r dyn yn adeiladu gwesty sy'n darparu llawer o ystafelloedd -> *the man is building hotel is provide many of rooms
- mae'r blaid yn gwneud rhywbeth sy'n cyfrannu at ennill yr etholiad -> *the party is doing something is contribute towards win the election - ... which contributes towards winning the election
- mae'r dafad yn pori yn y maes sy'n cynnig bwyd da -> *the sheep is grazing in the field is offer good food - ... which offers good food
(Note: there are some frustrating shortcomings in the output. If we use a variant of the last sentence:
- mae'r defaid yn pori yn y cae sy'n cynnig bwyd da -> *the sheep #be<vbser><pres><p3><pl> grazing in the closes is offer good food
it appears that the conversion can't handle the plural of "sheep", and tagger insists on choosing an inflected verb ("closes", from "cau") instead of the noun (cae - field) - 1.3.5 really needs to be implemented.)
For Welsh pattern "{sydd yn, sy'n} + verb_infin"
output English "that + verb.pres.3p.sing"
- This should also broadly be fixed. Can you check the output below:
 
- mae'r dyn yn adeiladu gwesty sy'n darparu llawer o ystafelloedd → the man is building hotel that provides a lot of rooms
- mae'r blaid yn gwneud rhywbeth sy'n cyfrannu at ennill yr etholiad → the party is doing something that contributes towards win the election
- mae'r dafad yn pori yn y maes sy'n cynnig bwyd da → the sheep is grazing in the field that good food offers
- mae'r defaid yn pori yn y cae sy'n cynnig bwyd da → the sheep are grazing in the closes that good food offers.
 
 
- Terrific.  The only thing is that the last two sentences have subject shift, even though "bwyd da" is an object.  Would it be possible to ban subject shift after "sy[dd]"? Also:
- For Welsh pattern "at + vb.infin"
- output English "towards + verb + -ing"
 
- Donnek
 
 
- Done. There are a few regressions because of the new tagger, but I'm looking into them. - Francis Tyers
 
 
 
 
(1.3.24) "llawer"
"llawer o" (a lot of, many) seems to be OK. But another rule would be useful to deal with the third coolness sentence:
For Welsh pattern "llawer + adj.comp" output English "much + adj.comp"
- dyn llawer hŷn -> *older many man - a much older man (see 1.3.10 for the "a")
- Done. - Francis Tyers
 
(1.3.25) bod (inf) + subject pronoun + cael (inf)
Should this be mangled into:
be(inf) + subject pronoun + get(inf) → that + subject pronoun + had
e.g.
- honiadau ei bod hi'n cael perthynas â
- his allegations be she getting relation with → his allegations that she had relation with
... Also on this subject, we currently don't have a verb for "have" in the bidix; the grammar I have suggests that "cael" might be it in a modal sense. - Francis Tyers
(1.3.26) Subordinate ("reported speech") clauses with "bod" + pronoun
The above (1.3.25) may be worth doing, but it may be better to deal with the more general construction.
In effect, this is the same construction as 1.3.15, but with the noun replaced by a pronoun. However, while in English the pronoun is a subject pronoun, in Welsh it is a possessive pronoun. This means that the "that" word ("bod") gets sandwiched by the two parts of the possessive pronoun (either of which may not appear, depending on style), and is also mutated accordingly.
- We're currently calling "ei" and friends possessive determiners, should we change this to possessive pronoun, or does it not make much of a difference? - Francis Tyers
 
- Not much difference I think. Possessive determiner has the benefit that they can be considered to specify the noun in the same way as det.def. - Donnek
 
 
Thus, with a noun:
- clywodd y dyn bod y trên yn cyrraedd yn hwyr (the man heard that the train was arriving late)
becomes, with a pronoun:
- clywodd y dyn ei fod o'n cyrraedd yn hwyr (the man heard that it was arriving late)
For Welsh pattern "[pr.poss] + VBSER.inf_mutated + [pr.subj] + yn + verb" output English "that + pr.subj + is + verb + -ing"
- Hmm, this one is problematic as we throw away the "yn" in stage one transfer in the rule that turns "yn + vblex.inf" → "vblex.ger":
 
^det<SD><det><pos><sp>{^his<det><pos><2>$}$ ^verbinf<SV><vbser><inf>{^be<vbser><3>$}$ 
^prnsubj<SN><p3><m><sg>{^prpers<prn><subj><2><3><4>$}$ ^verbinf<SV><vblex><ger>{^arrive<vblex><3>$}$
- Would it cause any problems if we made this rule:
 
For Welsh pattern "[pr.poss] + VBSER.inf_mutated + [pr.subj] + verb.ger" output English "that + pr.subj + is + verb + -ing"
- This would give: "the man heard that he is arriving late" (the "he" is an open issue)
- Done. - Francis Tyers
 
 
- I was actually typing in the same suggestion, but you got there first! I don't think you can do much about the "he" without some sort of semantic check, which is not realistic at this stage. The only thing you could do would be to use "he/it", but that looks clumsy. - Donnek
 
 
For Welsh pattern "[pr.poss] + VBSER.inf_mutated + [pr.subj] + wedi + verb" output English "that + pr.subj + have/s + verb.pp"
- Done. - Francis Tyers
 
For Welsh pattern "[pr.poss] + VBSER.inf_mutated + [pr.subj] + am + SM_verb" output English "that + pr.subj + will + verb"
- Done. - Francis Tyers
 
For Welsh pattern "[pr.poss] + VBSER.inf_mutated + [pr.subj] + ar + SM_verb" output English "that + pr.subj + is about to + verb"
- Done. - Francis Tyers
 
For Welsh pattern "[pr.poss] + VBSER.inf_mutated + [pr.subj] + newydd + SM_verb" output English "that + pr.subj + have/s just + verb.pp"
In the above Welsh patterns, at least one of pr.poss and pr.subj must be present.
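The five patterns above differ only in the word that follows the pronoun, so they can be summarised as one lookup from that marker to an English auxiliary and verb form. The Python below is a sketch of that summary, restricted to third-person subjects; the table layout, the names, and the idea of passing the English verb forms in as a dict are all assumptions made for the example.

```python
# marker -> (singular auxiliary, plural auxiliary, English verb form needed)
ASPECT_MARKERS = {
    "yn":     ("is", "are", "ger"),
    "wedi":   ("has", "have", "pp"),
    "am":     ("will", "will", "inf"),
    "ar":     ("is about to", "are about to", "inf"),
    "newydd": ("has just", "have just", "pp"),
}

def render_bod_clause(subject, marker, verb_forms, plural=False):
    """verb_forms: dict holding the English 'inf', 'ger' and 'pp' forms.
    Only third-person subjects are handled (no "I am" / "you are")."""
    singular_aux, plural_aux, form = ASPECT_MARKERS[marker]
    aux = plural_aux if plural else singular_aux
    return f"that {subject} {aux} {verb_forms[form]}"

go = {"inf": "go", "ger": "going", "pp": "gone"}
print(render_bod_clause("they", "wedi", go, plural=True))
```

This mirrors the "have/s" notation in the rules: the choice between "has" and "have" depends on the number of the subject pronoun.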
- mae hi'n dweud eu bod nhw wedi mynd -> *she is saying their that they have gone - fine apart from the redundant possessive
- Now gives: she is saying that they have gone - Francis Tyers
 
- Excellent. - Donnek
 
 
- mae'n amlwg ei fod o'n dweud y gwir -> *is #obvious<adj><sint> his be hesaying the true - it is obvious that he is telling the truth
- Now gives: is obvious that he is saying the true (perhaps "dweud y gwir" might be a good multiword verb → "tell the truth"?) - Francis Tyers
 
- That would be a good shortcut. "gwir" is both adj (true) and noun.m (truth), but the second is not in Eurfa! - Donnek
 
 
cy-en    mae'n amlwg ei fod o'n dweud y gwir 
         is obvious that he is telling the truth 
- dywedodd y bachgen ei bod hi newydd siarad â nhw -> *the boy said his be she new talk with they - the boy said that she had just talked to them
- I'm leaving this one for now "... newydd ..", as it is harder to do as we don't mark adjective chunks with their lemma. - Francis Tyers
 
(1.3.27) Subject pronoun + verb (marked construction?)
"Fe dagodd ei wraig ar eu gwely yn eu cartref yn Abercynon ym mis Ebrill y llynedd ar ôl iddi ddweud ei bod yn ei adael am ddyn arall."
Gives:
- "He his wife choked on their bed in their home in Abercynon in the April last year after to her say his be in his leave for other man."
The "normal" (VSO) order ("Dagodd fe ei wraig") would give:
- he Choked his wife
Should we re-order prnsubj + verbcj → verbcj + prnsubj in the initial stage in order to normalise the word order?
- Interesting - "fe dagodd ei wraig" is actually ambiguous in Welsh. It could mean "his wife choked", "he choked his wife", or at a pinch "it was he who choked his wife" (which would have suprasegmental differences). The news item is carrying over the subject of the previous sentence (the man) into this one, so the second choice is the correct one. But in isolation we wouldn't know.
 
- I would hold on your suggestion, because the sentence actually omits the pr.3p.sing which would disambiguate - the "fe" at the beginning is not the (southern) pr.3p.sing - it is a (mostly southern) preverbal affirmative particle (equivalent to the (mostly northern) "mi"), in the same class as preverbal interrogative particle "a" and preverbal negative particle "ni". In fact, we need a rule to delete "fe/mi" before a conjugated verb (not perfect, but better than default), and in this case the sentence would have come out as the first choice ("his wife choked").
 
- This is wrong in this context, but I can't see what Apertium could do about this without some complex inter-sentence parsing (if it's any consolation, a human reading this sentence in isolation might also make the same mistake until he came to the latter part).
 
- Incidentally, if the pr.poss were omitted from "fe dagodd ei wraig" it would not be ambiguous - "fe dagodd gwraig - a woman choked", but "fe dagodd wraig - (he) choked a woman".
 
- (Another incidentally - I have asked around, and while people do seem to accept that "his wife choked" is a possible interpretation, they discount it as a likely one. They would either use a different construction for this ("naeth ei wraig dagu"), or expect some additional information before considering it ("tagodd ei wraig ar afal" - his wife choked on an apple). We can't replicate this feeling in Apertium, however, because the sequence is ubiquitous with other verbs. In this case, there may be something inherent in the transitive/intransitive meanings that make people interpret the sequence as one rather than the other. It's certainly an interesting issue - surely a journal article in there somewhere!)
 
- The "i + infin" needs a separate section.
 
- The last part of the sentence "... ei bod yn ei adael am ddyn arall - ... that she is leaving him for another man" is in fact a special case of 1.3.26, with an object pronoun on the verb.  We can see this if we take out the pr.obj:
- ei bod hi'n gadael am ddyn arall -> that she is leaving for other man
 
 
- Correct apart from "other", and that may be fixed if the revised 1.3.10 can be addressed.  But note:
- ei bod yn gadael am ddyn arall -> *his be leaving for other man
 
 
- where the omission of the pr.subj means that the 1.3.26 rule is not being applied - we need to allow for this. In informal Welsh, the usual thing is to omit the pr.poss; in formal Welsh the usual thing is to omit the pr.subj. That may not be easy to handle.
 
- I'm not going to deal with pr.obj in this construction until I have a few more basic constructions flagged - perhaps at the end of the week. - Donnek
 
- Aha, thanks. Regarding the pr.obj construction, no problem. - Francis Tyers
 
 
 
(1.3.32) Subordinate "bod + pr" should not apply to inflected forms
- mae o wedi dod -> *that he has come - he has come
- mae o am adael -> *that he will leave - he will leave
- I've restricted the "bod" to infinitive, now it gives:
- He is after come
- He is for leave
 
- This does not seem to be a regression (I checked the tests); it is just that we haven't had a rule for verbcj{bod} SN{prnsubj} SP verbinf{SV} yet. - Francis Tyers
 
- That's right - this is what I would have expected. We need to develop rules for periphrastic tenses. - Donnek
 
 
The rule in 1.3.26 above is being applied too broadly. As stated there, it should only apply when we have "bod" (ie the infinitive) in the verb place, eg:
- ei fod o wedi dod -> that he has come
- ei fod o am adael -> that he will leave
(1.3.33) Elided pr.poss after a vowel
- neidiodd Cwchwlin ar ei draed a'i wyneb yn fwgwd o waed, ac â'i gleddyf torrodd bob un o'r pennau sgrechlyd aflafar oddiar eu hysgwyddau
- -> *Jumped *Cwchwlin on his #feet<n><sg> and his face in mask of blood, and with'to *gleddyf broke each one of the heads *sgrechlyd *aflafar *oddi on their *hysgwyddau
- Cwchwlin jumped to his feet, his face a mask of blood, and with his sword cut each one of the shrieking, harsh heads from their shoulders. (Cwchwlin p110. Apologies to those of a nervous disposition!)
The key point here is "â'i" (we'll use a noun that is actually in the dix)
- *â ei car -> *with his car
- â ei gar -> with his car - with his car
- â ei char -> *with his car - with her car
- *â'i car -> With his car
- â'i gar -> With his car - with his car
- â'i char -> *With his car - with her car
Elided versions of the pr.poss are not being picked up, and no attention is being paid to the mutation of the following noun even with the non-elided pr.poss.
I'm not sure whether the elided (sometimes called infixed) pr.poss, which occur after vowels, should be entered separately in the dix, or handled with rules.
At the moment I'm handling them in the dictionary, making ad-hoc additions to the "postblank" section (right at the bottom of the file) as I go along. - Francis Tyers
- 1p.sing - 'm
fy mrawd a'm chwaer (my brother and sister) - see below
- 2p.sing - 'th + SM
gyda'th dad (with your dad) - see below
- 3p.sing.m - 'w + SM after "i", 'i + SM elsewhere
i'w dŷ (to his house), ei fam a'i dad (his mother and father), gyda'i arian (with his money)
- 3p.sing.f - 'w + {AM, h before a vowel} after "i", 'i + {AM, h before a vowel} elsewhere
i'w thŷ (to her house), ei mam a'i thad (her mother and father), gyda'i harian (with her money)
- 1p.pl - 'n + h before a vowel
o'n gwlad (from our country)
- 2p.pl - 'ch
cario'ch pethau (to carry your things)
- 3p.pl - 'w after "i", 'u + h before a vowel elsewhere
i'w tŷ (to their house), eu mam a'u tad (their mother and father), gyda'u harian (with their money). Note that this will deal with "ar eu hysgwyddau -> on their *hysgwyddau" in the sentence above. This works fine at present if we drop the h-mutation: "ar eu ysgwyddau -> on their shoulders".
Although these elided forms can be used after verbs etc, they are most likely after prepositions. All except 1p.sing and 2p.sing are used after any vowel, but these two ('m, 'th) can only be used after:
- a (and)
- â (with)
- gyda (with)
- efo (with)
- tua (towards)
- na (than, nor)
- i (to)
- o (from)
There are also forms that are used when the pronoun is a direct object. In most cases, these are the same as the above, but in 3p we have 's instead of 'w. There are also slight mutation changes. These forms are typical of old-fashioned written Welsh, so perhaps we could ignore them for now.
As regards rules:
For Welsh pattern "vowel_!i + 'i + {SM_noun, SM_infin}"
output "his {noun, infin}"
For Welsh pattern "vowel_i + 'w + {SM_noun, SM_infin}"
output "his {noun, infin}"
For Welsh pattern "vowel_!i + 'i + {AMH_noun, AMH_infin}"
output "her {noun, infin}"
For Welsh pattern "vowel_i + 'w + {AMH_noun, AMH_infin}"
output "her {noun, infin}"
For Welsh pattern "vowel_!i + 'u + {AMH_noun, AMH_infin}"
output "their {noun, infin}"
For Welsh pattern "vowel_i + 'w + {AMH_noun, AMH_infin}"
output "their {noun, infin}"
For Welsh pattern "vowel + 'ch + {noun, infin}"
output "your {noun, infin}"
For Welsh pattern "vowel + 'n + {noun, infin}"
output "our {noun, infin}"
This will clash with cases where "yn" becomes "'n" - you can implement it, or leave it for now.
For Welsh pattern "{a, â, gyda, efo, tua, na, i, o} + 'm + {noun, infin}"
output "my {noun, infin}"
For Welsh pattern "{a, â, gyda, efo, tua, na, i, o} + 'th + {SM_noun, SM_infin}"
output "your {noun, infin}"
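Pulling the list of elided forms and the rules above together, the decision could be sketched in Python as below. It assumes the mutation seen on the following word has already been identified and is passed in as "SM", "AM", "H" or None; that input, the function name and the host set are assumptions made for the example. Words that cannot show a mutation, and the h-mutation ambiguity between "her" and "their" after "i'w", are not resolved here.

```python
# Hosts after which 'm and 'th may appear, as listed above.
M_TH_HOSTS = {"a", "â", "gyda", "efo", "tua", "na", "i", "o"}

def elided_possessive(host, clitic, mutation):
    """host: word the clitic attaches to; clitic: e.g. "'i";
    mutation: "SM", "AM", "H" or None, as observed on the next word.
    Returns the English possessive, or None if the combination is not covered."""
    if clitic == "'m":
        return "my" if host in M_TH_HOSTS else None
    if clitic == "'th":
        return "your" if host in M_TH_HOSTS and mutation == "SM" else None
    if clitic == "'ch":
        return "your"
    if clitic == "'n":
        return "our"  # may clash with 'n from "yn", as noted above
    if clitic == "'u":
        return "their" if host != "i" else None
    if clitic in {"'i", "'w"}:
        if (clitic == "'w") != (host == "i"):
            return None  # 'w is only used after "i", 'i only elsewhere
        if mutation == "SM":
            return "his"
        if mutation in {"AM", "H"}:
            return "her"  # after i'w, "H" could also be "their"
        if clitic == "'w":
            return "their"  # e.g. i'w tŷ: no mutation
    return None

print(elided_possessive("â", "'i", "SM"))  # â'i gar
print(elided_possessive("â", "'i", "AM"))  # â'i char
```

The two sample calls correspond to "â'i gar" and "â'i char" from the start of this section, which are only distinguishable by the mutation.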
Other issues:
"oddi ar" (from on - the opposite of "ar", using the ar paradigm) is not in the dix, although "oddi wrth" (from - the opposite of "at", using the wrth paradigm) is.
I'm not sure why "ar ei draed" (onto his feet) stumbles (if you'll pardon the pun).
We didn't have "feet" in the English dictionary, only foot sg/pl. I made an entry in the bidix that maps 'traed' → foot(pl), and now it seems to work. - Francis Tyers
(1.3.35) Preverbal particles - interrogative
For Welsh pattern "a + SM_verb.inflected + subject" output English "auxiliary + subject + verb"
Again, I assume that Apertium can deal with the technicalities of producing the proper English verbform (he went - did he go?; they see - do they see?; etc).
Note that "a" is often omitted, especially in spoken Welsh, but the soft mutation remains:
- a gyrhaeddodd y llythyr? -> *And the letter arrived? - Did the letter arrive?
- gyrhaeddodd y llythyr? -> *The letter arrived? - Did the letter arrive?
Introducing subordinate clauses, "a" means "if, whether":
For Welsh pattern "!noun + a + SM_verb.inflected + subject" output English "!noun + if + subject + verb"
- ewch i ofyn a fyddai hi'n dod - *Go I ask and she would be coming - go and ask if she will be coming
Note that we still have the frustrating issue of the tagger choosing the wrong "i" (see: http://wiki.apertium.org/wiki/Welsh_to_English#.22i.22_as_preposition).
- Ok, after a morning of furious hacking, I've fixed the problem in the constraint grammar parser I was using with Apertium. Now this should be selected properly. You can see the file here... you might find it interesting, not least because it doesn't require messing about with any XML! :) If you fancy writing rules in that formalism (to improve tagger performance), I can include them directly. I've sent you an email with an example CG for Faroese that you might find interesting. - Francis Tyers
 
- That's good news. I'll do a bit of reading on CG, which looks promising, since it is largely bracketless! - Donnek
 
 
"oni" (with mixed mutation) is used when a positive answer is expected:
For Welsh pattern "oni + MM_verb.inflected + subject" output English "auxiliary + not + subject + verb"
- oni phrynodd yr athro'r papur? -> **oni The teacher bought the paper? - didn't the teacher buy the paper?
"oni" becomes "onid" before a vowel, but not before a vowel "exposed" by the soft mutation of a "g":
- onid aeth y bws heibio'r ysgol? -> **onid The bus went #past<pr>the school? - didn't the bus go past the school?
- oni welsoch chi? -> **oni Dog saw? - didn't you see?
LOL! Subject pronouns really need to be slapped into shape! That is a very existentialist translation, especially if you add "b" at the beginning.
This seems to be largely done. - Francis Tyers
(1.3.36) Preverbal particles - affirmative
For Welsh pattern "{fe, mi} + SM_verb.inflected + subject"
output English "subject + verb"
"fe" is mainly southern, and "mi" is mainly northern. "fe" is the form usually used before impersonals (fe werthwyd y fferm -> *He the farm sold - the farm was sold).
There is another affirmative particle "y[r]" used before "bod":
- yr ydych yn mynd -> *The are going - you are going
but that is only seen now in older written Welsh. The elided forms (eg "rydych") are already included in the "bod" paradigm, but to handle older Welsh, where "yr" is not elided, you could also have a rule:
For Welsh pattern "y[r] + VBSER.inflected + subject" output English "subject + VBSER"
Incidentally, we get some oddities for variants of the fragment above, all of which mean "you are going":
- yr ydych yn mynd -> *The are going
There is no attempt to mark the verb as 2p.pl.
- yr ydych chi yn mynd -> *The dog are going
That dog is back again! "chi" meaning "dog" would never occur in this location, because there is nothing that could give it AM.
- I think it's this one...
- I made a rule: REMOVE N IF (-1 V) (0 N) (0 PrnSubj) (1 Pr);
 
- It fixes that problem, but maybe it will cause others... - Francis Tyers
 
- LOL - catch the pigeon! If I interpret that rule correctly, it says "if a cohort contains a noun and a pr.subj, with a verb on the immediate left and a preposition on the immediate right, delete the noun"? (I see you've added quite a few rules since yesterday.) I think this should work OK, actually, but what I'm slightly nervous about is that we seem to be ignoring relevant data in the stream itself, namely (a) a 2p.pl verb is likely to be followed by a 2p.pl pr, and (b) an AM noun would only occur in an AM context, of which this is not one. Capturing (a) especially seems important - I know the constraint grammar (ch 11) did mention weighting choices by probability. I need to be more familiar with the CG syntax, and the Faroese file takes a lot of concentration! Talking of which, could you maybe do a comment below each of your Welsh rules as you do them, giving a Welsh example of what the rule acts on? This would make it easier to review them later. - Donnek
 
 
- Yep, at the moment I'm just writing them as hacks. Hoping that as I add rules they'll become more clear and I'll be able to unify some. The probability stuff is for accepting prob info from an HMM tagger before (not generating it itself)... And, I'll go through and document the rules later this evening. Also, I can pass you a CG for Norwegian which uses more basic rules (but more). - Francis Tyers
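For anyone reviewing the rules later, the effect of the REMOVE rule discussed above can be mimicked in a few lines of Python. This is only a sketch of the reading given in the thread ("if a cohort contains a noun and a pr.subj, with a verb on the immediate left and a preposition on the immediate right, delete the noun"), not how cg-proc is implemented; the data layout and the `remove_noun_reading` name are assumptions, and contexts are checked against the unmodified input for simplicity.

```python
from typing import FrozenSet, List

Reading = FrozenSet[str]   # one analysis: a set of tags (tag names are illustrative)
Cohort = List[Reading]     # all analyses of one word form


def has(cohort: Cohort, tag: str) -> bool:
    """True if any reading in the cohort carries the tag."""
    return any(tag in reading for reading in cohort)


def remove_noun_reading(sentence: List[Cohort]) -> List[Cohort]:
    """Mimic: REMOVE N IF (-1 V) (0 N) (0 PrnSubj) (1 Pr);"""
    result = [list(cohort) for cohort in sentence]
    for i in range(1, len(sentence) - 1):
        prev, cur, nxt = sentence[i - 1], sentence[i], sentence[i + 1]
        if has(prev, "V") and has(cur, "N") and has(cur, "PrnSubj") and has(nxt, "Pr"):
            kept = [r for r in cur if "N" not in r]
            if kept:  # this sketch never deletes the last remaining reading
                result[i] = kept
    return result


# "ydych chi yn mynd"
sentence = [
    [frozenset({"V"})],                          # ydych
    [frozenset({"N"}), frozenset({"PrnSubj"})],  # chi: "dog" (mutated "ci") or "you"
    [frozenset({"Pr"})],                         # yn
    [frozenset({"V"})],                          # mynd
]
print(remove_noun_reading(sentence)[1])
```

After the rule fires, the "chi" cohort keeps only the pronoun reading. The sketch does not use person and number agreement with the preceding verb, which is the extra information point (a) above suggests capturing.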
 
 
 
- yr ydych chi'n mynd -> *The you are going
This is the closest translation.
- rydych chi yn mynd -> *Dog rust going
I think that's the name of a 70s album by Genesis ....
- LOL!! :DD - Francis Tyers
 
- rydych chi'n mynd -> *You rust going
I may do, but that is none of Apertium's business .... The above two are because cg-proc thinks "rydych" is a form of "rhydu (to rust)" instead of "bod".
- This isn't cg-proc, it's the apertium-tagger. Is there a rule I can try for disambiguating vblex.pres from vblex.prs? E.g. could the construction vblex.prs + prpers + yn + vblex.inf ever occur? - Francis Tyers
 
- Although it could in theory, in practice it would be extremely unlikely. The subjunctive is more or less moribund now, except in stock phrases such as "doed a ddelo" (come what may), and in older written (and perhaps spoken) usage. The most common remnant is, unsurprisingly, for "bod", in the person of "bo" (3p.sing) (although other persons appear occasionally), and even then it could be argued that these are stock phrases, eg "pan fo angen" (as needed). I think that in the meantime, therefore, your rule would be OK. I also wonder about going farther, and automatically deleting vblex.prs from a cohort? This would not do in the longer run, but would cause few problems for 0.1. Phrases containing "bo" could be added to the multiword phrase section. - Donnek
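The stopgap suggested above (dropping subjunctive readings whenever another reading is available) could look roughly like this in Python. The reading format and the exact tag strings for "rydych" are illustrative guesses, not copied from the analyser output.

```python
from typing import List, Tuple

Reading = Tuple[str, Tuple[str, ...]]  # (lemma, tags); tags are illustrative


def drop_subjunctive(cohort: List[Reading]) -> List[Reading]:
    """Remove readings tagged 'prs' unless that would empty the cohort."""
    kept = [reading for reading in cohort if "prs" not in reading[1]]
    return kept or cohort


# Hypothetical analyses of "rydych"
rydych = [
    ("rhydu", ("vblex", "prs", "p2", "sg")),
    ("bod", ("vbser", "pres", "p2", "pl")),
]
print(drop_subjunctive(rydych))
```

Because a cohort whose only reading is subjunctive is left alone, forms such as "bo" would still come through, and the stock phrases could then be handled as multiwords.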
 
 
The conclusion seems to be (apart from the "rhydu" problem) that the absence or non-elision of pr.subj can cause problems.
This section now seems to be dealt with. - Francis Tyers
(1.3.37) infin + object pronoun
- canodd cloch y drws amser cinio ac aeth dy fam i'w ateb a chlywaist ti llais Paul yn dweud nad oeddet ti ddim yn yr ysgol
- -> *The bell of the door sang dinner #time<n><sg> and your mother went to its answer and you heard voice *Paul saying *nad you were not in the school
- The doorbell rang at dinner-time and your mother went to answer it and you heard Paul's voice saying that you were not in school.
("Pan Oeddwn Fachgen", Mihangel Morgan, p71)
- yn ei adael am ddyn arall -> *In its leave for an other man - leaving him for another man
(from 1.3.27 above)
"i'w ateb" is literally "its answering", "yn ei adael" is literally "in (the state of) his leaving". The Welsh (indeed, Celtic) construction is to use a possessive pronoun with an infinitive instead of an object pronoun. This is often the "sandwich" possessive, where the pr.subj follows the infinitive - "i'w ateb fo, yn ei adael fo", but the pr.subj is frequently omitted, as the above examples indicate. In some cases, speakers omit the possessive and keep the pr.subj, so we get something similar to English - "i ateb fo, yn gadael fo" - but we probably don't need to cater for that at the moment.
For Welsh pattern "pr.poss + verb.infin + [pr.subj]" output English "verb + pr.obj"
- Done,
- The bell of the door sang dinner time and your mother went to answer it and you heard the voice of Paul saying *nad you were not in the school
 
- "bell of the door" and "voice of Paul" are a bit weird but intelligible, the loss of the preposition "at" is more problematic for understanding. - Francis Tyers
 
For Welsh pattern "yn + pr.poss + verb.infin + [pr.subj]" output English "verb + -ing + pr.obj"
- Done, although there is a problem with ei → him/her/it -- is there a way to ascertain the gender using mutation? (like in the â'i gar example). - Francis Tyers
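On the gender question: in standard Welsh, masculine "ei" triggers soft mutation, while feminine "ei" triggers aspirate mutation and adds "h" before a vowel, so comparing the surface form with the radical can often settle it. The Python sketch below illustrates the idea under several assumptions: the radical is available from the analyser, only "ei" is handled (not "i'w", where "eu" is also possible), and the function names are invented for the example.

```python
from typing import Dict, Optional

SOFT: Dict[str, str] = {"p": "b", "t": "d", "c": "g", "b": "f", "d": "dd",
                        "g": "", "m": "f", "ll": "l", "rh": "r"}
ASPIRATE: Dict[str, str] = {"p": "ph", "t": "th", "c": "ch"}
UNMUTATED_DIGRAPHS = ("ch", "dd", "ff", "ng", "ph", "th")
VOWELS = tuple("aeiouwy")  # rough: treats every initial w/y/i as a vowel


def mutate(radical: str, table: Dict[str, str]) -> str:
    """Apply an initial-consonant mutation table to a radical form."""
    if radical.startswith(UNMUTATED_DIGRAPHS):
        return radical
    for initial in sorted(table, key=len, reverse=True):  # "ll", "rh" first
        if radical.startswith(initial):
            return table[initial] + radical[len(initial):]
    return radical


def guess_gender(radical: str, surface: str) -> Optional[str]:
    """Return 'm', 'f', or None for the word following 'ei'."""
    radical, surface = radical.lower(), surface.lower()
    if radical.startswith(VOWELS):
        masc, fem = radical, "h" + radical
    else:
        masc, fem = mutate(radical, SOFT), mutate(radical, ASPIRATE)
    if masc == fem:
        return None  # e.g. s-, n-, ff-: the mutation tells us nothing
    if surface == masc:
        return "m"
    if surface == fem:
        return "f"
    return None


print(guess_gender("gadael", "adael"))  # yn ei adael
print(guess_gender("car", "char"))      # ei char
```

Where neither mutation changes the initial, the function returns None and some other cue (or a default) would still be needed.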
 
For English pattern "verb.inf + pr.obj" output Welsh "pr.poss + verb.infin + [pr.subj]"
For English pattern "verb + -ing + pr.obj" output Welsh "yn + pr.poss + verb.infin + [pr.subj]"
Possible CG rule to deal with mistagging of "fo" in "i'w ateb fo", etc:
- REMOVE V IF (0 V) (0 PrnSubj) (-1 VBlexInf)
I don't think CG applies to the above, because they involve resequencing instead of tagger choices - let me know if I'm wrong.
"llais Paul - voice *Paul" works fine if we use a known proper noun: "llais Elin" -> "the voice of Elin".
(1.3.38) Fragility of "bod + noun" (1.3.15)
- Un bore yn yr asembli dywedodd y prifathro bod ymwelwyr tramor wedi dod i'r ysgol. -> one Morning in the *asembli the headmaster said that foreign visitors have come to the school.
This sentence is minimally adjusted from MM p43, and translates pretty well (though I find the odd capitalisation placement distracting - it would be better to have none).
However, it breaks fairly easily. The original sentence had "yn" instead of "wedi", and for this we get:
- *one Morning in the *asembli the headmaster said be foreign visitors coming to the school.
The 1.3.15 rule needs to be expanded along the same lines as 1.3.26:
For Welsh pattern "VBSER.inf + [det.def] + noun + [qualifiers] + yn + verb" output English "that + [det.def] + noun + [qualifiers] + is/are + verb + ing"
The original sentence also had "ymwelwyr o dramor", and this gives:
- *one Morning in the *asembli the headmaster said be visitors he foreign after come to the school.
Possible CG rules:
- LIST P3MSg = (p3 m sg)
- SET PrnSubjP3MSg = PrnSubj + P3MSg
- REMOVE PrnSubjP3MSg IF (0 Pr) (0 PrnSubjP) (-1 N | VblexInf) (1 N)
I'm not sure how to express "or" in the specs, so I'm using the pipe - please adjust as necessary! "tramor" needs to be added to the dix as a noun - "overseas" - m.
- I've added this rule, the keyword for 'or' is 'OR'.
 
# ymwelwyr o dramor
REMOVE PrnSubj IF (0 Pr) 
                  (0 PrnSubjP3Sg) 
                  ((-1 NC) OR (-1 NP) OR (-1 VblexInf))
                  ((1 NC) OR (1 NP) OR (1 Adj));
- I also added tramor to the dictionary, and this gives:
 
- one Morning in the *asembli the headmaster said be visitors of overseas coming to the school.
 
 
- I think the problem here is that "SN SP SN" is not detected as SN, and instead processed as three. I'll look into this. Btw, is there a plural for 'asembli' (I had a look around on the internet but couldn't find it.) - Francis Tyers
 
- Now we get:
- one Morning in the *asembli the headmaster said that foreign visitors are coming to the school.
 
- - Francis Tyers
 
(1.3.39) Periphrastic tenses with "bod"
Re comments in 1.3.32 above:
For Welsh pattern "VBSER.pres + subject + [qualifiers] + yn + verb.infin" output English "subject + [qualifiers] + is/are + verb + -ing"
- mae'r bachgen yn mynd -> The boy is going
Fine - the equivalent of 1.3.8 above, and it seems to be catered for already.
For Welsh pattern "VBSER.pii + subject + [qualifiers] + yn + verb.infin" output English "subject + [qualifiers] + was/were + verb + -ing"
- roedd y bachgen yn mynd -> The boy was going
Fine again.
For Welsh pattern "VBSER.pres + subject + [qualifiers] + wedi + verb.infin" output English "subject + [qualifiers] + has/have + verb.pp"
- mae'r bachgen wedi mynd -> *The boy is after go - the boy has gone
- Done. - Francis Tyers
 
For Welsh pattern "VBSER.pii + subject + [qualifiers] + wedi + verb.infin" output English "subject + [qualifiers] + had + verb.pp"
- roedd y bachgen wedi mynd -> *The boy was after go - the boy had gone
- Done. - Francis Tyers
 
For Welsh pattern "VBSER.pres + subject + [qualifiers] + am + SM_verb.infin" output English "subject + [qualifiers] + is going to + verb"
- mae'r bachgen am fynd -> *The boy is for go - the boy is going to go
- Done. - Francis Tyers
 
For Welsh pattern "VBSER.pii + subject + [qualifiers] + am + SM_verb.infin" output English "subject + [qualifiers] + was going to + verb"
- roedd y bachgen am fynd -> *The boy was for go - the boy was going to go
- Done. - Francis Tyers
 
There are other forms, but these will do for the moment.
Note that the last two of these forms are ubiquitous in Irish English: "he's after buying another one", "he was for going yesterday".
- These now all seem to be done. - Francis Tyers
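The six patterns in this section amount to a small lookup from (tense of "bod", particle) to an English auxiliary plus a verb form. A minimal Python sketch of that table follows; it handles third-person subjects only, and the names `AUX`, `GO` and `translate` are invented for the illustration rather than taken from the transfer rules.

```python
from typing import Dict, Tuple

# (tense of "bod", particle) -> ((singular aux, plural aux), main-verb form)
AUX: Dict[Tuple[str, str], Tuple[Tuple[str, str], str]] = {
    ("pres", "yn"): (("is", "are"), "ing"),
    ("pii", "yn"): (("was", "were"), "ing"),
    ("pres", "wedi"): (("has", "have"), "pp"),
    ("pii", "wedi"): (("had", "had"), "pp"),
    ("pres", "am"): (("is going to", "are going to"), "inf"),
    ("pii", "am"): (("was going to", "were going to"), "inf"),
}

# English forms of "mynd" (go), keyed by the form names used in AUX
GO: Dict[str, str] = {"inf": "go", "ing": "going", "pp": "gone"}


def translate(bod_tense: str, particle: str, subject: str,
              plural: bool, verb_forms: Dict[str, str]) -> str:
    """Build the English clause for one periphrastic pattern."""
    (singular_aux, plural_aux), form = AUX[(bod_tense, particle)]
    aux = plural_aux if plural else singular_aux
    return f"{subject} {aux} {verb_forms[form]}"


print(translate("pres", "wedi", "the boy", False, GO))  # mae'r bachgen wedi mynd
print(translate("pii", "am", "the boy", False, GO))     # roedd y bachgen am fynd
```

Adding the other forms mentioned above would only mean adding rows to the table.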
 

