Difference between revisions of "Welsh to English"
(→Tagger)
(20 intermediate revisions by the same user not shown)
Line 4: | Line 4:
==Todo==
− * Fix multiword verbs in bilingual dictionary -- and add ones non-existent in English dictionary to that dictionary
+ * <s>Fix multiword verbs in bilingual dictionary -- and add ones non-existent in English dictionary to that dictionary</s>
* Remove items which are in English dictionary but not Welsh/Bilingual
− * Fix verb conjugation in the Welsh analyser
+ * <s>Fix verb conjugation in the Welsh analyser</s>
− * Add restrictions in the bidix
+ * <s>Add restrictions in the bidix</s>
+ * Fix numbers
+ * <s>Add adverbs</s>
+ * <s>More thorough handling of contractions (i'ch, a'u, ...) -- including preblank</s>
+ * <s>Add pre-verbal particles (basic functionality)</s>
+ * Add adjective macro to all chunks
==Roadmap==
Line 22: | Line 27:
* To be able to identify ''who'' said ''what'' to ''whom''.
* To be able to distinguish whether a particular item is interesting enough to be translated properly.
− * Sentences of up to 5 words should be translated reasonably well
+ * Sentences of up to 5 words should be translated reasonably well from Welsh to English.
+ ;Report
+ * Coverage:
+ ** Wikipedia (753,741 words): 85.5%
+ ** PNAW (11,684,177 words): 94%
+ ** BBC Newyddion (144,887 words): 91%
+ ===apertium-cy-en 0.2===
+ * 0.1 performance and coverage for English to Welsh.
+ ===apertium-cy-en 0.5===
+ * Properly capitalised sentences.
+ * Get the number for nouns from the appropriate place, e.g. sometimes from the determiner, sometimes from the noun.
+ ===apertium-cy-en 1.0===
− ===apertium-cy-en 0.5===
− ===apertium-cy-en 1.0===
− == Tagger ==
− The tagger needs to be retrained to take into account new POS tags, e.g. "relative pronoun", "adverb".
− ===="i" as preposition====
− Ambiguity: <code>^i/i<pr>/prpers<prn><subj><p1><mf><sg>$ ^foderneiddio/moderneiddio<vblex><inf>/moderneiddio<vblex><prs><p3><sg>$</code>
− Welsh "i" (to) is getting translated as "[f]i" (I, me).
− if Welsh "i" occurs immediately after a verb marked as 1p sing
− output pronoun 1p sing
− otherwise output preposition "to"
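The rule above can be sketched in Python working directly on the analyser's stream format, as in the ambiguity example. This is a minimal sketch, not the pair's actual tagger or transfer code: it assumes the previous word has already been disambiguated (if not, any 1p sing verb reading counts), it ignores escaped characters, and the `gwelais` analysis with a `<past>` tag used in the test is hypothetical.

```python
import re

# One lexical unit ^surface/analysis1/analysis2$ of an Apertium-style stream.
# Sketch only: escaped characters (\^, \$, \/) and superblanks are not handled.
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def parse_stream(stream):
    """Return a list of (surface, [analyses]) pairs."""
    units = []
    for body in LEXICAL_UNIT.findall(stream):
        surface, *analyses = body.split("/")
        units.append((surface, analyses))
    return units


def tags(analysis):
    """'moderneiddio<vblex><inf>' -> ['vblex', 'inf']"""
    return re.findall(r"<([^>]+)>", analysis)


def is_verb_1sg(analysis):
    t = tags(analysis)
    return bool(t) and t[0].startswith("vb") and "p1" in t and "sg" in t


def disambiguate_i(units):
    """Keep the pronoun reading of 'i' right after a 1p sing verb, else the preposition."""
    result, previous = [], []
    for surface, analyses in units:
        if surface.lower() == "i" and len(analyses) > 1:
            # If the previous word is still ambiguous, any 1p sing verb reading counts.
            wanted = "prn" if any(is_verb_1sg(a) for a in previous) else "pr"
            matching = [a for a in analyses if wanted in tags(a)]
            analyses = matching[:1] or analyses[:1]
        result.append((surface, analyses))
        previous = analyses
    return result


def format_stream(units):
    return " ".join("^" + "/".join([surface, *analyses]) + "$" for surface, analyses in units)


stream = ("^i/i<pr>/prpers<prn><subj><p1><mf><sg>$ "
          "^foderneiddio/moderneiddio<vblex><inf>/moderneiddio<vblex><prs><p3><sg>$")
print(format_stream(disambiguate_i(parse_stream(stream))))
# ^i/i<pr>$ ^foderneiddio/moderneiddio<vblex><inf>/moderneiddio<vblex><prs><p3><sg>$
```

Only "i" is touched; "foderneiddio" stays ambiguous and is left for the tagger.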
− ===="o'n" - disambiguate "he" and "from"====
− ; mae fo'n mynd -> he isgoing
− Fine (apart from the missing space).
− Contrast:
− ; mae o'n mynd -> *is ofgoing (should be: he is going)
− The elided form "o" is more common here than "fo". Following the pattern for "i" above (section 1.3.4):
− if Welsh "o" occurs immediately after a verb marked as 3p sing
− output pronoun 3p sing
− otherwise output preposition "of/from"
− This is probably better than the earlier version I had here:
− For Welsh pattern "verb + o"
− output "verb + 3p sing pronoun"
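A sketch of the "o" rule, including splitting the clitic 'n off first so that "o" becomes its own token. Assumptions: tokens carry simple tag sets whose names are illustrative (not necessarily those of the Welsh analyser), and 'n is always taken to be the particle yn, although after "o" it can also stand for "ein" (our), which this sketch does not attempt to distinguish.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    surface: str
    tags: frozenset  # illustrative tag names, e.g. {"vbser", "pres", "p3", "sg"}


def split_clitic_n(tokens):
    """fo'n / o'n -> fo / o + yn (assumes the clitic is the particle yn)."""
    out = []
    for tok in tokens:
        if tok.surface.endswith("'n") and len(tok.surface) > 2:
            out.append(Token(tok.surface[:-2], tok.tags))
            out.append(Token("yn", frozenset({"part"})))
        else:
            out.append(tok)
    return out


def resolve_o(tokens):
    """'o' right after a 3p sing verb -> 3p sing pronoun; otherwise preposition."""
    out = []
    for i, tok in enumerate(tokens):
        if tok.surface.lower() == "o":
            prev = tokens[i - 1].tags if i > 0 else frozenset()
            after_3sg_verb = any(t.startswith("vb") for t in prev) and {"p3", "sg"} <= prev
            new_tags = frozenset({"prn", "p3", "m", "sg"}) if after_3sg_verb else frozenset({"pr"})
            tok = Token(tok.surface, new_tags)
        out.append(tok)
    return out


# mae o'n mynd -> mae / o (pronoun) / yn / mynd
sentence = [
    Token("mae", frozenset({"vbser", "pres", "p3", "sg"})),
    Token("o'n", frozenset()),
    Token("mynd", frozenset({"vblex", "inf"})),
]
print([(t.surface, sorted(t.tags)) for t in resolve_o(split_clitic_n(sentence))])
```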
− == Transfer ==
− <pre>
− # Welsh
− : Literal
− @ Gloss (English)
− </pre>
− === Welsh to English ===
− ==== Word order (VSO to SVO) ====
− <pre>
− # Genir pawb yn rhydd ac yn gydradd â 'i gilydd mewn urddas a hawliau.
− : Be born everyone free and equal with each other in dignity and rights.
−
− @ Everyone is born free and equal with each other in dignity and rights.
− </pre>
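At chunk level, the VSO to SVO reordering amounts to swapping the first two chunks when a verb chunk is directly followed by a noun-phrase chunk. A minimal sketch, assuming the chunks have already been translated and labelled; the labels "verb", "np", "adj" and "pp" are hypothetical, not the pair's real chunk names.

```python
def vso_to_svo(chunks):
    """Swap a clause-initial verb chunk with the noun-phrase chunk that follows it.

    chunks: list of (label, english_text); the labels are hypothetical.
    """
    if len(chunks) >= 2 and chunks[0][0] == "verb" and chunks[1][0] == "np":
        return [chunks[1], chunks[0], *chunks[2:]]
    return list(chunks)


def render(chunks):
    """Join chunk texts and capitalise the first letter of the sentence."""
    text = " ".join(t for _, t in chunks)
    return text[:1].upper() + text[1:]


# Genir | pawb | yn rhydd ac yn gydradd â 'i gilydd | mewn urddas a hawliau
chunks = [
    ("verb", "is born"),
    ("np", "everyone"),
    ("adj", "free and equal with each other"),
    ("pp", "in dignity and rights"),
]
print(render(vso_to_svo(chunks)) + ".")
# Everyone is born free and equal with each other in dignity and rights.
```

Clauses that do not start with a verb chunk are passed through unchanged.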
− ==== Noun Noun -> Noun of Noun ====
− <pre>
− # Llywodraeth Cynulliad Cymru
− : Government Assembly Wales ==> Government (of) Assembly (of) Wales
−
− @ Welsh Assembly Government
− </pre>
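The gloss shows that a generic "Noun of Noun" rule only gets part of the way: the expected "Welsh Assembly Government" is really a fixed name. A sketch that tries a multiword entry first and otherwise inserts "of" between the nouns; both lexicon tables are hypothetical examples, and articles and capitalisation are not handled.

```python
# Hypothetical mini bilingual lexicon and multiword list, for illustration only.
GLOSS = {"llywodraeth": "government", "cynulliad": "assembly", "cymru": "Wales"}
FIXED = {("llywodraeth", "cynulliad", "cymru"): "Welsh Assembly Government"}


def noun_run_to_english(nouns):
    """Translate a run of Welsh nouns N1 N2 ... Nk.

    A fixed multiword entry wins; otherwise fall back to 'N1 of N2 of ... Nk'.
    """
    key = tuple(n.lower() for n in nouns)
    if key in FIXED:
        return FIXED[key]
    return " of ".join(GLOSS[n] for n in key)


print(noun_run_to_english(["Llywodraeth", "Cynulliad", "Cymru"]))  # Welsh Assembly Government
print(noun_run_to_english(["cynulliad", "Cymru"]))                 # assembly of Wales
```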
− ==== Noun Adjective -> Adjective Noun ====
− <pre>
− # bachgen hapus
− : boy happy
−
− @ happy boy
−
− # geneth bert
− : girl pretty
−
− @ pretty girl
− </pre>
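A sketch of the reordering on already-translated words: each noun is moved after the run of adjectives that follows it. Assumptions: the POS labels "n" and "adj" are illustrative, several adjectives keep their Welsh order (English adjective ordering is not attempted), and the soft mutation in "geneth bert" (pert after a feminine noun) has been undone by the analyser before this step.

```python
def noun_adj_to_adj_noun(tokens):
    """Move each noun after the run of adjectives that follows it.

    tokens: list of (english_word, pos) pairs; only "n" and "adj" are
    interpreted, everything else is copied through unchanged.
    """
    out, i = [], 0
    while i < len(tokens):
        if tokens[i][1] == "n":
            j = i + 1
            while j < len(tokens) and tokens[j][1] == "adj":
                j += 1
            out.extend(tokens[i + 1:j])  # adjectives first ...
            out.append(tokens[i])        # ... then the noun
            i = j
        else:
            out.append(tokens[i])
            i += 1
    return out


print(noun_adj_to_adj_noun([("girl", "n"), ("pretty", "adj")]))
# [('pretty', 'adj'), ('girl', 'n')]
```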
− ====Compound prepositions====
− <pre>
− <donnek> I've also thought of another wrinkle - compound prepositions
− <spectie> i will probably need to write a rule
− <donnek> eg ar ben (on top of)
− <donnek> lit on head
− <spectie> we can do a similar thing with those
− <spectie> for example:
− <donnek> becomes ar fy mhen (on my head, literally) = on top of me
− <donnek> ar ei ben, ar ei phen, ar ein pennau
− <spectie> are there many of them
− <donnek> maybe we don't need to think about them now, but just to flag them for later
− <spectie> if there are not many it might be worth making them multiwords
− <donnek> how do multiwords work
− <spectie> there are a few ways
− <spectie> depending on if one of the words inside the multiword inflects or not
− <donnek> that would be the case here
− <spectie> for example "take care"
− <spectie> "i take care of", "you take care of", "he takes care of"
− <spectie> but "take care" is treated as one verb
− <donnek> ok
− </pre>
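Following the multiword suggestion, the "ar ben" family could be treated as one unit whose middle part (possessive plus mutated form of "pen") varies. A small sketch over plain word lists; the table only covers the forms quoted in the chat and is hypothetical, not an Apertium dictionary entry.

```python
# Hypothetical multiword table for "ar + possessive + pen" (literally "on <poss> head").
# Keys are (possessive, mutated form of pen) as they appear in running text.
AR_POSS_PEN = {
    ("fy", "mhen"): "me",
    ("ei", "ben"): "him",   # or "it"
    ("ei", "phen"): "her",
    ("ein", "pennau"): "us",
}


def join_ar_ben(words):
    """Collapse 'ar ben' and 'ar <poss> <pen>' into single English units."""
    low = [w.lower() for w in words]
    out, i = [], 0
    while i < len(words):
        key = tuple(low[i + 1:i + 3])
        if low[i] == "ar" and key in AR_POSS_PEN:
            out.append("on top of " + AR_POSS_PEN[key])
            i += 3
        elif low[i] == "ar" and low[i + 1:i + 2] == ["ben"]:
            out.append("on top of")  # the following noun phrase is left in place
            i += 2
        else:
            out.append(words[i])
            i += 1
    return out


print(join_ar_ben("mae'r gath ar fy mhen".split()))  # ["mae'r", 'gath', 'on top of me']
```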
− ====Attributive and predicative adjectives====
− <pre>
− <spectie> its a problem with attributive/predicative
− <donnek> it's say something (which is) nice
− <spectie> but in english we don't distinguish between the two (at least in terms of morphology)
− <spectie> yes
− <spectie> in afrikaans they have a -e for attributive (e.g. feodale stelsel -- feudal system)
− <spectie> and "the system is feudal" - "die stelsel is feodaal"
− <spectie> donnek, aye
− <donnek> in Welsh the second would have yn before the adj
− <donnek> so we may not need anything to mark attrib/pred
− </pre>
− * Dywedodd rhywbeth neis wrthi = He said something nice to her
− * Mae'r peth yno yn neis = That thing is nice
− : Mae yr peth yno yn neis
− * Mae'n gar neis = It is a nice car
− : Mae yn gar neis
− <pre>
− <donnek> at first glance, we may just need a rule for rhyw+thing
− <donnek> rhyw = some
− <donnek> rhywbeth (something), rhywfaint (somewhat), etc
− <donnek> rhywle (somewhere)
− </pre>
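Since Welsh marks the predicative use with yn, a positional check may be enough to label adjectives. A sketch, assuming 'n has already been split off and the words are POS-tagged (the labels are illustrative); "rhywbeth" is tagged as a noun here, which is one possible way of handling the rhyw+thing case, not a decision taken above.

```python
def adjective_roles(tokens):
    """Label adjectives as predicative ('pred') or attributive ('attr').

    tokens: list of (welsh_word, pos). An adjective directly after yn/'n is
    predicative; directly after a noun it is attributive; otherwise unlabelled.
    """
    roles = {}
    for i, (word, pos) in enumerate(tokens):
        if pos != "adj" or i == 0:
            continue
        prev_word, prev_pos = tokens[i - 1]
        if prev_word.lower() in ("yn", "'n"):
            roles[i] = "pred"
        elif prev_pos == "n":
            roles[i] = "attr"
    return roles


# Mae'r peth yno yn neis = That thing is nice
print(adjective_roles([("mae", "vbser"), ("'r", "det"), ("peth", "n"),
                       ("yno", "adv"), ("yn", "part"), ("neis", "adj")]))  # {5: 'pred'}
```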
− ====Possession====
− <pre>
− Mae cath 'da Bwflw
− Bod+p3.sg.pres cath gyda Bwflw
− Be+p3.sg.pres cat with Beefalo
− `Beefalo has a cat'
− </pre>
− ;Apertium notes
− We can probably deal with this in interchunk as follows:
− vbbod NP1 pr_gyda NP2
− ->
− NP2 vbhave NP1
− ====The 'yn' particle====
− As well as meaning 'in', 'yn' is used to form the present participle of a verb in Welsh. For example:
− *dysgu = to learn
− *yn dysgu = learning
− The present tense is formed by combining 'yn' with the corresponding form of 'bod' (to be), as follows:
− *Mae Beefalo yn gweithio = Beefalo is working/Beefalo works
− Note: when following a vowel, yn is abbreviated to 'n, e.g.
− *Mae Beefalo'n gweithio
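A sketch of the "bod + subject + yn + verbnoun" pattern, including restoring 'n to yn after a vowel. The two mini-lexicons are hypothetical and only cover the examples above; the sketch picks the progressive reading ("is working") and treats every 'n after a vowel as yn, which will not always be right.

```python
import re

# Hypothetical mini-lexicons, only covering the examples above.
BOD_PRESENT = {"mae": "is"}                                  # 3p sing present of bod
VERBNOUN_ING = {"gweithio": "working", "dysgu": "learning"}  # verbnoun -> English -ing form


def expand_n(sentence):
    """After a vowel, yn is written 'n: "Beefalo'n gweithio" -> "Beefalo yn gweithio"."""
    return re.sub(r"(?<=[aeiouwy])'n\b", " yn", sentence, flags=re.IGNORECASE)


def present_progressive(sentence):
    """'Mae SUBJ yn VERBNOUN' -> 'SUBJ is VERB-ing'; None if the pattern doesn't match."""
    words = expand_n(sentence).split()
    if (len(words) == 4
            and words[0].lower() in BOD_PRESENT
            and words[2].lower() == "yn"
            and words[3].lower() in VERBNOUN_ING):
        return f"{words[1]} {BOD_PRESENT[words[0].lower()]} {VERBNOUN_ING[words[3].lower()]}"
    return None


print(present_progressive("Mae Beefalo'n gweithio"))  # Beefalo is working
```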
− ====Genitive Phrases====
+ * Handling of gender and number in adjectives
− To form the indefinite genitive, a simple construction of possessed noun + possessor can be used.
− For example, "Soldiers of Wales" would be "milwyr Cymru", literally "soldiers Wales".
− Definite genitives are formed with a similar construction, with the addition of the article y between the possessed noun and the possessor.
− For example, "Beic y gath" = "The cat's bike", literally "bike the cat".
− Note: feminine singular nouns undergo soft mutation after the article "y" (cath -> y gath).
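The mutation note matters both for analysis ("gath" has to be found under "cath") and for generating Welsh. A sketch of the article rule, assuming the noun's gender and number are already known; the table is the usual soft-mutation set, with the two exceptions after the article (ll and rh do not mutate). The "y"/"yr" choice is simplified: w-initial words always get "y" here.

```python
# Soft mutation of the initial consonant (radical -> mutated form).
SOFT = [("ll", "l"), ("rh", "r"), ("p", "b"), ("t", "d"), ("c", "g"),
        ("b", "f"), ("d", "dd"), ("g", ""), ("m", "f")]
# Digraphs that begin with a mutable letter but never soft-mutate themselves.
NON_MUTATING = ("ch", "dd", "ph", "th")


def soft_mutate(word, after_article=False):
    """Return the soft-mutated form (lower-case at the start if it changes)."""
    lower = word.lower()
    if lower.startswith(NON_MUTATING):
        return word
    for radical, mutated in SOFT:
        if lower.startswith(radical):
            if after_article and radical in ("ll", "rh"):
                return word  # y llaw, y rhaw: no mutation after the article
            return mutated + word[len(radical):]
    return word


def with_article(noun, feminine_singular):
    """Welsh definite article + noun, e.g. cath -> y gath, gardd -> yr ardd."""
    form = soft_mutate(noun, after_article=True) if feminine_singular else noun
    article = "yr" if form[:1].lower() in set("aeiouyh") else "y"
    return f"{article} {form}"


print(with_article("cath", True), "/", with_article("beic", False))  # y gath / y beic
```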
[[Category:Discussions]]
− [[Category:
+ [[Category:Welsh to English]]
Latest revision as of 13:24, 10 December 2010
Todo
- Fix multiword verbs in bilingual dictionary -- and add ones non-existent in English dictionary to that dictionary (done)
- Remove items which are in English dictionary but not Welsh/Bilingual
- Fix verb conjugation in the Welsh analyser (done)
- Add restrictions in the bidix (done)
- Fix numbers
- Add adverbs (done)
- More thorough handling of contractions (i'ch, a'u, ...) -- including preblank (done)
- Add pre-verbal particles (basic functionality) (done)
- Add adjective macro to all chunks
Roadmap
apertium-cy-en 0.1
- The 8,000 highest-frequency words in each dictionary.
- Rules dealing with basic verb tenses (past, present, future).
- Basic word re-ordering for simple phrases.
- Aims and uses
- For a non-native speaker to be able to discern the topic of a general news item.
- To be able to identify who said what to whom.
- To be able to distinguish whether a particular item is interesting enough to be translated properly.
- Sentences of up to 5 words should be translated reasonably well from Welsh to English.
- Report
- Coverage:
- Wikipedia (753,741 words): 85.5%
- PNAW (11,684,177 words): 94%
- BBC Newyddion (144,887 words): 91%
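Coverage figures like these are usually the share of tokens the analyser recognises. A rough way to compute such a figure from analyser output, assuming each token appears as one ^...$ unit and unknown words are marked with * (as lt-proc does, e.g. ^Bwflw/*Bwflw$). The analysis of "mae" in the example is illustrative, and escaped characters and multiwords are not handled, so this may not reproduce the numbers above exactly.

```python
import re

LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def naive_coverage(analysed):
    """Fraction of lexical units with at least one analysis (unknowns are marked '/*')."""
    units = LEXICAL_UNIT.findall(analysed)
    if not units:
        return 0.0
    unknown = sum(1 for u in units if "/*" in u)
    return (len(units) - unknown) / len(units)


print(naive_coverage("^mae/bod<vbser><pres><p3><sg>$ ^Bwflw/*Bwflw$"))  # 0.5
```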
apertium-cy-en 0.2
- 0.1 performance and coverage for English to Welsh.
apertium-cy-en 0.5
- Properly capitalised sentences.
- Get the number for nouns from the appropriate place, e.g. sometimes from the determiner, sometimes from the noun.
apertium-cy-en 1.0
- Handling of gender and number in adjectives