Difference between revisions of "Welsh to English"
(32 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
{{TOCD}} |
{{TOCD}} |
||
+ | |||
+ | |||
+ | ==Todo== |
||
+ | |||
+ | * <s>Fix multiword verbs in bilingual dictionary -- and add ones non-existent in English dictionary to that dictionary</s> |
||
+ | * Remove items which are in English dictionary but not Welsh/Bilingual |
||
+ | * <s>Fix verb conjugation in the Welsh analyser</s> |
||
+ | * <s>Add restrictions in the bidix</s> |
||
+ | * Fix numbers |
||
+ | * <s>Add adverbs</s> |
||
+ | * <s>More thorough handling of contractions (i'ch, a'u, ...) — including preblank</s> |
||
+ | * <s>Add pre-verbal particles (basic functionality)</s> |
||
+ | * Add adjective macro to all chunks |
||
==Roadmap== |
==Roadmap== |
||
Line 14: | Line 27: | ||
* To be able to identify ''who'' said ''what'' to ''who''. |
* To be able to identify ''who'' said ''what'' to ''who''. |
||
* To be able to distinguish whether a particular item is interesting enough to be translated properly. |
* To be able to distinguish whether a particular item is interesting enough to be translated properly. |
||
− | * Sentences of up to 5 words should be translated reasonably well |
+ | * Sentences of up to 5 words should be translated reasonably well from Welsh to English. |
− | |||
− | == Transfer == |
||
− | |||
− | <pre> |
||
− | # Welsh |
||
− | : Literal |
||
− | @ Gloss (English) |
||
− | </pre> |
||
− | |||
− | === Welsh to English === |
||
− | |||
− | ==== Word order (VSO to SVO) ==== |
||
− | <pre> |
||
− | # Genir pawb yn rhydd ac yn gydradd â 'i gilydd mewn urddas a hawliau. |
||
− | : Be born everyone free and equal with each other in dignity and rights. |
||
− | |||
− | @ Everyone is born free and equal with each other in dignity and rights. |
||
− | </pre> |
||
− | ==== Noun Noun -> Noun of Noun ==== |
||
− | <pre> |
||
− | # Llywodraeth Cynulliad Cymru |
||
− | : Government Assembly Wales ==> Government (of) Assembly (of) Wales |
||
− | |||
− | @ Welsh Assembly Government |
||
− | </pre> |
||
− | |||
− | ==== Noun Adjective -> Adjective Noun==== |
||
− | <pre> |
||
− | # bachgen hapus |
||
− | : boy happy |
||
− | |||
− | @ happy boy |
||
− | |||
− | # geneth bert |
||
− | : girl pretty |
||
− | |||
− | @ pretty girl |
||
− | </pre> |
||
− | |||
− | ====Compound prepositions==== |
||
− | <pre> |
||
− | <donnek> I've also thought of another wrinkle - compound prepositions |
||
− | <spectie> i will probably need to write a rule |
||
− | <donnek> eg ar ben (on top of) |
||
− | <donnek> lit on head |
||
− | <spectie> we can do a similar thing with those |
||
− | <spectie> for example: |
||
− | <donnek> becomes ar fy mhen (on my head, literally) = on top of me |
||
− | <donnek> ar ei ben, ar ei phen, ar ein pennau |
||
− | <spectie> are there many of them |
||
− | <donnek> maybe we don't need to think about them now, but just to flag them for later |
||
− | <spectie> if there are not many it might be worth making them multiwords |
||
− | <donnek> how do multiwords work |
||
− | <spectie> there are a few ways |
||
− | <spectie> depending on if one of the words inside the multiword inflects or not |
||
− | <donnek> that would be the case here |
||
− | <spectie> for example "take care" |
||
− | <spectie> "i take care of", "you take care of", "he takes care of" |
||
− | <spectie> but "take care" is treated as one verb |
||
− | <donnek> ok |
||
− | </pre> |
||
− | |||
− | ====Attributive and predicative adjectives==== |
||
− | |||
− | <pre> |
||
− | <spectie> its a problem with attributive/predicative |
||
− | <donnek> it's say something (which is) nice |
||
− | <spectie> but in english we don't distinguish between the two (at least in terms of morphology) |
||
− | <spectie> yes |
||
− | <spectie> in afrikaans they have a -e for attributive (e.g. feodale stelsel -- feudal system) |
||
− | <spectie> and "the system is feudal" - "die stelsel is feodaal" |
||
− | <spectie> donnek, aye |
||
− | <donnek> in Welsh the second would have yn before the adj |
||
− | <donnek> so we may not need anything to mark attrib/pred |
||
− | |||
− | * Dywedodd rhywbeth neis wrthi = He said something nice to her |
||
− | * Mae'r peth yno yn neis = That thing is nice |
||
− | * Mae'n gar neis = It is a nice car |
||
− | |||
− | <donnek> at first glance, we may just need a rule for rhyw+thing |
||
− | <donnek> rhyw = some |
||
− | <donnek> rhywbeth (something), rhywfaint (somewhat), etc |
||
− | <donnek> rhywle (somewhere) |
||
− | </pre> |
||
− | |||
− | ====Possession==== |
||
− | |||
− | <pre> |
||
− | Mae cath 'da Bwflw |
||
− | Bod+p3.sg.pres cath gyda Bwflw |
||
− | Be+p3.sg.pres cat with Beefalo |
||
− | `Beefalo has a cat' |
||
− | </pre> |
||
+ | ;Report |
||
− | ;Apertium notes |
||
+ | * Coverage: |
||
− | We can probably deal with this in interchunk as follows |
||
+ | ** Wikipedia (753,741 words): 85.5% |
||
+ | ** PNAW (11,684,177 words): 94% |
||
+ | ** BBC Newyddion (144,887 words): 91% |
||
+ | ===apertium-cy-en 0.2=== |
||
− | vbbod NP1 pr_gyda NP2 |
||
+ | * 0.1 performance and coverage for English to Welsh. |
||
− | -> |
||
+ | ===apertium-cy-en 0.5=== |
||
− | NP2 vbhave NP1 |
||
+ | * Properly capitalised sentences. |
||
− | ====The 'yn' particle==== |
||
+ | * Get the number for nouns from the appropriate place, e.g. sometimes from the determiner, sometimes from the noun. |
||
+ | ===apertium-cy-en 1.0=== |
||
− | <pre> |
||
− | As well as meaning 'in', 'yn' is used to form the present participle of a verb in Welsh. For example: |
||
+ | * Handling of gender and number in adjectives |
||
− | dysgu = to learn |
||
− | yn dysgu = learning |
||
− | The present tense is formed by combining 'yn' with the corresponding form of 'bod' (to be) as follows: |
||
− | Mae Beefalo yn gweithio = Beefalo is working/Beefalo works |
||
− | note: when following a vowel, yn is abbreviated to 'n, e.g. |
||
− | Mae Beefalo'n gweithio |
||
− | </pre> |
||
[[Category:Discussions]] |
[[Category:Discussions]] |
||
+ | [[Category:Welsh to English]] |
Latest revision as of 13:24, 10 December 2010
Todo
- Fix multiword verbs in bilingual dictionary, and add ones non-existent in English dictionary to that dictionary (done)
- Remove items which are in English dictionary but not Welsh/Bilingual
- Fix verb conjugation in the Welsh analyser (done)
- Add restrictions in the bidix (done)
- Fix numbers
- Add adverbs (done)
- More thorough handling of contractions (i'ch, a'u, ...), including preblank (done)
- Add pre-verbal particles (basic functionality) (done)
- Add adjective macro to all chunks
Roadmap
apertium-cy-en 0.1
- 8,000 of the highest frequency words in each dictionary.
- Rules dealing with basic verb tenses (past, present, future)
- Basic word re-ordering for simple phrases.
- Aims and uses
- For a non-native speaker to be able to discern the topic of a general news item.
- To be able to identify who said what to who.
- To be able to distinguish whether a particular item is interesting enough to be translated properly.
- Sentences of up to 5 words should be translated reasonably well from Welsh to English.
- Report
- Coverage:
- Wikipedia (753,741 words): 85.5%
- PNAW (11,684,177 words): 94%
- BBC Newyddion (144,887 words): 91%
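The coverage figures above are presumably the share of corpus tokens for which the analyser returns at least one analysis. A minimal sketch of that calculation in Python, assuming input in the Apertium stream format, where an unknown word's analysis starts with `*` (e.g. `^xyzzy/*xyzzy$`). The function name, the sample text, and the tags inside it are illustrative only and are not taken from apertium-cy-en.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def naive_coverage(analysed_text: str) -> float:
    """Return the fraction of lexical units that have a known analysis."""
    total = 0
    known = 0
    for match in LEXICAL_UNIT.finditer(analysed_text):
        total += 1
        # Assumption: an unknown word is marked by an analysis starting with '*'.
        if not match.group(2).startswith("*"):
            known += 1
    # An empty input has no tokens, so report zero coverage rather than divide by zero.
    return known / total if total else 0.0


# Hypothetical analyser output: two known words and one unknown word.
sample = "^Mae/bod<vbser><pres><p3><sg>$ ^cath/cath<n><f><sg>$ ^xyzzy/*xyzzy$"
print(f"{naive_coverage(sample):.1%}")
```

This counts tokens rather than distinct word forms, so a frequent unknown word lowers the figure more than a rare one.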
apertium-cy-en 0.2
- Match the 0.1 performance and coverage in the English to Welsh direction.
apertium-cy-en 0.5
- Properly capitalised sentences.
- Get the number for nouns from the appropriate place, e.g. sometimes from the determiner, sometimes from the noun.
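One possible reading of "the appropriate place" is: use the noun's own number when it is marked, and fall back to the determiner otherwise. The sketch below illustrates that fallback in plain Python, not in Apertium's transfer-rule language. The tag values `sg`, `pl` and `sp` (number not marked on this word), the noun-first preference, and the singular default are all assumptions made for illustration.

```python
from typing import Optional


def resolve_number(det_number: Optional[str], noun_number: Optional[str]) -> str:
    """Pick a number tag for a noun phrase from its determiner and noun.

    Assumed tag values: "sg", "pl", and "sp" (number not marked on this word).
    """
    # Prefer whichever word carries an unambiguous number, checking the noun first.
    for candidate in (noun_number, det_number):
        if candidate in ("sg", "pl"):
            return candidate
    # Neither word settles it: default to singular (an arbitrary choice in this sketch).
    return "sg"


# Number marked only on the determiner, as in English "these sheep".
print(resolve_number(det_number="pl", noun_number="sp"))
# Number marked only on the noun, as in English "the cats".
print(resolve_number(det_number="sp", noun_number="pl"))
```

A real rule set would also have to decide what to do when determiner and noun disagree; here the noun simply wins.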
apertium-cy-en 1.0
- Handling of gender and number in adjectives