Difference between revisions of "Welsh to English"
(27 intermediate revisions by the same user not shown)
Line 1: | Line 1:
{{TOCD}}

==Todo==

* <s>Fix multiword verbs in bilingual dictionary -- and add ones non-existent in English dictionary to that dictionary</s>
* Remove items which are in English dictionary but not Welsh/Bilingual
* <s>Fix verb conjugation in the Welsh analyser</s>
* <s>Add restrictions in the bidix</s>
* Fix numbers
* <s>Add adverbs</s>
* <s>More thorough handling of contractions (i'ch, a'u, ...), including preblank</s>
* <s>Add pre-verbal particles (basic functionality)</s>
* Add adjective macro to all chunks

==Roadmap==

Line 14: | Line 27:
* To be able to identify ''who'' said ''what'' to ''whom''.
* To be able to distinguish whether a particular item is interesting enough to be translated properly.
* Sentences of up to 5 words should be translated reasonably well from Welsh to English.

;Report

* Coverage:
** Wikipedia (753,741 words): 85.5%
** PNAW (11,684,177 words): 94%
** BBC Newyddion (144,887 words): 91%

===apertium-cy-en 0.2===

* 0.1 performance and coverage for English to Welsh.

===apertium-cy-en 0.5===

* Properly capitalised sentences.
* Get the number for nouns from the appropriate place, e.g. sometimes from the det, sometimes from the noun.

===apertium-cy-en 1.0===

* Handling of gender and number in adjectives

== Transfer ==

<pre>
# Welsh
: Literal
@ Gloss (English)
</pre>

=== Welsh to English ===

==== Word order (VSO to SVO) ====

<pre>
# Genir pawb yn rhydd ac yn gydradd â 'i gilydd mewn urddas a hawliau.
: Be born everyone free and equal with each other in dignity and rights.
@ Everyone is born free and equal with each other in dignity and rights.
</pre>

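The reordering in the example above can be sketched in a few lines of Python. This is only an illustration, not the actual Apertium transfer rule: the chunk tags `V`, `NP` and `ADJP` and the function name `vso_to_svo` are assumptions made for the sketch.

```python
def vso_to_svo(chunks):
    """Move the first noun phrase in front of a clause-initial verb.

    `chunks` is a list of (tag, text) pairs; the tags are hypothetical.
    """
    if len(chunks) >= 2 and chunks[0][0] == "V" and chunks[1][0] == "NP":
        # Verb-Subject-... becomes Subject-Verb-...
        return [chunks[1], chunks[0]] + list(chunks[2:])
    # Anything else is passed through unchanged.
    return list(chunks)


clause = [("V", "is born"), ("NP", "everyone"), ("ADJP", "free and equal")]
print(" ".join(text for _, text in vso_to_svo(clause)))
```

A real rule would also have to carry agreement information (person, number) from the subject to the verb, which this sketch ignores.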
==== Noun Noun -> Noun of Noun ====

<pre>
# Llywodraeth Cynulliad Cymru
: Government Assembly Wales ==> Government (of) Assembly (of) Wales
@ Welsh Assembly Government
</pre>

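As a rough sketch of the intermediate step shown above (inserting "of" between juxtaposed nouns), assuming the nouns have already been translated to English and identified as a chain; the function name `noun_chain_to_of` is hypothetical:

```python
def noun_chain_to_of(nouns):
    """Join a chain of juxtaposed nouns with 'of' (Noun Noun -> Noun of Noun)."""
    return " of ".join(nouns)


print(noun_chain_to_of(["Government", "Assembly", "Wales"]))
```

This only yields the literal "Government of Assembly of Wales"; getting to the idiomatic gloss "Welsh Assembly Government" would presumably need a dictionary entry for the whole phrase.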
==== Noun Adjective -> Adjective Noun ====

<pre>
# bachgen hapus
: boy happy
@ happy boy

# geneth bert
: girl pretty
@ pretty girl
</pre>

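A minimal sketch of this swap over part-of-speech tagged tokens. The tag names `n` and `adj` and the function name are assumptions for illustration; the real rule would live in the transfer file.

```python
def swap_noun_adjective(tagged):
    """Reorder each noun + adjective pair into adjective + noun.

    `tagged` is a list of (word, tag) pairs with hypothetical tags.
    """
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "n" and out[i + 1][1] == "adj":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    return out


print(swap_noun_adjective([("boy", "n"), ("happy", "adj")]))
```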
====Compound prepositions====

<pre>
<donnek> I've also thought of another wrinkle - compound prepositions
<spectie> i will probably need to write a rule
<donnek> eg ar ben (on top of)
<donnek> lit on head
<spectie> we can do a similar thing with those
<spectie> for example:
<donnek> becomes ar fy mhen (on my head, literally) = on top of me
<donnek> ar ei ben, ar ei phen, ar ein pennau
<spectie> are there many of them
<donnek> maybe we don't need to think about them now, but just to flag them for later
<spectie> if there are not many it might be worth making them multiwords
<donnek> how do multiwords work
<spectie> there are a few ways
<spectie> depending on if one of the words inside the multiword inflects or not
<donnek> that would be the case here
<spectie> for example "take care"
<spectie> "i take care of", "you take care of", "he takes care of"
<spectie> but "take care" is treated as one verb
<donnek> ok
</pre>

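One way to picture the multiword approach discussed above is a small lookup over the forms mentioned in the chat. This is a hedged sketch only: the table covers just those forms, the English pronouns for "ei ben" (him) and "ei phen" (her) follow from the mutation on the noun, and the function name is hypothetical.

```python
# Possessive + mutated form of "pen" -> English object pronoun.
# Only the forms quoted in the discussion are listed.
PRONOUN_OBJECTS = {
    ("fy", "mhen"): "me",
    ("ei", "ben"): "him",
    ("ei", "phen"): "her",
    ("ein", "pennau"): "us",
}


def translate_ar_ben(tokens):
    """Translate 'ar ben' and its inflected variants; return None if no match."""
    if len(tokens) == 3 and tokens[0] == "ar":
        pronoun = PRONOUN_OBJECTS.get((tokens[1], tokens[2]))
        if pronoun is not None:
            return "on top of " + pronoun
    if tokens[:2] == ["ar", "ben"]:
        return "on top of"
    return None


print(translate_ar_ben("ar fy mhen".split()))
```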
====Attributive and predicative adjectives====

<pre>
<spectie> its a problem with attributive/predicative
<donnek> it's say something (which is) nice
<spectie> but in english we don't distinguish between the two (at least in terms of morphology)
<spectie> yes
<spectie> in afrikaans they have a -e for attributive (e.g. feodale stelsel -- feudal system)
<spectie> and "the system is feudal" - "die stelsel is feodaal"
<spectie> donnek, aye
<donnek> in Welsh the second would have yn before the adj
<donnek> so we may not need anything to mark attrib/pred
</pre>

* Dywedodd rhywbeth neis wrthi = He said something nice to her
* Mae'r peth yno yn neis = That thing is nice
: Mae yr peth yno yn neis
* Mae'n gar neis = It is a nice car
: Mae yn gar neis

<pre>
<donnek> at first glance, we may just need a rule for rhyw+thing
<donnek> rhyw = some
<donnek> rhywbeth (something), rhywfaint (somewhat), etc
<donnek> rhywle (somewhere)
</pre>

====Possession====

<pre>
Mae cath 'da Bwflw
Bod+p3.sg.pres cath gyda Bwflw
Be+p3.sg.pres cat with Beefalo
`Beefalo has a cat'
</pre>

;Apertium notes

We can probably deal with this in interchunk as follows:

vbbod NP1 pr_gyda NP2
->
NP2 vbhave NP1

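The interchunk pattern above can be mimicked in Python to show the intended reordering. This is a sketch, not Apertium's interchunk format: chunks are modelled as (tag, text) pairs, the tag `NP` stands in for both NP1 and NP2, and the function name is an assumption.

```python
def possession_rule(chunks):
    """Rewrite 'vbbod NP1 pr_gyda NP2' as 'NP2 vbhave NP1'."""
    tags = [tag for tag, _ in chunks]
    if tags == ["vbbod", "NP", "pr_gyda", "NP"]:
        _, possessed, _, possessor = chunks
        # The verb text is a placeholder; agreement is left to later stages.
        return [possessor, ("vbhave", "have"), possessed]
    return list(chunks)


sentence = [("vbbod", "Mae"), ("NP", "cath"), ("pr_gyda", "'da"), ("NP", "Bwflw")]
print(possession_rule(sentence))
```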
====The 'yn' particle====

As well as meaning 'in', 'yn' is used to form the present participle of a verb in Welsh. For example:

*dysgu = to learn
*yn dysgu = learning

The present tense is formed by combining 'yn' with the corresponding form of 'bod' (to be) as follows:

*Mae Beefalo yn gweithio = Beefalo is working/Beefalo works

Note: when following a vowel, 'yn' is abbreviated to 'n, e.g.

*Mae Beefalo'n gweithio

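The contraction noted above can be sketched as follows. The sketch assumes whitespace-tokenised input and treats every 'yn' as the particle (it does not try to tell it apart from the preposition 'in'); the vowel set and function name are assumptions for illustration.

```python
# Welsh vowels, including circumflexed forms.
VOWELS = set("aeiouwyâêîôûŵŷ")


def contract_yn(tokens):
    """Attach 'yn' to the preceding word as 'n when that word ends in a vowel."""
    out = []
    for tok in tokens:
        if tok == "yn" and out and out[-1][-1].lower() in VOWELS:
            out[-1] += "'n"
        else:
            out.append(tok)
    return out


print(" ".join(contract_yn("Mae Beefalo yn gweithio".split())))
```

For analysis (Welsh to English) the inverse step is what matters: splitting "Beefalo'n" back into "Beefalo" and "yn" before transfer.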
====Genitive Phrases====

To form the indefinite genitive, the possessed noun is simply followed by the possessor: <possessed> <possessor>.

For example, "Soldiers of Wales" would be "milwyr Cymru", literally "soldiers Wales".

Definite genitives are formed with a similar construction, with the addition of the article 'y' between the possessed noun and the possessor.

For example, "Beic y gath" = "The cat's bike", literally "bike the cat".

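A small sketch of how the two genitive patterns might map onto English output. It works on English glosses only, the choice of "of" for the article-less pattern and "'s" for the definite one simply mirrors the two examples above, and the function name and `definite` flag are hypothetical.

```python
def genitive_to_english(possessed, possessor, definite=False):
    """Render '<possessed> [y] <possessor>' in English.

    Both arguments are English glosses of the Welsh nouns.
    """
    if definite:
        # Beic y gath -> the cat's bike
        return f"the {possessor}'s {possessed}"
    # milwyr Cymru -> soldiers of Wales
    return f"{possessed} of {possessor}"


print(genitive_to_english("soldiers", "Wales"))
print(genitive_to_english("bike", "cat", definite=True))
```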
Note: feminine singular nouns undergo soft mutation after the article 'y' (e.g. cath -> y gath).

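The mutation mentioned in the note can be sketched as a lookup on the initial consonant. This is an illustrative approximation, not the analyser's implementation: it assumes lower-case input, assumes the caller already knows the noun is feminine singular, and follows the usual description in which 'll' and 'rh' stay unchanged after the article. The names are hypothetical.

```python
# Initial digraphs that are left alone (either they never soften,
# or, for 'll' and 'rh', they do not soften after the article).
UNCHANGED_DIGRAPHS = ("ch", "dd", "ff", "ng", "ll", "ph", "rh", "th")

# Soft mutation of single initial consonants; 'g' is dropped.
SOFT = {"p": "b", "t": "d", "c": "g", "b": "f", "d": "dd", "g": "", "m": "f"}


def mutate_after_article(noun, feminine_singular):
    """Apply soft mutation to a feminine singular noun following the article."""
    if not noun or not feminine_singular or noun[:2] in UNCHANGED_DIGRAPHS:
        return noun
    if noun[0] in SOFT:
        return SOFT[noun[0]] + noun[1:]
    return noun


print(mutate_after_article("cath", feminine_singular=True))
```

For Welsh to English the analyser has to undo this, so that "gath" after the article is still looked up as "cath".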
[[Category:Discussions]]
[[Category:Welsh to English]]
Latest revision as of 13:24, 10 December 2010
Todo
- <s>Fix multiword verbs in bilingual dictionary -- and add ones non-existent in English dictionary to that dictionary</s>
- Remove items which are in English dictionary but not Welsh/Bilingual
- <s>Fix verb conjugation in the Welsh analyser</s>
- <s>Add restrictions in the bidix</s>
- Fix numbers
- <s>Add adverbs</s>
- <s>More thorough handling of contractions (i'ch, a'u, ...), including preblank</s>
- <s>Add pre-verbal particles (basic functionality)</s>
- Add adjective macro to all chunks
Roadmap
apertium-cy-en 0.1
- 8,000 of the highest frequency words in each dictionary.
- Rules dealing with basic verb tenses (past, present, future)
- Basic word re-ordering for simple phrases.
- Aims and uses
- For a non-native speaker to be able to discern the topic of a general news item.
- To be able to identify who said what to whom.
- To be able to distinguish whether a particular item is interesting enough to be translated properly.
- Sentences of up to 5 words should be translated reasonably well from Welsh to English.
- Report
- Coverage:
- Wikipedia (753,741 words): 85.5%
- PNAW (11,684,177 words): 94%
- BBC Newyddion (144,887 words): 91%
apertium-cy-en 0.2
- 0.1 performance and coverage for English to Welsh.
apertium-cy-en 0.5
- Properly capitalised sentences.
- Get the number for nouns from the appropriate place. e.g. sometimes from the det, sometimes from the noun.
apertium-cy-en 1.0
- Handling of gender and number in adjectives