Difference between revisions of "Welsh to English"

From Apertium
 
(43 intermediate revisions by 3 users not shown)
{{TOCD}}


<pre>
# Welsh
: Literal
@ Gloss (English)
</pre>


== Transfer ==
==Todo==


* <s>Fix multiword verbs in the bilingual dictionary, and add those missing from the English dictionary to it</s>
=== Welsh to English ===
* Remove items that are in the English dictionary but not in the Welsh or bilingual dictionaries
* <s>Fix verb conjugation in the Welsh analyser</s>
* <s>Add restrictions in the bidix</s>
* Fix numbers
* <s>Add adverbs</s>
* <s>More thorough handling of contractions (i'ch, a'u, ...) &mdash; including preblank</s>
* <s>Add pre-verbal particles (basic functionality)</s>
* Add adjective macro to all chunks


==Roadmap==
==== Word order (VSO to SVO) ====
<pre>
# Genir pawb yn rhydd ac yn gydradd â 'i gilydd mewn urddas a hawliau.
: Be born everyone free and equal with each other in dignity and rights.


===apertium-cy-en 0.1===
@ Everyone is born free and equal with each other in dignity and rights.
</pre>
==== Noun Noun -> Noun of Noun ====
<pre>
# Llywodraeth Cynulliad Cymru
: Government Assembly Wales ==> Government (of) Assembly (of) Wales


* 8,000 of the highest frequency words in each dictionary.
@ Welsh Assembly Government
* Rules dealing with basic verb tenses (past, present, future)
</pre>
* Basic word re-ordering for simple phrases.


;Aims and uses
==== Noun Adjective -> Adjective Noun====
<pre>
# bachgen hapus
: boy happy


* For a non-native speaker to be able to discern the topic of a general news item.
@ happy boy
* To be able to identify ''who'' said ''what'' to ''whom''.
* To be able to tell whether a particular item is interesting enough to be translated properly.
* Sentences of up to 5 words should be translated reasonably well from Welsh to English.


;Report
# geneth bert
: girl pretty


* Coverage:
@ pretty girl
** Wikipedia (753,741 words): 85.5%
</pre>
** PNAW (11,684,177 words): 94%
** BBC Newyddion (144,887 words): 91%


====Compound prepositions====
===apertium-cy-en 0.2===
<pre>
<donnek> I've also thought of another wrinkle - compound prepositions
<spectie> i will probably need to write a rule
<donnek> eg ar ben (on top of)
<donnek> lit on head
<spectie> we can do a similar thing with those
<spectie> for example:
<donnek> becomes ar fy mhen (on my head, literally) = on top of me
<donnek> ar ei ben, ar ei phen, ar ein pennau
<spectie> are there many of them
<donnek> maybe we don't need to think about them now, but just to flag them for later
<spectie> if there are not many it might be worth making them multiwords
<donnek> how do multiwords work
<spectie> there are a few ways
<spectie> depending on if one of the words inside the multiword inflects or not
<donnek> that would be the case here
<spectie> for example "take care"
<spectie> "i take care of", "you take care of", "he takes care of"
<spectie> but "take care" is treated as one verb
<donnek> ok
</pre>


* Reach the 0.1 level of performance and coverage for English to Welsh.
====Attributive and predicative adjectives====


===apertium-cy-en 0.5===
<pre>
<spectie> its a problem with attributive/predicative
<donnek> it's say something (which is) nice
<spectie> but in english we don't distinguish between the two (at least in terms of morphology)
<spectie> yes
<spectie> in afrikaans they have a -e for attributive (e.g. feodale stelsel -- feudal system)
<spectie> and "the system is feudal" - "die stelsel is feodal"
<spectie> donnek, aye
<donnek> in Welsh the second would have yn before the adj
<donnek> so we may not need anything to mark attrib/pred


* Properly capitalised sentences.
* Dywedodd rhywbeth neis wrthi = He said something nice to her
* Get the number for nouns from the appropriate place, e.g. sometimes from the determiner, sometimes from the noun.
* Mae'r peth yno yn neis = That thing is nice

* Mae'n gar neis = It is a nice car
===apertium-cy-en 1.0===

* Handling of gender and number in adjectives




<donnek> at first glance, we may just need a rule for rhyw+thing
<donnek> rhyw=some
<donnek> rhywbeth (something), rhywfaint (somewhat), etc
<donnek> rhywle (somewhere)
</pre>


[[Category:Discussions]]
[[Category:Welsh to English]]

Latest revision as of 13:24, 10 December 2010


Todo

  • Fix multiword verbs in the bilingual dictionary, and add those missing from the English dictionary to it
  • Remove items that are in the English dictionary but not in the Welsh or bilingual dictionaries
  • Fix verb conjugation in the Welsh analyser
  • Add restrictions in the bidix
  • Fix numbers
  • Add adverbs
  • More thorough handling of contractions (i'ch, a'u, ...) — including preblank
  • Add pre-verbal particles (basic functionality)
  • Add adjective macro to all chunks

Roadmap

apertium-cy-en 0.1

  • 8,000 of the highest frequency words in each dictionary.
  • Rules dealing with basic verb tenses (past, present, future)
  • Basic word re-ordering for simple phrases.
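As a rough illustration of the simplest re-ordering case noted on this page (bachgen hapus, literally "boy happy", becoming "happy boy"), the sketch below swaps each noun that is directly followed by an adjective. The tag names `n` and `adj` and the helper `reorder_noun_adj` are assumptions made for this example only; the real apertium-cy-en rules live in Apertium's transfer files, not in Python.

```python
from typing import List, Tuple

# A token is a (gloss, part-of-speech tag) pair; the tags are hypothetical.
Token = Tuple[str, str]


def reorder_noun_adj(tokens: List[Token]) -> List[Token]:
    """Swap every noun that is immediately followed by an adjective."""
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        is_noun_adj = (
            i + 1 < len(tokens)
            and tokens[i][1] == "n"
            and tokens[i + 1][1] == "adj"
        )
        if is_noun_adj:
            # Emit the adjective first, then the noun, and skip both.
            out.extend([tokens[i + 1], tokens[i]])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


glossed = [("boy", "n"), ("happy", "adj")]
print(" ".join(gloss for gloss, _ in reorder_noun_adj(glossed)))
```

A pairwise swap like this only covers the two-word pattern; longer phrases and the VSO to SVO change would need their own rules.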
Aims and uses
  • For a non-native speaker to be able to discern the topic of a general news item.
  • To be able to identify who said what to whom.
  • To be able to tell whether a particular item is interesting enough to be translated properly.
  • Sentences of up to 5 words should be translated reasonably well from Welsh to English.
Report
  • Coverage:
    • Wikipedia (753,741 words): 85.5%
    • PNAW (11,684,177 words): 94%
    • BBC Newyddion (144,887 words): 91%
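The page does not say how these figures were measured. One common reading, assumed here, is naive coverage: the share of corpus tokens that the analyser recognises at all. The sketch below computes that ratio over a toy token list; `naive_coverage`, the sample sentence, and the tiny lexicon are all hypothetical stand-ins for a real corpus and the real Welsh analyser.

```python
from typing import Iterable, Set


def naive_coverage(tokens: Iterable[str], known_forms: Set[str]) -> float:
    """Return the fraction of tokens found in the set of known surface forms."""
    total = 0
    known = 0
    for token in tokens:
        total += 1
        # Case-insensitive lookup; a real analyser handles case itself.
        if token.lower() in known_forms:
            known += 1
    # An empty corpus has no meaningful coverage; report 0.0 rather than fail.
    return known / total if total else 0.0


# Toy data: one word ("darllen") is deliberately missing from the lexicon.
corpus = "mae 'r bachgen hapus yn darllen llyfr".split()
lexicon = {"mae", "'r", "bachgen", "hapus", "yn", "llyfr"}
print(f"{naive_coverage(corpus, lexicon):.1%}")
```

Counting tokens rather than distinct word forms means frequent words dominate the figure, which is why a dictionary of high-frequency words can reach high coverage quickly.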

apertium-cy-en 0.2

  • Reach the 0.1 level of performance and coverage for English to Welsh.

apertium-cy-en 0.5

  • Properly capitalised sentences.
  • Get the number for nouns from the appropriate place, e.g. sometimes from the determiner, sometimes from the noun.
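Capitalisation becomes an issue once words are re-ordered, because the word that opened the Welsh sentence may no longer open the English one. A minimal sketch of one way to handle this, assuming the position of the formerly sentence-initial word is known after re-ordering; `fix_capitalisation` and its `moved_from_front` parameter are hypothetical, and the sketch ignores proper nouns, which should keep their capital.

```python
from typing import List


def fix_capitalisation(words: List[str], moved_from_front: int) -> List[str]:
    """Capitalise the new first word and lowercase the old one.

    moved_from_front is the index, in the re-ordered output, of the word
    that was sentence-initial in the source.
    """
    out = list(words)
    if not out:
        return out
    if 0 < moved_from_front < len(out):
        old_first = out[moved_from_front]
        # Lowercase only the first character, leaving the rest untouched.
        out[moved_from_front] = old_first[:1].lower() + old_first[1:]
    # Whatever now stands first gets the sentence-initial capital.
    out[0] = out[0][:1].upper() + out[0][1:]
    return out


reordered = ["everyone", "Is", "born", "free"]
print(" ".join(fix_capitalisation(reordered, moved_from_front=1)))
```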

apertium-cy-en 1.0

  • Handling of gender and number in adjectives