Difference between revisions of "Welsh to English"

From Apertium
 
(40 intermediate revisions by 3 users not shown)
{{TOCD}}

<pre>
# Welsh
: Literal
@ Gloss (English)
</pre>

== Transfer ==

=== Welsh to English ===

==== Word order (VSO to SVO) ====
<pre>
# Genir pawb yn rhydd ac yn gydradd â 'i gilydd mewn urddas a hawliau.
: Be born everyone free and equal with each other in dignity and rights.

@ Everyone is born free and equal with each other in dignity and rights.
</pre>

==== Noun Noun -> Noun of Noun ====
<pre>
# Llywodraeth Cynulliad Cymru
: Government Assembly Wales ==> Government (of) Assembly (of) Wales

@ Welsh Assembly Government
</pre>

==== Noun Adjective -> Adjective Noun ====
<pre>
# bachgen hapus
: boy happy

@ happy boy

# geneth bert
: girl pretty

@ pretty girl
</pre>

==== Compound prepositions ====
<pre>
<donnek> I've also thought of another wrinkle - compound prepositions
<spectie> i will probably need to write a rule
<donnek> eg ar ben (on top of)
<donnek> lit on head
<spectie> we can do a similar thing with those
<spectie> for example:
<donnek> becomes ar fy mhen (on my head, literally) = on top of me
<donnek> ar ei ben, ar ei phen, ar ein pennau
<spectie> are there many of them
<donnek> maybe we don't need to think about them now, but just to flag them for later
<spectie> if there are not many it might be worth making them multiwords
<donnek> how do multiwords work
<spectie> there are a few ways
<spectie> depending on if one of the words inside the multiword inflects or not
<donnek> that would be the case here
<spectie> for example "take care"
<spectie> "i take care of", "you take care of", "he takes care of"
<spectie> but "take care" is treated as one verb
<donnek> ok
</pre>

==== Attributive and predicative adjectives ====
<pre>
<spectie> its a problem with attributive/predicative
<donnek> it's say something (which is) nice
<spectie> but in english we don't distinguish between the two (at least in terms of morphology)
<spectie> yes
<spectie> in afrikaans they have a -e for attributive (e.g. feodale stelsel -- feudal system)
<spectie> and "the system is feudal" - "die stelsel is feodaal"
<spectie> donnek, aye
<donnek> in Welsh the second would have yn before the adj
<donnek> so we may not need anything to mark attrib/pred

* Dywedodd rhywbeth neis wrthi = He said something nice to her
* Mae'r peth yno yn neis = That thing is nice
* Mae'n gar neis = It is a nice car

<donnek> at first glance, we may just need a rule for rhyw+thing
<donnek> rhyw = some
<donnek> rhywbeth (something), rhywfaint (somewhat), etc
<donnek> rhywle (somewhere)
</pre>

[[Category:Discussions]]
[[Category:Welsh to English]]

Latest revision as of 13:24, 10 December 2010


Todo

  • Fix multiword verbs in the bilingual dictionary, and add any that are missing from the English dictionary (done)
  • Remove items which are in the English dictionary but not in the Welsh or bilingual dictionaries
  • Fix verb conjugation in the Welsh analyser (done)
  • Add restrictions in the bidix (done)
  • Fix numbers
  • Add adverbs (done)
  • More thorough handling of contractions (i'ch, a'u, ...), including preblank (done)
  • Add pre-verbal particles (basic functionality) (done)
  • Add adjective macro to all chunks

Roadmap

apertium-cy-en 0.1

  • 8,000 of the highest frequency words in each dictionary.
  • Rules dealing with basic verb tenses (past, present, future)
  • Basic word re-ordering for simple phrases.
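
As a rough illustration of the simplest re-ordering case (Welsh noun followed by adjective becoming English adjective followed by noun, as in "bachgen hapus" = "happy boy"), here is a minimal Python sketch. It is not how the package implements the rule: Apertium transfer rules are written in XML, and the function name `reorder_noun_adj`, the token representation, and the tag strings `n` and `adj` are assumptions made for this example only.

```python
from typing import List, Tuple

# A token is (English lemma, part-of-speech tag). The tag names are illustrative.
Token = Tuple[str, str]


def reorder_noun_adj(tokens: List[Token]) -> List[Token]:
    """Swap every noun that is immediately followed by an adjective."""
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        is_pair = (
            i + 1 < len(tokens)
            and tokens[i][1] == "n"
            and tokens[i + 1][1] == "adj"
        )
        if is_pair:
            # Emit the adjective first, then the noun, and skip both.
            out.extend([tokens[i + 1], tokens[i]])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


# "bachgen hapus" after lexical transfer: boy (noun), happy (adjective)
phrase = [("boy", "n"), ("happy", "adj")]
print(" ".join(word for word, _ in reorder_noun_adj(phrase)))  # happy boy
```

The sketch only handles a single adjective directly after the noun; longer noun phrases would need more patterns.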
Aims and uses
  • For a non-native speaker to be able to discern the topic of a general news item.
  • To be able to identify who said what to whom.
  • To be able to determine whether a particular item is interesting enough to be translated properly.
  • Sentences of up to 5 words should be translated reasonably well from Welsh to English.
Report
  • Coverage:
    • Wikipedia (753,741 words): 85.5%
    • PNAW (11,684,177 words): 94%
    • BBC Newyddion (144,887 words): 91%
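
The page does not say how these figures were measured. One common way to estimate coverage is to run the corpus through the morphological analyser and count the share of tokens that receive at least one analysis. The sketch below assumes analyser output in the Apertium stream format, where each lexical unit looks like `^surface/analysis$` and an unknown word is marked with a leading `*`. The function name `coverage` and the sample tags are made up for illustration, and escaped characters in real streams are ignored.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed_text: str) -> float:
    """Return the fraction of lexical units that have at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # Unknown words are emitted as ^word/*word$, so the analysis starts with "*".
    known = sum(1 for _, analyses in units if not analyses.startswith("*"))
    return known / len(units)


# Hypothetical analyser output: two known words and one unknown word.
sample = "^Mae/bod<vbser><pri><p3><sg>$ ^ci/ci<n><m><sg>$ ^xyzzy/*xyzzy$"
print(f"{coverage(sample):.1%}")  # 66.7%
```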

apertium-cy-en 0.2

  • Reach the 0.1 level of performance and coverage for English to Welsh.

apertium-cy-en 0.5

  • Properly capitalised sentences.
  • Get the number for nouns from the appropriate place, e.g. sometimes from the determiner, sometimes from the noun.

apertium-cy-en 1.0

  • Handling of gender and number in adjectives