Difference between revisions of "Welsh to English"

From Apertium

Revision as of 16:29, 25 July 2008


Todo

  • Fix multiword verbs in the bilingual dictionary, and add any that are missing from the English dictionary to it
  • Remove items which are in the English dictionary but not in the Welsh or bilingual dictionaries
  • Fix verb conjugation in the Welsh analyser
  • Add restrictions in the bidix
  • Fix numbers
  • Add adverbs
  • More thorough handling of contractions (i'ch, a'u, ...) — including preblank
  • Add pre-verbal particles (basic functionality)
  • Add adjective macro to all chunks

Roadmap

apertium-cy-en 0.1

  • 8,000 of the highest frequency words in each dictionary.
  • Rules dealing with basic verb tenses (past, present, future)
  • Basic word re-ordering for simple phrases.
Aims and uses
  • For a non-native speaker to be able to discern the topic of a general news item.
  • To be able to identify who said what to whom.
  • To be able to determine whether a particular item is interesting enough to be translated properly.
  • Sentences of up to 5 words should be translated reasonably well from Welsh to English.
Report
  • Coverage:
    • Wikipedia (753,741 words): 85.5%
    • PNAW (11,684,177 words): 94%
    • BBC Newyddion (144,887 words): 91%
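
As a rough illustration of how a coverage figure like those above can be computed: the sketch below counts the lexical units in analyser output that received at least one analysis. It assumes the usual Apertium stream format, in which an unknown word appears as ^word/*word$. The function name and the example tags are hypothetical, and this is not necessarily how the figures in this report were produced.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (simplified: escaped characters are not handled in this sketch)
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def coverage(analysed: str) -> float:
    """Return the percentage of lexical units that have a known analysis."""
    units = LEXICAL_UNIT.findall(analysed)
    if not units:
        return 0.0
    # Unknown words are marked with '*' at the start of their only analysis.
    known = sum(1 for unit in units if "/*" not in unit)
    return 100.0 * known / len(units)


if __name__ == "__main__":
    # Hypothetical analyser output: one known word, one unknown word.
    sample = "^mae/bod<vbser><pri><p3><sg>$ ^xyz/*xyz$"
    print(f"{coverage(sample):.1f}%")
```

Run over a whole corpus, the same ratio gives a single percentage per corpus, comparable in form to the figures listed above.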

apertium-cy-en 0.2

  • Match the 0.1 performance and coverage in the English to Welsh direction.

apertium-cy-en 0.5

  • Properly capitalised sentences.

apertium-cy-en 1.0

Tagger

The tagger needs to be retrained to take into account new parts of speech, e.g. "relative pronoun", "adverb".

"i" as preposition

Ambiguity: ^i/i<pr>/prpers<prn><subj><p1><mf><sg>$ ^foderneiddio/moderneiddio<vblex><inf>/moderneiddio<vblex><prs><p3><sg>$

Welsh "i" (to) is getting translated as "[f]i" (I, me).

if Welsh "i" occurs immediately after a verb marked as 1p sing
output pronoun 1p sing
otherwise output preposition "to"
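
The rule above can be sketched in Python as a check on the analysis of the preceding word. This is only an illustration of the logic, not the Apertium transfer rule itself: the function names are hypothetical, and the verb analysis used in the example is assumed for demonstration. The two candidate analyses for "i" are the ones shown in the ambiguity above.

```python
from typing import List, Optional


def is_1sg_verb(analysis: Optional[str]) -> bool:
    """True if the analysis looks like a verb marked first person singular."""
    if analysis is None:
        return False
    # Assumption: verb tags start with "vb" (vblex, vbser, ...).
    return "<vb" in analysis and "<p1>" in analysis and "<sg>" in analysis


def choose_i(prev_analysis: Optional[str], candidates: List[str]) -> str:
    """Pick the pronoun reading of "i" after a 1p sing verb, else the preposition."""
    wanted = "<prn>" if is_1sg_verb(prev_analysis) else "<pr>"
    for candidate in candidates:
        if wanted in candidate:
            return candidate
    # Fall back to the first analysis if the wanted reading is not offered.
    return candidates[0]


if __name__ == "__main__":
    candidates = ["i<pr>", "prpers<prn><subj><p1><mf><sg>"]
    # Hypothetical preceding verb, marked 1p sing: pronoun reading is chosen.
    print(choose_i("gweld<vblex><past><p1><sg>", candidates))
    # No 1p sing verb before "i": preposition reading is chosen.
    print(choose_i(None, candidates))
```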

"o'n" - disambiguate "he" and "from"

mae fo'n mynd -> he isgoing

Fine (apart from the missing space).

Contrast:

mae o'n mynd -> *is ofgoing - he is going

The elided form "o" is more common here than "fo". Following the pattern in the "i" as preposition section above:

if Welsh "o" occurs immediately after a verb marked as 3p sing
output pronoun 3p sing
otherwise output preposition "of/from"

This is probably better than the earlier version I had here:

For Welsh pattern "verb + o"
output "verb + 3p sing pronoun"
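
The two versions of the rule can be compared with a small Python sketch. The function names and the example analyses are hypothetical, chosen only to show where the two versions differ. One plausible reason the stricter version is safer: the earlier version would also fire after a non-finite verb followed by the preposition "o".

```python
from typing import Optional


def is_verb(analysis: Optional[str]) -> bool:
    """True if the analysis carries a verb tag (assumed to start with "vb")."""
    return analysis is not None and "<vb" in analysis


def is_3sg_verb(analysis: Optional[str]) -> bool:
    """True if the analysis is a verb marked third person singular."""
    return is_verb(analysis) and "<p3>" in analysis and "<sg>" in analysis


def o_is_pronoun(prev_analysis: Optional[str]) -> bool:
    """Current rule: "o" is a pronoun only right after a 3p sing verb."""
    return is_3sg_verb(prev_analysis)


def o_is_pronoun_earlier(prev_analysis: Optional[str]) -> bool:
    """Earlier rule: "o" is a pronoun after any verb."""
    return is_verb(prev_analysis)


if __name__ == "__main__":
    finite = "bod<vbser><pri><p3><sg>"   # assumed analysis for "mae"
    infinitive = "mynd<vblex><inf>"      # assumed analysis for "mynd"
    for prev in (finite, infinitive):
        print(prev, o_is_pronoun(prev), o_is_pronoun_earlier(prev))
```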

Preferential choice between verbforms

bydd y lamp yn rhoi golau -> *are the lamp giving light - the lamp will be giving light
(and presumably we could massage this into "the lamp will give light" later, since that would be the more natural English equivalent)

Two things stand out here. The most important is that the tagger chooses the less frequent imperative over the future reading of the verb. Presumably this means that the subject shift cannot take place. But even with the imperative reading, the imperative 2p sing information gets lost between interchunk and postchunk and is replaced with a generic (?) present, which gets output as "are". Odd.

(I'm assuming that "bydd" would get output as "will be", since that would be the correct English tense.)
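
The preference described here (future over imperative) can be sketched as a fixed ordering over candidate analyses. This is a deliberate simplification: a retrained tagger would learn the preference statistically rather than apply a hard-coded order. The function name is hypothetical, and the tags in the example (fti for future, imp for imperative) and the analyses for "bydd" are assumptions made for illustration.

```python
from typing import List, Sequence


def prefer_analysis(analyses: List[str],
                    preference: Sequence[str] = ("fti", "imp")) -> str:
    """Return the first analysis carrying the most preferred tag available."""
    for tag in preference:
        for analysis in analyses:
            if f"<{tag}>" in analysis:
                return analysis
    # None of the preferred tags is present: keep the first analysis.
    return analyses[0]


if __name__ == "__main__":
    # Assumed candidate analyses for "bydd".
    bydd = ["bod<vbser><imp><p2><sg>", "bod<vbser><fti><p3><sg>"]
    print(prefer_analysis(bydd))
```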

Transfer