Difference between revisions of "Welsh to English"

From Apertium
 
(22 intermediate revisions by the same user not shown)
Line 4: Line 4:
==Todo==


* Fix multiword verbs in bilingual dictionary -- and add ones non-existent in English dictionary to that dictionary
* <s>Fix multiword verbs in bilingual dictionary -- and add ones non-existent in English dictionary to that dictionary</s>
* Remove items which are in English dictionary but not Welsh/Bilingual
* Fix verb conjugation in the Welsh analyser
* <s>Fix verb conjugation in the Welsh analyser</s>
* Add restrictions in the bidix
* <s>Add restrictions in the bidix</s>
* Fix numbers
* <s>Add adverbs</s>
* <s>More thorough handling of contractions (i'ch, a'u, ...) &mdash; including preblank</s>
* <s>Add pre-verbal particles (basic functionality)</s>
* Add adjective macro to all chunks


==Roadmap==
Line 22: Line 27:
* To be able to identify ''who'' said ''what'' to ''who''.
* To be able to determine whether a particular item is interesting enough to be translated properly.
* Sentences of up to 5 words should be translated reasonably well in both directions.
* Sentences of up to 5 words should be translated reasonably well from Welsh to English.


;Report

* Coverage:
** Wikipedia (753,741 words): 85.5%
** PNAW (11,684,177 words): 94%
** BBC Newyddion (144,887 words): 91%

===apertium-cy-en 0.2===

* 0.1 performance and coverage for English to Welsh.

===apertium-cy-en 0.5===

* Properly capitalised sentences.
* Get the number for nouns from the appropriate place. e.g. sometimes from the det, sometimes from the noun.

===apertium-cy-en 1.0===

* Handling of gender and number in adjectives

== Tagger ==

===="i" as preposition====
::Ambiguity: <code>^i/i<pr>/prpers<prn><subj><p1><mf><sg>$ ^foderneiddio/moderneiddio<vblex><inf>/moderneiddio<vblex><prs><p3><sg>$</code>

Welsh "i" (to) is getting translated as "[f]i" (I, me).

if Welsh "i" occurs immediately after a verb marked as 1p sing
output pronoun 1p sing
otherwise output preposition "to"

===="o'n" - disambiguate "he" and "from"====

; mae fo'n mynd -> he isgoing
Fine (apart from the missing space).

Contrast:
; mae o'n mynd -> *is ofgoing (expected: he is going)

The elided form "o" is more common here than "fo". Following the pattern used for "i" above:

if Welsh "o" occurs immediately after a verb marked as 3p sing
output pronoun 3p sing
otherwise output preposition "of/from"

This is probably better than the earlier version I had here:

For Welsh pattern "verb + o"
output "verb + 3p sing pronoun"
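
The "i" and "o" rules above share one shape: prefer the pronoun reading when the previous word is a finite verb with the matching person, and fall back to the preposition otherwise. The following is only a rough Python sketch of that shape, not Apertium code. The function names are invented for illustration, and treating any tag that starts with <code>vb</code> as a verb tag is an assumption.

```python
import re


def parse_lu(lu):
    """Split a lexical unit such as '^i/i<pr>/prpers<prn>...$' into surface form and analyses."""
    surface, *analyses = lu.strip("^$").split("/")
    return surface, analyses


def tag_list(analysis):
    """Return the tags of one analysis, e.g. 'i<pr>' -> ['pr']."""
    return re.findall(r"<([^>]+)>", analysis)


def choose_reading(lu, prev_analysis, person):
    """Pick the pronoun reading after a singular verb of the given person, else the preposition.

    prev_analysis is the chosen analysis of the previous word (or None);
    person is 'p1' for the "i" rule and 'p3' for the "o" rule.
    """
    _, analyses = parse_lu(lu)
    prev = tag_list(prev_analysis) if prev_analysis else []
    # Assumption: verb tags start with "vb" (vblex and similar).
    after_verb = any(t.startswith("vb") for t in prev) and person in prev and "sg" in prev
    wanted = "prn" if after_verb else "pr"
    for analysis in analyses:
        if wanted in tag_list(analysis):
            return analysis
    return analyses[0]  # nothing matched: keep the first reading
```

Given the ambiguous unit for "i" shown earlier and a preceding analysis carrying <code><vblex></code>, <code><p1></code> and <code><sg></code>, the function returns the <code>prpers</code> reading; with no preceding verb it returns <code>i<pr></code>. The same call with <code>'p3'</code> covers the "o" case, assuming "o" is analysed analogously.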

== Transfer ==

<pre>
# Welsh
: Literal
@ Gloss (English)
</pre>

=== Welsh to English ===

==== Word order (VSO to SVO) ====
<pre>
# Genir pawb yn rhydd ac yn gydradd â 'i gilydd mewn urddas a hawliau.
: Be born everyone free and equal with each other in dignity and rights.

@ Everyone is born free and equal with each other in dignity and rights.
</pre>
==== Noun Noun -> Noun of Noun ====
<pre>
# Llywodraeth Cynulliad Cymru
: Government Assembly Wales ==> Government (of) Assembly (of) Wales

@ Welsh Assembly Government
</pre>

==== Noun Adjective -> Adjective Noun====
<pre>
# bachgen hapus
: boy happy

@ happy boy

# geneth bert
: girl pretty

@ pretty girl
</pre>
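
The reordering in the two examples above amounts to swapping each noun with the adjective that directly follows it. A rough Python sketch, assuming the input has already been tagged as <code>(word, tag)</code> pairs with the hypothetical tags <code>n</code> and <code>adj</code>; a real transfer rule would also have to deal with agreement and longer noun phrases:

```python
def reorder_noun_adj(tokens):
    """Swap every noun that is immediately followed by an adjective.

    tokens is a list of (word, tag) pairs; the tags "n" and "adj" are
    illustrative names, not taken from a real tagset.
    """
    out = list(tokens)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "n" and out[i + 1][1] == "adj":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out
```

Applied to the literal glosses, <code>[("boy", "n"), ("happy", "adj")]</code> comes back as <code>[("happy", "adj"), ("boy", "n")]</code>.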

====Compound prepositions====
<pre>
<donnek> I've also thought of another wrinkle - compound prepositions
<spectie> i will probably need to write a rule
<donnek> eg ar ben (on top of)
<donnek> lit on head
<spectie> we can do a similar thing with those
<spectie> for example:
<donnek> becomes ar fy mhen (on my head, literally) = on top of me
<donnek> ar ei ben, ar ei phen, ar ein pennau
<spectie> are there many of them
<donnek> maybe we don't need to think about them now, but just to flag them for later
<spectie> if there are not many it might be worth making them multiwords
<donnek> how do multiwords work
<spectie> there are a few ways
<spectie> depending on if one of the words inside the multiword inflects or not
<donnek> that would be the case here
<spectie> for example "take care"
<spectie> "i take care of", "you take care of", "he takes care of"
<spectie> but "take care" is treated as one verb
<donnek> ok
</pre>

====Attributive and predicative adjectives====

<pre>
<spectie> its a problem with attributive/predicative
<donnek> it's say something (which is) nice
<spectie> but in english we don't distinguish between the two (at least in terms of morphology)
<spectie> yes
<spectie> in afrikaans they have a -e for attributive (e.g. feodale stelsel -- feudal system)
<spectie> and "the system is feudal" - "die stelsel is feodaal"
<spectie> donnek, aye
<donnek> in Welsh the second would have yn before the adj
<donnek> so we may not need anything to mark attrib/pred
</pre>

* Dywedodd rhywbeth neis wrthi = He said something nice to her
* Mae'r peth yno yn neis = That thing is nice
: Mae yr peth yno yn neis
* Mae'n gar neis = It is a nice car
: Mae yn gar neis

<pre>
<donnek> at first glance, we may just need a rule for rhyw+thing
<donnek> rhyw = some
<donnek> rhywbeth (something), rhywfaint (somewhat), etc
<donnek> rhywle (somewhere)
</pre>

====Possession====

<pre>
Mae cath 'da Bwflw
Bod+p3.sg.pres cath gyda Bwflw
Be+p3.sg.pres cat with Beefalo
`Beefalo has a cat'
</pre>

;Apertium notes

We can probably deal with this in interchunk as follows

vbbod NP1 pr_gyda NP2

->

NP2 vbhave NP1
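
The rewrite above can be sketched in Python as a pattern match over a flat list of chunks. This only illustrates the reordering and is not an interchunk rule: the <code>(tag, text)</code> chunk representation and the placeholder lemma <code>"have"</code> are assumptions, and tense and agreement are left to a later stage.

```python
def possession_to_have(chunks):
    """Rewrite [vbbod, NP1, pr_gyda, NP2] as [NP2, vbhave, NP1].

    chunks is a list of (tag, text) pairs. Any other sequence is
    returned unchanged.
    """
    tags = [tag for tag, _ in chunks]
    if tags == ["vbbod", "NP", "pr_gyda", "NP"]:
        _, np1, _, np2 = chunks
        # "have" is a placeholder lemma; inflection is not handled here.
        return [np2, ("vbhave", "have"), np1]
    return chunks
```

For the example sentence, the chunks for "be", "cat", "with" and "Beefalo" come back in the order "Beefalo", "have", "cat".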

====The 'yn' particle====


As well as meaning 'in', 'yn' is used to form the present participle of a verb in Welsh. For example:

*dysgu = to learn
*yn dysgu = learning

The present tense is formed by combining 'yn' with the corresponding form of 'bod' (to be) as follows:

*Mae Beefalo yn gweithio = Beefalo is working/Beefalo works

Note: when following a vowel, yn is abbreviated to 'n, e.g.

*Mae Beefalo'n gweithio
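
Before analysis, the contracted form can be expanded back to a separate "yn". A small Python sketch of that preprocessing step, under stated assumptions: only the ASCII apostrophe is handled, only unaccented vowels are recognised, and every 'n after a vowel is taken to be "yn", which will be wrong wherever 'n stands for something else.

```python
import re

# Welsh vowels include w and y. Unaccented ASCII letters only (an assumption).
_N_AFTER_VOWEL = re.compile(r"(?<=[aeiouwyAEIOUWY])'n\b")


def expand_yn(text):
    """Rewrite the contraction 'n after a vowel as a separate word 'yn'."""
    return _N_AFTER_VOWEL.sub(" yn", text)
```

With this, "Mae Beefalo'n gweithio" becomes "Mae Beefalo yn gweithio", and text without the contraction is left untouched.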

====Genitive Phrases====


To form the indefinite genitive, place the possessed noun directly before the possessor, with no linking word.
For example, "Soldiers of Wales" would be "milwyr Cymru", literally "soldiers Wales".


Definite genitives use the same construction, with the addition of "y" (the) between the possessed noun and the possessor.
For example, "Beic y gath" = "The cat's bike", literally "bike the cat".
Note: feminine singular nouns undergo soft mutation after "y", which is why "cath" (cat) appears as "gath" here.
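
The two patterns can be sketched in Python over literal English glosses. This is a toy illustration only: it assumes the phrase has already been glossed word by word, that the article is glossed as "the", and that the possessor is a single word.

```python
def genitive_to_english(glosses):
    """Turn a literal gloss of a Welsh genitive phrase into English word order.

    ["bike", "the", "cat"]  -> definite genitive, rendered with 's
    ["soldiers", "Wales"]   -> indefinite genitive, rendered with "of"
    """
    if len(glosses) == 3 and glosses[1] == "the":
        possessed, _, possessor = glosses
        return f"the {possessor}'s {possessed}"
    if len(glosses) == 2:
        possessed, possessor = glosses
        return f"{possessed} of {possessor}"
    raise ValueError("not a two- or three-word genitive phrase")
```

The choice between "of" and 's here simply mirrors the two examples above; a real rule would need a principled way to choose.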




[[Category:Discussions]]
[[Category:Language pairs]]
[[Category:Welsh to English]]

Latest revision as of 13:24, 10 December 2010


Todo

  • Fix multiword verbs in bilingual dictionary -- and add ones non-existent in English dictionary to that dictionary
  • Remove items which are in English dictionary but not Welsh/Bilingual
  • Fix verb conjugation in the Welsh analyser
  • Add restrictions in the bidix
  • Fix numbers
  • Add adverbs
  • More thorough handling of contractions (i'ch, a'u, ...) — including preblank
  • Add pre-verbal particles (basic functionality)
  • Add adjective macro to all chunks

Roadmap

apertium-cy-en 0.1

  • 8,000 of the highest frequency words in each dictionary.
  • Rules dealing with basic verb tenses (past, present, future)
  • Basic word re-ordering for simple phrases.
Aims and uses
  • For a non-native speaker to be able to discern the topic of a general news item.
  • To be able to identify who said what to who.
  • To be able to determine whether a particular item is interesting enough to be translated properly.
  • Sentences of up to 5 words should be translated reasonably well from Welsh to English.
Report
  • Coverage:
    • Wikipedia (753,741 words): 85.5%
    • PNAW (11,684,177 words): 94%
    • BBC Newyddion (144,887 words): 91%

apertium-cy-en 0.2

  • 0.1 performance and coverage for English to Welsh.

apertium-cy-en 0.5

  • Properly capitalised sentences.
  • Get the number for nouns from the appropriate place. e.g. sometimes from the det, sometimes from the noun.

apertium-cy-en 1.0

  • Handling of gender and number in adjectives