English and Esperanto

From Apertium

Esperantists, please see Peto al esperantistoj (Request to Esperantists).


Intros to Esperanto

A good overview of the grammar may be http://en.wikiversity.org/wiki/Rules_of_Esperanto_grammar (or http://donh.best.vwh.net/Esperanto/rules.html).

For the affixes, see http://esperanto.davidgsimpson.com/eo-affixes.html (short) or http://steve-and-pattie.com/esperantujo/grparafx.html (longer).

Tenses are explained in http://en.wikipedia.org/wiki/Esperanto_grammar#Verbs
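Esperanto inflection is regular: every noun, adjective and verb takes the same endings. Because of this, many target forms in the transfer table below can be built mechanically from a root plus an ending.

The following is a minimal Python sketch of that idea. It is not how Apertium's generator works, since Apertium generates from a compiled morphological dictionary. Some names are assumptions:

- The function names are illustrative.
- The `fut`/`cond` labels are illustrative.
- The `sg`/`pl`, `nom`/`acc` and `inf`/`pres`/`past`/`imp` labels loosely follow the table's tags.

```python
"""Minimal sketch of regular Esperanto inflection (root + ending).

Tag labels are illustrative; fut/cond are assumptions, the rest loosely
follow the transfer table's tags.
"""

VERB_ENDINGS = {
    "inf": "i",    # anonci  'to advertise'
    "pres": "as",  # anoncas 'advertise(s)'
    "past": "is",  # anoncis 'advertised'
    "fut": "os",   # anoncos 'will advertise'
    "cond": "us",  # anoncus 'would advertise'
    "imp": "u",    # anoncu  imperative
}

# Participle infixes: (tense, voice) -> infix; the adjective ending -a follows.
PARTICIPLE_INFIXES = {
    ("pres", "act"): "ant", ("past", "act"): "int", ("fut", "act"): "ont",
    ("pres", "pas"): "at", ("past", "pas"): "it", ("fut", "pas"): "ot",
}


def _decline(stem: str, number: str, case: str) -> str:
    """Add the plural -j and accusative -n shared by nouns and adjectives."""
    if number == "pl":
        stem += "j"
    if case == "acc":
        stem += "n"
    return stem


def noun(root: str, number: str = "sg", case: str = "nom") -> str:
    return _decline(root + "o", number, case)


def adj(root: str, number: str = "sg", case: str = "nom") -> str:
    return _decline(root + "a", number, case)


def verb(root: str, tense: str) -> str:
    return root + VERB_ENDINGS[tense]


def participle(root: str, tense: str, voice: str = "act") -> str:
    return root + PARTICIPLE_INFIXES[(tense, voice)] + "a"


print(noun("cent", "pl"))                  # centoj
print(adj("grand", "pl", "acc"))           # grandajn
print(verb("anonc", "past"))               # anoncis
print(participle("anonc", "pres"))         # anoncanta
print(participle("anonc", "past", "pas"))  # anoncita
```

Pronouns, correlatives and words like "ambaŭ" do not follow these patterns and would need their own entries.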

Wordlists

http://freepages.rootsweb.ancestry.com/~wakefield/translations/engesp.html

http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/d/dm/dmdictionary/EngEsp.txt

Transfer rules English->Esperanto

en pattern | eo pattern | en | eo | status
adj | adj.sg.nom | aggressive | agresema | OK
adj.pos | adj.sg.nom | his | lia | OK
adj.sint | adj.sg.nom | big | granda | OK
adj.sint.comp | pli<preadv> + adj.sg.nom | bigger | pli granda | OK
adj.sint.sup | plej<preadv> + adj.sg.nom | biggest | plej granda | OK
adv | adv | after | poste | OK
adv.itg | adv.itg | how | kiel | OK
cnjadv | cnjadv | after | post kiam | OK
cnjcoo | cnjcoo | and | kaj | OK
cnjsub | cnjsub | that | ke | OK
det.def.pl | det.def.pl.nom | the | la | OK
det.def.sg | det.def.sg.nom | the | la | OK
det.def.sp | det.def.sp.nom | the | la | OK
det.dem.pl | det.dem.pl.nom | these | ĉi tiuj | OK
det.dem.sg | det.dem.sg.nom | this | ĉi tiu | OK
det.ind.pl | det.ind.pl.nom | both | ambaŭ | OK
det.ind.sg | det.ind.sg.nom | another | alia | OK
det.ind.sp | det.ind.sp.nom | any | ajna | OK
det.qnt.pl | det.qnt.pl.nom | few | malmultaj | OK
det.qnt.sg | det.qnt.sg.nom | less | la malpli da | OK
det.qnt.sp | det.qnt.sp.nom | enough | sufiĉe da | OK
ij | ij | goodbye | adiaŭ | OK
n.acr.sg | n.sg.nom | TV | televido | OK
np.al.sg | np.al.sg.nom | GCompris | GCompris | OK
np.ant.f.sg | np.ant.f.sg.nom | April | April | OK
np.ant.m.sg | np.ant.m.sg.nom | Robert | Robert | OK
np.cog.sg | np.cog.sg.nom | Stevenson | Stevenson | OK
np.cog.pl | np.cog.pl.nom | Stevensons | Stevenson(oj?) | ?
n.sg | n.sg.nom | hundred | cento | OK
n.pl | n.pl.nom | hundreds | centoj | OK
np.top.sg | np.top.sg.nom | Afghanistan | Afganio | OK
num.sg | num.sg | one | unu | OK
num.sp | num.sp | hundred | cent | OK
pr | pr | after | post | OK
preadv | preadv | any | tre | OK
predet.sp | sp | all (cars) | ĉiu | OK - all cars
prn.itg.sp | prn.itg.sg.nom | what | kio | OK
prn.obj.p1.mf.pl | prn.obj.p1.mf.pl | us | nin | OK
prn.obj.p1.mf.sg | prn.obj.p1.mf.sg | me | min | OK
prn.obj.p2.mf.sp | prn.obj.p2.mf.sp | you | vin | OK
prn.obj.p3.f.sg | prn.obj.p3.f.sg | her | ŝin | OK
prn.obj.p3.mf.pl | prn.obj.p3.mf.pl | them | ilin | OK
prn.obj.p3.m.sg | prn.obj.p3.m.sg | him | lin | OK
prn.obj.p3.nt.sg | prn.obj.p3.nt.sg | it | ĝin | OK
prn.ref.p1.mf.pl | prn.ref.p1.mf.pl | ourselves | nin | needs <acc>
prn.ref.p1.mf.sg | prn.ref.p1.mf.sg | myself | #mi | needs <acc>
prn.ref.p2.mf.pl | prn.ref.p2.mf.pl | yourselves | #vi | needs <acc>
prn.ref.p2.mf.sg | prn.ref.p2.mf.sg | yourself | #vi | needs <acc>
prn.ref.p3.f.sg | prn.ref.p3.f.sg | herself | #si | needs <acc>
prn.ref.p3.mf.pl | prn.ref.p3.mf.pl | themselves | #si | needs <acc>
prn.ref.p3.m.sg | prn.ref.p3.m.sg | himself | #si | needs <acc>
prn.ref.p3.nt.sg | prn.ref.p3.nt.sg | itself | #si | needs <acc>
prn.subj.p1.mf.pl | prn.subj.p1.mf.pl | we | ni | OK
prn.subj.p1.mf.sg | prn.subj.p1.mf.sg | I | mi | OK
prn.subj.p2.mf.sp | prn.subj.p2.mf.sp | you | vi | OK
prn.subj.p3.f.sg | prn.subj.p3.f.sg | she | ŝi | OK
prn.subj.p3.mf.pl | prn.subj.p3.mf.pl | they | ili | OK
prn.subj.p3.m.sg | prn.subj.p3.m.sg | he | li | OK
prn.subj.p3.nt.sg | prn.subj.p3.nt.sg | it | ĝi | OK
prn.tn.pl | prn.tn.pl.nom | both | ambaŭ | OK
prn.tn.sg | prn.tn.sg.nom | another | alia | OK
prn.tn.sp | prn.tn.sg.nom | all | ĉiu | OK
rel.adv | rel.adv | where | kie | OK
rel.an.mf.sp | rel.an.mf.sp | which | #<rel> | OK
vaux.inf | vaux.inf | will | #<vaux> | OK
vaux.past | vaux.past | could | povis | OK
vaux.pres | vaux.pres | can | povas | OK
vbdo.past | vblex.past | did | faris | OK
vbdo.pres | vblex.pres | do | faras | OK
vbdo.pres.p3.sg | vblex.pres | does | faras | OK
vbhaver.ger | vblex.ger | having | havanta | OK
vbhaver.inf | vblex.inf | have | havi | OK
vbhaver.past | vblex.past | had | havis | OK
vbhaver.pres | vblex.pres | have | havas | OK
vbhaver.pres.p3.sg | vblex.pres | has | havas | OK
vblex.ger | vblex.ger | advertising | anoncanta | OK
vblex.imp | vblex.imp | affect | afekciu | OK
vblex.inf | vblex.inf | advertise | anonci | OK
vblex.past | vblex.past | advertised | anoncis | OK
vblex.pp | vblex.pp | advertised | anoncita | OK
vblex.pres | vblex.pres | advertise | anoncas | OK
vblex.pres.p3.sg | vblex.pres | advertises | anoncas | OK
vbser.ger | vbser.ger | being | estanta | OK
vbser.inf | vbser.inf | be | esti | OK
vbser.past | vbser.past | were | estis | OK
vbser.past.p1.sg | vbser.past | was | estis | OK
vbser.past.p3.sg | vbser.past | was | estis | OK
vbser.pp | vbser.pp | been | estita | OK
vbser.pres | vbser.pres | are | estas | OK
vbser.pres.p1.sg | vbser.pres | am | estas | OK
vbser.pres.p3.sg | vbser.pres | is | estas | OK
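In Apertium itself these mappings are written as XML structural-transfer rules. The Python below is only a hypothetical model of a few rows of the table, meant to show what each row does to a lexical unit. Some names and inputs are assumptions:

- `RULES`, `LEMMA`, `format_lu` and `transfer` are illustrative names.
- The Esperanto lemma is assumed to arrive already translated by the bilingual dictionary.

```python
"""Hypothetical model of a few rows of the en->eo transfer table."""

LEMMA = "LEMMA"  # placeholder for the lemma supplied by the bilingual dictionary

# English tag pattern -> Esperanto output units as (lemma, tags).
RULES: dict[tuple[str, ...], list[tuple[str, tuple[str, ...]]]] = {
    ("adj",): [(LEMMA, ("adj", "sg", "nom"))],
    ("adj", "sint", "comp"): [("pli", ("preadv",)), (LEMMA, ("adj", "sg", "nom"))],
    ("adj", "sint", "sup"): [("plej", ("preadv",)), (LEMMA, ("adj", "sg", "nom"))],
    ("n", "acr", "sg"): [(LEMMA, ("n", "sg", "nom"))],
    ("n", "pl"): [(LEMMA, ("n", "pl", "nom"))],
    ("vbdo", "past"): [(LEMMA, ("vblex", "past"))],
    ("vblex", "pres", "p3", "sg"): [(LEMMA, ("vblex", "pres"))],
}


def format_lu(lemma: str, tags: tuple[str, ...]) -> str:
    """Render one lexical unit in Apertium stream notation, e.g. ^la<det><def>$."""
    return "^" + lemma + "".join(f"<{t}>" for t in tags) + "$"


def transfer(eo_lemma: str, en_tags: tuple[str, ...]) -> str:
    """Map one English analysis to Esperanto lexical units using RULES."""
    units = []
    for lemma, tags in RULES[en_tags]:
        units.append(format_lu(eo_lemma if lemma == LEMMA else lemma, tags))
    return " ".join(units)


print(transfer("granda", ("adj", "sint", "comp")))
# ^pli<preadv>$ ^granda<adj><sg><nom>$
```

Unknown tag patterns simply raise `KeyError` in this sketch. A real rule set needs a default rule that passes unmatched words through.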

Test set

<jacobn> Jim, Fran: I just looked at http://www.link.cs.cmu.edu/link/batch.html and to me it looks like the "carefully selected text" I was talking about a week ago, which we would need to define the most important features to cover.
<jimregan2> Jacob, it's ok
<jimregan2> we already have carefully selected text for English :)
<jimregan2> ALL + plural
<jimregan2> ALL + adj + plural
<jacobn> what do you think, is http://www.link.cs.cmu.edu/link/batch.html + their translations to Esperanto suitable as test set?
<jimregan2> as one test set, yes
<jimregan2> I have another few, and I promise I'll get to them on Wednesday
<jacobn> Jim, I would like to include a "carefully selected text for English" in the en-eo test set. Do you have a better suggestion than http://www.link.cs.cmu.edu/link/batch.html ?
<jacobn> Fine
<jimregan2> heck - I'll even set a reminder :)
<jacobn> no haste necessary
<jimregan2> newspaper - type text is best
<jimregan2> I'll grab a few chunks from different books at project gutenberg

<jimregan2> oh, you know about the '*' in the sentences, right?
<jacobn> the * ??
<jacobn> no, never met it
<jacobn> ;-)
<jimregan2> at the start of a lot of the sentences, there's a '*'
<jacobn> Oh that
<jacobn> yes, I've read it
<jimregan2> that's a standard convention in linguistics to say 'this sentence is incorrect'
<jacobn> I would start with the non-* sentences
<jimregan2> just dump anything with '*'
<jimregan2> they're not worth any effort
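Following Jim's advice, the starred sentences can simply be filtered out before the rest are translated. Below is a minimal sketch. Its assumptions:

- The test file has one sentence per line.
- The function name is hypothetical.
- The sample sentences are made up.

```python
def grammatical_sentences(lines: list[str]) -> list[str]:
    """Drop blank lines and sentences marked ungrammatical with a leading '*'."""
    kept = []
    for line in lines:
        sentence = line.strip()
        if sentence and not sentence.startswith("*"):
            kept.append(sentence)
    return kept


sample = ["The dog ran.", "*The dog ran the.", "", "  A cat sat.  "]
print(grammatical_sentences(sample))  # ['The dog ran.', 'A cat sat.']
```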

Tagging errors

(10:19:15) jacob: en-eo	  You can save multiple configurations, and switch between them easily. 
	- Vi povas sekurigi multajn agordojn, kaj ŝalti inter ili facile. 
	+ Vi povas savi *multiple agordoj, kaj ŝanĝo inter ilin facile.
(10:20:00) jacob: Why is "switch" in "switch between them" considered a noun?
(10:20:16) jacob: (ŝanĝo)
(10:20:19) francis: did you put it in the testing interface ?
(10:20:42) francis: the sentence
(10:20:47) francis: ^and/and<cnjcoo>$ ^switch/switch<n><sg>/switch<vblex><inf>/switch<vblex><pres>$ ^between/between<pr>$
(10:20:48) francis:  
(10:20:54) francis: ^and<cnjcoo>$ ^switch<n><sg>$ ^between<pr>$
(10:20:54) francis:  
(10:21:04) francis: the options for "switch" are noun, verb 
(10:21:07) francis: it chooses noun
(10:21:13) francis: the tagger works on a statistical basis
(10:22:55) jacob: But "easily" can only be there if "switch" is a verb.
(10:23:34) jacob: "There is a switch between them"  (noun)
(10:23:51) jacob: "switch between them" (noun or verb)
(10:24:11) jacob: "switch between them easily" (can only be verb)

"switch between them easily" (can only be verb) - not true. 'You can put a switch between them easily' -- Jimregan 18:34, 15 September 2008 (UTC)

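The lines Francis pasted are in Apertium's stream format, where each lexical unit sits between `^` and `$`. The two forms differ as follows:

- Before disambiguation, a unit holds the surface form followed by every analysis, separated by `/`.
- After the tagger has run, a unit holds only the chosen analysis.

Below is a rough Python sketch that reads both forms. It assumes there are no escaped characters or formatting blanks, although real Apertium streams can contain both. `parse_stream` is a hypothetical helper name.

```python
import re
from typing import Optional

LU_RE = re.compile(r"\^(.*?)\$")   # one lexical unit: ^...$
TAG_RE = re.compile(r"<([^>]+)>")  # one tag: <...>

Analysis = tuple[str, list[str]]   # (lemma, tags)


def parse_stream(text: str) -> list[tuple[Optional[str], list[Analysis]]]:
    """Split a stream into (surface form or None, [(lemma, tags), ...]) units."""
    units = []
    for body in LU_RE.findall(text):
        parts = body.split("/")
        if len(parts) == 1:
            # Tagger output: only the chosen analysis, no surface form.
            surface, readings = None, parts
        else:
            # Analyser output: surface form, then every possible analysis.
            surface, readings = parts[0], parts[1:]
        analyses = [(r.split("<", 1)[0], TAG_RE.findall(r)) for r in readings]
        units.append((surface, analyses))
    return units


ambiguous = ("^and/and<cnjcoo>$ ^switch/switch<n><sg>/switch<vblex><inf>"
             "/switch<vblex><pres>$ ^between/between<pr>$")
surface, analyses = parse_stream(ambiguous)[1]
print(surface, [tags for _, tags in analyses])
# switch [['n', 'sg'], ['vblex', 'inf'], ['vblex', 'pres']]
```

Parsed this way, the verb option for "switch" turns out to be two readings (infinitive and present). The tagger therefore chooses among three candidates, not two.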
A note about accusative

Next, we should think about constructions like these:

'the man you saw' 'the man the girl saw'

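I don't know whether we have to change the word order here (probably not), but the nominative and the accusative are SN 2 and SN 1 respectively. In 'the man the girl saw', 'the girl' (SN 2) is the subject and 'the man' (SN 1) is the object.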

But think about this:

'the man my brother became'

Adding the accusative here is wrong, because 'became' is a copula: 'the man' and 'my brother' refer to the same person, so there is no direct object. What can we do about it? Not much. We could special-case this particular verb, but in general we can only handle the common cases and hope for the best. There has been plenty of work on statistical parsing, subject identification and so on, but it is still not much better than that.

This is why we always tell people to have their translations checked by a native speaker :)
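The common-case approach described above can be sketched as a small heuristic over chunks. This is hypothetical and not an existing Apertium rule. Its assumptions:

- The input is a sequence of (chunk type, lemma) pairs.
- An SN SN verb sequence is taken to be a relative clause without a relative pronoun.
- `COPULAS`, `assign_case` and the `"verb"` chunk label are illustrative names.
- The short copula list is an assumption.

```python
"""Hypothetical common-case heuristic for 'the man the girl saw'."""

from typing import Optional

# Illustrative list of copular verbs: their complement is not a direct object,
# so it must stay nominative ('the man my brother became').
COPULAS = {"be", "become", "remain"}


def assign_case(chunks: list[tuple[str, str]]) -> list[Optional[str]]:
    """chunks: (chunk type, lemma) pairs; returns a case label per chunk or None."""
    cases: list[Optional[str]] = [None] * len(chunks)
    for i in range(len(chunks) - 2):
        (k1, _), (k2, _), (k3, verb) = chunks[i:i + 3]
        if (k1, k2, k3) == ("SN", "SN", "verb"):
            cases[i + 1] = "nom"  # SN 2: subject of the relative clause
            cases[i] = "nom" if verb in COPULAS else "acc"  # SN 1
    return cases


print(assign_case([("SN", "man"), ("SN", "girl"), ("verb", "see")]))
# ['acc', 'nom', None]
print(assign_case([("SN", "man"), ("SN", "brother"), ("verb", "become")]))
# ['nom', 'nom', None]
```

As the note says, this only covers the common cases. For instance, it cannot tell whether the two SNs really form a relative clause.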

Jacob TODO

<jacobn> Ok, I'll try the web doc translator more, find the systematics, report a bug and attach files etc.

See also