English and Esperanto


Esperantists, please see Peto al esperantistoj (Request to Esperantists).

Intros to Esperanto

Perhaps http://en.wikiversity.org/wiki/Rules_of_Esperanto_grammar (or http://donh.best.vwh.net/Esperanto/rules.html) is a good overview.

And the affixes: http://esperanto.davidgsimpson.com/eo-affixes.html (short) http://steve-and-pattie.com/esperantujo/grparafx.html (longer)

Tenses are explained in http://en.wikipedia.org/wiki/Esperanto_grammar#Verbs




Test set

<jacobn> Jim, Fran: I just looked at http://www.link.cs.cmu.edu/link/batch.html and to me it looks like the "carefully selected text" I was talking about a week ago, which would be needed to define the most important features to get covered.
<jimregan2> Jacob, it's ok
<jimregan2> we already have carefully selected text for English :)
<jimregan2> ALL + plural
<jimregan2> ALL + adj + plural
<jacobn> what do you think, is http://www.link.cs.cmu.edu/link/batch.html + their translations to Esperanto suitable as test set?
<jimregan2> as one test set, yes
<jimregan2> I have another few, and I promise I'll get to them on Wednesday
<jacobn> Jim, I would like to include a "carefully selected text for English" in the en-eo test set. Do you have a better suggestion than http://www.link.cs.cmu.edu/link/batch.html ?
<jacobn> Fine
<jimregan2> heck - I'll even set a reminder :)
<jacobn> no haste necessary
<jimregan2> newspaper-type text is best
<jimregan2> I'll grab a few chunks from different books at project gutenberg

<jimregan2> oh, you know about the '*' in the sentences, right?
<jacobn> the * ??
<jacobn> no, never met it
<jacobn> ;-)
<jimregan2> at the start of a lot of the sentences, there's a '*'
<jacobn> Oh that
<jacobn> yes, I've read it
<jimregan2> that's a standard convention in linguistics to say 'this sentence is incorrect'
<jacobn> I would start with the non-* sentences
<jimregan2> just dump anything with '*'
<jimregan2> they're not worth any effort
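
The advice above (drop every sentence that starts with '*') could be applied to a sentence list roughly as follows. This is only a sketch: the function name `drop_starred` and the sample sentences are made up for illustration, and it assumes one sentence per entry with the '*' as the first non-blank character.

```python
def drop_starred(sentences):
    """Keep only sentences that are not marked ungrammatical with a leading '*'."""
    kept = []
    for sentence in sentences:
        text = sentence.strip()
        # Skip blank entries and anything flagged with the linguistic '*' convention.
        if not text or text.startswith("*"):
            continue
        kept.append(text)
    return kept


# Hypothetical sample, not taken from the batch file itself.
sample = [
    "The man you saw is here.",
    "*The man you saw are here.",
    "  *Dogs the barks.",
    "",
    "You can switch between them easily.",
]
print(drop_starred(sample))
```

The remaining sentences can then be translated to Esperanto by hand and used as reference translations.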

Tagging errors

(10:19:15) jacob: en-eo	  You can save multiple configurations, and switch between them easily. 
	- Vi povas sekurigi multajn agordojn, kaj ŝalti inter ili facile. 
	+ Vi povas savi *multiple agordoj, kaj ŝanĝo inter ilin facile.
(10:20:00) jacob: Why is "switch" in "switch between them" considered a noun?
(10:20:16) jacob: (ŝanĝo)
(10:20:19) francis: did you put it in the testing interface ?
(10:20:42) francis: the sentence
(10:20:47) francis: ^and/and<cnjcoo>$ ^switch/switch<n><sg>/switch<vblex><inf>/switch<vblex><pres>$ ^between/between<pr>$
(10:20:48) francis:  
(10:20:54) francis: ^and<cnjcoo>$ ^switch<n><sg>$ ^between<pr>$
(10:20:54) francis:  
(10:21:04) francis: the options for "switch" are noun, verb 
(10:21:07) francis: it chooses noun
(10:21:13) francis: the tagger works on a statistical basis
(10:22:55) jacob: But "easily" can only be there if "switch" is a verb.
(10:23:34) jacob: "There is a switch between them"  (noun)
(10:23:51) jacob: "switch between them" (noun or verb)
(10:24:11) jacob: "switch between them easily" (can only be verb)

"switch between them easily" (can only be verb) - not true. 'You can put a switch between them easily' -- Jimregan 18:34, 15 September 2008 (UTC)
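
To find words like "switch" where the tagger has to choose between parts of speech, the analyser output quoted above can be scanned for units with more than one reading. The following is a rough sketch, not an Apertium tool: the helper names `parse_analyses` and `ambiguous` are made up here, and it assumes the simple `^surface/reading/reading$` layout shown in the log, ignoring escaped characters and any other stream features.

```python
import re

# One lexical unit: everything between '^' and '$'.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")
# One tag: text between '<' and '>'.
TAG = re.compile(r"<([^>]+)>")


def parse_analyses(stream):
    """Return [(surface, [(lemma, [tags...]), ...]), ...] for each unit."""
    units = []
    for match in LEXICAL_UNIT.finditer(stream):
        surface, *readings = match.group(1).split("/")
        parsed = []
        for reading in readings:
            lemma = reading.split("<", 1)[0]
            parsed.append((lemma, TAG.findall(reading)))
        units.append((surface, parsed))
    return units


def ambiguous(stream):
    """List surface forms with several readings, with their candidate first tags."""
    result = []
    for surface, readings in parse_analyses(stream):
        if len(readings) > 1:
            first_tags = sorted({tags[0] for _, tags in readings if tags})
            result.append((surface, first_tags))
    return result


stream = ("^and/and<cnjcoo>$ "
          "^switch/switch<n><sg>/switch<vblex><inf>/switch<vblex><pres>$ "
          "^between/between<pr>$")
print(ambiguous(stream))
```

A list like this only shows where the tagger has a choice to make; which reading it picks still depends on its statistical model.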

A note about accusative

The next kind of thing we should think about is the type of sentence part that goes like this:

'the man you saw' 'the man the girl saw'

I don't know if we have to change word order here - probably not - but the nominative and accusative are SNs (noun phrases) 2 and 1 respectively.

But think about this:

'the man my brother became'

Adding accusative here is wrong, because 'became' is a linking verb and its complement stays in the nominative. So what can we do about it? Not much. Maybe in this specific instance, sure, but generally, we can only take the common cases and hope for the best. There's been plenty of work on statistical parsing, subject identification, etc., but it's still not much better than picking the common cases and hoping for the best.

This is why we always tell people to have their translations checked by a native speaker :)

Jacob TODO

<jacobn> Ok, I'll try the web doc translator more, find the systematics, report a bug and attach files etc.

See also