Anaphora resolution

From Apertium
Revision as of 09:35, 14 September 2017 by Francis Tyers (talk | contribs)

Apertium has a problem with anaphora resolution.

For example:

  1. If you have "el seu" in Catalan and are translating to French, it could be "son" (third-person singular possessor) or "leur" (third-person plural possessor). If you are translating to English or Russian, you also need to know the gender of the possessor (его, ее, их).
  2. If you are generating subject pronouns for a language, you often need to know the gender of the referent, e.g. "ha arribat" could be "He has arrived" or "She has arrived". In this case the "frequent" thing to do is to use the masculine pronoun, but that just relies on the fact that masculine pronouns are used more frequently (see the counts below).
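The ambiguity in both cases can be made concrete with a small lookup. The following is only an illustrative sketch, not Apertium code: the table `POSSESSIVES`, the function `translate_el_seu` and the language codes are hypothetical names, and the French entries ignore agreement with the possessed noun. When the possessor is unknown, it falls back to masculine singular, which is the "frequent" default described above.

```python
from typing import Optional

# Hypothetical table: target language -> (possessor number, possessor gender) -> form.
# French forms here ignore agreement with the possessed noun (sa, ses, leurs).
POSSESSIVES = {
    "fra": {("sg", "m"): "son", ("sg", "f"): "son",
            ("pl", "m"): "leur", ("pl", "f"): "leur"},
    "eng": {("sg", "m"): "his", ("sg", "f"): "her",
            ("pl", "m"): "their", ("pl", "f"): "their"},
    "rus": {("sg", "m"): "его", ("sg", "f"): "ее",
            ("pl", "m"): "их", ("pl", "f"): "их"},
}


def translate_el_seu(target: str,
                     number: Optional[str] = None,
                     gender: Optional[str] = None) -> str:
    """Pick a translation of Catalan "el seu" given what is known about the possessor."""
    # Without a resolved antecedent, assume masculine singular (the frequency default).
    number = number or "sg"
    gender = gender or "m"
    return POSSESSIVES[target][(number, gender)]


print(translate_el_seu("eng"))                          # possessor unknown
print(translate_el_seu("fra", number="pl"))             # plural possessor
print(translate_el_seu("rus", number="sg", gender="f")) # feminine singular possessor
```

The point of the sketch is that the `number` and `gender` arguments are exactly the information that anaphora resolution would have to supply; without it, every call collapses to the default.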

Usually this kind of thing is done over parse trees, but Apertium doesn't have parse trees, so we'd need to find another way to do it.

Counts of masculine and feminine subject pronouns in English Wikipedia:

5682787  he 
3469648  He 
1508156  she 
 839442  She
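As a rough check of how strong this frequency bias is, the counts above can be summed per gender. This is just arithmetic on the four figures listed; the variable names are illustrative.

```python
# Counts copied from the list above (English Wikipedia).
counts = {"he": 5682787, "He": 3469648, "she": 1508156, "She": 839442}

masculine = counts["he"] + counts["He"]
feminine = counts["she"] + counts["She"]
total = masculine + feminine

masculine_share = masculine / total   # fraction of subject pronouns that are masculine
ratio = masculine / feminine          # masculine pronouns per feminine pronoun

print(f"masculine: {masculine}, feminine: {feminine}")
print(f"masculine share: {masculine_share:.1%}, ratio: {ratio:.2f}")
```

On these figures, always choosing the masculine pronoun would match roughly four out of five subject pronouns in this corpus, which is why it works as a default but still leaves about one in five cases wrong.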