User:Mlforcada/Sandbox/Basque-to-English

From Apertium

Loose notes on Basque to English

  • Should the regression tests in http://wiki.apertium.org/wiki/Basque_to_English/Pending_tests be modified to contain English output that, while not adequate, is as good as we can get? Such tests should be marked accordingly. For instance, we might give up on getting The dog has seen the cat for Txakurrak katua ikusi du and be happy with The dog the cat has seen. Should these be part of the regression tests?
IMO, this type of end-of-the-line testing ignores too much of the inner workings to be useful. Coupled with the presumptions involved, whether a test passes or fails is almost accidental. It's useful, as a side effect, to tell students to write these 'tests' on the wiki, as a means of self-study in splitting sentences into constituents (and then writing rules based on them). But as tests? Not so much.
Failing, accurate tests have their own inherent value, as does a mini-corpus of examples. Falsified test values have no value, unless you place a premium on having passing tests for their own sake.
If you'd rather delete it, by all means, delete it - there isn't much that particular example can add. But I don't want or need examples of MT-ese, they are unfortunately far too easy to come by.
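One way to reconcile the two positions above might be to keep the adequate translation as the real target and record the best-effort output separately, so that nothing is falsified and the mark is explicit. The following is only a sketch in plain Python, not how the wiki's test pages or any Apertium tool actually work; `RegressionCase`, `classify` and the status strings are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegressionCase:
    """One test sentence (hypothetical structure, not an Apertium format)."""
    source: str                        # Basque input
    adequate: str                      # the translation we actually want
    best_effort: Optional[str] = None  # inadequate output tolerated for now


def classify(case: RegressionCase, actual: str) -> str:
    """Report 'pass', 'known-inadequate' or 'fail' for the system output."""
    if actual == case.adequate:
        return "pass"
    if case.best_effort is not None and actual == case.best_effort:
        # Matches the tolerated MT-ese: flagged, never counted as a pass.
        return "known-inadequate"
    return "fail"


case = RegressionCase(
    source="Txakurrak katua ikusi du",
    adequate="The dog has seen the cat",
    best_effort="The dog the cat has seen",
)
print(classify(case, "The dog the cat has seen"))
```

With this shape, the accurate but failing expectation is preserved, and the best-effort string only serves to detect when output gets worse than it already is.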
  • The bilingual dictionary is hard to read and needs heavy cleaning, as it contains entries that lead to inadequate translations such as handi --> tidy (should be large) (rev. 33921). How can we do that?
That came from Matxin; judging by the other definitions for 'handi', it means 'tidy' in the sense of 'a tidy sum'. (Fixed)
  • Pronouns appear for NOR constructs (Etorri dira --> They have come, Etorri da --> He has come (why he?)) but not for NOR-NORK constructs: (Ekarri dute --> Have brought). This should be done consistently, and pronouns should disappear when the subject is overt in Basque. (rev. 33921)
What exists so far was only there to test that the right pronouns were being generated. 'He' is used as the default p3.sg subject pronoun, because that's what Mireia advised.
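The consistent behaviour asked for above (a pronoun whenever the Basque subject is not overt, regardless of NOR or NOR-NORK, and no pronoun otherwise) can be sketched as follows. This is plain Python for illustration only, not an Apertium transfer rule; the function names and the `p1`/`p2`/`pl` tags are assumptions modelled on the `p3.sg` label mentioned above, and 'he' is kept as the third person singular default.

```python
from typing import Optional

# Default English subject pronouns keyed by (person, number).
# 'he' for p3.sg follows the default described above; the tag names are
# illustrative assumptions.
SUBJECT_PRONOUNS = {
    ("p1", "sg"): "I",
    ("p2", "sg"): "you",
    ("p3", "sg"): "he",
    ("p1", "pl"): "we",
    ("p2", "pl"): "you",
    ("p3", "pl"): "they",
}


def english_subject(person: str, number: str,
                    overt_subject: Optional[str] = None) -> str:
    """Use the overt subject if there is one, else the default pronoun."""
    if overt_subject:
        return overt_subject
    return SUBJECT_PRONOUNS[(person, number)]


def render(person: str, number: str, verb_phrase: str,
           overt_subject: Optional[str] = None) -> str:
    """Join subject and verb phrase and capitalise the first letter."""
    sentence = f"{english_subject(person, number, overt_subject)} {verb_phrase}"
    return sentence[0].upper() + sentence[1:]


print(render("p3", "pl", "have come"))     # NOR: Etorri dira
print(render("p3", "pl", "have brought"))  # NOR-NORK: Ekarri dute
print(render("p3", "sg", "has seen the cat", overt_subject="the dog"))
```

The point of the sketch is that the choice depends only on whether a subject is overt, so the same rule covers both verb types.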
  • Apparently, nothing has been done yet for indirect objects in verbs: Etorri da and Etorri zaigu have the same translation, whereas the second one should have a To us indirect object; the same goes for Ekarri zuten and Ekarri ziguten (They brought and They brought [to] us) (rev. 33921).
I think I mentioned this as an area where I was missing information? I managed to find some of the tense information I was looking for, but I still don't have anything consistent for indirect pronouns.