On the usefulness of reg.testing
re: https://plus.google.com/u/0/114804369744409916883/posts/JvozgQxVAyT I do find them very useful for certain purposes, although I agree they are not regression tests. I'm not sure what they should be called, but I consider them a tiny 'corpus' of things that used to be broken and/or tend to break easily if I'm not careful when changing rules or dictionaries. (However, I'd find them less annoying and more realistic if we could write "test|lang|source sentence|possible target sentence 1|possible target sentence 2|…|possible target sentence N".) The before/after corpus test that Jacob mentions is generally much more helpful for finding improvements (or degradations) in quality, while testvoc scripts (both the lt-expand-based ones and corpus testvoc) are helpful for guiding development. --unhammer 10:53, 21 November 2011 (UTC)
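A minimal Python sketch of how the proposed multi-target format could be checked. It assumes one test per line, fields separated by "|" with no "|" inside the sentences themselves, exact matching after trimming whitespace, and a translate(pair, sentence) callable supplied by the caller (for instance a wrapper around the pair's pipeline). The names Case, parse_line, passes and run are hypothetical and are not part of any existing Apertium script. A test counts as passing if the output matches any one of the listed targets.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Case:
    pair: str            # the "lang" field, e.g. "es-an" (hypothetical naming)
    source: str          # source sentence
    targets: list[str]   # every acceptable translation

def parse_line(line: str) -> Case:
    """Parse 'test|lang|source|target1|...|targetN' (assumes no '|' in sentences)."""
    fields = [f.strip() for f in line.strip().split("|")]
    if len(fields) < 4 or fields[0] != "test":
        raise ValueError(f"malformed test line: {line!r}")
    return Case(pair=fields[1], source=fields[2], targets=fields[3:])

def passes(case: Case, output: str) -> bool:
    # Exact match (after trimming) against ANY of the possible targets.
    return output.strip() in case.targets

def run(lines: list[str], translate: Callable[[str, str], str]) -> list[tuple[str, str]]:
    """Return (source, output) for every case whose output matches no target."""
    failures = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        case = parse_line(line)
        output = translate(case.pair, case.source)
        if not passes(case, output):
            failures.append((case.source, output))
    return failures
```

Allowing several targets per line means a legitimate alternative translation no longer shows up as a failure, which is the "less annoying and more realistic" behaviour asked for above; it still cannot measure quality the way a before/after corpus comparison can.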
- Umm, I haven't used this kind of test (or non-test) yet. Whether or not it counts as testing, could we call them lists of parallel sentences/syntagms? I've been thinking about it, and I see two advantages of building such lists: 1) for a language whose grammar is not well formalised, building a list of parallel examples helps to understand the underlying grammar and the nature of the changes you need to make in the translator. It also helps people who don't speak the language to understand the differences between the languages. 2) In the absence of a large parallel corpus in which the most common translation errors and challenges would be likely to appear, I see these lists as a surrogate for a parallel corpus. You cannot use them to assess performance quantitatively, but you can learn what needs to be done in specific areas and simply check that everything else keeps working. This is why I thought they could be useful for the es-an pair ('an' being a language whose syntax is not completely formalised, and for which few parallel texts exist). --Juanpabl 15:42, 21 November 2011 (UTC)