PMC proposals/Apertium Workshop in Russia

2011/11/02 #7: Apertium Workshop in Russia

Summary

This proposal aims to support a course/workshop/tutorial on machine translation in Russia, focused on the country's minority and regional languages, followed by the development of a prototype language pair for one of those languages. Russia has a long history of work in machine translation, but very little of it covers the languages of Russia other than Russian. Apertium has a lot of support for European languages, but little for languages beyond Europe. With its long tradition in linguistics and computer science, Russia seems like an ideal place to expand.

Proposed by: Francis Tyers

Seconded by: --Jacob Nordfalk 19:36, 2 November 2011 (UTC)

In detail

The detailed proposal can be found in the following PDF document (updated 11:59, 11 November 2011 (UTC)). A letter of support from the Chuvash State Humanities Institute can be found here.

Caveats

  • A more extensive version of this proposal was submitted to the EAMT in the previous call for proposals, but was rejected.
  • Among the objections were:
    • "The author does not mention how many participants the course is planned for." -- The course is planned for between 6 and 15 participants.
    • "No course programme is given." -- The course programme will basically follow the Apertium Luxembourg Workshop, although with examples adapted for Russian, more hands-on, and 5 days instead of 2. We expect substantially fewer participants.
      Update: The preliminary course outline is in the revised proposal. Francis Tyers 13:45, 9 November 2011 (UTC)

Comments

The cost breakdown does not, at least at a surface glance, match the project proposal. I think much more explanation is needed: for example, why does a workshop, or a translator, need an internet connection? -- Jimregan 13:31, 2 November 2011 (UTC)

Good question! The year-long internet connection is for the students who will be working on the translator; the same goes for the computers. Internet access is not as widespread in Chuvashia as in Moscow/St. Petersburg and the rest of Europe. For example, neither of the two universities has a campus internet connection. Computers are more widespread, but students studying Chuvash philology are not so likely to own them, nor likely to have ones suitable for work on Apertium (e.g. sufficiently powerful, running GNU/Linux). - Francis Tyers 13:36, 2 November 2011 (UTC)
As someone who has been living in Chuvashia for almost three years, I can explain the situation from the local side (which, even now, still shocks me from time to time). In Chuvashia the official (monthly -fmt) average income is €220-230. As the Chuvash language is not used at home with city children and there are no Chuvash-language schools in the city, all Chuvash-language students come from the countryside. That is the case for the two students who would participate in the project. But the gap between city and countryside is extreme in Russia, as a result of decades of Soviet industrialisation policies and post-Soviet kolkhoz bankruptcies. So, even though I don't have official data, the average income in the countryside may be 50-60% of that in the city. Moreover, the local universities do not have computers or free access to the internet (except in special computer rooms used for classes). Students from the countryside live in halls of residence, and there are no computers or internet there either. (The problem here is that Moscow practises façade policies and gives all the money to Moscow State University in order to show that the country has at least one elite university centre: compare the situation of Russian and Spanish universities in the Shanghai ranking of universities.) As a result, almost everywhere in Russia (with the possible exception of regions with oil, such as Tatarstan and Sakha), students with a good knowledge of a minorised language seldom have a computer and/or access to the internet. That is the case at least in Chuvashia, and that is why it is necessary to buy computers and pay for internet connections for such a project.--Hèctor Alòs i Font 18:42, 2 November 2011 (UTC)
Reminds me of Nepal. I hope this would not only make a language pair but also be a tiny contribution to an under-prioritized region --Jacob Nordfalk 19:43, 2 November 2011 (UTC)

This is a request for funding. I'd like to know how much money we have and whether there are other candidates for using it. --Jacob Nordfalk 18:24, 2 November 2011 (UTC)

We have around 10,000€ and no other candidates currently, although anyone can submit requests, e.g. for conferences and such. To my knowledge this is the first "project" request. - Francis Tyers 18:34, 2 November 2011 (UTC)
On one hand, if we have ~10,000, that essentially means we spent nothing last year, and spending the money on a project like this is what we ought to be doing; on the other hand, it would just be irresponsible to put half the money we have into any project without any guarantees or recourse. As this is GSoC money we're talking about, maybe we could take a leaf from the GSoC programme? Add a set of milestones, and if they are not met, the next payment does not go through? The details would be different, but it's worth at least discussing. -- Jimregan 19:28, 2 November 2011 (UTC)
I think that would be perfectly fine. We could split it quite easily by deliverable. The idea would be that we get €1,450 for D1/D2 (the workshop) and then if it is successful (according to a PMC vote) the remaining €3,460 for the translator (D3-5) ? - Francis Tyers 19:33, 2 November 2011 (UTC)
No, that's not what I meant. I wasn't considering the workshop at all, because I can't think of a useful way to split that. Maybe buy the netbooks as a roll of the dice, and require that they be set up and ready before a workshop can be considered? Doesn't seem too useful, but maybe someone else might have a more useful idea. What I actually meant was to split the creation of the translator to have an assessable point or two, so if a milestone isn't met, we don't waste more money. But, you know, something usefully assessable. That the workshop could in any way serve as an indicator for the translator is a non sequitur. -- Jimregan 19:53, 2 November 2011 (UTC)
The idea of a midterm is good. In fact, we have a "mid-project evaluation" scheduled for the 7th month. So an idea could be to say €960 before month 7, and €960 if the mid-project evaluation is passed. What that evaluation should be is to be determined, but I think we could come up with something reasonable during the first months. - Francis Tyers 20:10, 2 November 2011 (UTC)
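The milestone-gated payment idea discussed above can be sketched roughly as follows. This is only an illustration, not an agreed procedure: the function name `released` and the data layout are hypothetical, the two €960 installments are the figures suggested above, and what the mid-project evaluation actually consists of is still to be determined.

```python
def released(installments, passed):
    """Return the total amount (in EUR) released so far.

    installments: ordered list of (amount, gate) pairs, where gate is the
                  name of the evaluation that must be passed before the
                  payment goes through, or None if the payment is unconditional.
    passed:       set of evaluation names that have been passed.
    """
    total = 0
    for amount, gate in installments:
        # Stop at the first unmet gate: later payments do not go through.
        if gate is not None and gate not in passed:
            break
        total += amount
    return total


# Hypothetical schedule using the figures suggested in the thread.
installments = [
    (960, None),                      # paid before month 7
    (960, "mid-project evaluation"),  # paid only if the evaluation is passed
]

print(released(installments, set()))
print(released(installments, {"mid-project evaluation"}))
```

The point of stopping at the first unmet gate is that a missed milestone blocks every later payment, which is the GSoC-style safeguard being asked for.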


More comments, based on the updated proposal (have you added it here, yet?)

  • You should mention that the translation will also be made available under the EUPL. It's not like there's a choice in the matter, but it would look better to mention it.
  • You should write something here about the students who have been selected, and how they were selected. We're not some trade organisation, these are people we'll be expected to welcome into our community.
  • As you are both a proposer, and a signatory on the bank account, you should make explicit how money transfers, etc., have been handled, so that everything is above board. Presumably, Mikel will take care of signing cheques here, to keep things legitimate, but, you know, mention it.
  • Write something about ongoing mentoring/support for the students during the translator building phase.
  • You mention that Trond may participate in the workshop. If there is even a chance that you may require extra funding for that, mention it now. If you don't, and ask in future, I will vote against it on general principle - I don't want us to get trapped into a project with ever-escalating costs.
  • The split of the translator development into four phases is something I requested so that payment could be made conditional on performance during the subsequent phase - so we don't commit ourselves to paying for work that isn't being done. I consider that a condition to accepting the project. I think this is a generally prudent condition for an organisation of our size and means to impose, and, in any case, the students who will be doing the work are unknown to the rest of us.

In summary, I'm generally in favour, but this is our first project proposal, and I want us to start as we mean(/hope) to continue. -- Jimregan 11:43, 11 November 2011 (UTC)

These have been taken care of in the version of the proposal dated 11:59, 11 November 2011 (UTC). - Francis Tyers 11:59, 11 November 2011 (UTC)
More for information than as a condition, but I did say "...and how they were selected". Also, is there any way you can put the PDF on the wiki? I'd feel better if we had everything related to the vote in one place. -- Jimregan 12:14, 11 November 2011 (UTC)
They were selected as students who know Chuvash and Turkish, who are interested in the project, and who will be sticking around. There were a few more candidates, but these were the two best ones. Hèctor may be able to weigh in more on the selection process. - Francis Tyers 14:02, 11 November 2011 (UTC)
The process has been the following. There are two faculties of Chuvash philology here. I collaborate quite closely with one of them (the one at the university where I worked), and I tried to recruit students there for Apertium during this year (with GSoC 2012 in mind). In fact, Fran gave a talk there in May, followed by a short workshop. It was a failure for both me and Fran. So I got in touch with the other faculty. There, Chuvash philology students are also taught Turkish. The selection of the two students was made by the Turkish language teacher of this faculty. I worked with the two students on Chuvash morphology in September and October (4-5 meetings). In my opinion they can do the job, but it's clear that they are far from perfect. Their computer skills are very basic. On the other hand, they have understood how the whole thing works and they have begun to show that they are able to think as computational linguists. As proof of their commitment, they decided at the end of September that their degree thesis will be the Chuvash morphological analyzer/generator. The thesis has to be written by April. (I hope that they have changed their mind since then, as the project is not yet certain) --Hèctor Alòs i Font 18:24, 11 November 2011 (UTC)
That's what I wanted to know, thanks guys. -- Jimregan 15:19, 12 November 2011 (UTC)

Mikel is in favour, with funding made available in installments following evaluation of deliverables. Comments to improve the final proposal for the vote:

  • Quantify and specify better the amount and the nature of the support provided by the Chuvash State Institute of the Humanities.
  • I miss a lesson on evaluation in table 1. Evaluation is important (deliverables D3, D5, D7).
  • Can you still make it in December 2011 or January 2012?
  • I would expect a bit more detail about how accommodation will be organised by the local organisers.
  • If the cost of translation is approx. €5 per page and the slides contain 7,000 words and translation costs around €75, does this mean that a typical page is 467 words? That boils down to about €0.01 per word, which is an order of magnitude below usual rates. Are the figures OK?
  • Give a complete reference of the Çuvaş – Türkiye Sözlük. Is it [this]?
  • The proposal says that the two Chuvash students will be selected after the course but it gives two names later. This is strange and should be clarified before my approval is permanent.
  • I don't think this is a 'scientific work of grand scale'. Perhaps an experiment with the accelerator at CERN is. I would use different language, really.
  • €350 is a bit expensive for a netbook; I recently bought a nice 10 HP Mini for €239.
  • Where will the €10-a-month internet connections be installed exactly?
  • It would be nice to break the total funding into installments. In particular, as some money should be available before the tasks, a payment schedule would be a must.
  • Typographical:
    • Turkish and Chuvash The course → Turkish and Chuvash. The course (p. 1)
    • missing period after "the overview is in Table 1" (p. 2)
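
The arithmetic behind the translation-cost question above can be checked with a short sketch. The variable names are illustrative only; the three input figures (€5 per page, €75 in total, 7,000 words) are the ones quoted in the comment.

```python
# Figures quoted in the comment on translation costs.
cost_per_page_eur = 5.0   # approximate cost of translating one page
total_cost_eur = 75.0     # approximate total translation cost
total_words = 7000        # words in the slides

# Number of pages implied by the total cost and the per-page rate.
pages = total_cost_eur / cost_per_page_eur

# Page length implied by spreading the words over those pages.
words_per_page = total_words / pages

# Per-word rate implied by the total cost and the word count.
cost_per_word_eur = total_cost_eur / total_words

print(f"pages: {pages:.0f}")
print(f"words per page: {words_per_page:.0f}")
print(f"cost per word: EUR {cost_per_word_eur:.4f}")
```

Under these figures the implied page is unusually long and the per-word rate correspondingly low, which is what prompts the question of whether the numbers in the proposal are right.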

Voting

Please note that voting is only open to PMC members. Please vote by signing (~~~) in the relevant section.

Agree

  1. Francis Tyers
  2. --Jacob Nordfalk 07:40, 3 November 2011 (UTC)
  3. Jimregan
  4. Mlforcada 16:53, 17 November 2011 (UTC) (see conditions above)

Disagree

Abstain

  1. Felipe Sánchez Martínez (fsanchez)
  2. Juan Antonio Pérez (japerez)