PMC proposals/Apertium Workshop in Russia


Proposal passed.

2011/11/02 #7: Apertium Workshop in Russia


This proposal aims to support a course/workshop/tutorial on machine translation in Russia aimed at the minority and regional languages thereof, and following that the development of a prototype pair for a minority language of Russia. Russia has a long history of work in machine translation, but very little work on the languages of Russia which are not Russian. Apertium has a lot of support for European languages, but for few languages beyond. Having a long history of linguistics and computer science, Russia seems like an ideal place to expand.

Proposed by: Francis Tyers

Seconded by: --Jacob Nordfalk 19:36, 2 November 2011 (UTC)

In detail

The detailed proposal can be found in the following PDF document (updated 09:45, 13 December 2011 (UTC)). A letter of support from the Chuvash State Humanities Institute can be found here.


  • A more extensive version of this proposal was submitted to the EAMT in the previous call for proposals, but was rejected.
  • Among the objections were:
    • "The author does not mention how many participants the course is planned for." -- The course is planned for between 6 and 15 participants.
    • "No course programme is given." -- The course programme will basically follow the Apertium Luxembourg Workshop, although with examples adapted for Russian, more hands-on, and 5 days instead of 2. We expect substantially fewer participants.
      Update: The preliminary course outline is in the revised proposal. Francis Tyers 13:45, 9 November 2011 (UTC)


The cost breakdown does not, at least at a surface glance, match the project proposal. I think much more explanation is needed: for example, why does a workshop, or a translator, need an internet connection? -- Jimregan 13:31, 2 November 2011 (UTC)

Good question! The year-long internet connection is for the students who will be working on the translator; the same goes for the computers. Internet access is not as widespread in Chuvashia as in Moscow/St. Petersburg and the rest of Europe. For example, neither of the two universities has a campus internet connection. Computers are more widespread, but students studying Chuvash philology are not so likely to own them, nor likely to have ones suitable (e.g. sufficiently powerful/with GNU/Linux) for work on Apertium. - Francis Tyers 13:36, 2 November 2011 (UTC)
As someone who has lived in Chuvashia for almost three years, I can explain the situation from the local side (which, even now, still shocks me from time to time). In Chuvashia the official average income (monthly -fmt) is €220-230. As Chuvash is not spoken at home with city children and there are no Chuvash-language schools in the city, all Chuvash-language students come from the countryside. That is the case for the two students who would participate in the project. But the gap between city and countryside is extreme in Russia, as a result of decades of Soviet industrialisation policies and post-Soviet kolkhoz bankruptcies. So, even though I don't have official data, the average income in the countryside may be 50-60% of that in the city. Moreover, the local universities do not have computers or free access to the internet (except in special computer rooms used for classes). Students from the countryside live in halls of residence, and there are no computers or internet there either. (Here the problem is that Moscow practises façade policies and gives all the money to Moscow State University in order to show that the country has at least one elite university centre: compare the situation of Russian and Spanish universities in the Shanghai ranking of universities.) As a result, almost everywhere in Russia (with the possible exception of oil regions such as Tatarstan and Sakha), students with a good knowledge of a minoritised language seldom have a computer and/or access to the internet. That is the case at least in Chuvashia, and that is why the project needs to buy computers and pay for internet connections.--Hèctor Alòs i Font 18:42, 2 November 2011 (UTC)
Reminds me of Nepal. I hope this would not only make a language pair but also be a tiny contribution to an under-prioritized region --Jacob Nordfalk 19:43, 2 November 2011 (UTC)

This is a request for funding. I'd like to know how much money we have and if there are other candidates for using them. --Jacob Nordfalk 18:24, 2 November 2011 (UTC)

We have around 10,000€ and no other candidates currently, although anyone can submit requests, e.g. for conferences and such. To my knowledge this is the first "project" request. - Francis Tyers 18:34, 2 November 2011 (UTC)
On one hand, if we have ~10,000, that essentially means we spent nothing last year, and spending the money on a project like this is what we ought to be doing; on the other hand, it would just be irresponsible to put half the money we have into any project without any guarantees or recourse. As this is GSoC money we're talking about, maybe we could take a leaf from the GSoC programme? Add a set of milestones, and if they are not met, the next payment does not go through? The details would be different, but it's worth at least discussing. -- Jimregan 19:28, 2 November 2011 (UTC)
I think that would be perfectly fine. We could split it quite easily by deliverable. The idea would be that we get €1,450 for D1/D2 (the workshop) and then, if it is successful (according to a PMC vote), the remaining €3,460 for the translator (D3-5)? - Francis Tyers 19:33, 2 November 2011 (UTC)
No, that's not what I meant. I wasn't considering the workshop at all, because I can't think of a useful way to split that. Maybe buy the netbooks as a roll of the dice, and require that they be set up and ready before a workshop can be considered? Doesn't seem too useful, but maybe someone else might have a more useful idea. What I actually meant was to split the creation of the translator to have an assessable point or two, so if a milestone isn't met, we don't waste more money. But, you know, something usefully assessable. That the workshop could in any way serve as an indicator for the translator is a non sequitur. -- Jimregan 19:53, 2 November 2011 (UTC)
The idea of a midterm is good. In fact, we have a "mid-project evaluation" scheduled for the 7th month. So an idea could be to say €960 before month 7, and €960 if the mid-project evaluation is passed. What that evaluation should be is to be determined, but I think we could come up with something reasonable during the first months. - Francis Tyers 20:10, 2 November 2011 (UTC)

More comments, based on the updated proposal (have you added it here, yet?)

  • You should mention that the translation will also be made available under the EUPL. It's not like there's a choice in the matter, but it would look better to mention it.
  • You should write something here about the students who have been selected, and how they were selected. We're not some trade organisation, these are people we'll be expected to welcome into our community.
  • As you are both a proposer, and a signatory on the bank account, you should make explicit how money transfers, etc., have been handled, so that everything is above board. Presumably, Mikel will take care of signing cheques here, to keep things legitimate, but, you know, mention it.
  • Write something about ongoing mentoring/support for the students during the translator building phase.
  • You mention that Trond may participate in the workshop. If there is even a chance that you may require extra funding for that, mention it now. If you don't, and ask in future, I will vote against it on general principle - I don't want us to get trapped into a project with ever-escalating costs.
  • The split of the translator development into four phases is something I requested so that payment could be made conditional on performance during the subsequent phase - so we don't commit ourselves to paying for work that isn't being done. I consider that a condition to accepting the project. I think this is a generally prudent condition for an organisation of our size and means to impose, and, in any case, the students who will be doing the work are unknown to the rest of us.

In summary, I'm generally in favour, but this is our first project proposal, and I want us to start as we mean(/hope) to continue. -- Jimregan 11:43, 11 November 2011 (UTC)

These have been taken care of in the version of the proposal dated 11:59, 11 November 2011 (UTC). - Francis Tyers 11:59, 11 November 2011 (UTC)
More for information than as a condition, but I did say "...and how they were selected". Also, is there any way you can put the PDF on the wiki? I'd feel better if we had everything related to the vote in one place. -- Jimregan 12:14, 11 November 2011 (UTC)
They were selected as being students who know Chuvash and Turkish, being interested in the project, and sticking around. There were a few more candidates, but these were the two best ones. Hèctor may be able to weigh in more on the selection process. - Francis Tyers 14:02, 11 November 2011 (UTC)
The process was as follows. There are two faculties of Chuvash philology here. I collaborate quite closely with one of them (the one at the university where I worked), and I tried to recruit students there for Apertium during this year (with GSoC 2012 in mind). In fact, Fran gave a talk there in May and, afterwards, a short workshop. It was a failure for both me and Fran. So I got in touch with the other faculty, where Chuvash philology students are also taught Turkish. The two students were selected by the Turkish language teacher of this faculty. I worked with the two students on Chuvash morphology in September and October (4-5 meetings). In my opinion they can do the job, but it's clear that they are far from perfect. Their computer skills are very basic. On the other hand, they have understood how the whole thing works and have begun to show that they are able to think as computational linguists. As proof of their commitment, they decided at the end of September that their degree thesis will be the Chuvash morphological analyzer/generator. The thesis has to be written by April. (I hope that they have changed their mind since then, as the project is not yet sure) --Hèctor Alòs i Font 18:24, 11 November 2011 (UTC)
That's what I wanted to know, thanks guys. -- Jimregan 15:19, 12 November 2011 (UTC)
Unfortunately, the two students who were selected resigned because the process took considerably longer than expected. So we had to run a new selection process, which I explain here. I contacted both Chuvash language faculties and explained again what we need. The Chuvash State University invited me to present the project to c. 30 students. In the Pedagogical University there are fewer students who study Turkish, and they preferred to speak with them. The result was that I individually interviewed 5 students from the State University and 4 from the Pedagogical University. After that, only 2 students from the State University (both in the first year) and 2 from the Pedagogical University (in the 4th and 5th years) put themselves forward as candidates. All of them were recommended by their teachers. In my opinion all four are good: one from the Pedagogical University is very good (see her CV in the new version of the proposal), but the other has a problem with time, and I prefer not to select her (although I know her, and I know she is fully reliable if she accepts a job). On the other hand, neither student from the State University has much knowledge of Turkish yet, but both showed a lot of interest and commitment. Between them I selected the one who had better language skills: she speaks Tatar perfectly, which helps her a lot with Turkish (and I hope she'll work further on the Chuvash-Tatar translator).--Hèctor Alòs i Font 18:52, 13 December 2011 (UTC)

Mikel is in favour, with funding made available in installments following evaluation of deliverables. Comments to improve the final proposal for the vote:

Quantify and specify better the amount and the nature of support provided by the Chuvash State Institute of the Humanities
The Chuvash State Institute of the Humanities is a prestigious local centre, which can be useful if an unexpectedly difficult problem appears. This kind of support may be helpful in a society as rigid and bureaucratic as the Russian/Chuvash one. In any case, our main supporters are in the Chuvash State University, where the course will be held and where one of the two project students studies. The Chuvash Philology Faculty is helping us a lot, and its Turkish language teacher will also help us as a consultant. So will the Turkish language specialist of the Chuvash State Institute of the Humanities. --Hèctor Alòs i Font 18:15, 13 December 2011 (UTC)
I miss a lesson on evaluation in table 1. Evaluation is important (deliverables D3, D5, D7)
Can you still make it in December 2011 or January 2012?
Yes. The course can be held 23-27 January in one of the computer technology faculties of the Chuvash State University. We have the planning and there is enough time to prepare and announce everything, but we should begin very soon.--Hèctor Alòs i Font 19:36, 13 December 2011 (UTC)
I would expect a bit more detail about how accommodation will be organised by the local organisers.
The accommodation will be in an internat. The cost will be 300 rubles/night (around 7.50€).
If the cost of translation is approx. €5 per page and the slides contain 7,000 words and translation costs around €75, does this mean that a typical page is 467 words? That boils down to about €0.01 per word, which is an order of magnitude below usual rates. Are the figures OK?
The standard translation page in Russia has 1,800 characters (with blanks). I've been speaking with the people who will do the translation, and they confirm the figures and the quality of the translation. I know them and they are reliable: it wouldn't be the best translation in the world, but it will be correct and fully usable.--Hèctor Alòs i Font 18:15, 13 December 2011 (UTC)
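For reference, the arithmetic behind the question and the reply can be checked with a short sketch. It uses only the figures quoted above (7,000 words, €75 in total, €5 per page, 1,800 characters per page); the variable names are illustrative, and the last value is just what those figures imply, not a number taken from the proposal.

```python
# Figures quoted in the question above.
words = 7000              # words in the slides
total_cost_eur = 75       # quoted total translation cost
cost_per_page_eur = 5     # quoted cost per page

pages = total_cost_eur / cost_per_page_eur    # number of pages paid for
words_per_page = words / pages                # words per page implied by the question
cost_per_word_eur = total_cost_eur / words    # cost per word implied by the question

# Standard Russian translation page from the reply: 1,800 characters with blanks.
chars_per_page = 1800
implied_chars_per_word = pages * chars_per_page / words

print(f"pages: {pages:.0f}")
print(f"words per page: {words_per_page:.0f}")
print(f"cost per word: EUR {cost_per_word_eur:.4f}")
print(f"implied characters per word (with blanks): {implied_chars_per_word:.2f}")
```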
Give a complete reference of the Çuvaş – Türkiye Sözlük. Is it this?
Yes, it is that one. Fixed in revision marked 20:18, 17 November 2011 (UTC). - Francis Tyers 20:18, 17 November 2011 (UTC)
The proposal says that the two Chuvash students will be selected after the course but it gives two names later. This is strange and should be clarified before my approval is permanent.
This is an error in the proposal, fixed in revision marked 20:18, 17 November 2011 (UTC). - Francis Tyers 20:18, 17 November 2011 (UTC)
I don't think this is a 'scientific work of grand scale'. Perhaps an experiment with the accelerator at CERN is. I would use different language, really.
This is how the girls phrased it: "В научных работах такого масштаба не участвовали, поэтому и нет ни грантов, ни премий." I translated it hastily, it might be better "scientific work of such a scale". :) - Francis Tyers 18:30, 17 November 2011 (UTC)
€350 is a bit expensive for a netbook; I recently bought a nice 10'' HP Mini for €239.
Prices here are somewhat higher and the variety smaller. Seeing what's available, we think the best solution is to buy notebooks and resell them after one year. In this case, probably a bit less than €350 would do, but if we maintain the figure, we'll be able to give a favourable price to our students, as a kind of premium. Please take into consideration that this may be a strange part of the budget, but this is in fact because the budget is half-Russian: Western students would have their own computers and internet connections, but wouldn't receive an 80€/month grant. You may look at the figures and consider that the students will receive a 120€/month grant.--Hèctor Alòs i Font 18:15, 13 December 2011 (UTC)
Where will the €10-a-month internet connections be installed exactly?
This is the current cost here of a USB modem suitable for a notebook. --Hèctor Alòs i Font 18:15, 13 December 2011 (UTC)
It would be nice to break the total funding in installments. In particular, as some money should be available before the tasks, a payment schedule would be a must.
Turkish and Chuvash The course → Turkish and Chuvash. The course (p. 1)
missing period after "the overview is in Table 1" (p. 2)
Fixed these two in revision marked 20:18, 17 November 2011 (UTC). - Francis Tyers 20:18, 17 November 2011 (UTC)

As this matter is still under discussion, and Sergio has registered his intention to vote, we should perhaps consider this vote to have a default extension. Say, one week? -- Jimregan 17:45, 17 November 2011 (UTC)


Please note that voting is only open to PMC members. Please vote by signing (~~~) in the relevant section.


  1. Francis Tyers
  2. --Jacob Nordfalk 07:40, 3 November 2011 (UTC)
  3. Jimregan
  4. Mlforcada 16:53, 17 November 2011 (UTC) (see conditions above) Conditions have been satisfactorily met. Mlforcada 12:18, 15 December 2011 (UTC)



  1. Felipe Sánchez Martínez (fsanchez)
  2. Juan Antonio Pérez (japerez)