Retrospective: Google Code-In 2017
Google Code-In 2017 was an overall success for Apertium. Students completed upwards of 140 tasks, which is probably the highest count yet. There were many new contributors, some of whom seem likely to continue contributing (to varying extents) to different Apertium projects.
However, some aspects of Apertium's involvement in GCI can be improved so that mentors and students have an even better experience in the future.
What went right:
- Worthy and deserving students were selected as Winners/Finalists.
What went wrong:
- Very few mentors were involved in the selection of Winners/Finalists.
- The other mentors were not told when, where, or how the selection discussion would take place. Neither these details nor the names of the selected students were shared with them, even after the discussion was over.
- There was no transparency among mentors about which criteria were used to select Winners/Finalists.
How we can improve it:
- It appears that mentor involvement was not this limited in previous years of GCI, and that it only happened this year because of the increased workload. The next couple of sections talk about managing workload, so it is not discussed here.
- In previous years, there was a spreadsheet containing the top ten students where mentors could rank the students, and the students with the highest overall rankings would be selected as Winners/Finalists. This is a good system but it has drawbacks:
- Not every mentor interacts with every student, so each mentor's ranking can only be partially meaningful.
- The evaluation criteria are not specifically discussed or standardized.
- Any system of ranked voting is subject to Arrow's impossibility theorem: with three or more candidates, it cannot satisfy all of a set of reasonable fairness criteria at once.
- A possible solution is to have the mentor(s) who worked most closely with each student write a paragraph describing the quality of that student's work and interaction, with appropriate evidence. Then all the mentors can read these descriptions and follow the process of cardinal voting, i.e. each mentor assigns a numerical 'grade' to each student and the students are finally ranked by average grade. The benefits are:
- Even if a mentor does not interact with a student, they can still judge the work.
- The 'grade' can be split into different categories, like "code quality", "style/frequency of communication", "willingness to help others", etc. Mentors would have a transparent, standardized system to evaluate students, and this system could also be shared with students so they know what is valued in the community.
- Arrow's impossibility theorem does not apply to cardinal systems.
- The results are actually more accurate (see the "psychological studies" references on the Wikipedia page).
- No special process is required beyond a shared online spreadsheet with a sum and average value function.
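The averaging step of the cardinal voting proposal above can be sketched in a few lines of Python. This is only an illustration under assumptions: the category names are adapted from the examples in the text, the 0-10 scale is arbitrary, and the mentor names, student names and grades are made-up placeholders. A shared spreadsheet would do the same job.

```python
from statistics import mean

# Hypothetical categories, adapted from the examples in the text.
CATEGORIES = ("code_quality", "communication", "helping_others")


def rank_students(grades):
    """Rank students by average grade (cardinal voting).

    `grades` maps mentor -> student -> {category: numeric grade}.
    A mentor who does not grade a student contributes nothing for them.
    Returns a list of (student, average) pairs, best first.
    """
    per_student = {}
    for mentor_grades in grades.values():
        for student, by_category in mentor_grades.items():
            # One mentor's overall grade is the mean of the category grades.
            overall = mean(by_category[c] for c in CATEGORIES)
            per_student.setdefault(student, []).append(overall)
    averages = {s: mean(g) for s, g in per_student.items()}
    # Highest average first; ties are broken alphabetically.
    return sorted(averages.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up placeholder data on an assumed 0-10 scale.
example = {
    "mentor_a": {
        "student_x": {"code_quality": 8, "communication": 6, "helping_others": 7},
        "student_y": {"code_quality": 5, "communication": 9, "helping_others": 10},
    },
    "mentor_b": {
        "student_x": {"code_quality": 9, "communication": 9, "helping_others": 6},
        "student_y": {"code_quality": 6, "communication": 6, "helping_others": 6},
    },
}
ranking = rank_students(example)
```

Weighting every category equally is a design choice made here for simplicity; mentors could just as well agree on weights when they standardize the criteria.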
Organizing and managing tasks
What went right:
- There were a large number of tasks covering a wide range of Apertium systems and a wide range of skills.
- Each task typically had more than one mentor.
- Each task typically had a decent description and links for further information.
- As the contest progressed, creating new tasks was hassle-free.
What went wrong:
- Some tasks were very poorly described (sometimes just one sentence, no links).
- Around the start of the contest, there was a lot of confusion about tasks not being uploaded properly, missing mentors/tags, etc.
- Many tasks (especially ones with only a single instance, meant to fix a single issue) were claimed by students who clearly needed a lot more experience with the relevant software before they could make any progress. There were thus too many instances of timing out, submitting completely irrelevant work, "yo bro er whattami suppost to do", and so on.
How we can improve it:
- In the task planning phase, mentors should only add themselves to a task when they have verified that the description is complete and makes sense. (Mentors should also discuss what constitutes a "complete description".) Tasks should only be published when they have been reviewed in this manner.
- The uploading issues will probably not be present next year because we solved them this year.
- Quoted from an email on apertium-stuff:
Each task ... would require the student to have completed work equivalent to the previous tasks. The first one or two tasks in the chain would be beginner tasks, easier than our current beginner tasks but not as easy as ["get on IRC"].
- "Download and compile one Apertium translation pair, and send a screenshot of trial translations"
- "Add 200 words to the bilingual dictionary" or "Add 1 lexical transfer rule"
- "Add 500 words to the bilingual dictionary" or "Add 10 lexical transfer rules" or "Write a contrastive grammar" or ...
- "Install a few translation pairs from your distribution software repository (or download and compile if you want to). Fork APy and run it locally on your computer. Send a screenshot of trial queries." or similarly for html-tools
- all the issue-fixing or feature-proposal tasks for APy or similarly for html-tools
- tasks which involve modifying or testing with components of both ("Fix html-tools behavior when APy is down" etc.)
Similar 'chains' could be made for begiak and the lttoolbox tasks.
- Tasks should also clearly define what it means to be "completed" so that mentors do not need to waste time commenting on irrelevant/very poor submissions.
- These structured tasks will not only solve the problems mentioned above, but also make the learning curve much less steep, encouraging more students to work with Apertium. (Apertium probably has one of the steepest learning curves of all GCI organizations.) More tasks will be completed (always encouraging!), especially the initial tasks in the chains above, and the complex tasks will receive more relevant attention.
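The prerequisite rule behind the task chains quoted above ("each task would require the student to have completed work equivalent to the previous tasks") can be illustrated with a short Python sketch. It assumes mentors keep track of completed tasks themselves; the chain contents paraphrase the APy/html-tools example, and the names `APY_CHAIN`, `can_claim` and `next_claimable` are hypothetical.

```python
# Hypothetical chain, paraphrasing the APy/html-tools example from the email.
APY_CHAIN = [
    "install pairs and run APy locally",
    "fix an APy issue",
    "fix html-tools behavior when APy is down",
]


def can_claim(chain, completed, task):
    """A task is claimable only if every earlier task in the chain is done."""
    index = chain.index(task)
    return all(earlier in completed for earlier in chain[:index])


def next_claimable(chain, completed):
    """Return the first task in the chain not yet completed, or None."""
    for task in chain:
        if task not in completed:
            return task
    return None
```

For example, a student with no completed tasks can claim only the first task in the chain, and `next_claimable(APY_CHAIN, {APY_CHAIN[0]})` points them at the second.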