Retrospective: Google Code-In 2017


Google Code-In 2017 was certainly an overall success for Apertium. Students completed upwards of 140 tasks, which is probably the highest count yet. There were a lot of new contributors, some of whom seem likely to continue contributing, to varying extents, to many different Apertium projects.

However, some aspects of Apertium's involvement in GCI can be improved so that mentors and students have an even better experience in the future.


Organizing and managing tasks

What went right:

  • There were a large number of tasks covering a wide range of Apertium systems and a wide range of skills.
  • Each task typically had more than one mentor.
  • Each task typically had a decent description of the task and links for further information.
  • As the contest progressed, creating new tasks was hassle-free.

What went wrong:

  • Some tasks were very poorly described (sometimes just one sentence, no links).
  • Around the time the contest started, there was a lot of confusion about tasks not being uploaded properly, missing mentors/tags, etc.
  • Many tasks—especially ones with only a single instance, meant to fix a single issue—were claimed by students who clearly needed a lot more experience with the relevant software before they could make any progress. There were thus too many instances of timing out, submitting completely irrelevant work, "yo bro er whattami suppost to do", and so on.

How we can improve it:

  • In the task planning phase, mentors should only add themselves to a task when they have verified that the description is complete and makes sense. (Mentors should also discuss what constitutes a "complete description".) Tasks should only be published when they have been reviewed in this manner.
  • The uploading issues will probably not be present next year because we solved them this year.
  • Quoted from an email on apertium-stuff:

Each task ... would require the student to have completed work equivalent to the previous tasks. The first one or two tasks in the chain would be beginner tasks, easier than our current beginner tasks but not as easy as ["get on IRC"].

Chain 1:

  • "Download and compile one Apertium translation pair, and send a screenshot of trial translations"
  • "Add 200 words to the bilingual dictionary" or "Add 1 lexical transfer rule"
  • "Add 500 words to the bilingual dictionary" or "Add 10 lexical transfer rules" or "Write a constrastive grammar" or ...

Chain 2:

  • "Install a few translation pairs from your distribution software repository (or download and compile if you want to). Fork APy and run it locally on your computer. Send a screenshot of trial queries." or similarly for html-tools
  • all the issue-fixing or feature-proposal tasks for APy or similarly for html-tools
  • tasks which involve modifying or testing with components of both ("Fix html-tools behavior when APy is down" etc.)

Similar 'chains' could be made for begiak and the lttoolbox tasks.

  • Tasks should also clearly define what it means to be "completed" so that mentors do not need to waste time commenting on irrelevant/very poor submissions.
  • These structured tasks will not only solve the problems mentioned above, but also make the learning curve much less steep, encouraging more students to work with Apertium. (Apertium probably has one of the steepest learning curves of all GCI organizations.) More tasks will be completed (always encouraging!), especially the initial tasks in the chains above, and the complex tasks will receive more relevant attention.


Selecting Winners/Finalists

What went right:

  • Worthy and deserving students were selected as Winners/Finalists.

What went wrong:

  • Very few mentors were involved in the selection of Winners/Finalists.
  • The other mentors were not told when/where/how the selection discussion would take place. Even after the discussion was over, they did not know these details or which students had been selected.
  • There was no transparency among mentors for what criteria were used to select Winners/Finalists.

How we can improve it:

  • It appears that these problems did not occur in previous years of GCI and only arose this year because of the increased workload. The next couple of sections discuss managing workload, so it is not covered here.
  • In previous years, there was a spreadsheet containing the top ten students where mentors could rank the students, and the students with the highest overall rankings would be selected as Winners/Finalists. This is a good system but it has drawbacks:
    • Not all mentors interact with all students, so each mentor's ranking is only partially meaningful.
    • The evaluation criteria are not specifically discussed or standardized.
    • Any system of ranked voting suffers from the deficiencies of Arrow's impossibility theorem.
  • A possible solution is to have the mentor(s) who worked most closely with each student write a paragraph describing the quality of the student's work and interaction, with appropriate evidence. Then all the mentors can read these descriptions and follow a process of cardinal voting: each mentor assigns a numerical 'grade' to each student, and the students are finally ranked by average grade. The benefits are:
    • Even if a mentor did not interact with a student, they can still judge the student's work.
    • The 'grade' can be split into different categories, like "code quality", "style/frequency of communication", "willingness to help others", etc. Mentors would have a transparent, standardized system to evaluate students, and possibly this system could be told to students too so they know what is valued in the community.
    • Arrow's impossibility theorem does not apply to cardinal systems.
    • The results are actually more accurate (see the "psychological studies" references on the Wikipedia page).
    • No special process is required beyond a shared online spreadsheet with sum and average functions (a rough sketch of the computation is shown after this list).
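
As an illustration only (not part of the proposal above), the computation that the shared spreadsheet would perform could look like the following Python sketch; the student names, grading categories, and grades are made up.

  # Rough sketch of the cardinal-voting computation the shared spreadsheet
  # would perform. All names, categories, and grades are hypothetical.
  grades = {
      "Student A": [
          # one dict of category grades per mentor who graded this student
          {"code quality": 9, "communication": 8, "helping others": 7},
          {"code quality": 8, "communication": 9, "helping others": 8},
      ],
      "Student B": [
          {"code quality": 7, "communication": 6, "helping others": 9},
      ],
  }

  def average_grade(sheets):
      # Average of every category grade the student received, across mentors.
      scores = [score for sheet in sheets for score in sheet.values()]
      return sum(scores) / len(scores)

  # Rank students by average grade, highest first.
  for student in sorted(grades, key=lambda s: average_grade(grades[s]), reverse=True):
      print(student, round(average_grade(grades[student]), 2))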


Organizing mentors

What went right:

  • The mentors who were active were quite active, both on IRC and on task pages.
  • No students complained about late responses or ineffective mentoring.

What went wrong:

  • Some mentors were positively flooded with task reviews, with backlogs of more than 48 hours at one point.
  • Students had little idea of when and which mentors would be available at any given time. This was especially important for tasks where only one mentor was active or involved enough to help.

How we can improve it:

  • For the benefit of students and other mentors, mentors should publicly declare:
    • what timezone they're in
    • what rough time slots they will be available to respond to IRC, task pages, pull requests, etc.
    • what rough time slots they may be available
  • Mentors should commit among themselves to putting in a certain number of hours on a weekly basis; this sort of declaration would help mentors keep their own commitments and also not feel guilty about not doing work they were never supposed to do in the first place.
  • Mentors should also tell each other when they will be on vacation, or have exams/classes/etc., so that they don't simply drop off the face of the Earth for no apparent reason in the middle of the contest.
  • Depending on general availability and time as discussed above, an appropriate number of tasks and task instances can be assigned to each mentor, to avoid demanding more work than is possible.
  • All these would be greatly helped by a mailing list for mentors.


The wiki (this wiki)

What went right:

  • The wiki contained answers to most FAQs asked by students.
  • begiak's .awik command helped point students to the appropriate resource.

What went wrong:

  • Students did not have much success searching the wiki by themselves. Possible reasons are:
    • they did not know they needed to access/search the wiki for their questions
    • coming from the modern Internet, they were discouraged because it required too much effort to use the wiki
    • the wiki does not make it easy to find things
(Of the above, the last one is the only one in our control, so it is the one we should fix.)
  • On a related note, the wiki is somewhat unorganized. Beyond the first level of hierarchy ('Main Page' -> 'Installation', 'HOWTO Contribute', etc.) there is not much organization and the pattern of navigation seems to depend on 'See also', 'External links', and searching unknown words in wiki pages.

How we can improve it:

  • We should add at least two more levels of hierarchy to the organization of the wiki; for example, 'Main Page' -> 'HOWTO Contribute' -> 'Pair development' -> 'Transfer rules'. This documentation exists already but it is either on large single pages whose sections do not go into much detail, or on multiple pages which have a lot of content in common so the actual unique content goes unnoticed. Adding levels of hierarchy would just mean providing a lot of links and summaries of what those links contain/what you should read if you are looking for <X>.
  • We should add an introductory, high-level overview of how translation works:
    • Description of what the analyzer, disambiguator, transfer rules, etc. do without going into details of writing dictionaries/rules but with details about what sort of input/output they have
    • A complete, annotated running example of a single translation, demonstrating a large variety of features (a rough sketch of how such an example might start is shown after this list)
    • Sample language modules and a translation pair that deliberately have very few lemmas/rules but still demonstrate a large variety of features
    • What are all those file extensions? What do those files do? How do I use <X> without running the entire pipeline?
    • Links to detailed pages about each topic
  • (optional) We should purge/revise outdated content. Detail pages about pairs are often outdated and don't have clearly
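
As a hypothetical illustration (not existing documentation) of where such an annotated running example could start, the sketch below pipes one sentence through an installed pair with the apertium command-line tool; the pair name and the sentence are placeholders, and it assumes the tool and pair are installed locally.

  # Hypothetical sketch: run one sentence through an installed Apertium pair.
  # Assumes the 'apertium' command-line tool and an eng-spa pair are installed;
  # the pair name and the sentence are placeholders.
  import subprocess

  def translate(mode, text):
      # Pipe 'text' into 'apertium <mode>' and return whatever it prints.
      result = subprocess.run(["apertium", mode], input=text,
                              capture_output=True, text=True)
      return result.stdout.strip()

  sentence = "The cats sleep."
  # Full pipeline: analysis -> disambiguation -> transfer -> generation.
  print(translate("eng-spa", sentence))
  # Many pairs also define intermediate (debug) modes that stop partway
  # through the pipeline; running those on the same sentence would provide
  # the per-stage input/output that the annotated example should explain.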


The eternal git/svn issue

(Update: there is a PMC proposal about this! Many of the points mentioned are especially relevant to GCI.)