Talk:PMC proposals/Move Apertium to Github

From Apertium
Revision as of 02:59, 28 October 2017 by Sushain

Apertium on GitHub

Reasons to Switch

  1. GitHub’s excellent issue tracker
  2. Far more people are familiar with Git than with SVN (especially younger folks, e.g. GCI/GSoC participants)
  3. More people already have GitHub accounts, so it is easier for new users to get started
  4. GitHub’s interface is far superior to SourceForge’s interface
  5. Avoids SourceForge’s downtime (though it has not been so bad lately)
  6. SourceForge gives an awful impression
  7. More visibility as an FOSS project (people browse GitHub)

Prevailing Approaches

Pros and cons common to both approaches are omitted for brevity.

Approach 1

A monorepo with all the linguistic data, pairs and language modules. Other folders in SVN, like the core engine and peripheral tools (e.g. APy), would live in their own repos.

Pros
  • Large-scale editing of e.g. 15 pairs is easy.
  • There are no meta-repos or submodules to deal with.
  • GitHub’s interface can be used directly.

Cons
  • The monorepo would be massive (> 3 GB).
    • Most devs (aside from the couple of core devs) would have to use GitHub’s SVN bridge to work on a pair.
      • This largely defeats the purpose of working on GitHub
      • People will have to learn SVN, negating some reasons to switch
    • Branches, PRs and hooks become much less useful
    • GitHub may not allow repos larger than 1 GB (it is unclear whether this limit refers to the bare repo). If GitHub decides to stop us at some point after we switch, that would be really bad.
  • Everyone will disable email notifications (i.e. stop “watching” the repo) since there will be too much spam
  • A massive number of issue labels would have to be curated and applied (non-members cannot label an issue when submitting it), reducing the effectiveness of the issue tracker
  • Commit access will continue to give write access to everything
  • Contradictory to the Git/GitHub philosophy (bad impression)

Approach 2

Individual repos for each pair, language module and tool. A couple of “meta-repos” would contain submodules pointing to collections of repos: e.g. apertium-staging would contain ~8 submodules, one for each of the pairs in SVN’s /staging, and apertium-all would have submodules for apertium-staging, apertium-incubator, apertium-languages, etc. This hierarchy would be maintained via GitHub’s repo tags (a.k.a. “topics”), e.g. apertium-xxx-yyy could be marked with the incubator tag to end up in apertium-incubator.
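
As a rough illustration of the topic-driven hierarchy just described, the Python sketch below derives each meta-repo's submodule list from repo topics and renders a .gitmodules file for it. It is hypothetical: the organization URL (https://github.com/apertium), the topic list and the rule that topic "incubator" maps to the meta-repo "apertium-incubator" are assumptions for illustration, not part of the proposal.

```python
# Hypothetical sketch: derive meta-repo contents from GitHub repo topics.
# Assumptions (not from the proposal): repos live under ORG_URL, and a repo
# tagged with topic "incubator" belongs in the meta-repo "apertium-incubator".
from __future__ import annotations

ORG_URL = "https://github.com/apertium"  # assumed organization URL
META_TOPICS = ("languages", "staging", "incubator")  # illustrative; extend as needed


def meta_repo_members(repo_topics: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each meta-repo name to the sorted list of repos it should contain."""
    members: dict[str, list[str]] = {f"apertium-{t}": [] for t in META_TOPICS}
    for repo, topics in repo_topics.items():
        for topic in topics:
            if topic in META_TOPICS:  # ignore topics that do not define a meta-repo
                members[f"apertium-{topic}"].append(repo)
    for repos in members.values():
        repos.sort()
    # apertium-all nests the other meta-repos rather than individual pairs.
    members["apertium-all"] = sorted(f"apertium-{t}" for t in META_TOPICS)
    return members


def render_gitmodules(repos: list[str]) -> str:
    """Render .gitmodules text with one submodule (path = repo name) per repo."""
    return "".join(
        f'[submodule "{name}"]\n\tpath = {name}\n\turl = {ORG_URL}/{name}.git\n'
        for name in repos
    )


if __name__ == "__main__":
    topics = {
        "apertium-xxx-yyy": ["incubator"],  # placeholder pair names
        "apertium-aaa-bbb": ["staging"],
    }
    print(render_gitmodules(meta_repo_members(topics)["apertium-incubator"]), end="")
```

In this sketch a repo tagged with several of these topics would appear in several meta-repos; whether that should be allowed is a policy choice it leaves open.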

Pros
  • Usable issue tracker for each repo
  • Fits into the Git/GitHub philosophy
  • Everyone can contribute using Git (no need for SVN bridge)
  • Familiar branching, PR and hooks that work as expected
  • Email notifications and watching repos is useful
  • Granular permissions (not everyone has write access to literally everything, which is especially useful for GCI/GSoC)
    • RESPONSE: Could be considered more bureaucratic

Cons
  • Harder for people who make changes to lots of pairs at the same time (i.e. the couple of core devs)
    • Commands are more gnarly (git submodule can be pretty unintuitive)
      • RESPONSE: Possible to mitigate with aliases and cheat sheets
    • An analogous change to 15 pairs will result in 15 different commits, since each repo has its own history.
  • Somewhat harder for people who use the meta-repos
    • RESPONSE: It’s really not that difficult to check out (git submodule update --recursive --init) and pull updates to a meta-repo (git pull --recurse-submodules), and with aliases it can be even shorter.
  • Requires tooling to keep meta-repos up-to-date
    • RESPONSE: These are super simple scripts based on GitHub’s reliable API. Sushain is willing to write them and Tino is willing to host (and perhaps code review).
  • GitHub doesn’t provide a nice interface to view repos in a tree format
    • RESPONSE: Sushain will, though! See this page, which can be trivially finished to cover all our repos and is a very simple single HTML file (and pretty nice IMO).
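
The “super simple scripts” for keeping meta-repos up-to-date could look roughly like the following Python sketch. It lists an organization’s repos through GitHub’s REST API (GET /orgs/{org}/repos, paginated) and works out which submodules a meta-repo needs to add or drop. It is hypothetical, not the actual tooling: the organization name “apertium”, submodule paths equal to repo names, and the function names are assumptions. Depending on the API version, topics may only be returned when a preview media type is requested, and unauthenticated requests are heavily rate-limited, so a real script would likely send a token.

```python
# Hypothetical sketch of meta-repo sync tooling; not the actual scripts.
# Assumptions: an org named "apertium" and submodule paths equal to repo names.
from __future__ import annotations

import json
import re
import urllib.request

API = "https://api.github.com/orgs/{org}/repos?per_page=100&page={page}"


def fetch_repo_topics(org: str = "apertium") -> dict[str, list[str]]:
    """Return {repo name: topics} for every repo in the org (needs network)."""
    result: dict[str, list[str]] = {}
    page = 1
    while True:
        req = urllib.request.Request(
            API.format(org=org, page=page),
            headers={"Accept": "application/vnd.github+json"},
        )
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        if not batch:  # an empty page means every repo has been listed
            return result
        for repo in batch:
            result[repo["name"]] = repo.get("topics", [])
        page += 1


def current_submodules(gitmodules_text: str) -> set[str]:
    """Collect the submodule paths declared in a .gitmodules file."""
    pattern = r"^[ \t]*path[ \t]*=[ \t]*(\S+)[ \t]*$"
    return set(re.findall(pattern, gitmodules_text, re.M))


def plan_sync(
    topic: str, repo_topics: dict[str, list[str]], gitmodules_text: str
) -> tuple[list[str], list[str]]:
    """Return (to_add, to_remove) so the meta-repo for `topic` matches GitHub."""
    wanted = {name for name, topics in repo_topics.items() if topic in topics}
    have = current_submodules(gitmodules_text)
    return sorted(wanted - have), sorted(have - wanted)
```

For example, calling plan_sync("staging", fetch_repo_topics(), text), with text read from apertium-staging’s .gitmodules, would yield the lists to apply with git submodule add and git rm before committing; a scheduled job could run this periodically.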