PMC proposals/Move Apertium to Github

Revision as of 11:10, 7 February 2018 by Bech (talk | contribs) (→‎Summary: two proposals in the same package)


The question of moving Apertium's code partially or wholly to Github has been under debate for a long time. Previous proposals (PMC proposals/Move apertium to github and PMC proposals/Allow some code under) met with a number of objections and eventually expired. This proposal attempts to address those objections and outlines an updated plan to move all Apertium code to Github.

Bech (talk) 12:10, 7 February 2018 (CET) This proposal, like the previous ones, includes two distinct changes in the same package:
  1. Moving from to
  2. Moving from subversion to git software.
The merits of each of the two changes could also be examined separately.

Plan in brief:

  • Individual repos for each pair, language module, and tool (preserving all commit history).
  • A couple of "meta-repos" that contain submodules pointing to collections of repos.
    • e.g. apertium-staging would contain ~8 submodules pointing to each of the pairs in SVN's /staging and apertium-all would have submodules to apertium-staging, apertium-incubator, apertium-languages, etc.
    • Hierarchy maintained automatically via simple scripts and Github API.
    • Simple hierarchical interface on top of Github for easy use. (Sushain's demo version)
  • Resources such as Using git (in progress), the linked tutorials, and the Github tutorial for existing SVN developers.
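
A meta-repo is essentially a `.gitmodules` file plus one pinned commit per submodule. As a minimal sketch of what the maintenance scripts might generate, the following renders that file for a meta-repo such as apertium-staging. The helper name, the organization URL and the pair names are assumptions made for illustration; they are not part of the proposal.

```python
def render_gitmodules(repo_names, org_url="https://github.com/apertium"):
    """Return .gitmodules text with one submodule entry per repository.

    org_url is an assumed location for the organization.
    """
    entries = []
    for name in sorted(repo_names):
        entries.append(
            f'[submodule "{name}"]\n'
            f"\tpath = {name}\n"
            f"\turl = {org_url}/{name}.git\n"
        )
    return "".join(entries)


if __name__ == "__main__":
    # Hypothetical pair names, used only to show the output format.
    print(render_gitmodules(["apertium-xxx-yyy", "apertium-aaa-bbb"]))
```

In practice `git submodule add <url> <path>` writes the same entries and also records the pinned commit, so a real script would more likely call that command than write the file directly.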

Proposed by: Shardulc (talk) 00:52, 4 February 2018 (CET)
Seconded by: Sushain (talk) 00:55, 4 February 2018 (CET)

In detail

As a FOSS project, the main benefits to Apertium are:

  • Github gives Apertium higher visibility: according to people who attended the GSoC mentor summit, people often search for "github apertium" to try to find Apertium's code and are unsuccessful
    • --Mlforcada (talk) 10:09, 4 February 2018 (CET) An apology of laziness?
    • --Sushain (talk) 17:59, 4 February 2018 (CET) To some extent I agree. However, for younger individuals that have "open source === github" in their mind, it makes sense.
  • Github makes it very easy for new people to contribute because:
    • Many more people outside Apertium are familiar with Git than with SVN (especially younger folks such as GCI/GSoC students)
    • To start contributing, a new user just has to fork the repo and send a pull request (a 'pull request' is a request to merge a patch), without requesting access on the mailing list etc.
  • Github encourages better quality code because:
    • Github would provide an excellent issue tracker for each pair/language/tool (an example and a description)
    • Github has excellent pull request review tools (shardulc notes that during GCI 2017, he asked students to make dummy pull requests in dummy Github repositories on at least three different occasions, just because line-by-line code reviews are so good)
    • Each pair/language/tool can have different package maintainers with commit access, so that pull requests are reviewed and merged by people best able to judge their quality, before giving commit access to new contributors
      • --Mlforcada (talk) 10:09, 4 February 2018 (CET) This is *very* important

Additional benefits:

  • Github provides webhooks for automatic actions, like begiak reporting on new issues; reliable APIs for scripts (more about this in Caveats); and email notifications for repos that you follow.
  • All the benefits of git for those who agree that they are benefits (lightweight and plentiful branches, better support for merging and merge-based workflows, offline commits, complete offline history), and for those who don't agree, there is the Github SVN bridge
  • Granular permissions: not all contributors need to have access to literally everything, which is especially useful for GCI/GSoC students (more about this in Caveats)
  • Github's web interface has a feature set that eclipses SourceForge's interface especially when it comes to navigating code and it receives frequent improvements/updates
    • --Mlforcada (talk) 10:09, 4 February 2018 (CET) It would be nice to give more objective wording for this.
    • --Sushain (talk) 17:59, 4 February 2018 (CET) My apologies; these are mostly my words here. I have revised them to be a bit more objective.
  • Each repo has its own version history and releases
  • Package maintainers can add continuous integration tools or enforce workflows for specific pairs/languages/tools as required; these decisions can be taken independently for each pair/language/tool by contributors involved with it
    • --Mlforcada (talk) 10:09, 4 February 2018 (CET) Please explain.
    • --Sushain (talk) 17:59, 4 February 2018 (CET) GitHub provides support for lots of tooling and settings. For example, I can easily ensure that a repo's master branch doesn't get pull requests merged until status checks pass. These status checks can be anything from code linting to unit/integration tests and coverage tools (i.e. continuous integration). These tools have to be configured on a per-repo basis and would be a nightmare for a monorepo. Just consider how large the configuration would get to test everything and how many [programming] languages we'd need to install to test all of Apertium's different tools, core, modules, etc. For an example of continuous integration in action, see: There are lots of more complex uses, e.g. a UI library could make it so that every pull request builds a version of the docs that includes the pull request's changes and hosts it online. For an Apertium module, the build tool could easily be configured to emit a simple HTML page artifact that contains a bunch of test phrases that could be verified for accuracy before merging the code (unit tests are sometimes < eyeballs).
    • --Shardulc (talk) 18:59, 4 February 2018 (CET) About "workflows": if all the active contributors to a package agree, then more 'git-like' development workflows could be used. For example, the 'apertium/apertium-xxx' repo has a 'master' branch and a 'development' branch. The repo is forked by each developer, e.g. 'shardulc', who makes lots of feature-specific branches and opens pull requests from 'shardulc/apertium-xxx/new-feature' to 'apertium/apertium-xxx/development'. Finally, every once in a while, the features in 'development' are reviewed and merged into 'master'.
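
The HTML artifact idea from the continuous integration comment above could look roughly like the sketch below. It assumes that a build step has already produced (source, expected, actual) triples for a set of test phrases. The function name and the table layout are illustrative, not an existing Apertium tool.

```python
from html import escape


def render_report(cases):
    """Render (source, expected, actual) triples as a small HTML table."""
    header = "<tr><th>source</th><th>expected</th><th>actual</th><th>status</th></tr>"
    rows = []
    for source, expected, actual in cases:
        # Flag rows where the translation differs from the expected output.
        status = "ok" if expected == actual else "DIFF"
        cells = "".join(
            f"<td>{escape(text)}</td>" for text in (source, expected, actual, status)
        )
        rows.append(f"<tr>{cells}</tr>")
    return "<table>\n" + "\n".join([header] + rows) + "\n</table>\n"
```

A reviewer would open the generated page from the pull request's build output and scan the rows marked DIFF before merging.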

Communicating the change:

  • Thanks to a recent PMC election, we have a list of contributors and email addresses (which is complete as far as we know). This can be used to:
    • announce the change to all contributors
    • provide links to help pages, documentation of the change, etc.
    • provide email addresses and IRC nicks to contact if any further help is needed
  • Limited backwards-compatibility:
    • the SVN repo can be 'locked': anybody trying to commit will be presented with a message notifying them of the change (the lock can then be manually temporarily disabled)
    • the SVN repo directories will be replaced with svn:externals pointing to the Github repos (see Caveats)

Miscellaneous concerns:

  • Mailing lists: should probably be preserved on SourceForge for now until/unless we choose to switch to another solution or self-host them.
  • Existing issues: Sushain volunteers to migrate our existing issues manually (or to find an automated solution); there are not too many of them.


Caveats

Objection: It is harder to work with the meta-repos. git submodule commands are gnarly.

  • Aliases and cheatsheets will remedy this effectively.
  • Very few people check out e.g. the entire staging directory at once anyway.
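
As a sketch of what such aliases might be, the snippet below prints the `git config` lines that would install three shortcuts. The alias names are assumptions; the underlying `git submodule` subcommands and flags are standard git.

```python
import shlex

# Assumed alias names mapped to standard git submodule invocations.
SUBMODULE_ALIASES = {
    "sub-init": ["submodule", "update", "--init", "--recursive"],
    "sub-pull": ["submodule", "update", "--remote", "--merge"],
    "sub-status": ["submodule", "status", "--recursive"],
}


def alias_commands(aliases=SUBMODULE_ALIASES):
    """Return shell command lines that register each alias globally."""
    return [
        shlex.join(["git", "config", "--global", f"alias.{name}", " ".join(args)])
        for name, args in aliases.items()
    ]


if __name__ == "__main__":
    for line in alias_commands():
        print(line)
```

After running the printed lines once, `git sub-init` in a fresh clone of a meta-repo would fetch every pair it points to.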

Objection: If X number of pairs are changed at once, it will create X different commits.

  • Already happens for most developers anyway, who have specific pairs/languages checked out and not the whole tree.
  • (arguable) Is this really a bad thing? Doesn't it make sense that each pair/language/tool can stand on its own with its own history, with no connections to others?

Objection: Selective granular permissions are Bureaucratic and Bad.

  • They are not compulsory. If wanted, everyone could have access to everything with "organization permissions" instead of "repository permissions", but this is not desirable because:
  • Most developers work with specific languages and pairs and do not need access to everything. A single compromised account should not threaten all the code.

Objection: How will the meta-repos be kept up-to-date with the latest versions of pairs/languages/tools?

  • With scripts! Github provides a clean, reliable API for doing actions like these. The required scripts are very simple.
  • Sushain is willing to write the scripts and Tino is willing to host them (and perhaps do code review). The scripts themselves can be on Github so that if required, others can maintain them.
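
A first step for such a script is listing the organization's repositories, so that they can be compared with the submodules already present in a meta-repo. The sketch below uses the GitHub REST endpoint for listing an organization's repositories; the function names are assumptions, and the request is unauthenticated, so it is subject to GitHub's rate limits.

```python
import json
import urllib.request

API_URL = "https://api.github.com/orgs/{org}/repos?per_page=100&page={page}"


def fetch_page(org, page):
    """Fetch one page of an organization's repositories from the GitHub REST API."""
    request = urllib.request.Request(
        API_URL.format(org=org, page=page),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_repo_names(org, fetch=fetch_page):
    """Collect repository names across all pages, stopping at the first empty page."""
    names = []
    page = 1
    while True:
        repos = fetch(org, page)
        if not repos:
            break
        names.extend(repo["name"] for repo in repos)
        page += 1
    return sorted(names)
```

The `fetch` parameter exists so that the paging logic can be exercised without network access, for example by passing a function that returns canned pages.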

Objection: GitHub doesn’t provide a nice interface to view repos in a tree format like SourceForge.

  • (as mentioned in summary) Sushain will! See this demo page which is a simple, elegant, single HTML file.
  • The page is automatically generated from tags on repos which makes it trivial to extend to all existing repos and new ones.
  • As before, this code can be on Github so others can review and maintain it if required.
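
Generating the hierarchy from tags could work along the following lines. This is a sketch that assumes the tags are GitHub repository topics, that each repository record carries them in a `topics` list, and that topic names such as apertium-staging mirror the SVN directories; none of this is confirmed by the demo itself.

```python
from collections import defaultdict


def group_by_topic(repos, known_groups):
    """Map each group topic to the sorted names of the repos tagged with it."""
    tree = defaultdict(list)
    for repo in repos:
        for topic in repo.get("topics", []):
            if topic in known_groups:
                tree[topic].append(repo["name"])
    # Keep the order of known_groups and drop groups with no repositories.
    return {group: sorted(tree[group]) for group in known_groups if tree[group]}


def render_tree(tree):
    """Render the grouping as an indented plain-text outline."""
    lines = []
    for group, names in tree.items():
        lines.append(group)
        lines.extend(f"  {name}" for name in names)
    return "\n".join(lines)
```

Because a new repository only needs the right topic to appear in the outline, no per-repository configuration has to be maintained by hand.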

Objection: Existing developers will be inconvenienced.

  • (as mentioned in summary) Resources such as Using git (in progress), the linked tutorials, and the Github tutorial can help existing developers.
  • (as mentioned in details) The Github SVN bridge can be used by those not comfortable with git.
  • In the long run, 'new' developers will outnumber the current existing developers.
  • For backwards-compatibility, the SVN repo can be populated with svn:externals that point to Github's SVN bridge. Anyone checking out SVN repos will be effectively using the Github SVN bridge instead.
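
The svn:externals value for one SVN directory could be generated as sketched below. It assumes that the Github SVN bridge exposes a repository's default branch under a `/trunk` path and that the repositories live under the organization URL shown; the function name and the repository names are hypothetical.

```python
def render_externals(repo_names, org_url="https://github.com/apertium"):
    """Return an svn:externals value with one 'URL local-dir' line per repo.

    Assumes the SVN bridge serves the default branch at <repo>/trunk.
    """
    lines = [f"{org_url}/{name}/trunk {name}" for name in sorted(repo_names)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical pair names, used only to show the output format.
    print(render_externals(["apertium-xxx-yyy", "apertium-aaa-bbb"]), end="")
```

The output could be saved to a file and applied to the directory with `svn propset svn:externals -F <file> .` followed by a commit.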

Objection: There will be ~500 repositories under Apertium!

  • Nobody will have to navigate those repositories directly. An interface like Sushain's demo interface solves that problem.
  • The philosophy of git and Github favors a large number of repositories, each serving a distinct purpose, over a few very large ones.
  • It is better than having one single repository for all the code, or single repositories for each subtree, etc. for many reasons as mentioned in Details.


Non-PMC signatories:



  • Tino Didriksen (talk) 13:32, 4 February 2018 (CET)
  • Xavi Ivars (talk) 14:04, 4 February 2018 (CET)
  • Firespeaker (talk) 17:28, 4 February 2018 (CET)
    • Mikel's comments and questions are good—I'd like to see the questions addressed, but am otherwise on board. Also, would it make sense to host the repository organising page on I think I saw something like this recently that some other org had set up, but I don't remember where.
      • This would work. I'm happy to set that up. Presumably the repo with this page would also have the scripts to keep the submodules up-to-date, etc. We can have a redirect or reverse proxy of it from the site. I have also responded to Mikel's comments. -- Sushain (talk) 17:45, 4 February 2018 (CET)
      • I also think the page is a good idea. Github Pages makes it so that the organization 'apertium' gets the domain '' to host anything we want. Also, we can have unlimited pages for 'projects', such as '' if needed. Shardulc (talk) 18:50, 4 February 2018 (CET)