PMC proposals/Move Apertium to Github
The question of moving Apertium code partially or wholly to Github has been debated for a long time. Previous proposals (PMC proposals/Move apertium to github and PMC proposals/Allow some code under github.com/apertium) met with a number of objections and eventually expired. This proposal attempts to address those objections and outlines a modern, updated plan to move all Apertium code to Github.
Plan in brief:
- Individual repos for each pair, language module, and tool (preserving all commit history).
- A couple of "meta-repos" that contain submodules pointing to collections of repos.
- apertium-staging would contain ~8 submodules pointing to each of the pairs in SVN's
- apertium-all would have submodules to
- Hierarchy maintained automatically via simple scripts and the Github API.
- A simple hierarchical interface on top of Github for easy browsing (see Sushain's demo version).
- Resources such as Using git (in progress), the linked tutorials, and the Github tutorial for existing SVN developers.
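A meta-repo is an ordinary git repo whose `.gitmodules` file lists its submodules. The minimal sketch below renders such a file for a list of repo names, assuming every repo lives under `https://github.com/apertium/<name>.git` (the organization URL and example repo names are assumptions). In practice `git submodule add <url> <path>` writes these entries and also records each submodule's pinned commit in the index, which `.gitmodules` alone does not do.

```python
"""Render a .gitmodules file for a hypothetical Apertium meta-repo.

Assumption: every repo lives under https://github.com/apertium/<name>.git.
.gitmodules only records paths and URLs; the pinned commit of each submodule
is stored separately in the index, which `git submodule add` takes care of.
"""

GITHUB_ORG_URL = "https://github.com/apertium"  # assumed organization URL


def render_gitmodules(repo_names: list[str]) -> str:
    """Return .gitmodules text with one [submodule] section per repo, sorted by name."""
    sections = []
    for name in sorted(repo_names):
        sections.append(
            f'[submodule "{name}"]\n'
            f"\tpath = {name}\n"
            f"\turl = {GITHUB_ORG_URL}/{name}.git\n"
        )
    return "".join(sections)


if __name__ == "__main__":
    # Hypothetical pair repos, for illustration only.
    print(render_gitmodules(["apertium-eng-spa", "apertium-cat-ita"]), end="")
```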
For Apertium as a FOSS project, the main benefits are:
- Github gives Apertium higher visibility: according to people who attended the GSoC mentor summit, people often search for "github apertium" to find Apertium's code and come away empty-handed.
- Github makes it very easy for new people to contribute because:
- Far more people outside Apertium are familiar with Git than with SVN (especially younger contributors such as GCI/GSoC students)
- To start contributing, a new user just has to fork the repo and send a pull request (a request to merge their changes), without having to request access on the mailing list, etc.
- Github encourages better quality code because:
- Github would provide an excellent issue tracker for each pair/language/tool (an example and a description)
- Github has excellent pull request review tools (shardulc notes that during GCI 2017 he asked students to make dummy pull requests in dummy Github repositories on at least three separate occasions, simply because line-by-line code review is so good)
- Each pair/language/tool can have its own package maintainers with commit access, so that pull requests from new contributors are reviewed and merged by the people best able to judge their quality, before those contributors are given commit access themselves
- Github provides webhooks for automatic actions, like begiak reporting on new issues; reliable APIs for scripts (more about this in Caveats); and email notifications for repos that you follow.
- All the benefits of git for those who agree that they are benefits (lightweight and plentiful branches, better support for merging and merge-based workflows, offline commits, complete offline history), and for those who don't agree, there is the Github SVN bridge
- Granular permissions: not every contributor has access to literally everything, which is especially useful for GCI/GSoC students (more about this in Caveats)
- Github's web interface is far superior to SourceForge's, whose clunky pages give newcomers an awful first impression
- Each repo has its own version history and releases
- Package maintainers can add continuous integration tools or enforce workflows for specific pairs/languages/tools as required; these decisions can be taken independently for each pair/language/tool by contributors involved with it
- Mailing lists: these should probably stay on SourceForge for now, until/unless we choose to switch to another solution or self-host them.
- Existing issues: Sushain volunteers to move our existing issues (of which there are not many) to Github, either manually or with an automated solution.
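The webhook benefit listed above (begiak reporting new issues) amounts to a small HTTP handler. The sketch below shows the two core steps: verifying GitHub's `X-Hub-Signature-256` header (an HMAC-SHA256 of the raw request body keyed with the webhook secret) and turning an `issues` event into a one-line announcement. The payload fields are GitHub's; the message format and how it would be handed to begiak are assumptions.

```python
"""Turn a GitHub "issues" webhook delivery into a one-line IRC announcement.

A minimal sketch: the announcement format and the hand-off to begiak are
assumptions; the payload fields and the signature scheme are GitHub's.
"""
import hashlib
import hmac
import json
from typing import Optional


def signature_ok(secret: bytes, body: bytes, header: str) -> bool:
    """Check an X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, header)


def format_issue_event(event: str, body: bytes) -> Optional[str]:
    """Return an announcement for newly opened issues, or None to stay quiet.

    `event` is the value of the X-GitHub-Event request header.
    """
    if event != "issues":
        return None
    payload = json.loads(body)
    if payload.get("action") != "opened":
        return None
    issue = payload["issue"]
    return (
        f"[{payload['repository']['full_name']}] "
        f"{issue['user']['login']} opened #{issue['number']}: "
        f"{issue['title']} {issue['html_url']}"
    )
```

A handler would reject any delivery for which `signature_ok` returns False and pass the remaining strings on to the bot.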
Objection: It is harder to work with the meta-repos, since git submodule commands are gnarly.
- Aliases and cheatsheets will remedy this effectively.
- Very few people check out e.g. the entire staging directory at once anyway.
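As an example of such aliases, the sketch below prints `git config --global alias.NAME VALUE` commands for the most common submodule operations. The alias names (`sclone`, `supdate`, `spull`, `sstatus`) are hypothetical; the commands they expand to are standard git submodule invocations.

```python
"""Print `git config` commands that install submodule aliases.

The alias names are hypothetical; the expansions are standard git commands.
"""
import shlex

SUBMODULE_ALIASES = {
    # Clone a meta-repo together with all of its submodules.
    "sclone": "clone --recurse-submodules",
    # Check out the submodule commits the meta-repo currently pins.
    "supdate": "submodule update --init --recursive",
    # Move every submodule to the latest commit of its remote tracking branch.
    "spull": "submodule update --remote --merge",
    # Show which commit each submodule is checked out at.
    "sstatus": "submodule status --recursive",
}


def alias_commands(scope: str = "--global") -> list[list[str]]:
    """Return one `git config <scope> alias.NAME VALUE` argv per alias."""
    return [
        ["git", "config", scope, f"alias.{name}", value]
        for name, value in SUBMODULE_ALIASES.items()
    ]


if __name__ == "__main__":
    for argv in alias_commands():
        print(shlex.join(argv))
```

After running the printed commands once, `git sclone <meta-repo URL>` would fetch a meta-repo with all of its submodules. Note that `git spull` changes which commits the submodules are at, so the meta-repo itself still needs a commit to record the new pointers.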
Objection: If X pairs are changed at once, it will create X different commits.
- This already happens for most developers, who have specific pairs/languages checked out rather than the whole tree.
- (arguable) Is this really a bad thing? Doesn't it make sense that each pair/language/tool can stand on its own with its own history, with no connections to others?
Objection: Selective granular permissions are Bureaucratic and Bad.
- They are not compulsory. If we wanted, everyone could have access to everything through "organization permissions" instead of "repository permissions", but this is not advisable because:
- Most developers work with specific languages and pairs and do not need access to everything. A single compromised account should not threaten all the code.
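Repository-level permissions can be managed through Github's REST API. One option is to give each pair a team and grant that team `push` on the pair's repo only, via `PUT /orgs/{org}/teams/{team_slug}/repos/{owner}/{repo}` (the per-user alternative is `PUT /repos/{owner}/{repo}/collaborators/{username}`). The sketch below builds, but does not send, such a request. The org, team slug, and repo names are hypothetical, and the token must belong to someone allowed to administer the org.

```python
"""Grant a pair's maintainers push access to just that pair's repository.

A sketch against GitHub's REST API; org, team slug and repo are hypothetical.
"""
import json
import urllib.request

API = "https://api.github.com"
VALID_PERMISSIONS = {"pull", "triage", "push", "maintain", "admin"}


def team_repo_permission_request(org, team_slug, repo, permission, token):
    """Build (but do not send) a PUT that sets a team's permission on one repo."""
    if permission not in VALID_PERMISSIONS:
        raise ValueError(f"unknown permission: {permission}")
    # The repo is assumed to be owned by the same org as the team.
    url = f"{API}/orgs/{org}/teams/{team_slug}/repos/{org}/{repo}"
    return urllib.request.Request(
        url,
        data=json.dumps({"permission": permission}).encode(),
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the request to `urllib.request.urlopen` would apply it; Github answers a successful call with `204 No Content`.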
Objection: How will the meta-repos be kept up-to-date with the latest versions of pairs/languages/tools?
- With scripts! Github provides a clean, reliable API for actions like these. The required scripts are very simple.
- Sushain is willing to write the scripts and Tino is willing to host them (and perhaps do code review). The scripts themselves can be on Github so that if required, others can maintain them.
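To give a sense of how simple such a script can be, here is a minimal sketch of the sync step. It assumes each repo is tagged with a Github topic naming the meta-repo it belongs to (topic names such as `apertium-staging` are hypothetical), lists the org's repos with `GET /orgs/{org}/repos` (paged, at most 100 per page), and compares the tagged repos with the meta-repo's current submodules.

```python
"""Work out which submodules a meta-repo should add or drop.

A sketch: assumes repos are tagged with hypothetical topics such as
"apertium-staging" naming the meta-repo they belong to.
"""
import json
import urllib.request


def fetch_org_repos(org: str, token: str) -> list[dict]:
    """Fetch every repo in the org, 100 per page (the API's maximum page size)."""
    repos, page = [], 1
    while True:
        req = urllib.request.Request(
            f"https://api.github.com/orgs/{org}/repos?per_page=100&page={page}",
            headers={"Accept": "application/vnd.github+json",
                     "Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        if not batch:  # an empty page means we have seen every repo
            return repos
        repos.extend(batch)
        page += 1


def submodule_changes(repos, topic, current):
    """Return (to_add, to_remove) as sorted lists of repo names."""
    wanted = {r["name"] for r in repos if topic in r.get("topics", [])}
    return sorted(wanted - set(current)), sorted(set(current) - wanted)
```

The two lists could then be applied with `git submodule add` and `git rm`, followed by a single commit to the meta-repo.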
Objection: GitHub doesn’t provide a nice interface to view repos in a tree format like SourceForge.
- (as mentioned in the summary) Sushain will provide one! See this demo page, which is a simple, elegant, single HTML file.
- The page is automatically generated from tags on repos, which makes it trivial to extend to all existing repos and new ones.
- As before, this code can be on Github so others can review and maintain it if required.
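The generation step can be sketched as grouping repos by their tags (Github "topics") and rendering nested HTML lists. This is not the demo's actual code: the category topic names are hypothetical, and repos with no matching topic are collected under "other".

```python
"""Group repos into a tree by topic and render it as nested HTML lists.

A sketch, not the demo's code; the category topic names are hypothetical.
"""
from collections import defaultdict
from html import escape

CATEGORIES = ["apertium-trunk", "apertium-staging", "apertium-languages", "apertium-tools"]


def build_tree(repos):
    """Map each category to the sorted names of the repos tagged with it."""
    tree = defaultdict(list)
    for repo in repos:
        matched = [t for t in repo.get("topics", []) if t in CATEGORIES]
        for topic in matched or ["other"]:
            tree[topic].append(repo["name"])
    return {cat: sorted(names) for cat, names in sorted(tree.items())}


def render_html(tree, org="apertium"):
    """Render the tree as a <ul> of categories, each with links to its repos."""
    lines = ["<ul>"]
    for category, names in tree.items():
        lines.append(f"  <li>{escape(category)}<ul>")
        for name in names:
            url = f"https://github.com/{org}/{name}"
            lines.append(f'    <li><a href="{escape(url)}">{escape(name)}</a></li>')
        lines.append("  </ul></li>")
    lines.append("</ul>")
    return "\n".join(lines)
```

Because the tree is derived entirely from topics, a new repo appears on the page as soon as it is tagged.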
Objection: Existing developers will be inconvenienced.
- (as mentioned in the summary) Resources such as Using git (in progress), the linked tutorials, and the Github tutorial can help existing developers.
- (as mentioned in Details) The Github SVN bridge can be used by those not comfortable with git.
- In the long run, 'new' developers will outnumber the current developers.
- For backwards-compatibility, the SVN repo can be populated with svn:externals that point to Github's SVN bridge. Anyone checking out SVN repos will be effectively using the Github SVN bridge instead.
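The svn:externals property is a list of lines in the Subversion 1.5+ `URL local-dir` format. The sketch below generates one line per repo, assuming the SVN bridge's URL layout of `https://github.com/ORG/REPO/trunk` for each repo's default branch; the repo names are hypothetical.

```python
"""Generate an svn:externals definition pointing at Github's SVN bridge.

A sketch assuming the bridge layout https://github.com/ORG/REPO/trunk and
hypothetical repo names; each line uses the "URL local-dir" format.
"""


def externals_property(repo_names, org="apertium"):
    """Return svn:externals text mapping each local directory to its Github repo."""
    return "".join(
        f"https://github.com/{org}/{name}/trunk {name}\n"
        for name in sorted(repo_names)
    )


if __name__ == "__main__":
    # Save this output as externals.txt, then inside the SVN working copy run:
    #   svn propset svn:externals -F externals.txt .
    #   svn commit -m "Point at Github"
    print(externals_property(["apertium-eng-spa", "apertium-cat-ita"]), end="")
```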
Objection: There will be ~500 repositories under Apertium!
- Nobody will have to navigate those repositories directly. An interface like Sushain's demo interface solves that problem.
- The philosophy of git and Github favors large numbers of repositories, each serving a distinct purpose, over a few monolithic ones.
- This is better than having a single repository for all the code, or one repository per subtree, for the many reasons mentioned in Details.