PMC proposals/Move apertium to github
Git provides a large number of advantages over Subversion, including a very good branching mechanism, offline commit history, a bisection tool for locating broken commits, and excellent merge/rebase capabilities. The built-in documentation is very good, and (unlike svn) the git command-line tool comes in glorious ANSI colour :) Since you have the full history on your computer, you can grep through commits or check out earlier versions _quickly_, without even being online. Repositories also tend to take less disk space than the equivalent svn checkouts.
Making use of a service such as GitHub.com would also allow each apertium module to live in a separate repository, with the possibility of creating central repositories (such as incubator) that link to all of the included modules. GitHub also provides an issue tracker and a workflow in which you commit to a personal fork of the upstream repository and then request that your changes be pulled into upstream. Note that apertium can retain its current method of allowing people to commit directly, while offering pull requests as an option for those who don't plan to contribute regularly. SourceForge could be retained for mailing lists and similar services.
Migration of the repositories from Subversion to git should be relatively simple. Tools such as git-svn can create git repositories from Subversion while retaining the full commit history. The migration should begin with smaller apertium modules, such as the contents of nursery and incubator; the more central modules, such as lttoolbox and apertium itself, can be moved last. Documentation will need to be updated, but a simple guide similar to https://wiki.gnome.org/TranslationProject/GitHowTo should be sufficient. Much of the information in that guide is probably not necessary for the apertium workflow, which makes for a simpler, easier-to-write document. For more complex requirements, the existing git documentation is excellent and there are many resources covering a variety of git recipes. I will create a draft document covering general apertium use before the move begins.
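As a rough illustration of the migration order described above, the Python sketch below builds one `git svn clone` command per module (git-svn being one tool that converts a Subversion repository while keeping its history) and orders peripheral modules before central ones. The SVN root URL is a hypothetical placeholder, the module list and numeric ranks are assumptions rather than an agreed plan, and the script only prints commands; it runs nothing.

```python
import shlex

# Hypothetical placeholder: substitute the real apertium Subversion root.
SVN_ROOT = "https://svn.example.org/apertium"

# Illustrative ranks (assumptions): 0 = peripheral, migrate first;
# 1 = central, migrate last. In practice each module inside nursery and
# incubator would probably get its own entry and its own git repository.
MODULES = {
    "nursery": 0,
    "incubator": 0,
    "lttoolbox": 1,
    "apertium": 1,
}


def migration_plan(modules, svn_root=SVN_ROOT):
    """Return one `git svn clone` argv per module, peripheral modules first.

    Ties within a rank are broken alphabetically so the plan is deterministic.
    """
    ordered = sorted(modules, key=lambda name: (modules[name], name))
    return [["git", "svn", "clone", f"{svn_root}/{name}", name] for name in ordered]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in migration_plan(MODULES):
        print(shlex.join(argv))
```

Running it prints commands such as `git svn clone https://svn.example.org/apertium/incubator incubator`; each resulting repository could then be pushed to its own GitHub repository.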
Proposed by: User:Leftmostcat
- GitHub Organizations: https://help.github.com/categories/2/articles
- GH Org Teams: https://help.github.com/articles/how-do-i-set-up-a-team
- Example of similar structure: https://github.com/metabrainz
- The svn repo contains several large binaries and their history. All of those would need to be cloned by everyone who intends to work seriously on the subproject. A shallow clone (equivalent to an svn checkout) can only be used for basic patchwork (you cannot clone, fetch from, push into, or push from shallow clones). See https://git.wiki.kernel.org/index.php/GitFaq#How_do_I_do_a_quick_clone_without_history_revisions.3F and the following point. Tino Didriksen 16:56, 6 September 2013 (UTC)
- Because of the ability to separate repositories, the impact of this would be minimized. To work on a language pair, it would only be necessary to clone the pair itself. —Leftmostcat 17:16, 6 September 2013 (UTC)
- I once (about 2 years ago?) tried to check out all of apertium svn into one big git repo (i.e. with full history). It took less space than the SVN checkout (git is quite good at compressing history because it has to, SVN keeps only local copy of everything and thus doesn't bother?) --unhammer 07:59, 8 September 2013 (UTC)
- s/local copy/local copies/ -- Jimregan 15:09, 9 September 2013 (UTC)
- I would not recommend shallow clones, since 1) most apertiumers will be new to git, and it just adds more complexity 2) you typically don't save much drive space: http://blogs.gnome.org/simos/2009/04/18/git-clones-vs-shallow-git-clones/ 3) people will be checking out a repo at a time, not everything that was in SVN, and 4) maybe it's not such a bad thing that repos with many versions of big binaries stand out like a sore thumb ;-) --unhammer 07:59, 8 September 2013 (UTC)
- "Note that apertium can retain its current method of allowing people to commit directly". Yuck. GitHub makes pulling easy enough that this should never, ever be considered. Also, the biggest benefit of GitHub is "drive by contributions" -- there's no need to be registered with a project to contribute, and forking is simplified to the point that you can just click 'edit' on a file and it does it transparently (and edit in the browser, without needing to download anything). I think that's a better selling point to most people than things like bisect :) -- Jimregan 15:28, 9 September 2013 (UTC)
The apertium GitHub organisation is at https://github.com/apertium?tab=members