Difference between revisions of "Talk:PMC proposals/Move Apertium to Github"

From Apertium
(13 intermediate revisions by 4 users not shown)

Latest revision as of 17:55, 1 February 2018

Reasons to Switch

  • GitHub’s excellent issue tracker
  • People outside Apertium are far more familiar with Git than with SVN (especially younger folks, see GCI/GSoC)
  • More people outside Apertium have GitHub accounts, so it is easier for a new user to get started
  • GitHub’s interface is far superior to SourceForge’s interface
  • Avoids SourceForge’s downtime (not so bad lately)
  • SourceForge gives an awful impression
  • More visibility as a FOSS project
    • GitHub has become the de facto host for open source: people search for "github apertium" to find Apertium's code

Prevailing Approaches

Common pros/cons are excluded for the sake of brevity.

Approach 1

Variant A

A monorepo with all the linguistic data, pairs and language modules. Other folders in SVN, like the core engine and peripheral tools (e.g. APy), would live in their own repos.

Pros

  • Large-scale editing of e.g. 15 pairs is easy.
  • There are no meta-repos or submodules to deal with.
  • GitHub’s interface can be used directly.
  • Less need for extremely complicated git commands
  • Possible to do partial checkouts using SVN
    • This completely removes possible pro #1

Cons

  • The monorepo would be massive (> 3 GB).
    • Most devs (aside from a couple of core devs) would have to use GitHub’s SVN bridge to work on a pair.
      • This is highly contradictory to working on GitHub
      • People new to Apertium will have to learn SVN, negating some reasons to switch
      • ALTERNATIVE VIEWPOINT: there is no "learning SVN", it's three commands.
    • Diluted usefulness of branches, PRs and hooks
    • GitHub doesn’t necessarily allow repos larger than 1 GB (it is unclear whether this limit refers to the bare repo). If GitHub decides to stop us at some point after we switch, that’s really bad.
  • Everyone will disable email notifications (“watching” a repo) since there will be too much spam
  • Massive number of issue labels to curate and apply (non-members cannot tag an issue when submitting), reducing the effectiveness of the issue tracker
  • Commit access will continue to give write access to everything
  • Contradictory to the Git/GitHub philosophy (bad impression)
  • Given that the usual recovery/fix for repo inconsistencies is to wipe and re-clone, having to re-clone a huge monorepo would greatly exacerbate those kinds of issues

Variant B

Several monorepos, one for each of:

  • incubator
  • pairs
  • languages
  • tools

Pros

  • Large-scale editing of e.g. 15 pairs is easy.
  • There are no meta-repos or submodules to deal with.
  • GitHub’s interface can be used directly.
  • Less need for extremely complicated git commands
  • Possible to do partial checkouts using SVN

Cons

  • All the cons in Variant A, minus:
    • Repos will be smaller than the massive monorepo
  • Moving a package between release states and preserving history is complicated (can be scripted)

Variant C

Several repos:

  • One for each of the modules in languages/
  • One for all the released pairs
  • One for incubator
  • One for each of the core tools

Approach 2

Individual repos for each pair, language module and tool, plus a couple of “meta-repos” that contain submodules pointing to collections of repos. For example, apertium-staging would contain ~8 submodules, one for each of the pairs in SVN’s /staging, and apertium-all would have submodules pointing to apertium-staging, apertium-incubator, apertium-languages, etc. This hierarchy would be maintained via GitHub’s repo tags (a.k.a. “topics”), i.e. apertium-xxx-yyy could be marked with the incubator tag to end up in apertium-incubator.
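As a rough illustration of how repo tags could drive this hierarchy, here is a minimal Python sketch. It assumes the topics of each repo have already been fetched somehow; the TOPIC_TO_META table, the function name and the example repos are hypothetical and only there to make the idea concrete.

```python
from collections import defaultdict

# Hypothetical mapping from a GitHub topic to the meta-repo that collects it.
TOPIC_TO_META = {
    "incubator": "apertium-incubator",
    "staging": "apertium-staging",
    "languages": "apertium-languages",
}


def group_by_meta_repo(repo_topics):
    """Return {meta_repo: sorted member repos} given {repo: [topics]}."""
    groups = defaultdict(list)
    for repo, topics in repo_topics.items():
        for topic in topics:
            meta = TOPIC_TO_META.get(topic)
            if meta is not None:  # topics unrelated to the hierarchy are skipped
                groups[meta].append(repo)
    return {meta: sorted(members) for meta, members in groups.items()}


# Made-up input: in practice this would come from the topics set on each repo.
example = {
    "apertium-xxx-yyy": ["incubator"],
    "apertium-zzz": ["languages", "unrelated-topic"],
}
print(group_by_meta_repo(example))
```

Moving a pair from one collection to another would then only mean changing its topic; the grouping is recomputed from scratch each time.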

Pros

  • Usable issue tracker for each repo
  • Fits into the Git/GitHub philosophy
  • People who wish to use Git can contribute using that (while it's still possible to use the SVN bridge for those who want that)
  • Familiar branching, PR and hooks that work as expected
  • Email notifications and watching repos are useful
  • An analogous change to 15 pairs will result in 15 different commits, each repo has its own history (both pro and con).
  • Granular permissions (not everyone has access to literally everything, especially useful for GCI/GSoC)
    • RESPONSE: Could be considered more bureaucratic
      • Re-response: not really. Granular permissions are a (good) option, but they're not mandatory. We could use "org" permissions instead of "repo" permissions
  • Empowerment for package maintainers:
    • They could enforce workflows (code reviews, etc.) for specific packages, and easily accept patches from other people (via pull requests) before those people request commit access.

Cons

  • Harder for people who make changes to lots of pairs at the same time (i.e. a couple of core devs)
    • Commands are more gnarly (git submodule can be pretty unintuitive)
      • RESPONSE: Possible to mitigate with aliases and cheat sheets
    • An analogous change to 15 pairs will result in 15 different commits, each repo has its own history (both pro and con).
      • RESPONSE: This is already happening for most people. Almost no one has the whole SVN repo checked out, only multiple SVN subfolders.
  • Somewhat harder for people who use the meta-repos
    • RESPONSE: It’s really not that difficult to check out a meta-repo (git submodule update --recursive --init) and pull updates to it (git pull --recurse-submodules), and with aliases it can be even shorter.
  • Requires tooling to keep meta-repos up-to-date
    • RESPONSE: These are super simple scripts based on GitHub’s reliable API. Sushain is willing to write them and Tino is willing to host (and perhaps code review).
  • GitHub doesn’t provide a nice interface to view repos in a tree format
    • RESPONSE: Sushain will though! See this page (https://rawgit.com/sushain97/apertium-on-github/master/source-browser.html) that can be trivially finished to cover all our repos and is a very simple single HTML file (and pretty IMO). This page is automatically generated from the repo tags.
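To give a feel for how small the meta-repo tooling mentioned in the list above could be, here is a hedged Python sketch that renders the .gitmodules file for a meta-repo from a list of repo names. The function name and the base_url default are assumptions for illustration, not the scripts Sushain offered to write. A real update would also have to record a commit for each submodule (e.g. via git submodule add); writing .gitmodules alone is not enough.

```python
def render_gitmodules(repo_names, base_url="https://github.com/apertium"):
    """Build .gitmodules text with one submodule entry per repo.

    base_url is an assumed organisation URL; adjust it to wherever the
    repos actually live.
    """
    entries = []
    for name in sorted(repo_names):  # sorted so regenerated files diff cleanly
        entries.append(
            f'[submodule "{name}"]\n'
            f"\tpath = {name}\n"
            f"\turl = {base_url}/{name}.git\n"
        )
    return "".join(entries)


# Hypothetical members of a meta-repo such as apertium-incubator.
print(render_gitmodules(["apertium-xxx-yyy", "apertium-aaa-bbb"]))
```

Generating the file from scratch on every run keeps the script stateless: the repo tags remain the single source of truth for what belongs in each meta-repo.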

Related Concerns

  • Mailing lists - should probably be preserved on SourceForge for now until/unless we choose to switch to another solution or self-host them.
  • Existing issues - Sushain volunteers to move our existing issues over manually (or to find an automatic solution); there are only a few of them.