Talk:PMC proposals/Move Apertium to Github

From Apertium

Latest revision as of 17:55, 1 February 2018

Reasons to Switch

  • GitHub’s excellent issue tracker
  • Far more people outside Apertium are familiar with Git than with SVN (especially younger contributors; see GCI/GSoC)
  • More people outside Apertium already have GitHub accounts, so it is easier for new users to get started
  • GitHub’s interface is far superior to SourceForge’s interface
  • Avoids SourceForge’s downtime (though it has not been so bad lately)
  • SourceForge gives an awful impression
  • More visibility as a FOSS project
    • GitHub has become the de facto host for open source: people search for "github apertium" to find Apertium's code

Prevailing Approaches

Common pros/cons are excluded for the sake of brevity.

Approach 1

Variant A

A monorepo with all the linguistic data, pairs and language modules. Other SVN folders, such as the core engine and peripheral tools (e.g. APy), would live in their own repos.

Pros

  • Large-scale editing of e.g. 15 pairs is easy.
  • There are no meta-repos or submodules to deal with.
  • GitHub’s interface can be used directly.
  • Less need for extremely complicated git commands
  • Possible to do partial checkouts using SVN
    • This completely negates pro #1 (easy large-scale editing)
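As a rough illustration of the partial-checkout point, the sketch below builds the `svn checkout` command for a single subfolder of a monorepo through GitHub's SVN bridge, which exposed a repository's default branch under `trunk/`. The monorepo name `apertium-data`, the `incubator/` layout and the helper name `svn_partial_checkout` are all hypothetical. GitHub has also since retired its Subversion support (early 2024), so this only reflects the situation when the proposal was written.

```python
from typing import List, Optional


def svn_partial_checkout(org: str, repo: str, subdir: str,
                         dest: Optional[str] = None) -> List[str]:
    """Build an argv list that checks out one subfolder of a GitHub repo via SVN.

    Assumes GitHub's (now retired) SVN bridge, which mapped the default
    branch to "trunk/". The org/repo/subdir values are illustrative only.
    """
    url = f"https://github.com/{org}/{repo}/trunk/{subdir.strip('/')}"
    cmd = ["svn", "checkout", url]
    if dest:
        cmd.append(dest)  # optional name for the local working copy
    return cmd


# Hypothetical monorepo "apertium-data" with pairs under incubator/.
print(" ".join(svn_partial_checkout("apertium", "apertium-data",
                                    "incubator/apertium-xxx-yyy")))
```

Only that one subfolder is transferred, which is exactly why relying on partial checkouts gives up easy large-scale editing across many pairs.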

Cons

  • The monorepo would be massive (> 3 GB).
    • Most devs (aside from the couple of core devs) would have to use GitHub’s SVN bridge to work on a pair.
      • This defeats much of the purpose of moving to GitHub
      • People new to Apertium would have to learn SVN, negating some of the reasons to switch
      • ALTERNATIVE VIEWPOINT: there is no real "learning SVN"; it's three commands.
    • Diluted usefulness of branches, PRs and hooks
    • GitHub doesn’t necessarily allow repos larger than 1 GB (it is unclear whether this limit refers to the bare repo). If GitHub decides to stop us at some point after we switch, that would be really bad.
  • Everyone will disable email notifications (“watching” a repo) since there will be too much spam
  • Massive number of issue labels to curate and apply (non-members cannot tag an issue when submitting), reducing the effectiveness of the issue tracker
  • Commit access will continue to give write access to everything
  • Contradictory to the Git/GitHub philosophy (bad impression)
  • Given that the usual recovery/fix for repo inconsistencies is to wipe and re-clone, having to re-clone a huge monorepo would greatly exacerbate those kinds of issues

Variant B

Several monorepos, one for each of:

  • incubator
  • pairs
  • languages
  • tools

Pros

  • Large-scale editing of e.g. 15 pairs is easy.
  • There are no meta-repos or submodules to deal with.
  • GitHub’s interface can be used directly.
  • Less need for extremely complicated git commands
  • Possible to do partial checkouts using SVN

Cons

  • All the cons in Variant A, except the size problem:
    • Each repo will be smaller than the single massive monorepo
  • Moving a package between release states and preserving history is complicated (can be scripted)
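Since moving a package between release states "can be scripted", here is a minimal sketch of one way to do it with `git subtree`, which ships in Git's contrib directory and is included in most Git packages. It assumes both monorepos are local checkouts with clean working trees; the function name, the temporary branch name and the `incubator`/`pairs` paths are hypothetical. The script only builds the command list, so nothing runs until you execute the steps yourself.

```python
import os
from typing import List, Tuple


def plan_subtree_move(pkg: str, src_repo: str, dst_repo: str) -> List[Tuple[str, List[str]]]:
    """Return (working_directory, argv) steps that move directory `pkg` from the
    monorepo checked out at `src_repo` into the one at `dst_repo`, keeping history.
    """
    src = os.path.abspath(src_repo)  # absolute, because step 2 runs inside dst
    dst = os.path.abspath(dst_repo)
    branch = f"split-{pkg}"  # hypothetical temporary branch name
    return [
        # 1. Synthesize a branch holding only pkg's history, rooted at the package dir.
        (src, ["git", "subtree", "split", f"--prefix={pkg}", "-b", branch]),
        # 2. Fetch that branch into the destination and merge it under the same path.
        (dst, ["git", "subtree", "add", f"--prefix={pkg}", src, branch]),
        # 3. Remove the package from the source monorepo and record the move.
        (src, ["git", "rm", "-r", "-q", pkg]),
        (src, ["git", "commit", "-m", f"Move {pkg} to {os.path.basename(dst)}"]),
        # 4. Delete the temporary branch.
        (src, ["git", "branch", "-D", branch]),
    ]


if __name__ == "__main__":
    # Hypothetical sibling checkouts of the incubator and pairs monorepos.
    for cwd, argv in plan_subtree_move("apertium-xxx-yyy", "incubator", "pairs"):
        print(cwd, "$", " ".join(argv))
```

Each step could be executed with `subprocess.run(argv, cwd=cwd, check=True)`. Because `git subtree add` is run without `--squash`, the package's full commit history ends up in the destination monorepo.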

Variant C

Several repos:

  • One for each of the modules in languages/
  • One for all the released pairs
  • One for incubator
  • One for each of the core tools

Approach 2

Individual repos for each pair, language module and tool. A couple of “meta-repos” would contain submodules pointing to collections of repos: e.g. apertium-staging would contain ~8 submodules, one for each of the pairs in SVN’s /staging, and apertium-all would have submodules pointing to apertium-staging, apertium-incubator, apertium-languages, etc. This hierarchy would be maintained via GitHub’s repo tags (a.k.a. “topics”); e.g. apertium-xxx-yyy could be marked with the incubator tag to end up in apertium-incubator.
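A minimal sketch of how the topic-based hierarchy could be computed, assuming each repo's topics are already available (e.g. from the `topics` field GitHub's REST API returns for a repository). The rule "topic `<t>` means membership in `apertium-<t>`", the `COLLECTION_TOPICS` list and the function name `build_meta_repos` are assumptions for illustration, not a settled design.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Assumed rule: a repo carrying topic "<t>" belongs to meta-repo "apertium-<t>".
# Only these topics count as collection tags; a real list would be longer.
COLLECTION_TOPICS = ("incubator", "staging", "languages")


def build_meta_repos(repo_topics: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each meta-repo name to the sorted repos it should contain as submodules."""
    members: Dict[str, Set[str]] = defaultdict(set)
    for repo, topics in repo_topics.items():
        for topic in topics:
            if topic in COLLECTION_TOPICS:
                members[f"apertium-{topic}"].add(repo)
    meta = {name: sorted(repos) for name, repos in members.items()}
    # apertium-all nests every non-empty collection meta-repo.
    meta["apertium-all"] = sorted(meta)
    return meta


example = {
    "apertium-xxx-yyy": ["incubator"],
    "apertium-aaa-bbb": ["staging"],
    "apertium-xxx": ["languages"],
    "apertium-apy": [],  # tools stay outside the collections in this sketch
}
for name, repos in sorted(build_meta_repos(example).items()):
    print(name, repos)
```

With this rule, promoting a pair from incubator to staging is just a topic change on GitHub followed by a re-run of the script.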

Pros

  • Usable issue tracker for each repo
  • Fits into the Git/GitHub philosophy
  • People who wish to use Git can contribute using that (while it's still possible to use the SVN bridge for those who want that)
  • Familiar branching, PRs and hooks that work as expected
  • Email notifications and watching repos are useful
  • An analogous change to 15 pairs will result in 15 different commits, each repo has its own history (both pro and con).
  • Granular permissions (not everyone has access to literally everything, especially useful for GCI/GSoC)
    • RESPONSE: Could be considered more bureaucratic
      • Re-response: not really. Granular permissions are a (good) option, but they're not mandatory. We could use "org" permissions instead of "repo" permissions
  • Empowerment for package maintainers:
    • They could enforce workflows (code reviews, etc.) for specific packages, and easily accept patches from other people (via pull requests) before those people request commit access.

Cons

  • Harder for people who make changes to lots of pairs at the same time (i.e. the couple of core devs)
    • Commands are more gnarly (git submodule can be pretty unintuitive)
      • RESPONSE: Possible to mitigate with aliases and cheat sheets
    • An analogous change to 15 pairs will result in 15 different commits, each repo has its own history (both pro and con).
      • RESPONSE: This already happens for most people. Almost no one checks out the whole SVN repo, only multiple SVN subfolders.
  • Somewhat harder for people who use the meta-repos
    • RESPONSE: It’s really not that difficult to check out (git submodule update --recursive --init) and pull updates to a meta-repo (git pull --recurse-submodules), and with aliases it can be even shorter.
  • Requires tooling to keep meta-repos up-to-date
    • RESPONSE: These are super simple scripts based on GitHub’s reliable API. Sushain is willing to write them and Tino is willing to host (and perhaps code review).
  • GitHub doesn’t provide a nice interface to view repos in a tree format
    • RESPONSE: Sushain will, though! See this page (https://rawgit.com/sushain97/apertium-on-github/master/source-browser.html), which can be trivially finished to cover all our repos and is a very simple single HTML file (and pretty nice IMO). This page is automatically generated from the repo tags.
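To give a feel for how simple the meta-repo tooling could be, here is a minimal sketch that, given a meta-repo's current submodules and the desired set (e.g. derived from repo topics), plans the git commands to reconcile them. The organisation name `apertium`, the commit message and the function name `plan_submodule_sync` are assumptions. The commands themselves are standard: `git submodule add` writes the `.gitmodules` entry and stages the gitlink, and `git rm` on a submodule also removes its `.gitmodules` section.

```python
from typing import List, Set

ORG = "apertium"  # assumed GitHub organisation name


def plan_submodule_sync(current: Set[str], desired: Set[str], org: str = ORG) -> List[List[str]]:
    """Return git commands (argv lists) that bring a meta-repo's submodules from
    `current` to `desired`. Intended to run from the meta-repo's root."""
    commands: List[List[str]] = []
    for repo in sorted(desired - current):
        # Records the URL in .gitmodules and stages the new gitlink.
        commands.append(["git", "submodule", "add",
                         f"https://github.com/{org}/{repo}.git", repo])
    for repo in sorted(current - desired):
        # For a submodule, git rm also drops its .gitmodules section.
        commands.append(["git", "rm", repo])
    # Fetch each submodule and move it to the tip of its remote-tracking branch.
    commands.append(["git", "submodule", "update", "--init", "--remote"])
    # Stages the updated gitlinks; exits non-zero if there is nothing to commit.
    commands.append(["git", "commit", "-am", "Sync submodules with repo topics"])
    return commands


for argv in plan_submodule_sync({"apertium-aaa-bbb", "apertium-old"},
                                {"apertium-aaa-bbb", "apertium-xxx-yyy"}):
    print(" ".join(argv))
```

A cron job or webhook could feed `desired` from the repo topics and `current` from the paths listed in `.gitmodules`, then run each step with `subprocess.run(argv, cwd=..., check=True)`, tolerating a failed final commit when nothing changed.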

Related Concerns

  • Mailing lists - should probably be preserved on SourceForge for now until/unless we choose to switch to another solution or self-host them.
  • Existing issues - Sushain volunteers to move our existing issues (a pretty small number), either by transposing them manually or by finding an automatic solution