Difference between revisions of "Talk:PMC proposals/Move Apertium to Github"
Latest revision as of 17:55, 1 February 2018
Reasons to Switch
- GitHub’s excellent issue tracker
- More people outside Apertium are far more familiar with Git vs. SVN (especially younger folks, see GCI/GSoC)
- More people outside Apertium have GitHub accounts, so it is easier for a new user to get started
- GitHub’s interface is far superior to SourceForge’s interface
- Avoids SourceForge’s downtime (not so bad lately)
- SourceForge gives an awful impression
- More visibility as a FOSS project
  - GitHub has become the de facto host for open source: people search for "github apertium" to find Apertium's code
Prevailing Approaches
Common pros/cons are excluded for the sake of brevity.
Approach 1
Variant A
A monorepo with all the linguistic data, pairs and language modules. Other folders in SVN like the core engine and peripheral tools (e.g. APy) would live in their own repos.
Pros
- Large-scale editing of e.g. 15 pairs is easy.
- There are no meta-repos or submodules to deal with.
- GitHub’s interface can be used directly.
- Less need for extremely complicated git commands
- Possible to do partial checkouts using SVN
  - This completely negates possible pro #1 (easy large-scale editing)
Cons
- The monorepo would be massive (> 3 GB).
  - Most devs (aside from the couple core devs) would have to use GitHub’s SVN bridge to work on a pair.
    - This is highly contradictory to working on GitHub
    - People new to Apertium will have to learn SVN, negating some reasons to switch
    - ALTERNATIVE VIEWPOINT: there is no "learning SVN", it's three commands.
  - Diluted usefulness of branches, PRs and hooks
  - GitHub doesn’t necessarily allow repos larger than 1 GB (unclear whether this limit refers to bare repo). If GitHub decides to stop us at some point after we switch, that’s really bad.
- Everyone will disable email notifications (“watching” a repo) since there will be too much spam
- Massive number of issue labels to curate and apply (non-members cannot tag an issue when submitting), reducing the effectiveness of the issue tracker
- Commit access will continue to give write access to everything
- Contradictory to the Git/GitHub philosophy (bad impression)
- Given that the usual recovery/fix for repo inconsistencies is to wipe and re-clone, having to re-clone a huge monorepo would greatly exacerbate those kinds of issues
Variant B
Several monorepos, one for each of:
- incubator
- pairs
- languages
- tools
Pros
- Large-scale editing of e.g. 15 pairs is easy.
- There are no meta-repos or submodules to deal with.
- GitHub’s interface can be used directly.
- Less need for extremely complicated git commands
- Possible to do partial checkouts using SVN
Cons
- All the cons in Variant A, except that each repo will be smaller than the single massive monorepo
- Moving a package between release states and preserving history is complicated (can be scripted)
Variant C
Several repos:
- One for each of the modules in languages/
- One for all the released pairs
- One for incubator
- One for each of the core tools
Approach 2
Individual repos for each pair, language module and tool. A couple of “meta-repos” would contain submodules pointing to collections of repos, e.g. apertium-staging would contain ~8 submodules pointing to each of the pairs in SVN’s /staging and apertium-all would have submodules to apertium-staging, apertium-incubator, apertium-languages, etc. This hierarchy would be maintained via GitHub’s repo tags (a.k.a. “topics”), i.e. apertium-xxx-yyy could be marked with the incubator tag to end up in apertium-incubator.
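The topic-driven hierarchy described above could be sketched roughly as follows. This is only an illustration, not the actual tooling: the `META_REPOS` mapping, the `BASE_URL` organisation address, the example repo names and both function names are assumptions, and the repo-to-topic data is hard-coded rather than fetched from GitHub.

```python
from collections import defaultdict

# Hypothetical mapping from a GitHub topic to the meta-repo that collects it.
META_REPOS = {
    "incubator": "apertium-incubator",
    "staging": "apertium-staging",
    "languages": "apertium-languages",
}

# Assumed organisation URL; the real one may differ.
BASE_URL = "https://github.com/apertium"


def group_by_meta_repo(repo_topics):
    """Map each meta-repo name to the sorted list of repos tagged for it."""
    groups = defaultdict(list)
    for repo, topics in repo_topics.items():
        for topic in topics:
            meta = META_REPOS.get(topic)
            if meta is not None:  # topics without a meta-repo are ignored
                groups[meta].append(repo)
    return {meta: sorted(repos) for meta, repos in groups.items()}


def render_gitmodules(repos):
    """Render .gitmodules-style text with one submodule entry per repo."""
    lines = []
    for repo in repos:
        lines.append(f'[submodule "{repo}"]')
        lines.append(f"\tpath = {repo}")
        lines.append(f"\turl = {BASE_URL}/{repo}.git")
    return "".join(line + "\n" for line in lines)


# Placeholder data standing in for what the GitHub API would report.
example = {
    "apertium-xxx-yyy": ["incubator"],
    "apertium-xxx": ["languages", "some-other-topic"],
}
groups = group_by_meta_repo(example)
print(render_gitmodules(groups["apertium-incubator"]))
```

Writing a `.gitmodules` file alone does not register submodules; real tooling would more likely run `git submodule add` for each repo and commit the result. The sketch only shows how topics could determine which repos belong to which meta-repo.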
Pros
- Usable issue tracker for each repo
- Fits into the Git/GitHub philosophy
- People who wish to use Git can contribute using that (while it's still possible to use the SVN bridge for those who want that)
- Familiar branching, PR and hooks that work as expected
- Email notifications and watching repos are useful
- An analogous change to 15 pairs will result in 15 different commits, each repo has its own history (both pro and con).
- Granular permissions (not everyone has access to literally everything, especially useful for GCI/GSoC)
  - RESPONSE: Could be considered more bureaucratic
    - Re-response: not really. Granular permissions are a (good) option, but they are not mandatory. We could use "org" permissions instead of "repo" permissions
- Empowerment for package maintainers:
  - They could enforce workflows (code reviews, etc.) for specific packages, and easily accept patches from other people (via pull requests) before those people request commit access.
Cons
- Harder for people who make changes to lots of pairs at the same time (i.e. the couple of core devs)
  - Commands are more gnarly (git submodule can be pretty unintuitive)
    - RESPONSE: Possible to mitigate with aliases and cheat sheets
  - An analogous change to 15 pairs will result in 15 different commits, each repo has its own history (both pro and con).
    - RESPONSE: This already happens for most people. Almost no one has the whole SVN repo checked out, only multiple SVN subfolders.
- Somewhat harder for people who use the meta-repos
  - RESPONSE: It’s really not that difficult to check out (git submodule update --recursive --init) and pull updates to a meta-repo (git pull --recurse-submodules), and with aliases it can be even shorter.
- Requires tooling to keep meta-repos up-to-date
  - RESPONSE: These are super simple scripts based on GitHub’s reliable API. Sushain is willing to write them and Tino is willing to host (and perhaps code review).
- GitHub doesn’t provide a nice interface to view repos in a tree format
  - RESPONSE: Sushain will though! See this page (https://rawgit.com/sushain97/apertium-on-github/master/source-browser.html) that can be trivially finished to cover all our repos and is a very simple single HTML file (and pretty nice IMO). This page is automatically generated from the repo tags.
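The "aliases and cheat sheets" mitigation mentioned in the responses above might look something like this minimal sketch. It wraps only the two git invocations quoted in the discussion, plus a plain `git clone`; the helper names, the meta-repo URL and the target directory are hypothetical, and the default is a dry run that prints the commands instead of executing them.

```python
import subprocess


def checkout_commands(meta_repo_url, target_dir):
    """Commands that clone a meta-repo and populate all nested submodules."""
    return [
        ["git", "clone", meta_repo_url, target_dir],
        ["git", "-C", target_dir, "submodule", "update", "--init", "--recursive"],
    ]


def update_commands(target_dir):
    """Command that pulls the meta-repo and updates its submodules."""
    return [["git", "-C", target_dir, "pull", "--recurse-submodules"]]


def run_all(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


# Hypothetical meta-repo location, used only for illustration.
META_URL = "https://github.com/apertium/apertium-all.git"
run_all(checkout_commands(META_URL, "apertium-all"))
run_all(update_commands("apertium-all"))
```

The same two steps could just as well live in a shell alias; the point is that people working across many pairs would only need to remember one short command each for checkout and update.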
Related Concerns
- Mailing lists - these should probably stay on SourceForge for now, until/unless we choose to switch to another solution or self-host them.
- Existing issues - Sushain volunteers to migrate our existing issues manually (or to find an automatic solution); there are only a small number of them.