Difference between revisions of "Talk:PMC proposals/Allow some code under github.com/apertium"

From Apertium

Revision as of 05:32, 23 February 2015

If done in a systematic manner, moving parts of Apertium to a git repository involves deciding what to move and what to keep only in svn.

To help dear PMC members to decide, here is an inventory of things currently found in the svn repository:

  • engine (apertium, lttoolbox, apertium-lex-tools)
  • linguistic data for the engine
    • languages
    • incubator
    • nursery
    • staging
    • trunk
  • tools for creating/learning/managing/serving linguistic data
  • documentation
    • official documentation
    • papers
    • courses
    • Apertium's wiki dump can be archived here as well if it's not enormously big :)
  • some sort of experimental/playground/sandbox content (not related to linguistic data, otherwise it would be in incubator; many things in branches/ fall into this category)
  • language data not directly used by the engine (freely distributable (small parallel) corpora, wordlists etc)

It looks like Apertium has developed several machine learning tools, but, as always, there is not enough data for every language pair -- having an explicit languageData/ directory might be a step towards systematising what we have and creating some more. --selimcan (talk) 06:32, 23 February 2015 (CET)




The proposal says that

Situations where git shines include
<...>
* using branches for development of new features while keeping the main branch stable 
<...>


This is valuable in many contexts (less terrifying to experiment/refactor, easier to collaborate etc). I don't want to go any further into this, just to relate it to development of linguistic data (although I am aware that migrating linguistic data is not planned as of now).


In my opinion, branching would, among other things, be especially useful when several people working on different translators rely on the same languages/ sub-module. Imagine that something in that monolingual language package is wrong and should be fixed. There are two options:

  • a) you can do that right away, without caring about the consequences for the translators which, very likely, will break (even if you do care, it might be hard to amend all translators at once) or
  • b) you can make the necessary changes on a development branch, and therefore do the right thing to the monolingual package and give other developers a chance to adapt their translators (maybe in a 'development' branch as well) at the same time.


Of course, we can just do a) and say that the developers of translators affected by the change should adapt to the only branch of the monolingual package there is, but that would mean that the "nightly" version of the translator performs worse than its latest released version (which is currently the case with the Kazakh-Tatar pair on apertium.org vs turkic.apertium.org). I think this is counter-intuitive and not something we want.
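
To make the difference between options a) and b) above concrete, here is a minimal toy model in Python. It is only a sketch: the class names, the branch names ("master", "development") and the tags ("n", "old_tag", "new_tag") are all hypothetical, and no real git or Apertium tooling is involved. A monolingual package is reduced to the set of tags each of its branches emits, and a translator "works" if every tag it expects is present on the branch it tracks.

```python
from dataclasses import dataclass, field


@dataclass
class MonolingualPackage:
    """Toy stand-in for a languages/ module: branch name -> tags it emits."""
    branches: dict = field(default_factory=dict)

    def branch(self, new, source):
        # Roughly like creating a branch in git: the new branch starts as a copy.
        self.branches[new] = set(self.branches[source])


@dataclass
class Translator:
    """Toy stand-in for a translator that expects certain tags."""
    name: str
    expects: set
    tracks: str = "master"  # branch of the monolingual package it builds against

    def works_with(self, package):
        # The translator works if all tags it expects exist on its tracked branch.
        return self.expects <= package.branches[self.tracks]


def rename_tag(package, branch, old, new):
    """The hypothetical 'fix': rename a tag on one branch only."""
    tags = package.branches[branch]
    tags.discard(old)
    tags.add(new)


# Option a): fix directly on the only branch; the translator breaks at once.
pkg_a = MonolingualPackage({"master": {"n", "old_tag"}})
pair_a = Translator("pair-1", expects={"n", "old_tag"})
rename_tag(pkg_a, "master", "old_tag", "new_tag")
broken_after_a = not pair_a.works_with(pkg_a)

# Option b): fix on a development branch; master stays untouched.
pkg_b = MonolingualPackage({"master": {"n", "old_tag"}})
pair_b = Translator("pair-1", expects={"n", "old_tag"})
pkg_b.branch("development", "master")
rename_tag(pkg_b, "development", "old_tag", "new_tag")
still_works_on_master = pair_b.works_with(pkg_b)

# The translator's developers adapt on their own development branch.
adapted = Translator("pair-1", expects={"n", "new_tag"}, tracks="development")
adapted_works = adapted.works_with(pkg_b)

print(broken_after_a, still_works_on_master, adapted_works)
```

Run as written, this prints `True True True`: under a) the translator breaks immediately, while under b) the existing translator keeps working against master and the adapted one works against the development branch. Real breakage is of course subtler than a missing tag, so this only illustrates the shape of the argument.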


Having a separate languages/ module solved the problem of duplicated code/effort, but without easy branching facilities, it just doesn't scale very well. Easy branching would allow us to collaborate without stepping on each other's toes. --selimcan (talk) 06:31, 23 February 2015 (CET)