Hi Ben,

    reading forward it seems I've miscommunicated what I mean by an experiment.  By experiment I mean a VM configuration not used for production, such as the Sista and LowCode VMs.  I revisit this below.

On Thu, Jun 1, 2017 at 10:40 AM, Ben Coman <btc@openinworld.com> wrote:
On Thu, Jun 1, 2017 at 10:38 PM, Eliot Miranda <eliot.miranda@gmail.com> wrote:
Hi Tim,

On May 31, 2017, at 11:53 PM, Tim Felgentreff <timfelgentreff@gmail.com> wrote:


Just re the discussion of dev and stable branch, the original idea was that Cog is dev and master is stable. We never expected that people would use or recommend the Cog bintray builds for anything other than development.

But "master" ended up so far behind, recent changes would not get much testing leading up to Pharo release...
$ git log master
17 Aug 2016  

You want to *encourage* people to use more recent VMs to get more feedback earlier.   

Yes, but I disagree that the issue is the master/Cog distinction.  The problem is that there is no mechanism to help remind us when to advance things to master.  If that's present then we have no problems using the current structure.

I feel the only problem is that we need someone who merges to master when it is green. I think we have already protected the master branch in the way Ben suggested, i.e., you can only open a PR and merge it if the Travis build is all green.

I can do this.  Ideally it would be either automatic or prompted.  What I mean is that there should be a set of relevant tests that are run on images using the production subset of the VMs built from the Cog branch. Whenever the tests are all green then either I get sent an email prompting me to push to master, or a push to master occurs.

I think it would be useful to differentiate between different levels of stability depending on application and personal perspective.    
   A. personal-stable -- stable "enough" for developer/student to use personally on their desktop -- might be optimistic if it passes all tests

Sure.  So the mechanism needed is running at least the test suite on a new VM right?

   B. production-stable -- "bulletproof" for operating machinery and business systems - maybe when A has been in wide use for a while

Not sure we can do much more here than run the tests.  What we can insist on is that a master commit should pass the tests on all supported platforms x all production VM configurations.
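
As a rough sketch of the gate described above (a master commit must pass on every supported platform crossed with every production VM configuration), something like the following could decide whether to prompt for, or perform, a push to master. The platform and configuration lists, the `results` mapping and the function `ready_for_master` are hypothetical names chosen for illustration; nothing here reflects the actual CI setup.

```python
from itertools import product

# Hypothetical names, for illustration only.
PLATFORMS = ["linux64x64", "macos64x64", "win32x86"]
PRODUCTION_CONFIGS = ["squeak.cog.spur", "pharo.cog.spur", "newspeak.cog.spur"]


def ready_for_master(results):
    """Return (ok, problems) for a dict {(platform, config): passed}.

    Only production configurations are consulted; results for experimental
    configurations (e.g. Sista or LowCode) are ignored entirely.
    """
    problems = [
        cell
        for cell in product(PLATFORMS, PRODUCTION_CONFIGS)
        if results.get(cell) is not True  # absent or failed both block promotion
    ]
    return (not problems, problems)


# Every production cell is green; one experimental cell is red.
results = {cell: True for cell in product(PLATFORMS, PRODUCTION_CONFIGS)}
results[("linux64x64", "squeak.sista.spur")] = False
ok, problems = ready_for_master(results)
print(ok, problems)
```

In this sketch a red experimental configuration does not block promotion, while a missing or failed production cell does.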

btw, it occurs to me that "master" is a bit ambiguous.  Perhaps for (B.) "master" could be renamed "production" 

Well, what's in a name?  "master" is fine for production.  Tagging a particular master commit with a release id tag would be the necessary extra no?
and for (A.) maybe introduce "stable" or alternatively "validated" to mean all tests passed without the strong implication it is "stable". 

Well, if the build and test steps are separated then there would hopefully be a page with green "tests passed" entries containing links to the VMs that passed the tests, no?

Maybe I can get into the habit of checking the status of the build a few hours after a commit.  But a generated email would compensate for my, um, it's on the tip of my tongue, um, my, my memory!  And an automated push would allow me to resume walking in front of buses.

The bintray deployment should not be taken as a source of stable builds. It is meant to be used by what Eliot calls brave souls who want to help to test the latest and possibly unstable changes.

Good.  This makes perfect sense to me.  Are there places in the configuration to add brief overview texts explaining this to the bintray download pages?  It would be great to have a short paragraph that says these are development versions and directs to the master builds.

P.S. for master builds Gilad has noticed that there is no .msi for the newspeak builds (and I suspect there may be no .dmg).  In e.g. build.win32x86/newspeak.cog.spur/installer is code to make the .msi for a newspeak vm.  And the corresponding thing exists for making the Mac OS .dmg.  Any brave souls feel up to trying to get them both built?  

Just my 2c


On Thu, 1 Jun 2017, 06:05 Ben Coman, <btc@openinworld.com> wrote:
On Thu, Jun 1, 2017 at 2:27 AM, Nicolas Cellier <nicolas.cellier.aka.nice@gmail.com> wrote:

2017-05-31 17:31 GMT+02:00 Eliot Miranda <eliot.miranda@gmail.com>:

Hi All,

> On May 31, 2017, at 1:54 AM, K K Subbu <kksubbu.ml@gmail.com> wrote:
> On Wednesday 31 May 2017 12:35 PM, Esteban Lorenzano wrote:
>>>> On 31 May 2017, at 09:01, K K Subbu <kksubbu.ml@gmail.com> wrote:
>>>> On Wednesday 31 May 2017 12:18 PM, Esteban Lorenzano wrote:
>>>> 1) We need a stable branch, let’s say is Cog 2) We also need a
>>>> development branch, let’s call it CogDev
>>> IMHO, three branches are required as a minimum - stable,
>>> integration and development because there are multiple primary
>>> developers in core Pharo working on different OS platforms.
>> but nobody will do the integration step so let’s keep it simple:
>> integration is made responsibly for anyone who contributes, as it is
>> done now.
> I proposed only three *branches*, not people. Splitting development into two branches and builds will help in isolating faster (separation of concerns). If all issues get cleared in dev branch itself, then integration branch will still be useful in catching regressions.

I don't believe this.  Since the chain is VMMaker.oscog => opensmalltalk/vm => CI, clumping commits together when pushing from, say, CogDev to Cog doesn't help in identifying where things broke in VMMaker.  This is why Esteban has implemented a complete autobuild path run on each VMMaker.oscog commit.

But, while this is a good thing, it isn't adequate because
a) important changes are made to opensmalltalk/vm code independent of VMMaker.oscog
b) sometimes one /has/ to break things to have them properly tested (e.g. the new compactor).  i.e. there has to be a way of getting some experimental half-baked thing through the build pipeline so brave souls can test it

> I will defer to your experience. I do understand the difference between logical and practical in these matters.

Let's take a step back and instead of discussing implementation, discuss design.

For me, a VM is good not when someone says it is, not when it builds on all platforms, but when extensive testing finds no faults in it.  For me this implies tagging versions in opensmalltalk/vm (which by design index the corresponding VMMaker.oscog because generated source is stamped with VMMaker.oscog version info) rather than using branches.

Further, novel bugs are found in VMs that are considered good, and these bugs should, if possible, be added to a test suite.  This points to a major deficiency in our ability to test VMs.  We have no way to test the UI automatically.  We have to use humans to produce mouse clicks and keystrokes.  For me this implies tagging releases, and the ability to state that a given VM supersedes a previous known good VM. 
I just want to interject to check everyone's understanding of git branches.  Although I haven't used SVN, what I've read indicates SVN concepts can be hard to shake and git branching is conceptually very different from SVN.   

The key thing is "a branch in Git is simply a lightweight movable pointer to a commit." 
The following article seems particularly useful to help frame our discussion...

The commit is related to a graph of previous commits, right?  So a branch implies a distinct set of commits, right?  When one merges from a branch, the commits on the branch don't get added to the target into which one merges, do they?  Isn't that the difference between pull and pull -a, that the former just pulls commits on a single branch while pull -a pulls commits on all branches?
A branch is much the same as a tag; they are both references to a particular commit, except
* branches are mutable references
* tags are immutable references

So if you want a moveable "good-vm" tag, maybe what you need is a branch reference. 
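
To make the "movable pointer" idea concrete, here is a toy model, not git itself: commits form a graph through parent links, and branches and tags are just names that point at one commit. The commit ids and the tag name `good-vm-1` are hypothetical. The sketch assumes a merge is recorded as a commit with two parents, which is how a non-fast-forward merge works in git.

```python
# Toy model: commits form a graph via parent links; refs just name a commit.
parents = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],          # work done on a side branch
    "m1": ["c2", "c3"],    # merge commit with two parents
}

branches = {"master": "c2", "Cog": "c3"}   # movable pointers
tags = {"good-vm-1": "c2"}                 # conventionally fixed pointers


def reachable(commit):
    """All commits reachable from `commit` by following parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


before = reachable(branches["master"])
branches["master"] = "m1"        # merging Cog into master moves the pointer
after = reachable(branches["master"])
print(sorted(after - before))    # commits newly reachable from master
```

In this model, moving the branch pointer to the merge commit makes the side branch's commit part of master's history, while the tag keeps pointing at the commit it was created on.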

But isn't that what master is supposed to be?

And the previous paragraph applies equally to performance improvements, and functionality enhancements, not just bugs.

Test suites and build chains catch regressions.  Regressions in functionality and in performance are _useful information_ for developers trying to improve things, not necessarily an evil to be avoided at all costs.
Agreed. But you want to deal with your own regressions, not other people's.  It seems harder to apply an "if you break it, you fix it" philosophy if it's always broken.

To reiterate, distinguishing between the production set and the experimental set is what's necessary here.

The system must allow pushing an experiment through the build and test pipeline to learn of a piece of development's impact. 
IIUC, experimental branches (and PRs!!) can run through the CI pipeline identical to the Cog branch (except maybe deployment step).  There seems no benefit here for needing to commit directly to the Cog branch when a PR-commit would work the same.
No, not experimental branches.  Experimental configurations.  Sista and LowCode are currently experiments.  We don't care if these are broken; they're not part of the production VM set yet.  So Clément, Ronie and myself should be able to modify, including break, these configurations without hindrance, and without generating noise for the community.

An experiment may have to last for several months (for several reasons; the new compactor is a good example: some bugs show up in unusual circumstances; some bugs are hard to fix).

Another requirement is to provide a stable point for someone to begin new work.  They need to know that their starting point is not an experiment in progress. They need to understand that the cost of working on what is effectively a branch from the trunk is an integration step(s) into trunk later on, and this can't be just at the opensmalltalk/vm level using git to assist the merge, but also at the VMMaker.oscog level using Monticello to merge.  Both are good at supporting merges because both support identifying the set of changes.  Both are poor at supporting merges because they don't understand refactoring and currently only humans can massage a set of changes forwards applying refactorings to a set of changes.  This is what real merges are, and the reason why git only eases the trivial cases and why real programmers use a lot more tools to merge than just a vcs.

Can others add additional requirements, or critique the above requirements?  (Try not to mention git or ci implementations when you do).

With the above said what seems lacking to me is the testing framework for completed VMs.  A build bot can identify commits that fail a build and also produce a VM for subsequent packaging and/or testing.  Separating the steps is very useful here.  A long pipeline with a single red or green light at the end is much less useful than a series of short pipelines, each with a separate red or green light.  Reading through a bot log to identify precisely where things broke is both tedious and, more importantly, not useful in an automated system because that identification is manual.  Separate short pipelines can be used to inform an automatic system (right Bob? Bob Westergaard built the testing system at Cadence and that is constructed from lots of small steps and it isolates faults nicely; something that an end-to-end system like Esteban's doesn't do as well).
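
A minimal sketch of the "one light per short step" idea, assuming each step can be modelled as a callable that returns true on success. The stage names and the helpers `run_pipeline` and `first_red` are hypothetical and are not taken from any existing build system.

```python
def run_pipeline(stages):
    """Run (name, step) pairs in order; return one light per stage.

    Stages after the first failure are reported as "skipped", so the
    report itself says where things broke without reading a log.
    """
    lights, broken = [], False
    for name, step in stages:
        if broken:
            lights.append((name, "skipped"))
            continue
        try:
            passed = bool(step())
        except Exception:
            passed = False  # a crashing step counts as a red light
        lights.append((name, "green" if passed else "red"))
        broken = not passed
    return lights


def first_red(lights):
    """Name of the first red stage, or None if nothing failed."""
    return next((name for name, light in lights if light == "red"), None)


# Hypothetical stages standing in for generate, build, package and test.
stages = [
    ("generate", lambda: True),
    ("build", lambda: True),
    ("package", lambda: False),   # simulated failure
    ("test", lambda: True),
]
lights = run_pipeline(stages)
print(lights, first_red(lights))
```

Because each stage reports its own light, an automated system can act on the name of the first red stage rather than on a single end-of-pipeline result.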

Now, if we have a long sequence of nicely separated generate, build, package, test steps how many separate pipelines do we need to be able to collaborate?  Is it enough to be able to tag an upstream artifact as having passed some or all of its downstream tests or do we need to be able to duplicate the pipeline so people can run independent experiments?

For me, I see two modes of development; new development and maintenance.  New development is fine in a fork in some subset of the full build chain.  e.g. when working on Spur I forked within VMMaker.oscog (and, unfortunately, in part because we didn't have opensmalltalk/vm or many of the above requirements discussed, let alone met, I would break V3 for much of the time). e.g. the new compactor was forked in VMMaker.oscog without breaking Esteban's chain by my using a special generation step controlled by a switch I set in my branch.  I tested in my own sandbox until the new compactor needed testing by a wider audience.

Maintenance is some relatively quick fix one (thinks one) can safely apply to either VMMaker.oscog or opensmalltalk/vm trunk to address some issue.

Forking is fine for new development if
a) people understand and are prepared to pay the cost of merging, or, better,
b) they can use switches to include their work as optional in trunk
There are lots of switches:
A switch between versions in VMMaker.oscog, e.g. Spur memory manager vs V3, or the new Spur compactor vs the old, or the Sista JIT vs the standard, etc
A switch between a vm configuration, e.g. pharo.cog.spur vs squeak.cog.spur in a build directory, which can do any of
- select a generated source tree (e.g. spursrc vs spur64src)
- use #ifdef's to select code in the C source
- use plugins.int & plugins.ext to select a set of plugins
A switch between dialects (Pharo vs Squeak vs Newspeak)
A switch between platforms (Mac OS X vs win32, Linux x64 vs Linux ARM)

I get the above distinctions and know how to navigate amongst them upstream, but don't understand very well the downstream (how to clone the build/test CI pipeline so I can cheaply fork, work on the branch and then merge). So I'm happier using switches to try and hide new work in trunk to avoid derailing people.  And so I prefer the notion of a single pipeline that tags specific versions as good.

Is one of the requirements that people want to clearly separate maintenance from new development?

This diagram may be a good reference for discussion, of how maintenance hotfixes can relate to development branches.   

Makes perfect sense.  The thing this doesn't mention is the ability to have experimental configurations live alongside the code being worked on for a stable release.
cheers -ben 

Is one of the requirements that people want to clearly identify which commit caused a specific bug? (Big discussion here about major, e.g. V3 => Spur transitions vs small grain changes; you can identify the latter, but not necessarily the former).
I suppose what I'm asking is what's the benefit of an all green build?  For me a tested, versioned and named artefact is more useful than an all green build.  An all red build is a red flag.  A mostly green build is simply a failure to segregate production from in-development artefacts.

Hi Eliot,
the main advantage of github is the social thing:
- lower barrier of contributing via a better integration of tools
 (not only vcs, but issue tracker, wiki, continuous integration, code review/comments and pull request - even if we underuse most of these tools),
- and ease integration of many small contributions back.
For this to work well, such work MUST happen in separate branches.
in this context, there is an obvious benefit of green build: quickly estimate if we can merge a pull request or not.
when red, we have no information about possible regressions, and have to go through the tedious part: go down into the console log of both builds, try to understand and compare... There is already enough work involved in reviewing source code.

I see that my opening argument was simplistic.  However Nicolas' point above is probably more significant.  
If we want to encourage new contributors, we need:

* to show that the CI builds are cared for

* allow newcomers to be confident that the tip they are working from is green before they start.  When they submit their PR and the CI tests fail, they should be able to zero in on the failures *they* caused and *as*a*newbie* not have to sort through the confounding factors from others' failures. 

* act timely to integrate, to encourage further contributions.  If someone contributes a good fix, a green CI test may make you inclined to quickly review and integrate. But when the CI shows failure, how will you feel about looking into it? Further, when the mainline returns to green, the existing PRs don't automatically retest, and no-one seems to be manually managing them, so such PRs seem to end up in limbo which is *really* discouraging for potential contributors.

cheers -ben

I tend to agree on your view for mid/long term changes:
Say developer A works on a new garbage collector, developer B on 64-bit compatibility, developer C on the lowcode extension and developer D on sista (though maybe there is a single developer touching 3 of these)
Since each of these devs are going to take months, and touch many core methods scattered in interpreter/jit/object memory or CCodeGenerator, then it's going to be very difficult to merge (way too many conflicts).

If on different branches, there is the option to rebase or merge with other branches. But it doesn't scale with N branches touching the same core methods: N developers would have to rebase on N-1 concurrent branches, resolve the exact same conflicts etc... Obviously, concurrent work would have to be integrated back ASAP in a master branch.

So, a good branch is a short branch, if possible covering a minimal feature set.
And long devs you describe must not be handled by branches, but by switches.
This gives you a chance to inspect the impact of your own refactoring on your coworkers.

In this model, yes, you have a license to break your own artifact (say generationalScavenger, win64, lowcode, sista).
But you must be informed if ever you broke the production VM, and/or concurrent artifacts. You have to maintain a minimal set of features working, otherwise you prevent others from working. In the scavenger case, you used a branch for a short period, and that worked quite well.

In this context, I agree, a single green light is not enough.
We need a sort of status board tracing the regressions individually.
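
One possible shape for such a status board, as a sketch only: compare the previous and current run per artifact and list what regressed, separating production artifacts from experimental ones. The artifact names, the boolean green/red representation and the functions `regressions` and `status_board` are assumptions made for illustration.

```python
def regressions(previous, current):
    """Artifacts that were green in `previous` and are not green in `current`."""
    return sorted(
        name
        for name, was_green in previous.items()
        if was_green and not current.get(name, False)
    )


def status_board(previous, current, production):
    """Group regressions by whether they touch the production set."""
    broken = regressions(previous, current)
    return {
        "production": [n for n in broken if n in production],
        "experimental": [n for n in broken if n not in production],
    }


# Hypothetical artifact names and results (True means green).
production = {"pharo.cog.spur", "squeak.cog.spur"}
previous = {"pharo.cog.spur": True, "squeak.cog.spur": True,
            "sista": True, "lowcode": False}
current = {"pharo.cog.spur": True, "squeak.cog.spur": False,
           "sista": False, "lowcode": False}

board = status_board(previous, current, production)
print(board)
```

An artifact that was already red (here `lowcode`) is not reported, so each developer sees only what changed, and a production regression can be flagged separately from breakage in one's own experimental artifact.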

> Regards .. Subbu

best, Eliot