On Thu, Dec 17, 2015 at 9:29 AM, Eliot Miranda <eliot.miranda@gmail.com> wrote:
Ah, that's interesting.  So my concern is whether github is a safe long-term bet.  Specifically what is there to prevent some third party from buying github, or of github going public and the board taking the decision, or github on its own, deciding to charge for hosting, keeping the data hostage to extract payment?  What safeguards are in place to prevent this?  I'm not interested in "this will never happen" arguments.  I'm interested in hard data please.

This sounds like a risk management problem. We want to minimize the risk that we lose access to the source code and its history, right? Is there other data that you are concerned about?

With regard to GitHub, I think these are the interesting questions:

  1. What are the chances that GitHub will stop providing free hosting to open source projects?
  2. What are the consequences if #1 occurs?
  3. What can we do about it?

First, let's look at #1. This sort of thing does happen. Holding data hostage is unusual, but free online services get shut down all the time. What might cause *Github* to do it?

Could they be forced to cut expenses? Github has been around for almost 8 years, and has stuck with its model of "free public repositories, pay for privacy" throughout that time. It seems to be working for them. Three years ago one of their investors said they've been profitable over most of their life, and are growing revenue at 300% per year[1]. This summer, they raised $250 million more, with the company valued at $2 billion[2]. That indicates that they're still growing quickly, and think they'll be able to expand into new markets. So running out of money and dropping free hosting as a way to cut costs seems unlikely.

How about a change in control? Maybe Oracle will buy them and squeeze as much profit out of them as possible before tossing the dry husk away. For that to happen, the offer would have to be spectacular. Github's investors need at least a 10x return, and probably more, to make money for their funds. If they were worth $2 billion this summer, the acquisition price would have to be something like $20-50 billion. That just doesn't allow the buyer much room to maneuver. There's no special technology behind Github that would make sense to acquire at that price. Github's value is entirely in market position, customer relationships, goodwill etc. To make back the money, the buyer would need to keep running Github and keep earning revenue from it.

Going public? Even less likely. Because of regulatory changes, tech companies have been waiting longer to go public and doing so at a much higher valuation. (There are lots of different takes on this, but see e.g. [3].) If Github went public, it would be because its valuation was so high that employees and investors wanted to (more easily) sell some shares and enjoy their wealth. That would be a huge endorsement of the business model and current management team. With few investors (only five so far[4]), the founders would undoubtedly retain control, similar to the IPOs of Google and Facebook. Messing with the business model would be unthinkable at that point.

What if Github decided to change strategies without some sort of external impetus? That seems unlikely as well. The economics underlying the freemium strategy are getting more and more compelling over time. Disks are cheap, and the cost of storage keeps going down. I just ran across a new cloud storage service that charges half a cent per GB per month[5]. Computing power is also getting cheaper, and with cluster managers like Mesos and Kubernetes, we're using it more efficiently as well. The "burden" of providing free hosting is low and will be getting lower as time goes on.

On the other hand, Github is *the* go-to place for hosting source code. There are millions of users that have both free public repositories and paid private ones. (Github reports 12 million users[6], and I bet a large fraction of them at least have access to both public and private repositories.) Taking away the free repositories would alienate a LOT of customers, and hurt revenue.

So, without saying "this will never happen," I will say that Github shutting down free hosting would be unlikely.

Alright, let's look at #2. If the unlikely did happen, what would be the consequences?

As others have mentioned, the architecture of git makes it impossible to hold the source code and history hostage. Everyone who clones a git repository has a complete copy of the data. If Github decided to lock everyone out of the repositories, we'd just get another server and do this:

  cd cog
  # a clone from Github already has an "origin" remote, so repoint it
  git remote set-url origin git://git.squeak.org/cog.git
  git push origin master

At the same time, we'd be in good company. Github currently has 30 million repositories[6]. Let's be really generous and say that half of those are private, and thus paid-for and exempt from hostage-taking. That means 15 million repositories are now subject to extortion from Github. Sure, most of those are personal forks with no significant changes. But even if there were only, say, 100,000 "real" repositories, that would be a *cataclysm* for the open source world. Alternative hosting would pop up all over the place, and whatever inconvenience the move caused us would quickly be solved by larger and richer open source projects. It wouldn't take much more than "here's our new git hosting" posted on the mailing list and squeak.org to make the change, because *everybody* would know about the problem.

Finally, #3, what can we do about it?

Well, in terms of influencing Github's business model, nothing. We have no leverage. So #1 is out of our control.

But there are a few things we can do to improve #2. First, we could mirror all commits to another repository. That could be a Github competitor, like BitBucket, or just a server that we host with Rackspace or whatever, or even "offline" storage like S3. I believe the Pharo folks are already mirroring the VM source from the current hosting, so that helps reduce the risk as well.
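
For what it's worth, the mirroring could be as small as the sketch below. This is only an illustration under assumptions: the function name mirror_repo and its three arguments (source URL, backup URL, local cache directory) are made up for this example, and the backup repository is assumed to exist already as an empty bare repository that we can push to.

```shell
#!/bin/sh
# Sketch only: mirror_repo and its arguments are hypothetical names.
# Usage: mirror_repo SOURCE_URL BACKUP_URL CACHE_DIR

mirror_repo() {
    src=$1      # where the repository lives today
    dst=$2      # backup host; assumed to be an existing bare repository
    cache=$3    # local bare mirror that is reused between runs

    if [ -d "$cache" ]; then
        # Later runs: fetch new refs and drop refs deleted upstream.
        git -C "$cache" remote update --prune || return 1
    else
        # First run: copy every ref (branches, tags, notes) as-is.
        git clone --mirror "$src" "$cache" || return 1
    fi

    # Make the backup's refs match the mirror exactly.
    git -C "$cache" push --mirror "$dst"
}
```

Run from cron with the Github URL as the source and whichever backup host we pick as the destination, this would keep a second complete copy of the history without anyone having to think about it.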

Second, we could move more of the VM source into Smalltalk. That might mean generating more of the source files with VMMaker, running builds from within the image instead of using CMake, etc. It probably wouldn't be worth it to make *all* the platform sources versioned in MC, but we could go further in that direction from where we are now.

Finally, if it really did come down to Github holding the sources hostage and we had no other copies, we could just pay up. Currently, their cheapest plan is $7/month for 5 private repositories, which ought to cover our needs. Even with the meager donations that Squeak attracts today, surely we could raise the $84 needed for a year of paid hosting, and use that time to figure out what to do for the long term. Github might raise their prices (Why not? This scenario already has them being suicidally irrational.), but I can't see them exceeding our fundraising capabilities. What's the point of extortion if the victim can't pay?

(As a side note, I would be shocked if hosting squeakvm.org currently costs less than $7/month. No idea who's paying for it, but how confident are we that they'll continue to do so?)

In summary, Github is a very safe bet. Your nightmare scenario involves a series of very improbable events: Github would have to stop offering free hosting. They'd have to actively alienate their paying customers by holding their source code hostage. There would have to be sudden disk failures on dozens of laptops and servers where the repository is cloned. And to top it all off, the larger Squeak community, including Pharo, Cuis, Newspeak, Scratch and Croquet would have to be unable to come up with a few dozen dollars to pay for the hosting. 

This will never happen.


[1] http://peter.a16z.com/2012/07/09/software-eats-software-development/
[2] http://fortune.com/2015/07/29/github-raises-250-million-in-new-funding-now-valued-at-2-billion/
[3] http://www.forbes.com/sites/samanthasharf/2014/12/24/is-the-ipo-outmoded-why-venture-backed-companies-are-waiting-longer-to-go-public/
[4] https://www.crunchbase.com/organization/github/investors
[5] https://www.backblaze.com/b2/cloud-storage.html
[6] https://github.com/about/