For as long as we’ve used Git at Jaguar, we have been self-hosting our repositories on one of our servers.
The reason for this wasn’t a distrust of external services, and certainly wasn’t born of some desire to give me more sysadmin tasks to distract from coding. Rather, it came down to one thing: GitHub’s inflexible pricing model.
GitHub uses the number of private repositories as the metric to differentiate between low-end and high-end accounts. The thinking, no doubt, is that larger companies have more projects, thus require more repositories, and so repository count can serve as a measure of a customer’s size.
However, this is at odds with the way many groups, including us, use Git. Git is a filesystem, and there are plenty of legitimate use cases that involve a large number of repositories while still representing a relatively small amount of actual usage, usage that certainly doesn’t warrant $100+ a month for hosting. A widely-read blog post, If Dropbox Used GitHub’s Pricing Plan, points out in humorous fashion that GitHub charging by the repository is as asinine as Dropbox charging users by the folder, rather than by a metric that reflects actual usage (storage space used, in Dropbox’s case).
Of course, while this matters to us greatly, I sincerely doubt that GitHub is shedding any tears at losing the ~$10-25 a month that would be reasonable to charge us. They’re profitable and in charge of their own destiny. Good for them. But our problem remained unsolved.
The next biggest VCS hosting site, Bitbucket, ran an incredibly distant second to GitHub. They were “the GitHub for Mercurial users”, which held no interest for us, as we had firmly jumped on the Git bandwagon. They also had the same problem as GitHub: plans were differentiated by private repository count. On top of that, they lacked many of GitHub’s nicer features. So, we didn’t think much of them.
Then, they were acquired by Atlassian, and moved all plans to unlimited space & repositories, opting instead to use number of users as the metric for differentiating between account levels.
Then, they started supporting Git. Now, things were getting serious.
Along with the pricing restructure and Git support came a steady improvement in features. Bitbucket has not been shy about taking inspiration from GitHub, and implementing some of GitHub’s better features. What was once written off as a second-rate GitHub clone has become, at the very least, a first-rate clone, and with better pricing.
GitHub still remains the place to be for open source code, because of the scale of the open source community that has bought in to developing and sharing on GitHub. I don’t think that’s likely to change, nor does it seem like Bitbucket is angling to try. Instead, Bitbucket is catering to private development, and users of other Atlassian services.
When we first started using Git, we used a hosting tool called Gitosis. It was a good tool at the time, but development stalled, and it was eventually replaced by a similar but more capable tool, Gitolite. Both of these tools are scripts triggered by SSH connections to a specific user account (usually named “git”); they kick into action when the incoming connection’s key matches a valid user in the script’s config.
I have nothing but good things to say about Gitolite, which served us extremely well, and which now underpins the exciting GitLab project, an open-source, GitHub-style, web-based tool for self-hosting Git. (I played around with GitLab a bit, but we were ready to get out of the Git self-hosting business.)
Unfortunately, Gitolite’s SSH-key-authentication-only nature meant that we couldn’t use Bitbucket’s nice migration tool. Of course, since this is Git, we could always create a new repository on Bitbucket, add it as a remote on each of our repos, and push. But we had a lot of repositories, and doing that by hand seemed too time-intensive.
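For a single repository, that manual route looks roughly like the sketch below. The account and repository names are hypothetical, and a local bare repository stands in for the empty repo you would create on Bitbucket, so the commands can be run anywhere without network access:

```shell
#!/bin/sh
# Sketch of migrating one repository by hand. A local bare repository
# stands in for the empty repo created through the Bitbucket UI; in real
# use the remote URL would be https://bitbucket.org/<account>/<repo>.git
set -e
workdir=$(mktemp -d)

# Stand-in for the empty repository created on Bitbucket.
git init --bare -q "$workdir/on-bitbucket.git"

# Stand-in for one of our existing local repositories.
git init -q "$workdir/local-repo"
cd "$workdir/local-repo"
git config user.email "dev@example.com"
git config user.name "Dev"
echo "hello" > README
git add README
git commit -q -m "initial commit"

# The actual migration steps: add the new host as a remote, then push
# every branch and every tag.
git remote add bitbucket "$workdir/on-bitbucket.git"
git push -q bitbucket --all
git push -q bitbucket --tags
```

Multiplied across every repository, though, the per-repo setup clicks and pushes add up quickly.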
What we needed was to make our repositories available over password-protected HTTPS, which the Bitbucket migration tool could work with. This involved two steps. The first step was setting up a webserver to host the repositories. Apache was already running on this box, so it was a simple case of creating a virtual host with its docroot as the folder of repositories, and creating a digest file for HTTP Digest authentication:
<VirtualHost *:80>
    ServerName www.url-for-our-git-import.com
    DocumentRoot /path/to/the/repositories
    <Directory /path/to/the/repositories>
        AuthType Digest
        AuthName "Jaguar Design Studio git"
        AuthDigestDomain http://www.url-for-our-git-import.com/
        AuthUserFile /path/to/my/digestfile
        <Limit POST PUT GET>
            Require valid-user
        </Limit>
    </Directory>
</VirtualHost>
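The digest file referenced by AuthUserFile is normally created with Apache’s htdigest utility, e.g. htdigest -c /path/to/my/digestfile "Jaguar Design Studio git" someuser. As a sketch of what that produces: each line is user:realm:MD5(user:realm:password), which can also be generated by hand. The username and password below are hypothetical placeholders:

```shell
#!/bin/sh
# Sketch of a digest auth entry, equivalent to what htdigest writes.
# File format: user:realm:MD5("user:realm:password").
# The username and password are hypothetical placeholders.
set -e
user="gitimport"
realm="Jaguar Design Studio git"
password="s3cret"

hash=$(printf '%s:%s:%s' "$user" "$realm" "$password" | md5sum | cut -d' ' -f1)
printf '%s:%s:%s\n' "$user" "$realm" "$hash" > digestfile
```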
Step two was to prepare the repositories for hosting over HTTPS. The first thing was that the Apache user (www-data) needed to be able to read the repositories. There were a couple of options here, from making the repositories world-readable to adding the www-data user to the git group (the group that owned the repos). I opted to add www-data to the git group (e.g. usermod -a -G git www-data), as it was easier to undo later: remove www-data from the git group line in /etc/group, versus changing the file permissions on every repo.
The other part of step two was to run “git update-server-info” in each repository. According to the man page, this generates an “auxiliary info file to help dumb servers”. Gitolite didn’t need this, but it would be required for our “dumb” temporary HTTPS hosting.
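To see what that actually generates, here is a small sketch using a throwaway bare repository: after the command runs, info/refs and objects/info/packs exist, and those are the files a dumb HTTP client fetches instead of talking to a smart server process.

```shell
#!/bin/sh
# Demonstrate what update-server-info generates, using a throwaway
# bare repository created just for this sketch.
set -e
repo=$(mktemp -d)/demo.git
git init --bare -q "$repo"
git --git-dir="$repo" update-server-info

# The generated files: a plain-text listing of refs, and a listing of
# pack files, both fetched directly over HTTP by dumb clients.
ls -l "$repo/info/refs" "$repo/objects/info/packs"
```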
Rather than deal with 100+ repos individually, I scripted this:
#!/bin/sh
for file in *
do
    if [ -d "$file" ]; then
        cd "$file"
        git update-server-info
        cd ..
    fi
done
exit 0
Simple enough script: it naively loops through the directories and runs “git update-server-info” inside each one.
Now, we were ready to import.
Importing each repo was a matter of entering its URL and login info, picking a new name for it, and clicking the Import button at the bottom of the screen.
The process was a lot less painful than I had anticipated. Had we realized how easy it was going to be to move, we might not have held out for so long.
The only thing that would have made the process even nicer would have been a bulk importer. As is, though, the process was smooth and easy.
We’re now up and running on Bitbucket, and on to the task of updating all of our deployment scripts. We’re happy with Bitbucket so far, and our complaints are being registered on the Bitbucket issue tracker. Given the good job Atlassian has done with pushing Bitbucket forward, we’re pretty confident that the service will continue to improve and keep us happy.