OSCON Day 1: Subversion Tutorial

Robert Kaye
Aug. 02, 2005 05:25 PM

In this morning's tutorial Brian Fitzpatrick introduced the Subversion version control system and compared it at great length to the venerable CVS version control system. Brian packed his presentation with a lot of history that explains how Subversion arrived at its impressive set of features. Subversion came into being when a number of software developers were fed up with CVS and wished to create a system that improved on it. The goal of Subversion is to support the same features as CVS initially, and then to improve on the system until it surpasses and displaces CVS.

Brian outlined the current problems with CVS, since many small projects use CVS successfully and have never really run into them. But many corporations that have attempted to run CVS on a larger scale have hit serious problems and performance bottlenecks. CVS works only on a file-by-file basis, so during a commit one file might commit OK whereas other files fail -- it's not very transaction oriented. CVS has problems dealing with binary files, wasn't designed for network use and cannot act directly on a repository -- a local copy is always needed. If you look at the features and strong points of Subversion you can see the stark contrast to CVS.

The first and most drastic improvement in Subversion is the concept of a global revision number that gets updated each time any file is checked into the repository. CVS users may not be comfortable with a single revision number for a repository, but this is really Subversion's strength. To underscore the importance of this revision number Brian was wearing a shirt with r8810 emblazoned on it. When an audience member asked what it referred to, Brian jumped up with joy -- he had been waiting for someone to ask that question. It turns out that r8810 was the global revision number when Subversion itself went to release 1.0. A good way to drive one of the most important points home -- well done!

The global revision number comes from Subversion's atomic commit feature that commits files in a single transaction: either all files are committed, or none are. This prevents collisions from happening if someone else has changed the repository at the same time that you are committing files.
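The relationship between atomic commits and a single revision counter can be sketched in a few lines of Python. This is only a toy model under my own assumptions, not how Subversion is implemented: `ToyRepo`, `commit` and `base_revisions` are names made up for illustration, and everything lives in memory.

```python
class ToyRepo:
    """Hypothetical in-memory repository with one global revision counter."""

    def __init__(self):
        self.revision = 0   # global revision number, shared by every file
        self.files = {}     # path -> (content, revision it last changed in)

    def commit(self, changes, base_revisions):
        """Apply every change or none of them.

        changes:        path -> new content
        base_revisions: path -> revision the committer last saw for that path
        """
        # Phase 1: validate everything. Nothing is written in this phase.
        for path in changes:
            _, last_changed = self.files.get(path, (None, 0))
            if last_changed != base_revisions.get(path, 0):
                raise RuntimeError(f"{path} is out of date; commit aborted")
        # Phase 2: all checks passed, so publish the whole change set
        # under one new revision number.
        new_revision = self.revision + 1
        for path, content in changes.items():
            self.files[path] = (content, new_revision)
        self.revision = new_revision
        return new_revision
```

A commit that touches two files bumps the counter once, and a commit that fails on any one file leaves both the files and the counter untouched.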

The next improvement deals with the handling of binary files -- CVS stores binary files whole, and for each new revision a new file is stored. If you make a 2-byte change to a 10 MB file, CVS would gladly suck up another 10 MB of disk space. Subversion only stores the difference between the two files, regardless of whether the file is text or binary. The size of a Subversion repository grows in proportion to the size of the changes -- not in proportion to the size of the files contained in the repository.
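To make the storage argument concrete, here is a deliberately naive byte-level delta in Python. It is not Subversion's delta format; it only trims the common prefix and suffix, and `make_delta` and `apply_delta` are hypothetical helper names. It is enough to show that a small edit to a large binary blob needs only a small record.

```python
def make_delta(old: bytes, new: bytes):
    """Return (prefix_len, suffix_len, middle) describing how to turn old into new."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # Never let the suffix overlap the prefix that was already matched.
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    middle = new[prefix:len(new) - suffix]   # the only bytes that must be stored
    return prefix, suffix, middle


def apply_delta(old: bytes, delta):
    """Rebuild the new revision from the old one plus the stored delta."""
    prefix, suffix, middle = delta
    return old[:prefix] + middle + old[len(old) - suffix:]


# A 100 KB blob with a 2-byte change in the middle.
old = bytes(100_000)
new = bytearray(old)
new[50_000:50_002] = b"\x01\x02"
new = bytes(new)

delta = make_delta(old, new)
print(len(delta[2]))   # number of payload bytes stored for this revision
```

Real delta encoders also find changes scattered through a file, which this sketch does not attempt.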

The Subversion creators designed the system to work over the network from day one, and therefore whole files will only be sent across the network when a repository is initially checked out. After that, only diffs between files are ever sent across the network to reduce the overall network traffic. This extends even further to allow a lot of actions to occur without a network connection to the main repository. This idea really excites me -- I can't count how many times I've sat on a plane wanting a diff against the repository, but being left high and dry with no net connection.
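The offline diff works because the working copy keeps an untouched copy of each file as it was checked out. A rough Python sketch of the idea, using the standard `difflib` module, looks like this; `offline_diff` and the file labels are names I made up, and the real client does considerably more.

```python
import difflib


def offline_diff(pristine_text, working_text, name="file.txt"):
    """Diff the edited file against the locally cached pristine copy.

    No network access is needed: both inputs already live on the local disk.
    """
    return "".join(difflib.unified_diff(
        pristine_text.splitlines(keepends=True),
        working_text.splitlines(keepends=True),
        fromfile=f"{name} (pristine)",
        tofile=f"{name} (working copy)",
    ))


print(offline_diff("alpha\nbeta\n", "alpha\ngamma\n"))
```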

Subversion will also never change the content of your files -- not even to expand inline keywords such as $Id:$ or to change the line endings on your files. All of these things are done on the client side, and never inside the repository. Subversion can even move files inside the repository gracefully -- it keeps track of the locations where a given file has been. This is a profound change when compared to CVS -- with CVS, once you checked a file in, its location was permanent. Moving files in CVS causes all sorts of problems and is not recommended -- I'm really glad that this is possible in Subversion.

To address CVS' performance issues, Subversion introduces the concept of cheap copies. A cheap copy requires very little space and is made in constant time, no matter how much data is being copied. Based on this concept, Subversion provides branching and tagging features. A branch in Subversion is simply a copy of the trunk that can be modified, and a tag is a copy that is never modified. Applying a tag to a repository in CVS was a slow process since it needed to touch each file in the repository. Since Subversion uses cheap copies for tags, tagging becomes a constant-time operation and tags can be applied to a whole repository nearly instantly. Brian pointed out that tags may not really be necessary anymore since the global revision number can be used instead of a tag -- a tag is now just a more human-friendly way to express the global revision number.
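One way to picture a cheap copy is as a new name pointing at an existing tree rather than a duplicate of it. The Python sketch below is my own toy model, not Subversion's storage layer: `Node`, `cheap_copy` and `write_file` are hypothetical names, and trees are treated as immutable so that copies can share them safely.

```python
class Node:
    """Directory or file node. Treated as immutable and shared between copies."""

    def __init__(self, children=None, content=None):
        self.children = dict(children or {})
        self.content = content


def cheap_copy(root, src, dst):
    """Return a new root in which dst refers to the very same node as src."""
    children = dict(root.children)        # copies only the top-level name table
    children[dst] = root.children[src]    # a reference, not a deep copy
    return Node(children)


def write_file(root, path, content):
    """Return a new root with path set to content.

    Only the nodes along path are rebuilt; everything else stays shared.
    """
    name, *rest = path
    children = dict(root.children)
    if rest:
        children[name] = write_file(root.children.get(name, Node()), rest, content)
    else:
        children[name] = Node(content=content)
    return Node(children)


trunk = Node({"a.txt": Node(content="v1"), "b.txt": Node(content="b")})
r1 = Node({"trunk": trunk})
r2 = cheap_copy(r1, "trunk", "branch")                 # branch or tag: same call
r3 = write_file(r2, ["branch", "a.txt"], "v2")         # edit on the branch only
```

The cost of `cheap_copy` here does not depend on how many files sit under the trunk, which is the property that makes tagging a whole repository nearly instant. A tag and a branch differ only in whether anyone later writes to the copy.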

There are many more facets to Subversion that Brian covered in the tutorial -- if you're interested in delving into Subversion for your own projects, you may want to check out Subversion's metadata features that allow the user to associate key=value pairs with directories and files. You'll also find that Subversion's command line syntax is not completely unlike CVS' syntax, but improved and simplified. (Really, who would think to use the update -j command in CVS to merge a branch?)

However, when setting up Subversion you'll need to consider whether you want to run it as a standalone system or inside an Apache installation. Subversion has many more configuration options and network protocol options (e.g. WebDAV), which make it considerably more flexible than CVS. This also means that you'll need to put more thought into how to set up your first Subversion repository.

It's clear that the Subversion team carefully analyzed CVS and all its shortcomings and then set out to create a replacement that surpassed CVS in every respect. Brian certainly did a good job of presenting Subversion and why someone should use it. I personally plan to investigate how much effort it would take to move MusicBrainz over to Subversion. I've been frustrated with CVS many a time.

Robert Kaye is the Mayhem & Chaos Coordinator and creator of MusicBrainz, the music metadata commons.