Is distributed source control always the right answer?

by Jeremy Jones

I've been wondering that lately. I've been using Subversion for ... well ... what seems to be a lot of years now. Looking back at the dates in Subversion's history and lining them up with events that were happening in my life, I'm guessing that I started using Subversion no later than the end of 2003. (So, maybe that's not "a lot of years now"...) And I was using CVS from about 2001 until I started with Subversion. During the majority of that time, I have been either the sole committer on the code base that I was working on, or one of very few people working on the same code.

Recently, I started a job where I'll likely be working more closely with other developers on the same code at the same time. Everything is set up using Subversion. Before starting this new job, though, I began looking into distributed source control, which is the cool new kid on the block. I've created some personal projects using Bazaar and have glanced at Darcs, Mercurial, and Git. I like Bazaar a lot. It can be a little sluggish at times (when pushing, pulling, and merging, for example), but not unbearably so - and I expect that it'll get better. I keep running through my mind how moving to a distributed model would impact the workflow with my co-workers, and I'm not totally convinced that distributed is always the way to go.

I just finished reading this piece of a conversation with Linus Torvalds regarding Git, and I remain unconvinced that going distributed would be the best thing for us. And I'm guessing that most small teams doing "closed" development probably don't need a distributed source control system, either. It seems that the problems that spawned this new model of source control are more pressing for open source development, particularly on larger projects, and less so for smaller proprietary development. For example, it's really important for Linus that Linux kernel developers (or anyone, really) be able at any moment to create a new branch. It's important for Linux that people be able to experiment with new kooky ideas and maybe come up with a cool new feature to go into the kernel. It's also important to Linus that people be able to do so in anonymity. I can see how this would be important for a project that has potentially tens of thousands of developers who are interested in experimenting with the code and are doing so in their own free time. I think this is less important when a small team is being paid to work on a code base. Typically, you don't have the leisure time to perform experiments. If you do need to work on an experimental feature set, it's not a problem for a repository admin at work to create a branch for you. And anonymity isn't typically necessary at work. At least, not anywhere I've ever worked.

I can see how having an "off-line" repository could be helpful. But from what I've heard, svk should address a lot of those issues. And most of these distributed systems are reputed to handle merging between branches better, which would be nice. I love Bazaar and would love to use it at work, but I'm just not convinced that it buys us enough benefits to switch from Subversion. Does anyone have convincing reasons that a small, closed source development team should consider switching to a distributed tool?

25 Comments

Paul
2007-12-14 09:28:16
Jeremy-


I recently also started a new job, and one of my tasks is to handle our repository. I've been handed a Mercurial (hg) "repository" that is basically a fork of a CVS repository. The idea is that we fork, continue to bring in updates from the other team, and at the end of the release cycle, we merge back up with them in CVS. These are the two things I considered when I decided to move the team from Mercurial BACK to svn.


(1) One of the developers decided that Mercurial was best for their needs. However, that developer has been having difficulties using the hg interface (distributed version control is a different animal). Every one of the other developers is having even more problems with it. They spend more time "learning" Mercurial than writing code.
(2) There are only four full time developers on this team. Regardless of how much code gets changed, svn would work perfectly for this team. svn handles merges by saying "Hey, this other developer already checked in code that conflicts with yours. Fix it before you commit." At previous jobs, that sometimes meant there was a race to commit so you didn't have to worry about handling the merge. Still, merges between hg repositories are orders of magnitude bigger than the merges you'll see when you only have a day's worth of code.


Linus created git to solve a big problem. It worked. git is absolutely fabulous for working with the Linux kernel. Entire teams can maintain their own branch of the Linux kernel, merge patches around, and everything. It's great. But shoot, if you're the sole committer, and you only have one repository anyway, why would you need git?


I think you're absolutely right. Distributed Version Control is not always the answer. Sometimes it's overkill.


Paul

Jeremy M. Jones
2007-12-14 09:39:56
Paul,


Thanks for the post. I think if a large part of the work the team needed to do was merging between branches, a distributed tool would be an excellent choice. Merges for svn *are* pretty broken (as you pointed out), both when a developer tries to "merge" his changes into the main branch by doing a commit, and when really merging from one branch to another.


And, agreed. Linus created git to solve a problem that he and many other folks have. If you have a problem similar to the Linux kernel development team's, it'll likely work beautifully for you. But it's not for everyone. Good post!

Michael Peters
2007-12-14 13:18:16
I'm pretty happy with SVK. I get a distributed SCM without forcing anyone else on the team to do it. They keep right on using SVN. Of course whenever we need to do a big merge between branches, I'm always the one that does it since it's so much easier to do in SVK than in SVN.


SVK also has some other niceties with its interface. It's just like SVN without all the little annoying bits :)

Aristotle Pagaltzis
2007-12-14 14:10:32

SVK is a castle built on sand. The Subversion filesystem is… just not that great, to put it charitably, and SVK does not fix that. Heck, both Subversion repository formats are a step back from CVS… that is more than a feat.


As for “distributed”, don’t forget that you can always use DVCS in a manner identical with centralised VCSs by setting up a star topology, ie. a repository server in the centre that everyone pushes to and pulls from. However, you get all the advantages of DVCS at the fringes of the star: excellent merge handling, the ability for developers to develop patches outside of the central repository without being cast into a ghetto with no version tracking, the ability for developers to collaborate on a piece of work before pushing it to the central repository, etc etc.


And the fact that local commits and diffs are blazing fast should not be discounted. It sounds like a “yeah OK shrug” feature, but Linus is right: when you make something that used to take 5 seconds finish in 0.05 seconds, that is a qualitative difference that completely changes the way in which people work.


Having all history available locally is “just” a bonus.


There is nothing that you can do with a centralised VCS that you can’t do in the same way with a DVCS, but there is a whole lot that you can do with a DVCS that you can’t do under a centralised system.


The sole and only reason to stick with a centralised VCS is better tool integration. But that will not remain valid indefinitely.


As for DVCS being difficult to understand, that seems like FUD to me. It isn’t difficult to understand, it’s just a bit of a culture shock. You need to be more deliberate with a DVCS – more mindful of what you are trying to achieve when you kick off a particular action – because the model is more flexible. Centralised VCS work better for mindless use, because they provide far fewer options. But to me that seems like an issue of socialisation, not one of inherent difficulty.


To use DVCS, you have to spend a bit of time building a mental model of how it works. But I don’t see that as a terribly high barrier; I spent a few days reading the Subversion book when I started using SVN, and likewise I spent a few days reading the Mercurial book when I started using that. It has a nice coherent exposition of the DVCS way of doing work. I’ve had no issues understanding how to use Hg whatsoever; and I don’t consider myself a genius exactly.


Git is a bit harder to figure out because it exposes more of its innards. But it’s getting easier all the time, and using Hg (and its excellent book) as a bridge into the DVCS world makes the transition pretty simple.

Jeremy M. Jones
2007-12-14 14:18:29
@Michael,


I'll have to check out svk now. That seems like a reasonable alternative/compromise. I get offline and merging goodness and I don't even need to get into a discussion with the team about switching to another system.

Jeremy M. Jones
2007-12-14 14:32:48
@Aristotle,


Thanks for the perspective from the DVCS camp. I'm definitely going to keep playing with bzr on the side and check out Mercurial and others. But I am not sure how greatly my organization would benefit from switching at this point. I'll definitely keep an eye on DVCS for the future.

Manuzhai
2007-12-15 14:22:08
I've been advocating Mercurial a lot recently.


Without addressing any of your other points, I'd like to point out that I feel Mercurial is a much better match for small projects than Subversion, specifically because it is so much more agile and nimble. Creating a project is only an hg init away, cloning is very easy, and you don't need to set up a complicated server that does all kinds of difficult things.


Mercurial is much easier to use if you just want to version a directory for a bit. Also, the pulling/pushing model was very natural for my colleague (a VCS newbie) when I explained it to him. It just needs a bit of time if you're used to SVN, but that's because the centralized model is something you have to unlearn, not because it's better (you can still have a somewhat centralized model using a decentralized VCS, but that's still better).

Manuzhai
2007-12-15 14:24:19
Something I forgot: at this point, I think the only valid use case for SVN is big repositories that need partial checkouts and complicated ACL setups. All other cases can be covered by the distributed systems, of which I think Mercurial is the best (particularly because it is the most focused and therefore the most usable).

Danno
2007-12-15 18:20:03
I've only just started to try using Darcs but one of the things that strikes me about it is the emphasis on commits being atomic by feature/bug rather than "Okay, I've worked on this for a while now, let me check it in."

Jeff
2007-12-16 01:44:10
I respectfully disagree with you, and I would challenge you to spend some time developing with git as the source control for even a personal project. Even working alone, I find that it is a much better tool than Subversion. First, branching is so cheap that it is no longer something you do only when you are taking a project in a whole new direction. You can create a branch for every little idea you want to play with. It's actually a really liberating experience once you get into the swing of things. You can experiment with all kinds of refactorings or what have you, and it's all really lightweight. Beyond that, local versioning is just awesome. Of course being able to code on the train or airplane and still have my branches and versioning is huge, but I think it mostly has to do with the feel of the development process. Checking in is incredibly quick since you don't have to go over the network, so you can check in every time you get a new bell or whistle implemented. This actually makes it easier to cherry-pick features later on if merging with something else, and it forms a nice log of your development in the project. Git is so much more responsive and powerful than Subversion. Since I switched over about 2 years ago I've not gone back to SVN, which is where I was for the previous 5 years or so. The interface is getting cleaner with time, although it could still use some refining, but just as an exercise I think it's worth taking it out for a spin.

Jeremy M. Jones
2007-12-16 02:49:57
@Manuzhai,


I'm not going to totally disagree with you. I really enjoy working with bzr. (Man, I've *got* to play around with Mercurial some time.) But setting up a svn repo is really no more work than setting up the bzr stuff I've been doing. It's just `svnadmin create {{project name}}`, I think. But I still see some benefits of going distributed. My point has been that I'm not sure there is enough benefit to advocate switching my team. Migrating the repo from svn to (bzr|hg|git|darcs) would be a substantial effort. And the payback doesn't seem like it would "pay for" the effort.


I totally agree with the "unlearn" point. I'm not willing to say at all that centralized (svn) is better than distributed (hg|git|bzr|darcs). In fact, I'm inclined to think the opposite. Any behavior that you've learned does have to be unlearned in order to move forward with a better behavior. Good point, there.


And regarding partial checkouts, et al.: I think that you can probably get nearly the same functionality with a distributed setup by breaking projects up into smaller chunks. (In fact, Linus, in the article I reference above, said that you should certainly break up repos into smaller chunks for git.) You should be able to impose an ACL on at least an entire repository, so finer-grained ACLs could be part of breaking a project up. So, I don't think that svn is necessarily the clear winner here.

Jeremy M. Jones
2007-12-16 02:55:35
@Jeff,


Thanks for disagreeing (respectfully, no less!). And I take up your challenge. I just have to figure out which new project I have on the horizon I could stuff into git. You make it sound so appealing. Branch for any idea you have and, I presume, merge it all back together before pushing it up to the central repo. You can do something similar with svn, as you know, but merging is so much more of a pain. And still one of the huge appeals of going distributed is having your repo with you wherever you go. Thanks for the post - and for challenging me!

Dan Fitch
2007-12-17 08:10:40
For what it's worth, I develop in subversion at work. I developed in subversion at home. Stored everything from code to dotfiles to plaintext writing projects in it. I just switched half of my personal repos to git, and after a rather wacky learning curve, I am never, EVER, going back. In the original post, you say "from what I've heard, svk should address a lot of those issues." svk might fix some things, but git is a whole different beast.


At first, I stumbled over some things, but it doesn't take all that long to get rolling. Now, let's say I want a different config.h for something on my old crappy laptop. With subversion, maintaining branches for things like that was so difficult that I usually just plain didn't branch and hacked together some symlinks. With git, the files can be branched and merged properly. The DVCS zealots might be on to something, even for single-user workflows.


I think this approach might be overkill if you are a single person always doing all your work on a single system. But how often does that happen these days?

Daniel Berger
2007-12-17 11:21:05
Wake me up when git works on Windows (without cygwin).

David F. Skoll
2007-12-17 19:51:56
I used to think SVN was "good enough" for our small (3-person) team, but having switched to git, I'll never look back.


For me, the key isn't necessarily the distributed nature of git. It's just the ridiculous ease with which you can branch and merge. It makes our (probably typical?) workflow of "work on feature X ... oops, interrupt! Fix Y... now go back to X ... now merge" almost painless.

Ulrik Mikaelsson
2007-12-18 02:32:18
I can easily tell you what in SVN tipped me over to the DVCS side. I had been looking at DVCS for a long time and was very curious, but didn't have enough reason to switch until I suddenly had a completely impossible merge in my SVN repo. I'm not sure whether SVN handles this yet, but if it still doesn't, it's quite embarrassing.


Create a repository, create a main branch TRUNK, work some on it, branch off into a new branch B, and work a little on both, to finally merge back B into TRUNK. So far so good (more or less). Then work some more on B and TRUNK, and again try to do a complete merge of B into TRUNK. WHAM.


At least 2 years ago, when I made the switch, SVN didn't even keep track of the previous merge point, only the original branch point, so whenever I tried to merge back a branch that had been previously merged, it tried to merge EVERYTHING again, even already merged stuff. That, of course, didn't work.
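
As a rough illustration of the behavior described above, here is a toy Python model. It is not how SVN or any DVCS is actually implemented: revisions are plain strings, and the function names and the `last_merged` counter are made up for this sketch. It only shows why remembering the previous merge point matters on the second merge.

```python
# Toy model only: a "merge" is just the list of branch revisions
# that would be replayed onto trunk. All names here are hypothetical.

def merge_without_tracking(branch_revs, branch_point=0):
    """Replay everything since the original branch point (no merge memory)."""
    return branch_revs[branch_point:]


def merge_with_tracking(branch_revs, last_merged):
    """Replay only the revisions added since the previous merge point."""
    return branch_revs[last_merged:]


# Work on branch B, then merge it into trunk for the first time.
branch_b = ["b1", "b2"]
first_merge = merge_without_tracking(branch_b)
last_merged = len(branch_b)  # a tracking tool would record this

# More work on B, then a second merge into trunk.
branch_b += ["b3"]
naive = merge_without_tracking(branch_b)
tracked = merge_with_tracking(branch_b, last_merged)

# Revisions the naive merge tries to apply even though trunk already has them.
already_in_trunk = set(first_merge)
reapplied = [rev for rev in naive if rev in already_in_trunk]

print("naive second merge replays:", naive)
print("re-applied, conflict-prone:", reapplied)
print("tracked second merge replays:", tracked)
```

In the naive case the second merge tries to replay b1 and b2 again, which is the "merge EVERYTHING again" situation; with a recorded merge point only b3 is replayed.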


After that, I evaluated BZR, git and Mercurial (I'm still kicking myself I didn't evaluate Darcs as well). I finally chose BZR for my team, due to its very smooth transition from SVN. (Good branch-conversion tools, almost the same syntax, requiring little re-learning syntactically.) BZR did almost everything I wanted. So far I've only found two weaknesses in it:
1. It doesn't take precautions for UNIX/DOS-style line endings, which means that if someone edits a file with the wrong editor, every single line is going to be marked changed. Very frustrating and annoying, especially when you haven't paid attention, and discover it 5 revisions back in the repository.
2. It has no notion of a changeset, only revisions, making cherry-picking and change reordering (i.e. to fix the line-ending problem from 1.) nonexistent. It's very cumbersome.
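
The line-ending problem from point 1 is easy to reproduce outside of any version control tool. The sketch below uses Python's standard `difflib` on a made-up three-line file; it says nothing about BZR's internals, it only shows that a line-based diff sees every line as changed once LF becomes CRLF.

```python
import difflib

# The same (made-up) file content, once with UNIX and once with DOS line endings.
unix_lines = ["int main(void) {\n", "    return 0;\n", "}\n"]
dos_lines = [line.replace("\n", "\r\n") for line in unix_lines]

diff = list(difflib.unified_diff(unix_lines, dos_lines, "before", "after"))

# Count changed lines, skipping the "---" / "+++" file header lines.
removed = [l for l in diff if l.startswith("-") and not l.startswith("---")]
added = [l for l in diff if l.startswith("+") and not l.startswith("+++")]

print(len(removed), "lines removed,", len(added), "lines added")
```

No visible text changed, yet all three lines show up as removed and re-added.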


I would recommend BZR as a good way to get a hang of DVCS, but myself I'm a bit tempted to really get going with git.


I'm very happy with DVCS, and I really think it's ALWAYS the right answer, if for nothing else, for its superior branching/merging. Even IF you want to keep working with central repositories, use a distributed tool. It makes your life easier.

Gil
2007-12-18 02:37:20
It's about the math that controls moving patches/changesets around. Distributed source control systems such as BitKeeper or darcs have a well-defined mathematical model for the graph that is your set of changes to a source corpus.


See http://en.wikibooks.org/wiki/Understanding_darcs/Patch_theory for a good description.

Bill
2007-12-18 03:16:34
Jeremy,
My team uses Subversion for day-to-day work. Most are former long-time CVS users; they don't quite have the hang of committing one feature at a time, and branches are "too much work". Because we have different releases for slightly different hardware platforms where a big part of the code is common, there is a lot of merging to do between releases (branches). Since I do this work, and I found SVN unable to do the most trivial merges (without a bunch of tracking by hand), I switched to git (git-svn, to be exact). No one else knows or really cares. git-svn isn't perfect, but it makes merging and cherry-picking features and bug fixes from one branch to another, and putting them back in Subversion, so easy that it is more than worth it. It also allows me to have the members of my team use and create their own branches, and lets me worry about merging them into the release branches. Eventually they'll see the light and use git; but in the meantime merging is trivial and I don't have to worry about doing a lot of merging by hand.


Bill
PS. The distributed part of git isn't its biggest strength for my use case; it is the branching, merge tracking, and viewing of history.

Dick Davies
2007-12-18 05:22:05
SVK is horribly confusing and just a messy solution. Try out Mercurial and see what you think. I was gobsmacked by how much more lightweight (and fast) it was compared to svn (and I was a big svn fan).

Steve
2007-12-18 08:27:39
I've been in many a Subversion 'strategy' session in the past. You may have seen a couple of these yourself, with lines for the trunk, the release branch, development branches, etc running all over the place on the closest whiteboard. There are usually two points of view in such strategy sessions, and they can be boiled down to the following:
1) Branching BAD! This is a pragmatic outlook taken by people weaned on the limits of systems like CVS and SVN. Such version control systems have traditionally had extremely poor support for branching and especially merging.
2) Branching GREAT! A sure sign of someone green to SVN/CVS.


I don't know what you consider a small team, but in one of my more recent gigs, I was at a company in which there were < 10 developers. Distributed became a huge deal, because so much time was spent coming up with a strategy for how to handle the Subversion system that it actually handicapped the group's productivity (is the trunk for development code? for release code?). In the end, all players kind of figured out how to work around the source control system ... until it eventually bit them.


The good news is that if your operating system is Linux, you have a wonderful tool called Git which eradicates the whole issue. No one argues against branching and merging because they are in no way problematic and seem as natural as breathing. Want to create a branch for every bug you encounter? No problem, knock yourself out. Heck, you don't even have to maintain a bunch of separate directories!


Now for those that can't quite divorce themselves from SVN, there is great news. Git ties in quite nicely with Subversion. If you can't reason beyond a central repository, have the best of both worlds!


If you are a small shop or even a one man gig, I would certainly advocate using Git. Not only will your repository be a heck of a lot smaller than if you'd used Subversion, but it will also be blindingly fast. As both systems are incredibly easy to back up, I just do not see a compelling advantage to Subversion over Git, unless you are constrained to a non-Linux OS for development.


I can understand your difficulty with the distributed repository concept. After all, most people still think that SVN keeping a single revision for all files is a terrifying concept (oh the humanity! Let's create a bazillion repositories to solve the 'problem'!)


Good luck,



Steve

lark
2007-12-18 18:56:22
You can use a distributed source control system the same way as a central one.


Notice that, besides the various goodies for merge management, you get another bonus: you can speed up local iteration. When you use a central system, you face a dilemma: commit or not. The common rule is that you don't pollute the repository with dirty code, but in the local iteration phase, you usually want to commit often.


You may not need to try various experimental features, but even for a single feature, you still have to implement it step by step: trial, refactor, etc. With local branch and commit capability, you can do that efficiently.


And consider that you can work offline: diff, view history, etc. You don't need a fast network connection, or even any connection at all.

Robert
2007-12-19 06:01:33
When git gets a native Windows version...let me know.

Noah
2007-12-25 11:45:18
Jeremy, I watched the Linus talk, and I am interested. Good post.

Yesudeep Mangalapilly
2008-01-31 09:42:28
@Daniel Berger and Robert


You asked for it:
http://code.google.com/p/msysgit/