Thursday, February 08, 2007

Distributed vs Centralized revision control

In the last few years we've seen the rise of so-called distributed revision control systems. These are, essentially, tool support for maintaining your own personal version of an open source project.

Let's take, as an example, the WINE project. They use Git, the same version control system as the Linux kernel. Suppose I have a favorite application that I would like to run using WINE. There might be some things that my app requires which WINE doesn't yet support. I don't mind implementing these things, but I really couldn't be bothered going through the hassle of satisfying every whim of the WINE developers (how they like their patches submitted, how they like their changelog written, etc.).

What happens when WINE releases a new version that I want to upgrade to? Do I keep an entire source code tree around just to run my favorite app? What if I have more than one favorite app? Git lets me update my source code tree and merge the updates with my local changes, resolving any that conflict. But it does more than that: it lets me "check in" my local changes to a local repository, and if some member of the WINE team happens to implement something that I implemented to get my favorite app to work, I can tell Git to drop my change and use the WINE developer's version instead (because they probably know what they're doing better than I do). Git makes it really easy to maintain my personal tree.

Let us, for a moment, consider the alternative. Centralized revision control systems, like CVS, don't make this easy. Sure, I can make local changes to the source tree and later update and fix any conflicts, and in this way maintain a personal tree, but it is not easy because, without commit access to the central repository, I can't "check in" my changes. This, I think, actually has an upside: it encourages people to go through the effort of sanitizing their personal tree and checking it into the repository. These changes become part of the core distribution and everyone benefits. Not only can I now run my favorite app, but everyone who wants to run that app can also run it.

Unfortunately, it takes a lot of time and effort to sanitize your tree, and not everyone wants to do that. Beyond all other reasons, I believe this is why distributed revision control tools exist. Some people might want their "personal" tree to be for their personal use only, but I think the majority of people don't really care. They keep their tree personal just because it is too much effort to make it public. With Git (and the others) it is easy enough to make your personal changes public, but most people don't bother because it is still *some* effort. It is only the core developers of a project who make their changes public: the same developers who would check their changes into the repository if they were using a centralized revision control system.

Is there some middle ground? I think there is. With the centralized revision control system Subversion (SVN), it is very easy to create branches. Unlike CVS, which made branching a nightmare, SVN provides a server-side copy command which uses copy-on-change semantics (what Subversion calls a "cheap copy"). As such, making a branch is really just copying the tree to somewhere else in the repository, usually a directory called /branches. SVN also makes it easy to set up per-directory access permissions.
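To make the copy-on-change idea concrete, here is a toy in-memory model of it. This is only a sketch: the ToyRepo class and its methods are invented for illustration and are not how Subversion actually stores data. The point is just that creating a branch records a reference to the existing tree, and data is only duplicated once something on the branch changes.

```python
class ToyRepo:
    """Toy model of a cheap server-side copy (not Subversion's real storage)."""

    def __init__(self, trunk_files):
        # Each revision maps a top-level path to a table of {file name: contents}.
        self.revisions = [{"/trunk": dict(trunk_files)}]

    @property
    def head(self):
        return self.revisions[-1]

    def copy(self, src, dst):
        """Branch by reference: the new path shares the source's file table."""
        new = dict(self.head)
        new[dst] = self.head[src]  # no file data is duplicated here
        self.revisions.append(new)
        return len(self.revisions) - 1  # the new revision number

    def commit(self, path, name, contents):
        """Duplicate the file table only when something under `path` changes."""
        new = dict(self.head)
        files = dict(new[path])  # the actual copy happens here, on change
        files[name] = contents
        new[path] = files
        self.revisions.append(new)
        return len(self.revisions) - 1


repo = ToyRepo({"main.c": "int main(void) { return 0; }"})
branch_rev = repo.copy("/trunk", "/branches/QuantumG")
shared_before = repo.head["/branches/QuantumG"] is repo.head["/trunk"]
commit_rev = repo.commit("/branches/QuantumG", "main.c", "/* patched */")
shared_after = repo.head["/branches/QuantumG"] is repo.head["/trunk"]
```

Right after the copy, the branch and the trunk are the very same object; they only diverge once a change is committed to the branch, and the trunk is left untouched.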

So, taking the example above, what would it be like if the WINE project decided to use Subversion, and to use it properly? If they set it up right, I could go to the website and download a script; let's call it getwine.sh. I could then run ./getwine.sh QuantumG and it would connect to the source code repository and create an account for me. The account would have read-only access to the entire repository, except for the directory /branches/QuantumG/, where it could also write. The script would copy the WINE source code into that directory on the server, fetch it to my machine, change into the checked-out directory, and quit.
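As a rough sketch of what such a setup might involve, the snippet below builds the two pieces described above: the per-directory access rules and the branch-and-checkout commands. Everything specific here is an assumption made for illustration: the repository URL is hypothetical (WINE does not host a Subversion repository like this), the function names are invented, and the account creation step is left out because it depends entirely on how the server is run. The rules use Subversion's path-based authorization file format, and the commands are only assembled, never executed.

```python
def authz_rules(user):
    """Return authz text: read-only everywhere, read-write on the user's branch."""
    return (
        "[/]\n"
        "* = r\n"
        "\n"
        f"[/branches/{user}]\n"
        f"{user} = rw\n"
    )


def branch_commands(repo_url, user):
    """Return the svn commands that create and fetch a personal branch."""
    branch_url = f"{repo_url}/branches/{user}"
    return [
        # Server-side copy of the trunk into the user's branch directory.
        ["svn", "copy", f"{repo_url}/trunk", branch_url,
         "-m", f"Create personal branch for {user}"],
        # Fetch the new branch to the local machine.
        ["svn", "checkout", branch_url, f"wine-{user}"],
    ]


# Hypothetical repository URL, for illustration only.
rules = authz_rules("QuantumG")
commands = branch_commands("https://svn.example.org/wine", "QuantumG")
```

Whether the copy is run under the new account or by the server on its behalf is a detail a real setup would have to decide.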

I can now make my changes and check them into the WINE repository. One of the great things about Subversion is that it gives a unique number (called the revision number) to every check-in. So, should I get to the point where my favorite app is working, I might feel like telling the world. I'd post a message to the wine-devel mailing list and say "hey everybody, I just got my favorite app to work! Take a look, the revision number is 23893." People who liked that app and wanted to run it with WINE would just merge the changes from my branch into their own branch. Maybe one of these people would think "Gee, that's really cool, I think everyone should have that, I'm going to sanitize these changes and get them checked into the trunk." They might not consider this much work because they haven't needed to do all the work I did to get it working in the first place.
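For a sense of how little that merge would ask of the other person, here is a sketch of the command they might run from inside their own working copy. The repository URL and the helper name are hypothetical; svn merge -c is the real Subversion option for pulling in the change made by a single revision. If the work were spread over several check-ins, they would need a revision range instead.

```python
def merge_revision_command(repo_url, author, revision):
    """Build the svn command that merges one revision from someone's branch."""
    source = f"{repo_url}/branches/{author}"
    # "-c N" merges only the change introduced by revision N.
    return ["svn", "merge", "-c", str(revision), source]


# Hypothetical repository URL; 23893 is the revision from the example message.
merge_cmd = merge_revision_command("https://svn.example.org/wine", "QuantumG", 23893)
```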

Of course, all this can be done with a distributed revision control system too, but it is significantly more effort. Maybe this effort will go down in the future. Maybe something approaching the above ideal can be achieved using distributed revision control systems and public branch repositories. Hopefully, the people who make Git will see the wisdom of making all trees public until you explicitly set them to be private (for those people who really do want a private branch) instead of the current practice of making trees private by default.

1 comment:

  1. I think being easy and effective is what matters most in source control tools.

    Catherine Sea
    http://www.scmsoftwareconfigurationmanagement.com
