Your version control system is not a file system

If you find yourself needing to check binary files into your Version Control System, something isn’t right. Your VCS is optimised for tracking changes to source files. When you have multiple revisions of a source file, the VCS stores the original file plus the changes between revisions. This is good.
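To make "the original file plus the changes between revisions" concrete, here is a minimal Python sketch of line-based delta storage using the standard library's `difflib.SequenceMatcher`. It is an illustration under simplifying assumptions, not how any particular VCS lays out its history. `make_delta` and `apply_delta` are hypothetical helper names.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Encode `new` as operations against `old` (hypothetical helper).

    ("copy", i1, i2) reuses old[i1:i2]; ("add", lines) stores literal new lines.
    """
    delta = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, old, new).get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))     # unchanged lines: store a reference only
        elif tag in ("replace", "insert"):
            delta.append(("add", new[j1:j2]))  # changed lines: store the new text
        # "delete": nothing to store, the old lines are simply not copied
    return delta


def apply_delta(old, delta):
    """Rebuild the new revision from the old one plus the delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


# A 1000-line "source file" where one line changes between revisions.
old = [f"line {i}\n" for i in range(1000)]
new = list(old)
new[500] = "changed line\n"

delta = make_delta(old, new)
literal_lines = sum(len(op[1]) for op in delta if op[0] == "add")
print(literal_lines)  # 1: only the changed line is stored verbatim
assert apply_delta(old, delta) == new
```

The second revision costs one stored line plus two range references, rather than another 1000 lines.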

When you check in a binary, the VCS generally can’t store it as a set of changes. Most systems just keep a separate copy of the binary for each revision, so if you store 10 revisions of a 100 megabyte file, you can kiss a gigabyte goodbye. You might argue that disks are cheap. Unfortunately, the cost of storage isn’t the issue. It’s the downtime to upgrade the server, the admin overhead, and the risk of moving all of your data to a new disk. Sure, you can do it.
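The arithmetic above can be written out as a back-of-the-envelope Python sketch contrasting the two storage models. The helper names and the 0.5 MB per-revision delta are hypothetical, chosen only for illustration; real delta sizes depend on the file format and on the VCS.

```python
def full_copy_storage_mb(file_mb: float, revisions: int) -> float:
    # Every revision is stored as a complete copy of the file.
    return file_mb * revisions


def delta_storage_mb(file_mb: float, revisions: int, delta_mb: float) -> float:
    # The first revision is stored in full; each later revision is stored
    # as a delta of an assumed (hypothetical) size delta_mb.
    return file_mb + (revisions - 1) * delta_mb


print(full_copy_storage_mb(100, 10))   # 1000
print(delta_storage_mb(100, 10, 0.5))  # 104.5
```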

Or you could stop using your VCS as the most expensive file system in your organisation.


Update: I wrote this in response to a contractor putting a 325 MB file into my previous employer’s Perforce repository. I should qualify some of the statements in the post: for example, there’s every reason to check in small binary files that are part of your app. I think most people choose to check binary dependencies into their projects rather than take the Maven/Ivy route.

6 thoughts on “Your version control system is not a file system”

  1. Douglas Squirrel says:

    Hear, hear! See my response post for a cautionary tale of woe caused by big binary files. But what are you supposed to do with Excel or Word files, which don’t have a plain-text form but may still need versioning?

  2. Fabrizio Dutra says:

    Excel and Word files are not derived files; they must be considered source files, so versioning them is normal.
    The problem is versioning derived files… (in some systems this can include JSP files too)
    Versioning derived files is bad practice: it can leave your versions non-repeatable and difficult to maintain.

  3. […] maintenance is carried out.  Deployable builds should be labelled, but big binary files like this should not be kept in version control. An ivy repository is an […]

  4. a visitor says:

    You’re an idiot. Do you know how many kinds of binary files exist in organisations that need to be worked upon by a team, with all the changes tracked over time? What about creatives with images like PSDs? How do you propose tracking changes to these files? Create a new folder on the filesystem for each changed version? Oh no, wait – that’s exactly what you’re saying is a bad idea. Or how about just not maintaining old versions… no, whoops, that won’t work either because then there’s no versioning at all. Or how about… you have a version control system that just uses binary deltas to track changes in binary files? Wow, guess you never thought of that before writing an article and bothering to publish it.


  5. simpsonjulian says:

    Dear Visitor,

    I love comment and debate on my blog. If you’d only make a comment that wasn’t abusive, we’d talk about it and I’d probably update the post.

    Can I suggest that if you’re going to make abusive comments, you don’t do it from [presumably] your employer’s netblock?


  6. Claudio Bezerra says:

    Hi, first I’d like to praise a good initiative. I haven’t seen much material on the web about software builds from a software engineering perspective.
    I saw that one visitor, Fabrizio Dutra, mentioned that versioning derived files is bad practice, and I agree. However, not everyone at my office agrees. Do you know of any books or articles that support this view?
    Thanks in advance!

