Category Archives: Opinion

Go with the grain

When you’re planing wood, you have to follow the grain. The grain has a direction, which depends on the cut of the wood and the way it grew. Do that and you’ll end up with thin shavings and a smooth finish. Go against the grain, and you’ll rip out chunks.

I’ve worked with Ant builds where the same thing happened. Instead of using the features of the tool the way they grew, people fought against them. The result is usually expensive and wasteful.

Many of our tools (in my opinion, the best ones) are optimized to do one thing well: that’s their grain. Go with it, or choose another.


The hidden cost of building

Thanks to EJ Ciramella for this thought-provoking post. There’ll be a Build Doctor T-shirt on its way to him soon.

In this down economy, irrespective of the size of the company involved, people want to save money, limit costs and increase the throughput of their systems. One area of savings is the build and continuous integration environment.

Of the smattering of continuous integration servers available to a company’s release engineering staff, many offer build buttons that aren’t protected by any ACL. Without this control point, anyone in any department attached to the corporate network can click off a build, regardless of the readiness of the code within source control.

Where I work, we’ve recently dropped CruiseControl for Hudson. A few colleagues have come by since the rollout asking where the build button is because they don’t have access. When pressed about why they’d like to manually spin a build, the resounding answer has been “to see what it looks like in Hudson”. This is the exact situation we’re trying to avoid and the exact subject I’ll try and illuminate within this article.

Before diving any deeper, there are a few things I think release engineering at any company must understand. Each company has a unique workflow, from project concept to design, to implementation, to release, and off into support mode. What works great at one place may or may not work at another company. There are industry best practices and white papers aplenty, but if you find it difficult to follow any of these at your company, the best approach in most cases is to take what you can from these documents and carefully plan an evolutionary (not revolutionary) process to reach a tailored solution.

With all of this in mind, let’s cover the various steps and the unseen costs of performing builds.

Here is a high-level list of things that happen when we spin a release-type build.

  • SCM label

Initially, we labeled first and asked questions later. The mindset was that if the build failed, a developer could sync (we use Perforce) to a label and get exactly what the build server had used to generate the failure. Since the Perforce plugin for Hudson doesn’t operate in this manner (and differs in a few other ways that led us to alter the plugin to suit), we made the switch to labeling only if the build passes. Since very few developers ever did take the time to sync by the labels, the failed-build labels were just a waste of space. Either way, depending on the size of the codebase getting the label attached, there is disk space and memory consumed on the Perforce server.
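To make the label-on-success idea concrete, here’s a rough sketch of a post-build step. It isn’t the Hudson Perforce plugin’s actual code; the label naming scheme and depot path are invented for the example, and the only Perforce command used (p4 tag) is assumed to be on the build node’s path.

// Hypothetical post-build step: only apply a Perforce label when the build passed.
// The label naming scheme and depot path are invented for this example.
import java.io.IOException;

public class LabelOnSuccess {
    public static void main(String[] args) throws IOException, InterruptedException {
        boolean buildPassed = Boolean.parseBoolean(args[0]);
        String change = args[1];                    // changelist the build synced to
        String label = "release-build-" + change;   // illustrative naming scheme

        if (!buildPassed) {
            // Failed builds no longer burn label space on the Perforce server.
            System.out.println("Build failed; not labeling " + label);
            return;
        }

        // 'p4 tag' attaches the label to the files as of that changelist.
        ProcessBuilder p4 = new ProcessBuilder(
                "p4", "tag", "-l", label, "//depot/project/...@" + change);
        p4.inheritIO();
        if (p4.start().waitFor() != 0) {
            throw new IOException("p4 tag failed for label " + label);
        }
    }
}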

  • Build node (CPU/RAM/HD)

I’m sure that the visitors of this site are savvy enough to understand the beauty and flexibility that come with a distributed build system. I use the term “distributed” here loosely, as any given build is not farming out various parts of compilation, just individual jobs (in Hudson parlance) or, in some cases, individual Maven modules. By the time a build reaches the queue stage, the job has consumed an executor within the cluster. In our current cluster, we have three slaves and a master. The master and two slaves are only allowed one executor; the third slave has six. If a build is forced to run on one of the “singleton” nodes, that node becomes unavailable until the build is completed. Thankfully, our build times are short (the longest is 20 mins), but I think the ease of firing off a quick build mentally cheapens the process. Don’t misinterpret this as “let’s artificially inflate build times”, but keep it in mind when a large refactoring of the build process and associated scripts yields a massive time saving (heck, why not push back to get more testing done in that same time frame?).

  • Client spec updates (Perforce Server)

One of the great features of Perforce is that the server is where the “what you have” list is maintained. I’ve seen the arguments for working offline, and what happens if a refactoring happens, yadda-yadda-yadda. But in large-scale corporate environments, many institutional services will be unavailable in this mode.

When a person syncs a project (in Perforce terms, but essentially a directory of directories and files), the server updates its records to reflect what the user ends up with. The same is true of the build process. When setting up Hudson, our SCM configuration choices left us with Hudson managing the client specs. This means, in some cases for us, there is one client spec for each node for each job. Even if only one is getting updated, that is still data being written to the server (again).

  • Artifact storage (both live and backup)

Now the build is finished and we’re going to retrieve the artifacts for storage. Where I work, the build artifacts are stored in a few ways (all on various NetApp slices). They are as follows:

  • Deployable units – These are the actual applications that we push to our various deployment environments. Because of a facilitating Maven project that allows cross-application dependency validation, some of these artifacts are stored on a generalized NetApp slice (to keep our artifact storage continuous-integration-agnostic – who knows when we’ll switch from Hudson to something else) and replicated to Archiva to allow people to reference certain bits as dependencies from pom files.
  • Libraries – These are effectively the building blocks of the deployable units. The final destination of libraries is Archiva (or your repository manager of choice).

Essentially, there are two places things are stored: Archiva and our “buildartifacs” mount – both of which, to reiterate, are NetApp mounts. There is a backup mechanism that keeps hourly, daily, monthly and yearly restore points around. All of this takes up space, but if we ever had a complete system failure, we could very quickly return to business as usual.

  • Potential deployment of said artifact

Now that the build is done and someone within the organization has chosen to deploy and test it, they may deploy it (or put a dependency on it and trigger an application build – all with no changes) to a given stack for testing. One of our typical artifacts is close to 280 MB zipped. To successfully deploy and test, this artifact is extracted on at least two servers and typically has a 139 MB web content artifact deployed at the same time (as we have to keep these things in sync). Deployments back up the previous deployment (keeping just a few) prior to extracting the new item.

  • Testing requirement of artifact (with no changes)

Once the artifact is deployed, now comes the tax on the squishy bits (humans) if you have no automated integration, smoke or load testing. And if you do have all those tiers of testing, you’ll be consuming each one along the way. Couldn’t everyone (and everything) be doing something more productive rather than re-testing a deployment that contains no changes?

  • Other

There are other fiddly bits that happen as part of each build, like sending emails, various cleanup steps, etc., not to mention the can of worms that is dependency generation (if Build-A happens, Build-B needs to be re-spun to pick up the change in Build-A, ad infinitum, ad nauseam). What if the build artifact has to be transferred to another country for deployment? Transferring 280 MB of data while people are trying to sync an as-yet un-proxied Perforce project or retrieve dependencies from Archiva is not a good way to spend your workday.
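As a purely illustrative sketch of that fan-out (the job names and the dependency map below are invented, not our actual setup), one manually fired build can transitively drag a whole family of downstream builds back into the queue:

// Hypothetical illustration of the downstream fan-out: one manual build of "app-core"
// queues every build that depends on it, directly or transitively.
import java.util.*;

public class DownstreamFanOut {
    public static void main(String[] args) {
        // Invented dependency map: job -> jobs that must rebuild when it changes.
        Map<String, List<String>> downstream = new HashMap<>();
        downstream.put("app-core", Arrays.asList("web-frontend", "batch-jobs"));
        downstream.put("web-frontend", Arrays.asList("web-content"));
        downstream.put("batch-jobs", Collections.<String>emptyList());
        downstream.put("web-content", Collections.<String>emptyList());

        // Breadth-first walk: everything reachable from the manually fired build
        // ends up back in the queue, consuming executors, storage and test time.
        Deque<String> queue = new ArrayDeque<>(Collections.singleton("app-core"));
        Set<String> toRebuild = new LinkedHashSet<>();
        while (!queue.isEmpty()) {
            String job = queue.removeFirst();
            for (String dep : downstream.getOrDefault(job, Collections.<String>emptyList())) {
                if (toRebuild.add(dep)) {
                    queue.addLast(dep);
                }
            }
        }
        System.out.println("One click on app-core also rebuilds: " + toRebuild);
    }
}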

Everything above consumes hardware resources, time and space that could be reserved for a legitimate build or better testing: CPU time, memory allocation, the various stopping points and distribution mechanisms. If people go ahead and deploy and try manual testing, now we’re talking about consuming one or more human resources as well. I cover a single large application above with the quoted sizes, but in fact this Maven module generates another 120 MB artifact that is stored in Archiva, which is consumed by another deployable unit that is 338 MB zipped.

This is why it’s best to either limit or prevent people from firing builds manually. I’ve taken the tack that if we find people spinning unnecessary builds, we’ll revoke their privileges within the Hudson matrix ACL settings. I’m not opposed to taking away further permissions, forcing more people to rely on the polling aspect of Hudson.
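For reference, here’s a rough sketch of what that lockdown might look like against the Hudson security API of the era, as something you might run from inside Hudson (a small plugin or the script console). It assumes a security realm is already configured, and the user and group IDs (“releng”, “authenticated”) are made up for the example.

// Hypothetical sketch: grant read access broadly, but reserve the build permission
// for the release engineering account via Hudson's matrix-based security.
// User/group names are invented; assumes security is already enabled.
import hudson.model.Hudson;
import hudson.model.Item;
import hudson.security.GlobalMatrixAuthorizationStrategy;

public class RestrictManualBuilds {
    public static void lockDownBuildButton() throws Exception {
        Hudson hudson = Hudson.getInstance();
        GlobalMatrixAuthorizationStrategy acl = new GlobalMatrixAuthorizationStrategy();

        acl.add(Hudson.READ, "authenticated");  // everyone logged in can browse
        acl.add(Item.READ, "authenticated");    // ...and see jobs and results
        acl.add(Item.BUILD, "releng");          // only release engineering keeps the build button
        acl.add(Hudson.ADMINISTER, "releng");

        hudson.setAuthorizationStrategy(acl);
        hudson.save();
    }
}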

(image care of swimparallel)

Sticking plaster over a gaping wound

Another thing that I learned in 2008: the build manager can’t be responsible for ensuring the quality of the code that they build and deploy. We had QAs, but there were loads of little things to check: Were the release instructions accurate? Was there a rollback process? Did all the scripts actually work?

I did have some success in getting the obvious issues caught with a test. There were ongoing issues with file encodings until I had someone write a validator that could be run from a unit test framework. The same approach also managed to catch issues like incompatible database scripts.
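The post doesn’t show the validator itself, but here’s a minimal sketch of the idea, assuming a Java shop, JUnit as the unit test framework, UTF-8 as the expected encoding and src as the directory to scan. All of those are assumptions for the example, not details from the original.

// Rough sketch of an encoding validator run from a unit test framework.
// The source directory and the expected charset are assumptions for the example.
import static org.junit.Assert.fail;

import java.io.File;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

import org.junit.Test;

public class FileEncodingTest {

    private static final Charset EXPECTED = Charset.forName("UTF-8");

    @Test
    public void sourceFilesAreValidUtf8() throws Exception {
        List<String> offenders = new ArrayList<String>();
        collectOffenders(new File("src"), offenders);
        if (!offenders.isEmpty()) {
            fail("Files that are not valid " + EXPECTED + ": " + offenders);
        }
    }

    private void collectOffenders(File dir, List<String> offenders) throws Exception {
        File[] children = dir.listFiles();
        if (children == null) {
            return;
        }
        for (File child : children) {
            if (child.isDirectory()) {
                collectOffenders(child, offenders);
            } else if (child.getName().endsWith(".java")) {
                // A strict decoder throws instead of silently replacing bad bytes.
                CharsetDecoder decoder = EXPECTED.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT);
                try {
                    decoder.decode(ByteBuffer.wrap(Files.readAllBytes(child.toPath())));
                } catch (CharacterCodingException e) {
                    offenders.add(child.getPath());
                }
            }
        }
    }
}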

All in all though, there was too much to do. We managed to get the checklist passed up to the developers and release managers. We won that battle. I lost the war.
(image taken from jluster’s photostream)

Your version control system is not a file system

If you find yourself needing to check binary files into your Version Control System, something isn’t right. Your VCS is optimised for tracking changes to source files. When you have multiple revisions of a source file, the VCS has stored the original file and the changes between revisions. This is good.

When you check in a binary, it doesn’t really do that. Most systems just keep a separate copy of the binary for each revision. So if you store 10 revisions of a 100 megabyte file, you can kiss a gigabyte goodbye. You might argue that disks are cheap. Unfortunately, the cost of storage isn’t the issue: it’s the downtime to upgrade the server, the admin overhead, and the risk of moving all of your data to a new disk. Sure, you can do it.

Or you could stop using your VCS as the most expensive file system in your organisation.

(image from D. Meutia’s photostream)

Update: I wrote this in response to a contractor putting a 325 MB file into my previous employer’s Perforce repository. I should qualify some of the statements in the post – for example, there’s every reason to put small binary files in as part of your app. I think most people choose to check binary dependencies into their projects rather than take the Maven/Ivy route.

NAnt vs Ant: locations (NAnt Rant)

(Image taken from Nesster’s photostream)

I think it was 2002 when Dan North took me under his wing and showed me the location attribute of Ant. That was then. Now, I’m doing a lot of .NET build engineering. And I’m dying for this feature. Here’s an Ant build to demonstrate:

<project default="properties">
  <target name="properties">
    <mkdir dir="build" />
    <property name="value" value="build" />
    <property name="location" location="build" />
    <echo>
      here's the one with a value ${value}
      here's the one with a location ${location}
    </echo>
    <touch file="${value}/value.txt" />
    <touch file="${location}/location.txt" />
  </target>
</project>

Both the properties that are set represent a directory. Each has a relative path. What happens when you run it?

Buildfile: /Users/jsimpson/Documents/workspace/playpen/code/props-build.xml

properties:
[mkdir] Created dir: /Users/jsimpson/Documents/workspace/playpen/code/build
[echo]
[echo] here’s the one with a value build
[echo] here’s the one with a location /Users/jsimpson/Documents/workspace/playpen/code/build
[echo]
[touch] Creating /Users/jsimpson/Documents/workspace/playpen/code/build/value.txt
[touch] Creating /Users/jsimpson/Documents/workspace/playpen/code/build/location.txt

BUILD SUCCESSFUL
Total time: 1 second


Amazing. The property set with the location attribute does the right thing and works out its fully qualified path. It also deals with any platform-specific path separator issues and presents you with the appropriate path. You may not think that this matters; the touch task worked for the property that used a value attribute, right? Yes, but only because that task will do the work. If you have a task or external command that doesn’t do the right thing, it’ll break.

NAnt doesn’t have this. Here’s the equivalent build, which has to make do with a value:

<project default="properties">
  <target name="properties">
    <mkdir dir="build" />
    <property name="value" value="build" />
    <echo>
      here's the one with a value ${value}
    </echo>
    <touch file="${value}/value.txt" />
  </target>
</project>

Giving us:

Buildfile: file:///Users/jsimpson/Documents/workspace/playpen/code/props.build
Target framework: Mono 2.0 Profile
Base Directory: /Users/jsimpson/Documents/workspace/playpen/code.
Target(s) specified: properties

Build sequence for target `properties’ is properties
Complete build sequence is properties

properties:

[echo]
[echo] here’s the one with a value build
[echo]
[touch] Touching file ‘/Users/jsimpson/Documents/workspace/playpen/code/build/value.txt’ with ’05/29/2008 21:04:25′.

BUILD SUCCEEDED

It gives the same result, but only because some tasks will work out that they’ve been given a relative path and compensate. Other things won’t.

The astute reader might guess that I let myself get bitten by this again today. Maybe one day I’ll remember this, but right now I’d cheerfully give up NAnt’s functions and switch back to Ant in a heartbeat. Please, someone tell me that I’m wrong and that there’s a patch somewhere.
