Category Archives: Tools

Skewer – a tool for provisioning cloud nodes with Puppet

Puppet is amazing.  It changed my career (thanks to Luke, and before him Mark).  However, I have some itches.  I have attempted to write about these before, but haven’t felt like pushing the ‘publish’ button.

I’ve been running Puppet in an atypical way for some time now.

  • No Puppet Master
  • No distribution packaging
  • No commit until I know something works
  • Only test from the outside

The only thing I feel I need to expand on is the last: testing.  Obviously if you write Ruby code, you should rspec the hell out of it.  But should you test Puppet code?  It’s mostly a declarative language.  If you’re properly declaring the outcomes that you want, then it can be easy.  If too much logic creeps in, you’re doing it wrong or you should write a function or type – and you should rspec the hell out of that.  This approach has served me for years with declarative build tools*.

I have no desire to go verify that Puppet does what I tell it.  But I do care about the outcome.  Also, I need to know that it runs on the target platform, as I use a MacBook.

So I wrote Skewer.   Skewer’s only job is to:

  • Provision cloud machines (or connect to existing ones)
  • Bootstrap Puppet (via shell scripts and rubygems)
  • Run Puppet
  • Optionally run Cucumber features at the end

That scratches my itch.  Skewer probably won’t scratch your itch if you run lots of nodes.  It works on Ubuntu, though adding support for other operating systems wouldn’t be too hard.  You may also like the Puppet Cloud Provisioner.

Skewer evolved from a Rakefile that I used to test my Puppet code.  I set out to rewrite it over the Christmas period, and got the last feature passing on Friday.  As with my other open source project, I learned a lot while doing it.  Skewer has some wrinkles, but I use it in my day job, and I’ve managed to keep that so far.

* Okay, I actually do a little bit more.  I use Rake to run puppet parser validate on every .pp file in my project, and I use puppet-lint to catch howlers.
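
A minimal sketch of what such a Rakefile could look like (this is not the Rakefile behind Skewer). It assumes the `puppet` and `puppet-lint` executables are on the PATH; the task and helper names are made up for illustration.

```ruby
require "rake"

# Command to syntax-check one manifest (assumes `puppet` is on the PATH).
def validate_command(manifest)
  ["puppet", "parser", "validate", manifest]
end

# Command to lint one manifest (assumes the puppet-lint gem is installed).
def lint_command(manifest)
  ["puppet-lint", manifest]
end

desc "Syntax-check every .pp file in the project"
task :validate do
  FileList["**/*.pp"].each { |manifest| sh(*validate_command(manifest)) }
end

desc "Lint every .pp file in the project"
task :lint do
  FileList["**/*.pp"].each { |manifest| sh(*lint_command(manifest)) }
end

# Hypothetical default: run both checks.
task default: [:validate, :lint]
```

Because `sh` raises when a command exits non-zero, the first broken manifest stops the run.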


Simon Stewart on WebDriver’s build system

WebDriver creator Simon Stewart knows a thing or two about building code. So I was intrigued when he mentioned that he’d written a grammar for Rake, to enable building Java code.

Replacing Ant with Rake has been a compelling idea for some years now. Until now I wasn’t convinced that you weren’t going to have the same issues as Ant – poorly factored builds that rapidly evolve into a project specific DSL. This may change things.

The build system, or grammar as Simon calls it, allows you to break a typical monolithic build file down into a collection of fragments. Each fragment can have one or more targets declared, and each target has some attributes. More at CrazyFunBuild.
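
As Simon explains further down, the rake targets are generated from the path to the build file and the name of the target in that file. A rough sketch of that idea follows; it is not the CrazyFunBuild implementation. The `//dir:target` naming scheme, the `build.desc` file name, and both helper functions are assumptions made for illustration.

```ruby
require "rake"

# Hypothetical naming scheme: qualify a target with the directory of its
# build fragment, so two fragments can both declare a "test" target.
def qualified_name(fragment_path, target)
  "//#{File.dirname(fragment_path)}:#{target}"
end

# Define one rake task per target declared in a fragment.
# `targets` maps a target name to its attributes; only :deps is used here.
def define_fragment_targets(fragment_path, targets)
  targets.map do |name, attrs|
    full_name = qualified_name(fragment_path, name)
    Rake::Task.define_task(full_name => attrs.fetch(:deps, []))
    full_name
  end
end

# Two fragments declaring the same short name no longer collide.
define_fragment_targets("java/client/build.desc", "test" => {})
define_fragment_targets("dotnet/build.desc", "test" => {})
```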

Simon is undergoing an exceptionally drawn-out email interview on the process:

Your build tool is one of a few new players. What was your motivation for adding to the build gene pool? Were you scratching an itch, or do you have a broader motive?

Definitely scratching an itch. WebDriver started off as a simple java
project, but it quickly became obvious that it’d also be useful to
have language bindings for things like C#, ruby and python. I could
have settled on a separate build tool for each language, but there are
places where a Java component depends on a DLL (for example). Switching
build tools repeatedly when constructing a single logical unit seemed
wasteful, so I started looking around for a build tool that would
provide support for all the languages I wanted to use.

I failed, but settled on rake because it had poor support for everything 🙂

The next problem was that as the project grew, so did the Rakefile. It
ended up being obscenely long and increasingly fragile, and in the end
I was about the only person who would confidently hack around in
there. An obviously sub-optimal state of affairs. The first step in
fixing this was to break out common tasks into functions (because a
Rakefile is just a ruby script in disguise). This still left a pretty
large build file to deal with, so the next stage was to allow us to
break the script into pieces. The obvious issue is that if you do
this, where are paths relative to? The location of the top-level
Rakefile? Or the fragment of code in the subdirectory? Worse, it’d be
unwise to have duplicate task names (“test”) but detecting those while
writing a fragment of a build file would be troublesome at best.

At the same time, I like my builds to be as declarative as possible,
only breaking through the “fourth wall” to scripting when necessary.
Encouraging people to leave lots of little scripts that are the pieces
of a larger application as build files seemed like the worst way of
achieving that goal of “declarativeness”. So, I wrote a parser for a
sub-set of ruby (which mutated into a subset of python) using ragel
that parses build files and generates rake targets based on the path
to the build file and the name of the target in that file. It’s by no
means an original idea: the only thing I can take even a crumb of
credit for is the current implementation (and it’s pretty much
designed to work with selenium, so there are lots of corners cut in

By clearly defining the build grammar, there’s also a chance to
clearly define how paths are interpreted, so that neatly side-steps
that problem. I also provided an “escape hatch” so that you can call
out to other rake tasks as required. Better this is just a thin skin
around other build tools (the java parts delegate to ant controlled
programmatically, and the .net pieces use visual studio) but it means
that anyone can read the build files and understand how the source
code, regardless of language, is transformed into a binary.

So, yeah, scratching the itch of “I want a single, declarative build
tool that allows someone not familiar with the other build tools used
to understand how the system works, and which can work with multiple
languages”. Right now, it’s specific to the project, and I’m
comfortable with that: I want to write a browser automation framework,
not a build grammar or (worse) a build tool. 🙂

To be continued


Inform or Accommodate?

Should you stop the build when someone breaks your formatting rules? Should you detect and fix them? There were two comments on the previous post: Oliver agreed, and Will didn’t:

Instead of raising errors when things like whitespace or tabs occur, why not just modify the file to correct it?

After some reflection I decided that you shouldn’t clean up on behalf of others. At best, you take away their feedback loop, at worst you compound the error. They need to know that something is wrong, so they have a chance to improve. But you can make tools to help them clean up their mess:

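desc "Turn crap into gold"
task :midas do
  Dir["public/**/*.js"].each do |f|
    next if f.match(/^lib|resources/)
    # Assumed intent: replace each tab with two spaces (Ruby expands \t to a literal tab).
    sh "sed -i '' 's/\t/  /g' #{f}"
    # Strip trailing spaces from every line.
    sh "sed -i '' 's/ *$//' #{f}"
  end
end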

Supporting Multiple Environments – Part 3

(parts one and two)
In this installment, I’m going to cover the configuration storage mechanism for this separate configuration jar approach.

Configuration Storage

With the approach listed in step two, it’s easiest to manage the actual values via property files, stacked up in a layered approach.  Here’s what we came up with:

Deployment environment overlay procedure

In our deploy scripts (Ant-based, remember), the layers are loaded in the order below. Because Ant properties are immutable, the first layer to set a property wins, so the most specific layers come first:

  • deploy environment application on a specific machine – Is there a unique application ID needed for clustering?
  • deploy environment application specific – What are <appId>’s memory requirements for this deployment environment?
  • deploy environment machine specific – Are there any specifications that need to be made for a given machine in a given environment?
  • deploy environment specific – What is the database server associated with this environment?
  • application – How many connections to your db server should this application be defaulted to?
  • defaults-deploy-env – non-dev – Where is bash located?
  • general – What is the port you use to connect to a database server?
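
The first-value-wins layering can be sketched in a few lines of Ruby. This is an illustration of the idea only, not our Ant scripts; the layer contents and property names are invented.

```ruby
# Resolve properties from layers listed most specific first.
# As with immutable Ant properties, the first layer to define a key wins
# and later (more general) layers cannot override it.
def resolve_properties(layers)
  layers.each_with_object({}) do |layer, resolved|
    layer.each do |key, value|
      resolved[key] = value unless resolved.key?(key)
    end
  end
end

# Hypothetical layers and values.
environment = { "db.host" => "qa-db.example.com" }
application = { "db.pool.size" => "10" }
general     = { "db.port" => "5432", "db.pool.size" => "5" }

resolved = resolve_properties([environment, application, general])
puts resolved["db.pool.size"] # the application layer's value, not the general default
```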

In development, this process is reversed.  In our custom-coded (but VERY simple) Maven plugin, we have the following order of precedence:

  • command line – Want to quickly tweak a setting for a single build?
  • settings.xml – Are you testing a new version of the app server software?
  • pom properties – What compiler configuration are you using?
  • defaults – dev – Where is your app server software installed?
  • general – (see above)

There are slight variations on which configuration layers exist and should take precedence over another (the reader’s situation may call for more or fewer layers in general).  All layers aside from the general and the defaults levels are optional on both fronts.  By no means is this an exhaustive list or the explicit ordering of configuration.  It is just what worked best for us.

In the deployment environment example above, each deployment environment has its own subdirectory, which allowed us to assign unique privileges per stack to prevent accidental configuration of various QA, staging and production environments.  The content of this project is regenerated with each build of the configuration project.  There are some inherent compromises with this approach (a change to a QA environment re-bundles an unchanged production configuration, for example), but build cycles are extremely short and painless and each one is given a fully fledged build number in the mainline (project branches are assigned a non-unique snapshot identifier).

Each deployable unit may be shipped to production with different configuration project versions, with a complete history of the changes included in each revision (made available via Hudson).  For security reasons, none of the sensitive staging and production passwords are maintained by development or release engineering.  The configuration values for those things are stubbed out in the respective property files and an operations managed process properly injects the correct passwords into the proper configuration files.


In the final installment, I’m going to cover how to share this configuration between developers and your deployment environments.


dbdeploy.net – Agile Database deployment for Java and .NET

(This post was originally hosted at

DbDeploy is an implementation of the ActiveRecord Migrations pattern. DbDeploy.NET is the .NET port of DbDeploy. Both DbDeploys are projects initiated by ThoughtWorks. ActiveRecord comes to us via DHH.

Why would I use it?

When you’re developing software that hasn’t been released, the database is easy: you can tear it down and rebuild it at will. Once you have production data that people are using, what do you do? How do you manage the change? The Migrations pattern allows you to make bite-sized changes to your database, and test them. It works very well with Continuous Integration.
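
The core of the pattern is working out which numbered change scripts a given database has not seen yet. A minimal sketch, assuming scripts are named with a leading number and that the numbers already applied are recorded somewhere (a changelog table, for example); the function and file names are hypothetical and this is not DbDeploy’s code.

```ruby
# Return the change scripts that still need to run, in numeric order.
# `scripts` are file names with a leading number; `applied` holds the
# numbers already recorded against this database.
def pending_scripts(scripts, applied)
  scripts
    .map { |name| [Integer(name[/\A\d+/], 10), name] } # base 10, so "010" is ten
    .reject { |number, _| applied.include?(number) }
    .sort_by { |number, _| number }
    .map { |_, name| name }
end

# Hypothetical example: script 1 has already been applied.
scripts = ["002_add_email.sql", "001_create_users.sql", "010_add_index.sql"]
puts pending_scripts(scripts, [1])
```

Running the same calculation against a fresh integration database and against production is what lets both converge on the same schema.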

What else is out there?


Open source:

When should I use this pattern?

It’s ideal for greenfield agile projects where you are using Continuous Integration and want to make sure that changes to the database schema will be applied to integration tests. You can use other approaches if you have an ORM and you haven’t released to production yet.

When shouldn’t I use this pattern?

  • When you have a huge legacy database
  • When you’re trying to put data into a database and not schema changes
  • When you don’t use source control

The Migrations pattern is a really helpful way to manage database change; it’s not a silver bullet though. You need to have discipline and a good test regime. It works well with Continuous Integration.

Update: Gregg Jensen got in touch with a new URL for DbDeploy.Net

SparkBuild – build optimisation

This is a guest post from Scott Castle of Electric Cloud. I’ve been wanting to get these guys on the blog for a while.

Scott has 5 USB drives full of Electric Cloud software, videos, docs and more to give away. All you need to do is tweet the #sparkbuild hashtag, and 5 lucky people will be chosen at random to get one sent out.

Take it away, Scott!

I hate doing laundry (which is ironic because I love clean clothes). My problem isn’t with the bleach smell, or the crap-quality washing machines at the laundromat, or even having to fold everything after; my problem is that it’s not a mindless task. I’ve got to:

  • sort all the clothes into washer-size batches – bright colors, darks, whites, hot and cold water fabrics, delicates
  • count the quarters on hand,
  • decide which batches will be dried all the way and which only need a half-cycle,
  • make a plan about how many loads and in which order to maximize throughput and minimize quarter use…
  • you get the picture.

This is not an automatic task. It takes a lot of brain power to plan the logistics of it all. But, hello? It’s laundry. As a programmer, this is not how I want to spend my time! I’d like to be able to dump all the clothes into the washer, and get clean ones out a little later, and not have to think about this ever again.

I also hate manually compiling code and I take as many shortcuts as I can get away with. Nobody does full builds unless they’re in the release group, but I don’t do full incrementals either (going to the top-level of the code base and typing ‘make all’); in the time it takes to parse every makefile and build everyone’s changes, I could have gotten in a load of towels, at least. So I (and, I’m betting, you too, if you’re a programmer) go to every directory where I know I have a prerequisite and ‘make all’ there, then build my own changes. This is much faster than waiting for a full incremental, but it makes me think of laundry, all that sorting and planning and fluffing and folding…

A colleague of mine has the same frustration and, being a better coder than me, wrote a solution. It turns out that if you collect a little data when a full build is run, you can use that data as a map to calculate something he calls a ‘subbuild’ – the critical path of prerequisites needed for the target I want to build, and where to find the rule to make each one of them.

I know what you’re thinking: that’s just an incremental! If I had a single make instance, that would be true (and I’d have to parse and evaluate the whole makefile, every time I ran make) but I’m working on a code base which uses recursive make, so I can’t just go to the top and say ‘make mycomponent.exe’. The subbuild technique makes a recursive make structure operate as if it was a single make instance, and that is great because now I don’t have to decide, each and every time, which components to build before I compile my own code.
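
To make the idea concrete, here is a rough sketch of computing a subbuild from a recorded map. It is not SparkBuild’s implementation: the map format, the target names, and the `subbuild` function are all assumptions for illustration.

```ruby
# Return [directory, target] steps in dependency order for `target`.
# `map` records, for each target, the directory whose makefile builds it
# and its prerequisites. Each prerequisite is visited once, depth first,
# and emitted before the targets that depend on it.
def subbuild(map, target, seen = {}, steps = [])
  return steps if seen[target]
  seen[target] = true
  entry = map.fetch(target)
  entry[:prereqs].each { |prereq| subbuild(map, prereq, seen, steps) }
  steps << [entry[:dir], target]
end

# Hypothetical map, as might be collected during one full build.
build_map = {
  "app.exe"   => { dir: "app",      prereqs: ["libnet.a", "libutil.a"] },
  "libnet.a"  => { dir: "lib/net",  prereqs: ["libutil.a"] },
  "libutil.a" => { dir: "lib/util", prereqs: [] },
  "tools.exe" => { dir: "tools",    prereqs: ["libutil.a"] }
}

# Print the make invocations needed for app.exe; tools.exe is never touched.
subbuild(build_map, "app.exe").each do |dir, target|
  puts "make -C #{dir} #{target}"
end
```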

My colleagues have coded this technique up into a tool that works with GNU Make (3.80 and 3.81) and NMAKE (7 and 8), and we’ve released it as a free tool; you can try it yourself at And if you’re interested in more technical information about make, subbuilds, and dependency trees, check out this post.

Now, if only we could write something to do my laundry…

Image thanks to AlexJReid. Disclaimer: I’m getting no kickbacks for this.
