Category Archives: Build

LondonCI, 17 July @ BCS

James Betteley is presenting Continuous Delivery using Maven. There’s a missing subtitle (apologies to Douglas Adams): without losing your job, looking very silly, or both. More info at the BCS website, and the meetup group. You need to register with the BCS to gain access to the building.


Who’s been fetching from my repo?

It’s interesting to know who is using your artifact repository: where are they, and who are they? ‘whois’ and your favourite DNS client are handy tools to have.

It’s why I always put an HTTP server in front of these tools, both for flexibility and for the standard access log format.



Simon Stewart on WebDriver’s build system

WebDriver creator Simon Stewart knows a thing or two about building code. So I was intrigued when he mentioned that he’d written a grammar for Rake, to enable building Java code.

Replacing Ant with Rake has been a compelling idea for some years now. Until now, though, I wasn’t convinced that you’d avoid the same issues as Ant: poorly factored builds that rapidly evolve into a project-specific DSL. This may change things.

The build system, or grammar as Simon calls it, allows you to break a typical monolithic build file down into a collection of fragments. Each fragment can declare one or more targets, and each target has some attributes. More at CrazyFunBuild.
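To give a flavour (this is my own invented fragment, approximating the examples on the CrazyFunBuild page, not code from the project): each fragment declares named targets with attributes, in a Python-like syntax.

java_library(name = "example-lib",
  srcs = [ "src/java/**/*.java" ],
  deps = [ ":another-target" ])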

Simon is undergoing an exceptionally drawn-out email interview on the process:

Your build tool is one of a few new players. What was your motivation for adding to the build gene pool? Were you scratching an itch, or do you have a broader motive?

Definitely scratching an itch. WebDriver started off as a simple Java project, but it quickly became obvious that it’d also be useful to have language bindings for things like C#, Ruby and Python. I could have settled on a separate build tool for each language, but there are places where a Java component depends on a DLL (for example). Switching build tools repeatedly when constructing a single logical unit seemed wasteful, so I started looking around for a build tool that would provide support for all the languages I wanted to use.

I failed, but settled on Rake because it had poor support for everything 🙂

The next problem was that as the project grew, so did the Rakefile. It ended up being obscenely long and increasingly fragile, and in the end I was about the only person who would confidently hack around in there. An obviously sub-optimal state of affairs. The first step in fixing this was to break out common tasks into functions (because a Rakefile is just a Ruby script in disguise). This still left a pretty large build file to deal with, so the next stage was to allow us to break the script into pieces. The obvious issue is that if you do this, where are paths relative to? The location of the top-level Rakefile? Or the fragment of code in the subdirectory? Worse, it’d be unwise to have duplicate task names (“test”), but detecting those while writing a fragment of a build file would be troublesome at best.

At the same time, I like my builds to be as declarative as possible, only breaking through the “fourth wall” to scripting when necessary. Encouraging people to leave lots of little scripts that are the pieces of a larger application as build files seemed like the worst way of achieving that goal of “declarativeness”. So, I wrote a parser for a subset of Ruby (which mutated into a subset of Python) using Ragel that parses build files and generates Rake targets based on the path to the build file and the name of the target in that file. It’s by no means an original idea: the only thing I can take even a crumb of credit for is the current implementation (and it’s pretty much designed to work with Selenium, so there are lots of corners cut in there).

By clearly defining the build grammar, there’s also a chance to clearly define how paths are interpreted, which neatly side-steps that problem. I also provided an “escape hatch” so that you can call out to other Rake tasks as required. Better, this is just a thin skin around other build tools (the Java parts delegate to Ant, controlled programmatically, and the .NET pieces use Visual Studio), but it means that anyone can read the build files and understand how the source code, regardless of language, is transformed into a binary.

So, yeah, scratching the itch of “I want a single, declarative build tool that allows someone not familiar with the other build tools used to understand how the system works, and which can work with multiple languages”. Right now, it’s specific to the project, and I’m comfortable with that: I want to write a browser automation framework, not a build grammar or (worse) a build tool. 🙂

To be continued


Zero tolerance

This has been surprisingly useful at keeping my JS code clean.


desc "Check for tabs and trailing spaces"
task :crapcheck do
  Dir["public/**/*.js"].each do |f|
    next if f.match(/^lib|resources/)
    text = File.read(f)
    raise "Tabs found in #{f}" if text.match(/t/)
    raise "Trailing spaces found in #{f}" if text.match(/ $|    $/)
  end
end

The tab check is useful because I hate them, and they mess with JSLint; the trailing spaces mess up my diffs. Take that.


The Build AssemblyLifePipeLoom

Dave Farley, co-author of Continuous Delivery (I got my copy last month – more on that in another post), commented on a blog post about the origins of the term build pipeline. He might well do, as it was his idea to make the pipeline concept central to the book.

Being a bit of a geek, I called it a pipeline not because it is like a real pipeline, transporting a fluid, but because it reminded me of instruction pipelining in CPUs. Effectively, the deployment pipeline (aka build pipeline) works, as a process, by doing branch prediction. We assume that later stages will pass, and so can move on to work on new stuff as soon as the commit stage has succeeded. If subsequent, post-commit, pipeline stages pass, we have won our gamble and so have made progress faster; if they break, we have a pipeline stall and have to go back and fix the failure.

UrbanCode came up with the idea of Build Lifecycle.

Jason Yip once suggested that the model for Continuous Integration has always been the Toyoda Loom.

I prefer the term Build Assembly Line. Think of engines plopping out from one part of the factory, being inspected and sometimes kicked off the line.  Eventually, you might use an engine, or install it in another assembly and use that.

It doesn’t matter which you choose, though: what’s important is having a metaphor for the thing you’re building. Otherwise you’ll build something weird.


Should you move to Maven 2?

I’ve seen many a company try to migrate from Ant to Maven with varied success.  There is a change in mindset that has to come about when making the transition.  Here are some of the highlights.

Standard Findings

Monolithic build structure

This is the first big shift in thinking. Typical (obviously, not ALL) Ant projects work by syncing some massive amount of code from source control, cd-ing into some top-level directory, and then telling Ant to build just the module you plan on testing. What’s so bad about this, you may ask? Well, for one, you’re likely syncing large amounts of code that you’ll never run locally. You’re probably also building up (and unit testing, right?) packages that don’t change, or whose rate of change is very low. Ideally, these packages and libraries would be built for the user already.

Now look at this through, say, a webdev’s eyes. If your webdev group is responsible for things like CSS, HTML or JSP changes, why should they be concerned about building up your oodles-of-utils package? Or, if a unit test starts failing on them (you’re unit testing, right?), why should they have to dive in and figure out what’s missing or broken? In a perfect world, any tier of development could be substituted for another (how great would it be if everyone knew everything?). In the real world, the one with interruptions and families and deadlines, that’s unrealistic (especially in larger companies).

So decommissioning a monolith a few modules at a time is the best thing to do, once you’ve decided to go the Maven route. There are two ways of doing this work. You can take the atomic, all-or-nothing approach, going directly from a monolith to a more modular code base in one fell swoop; if you can get sign-off on this, then it is a wonderful thing. But I’ve had to restrain both myself and others from biting off too much. What I like to do is pull a few things out at once, maybe three to four modules. Of those four, let three be low-cycling libraries and one a high-cycling library. This way, people learn the new locations of the libraries that are combined to make your deployable unit. Think of it as an evolutionary process rather than a revolutionary one.

Smaller, bite-sized chunks are also a better way to get to know Maven. If you introduce people to a massive monolith with customizations all over the place and a dozen attached assemblies, they are going to poke at it with a stick and quickly learn to hate it. Clearly seeing how a web application goes together, and how the resulting artifact is created, is much more digestible, and you’ll get fewer complaints about your Maven implementation.

The need to build everything, always (as it is all always changing)

Another fear that emerges as people start considering modularization is: with multiple deployable units, how does anyone know what is compatible with other internal code when the process isn’t always building the same thing all the time? That can be answered a few ways, but the simplest answer is that once a library is released (or otherwise frozen) for a deployable unit, that deployable unit need not upgrade its version of the library. If shared functionality in the library changes, then you will have to retest, but that raises the question: is your application code in the right module? Shouldn’t a shared module stay somewhat generic, with each deployable unit extending/implementing those features, instead of baking that logic in at such a low level? I’ve found that over time, if you make the library a separate module from the larger deployable-unit builds, the code starts migrating in the correct direction rather than wherever it’s easiest to add it (no more massive search/replaces in the code base via an IDE).
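As a sketch (the coordinates are invented for illustration), a deployable unit simply pins a released version of the shared library in its POM, and upgrades it deliberately:

<dependencies>
    <dependency>
        <groupId>com.example</groupId>
        <artifactId>oodles-of-utils</artifactId>
        <!-- a released, immutable version - not a SNAPSHOT -->
        <version>1.2</version>
    </dependency>
</dependencies>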

Confusion with regard to building artifacts

Once everything is pulled out, there may be confusion on the developers’ part as to which modules should be built in which order. This, to me, is an educational thing (see the parent POM sketch after this list). At any point, the developer can run “mvn dependency:tree” and see:

– What dependencies make up their project
– Where those dependencies were resolved from
– What order they need to be built in
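For the modules that do still build together, a multi-module parent makes the order explicit; Maven’s reactor derives the build order from the inter-module dependencies. A minimal sketch, with invented module names:

<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example</groupId>
    <artifactId>parent</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>pom</packaging>
    <!-- the reactor orders these by their dependencies, not by listing order -->
    <modules>
        <module>oodles-of-utils</module>
        <module>webapp</module>
    </modules>
</project>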

When moving from a world where people operate at a very high-level directory and build everything, to a world where every module is very lightweight and each move is a tactical one, people often don’t know how to get that app server up or that daemon running locally. With every application as its own standalone build, people just need to sync what they want to run and rely on a repository manager for the rest (app server bits, database bits, etc.).

Shortcuts to repository management

A repository manager is part of the Maven 2 process, end of story. Trying to use a corporate file share, or keeping everyone working in offline mode, is just not the Maven way. Using a repository manager also helps minimize the configuration people have to manage locally in their settings.xml, and helps enforce the Maven way of life (banning redeploys, pushing releases to one repository and snapshots to another, not deleting artifacts, etc.). With one of the big three (Nexus, Archiva, Artifactory), you simply have a grouped repository everyone points at. That “grouped” repository is a representation of all the other repositories your company will use. This way, you can have something like this:

<snip>
 <mirrors>
     <mirror>
         <id>nexus-test</id>
         <mirrorOf>*</mirrorOf>
         <url>http://server/nexus/url</url>
     </mirror>
 </mirrors>
 <profiles>
     <profile>
         <id>nexus-test</id>
         <repositories>
             <repository>
                 <id>central</id>
                 <url>http://central</url>
                 <releases><enabled>true</enabled></releases>
                 <snapshots><enabled>true</enabled></snapshots>
             </repository>
         </repositories>
         <pluginRepositories>
             <pluginRepository>
                 <id>central</id>
                 <url>http://central</url>
                 <releases><enabled>true</enabled></releases>
                 <snapshots><enabled>true</enabled></snapshots>
             </pluginRepository>
         </pluginRepositories>
     </profile>
 </profiles>
 <!-- without this, the profile above never takes effect -->
 <activeProfiles>
     <activeProfile>nexus-test</activeProfile>
 </activeProfiles>
</snip>

And that’s it. This one setting covers every remote repository we use, from Codehaus to Repo1. If you’re really ambitious (although Sonatype doesn’t recommend it), you can even tidy up the URL, so that if you switch repository managers, devs don’t need to touch their settings.xml file. While this configuration can be rolled into the MAVEN_HOME/conf/settings.xml file, I personally like to keep my configuration Maven-version-independent by putting it in the settings.xml in my user home directory.

Custom things done in Ant that (it is thought) can’t be done in M2

Everyone has one little dark corner of their build world. Usually it was some quick hack to make things work through Ant: possibly a custom task, maybe some shell-out, or something crazier. These little dark corners should have light shed on them; in fact, flood them with light. Instead of letting this become a choking point, start by looking at the common repositories for plug-ins that do what you’re looking for. There are very few problems someone else hasn’t already solved, and even if you searched a while back, a solution may exist now that didn’t then. In the past, I’ve done exhaustive searches and found no plugin that suited my needs, only to find a few months later that some plug-in had changed to do exactly what I was looking for, or that someone had written one and contributed it back to google/codehaus/repo1.

If that route fails for you, just build a Maven 2 plug-in and deploy it to your local repository. You can even have a transition period where Maven calls Ant to do just this little bit, then move the Ant tasks inside of Maven 2, then finally migrate to a Maven 2 plug-in. Don’t use the argument that “you should be writing code, not a Maven 2 plug-in”. Do you want your system to be robust, and clear about its successes and failures? Then write the plug-in. You can start quickly by typing “mvn archetype:generate”, then selecting “maven-archetype-mojo” (option 12 as of this writing).
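For that transition period, the maven-antrun-plugin will run a fragment of Ant inline; a minimal sketch (the phase and contents are illustrative, and the echo stands in for whatever your dark corner actually does):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-antrun-plugin</artifactId>
    <executions>
        <execution>
            <phase>process-resources</phase>
            <goals>
                <goal>run</goal>
            </goals>
            <configuration>
                <tasks>
                    <!-- your legacy Ant steps go here -->
                    <echo message="still shelling out, for now"/>
                </tasks>
            </configuration>
        </execution>
    </executions>
</plugin>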

Other things to consider when moving to Maven

CI compatibility – some are designed around Maven

The original CruiseControl’s Maven integration was very poor (I’d say the open-source version is still pretty bad). It doesn’t understand the different life-cycles, or the output from each. Hudson understands the life-cycles, and will inherently do things depending on what it sees in the build output. So far in my travels and exploration of various CI servers/tools, Hudson is head and shoulders above the rest with regard to Maven integration. Have site output you’d like to share? Hudson can publish that quickly, with a link off your project’s page. Have artifacts you’d like made available to another downstream job (or later process)? Hudson picks up on those artifacts and tucks them away (maybe/maybe not to your liking). All other products need to have these various things called out: you need to tell them, “look here for this tar.gz file”, rather than their knowing based upon what Maven has logged.

Lack of understanding of how to upgrade (from one version of Maven to another)

Here’s another big disconnect: you can’t just fling a new version of Maven down like you could with Ant. With Ant, you could generally look at the release notes, install, add your custom tasks, and then build. With Maven 2 you’re protected from a lot of things for the most part, but you also need to watch for plugin versions, core changes to dependency resolution, and so on. I personally sleep better installing locally and building, then diffing against the artifacts that are generated by the build server. Some changes to Maven (2.0.5 to 2.0.6, for example) required users to review their dependencies.
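One way to make an upgrade deliberate rather than accidental is to pin the expected Maven version with the maven-enforcer-plugin; a sketch (the version value is illustrative):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-enforcer-plugin</artifactId>
    <executions>
        <execution>
            <id>enforce-maven-version</id>
            <goals>
                <goal>enforce</goal>
            </goals>
            <configuration>
                <rules>
                    <!-- fail fast when someone builds with an unvetted Maven -->
                    <requireMavenVersion>
                        <version>[2.0.9]</version>
                    </requireMavenVersion>
                </rules>
            </configuration>
        </execution>
    </executions>
</plugin>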

Should you move to Maven 2?

Well, that question is best answered by you, dear reader. If you can modularize your codebase, then you’ll see the biggest improvements, both in development time (better throughput) and in stability: no more broken unit tests that, when fixed, reveal more broken unit tests, ultimately convincing all the developers to turn them off. If you can’t (or don’t see any benefit), I’d submit that your development team isn’t mature enough to realize the many benefits of a highly modular codebase. In the end, you have to choose what gets product out the door.


You Could Totally Use a Dependency Manager

Look who just got an article on CMCrossroads – me. There’s a reasonable amount of crossover in our audiences, so I’m glad to have this done. Thanks to Bob and Jonathan for getting this out there.

You Could Totally Use a Dependency Manager.

Separation of concerns in Ant

There’s nothing wrong with Ant. No, really! True, there are some nasty Ant files out there. Perhaps that’s because we often treat our build as a second-class citizen. How do you keep your build files from becoming bloated and hard to maintain? Break ’em up!

I’m going to use a classic problem to illustrate this: deployment. Have you seen an Ant build that seemed to know far too much about the innards of the container? First you end up with a lot of properties. Then you need to maintain loads of odd targets to try and make it all work together. We can do better.

Step 1: Break it out. You want a totally separate Ant buildfile that you can use for deployment. Acceptance criteria: you can kick off a deploy by calling the separate file with a few properties (like where the deployable is, etc.).

Step 2: Import it. Use Ant’s import task at the top level of your buildfile. Never inside a target!

Step 3: Prefix. A colon is a legal character in an Ant property or target name, so make the prefix match the name of the project. Each distinct buildfile should have the name attribute of its project element set to a meaningful name. Use that.

Step 4: Maintain discipline. It doesn’t matter how you do this. Cold showers, if you like. Just make sure that you keep the properties in the right place with the right name.

Here’s an example:

<project name="base" default="deploy">
	<property name="container" value="tomcat" description="The Java container that we use" />
    <import file="${container}.xml" />
</project>

Note that there’s no deploy target in the file. That resides elsewhere. Running the default target will kick off a deploy to Tomcat from …

<project name="tomcat">
	
  	<property name="container:hostname" value="some.great.hostname" />
	<property name="container:remote.user" value="deploy" />
	<property name="tomcat:admin_url" value="http://${container:hostname}/admin" />
	
	<target name="tomcat:deploy" description="This throws a war file at tomcat">
		<echo message="gory details of tomcat deploy go here"/>
	</target>
	<target name="deploy" depends="tomcat:deploy" />
	

</project>


… here. Note that there are properties with a nice generic prefix. Keep those generic, because …

<project name="jboss">
	
  	<property name="container:hostname" value="some.great.other.hostname" />
	<property name="container:remote.user" value="fleury" />
	<property name="jboss:some.jboss.property" value="Paula is brilliant" />
	
	<target name="jboss:deploy" description="This throws a war file at jboss">
		<echo message="jboss deploy goodness here"/>
	</target>
	<target name="deploy" depends="jboss:deploy" />
	

</project>

… all you need to do is pass a different container property to have it deploy elsewhere. What I love about this is that the two implementations cannot exist side by side: only one can be imported, and the property namespace isn’t polluted.
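And since a property set on the Ant command line wins over one set in the buildfile, switching containers from the default is a one-liner (assuming the jboss.xml above sits beside the main buildfile):

ant -Dcontainer=jboss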


Supporting Multiple Environments – Part 4

In the final installment, I’m going to talk about how to share configuration between developer-level environments on through to clustered or “stack” type environments.

Recycling Configuration

OK, so now your configuration is its own standalone Maven module, with its own series of branches and its own build process. But what if there are items shared between the generic configuration for development boxes and your deployment environments? This is where my little Maven plugin comes in. With the configuration stored in property files, built and treated as a very simple library, and deployed to your repository manager, you can then put a dependency on this configuration jar.

My simple plugin reads in the development configuration, ordered as outlined above, and jams the resulting property set into the Maven project’s property set. I bound the configuration plugin to the validate phase, which allows the process-resources phase to leverage these settings.
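A sketch of the binding only (the plugin coordinates and goal name are invented here, since this was an in-house plugin):

<plugin>
    <!-- hypothetical in-house configuration plugin -->
    <groupId>com.example.build</groupId>
    <artifactId>config-maven-plugin</artifactId>
    <executions>
        <execution>
            <!-- validate runs before process-resources, so resource filtering sees the properties -->
            <phase>validate</phase>
            <goals>
                <goal>load-properties</goal>
            </goals>
        </execution>
    </executions>
</plugin>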

One thing I haven’t had a chance to implement is the actual recycling of the code that does the layering. Ideally, the plugin that does the layering would live close to (or be part of) the actual configuration project, so that when a layer is added or removed, you could address it in a single location rather than updating N deploy scripts.

Caveats

There are a few problems this overall concept brings with it.  Here are a few of the main ones I’m aware of (that haven’t already been addressed):

– Confusion over the overlay procedure – Of course, you’ll have to explain how the various layers of configuration get compressed into a single flat configuration. The plugin as written allows the user to print out what the final (local) configuration will look like. The deploy process similarly generates a flattened property file that is processed in isolation, giving the people who deploy an idea of the flattened landscape. A well-annotated set of source files, as well as detailed “mvn site” output, is also strongly suggested (you are publishing site documentation, aren’t you? Hudson makes it really easy, so just do it!).

– Who’s allowed to touch which layer – I mentioned this above, but pushing each deployment-specific bundle of property files into its own directory within the configuration project allows your SCM admins (in Perforce, at least) to restrict access to the various bundles by group. For example, we block QA from touching any configuration outside their own stacks, but give them read-only access to everything else.

– Preventing duplication and unnecessary overrides – A few times during testing, we noticed a property set at a low level (in the general bucket) and again at a deeper level: same property, same value. Ideally, we’d have some unit testing to run through various combinations of properties and verify that this doesn’t happen. Also, in a perfect world, there would be a report, tied to generating the config jar, showing how many times a particular value comes up across the various stacks. This would highlight things such as a port number that was initially introduced while testing a particular deployment environment and, rather than being pushed higher in the config tree, was simply replicated.

– Migrating from settings.xml or profiles to property-based configuration – This was one of the larger challenges: how do you take everything that has been established and move it to an entirely new way of thinking and construction? The solution for us was a one-off Maven plugin that could look at profiles (and combinations of profiles) and generate the various files listed above.

If you’ve followed me this far, you’ll see that very little configuration remains the further out you get in the tree (if you find you need all the layers at all). You can also imagine breaking the configuration into smaller, more specific chunks.

This may also give you a chance to revisit what deserves a property and what doesn’t (there’s a tendency to make everything under the sun configurable).

Hopefully this gets your gears turning and helps you find an efficient solution for managing the configuration for your product.

Supporting Multiple Environments – Part 3


(part one and two)
In this installment, I’m going to cover the configuration storage mechanism for this separate configuration jar approach.

Configuration Storage

With the approach described in part two, it’s easiest to manage the actual values via property files, stacked up in a layered approach. Here’s what we came up with:

Deployment environment overlay procedure

In our deploy scripts (Ant-based, remember), the first definition wins, because properties are immutable; the most specific files are therefore loaded first (see the sketch after this list):

deploy environment application on a specific machine – Is there a unique application ID needed for clustering?
deploy environment application specific – What are <appId>’s memory requirements for this deployment environment?
deploy environment machine specific – Are there any specifications that need to be made for a given machine in a given environment?
deploy environment specific – What is the database server associated with this environment?
application – How many connections to your db server should this application default to?
defaults-deploy-env (non-dev) – Where is bash located?
general – What is the port you use to connect to a database server?
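A sketch of how the layering plays out (file names and values invented): the environment-specific file is loaded first, and since properties are immutable, its value wins.

# qa-stack-01/environment.properties - loaded first, so this value wins
db.server=qa-db-01.example.com

# general.properties - loaded last; its db.server is ignored in qa-stack-01
db.server=localhost
db.port=5432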

In development, this process is reversed. In our custom-coded (but VERY simple) Maven plugin, we have the following order of precedence:

command line – Want to quickly tweak a setting for a single build?
settings.xml – Are you testing a new version of the app server software?
pom properties – What compiler configuration are you using?
defaults-dev – Where is your app server software installed?
general – (see above)

There are slight variations on which configuration layers exist and which should take precedence over another (the reader’s situation may call for more or fewer layers). All layers aside from the general and defaults levels are optional on both fronts. By no means is this an exhaustive list or the definitive ordering of configuration; it’s just what worked best for us.

In the deployment environment example above, each deployment environment has its own subdirectory, which allowed us to assign unique privileges per stack to prevent accidental configuration of the various QA, staging and production environments. The content of this project is regenerated with each build of the configuration project. There are some inherent compromises with this approach (a change to a QA environment re-bundles an unchanged production configuration, for example), but build cycles are extremely short and painless, and each one is given a fully-fledged build number in the mainline (project branches are assigned a non-unique snapshot identifier).

Each deployable unit may be shipped to production with a different configuration project version, with a complete history of the changes included in each revision (made available via Hudson). For security reasons, none of the sensitive staging and production passwords are maintained by development or release engineering. The configuration values for those are stubbed out in the respective property files, and an operations-managed process injects the correct passwords into the proper configuration files.

Continued…

In the final installment, I’m going to cover how to share this configuration between developers and your deployment environments.
