I thought I was pretty good at guitar once. You only realise how much you have left to learn when you see someone who’s at the next level – and then you realise that they can kick your ass without breaking a sweat. Realising that you’re not as good as you thought you were can have two effects: you double your efforts, or you give up in disgust.
I started live-tweeting Kohsuke Kawaguchi’s Hudson webinar once the Java client ground into life on my MacBook, but I switched to an editor when I realised how much depth there was to the talk. The rest of this post is a fairly linear set of notes on it. I’ll be adding links (and maybe a correction or two). Please do comment or tweet if you have anything to add or fix.
18:07 – The webinar client has sprung into life. Kohsuke is already talking about patterns for Hudson. Lesson: don’t tie the service to the machine’s hostname; give it its own DNS name. Also, use port 80 for the service. Apache makes a good frontend with mod_proxy.
Sun’s Hudson server has 600GB of data. You need to back up at least some of Hudson’s data files. There’s a Hudson backup plugin. Kohsuke uses cpio. That’s kicking it old-school. He also recommends ZFS snapshots.
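If you roll your own backup rather than reaching for cpio or the plugin, a minimal sketch might look like this. It assumes the usual HUDSON_HOME layout, with global and plugin configuration files (config.xml and friends) at the top level and each job’s configuration in jobs/<name>/config.xml, and it deliberately skips build records and workspaces, which are the bulky part. The function names are hypothetical, and Python’s tarfile stands in for cpio.

```python
import tarfile
from pathlib import Path
from typing import List


def config_files(hudson_home: Path) -> List[Path]:
    """Pick out the small, precious files: global/plugin configs and job configs."""
    files = sorted(hudson_home.glob("*.xml"))               # top-level configs (assumed layout)
    files += sorted(hudson_home.glob("jobs/*/config.xml"))  # one config.xml per job
    return files


def backup(hudson_home: Path, archive: Path) -> int:
    """Write the selected files to a gzipped tarball and return how many were added."""
    files = config_files(hudson_home)
    with tarfile.open(archive, "w:gz") as tar:
        for f in files:
            # Store paths relative to HUDSON_HOME so the archive restores anywhere.
            tar.add(f, arcname=str(f.relative_to(hudson_home)))
    return len(files)
```

Run from cron, something like this covers your job definitions; the build history still needs something heavier, such as the ZFS snapshots.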
Hudson slave nodes need a single 170KB jar file to run. The slave talks to the master over a single link, with no assumptions about the transport. VMs or recycled hardware can serve as slaves. Do be careful about how you organise Hudson clusters, and be prepared to make more clusters if other people have different requirements.
Kohsuke suggested using the native OS packages that the Hudson project supplies. Fine idea; I do that. Also, use a filesystem that grows well. Again, they use ZFS.
Automating operating system deployment is perhaps the best way to manage your Hudson slave machines. Kohsuke recommends Windows Deployment Services for Windows slaves. For pretty much any other OS he’s a big fan of the Hudson PXE plugin. PXE is a standard for booting a computer over the network; Sun hardware has always supported it out of the box. I used to love booting and installing an operating system from the prompt.
Your other option is to clone virtual machines, but this is far more painful: how do you version a few gigabytes of operating system? You may be able to get away with building systems by hand, but I’m not fond of the error rate.
Hudson can install a lot of what it needs on the slave machines by itself. On Unix systems with a Java Runtime Environment and an SSH daemon, Hudson can install the slave for you; you just need the hostname. On Windows, you supply the administrator username and password and Hudson installs the slave via DCOM. That even works from a Hudson master running on Unix.
What if the Hudson master server can’t see the slave machines (ever had an over-zealous firewall administrator)? You can log into the slave machines and use Java Web Start to have the slave talk back to the master. You’re screwed if the slaves can’t see the master.
After you’ve launched a Hudson slave via JNLP, you can install it as a Windows service. On Unix, the option is to run it as a headless Java process.
Hudson will also maintain the JDK, Ant, and Maven for you; you just declare the versions that you need. I hope the sheer enormity of this isn’t lost on readers of this post. That saves so much monkeying around. You can also tell it how to run an installer to get other dependencies. I’d probably tell Hudson how to kick off a Puppet run. Kohsuke also suggested:
- cfengine (the tool that inspired Puppet)
- Ruby devs, you might like to use Chef. I still can’t recommend it in production, however.
- Windows peeps, you can use Active Directory and the open source WPKG (which needs a reboot)
The downside to Puppet and cfengine is Windows, and we all need to talk to Windows systems. Hopefully Mr. Nasrat’s work on Puppet for Windows will pay dividends in that respect. Kohsuke recommends Cygwin for deploying apps on Windows.
What should the goals of your Hudson cluster be?
- Make the slaves pawns: he used the terms ‘interchangeable’ and ‘nameless’. I like ‘pawn’; it conjures up disposable build slaves for me. The benefits are load balancing within the cluster, fewer false positives, and easier lifecycle management.
- Depend on labels, not slaves: this improves utilisation. Put a group of slaves behind a label so that a job depends on many hosts, not one.
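A toy sketch of why labels help: a job asks for a label rather than a named machine, and the scheduler picks any online node carrying that label with a free executor. The Node class, its fields and pick_node are hypothetical and are not Hudson’s actual scheduler.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Node:
    name: str
    labels: Set[str]
    executors: int = 2
    busy: int = 0
    online: bool = True

    def free(self) -> int:
        return self.executors - self.busy


def pick_node(nodes: List[Node], label: str) -> Optional[Node]:
    """Return the online node with `label` and the most free executors, or None."""
    candidates = [n for n in nodes if n.online and label in n.labels and n.free() > 0]
    if not candidates:
        return None  # the job waits in the queue rather than failing
    # Most free executors wins; ties go to the alphabetically first name.
    return min(candidates, key=lambda n: (-n.free(), n.name))
```

A job tied to the label `linux` keeps running as long as any `linux` node is healthy; a job tied to one named machine stops the moment that machine goes away.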
Hudson is going to put me out of a job. There was a fascinating section on reliability: Hudson will monitor slaves and take them offline if they start running out of disk space, if their clocks drift out of sync, and so on. I believe he said all these checks came from the field.
- Use NTP to keep the slaves in sync
- Keep /tmp clean. I use Puppet for this.
- Keep adequate records of changes to your cluster (goodness – old school systems admin again!)
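The monitoring idea boils down to a few threshold checks per node. Here is a minimal sketch, assuming the master already knows each slave’s free disk space and clock offset; the thresholds, class and function names are made up for illustration and are not Hudson’s own monitors or defaults.

```python
import shutil
from dataclasses import dataclass
from typing import Dict, List

# Illustrative thresholds only (assumptions, not Hudson defaults).
MIN_FREE_DISK_BYTES = 1024 ** 3   # 1 GiB
MAX_CLOCK_SKEW_SECONDS = 5.0


@dataclass
class NodeStatus:
    name: str
    free_disk_bytes: int
    clock_offset_seconds: float  # slave clock minus master clock


def local_status(name: str, path: str, clock_offset_seconds: float) -> NodeStatus:
    """Measure free space on the filesystem holding `path` on this machine."""
    return NodeStatus(name, shutil.disk_usage(path).free, clock_offset_seconds)


def problems(status: NodeStatus) -> List[str]:
    """Reasons, if any, why this node should be taken offline."""
    reasons = []
    if status.free_disk_bytes < MIN_FREE_DISK_BYTES:
        reasons.append("low disk space")
    if abs(status.clock_offset_seconds) > MAX_CLOCK_SKEW_SECONDS:
        reasons.append("clock out of sync")
    return reasons


def nodes_to_take_offline(statuses: List[NodeStatus]) -> Dict[str, List[str]]:
    """Map each unhealthy node's name to the reasons for taking it offline."""
    offline = {}
    for s in statuses:
        reasons = problems(s)
        if reasons:
            offline[s.name] = reasons
    return offline
```

Recording the reason alongside the decision matters: it is exactly the kind of change record the last bullet asks for.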
Hudson also cleans up the mess after builds. As I understand it, Hudson attempts to diff the process table before and after the build, and kills off any processes that are left running. Again, that’s a simple idea that adds a lot to your build process. Of course, you want your build to control its own dependencies anyway. If you must leave some process running (slow old enterprise software?), the wiki documents how to switch this off.
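The process-table diff, as I understood it, can be sketched naively like this. It assumes Linux (it reads PIDs from /proc), and the function names are hypothetical. A real implementation needs a more reliable way to tell which processes the build spawned, because a plain diff also catches unrelated processes that happened to start during the build.

```python
import os
import signal
from typing import Iterable, Set


def snapshot_pids() -> Set[int]:
    """Current process IDs, read from /proc (Linux-only assumption)."""
    return {int(entry) for entry in os.listdir("/proc") if entry.isdigit()}


def leftover_processes(before: Iterable[int], after: Iterable[int]) -> Set[int]:
    """Processes alive after the build that did not exist before it started."""
    return set(after) - set(before)


def kill_leftovers(pids: Set[int]) -> None:
    """Ask each stray process to terminate; ignore ones that have already exited."""
    for pid in sorted(pids):
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass
```

Usage is snapshot, build, snapshot, kill: `before = snapshot_pids()`, run the build, then `kill_leftovers(leftover_processes(before, snapshot_pids()))`.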
Hudson will report load on your cluster. Load tends to even out if you have enough capacity.
Upgrading Hudson is just a matter of replacing the WAR file. Do keep the old one, so that you can downgrade again. Hudson can also update itself, which I wouldn’t recommend if you use OS packages. These days you need to update your plugins yourself, which wasn’t always the case.
If a release has few subsequent bugfixes, it’s probably a good one.
The easiest way to chain builds is to have one job trigger another. You can trigger multiple test jobs in parallel after a build to get faster feedback.
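A toy model of that fan-out, assuming each job is just a callable that returns True on success. run_pipeline is a hypothetical name, and a thread pool stands in for Hudson’s executors.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_pipeline(build: Callable[[], bool],
                 tests: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run the build; if it passes, trigger all downstream test jobs in parallel."""
    if not build():
        return {}  # a failed build triggers nothing downstream
    with ThreadPoolExecutor(max_workers=len(tests) or 1) as pool:
        futures = {name: pool.submit(job) for name, job in tests.items()}
        return {name: f.result() for name, f in futures.items()}
```

Total feedback time becomes the build plus the slowest test job, rather than the build plus the sum of all of them.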
The build promotion plugin lets you pick out really good builds to promote downstream. It can promote automatically, or based on test results. Once promoted, a build can be used further along the development process: to start QA, deploy, integrate elsewhere, or push to Maven. For example, Sun uses the promotion plugin on the JAXB RI to push builds to CVS for delivery to the JAX-WS RI. So good to see these guys dining on dogfood.
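One plausible promotion criterion, sketched under assumptions: a build qualifies once every named downstream job has passed against it, and you promote the newest such build. The Build class, job names and helpers are hypothetical; this is not the plugin’s API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Build:
    number: int
    downstream_results: Dict[str, bool] = field(default_factory=dict)


def is_promotable(build: Build, required: List[str]) -> bool:
    """True once every required downstream job has passed against this build."""
    return all(build.downstream_results.get(job) is True for job in required)


def latest_promotable(builds: List[Build], required: List[str]) -> Optional[Build]:
    """Newest build that meets the criterion, or None if there isn't one yet."""
    for b in sorted(builds, key=lambda b: b.number, reverse=True):
        if is_promotable(b, required):
            return b
    return None
```

A build whose downstream jobs haven’t reported yet simply isn’t promotable, so the newest build doesn’t always win.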
Concurrent builds give developers faster feedback, and hence better isolation of changes. They’re good for flickering (intermittently failing) builds. Use a timeout to stop a bad build from exhausting all your capacity.
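The timeout advice, sketched in Python rather than Hudson configuration: run the build command with a time budget and treat an overrun as aborted. run_with_timeout and the status strings are hypothetical; subprocess.run kills the child process when the timeout expires.

```python
import subprocess
from typing import Sequence


def run_with_timeout(cmd: Sequence[str], timeout_seconds: float) -> str:
    """Run a build command; abort it if it exceeds the time budget."""
    try:
        subprocess.run(cmd, check=True, timeout=timeout_seconds)
        return "SUCCESS"
    except subprocess.TimeoutExpired:
        return "ABORTED"  # the child was killed, so the executor is free again
    except subprocess.CalledProcessError:
        return "FAILURE"
```

Note that only the direct child is killed; anything it spawned may survive, which is where the process cleanup described earlier comes in.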
Matrix projects let you build your project concurrently on different operating systems and middleware. Do use a ‘touchstone’ or ‘canary’ build first, so that one combination has to pass before the rest are built. Build the combinations sequentially if they need the same resources (e.g. databases). What if your matrix produces combinations that are effectively the same? You can filter out some combinations, or suggest a coverage ratio and let Hudson work it out. Nicely declarative.
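A sketch of the matrix ideas together: expand the axes, drop excluded combinations, optionally thin them to a coverage ratio, then build a touchstone first. The two-axis shape, the names and the seeded random sampling are my assumptions; Hudson’s own filtering and coverage logic may work quite differently.

```python
import itertools
import random
from typing import Callable, Dict, List, Tuple

Combo = Tuple[str, str]  # (operating system, database), an assumed pair of axes


def matrix(oses: List[str], dbs: List[str],
           exclude: Callable[[Combo], bool] = lambda c: False,
           coverage: float = 1.0, seed: int = 0) -> List[Combo]:
    """Expand the axes, drop excluded combinations, then sample to the coverage ratio."""
    combos = [c for c in itertools.product(oses, dbs) if not exclude(c)]
    keep = max(1, round(len(combos) * coverage)) if combos else 0
    # A seeded sample, so the same combinations are chosen on every run.
    return sorted(random.Random(seed).sample(combos, keep))


def run_matrix(combos: List[Combo], touchstone: Combo,
               build: Callable[[Combo], bool]) -> Dict[Combo, bool]:
    """Build the touchstone first; only if it passes, build the remaining combinations."""
    results = {touchstone: build(touchstone)}
    if not results[touchstone]:
        return results  # fail fast: don't burn capacity on a broken change
    for combo in combos:
        if combo != touchstone:
            results[combo] = build(combo)  # sequential, e.g. when combos share a database
    return results
```

Declaring an exclusion or a ratio, rather than listing combinations by hand, keeps the matrix manageable as axes grow.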
18:53 – Just lost the audio. Kohsuke’s conclusions:
- Planning helps on an enterprise scale Hudson deployment.
- Throwing lots of resources at it helps a LOT.
18:55 – The audio is back and we’re in Q&A. I chose this moment to dash for my train.
My conclusions: Hudson offers a huge amount of value for a tiny price, and not just in terms of the number of features; that gets boring after a while. It’s the thoughtfulness of the features, and the attempts to address the whole lifecycle, that have me excited about Hudson tonight. Most other Continuous Integration tools that I have used exhibit a clear bias towards the development end of things. I haven’t seen a tool that reaches so far towards the operational side. Putting the stack together on the build machines has typically been left to the likes of me; now there’s a tool that does that.
Don’t tell him, but I was pretty impressed with Kohsuke’s knowledge of systems, as well. I’m sure a lot of these features came from the community, but he has a pretty rounded view. Maybe it’s the Sun Microsystems influence – say what you like about them, but they know good engineering. In any case: we owe this man, the open source team, and Sun Microsystems a lot for pushing the envelope on Continuous Integration.
If I were a Continuous Integration tools vendor, I’d be feeling nervous about getting my guitar out.
Update: did you miss the webinar? Have a look at it here.
Image via Andre