Continuous Integration in the cloud: good idea?

Continuous Integration can be tricky to provision. It’s IO- or CPU-bound at the beginning, and then it has a tendency to batter your database for a long time while staying almost idle. Slava Imeshev of Viewtier kindly commented on my outsourcing continuous integration post:

My take on this is that hosted CI in a common virtualized environment such as EC2 won’t work. A CI or build server, unlike most applications, needs all four components of a build box: CPU, RAM, disk and network I/O. The industry wisdom says that applications which are subject to virtualization may demand at most two. Sure, you can run a build in EC2, but you will have to sacrifice build speed, and that’s usually the last thing you want to do. If you want fast builds, you have to run in the opposite direction, towards a dedicated, big fat box hosted locally.

Viewtier has been hosting Continuous Integration for open source projects for five years, and our experience shows that even builds on a dedicated build box begin to slow down once the number of long-running builds exceeds double the number of CPUs. In fact, we observe a trend towards farms of build machines hosted locally.

Hmm. He makes an interesting point. It seems we might need to do more than throw EC2 agents at our favourite Continuous Integration servers. The great appeal of cloud Continuous Integration is that there’s no limit to the amount of resource we can buy; the assumption is that you’ll actually be able to make use of it. I wonder what patterns will emerge to deal with that. Will we fire off many builds that compile and run unit tests (proper unit tests, without database calls) on EC2, and then queue them elsewhere for slow functional test runs?
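A rough sketch of that routing (the stage names and dispatcher are hypothetical illustrations, not any particular CI server’s API): fast, self-contained stages fan out to elastic cloud agents, while slow functional suites queue for dedicated hardware.

```python
# Hypothetical sketch: route CI jobs by stage, so that cheap, fast feedback
# (compile + unit tests) fans out to elastic cloud agents, while slow
# functional suites queue FIFO for a smaller dedicated pool.
from collections import deque

FAST_STAGES = {"compile", "unit"}      # safe to fan out, one cloud agent per job
SLOW_STAGES = {"functional", "perf"}   # queued for the big local box

def route(jobs):
    cloud, dedicated = [], deque()
    for job in jobs:
        if job["stage"] in FAST_STAGES:
            cloud.append(job)          # fire immediately on an elastic agent
        else:
            dedicated.append(job)      # wait in line for dedicated hardware
    return cloud, list(dedicated)
```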

Ideally we’d get more effective at writing tests; my worry is that parallelizing test execution via the cloud could let us sweep test performance issues under the rug. We’ll just have to wait and see.

10 thoughts on “Continuous Integration in the cloud: good idea?”

  1. Jesse Gibbs says:

    Disclaimer: I work for Atlassian, a company that makes CI tools which support running in EC2.

    Cloud computing is a really cool technology that can change your development tool infrastructure, but it’s still subject to the laws of physics. In most cases, a build run locally is going to provide much faster feedback than a build run on remote infrastructure.

    One solution to consider that takes advantage of cloud computing while still providing fast feedback:
    * Run your commit-triggered builds on premises, with a subset of your tests (e.g. just unit tests, or a small set of functional ‘smoke tests’), so that you get fast feedback while making the most of limited on-premises compute resources.
    * Use cloud resources for longer running tests where latency is not as big an issue.

    If you have a suite of functional tests that takes several hours to run, then the extra time for setup and data transfer to the cloud has relatively low impact on the feedback loop. If you are running a test suite nightly, then so long as the results are available by the next morning when work starts, it doesn’t matter if the build took 6 hours instead of 5.

    Now that most CI tools support running remote builds in EC2, I’d be interested to hear if some best practice patterns are starting to emerge around what to run locally vs. in EC2.

  2. duality72 says:

    I’m not sure Slava makes enough of the difference between one “big fat box hosted locally” and the multitudes of build configurations with remote build agents you can host in the cloud. And how much will that big fat box cost you to host and maintain, idle or not, versus dynamically provisioning a fast EC2 instance? Obviously building in the cloud isn’t the answer for everything, but not every build situation calls for eye-bleeding speed.

  3. Tracy Ragan says:

    Disclaimer: I work for OpenMake Software and have been solving build problems for Global 2000 companies for 18 years.
    CI running on a Cloud Server can work, but let me explain. The 800lb Gorilla in the room here is what is meant by a “build”. In reality a CI Server does not actually execute the compile and link process of the build. It instead calls another process to do the build – like OpenMake Meister, Apache Maven/Ant or Make. The CI server is really just a job scheduler that is triggered on an event (a check-in). The CI Server does not do versioning, builds, testing, deployments or emails. It just orchestrates those activities and centralizes the logging.

    So can a CI Server run in the cloud? Absolutely. We test our free CI Server product (OpenMake Mojo) in exactly this way. Can a CI Server improve the speed of your compile and link process? No, it cannot, regardless of whether it is running locally or in a common virtualized environment. The build script you write or the build engine you use is responsible for the speed of the compile and link. The CI server cannot break apart your Maven or Make script and send it to different machines to compile, and Maven and most versions of Make cannot be multi-threaded. Your CI server cannot make your scripts do what they do not know how to do.

    Yes, unlike the other activities called by a CI process (check-out, static code analysis, JUnit), the compile and link process (the build) needs all four components of a build box: CPU, RAM, disk and network I/O. So if you are using a virtualized environment to store your source code, you can execute a compile and link on that virtualized machine. You must, of course, address the audit risk of source code being stored outside your “internal” environment. The biggest hit would come if you stored your source on a local machine and tried passing the source code to the virtualized machine. Ouch! The file transfer alone would send your build into the next decade.
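To put a rough number on that file-transfer hit (the workspace size and bandwidth below are illustrative assumptions, not figures from the comment), compare shipping the whole tree with shipping only a commit’s delta:

```python
# Back-of-the-envelope: seconds to transfer a workspace versus shipping
# only one commit's worth of changes. All sizes/bandwidths are assumptions.
def transfer_seconds(size_mb, mbps):
    # megabytes -> megabits, divided by link speed in megabits/second
    return size_mb * 8 / mbps

full_tree = transfer_seconds(size_mb=2000, mbps=10)  # 2 GB workspace: 1600 s
delta     = transfer_seconds(size_mb=2, mbps=10)     # 2 MB of changes: 1.6 s
```

Under those assumptions the full-tree push costs roughly half an hour per build before a single compiler runs, which is the point being made here.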

    And yes build times do matter. You do not want a 5 hour build if you are doing CI. The whole point is to build frequently, not just once in the middle of the night.

    So it’s time to point out the 800lb Gorilla in the room. The speed issue is not coming from the use of a CI Server or a virtualized environment. The speed of your build is directly related to the time it takes to execute the compile and link process. All other activities in a CI workflow can execute quite quickly; it’s the compile and link (build) that is the bottleneck. Using a virtualized environment to host your CI server can work as long as you do not try to pass source code across the pipe. Your CI server can run in the cloud and call on internal remote agents to execute tasks locally to each agent. Look for ways to speed up your compile and link process by using a build engine that can perform incremental builds (build avoidance) and true parallelization (multi-threaded calls to compilers). It is simply not the job of the CI Server to improve the speed of any activity it calls. It can only speed up the orchestration of the workflow; it cannot speed up a check-in/check-out, a build script or any particular test script. Those tools must carry that burden.
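The two levers named above – build avoidance and true parallelization – might be sketched, purely illustratively (the helper names are hypothetical, not any build engine’s API), as:

```python
# Illustrative sketch of a build engine's two speed levers:
#  1. build avoidance  - skip outputs that are newer than their sources
#  2. parallelization  - compile the remaining stale files concurrently
import os
from concurrent.futures import ThreadPoolExecutor

def is_stale(src, obj):
    # Recompile only if the object is missing or older than its source
    # (the same timestamp comparison make uses).
    return not os.path.exists(obj) or os.path.getmtime(obj) < os.path.getmtime(src)

def build(sources, compile_one, workers=4):
    stale = [(src, src + ".o") for src in sources if is_stale(src, src + ".o")]
    # Independent compiles run concurrently, like make -j.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda pair: compile_one(*pair), stale))
    return [src for src, _ in stale]   # report what was actually rebuilt
```

The second invocation on an unchanged tree rebuilds nothing, which is where the big incremental wins come from.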

  4. Krzysztof says:

    Disclaimer: I work for Bitbar, a company which delivers CI on the Cloud.

    Fast CI needs strong hardware, so, as someone already mentioned, EC2 will not work, as it is virtualized. Fortunately there are tens of other Cloud vendors with offers that better suit this purpose. With the right hardware on the Cloud, you don’t need to worry about build and test times any more. If necessary you can even make your process parallel, to make it even faster (this may of course require changes to your build/test scripts).

    The communication overhead with the Cloud is low, as you only supply the latest source code changes to the Cloud – which is not very much data. So in the end you get the build/test feedback very quickly after a commit, and you don’t have to invest in the infrastructure.

    If getting notifications and browsing test reports, logs and other build artifacts from the Cloud doesn’t satisfy you, and you just have to download huge amounts of half-products to your office, then you can use one of the caching solutions which will download the files to the right place before you discover the need for them. Actually I should say ‘download to the right places’, as currently many development teams are distributed.

  5. builddoctor says:

    Thanks Krzysztof. I agree that there are many ways to get around objections to CI in the cloud. The performance of a build is both situational and subjective. Situational in that it depends on the platform, build tool, and project. Subjective in that some people really, really want to know right now if the build passed, and some don’t even notice 🙂

  6. Tim says:

    I’ve heard of an 800lb gorilla as a metaphor for a dominant market supplier, and an elephant in the room as a metaphor for a big issue that people pretend to ignore, but never before an 800lb gorilla in the room. Did he scare away the elephant? (It must be a male gorilla, as females are much smaller.)

    Enough flippancy :-) Are you aware of any CI techniques that work for COTS packages? I’d love to put the Oracle or SAP business suites under CI so that I could replatform them more easily.

  7. builddoctor says:

    Tim, I’m terrible at mixing metaphors, so I won’t chip in there. I’m not aware of much action with COTS software. Clearly you can drive a shell script to do all this, but to my mind it’s the glacial pace of some products and the lack of tests that make CI challenging in that environment. Thanks for the comment.

  8. Tim says:

    Good point about speed, and they do have very limited test harnesses and testing. The deployment model for SAP could also be a challenge. I had in mind using Michael Feathers’ approach of wrapping tests around the legacy code, plus some simplistic record/replay testing at quite a high level of granularity.

    I know that it doesn’t sound very exciting, but there is a lot of money here: e.g. SAP estimate 95% hosting savings from moving to Cloud-based platforms, and, as things stand, customers are locked in to whatever the original implementation was based on (DBMS + h/w), with the big SIs charging $MM just to validate a new platform.

    I’ll see if they’ve got any plans themselves.

  9. builddoctor says:

    Tim, I'd be fascinated to see if anybody is doing this. Perhaps the speed issue isn't a concern given the potential cost savings 🙂

  10. For Oracle COTS CI experience, try SI Tieto from Finland/Sweden. Feel free to use LinkedIn for connections.

