Why Testing in Production isn’t as stupid as it sounds


This week one of my colleagues was consolidating the results from our recent Application Performance Management survey, and one interesting finding was that 40% of customers have at least one release cycle a month. Of those respondents, one third also experience a Severity-1 incident each month. That’s a compelling pair of statistics, and it might explain the continued frustration and conflict between development and operations teams. It’s also perhaps the reason this DevOps underground movement can no longer be ignored (even by Gartner). There is no doubt that development organizations have become agile, but does deploying change this frequently make the business more or less agile? If a third of the teams that release monthly also suffer a Severity-1 incident every month, agile development starts to look like a risk to the business. We’re at the point where Operations either has to start managing change better or simply restrict the amount of change that can occur.

So why are Sev 1 incidents so common? Based on my experience and customer interactions, I’d strongly argue that testing in development isn’t enough. At the very least, it’s certainly not an insurance policy for deploying an application in production. When a Formula 1 team designs a car in a wind tunnel and tests it on a simulator pre-season, they don’t assume that the performance they see in testing will mirror the results they see in a race. Yet that is pretty much what happens today in the application development lifecycle. Development teams build and test their apps in pre-production, hand them off to operations for deployment in production, and assume everything will work just fine. This is probably the worst assumption IT has made over the last decade, because development and production environments differ significantly in infrastructure, configuration, data volumes, and real user load. It’s also a lame excuse for any development team to use when a production issue occurs: “Well, it ran fine in test, so you must have deployed it wrong.” Yes, people make mistakes occasionally, but when Sev 1 incidents follow releases this often, deployment error is unlikely to be the only cause. If developers never get to see how their baby runs in production, they’ll never learn how to build robust, scalable, high-performance applications.

How many of you have heard this from a prospect or customer: “We did a release last night and we’re investigating issues this morning.” Sound familiar?

So how about this as a novel idea: test and monitor your applications in production where they actually run.

For example:

Midnight: Deploy your new application release.
12:15am: Start the load test.
1:15am: Stop the load test.
1:30am: Ops analyze the load test results (health check).
2:00am: Ops decide whether to keep the release or roll it back.
9:00am: Dev analyze the load test results.
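The 1:30am health check and the 2:00am go/no-go decision are easier to make consistently when the pass/fail thresholds are agreed before the release. Below is a minimal Python sketch of that decision, not tied to SOASTA, AppDynamics, or any other product: the names `LoadTestResult`, `Policy`, `health_check`, and `decide`, and the 1% error-rate and 800 ms 95th-percentile thresholds, are hypothetical choices for illustration. In practice the inputs would come from whatever your load-testing and monitoring tools export.

```python
from dataclasses import dataclass
from statistics import quantiles
from typing import List


@dataclass
class LoadTestResult:
    """Hypothetical summary of one overnight load-test run."""
    response_times_ms: List[float]  # sampled response times, in milliseconds
    errors: int                     # requests that failed
    requests: int                   # total requests sent


@dataclass
class Policy:
    """Hypothetical go/no-go thresholds agreed between Dev and Ops."""
    max_error_rate: float = 0.01  # at most 1% of requests may fail
    max_p95_ms: float = 800.0     # 95th-percentile response-time ceiling


def p95(samples: List[float]) -> float:
    # quantiles(n=20) returns 19 cut points; the last one estimates the 95th percentile.
    return quantiles(samples, n=20)[-1]


def health_check(result: LoadTestResult, policy: Policy) -> List[str]:
    """Return every policy violation found; an empty list means healthy."""
    # Too little data is treated as a failure rather than a pass;
    # quantiles() also needs at least two samples on older Python versions.
    if result.requests == 0 or len(result.response_times_ms) < 2:
        return ["not enough load-test data to judge the release"]
    violations = []
    error_rate = result.errors / result.requests
    if error_rate > policy.max_error_rate:
        violations.append(
            f"error rate {error_rate:.1%} exceeds {policy.max_error_rate:.1%}")
    latency = p95(result.response_times_ms)
    if latency > policy.max_p95_ms:
        violations.append(
            f"p95 latency {latency:.0f} ms exceeds {policy.max_p95_ms:.0f} ms")
    return violations


def decide(result: LoadTestResult, policy: Policy) -> str:
    """The 2:00am call: keep the release only if no policy was violated."""
    return "rollback" if health_check(result, policy) else "continue"


if __name__ == "__main__":
    overnight = LoadTestResult(response_times_ms=[220.0] * 95 + [1500.0] * 5,
                               errors=2, requests=100)
    print(decide(overnight, Policy()), health_check(overnight, Policy()))
```

Keeping the thresholds in one object that both teams sign off on turns the 2:00am call into a repeatable check rather than a judgment made half-asleep, and it is the piece you would hand to an automation tool if, as the comments below suggest, something else drives the test and the rollback.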

The point is this: why wait until 9am for your customers or users to log on and find out whether your production release was successful? Do you really want to deal with Sev 1 incidents as they occur throughout the day, or would you rather test production under load and find potential issues before your users wake up and are affected?

The reality is, you’re probably thinking: “This all sounds great, but setting up load tests and monitoring production is too complex, time-consuming, and scary.” Looking at the toolsets of the last decade, I’d tend to agree with you; in fact, many organizations struggle to load test and monitor their applications in development, let alone production. The good news is that new vendors like SOASTA and AppDynamics are breaking down this barrier, so it’s now entirely possible to load test and monitor your application in production in less than an hour. In fact, we recently delivered a joint webinar that showed how simple this can be, and we have joint customers doing exactly this today.

It’s true that testing and monitoring your production application won’t find every potential issue, but it will certainly cut the number of Sev 1 incidents that Dev and Ops teams typically deal with, not to mention the business impact on real end users. Yes, there are many considerations and concerns to deal with when you test and monitor in production, but those risks are far smaller than the risk of simply deploying your application in production, crossing your fingers, and hoping everything will be OK. People lose their jobs when production applications consistently go down during business hours. If I were an application owner, I’d mandate that all application releases be tested and monitored in production before peak hours. Investing an hour of testing at night is better than spending several hours during the day resolving production issues.

If you want to become agile in managing application performance, you can get started today by downloading SOASTA CloudTest Lite and AppDynamics Lite. Both are free products that show how easy it now is to test and monitor your applications in production. You can’t afford not to test and monitor production applications these days.

App Man.

  • http://www.bbramley.com Ben Bramley

    This sounds like a good candidate for automation – get AppDynamics to start and stop the load test for you and then decide by itself whether to automatically roll back based on policy violations during the test? :-)

    • Ron Gidron

      Throw Nolio into the mix and you can actually do all of this zero-touch (Nolio already does that driving, Ben :-).
      I do think, however, that there is a risk here: deployment activities are sometimes very complex, if not impossible, to roll back. And while production is the ultimate test, we need to stress that the method suggested here doesn’t replace pre-production testing; after all, you don’t want to break your production system beyond quick repair while testing on it. I love your idea, Steve; I’m just saying it should only be applied when your production rollouts are agile (i.e., short cycles, smaller change sets).

  • http://www.facebook.com/profile.php?id=753104135 Stephen Burton

    Definitely, Ben, although I think customers would initially want manual control over load tests in production. Automation is certainly the secret to becoming agile, and as Ron says, other solutions like Nolio can help roll back application releases if they introduce business impact.


Copyright © 2014 AppDynamics. All rights Reserved.