The benefits of adopting a DevOps approach are widely known. By unifying development and operations groups and emphasizing monitoring and automation, companies are increasing the speed and frequency of deployments and recovering faster from failures. The results at firms that successfully implement DevOps can be eye-opening. According to the 2017 State of DevOps Report produced by Puppet and DORA (DevOps Research and Assessment), high-performing DevOps organizations reported 46 times more frequent code deployments and 440 times faster lead times from commit to deploy. Results like these are inspiring more companies to adopt DevOps. A separate survey by Forrester Research found that 50% of companies implemented or expanded DevOps initiatives last year, and another 27% are planning to adopt DevOps by the fall of 2018.
Historically, however, the failure rate of DevOps initiatives has been high. In 2015, Ian Head, a research director at Gartner, famously predicted that “90% of I&O organizations attempting to use DevOps without specifically addressing their cultural foundations will fail.”
In this four-part blog series, I argue that continuous performance testing holds the key to unlocking organizational transformation and DevOps success, and I lay out a methodology for creating an effective performance engineering program. This adaptable methodology is one that I have developed over years of trial and error working as a performance engineering lead and architect. The approach I recommend supports not just CI/CD, traditionally understood as continuous integration/continuous delivery, but also leads directly to a culture of continuous improvement in process, borrowing from Lean-Agile and Six Sigma concepts.
A NASA mindset
To be good at DevOps, the ops side of the house needs to embrace the dev side's ways of working. Ops needs to get on the Agile train. And once ops is on board, everyone will discover the importance of striving for Lean-Agile. Lean-Agile is rooted in Lean manufacturing and focuses on eliminating waste. By reducing waste, you increase the speed at which you get things done. Good CI/CD and DevOps not only continuously improve code and code quality; they also enable the continuous improvement of the processes and automation servicing the Software Development Life Cycle.
Think about how much time systems in QA, UAT, and other lower environments sit idle. If your organization is like many, that idle time represents an immense amount of wasted compute that can be converted into productive time by implementing and automating continuous testing.
Simply decreasing idle time is not enough, however. Highly optimized processes for gathering metric data are vital if you are to be successful. To get really good metric data and telemetry, you need to approach performance testing the way NASA does.
On a NASA mission, the failure of even a small component can be catastrophic. Long before an initial launch, components are modeled with equations, then assembled and tested in increasingly complex composite systems, with enough rigor to ensure that all variables are known. By the time a system is ready for launch, engineers fully understand the environmental variables that lead to component failure and have optimized the processes for achieving the desired outcome.
In performance testing of software systems, variations or deviations in metrics during component tests must likewise be completely understood: each must have a positive or negative correlation coefficient with other metrics. If a metric deviates with a neutral coefficient, meaning the deviation is uncorrelated with any other variable or cannot be explained, you cannot predict its behavior. This is an all-too-common problem for DevOps teams in the modern software-defined world, where companies implement application performance monitoring without a well-defined strategy. While AI and ML promise to rescue us, it is still vital that teams understand why metrics deviate and strive to deeply understand the relationships between the variables that cause those deviations.
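To make this concrete, here is a minimal sketch in Python (with hypothetical sample data) of computing the Pearson correlation coefficient between a deviating metric and two candidate explanatory metrics. The metric names and values are assumptions for illustration only:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length metric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical samples captured during a component test:
latency_ms  = [110, 118, 125, 140, 160, 190]    # response latency
gc_pause_ms = [4, 5, 7, 11, 16, 24]             # garbage-collection pause time
cpu_steal   = [2.1, 1.9, 2.0, 2.2, 1.8, 2.0]    # noisy-neighbor metric

print(pearson(latency_ms, gc_pause_ms))  # strongly positive: deviation is explained
print(pearson(latency_ms, cpu_steal))    # weak/neutral: unexplained, investigate
```

A strong coefficient turns a mystery deviation into a known relationship; a neutral one flags a variable whose behavior you cannot yet predict.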
Organizations need to test mission-critical code with NASA-like rigor. You may be thinking that such meticulousness would lead to impossible bottlenecks. In fact, the opposite happens. By break-testing individual services continuously rather than trying to performance test the entire system, you build confidence in every component's tolerances and its impact on parent and child dependencies. Organizations eliminate waste and achieve “Leanness” by continuously running a multitude of small, repeatable, concurrent tests. Coupled with precision monitoring and pipeline automation that serve as the basis for amplified feedback loops, this approach supercharges the CI/CD pipeline, and DevOps is unchained.
Hobbled by old habits
During the days of monolithic applications, discretely testing individual components would have been hard, if not impossible. Today, I work with many organizations that are investing heavily in decomposing legacy apps into microservices, yet they still struggle to shift the mindset of their teams toward more effective component-level and scale-model testing.
Instead of testing individual services of an application with fast, repeatable performance tests and leveraging mocks for dependencies, many organizations run performance tests against the entire system as if it were still monolithic.
Performance teams spend an incredible amount of time setting up, configuring, and stressing an application at production-level load for 24, 36, or 72 hours to prove it’s stable, sometimes requiring developers and technical leads to bail out of the next sprint cycle to help out.
When these large-scale tests do break, it’s often hard to pinpoint the failure because they cannot be replicated consistently. Teams end up spending inordinate hours—days and sometimes weeks—troubleshooting issues and re-tuning parameters to keep an app from blowing up so they can release it to production.
Three steps to continuous testing
Three things need to happen for developers and operations engineers to break their old habits and achieve the impressive DevOps results mentioned earlier.
First, basic DevOps best practices should be in place. If QA and performance teams are still siloed, they should be reorganized and rolled up under operations. Operations team members should then be embedded with development teams. Ops engineers should take an active partnering role in amplifying the feedback loops to developers by writing stories in the backlog for performance issues and by participating in scrums and sprint retros. These ops engineers should become the automation experts at isolating, replicating, and describing the environmental variables causing issues and ensuring that test harnesses are continuously improving. In this way, the pipeline becomes more efficient, giving developers, tech leads, QA engineers, and product managers direct insight into what is happening in every stage leading up to production.
Second, if your tests are large, you need to start breaking them up. The goal is to componentize the tests and run as many tests as you can in a half hour to an hour. This should be done at the API layer so that different services are tested at the same time but independently of one another. Each test should have an underpinning goal and should provide an answer to a specific what-if scenario.
Third, you want to replace downstream services with mocks wherever possible. This allows you to more easily test what-if scenarios for dependent services without relying on them to be up or stable.
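For instance, with Python's `unittest.mock` you can stand in for a downstream dependency and exercise a what-if scenario without the real service being up. The `get_price` function, its fallback value, and the SKU below are hypothetical:

```python
from unittest import mock

# Hypothetical client code under test: fetches a price from a downstream
# pricing service, falling back to a cached value if the dependency fails.
def get_price(sku, fetch):
    try:
        price = fetch(sku, timeout=0.2)
    except TimeoutError:
        return ("cache", 9.99)
    return ("live", price)

# Mock the downstream call instead of deploying the real pricing service.
healthy = mock.Mock(return_value=10.49)
print(get_price("sku-123", healthy))    # ("live", 10.49)

# What-if scenario: the pricing service is down or slow?
degraded = mock.Mock(side_effect=TimeoutError)
print(get_price("sku-123", degraded))   # ("cache", 9.99)
```

Swapping the mock's behavior lets you probe failure modes on demand, rather than waiting for a shared environment to misbehave.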
As you continuously repeat the runs of smaller tests and begin to provide real-time feedback to developers, you should get to the point where you are able to form a hypothesis about how to improve your code and then quickly make the improvements. And, as you get into a more Lean-Agile state, you will be equipped to multiply the number of hypotheses that you are trying to derive answers for at any given time.
In today’s blog post, I’ve provided an overview of an approach to performance testing that enables effective DevOps, borrowing from Lean-Agile and Six Sigma. In my next blog, “The Importance of Application Decomposition in Performance Testing,” I’ll lay out the basis for how to properly instrument your applications so you can begin collecting high-quality metric data.
Colin Fallwell is part of AppDynamics Global Services team, which is dedicated to helping enterprises realize the value of business and application performance monitoring. AppDynamics’ Global Services’ consultants, architects, and project managers are experts in unlocking the cross-stack intelligence needed to improve business outcomes and increase organizational efficiency.