I wouldn’t do website load/performance testing any more without having an APM tool in place. Period. Full stop. End of story.
I’ve been involved in website load testing for over 10 years, as a “end-user” when I was web operations manager for an online job board, as a team leader for a company providing cloud load testing services, and as a consultant on web performance with my own company DevOpsGuys. The difference in the value you get from load/performance testing with and without APM tools is enormous.
We’ve probably all seen those testing reports that are full of graphs of response time versus req/sec, CPU utilisation curves, disk IO throughput, error rates ad nauseam. I, to my eternal shame, have even written them… and whilst they are useful for answering the (very simplistic) question of “how many simulated requests/users can my website support before it falls over?” generating any real application insight from what are essentially infrastructure metrics is difficult. This type of test report rarely results in any corrective actions other than (1) “lets throw more hardware at it” or (2) “let’s shout at the devs that they have to fix something because the application is slow”. Quite often the report gets circular filed because no-one knows how to derive application insight and hence generate meaningful corrective actions at either the code, application stack configuration or infrastructure level. All that effort & expense is wasted.
So how are things different when using APM tools (like my preferred tool, AppDynamics)? Here are my top 4 reasons:
1. See the Big Picture (Systems Thinking)
“Systems thinking is a framework for seeing interrelationships rather than things, for seeing patterns rather than static snapshots.” – Peter Senge, “The Fifth Discipline”.
The “first way of DevOps” is systems thinking, and APM tools reinforce the systems thinking perspective by helping you see the big picture very clearly. You can see the interrelationships between the web tier, application tier, database servers, message queues, external cloud services etc. in real time while you’re testing rather than being focussed on the metrics for each tier individually. You can instantly see where the bottlenecks in your application are in the example below the 4306ms calls to Navision stand out!
2. Drill Down to the Code Level
One of my favourite things when load testing with APM tools is being able to drill down to the stack trace level and identify the calls that are the most problematic. Suddenly, instead of talking about infrastructure metrics like CPU, RAM and Disk we are talking about application metrics — this business transaction (e.g. web page or API request) generates this flow across the application and 75% of the time is spent in this method call which makes 3 database calls and 2 web service calls and its this database call that’s slow and here’s the exact SQL statement that was executed. The difference in the response you get from the developer’s when you give them this level of detail compared to “your application is slow when we hit 200 users” is fantastic — now you are giving them real, pinpoint actionable intelligence on how their application responds under load.
3. Iterate Faster
“the application was made 56x faster during a 12hr testing window”
Because you can move quickly to the code level in real-time while you test and because this facilitates better communication with the development team your load testing suddenly becomes a lot more collaborative, even if the load testing is being performed by an external 3rd party.
We generally have all the relevant parties on a conference call or HipChat chat session while we test and we are constantly exchanging information, screenshots, links to APM snapshots and the developers are often able to code fixes there and then because we can rapidly pinpoint the pain points.
If you’ve got a customer with an Agile mindset and continuous delivery capability it can enable you to do rapid test and fix cycles during testing, often multiple times times in a day. In one notable example, the application was made 56x faster during a 12hr testing window due to 4 application releases during that period.
4. Stop the “Blame Game”
“make the enemy poor performance, not each other…”
Traditionally in the old school (pre-APM tools) days, load tests were often conducted by external load testing consultancies who would come in, do the testing, and then deliver some big report on how things went.
The customer would assemble their team together in a conference room to go through the report, which often triggered the “blame game” – Ops blaming Dev, Dev blaming QA, QA blaming Ops, Ops blaming the hosting provider, the hosting provider blaming the customer’s code and around and around it would go.
But with the right APM tools in place we’ve found this negative team dynamic can be avoided.
As mentioned earlier, testing tends to become more collaborative because it’s easier to share the performance data in real time via the APM tool, and discussions become more evidence-based. It’s more about “what are we going to do about this problem we can see here in the APM tool” and less about trying to allocate blame when no-one really knows where the problem actually resides and they don’t want to be left holding the can. The system-thinking, holistic view of the application’s performance promulgated by the APM tool makes performance the enemy, not each other. And that means that the performance issues are likely to be fixed faster, and not ignored due to politics and infighting.
There are probably loads more reasons you can come up with for why load testing with APM tools are awesome (and I’d love you hear your thoughts in the comments), but I will leave you with one more bonus reason – because it’s fun. For me, using AppDynamics when I’m doing load testing and performance tuning has really bought the fun factor back into the work. It’s fun to see the load being applied to the system and to see (via AppDynamics) the effect across the entire application. It’s fun to work closer with the Dev & Ops teams (dare I say, “DevOps”!) and to share meaningful, actionable insights on where the problems lie, and it’s fun be able to rapidly iterate and show the performance improvements in real-time.