We get a lot of questions about our analytics-driven Application Performance Management (APM) collection and analysis technology. Specifically, people want to know how we capture so much detailed information while maintaining such low overhead levels. The short answer is, our agents are intelligent and know when to capture every gory detail (the full call stack) and when to only collect the basics for every transaction. Using an analytics-driven approach, AppDynamics is able to provide the highest level of detail to solve performance issues during peak application traffic times.
AppDynamics, An Efficient Doctor
AppDynamics’ APM solution monitors, baselines and reports on the performance of every single transaction flowing through your application. However, unlike other APM solutions that got their start in development environments, ours was built for production, which requires a more agile approach to capturing transaction details.
I’d like to share with you a story which illustrates AppDynamics analytics-based methodology and compares it with many of our competitors’ “capture as much detail as possible whether there are problems or not” (aka, our agents are too old to have intelligence built in) approach.
You visit Dr. AppDynamics for your regular health checkups. She takes your vital signs, records weight, measures reflexes and compares every metric taken against known good baselines. When your statistics are close to the baselines the doctor sends you home and sees the next patient without delay. When your health vitals deviate too far from the pre-established baselines the smart doctor orders more relevant tests to diagnose your problem. This methodology minimizes the burden on the available resources and efficiently and effectively diagnoses any issues you have.
In contrast, you visit Dr. Legacy for your regular health checkups. She takes your vital signs, records weight, measures reflexes and immediately orders a battery of diagnostic tests even though you are perfectly healthy. She does this for every single patient she sees. The medical system is now overburdened with extra work that was not required in the first place. This burden is slowing down the entire system so in order to ensure things move faster Dr. Legacy decides to reduce the amount of diagnostics tests being run on every single patient (even the ones with actual problems). Now the patients who have legitimate problems are going undiagnosed in the waiting room during the time when they need the most attention. In addition, due to the large amount of diagnostics testing and data being generated, the cost of care is driven up needlessly and excessively.
Does Dr. Legacy’s methodology make any sense to you when better methods exist?
AppDynamics intelligent approach to collecting data and inducing diagnostics makes it easier to spot outliers and, because deep diagnostic data is provided for only the transactions that require this level of detail, there is less impact on system resources and very little monitoring overhead.
Monitoring 100% of Your Business Transactions All the Time
AppDynamics monitors every single business transaction (BT) that flows through your applications. There is no exception to this rule. We automatically learn and develop a dynamic baseline for end-to-end response time as well as the response time of every component along the transaction flow, and also for all critical business metrics within your application.
We score each transaction by comparing the actual response time to the self-learned baseline. When we determine that a BT has deviated too far from normal behavior (using a tunable algorithm), our agent knows to automatically collect full call stack details for your troubleshooting pleasure. This analytics-based methodology allows AppDynamics to detect and alert on problems right from the start so they can be fixed before they cause a major impact.
Of course, there are times when deep data capture of every transaction is advantageous—such as during development—and the AppDynamics APM solution has another intelligent feature to address this need. We’ve built a simple, one-click button to enable full data recording system-wide. Developer mode is ideal for pre-production environments when engineers are profiling and load testing the application. Developer mode will capture a transaction snapshot for every single request. In production this would be overkill and wasteful. It’s even smart enough to know when you’re done using it and will automatically shut off when it is unintentionally left on, so your system won’t get bogged down if transaction volume increases.
Who Looks at Production Call Stacks When There are No Problems?
One of the worst qualities about legacy APM solutions is the fact that they collect as much data as they can, all the time. Usually this originates from the APM tool starting as a profiling tool for developers that has been molded to work in production. While this methodology is fine for development environments (we support this with dev-mode as described above), it fails miserably in any high volume scenario like load testing and production. Why does it fail? I’m glad you asked 😉
Any halfway decent APM tool has built-in overhead limiters to keep themselves from causing harm and introducing too much overhead within a running application. When you are collecting as much deep dive data as possible with no intelligent way of focusing your data collection you are inducing the maximum allowed overhead basically all the time (assuming reasonable load). The problem is that as your application load gets higher, this is the time when your problems are most likely to surface, and this is the time when legacy APM overhead is skyrocketing (due to massive amounts of code execution and deep collection being “always on”) so the overhead limiters kick in and reduce the amount of data being collected or kill off data collection altogether. In plain English this means that legacy APM tools can’t tell good transactions from bad and will provide you with the least amount of data at the time you need the most data. Isn’t it funny how marketing and sales teams try to turn this methodology into the best thing ever?
I have personally used many different APM tools in production and I never needed to look at a full call stack when there was no problem. I was too busy getting my job accomplished to poke around in mostly meaningless data just for the fun of it.
Distributed Intelligence for Massive Scalability
All of the intelligent data collection mentioned above requires a very small amount of extra processing to determine when to go deep and what to save. This is a place where the implementation details really make a difference.
At AppDynamics, we put the smarts where they are best suited to be – at the agent level. It’s a simple paradigm shift that distributes the workload across your install base (where it’s not even noticed) rather than concentrating it a single point. This important architectural design makes it so that as the load on the application goes up, the load on the management server remains low.
Contrasting this with legacy APM solutions, restricting whatever intelligence you have to the central monitoring server(s) causes higher resource requirements and therefore a monitoring infrastructure that requires more servers and greater levels of care and feeding.
Collecting, transmitting, storing, and analyzing large amounts of unneeded data comes with a high total cost of ownership (TCO). It takes a lot of people, servers, and storage to properly manage those legacy APM tools in an enterprise environment. Most APM vendors even want to sell you their expensive full time consultancy services just to manage their complex solutions. Intelligent APM tools ease your burden instead of increasing it like the legacy APM tools do.
All software tools go through transition periods where improvements are made and generational gaps are recognized. What was once cutting edge becomes hopelessly outdated unless you invest heavily in modernization. Hopefully this detailed look at APM methodologies helps you cut through the giant pile of sales and marketing propaganda that develops and IT ops folks are constantly exposed to. It’s important to understand what software vendors really do, but it’s most important to understand how they do it as it will have a major impact on real life usage.