Have you ever tried to troubleshoot a production bottleneck or outage? Let me guess: the first place you looked was the log files, right? And in those log files lie all the answers to your problems? Err, not exactly. Log files are like haystacks; they take up lots of space, and it can take hours to find the precious needles that are causing you pain. Even with tools like “kerplunk,” your troubleshooting success is only as good as the data you can collect, manage and report. You can’t log everything because disk I/O and debug logging are expensive operations, which is why most log files today only contain basic information about what applications are doing in their various infrastructure silos.
For example, if you need to troubleshoot how a slow distributed business transaction executed across your infrastructure, it could take you hours to piece the jigsaw together, if you manage it at all. And if one piece of your jigsaw is missing, then the trail goes cold and you’re left scratching your head. Bottom line, managing application performance with log files is still a long, manual and tedious task with no guarantee of success. You can’t manage with facts if you don’t have all the facts in the first place.
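To make the jigsaw concrete, here is a minimal sketch of stitching one transaction together from per-tier log files. Everything in it is an assumption for illustration: the tier names, the log line format and the `txn=` correlation id are invented, and real logs frequently carry no shared id at all, which makes the job harder still.

```python
from collections import defaultdict

# Hypothetical log lines from three tiers. The format and the txn ids
# are invented for this example.
LOGS = {
    "web": [
        "12:00:01 txn=a1 GET /checkout 200 2350ms",
        "12:00:02 txn=b2 GET /search 200 120ms",
    ],
    "app": [
        "12:00:01 txn=a1 CheckoutService.submit 2200ms",
        "12:00:02 txn=b2 SearchService.query 90ms",
    ],
    "db": [
        "12:00:02 txn=b2 SELECT products 40ms",
        # Nothing was logged for txn a1 on this tier.
    ],
}


def stitch(logs):
    """Group log lines by transaction id, keyed by the tier they came from."""
    trail = defaultdict(dict)
    for tier, lines in logs.items():
        for line in lines:
            for token in line.split():
                if token.startswith("txn="):
                    trail[token[len("txn="):]][tier] = line
    return trail


def missing_tiers(trail, txn, tiers):
    """Return the tiers that logged nothing for the given transaction."""
    seen = trail.get(txn, {})
    return [tier for tier in tiers if tier not in seen]


trail = stitch(LOGS)
print(missing_tiers(trail, "a1", LOGS))  # the slow checkout
print(missing_tiers(trail, "b2", LOGS))  # the fast search
```

In this made-up data the slow checkout has no database entry at all, so the trail stops at the app tier, which is exactly the missing jigsaw piece described above.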
So even with tools that help you parse and index log files, you’re still dependent on the right data being captured and available. Pointing the finger with weak or incomplete evidence just fuels the fire when it comes to figuring out who and what is causing the issue. For example, have you ever tried telling a DBA his database is the issue just because it’s slow?
So along came Application Performance Management (APM) toolsets to save the day, promising “End to End” visibility of your business transactions in production so you can identify, isolate and resolve all your problems in minutes before your end users are impacted. Sounds too good to be true, right? The first question people ask is “How much overhead does your APM agent introduce in production?” which is quickly addressed with a “No more than 1 or 2%” claim. So while agent overhead is a top concern for organizations, it’s actually a small part of the story once you take a step back and look at the bigger picture.
You see, most APM solutions today are only as smart as the user that configures and tells them what to monitor. Just like log files, you can’t monitor everything–so there needs to be a balance between visibility and overhead. This is why most APM vendors today require “professional services” to figure out the optimal configuration of their APM solution with your application before it’s deployed in production.
This “Optimal Configuration” (or Overhead) is broken down into four areas:
#1 Business Transactions – what URLs & requests of the app should be named as a transaction?
#2 Instrumentation – which classes and methods of the app should be monitored?
#3 Thresholds – when should data be collected?
#4 Overhead – how much agent overhead is introduced from data collection?
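As a rough illustration of why these four areas need ongoing attention, the sketch below models a hand-written APM configuration and checks it against the methods that exist in a given build. It is not any vendor's format: the `ApmConfig` fields, the class and method names and the build contents are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ApmConfig:
    """Hypothetical hand-maintained APM configuration covering the four areas."""
    transactions: dict        # 1: URL prefix -> business transaction name
    instrumentation: set      # 2: "Class.method" names to monitor
    slow_threshold_ms: int    # 3: collect detailed data above this response time
    max_overhead_pct: float   # 4: agent overhead budget


def stale_rules(config, build_methods):
    """Return instrumentation rules that match no method in the build."""
    return sorted(config.instrumentation - set(build_methods))


config = ApmConfig(
    transactions={"/checkout": "Checkout", "/search": "Search"},
    instrumentation={"CheckoutService.submit", "SearchService.query"},
    slow_threshold_ms=2000,
    max_overhead_pct=2.0,
)

# Two invented builds: in the second one a developer renamed submit().
build_41 = ["CheckoutService.submit", "SearchService.query"]
build_42 = ["CheckoutService.placeOrder", "SearchService.query"]

print(stale_rules(config, build_41))
print(stale_rules(config, build_42))
```

A single rename is enough to leave an instrumentation rule pointing at nothing, so the checkout path silently goes unmonitored until someone edits the configuration by hand.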
The bad news is that all four of these areas can change with every build of your application. That makes it incredibly bad news for agile organizations that want to regularly push new application releases into production. If you have to spend one or two days configuring your APM solution every time a new build is released, it’s going to feel like Groundhog Day every other day. The sheer time, patience, and cost needed to configure things like transactions, instrumentation, thresholds and overhead is just unmanageable for most organizations. It’s actually the main reason why many APM solutions today don’t make it into production (and stay in pre-production). If an organization is practicing agile development, it simply can’t wait for someone to re-configure the APM solution every time it wants to deploy to production, which makes managing application performance effectively very difficult, if not impossible.
If you’re interested in APM and are considering toolsets, be sure to understand the true overhead of a solution, meaning the time and cost of configuring it and not just the overhead of its agent. The last thing you want to be doing during a production slowdown or outage is configuring your APM solution. That is, unless you’ve got a team of APM consultants on standby.
Now imagine what life would be like if an APM solution could run in production and solve your problems without needing transaction, instrumentation, threshold and overhead configuration. Sit comfortably, take a deep breath, click on this link and relax. The days of manual APM labor are over.
App Man.