TAG | Production Monitoring
Finding the Root Cause of Application Performance Issues in Production
Posted by App Man | May, 04, 2012 | In The Usual Suspects
0 Comments
The most enjoyable part of my job at AppDynamics is to witness and evangelize customer success. What’s slightly strange is that for this to happen, an application has to slow down or crash.
It’s a bittersweet feeling when End Users, Operations, Developers and many Businesses suffer application performance pain. Outages cost the business money, but sometimes they cost people their jobs–which is truly unfortunate. However, when people solve performance issues, they become overnight heroes with a great sense of achievement, pride, and obviously relief.
To explain the complexity of managing application performance, imagine your application is 100 haystacks that represent tiers, and somewhere a needle is hurting your end user experience. It’s your job to find the needle as quickly as possible! The problem is, each haystack has over half a million pieces of hay, and each piece represents a line of code in your application. It’s therefore no surprise that organizations can take days or weeks to find the root cause of performance issues in large, complex, distributed production environments.
End User Experience Monitoring, Application Mapping and Transaction Profiling will help you identify unhappy users, slow business transactions, and problematic haystacks (tiers) in your application, but they won’t find needles. To do this, you’ll need X-Ray visibility inside haystacks to see which pieces of hay (lines of code) are holding the needle (root cause) that is hurting your end users. This X-Ray visibility is known as “Deep Diagnostics” in application monitoring terms, and it represents the difference between isolating performance issues and resolving them.
For example, AppDynamics has great End User Monitoring, Business Transaction Monitoring, Application Flow Maps and very cool analytics all integrated into a single product. They all look and sound great (honestly they do), but they only identify and isolate performance issues to an application tier. This is largely what Business Transaction Management (BTM) and Network Performance Management (NPM) solutions do today. They’ll tell you which business transaction slows down and where, but they won’t tell you the root cause so you can resolve the issue.
Why Deep Diagnostics for Production Monitoring Matters
A key reason AppDynamics has become very successful in just a few years is that our Deep Diagnostics, behavioral learning, and analytics technology is 18 months ahead of the nearest vendor. A bold claim? Perhaps, but it’s backed up by bold customer case studies such as Edmunds.com and Karavel, who compared us against some of the top vendors in the application performance management (APM) market in 2011. Yes, End User Monitoring, Application Mapping and Transaction Profiling are important, but these capabilities will only help you isolate performance pain, not resolve it.
AppDynamics has the ability to instantly show the complete code execution and timing of slow user requests or business transactions for any Java or .NET application, in production, with incredibly small overhead and no configuration. We basically give customers a metal detector and X-Ray vision to help them find needles in haystacks. Locating the exact line of code responsible for a performance issue means Operations and Developers solve business pain faster, and this is a key reason why AppDynamics technology is disrupting the market.
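To make “code execution and timing” a little more concrete, here is a minimal Python sketch of the general idea: record how long each nested call takes during one request, then report where the time was actually spent. This is not how AppDynamics works internally; the `CallTrace` class, the span names, and the sleep durations are all made up for illustration.

```python
import time
from contextlib import contextmanager


class CallTrace:
    """Hypothetical helper: records self time (excluding children) per span."""

    def __init__(self):
        self.self_time = {}  # span name -> seconds spent outside child spans
        self._stack = []     # one entry per open span: time used by its children

    @contextmanager
    def span(self, name):
        self._stack.append(0.0)
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            child_time = self._stack.pop()
            own = elapsed - child_time
            self.self_time[name] = self.self_time.get(name, 0.0) + own
            if self._stack:
                # Charge this span's total time to its parent's child budget.
                self._stack[-1] += elapsed

    def hotspot(self):
        """Return the span name with the largest self time."""
        return max(self.self_time, key=self.self_time.get)


def handle_checkout(trace):
    # Simulated business transaction; sleeps stand in for real work.
    with trace.span("handle_checkout"):
        with trace.span("load_cart"):
            time.sleep(0.002)
        with trace.span("run_report_query"):
            time.sleep(0.05)


trace = CallTrace()
handle_checkout(trace)
print(trace.hotspot())
```

With these made-up durations the hotspot should be `run_report_query`, even though `handle_checkout` has the longest total time. Separating self time from total time is what turns “this transaction is slow” into “this call is slow.”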
Below is a small collection of needles that customers found using AppDynamics in production. The simple fact is that complete code visibility allows customers to troubleshoot in minutes as opposed to days and weeks. Monitoring with blind spots and configuring instrumentation are a thing of the past with AppDynamics.
Needle #1 – Slow SQL Statement
Industry: Education
Pain: Key Business Transaction with 5 sec response times
Root Cause: Slow JDBC query with full-table scan
Needle #2 – Slice of Death in Cassandra
Industry: SaaS Provider
Pain: Key Business Transaction with 2.5 sec response times
Root Cause: Slow Thrift query in Cassandra
Needle #3 – Slow & Chatty Web Service Calls
Industry: Media
Pain: Several Business Transactions with 2.5 min response times
Root Cause: Excessive Web Service Invocation (5+ per trx)
Needle #4 – Extreme XML Processing
Industry: Retail/E-Commerce
Pain: Key Business Transaction with 17 sec response times
Root Cause: XML serialization over the wire
Needle #5 – Mail Server Connectivity
Industry: Retail/E-Commerce
Pain: Key Business Transaction with 20 sec response times
Root Cause: Slow Mail Server Connectivity
Needle #6 – Slow ResultSet Iteration
Industry: Retail/E-Commerce
Pain: Several Business Transactions with 30+ sec response times
Root Cause: Querying too much data
Needle #7 – Slow Security 3rd Party Framework
Industry: Education
Pain: All Business Transactions with > 3 sec response times
Root Cause: Slow 3rd party code
Needle #8 – Excessive SQL Queries
Industry: Education
Pain: Key Business Transactions with 2 min response times
Root Cause: Thousands of SQL queries per transaction
Needle #9 – Commit Happy
Industry: Retail/E-Commerce
Pain: Several Business Transactions with 25+ sec response times
Root Cause: Unnecessary use of commits and transaction management
Needle #10 – Locking under Concurrency
Industry: Retail/E-Commerce
Pain: Several Business Transactions with 5+ sec response times
Root Cause: Non-thread-safe cache forces locking for read/write consistency
Needle #11 – Slow 3rd Party Search Service
Industry: SaaS Provider
Pain: Key Business Transaction with 2+ min response times
Root Cause: Slow 3rd Party code
Needle #12 – Connection Pool Exhaustion
Industry: Financial Services
Pain: Several Business Transactions with 7+ sec response times
Root Cause: DB Connection Pool Exhaustion caused by excessive connection pool invocation & queries
Needle #13 – Excessive Cache Usage
Industry: Retail/E-Commerce
Pain: Several Business Transactions with 50+ sec response times
Root Cause: Cache Sizing & Configuration
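To show what one of these needles can look like in code, here is a small Python sketch of the pattern behind Needle #8 (thousands of SQL queries per transaction). It is not the customer’s code: the `students` and `grades` tables, the row counts, and the helper names are hypothetical, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE grades (student_id INTEGER, score INTEGER);
""")
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [(i, f"student-{i}") for i in range(1, 1001)])
conn.executemany("INSERT INTO grades VALUES (?, ?)",
                 [(i, 50 + i % 50) for i in range(1, 1001)])

counter = {"queries": 0}  # counts every statement sent to the database


def run(sql, params=()):
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()


def scores_chatty():
    """One query for the student list, then one more query per student."""
    result = {}
    for student_id, name in run("SELECT id, name FROM students"):
        rows = run("SELECT score FROM grades WHERE student_id = ?",
                   (student_id,))
        result[name] = rows[0][0]
    return result


def scores_joined():
    """The same data fetched with a single JOIN."""
    rows = run("SELECT s.name, g.score FROM students s "
               "JOIN grades g ON g.student_id = s.id")
    return dict(rows)


counter["queries"] = 0
chatty = scores_chatty()
chatty_queries = counter["queries"]

counter["queries"] = 0
joined = scores_joined()
joined_queries = counter["queries"]

print(chatty_queries, joined_queries, chatty == joined)  # prints: 1001 1 True
```

Both functions return identical data, but the chatty version makes 1,001 round trips where the joined version makes one. Against a real database over a network, each round trip adds latency, which is how a transaction that looks harmless in test can end up taking minutes in production.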
If you want to manage and troubleshoot application performance in production, you should seriously consider AppDynamics. We’re the fastest growing on-premise and SaaS based APM vendor in the market right now. You can download our free product AppDynamics Lite or take a free 30-day trial of AppDynamics Pro – our commercial product.
Now go find those needles that are hurting your end users!
App Man.
apm, appdynamics, AppDynamics Pro, application monitoring, Application Performance Management, BTM, Business Transaction Management, CA Wily, Compuware, Deep Diagnostics, Dynatrace, End User Monitoring, New Relic, OpNet, OpTier, Production Monitoring, Root Cause Analysis, Transaction Profiling
Why Testing in Production isn’t as stupid as it sounds
Posted by App Man | Dec, 06, 2011 | In Agile & DevOps, APM Thought Leadership
3 Comments
One of my colleagues this week was consolidating the results from our recent Application Performance Management survey, and one interesting finding was that 40% of customers have at least one release cycle a month. Out of those respondents, one third experience a Severity-1 incident each month as well. That’s a pretty compelling pair of statistics, and they might explain the continued frustration and conflict between development and operations teams. It’s also perhaps the reason why this DevOps underground movement can no longer be ignored (even by Gartner). There is no doubt development organizations have become agile, but does deploying this frequent change make the business more or less agile? For example, if one in three releases creates a Severity-1 incident, then surely agile development becomes a risk to the business. We’re at the point where Operations either has to start managing change better or simply restrict the amount of change that can occur.
So why are Severity-1 incidents so common? Based on my experiences and customer interaction, I’d strongly argue that testing in development isn’t enough. At the very least, it’s certainly not an insurance policy for deploying an application in production. When a Formula 1 team designs a car in a wind tunnel and tests it on a simulator pre-season, they don’t assume that the performance they see in test will mirror the results they see in a race. Yet, that is pretty much what happens today in the application development lifecycle. Development teams build and test their apps in pre-production before handing them off to operations for deployment in production, and they assume everything will work just fine. This is probably the worst assumption IT has made over the last decade, because development and production environments differ significantly. It’s also a lame excuse for any development team to use when a production issue occurs: “Well, it ran fine in test so you must have deployed it wrong.” Yes, people make mistakes occasionally, but if one in every three releases has an issue, deployment error may not be the sole reason. If developers never get to see how their baby runs in production, they’ll never learn how to build robust, scalable, and high-performance applications.
apm, appdynamics, appdynamics lite, application monitoring, CloudTest, HP LoadRunner, Load Runner, Load Testing, Production Monitoring, Production Testing, SOASTA
Online media company gets proactive with application monitoring in production
Posted by App Man | Nov, 11, 2011 | In Agile & DevOps, APM Thought Leadership
1 Comment
On Wednesday I delivered a keynote at WJAX in Munich. Everything went really well, but I was a little shocked at the response I got when I asked the audience “How many of you monitor the performance of your apps in production?” As I scanned the audience, I counted 9 raised hands out of ~950 developers, meaning about 1% had visibility into how their applications actually performed in production. I know what you’re thinking: “But isn’t application performance in production the responsibility of Operations?” Well, it is and it isn’t. Most organizations think that when an application has an issue, it’s related to the infrastructure it runs on. That’s like saying when a car crashes, it’s because a part failed on the car, whereas in actual fact most accidents are caused by the driver. Yes, hardware fails occasionally, but application logic and configuration drive how infrastructure resources are used, which is why most issues today occur when new code is deployed in production.
apm, appdynamics, appdynamics lite, AppDynamics Pro, application monitoring, Application Performance Management, MTTR, Performance Bottlenecks, Production Monitoring, Slow Application, WJAX
The Real Overhead of Managing Application Performance
Posted by App Man | May, 23, 2011 | In APM Thought Leadership
0 Comments
Have you ever tried to troubleshoot a production bottleneck or outage? Let me guess: the first place you’ll look is log files, right? And in those log files lie all the answers to your problems? Err, not exactly. Log files are like haystacks; they take up lots of space, and it takes hours to find the precious needles that are causing you pain. Even with tools like “kerplunk,” your troubleshooting success is only as good as the data you can collect, manage and report. You can’t log everything because disk I/O and debug logging are expensive operations, which is why most log files today only contain basic information about what applications are doing in their various infrastructure silos.
For example, if you need to troubleshoot how a slow distributed business transaction executed across your infrastructure, it could take you hours to piece together the jigsaw, if you manage it at all. And if one piece of your jigsaw is missing, then the trail goes cold and you shrug your shoulders. Bottom line, managing application performance with log files is still a long, manual and tedious task with no guarantee of success. You can’t manage with facts if you don’t have all the facts in the first place.
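Here is a rough Python sketch of what that jigsaw looks like when you do it by hand. It assumes every tier writes a shared transaction ID into its log lines, which is a big assumption in itself; the log format, the tier names, and the `txn=` field are invented for the example.

```python
from collections import defaultdict

# Hypothetical log lines merged from several tiers:
# "<timestamp_ms> <tier> txn=<id> <message>"
LOG_LINES = [
    "1000 web txn=a1 received /checkout",
    "1005 web txn=b2 received /search",
    "1010 app txn=a1 calling inventory service",
    "1020 app txn=b2 calling search service",
    "1030 web txn=b2 responded 200",
    "4010 web txn=a1 responded 200",
]


def parse(line):
    """Split one log line into (timestamp_ms, tier, txn_id, message)."""
    ts, tier, txn, message = line.split(" ", 3)
    return int(ts), tier, txn.split("=", 1)[1], message


def group_by_transaction(lines):
    """Collect the events of each transaction, ordered by timestamp."""
    transactions = defaultdict(list)
    for line in lines:
        ts, tier, txn_id, message = parse(line)
        transactions[txn_id].append((ts, tier, message))
    for events in transactions.values():
        events.sort()
    return dict(transactions)


def largest_gap(events):
    """Return (gap_ms, before, after) for the longest silence between
    consecutive events. Needs at least two events."""
    pairs = zip(events, events[1:])
    return max(((b[0] - a[0], a, b) for a, b in pairs), key=lambda g: g[0])


transactions = group_by_transaction(LOG_LINES)
gap_ms, before, after = largest_gap(transactions["a1"])
print(gap_ms, before[1], "->", after[1])  # prints: 3000 app -> web
```

The three-second hole in transaction `a1` sits between the app tier calling the inventory service and the web tier responding. The inventory tier logged nothing for that transaction, so this is exactly where the trail goes cold: the logs tell you roughly where the time went, but not why.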
So even with tools that help you parse and index log files, you’re still dependent on the right data being captured and available. Pointing the finger with weak or incomplete evidence just fuels the fire when it comes to figuring out who and what is causing the issue. For example, have you ever tried telling a DBA his database is the issue just because it’s slow?
apm, Application Performance Man, BTM, Business Transactions, Log Files, MTTR, Performance Management, Production Monitoring, Root Cause Analysis