65% of enterprises need more than 3 hours to troubleshoot application problems

Recently conducted research by Enterprise Management Associates (EMA) and AppDynamics showed that IT organizations are spending extensive amounts of time and resources on application support. The research, based on a survey of 302 IT professionals, also suggested that the majority of companies are still trying to manage complex applications using siloed tools and a combination of “all hands on deck” interactive marathons and tribal knowledge.

As you can see in the chart below, 65% of enterprises said that it takes them more than 3 hours to determine the root cause of an application-related problem. In fact, it takes more than 6 hours for one-third of enterprises (33%) to isolate an application problem.

[Figure: time enterprises need to determine the root cause of an application-related problem]

The report also suggested that enterprises are using multiple tools to monitor application environments of ever-growing complexity. These siloed tools lack common context, so it becomes very difficult, if not impossible, to connect the dots between the troubleshooting information coming out of them.

Let’s consider a scenario where an enterprise support team gets multiple customer calls about the checkout process performing slowly. Their Java profiler suggests a JVM performance issue, their database monitoring tools suggest issues with certain queries, and their network monitoring tools report issues of their own. However, this disjointed data does not help them conclude whether these issues are causing the slowdown of the checkout transactions or are merely symptoms. They need common context and the ability to correlate this disparate information to the checkout transaction so they can quickly isolate the problem and take action to resolve it.
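To make the idea of “common context” concrete, here is a minimal sketch of correlating findings from separate tools by the transaction they belong to. It is an illustration only, not how AppDynamics or any particular product works: the `Finding` type, the tool names, the transaction IDs, and the latency numbers are all hypothetical, and it assumes each tool can tag its findings with a shared transaction ID.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Finding:
    tool: str            # which monitoring tool reported it (hypothetical names)
    transaction_id: str  # shared context: the business transaction it belongs to
    detail: str          # what the tool observed
    added_ms: int        # latency the tool attributes to this finding


def correlate(findings):
    """Group findings from separate tools by the transaction they share."""
    by_transaction = defaultdict(list)
    for finding in findings:
        by_transaction[finding.transaction_id].append(finding)
    # Within each transaction, list the largest latency contributor first.
    return {
        tx: sorted(items, key=lambda f: f.added_ms, reverse=True)
        for tx, items in by_transaction.items()
    }


# Made-up sample data for the slow-checkout scenario.
findings = [
    Finding("jvm_profiler", "checkout-1001", "long GC pause", 400),
    Finding("db_monitor", "checkout-1001", "slow query on orders table", 2600),
    Finding("network_monitor", "search-2002", "packet retransmits", 150),
]

report = correlate(findings)
for tx, items in report.items():
    print(tx, "->", [(f.tool, f.added_ms) for f in items])
```

With a shared transaction ID, the support team can see at a glance that, in this made-up data, the slow query dominates the checkout transaction, while the network finding belongs to a different transaction altogether.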

77% of enterprises need more than 5 “people-hours” to resolve application problems

The EMA research also suggested that the total number of “people-hours” necessary to solve a single problem is most commonly between 5 and 7 hours (see the figure below). However, in many cases, the process takes much longer. Twenty percent of the time the number is 8-10 hours, and in 8% of cases the process requires more than 20 people-hours.

[Figure: total people-hours needed to resolve a single application problem]

Given that enterprises are using multiple siloed tools designed for subject matter experts (SMEs) in their respective areas, resolving a problem often requires a long interactive war room session where these SMEs can leverage their expertise and tribal knowledge to isolate and resolve the performance issues.

Application slowdown or downtime is very expensive. A critical application failure costs a staggering $500,000 to $1 million per hour, according to the recently published research report “DevOps and the Cost of Downtime: Fortune 1000 Best Practice Metrics Quantified” by IDC Vice President Stephen Elliott.
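As a rough back-of-the-envelope sketch, the hourly cost range above can be multiplied by the troubleshooting times from the EMA survey. This combines numbers from two separate studies and assumes the application is fully down for the whole troubleshooting window, which will not hold for every incident. The function name `downtime_cost_range` is made up for illustration.

```python
# Hourly cost range for a critical application failure, from the IDC figure (USD).
COST_PER_HOUR_LOW = 500_000
COST_PER_HOUR_HIGH = 1_000_000


def downtime_cost_range(hours):
    """Return the (low, high) cost estimate for an outage lasting `hours` hours."""
    return hours * COST_PER_HOUR_LOW, hours * COST_PER_HOUR_HIGH


# 3 and 6 hours are the root-cause thresholds reported in the EMA survey.
for hours in (3, 6):
    low, high = downtime_cost_range(hours)
    print(f"{hours} h: ${low:,} to ${high:,}")
```

Under these assumptions, a 3-hour critical outage lands between $1.5 million and $3 million, and a 6-hour one between $3 million and $6 million.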

A Unified Monitoring solution, like the one recently announced by AppDynamics, can address the challenges that most of the enterprises in the EMA research reported. AppDynamics Unified Monitoring is the industry-first, application-centric solution that traces and monitors transactions from the end user through the entire application and infrastructure environment, to help quickly and proactively solve performance issues and ensure an excellent user experience.

Many AppDynamics customers have replaced their siloed tools with this unified monitoring platform to address some of the challenges discussed in this blog. For example, luxury online fashion retailer Net-A-Porter adopted the AppDynamics Application Intelligence Platform to help reduce application complexity and improve end-to-end visibility of its applications.

According to Hugh Fahy, CIO at the Net-A-Porter Group, “We previously used multiple point-monitoring solutions, which didn’t give us end-to-end visibility. The AppDynamics platform gives us a unified, real-time view of user experience, application performance, and availability. That allows us to optimize the user experience, and it’s hard to imagine our business without it.”

Read the full EMA report on monitoring tools.

Why did your Checkout Fail? AppDynamics knows

The reason for this blog is purely down to a real-life incident which one of our e-commerce customers shared with us this week. It’s based around a use case that pretty much anyone can relate to – the moment your checkout transaction spectacularly fails. You sit there, looking at a big fat error message and think “WTF – did my transaction complete or did the company steal my money?” A minute later you’re walking a support team through exactly what happened: “I just clicked Checkout and got an error…honestly…I waited and never got a response.”

What’s different in this story is that the support team had access to AppDynamics as they were talking to a customer on the phone…and the customer got to find out the real reason their checkout failed. How often does that happen? Never, until now. Here is the story as documented by the customer.

Data Stinks, Information Rocks!

I’ve got many years of performance geekery under my belt and I’ve learned many lessons during that time. One of the most important lessons is paying close attention to the distinction between data and information. Let’s take a look at how the dictionary defines each term:

Data – facts and statistics collected together for reference or analysis.
Information – facts provided or learned about something or someone.

What do these definitions reveal to us about data and information and how does it apply to monitoring tools? Let’s explore that together. I’ll provide specific examples along the way to illustrate my points.

The Problems with Data

Data is fundamental to problem solving, but I don’t want to have to dig through a bunch of data while my business critical, mission critical, revenue generating, etc… applications are down. To me, data is just like this picture…


4 Lessons that Operations Can Learn From Batman

In honor of the summer blockbuster The Dark Knight Rises, let’s take a look at one of the most celebrated heroes in history—Batman. (Okay, granted, he’s not real. But that doesn’t mean he can’t help us understand some best practices regarding application uptime and availability, right? Look, just bear with me here.)

1. Always have the right tools.

Batman knows his limitations: he doesn’t have superpowers (unlike my esteemed colleague, App Man). He’s just an ordinary guy with a weird-looking suit and five tons of cash. So what does he do? He buys batarangs, batcopters, bat anti-shark repellant, etc. He knows that in order to give himself an edge against the forces of darkness, he needs serious gadgets.

Similarly, Application Operations teams need to recognize that they shouldn’t wade into a production outage or firefight without some kind of application management solution. It’s amazing how many times we talk to teams that are simply using log files to manage a production application. Log files are not effective tools in a true firefight, and they’re no way to get to root cause quickly when your end users experience poor performance. Get an APM solution that equips you with 10x visibility and code-level detail in production, at less than 2% overhead. Otherwise, your office may start to feel like the rough & tumble streets of Gotham.

2. The bad guys always come back.

How many times has Batman stuck the Joker in Arkham Asylum, only to see him escape and start inventing new death traps? But the Dark Knight doesn’t grumble (well, actually, he does, but let’s not worry about that right now); he just suits up and goes back on patrol.

Similarly, poorly performing business transactions, memory leaks, and slow SQL calls are never going to disappear entirely, no matter how many iterations your developers perform. Agile release cycles and complex code mean that no environment will be completely free of evil. It’s important to have the right web application monitoring solution to put your rogues gallery on ice, time and time again.

3. War Room sessions are a waste of time.

“It’s the network.” “No, it’s the database.” “No, it’s the application.” How much time does your team spend pointing fingers and establishing innocence? And in the meantime, how many more issues are bound to terrorize your end users?

Batman doesn’t sit in a fire circle and hold hands with his colleagues. He doesn’t parley with the police and he doesn’t make long speeches. He gets the job done. The right application performance management tool can do the same for your own attempts at application heroism: less on the fingerpointing, more on the Mean-Time-to-Resolution.

4. Never stop until you find root cause.

One of the things that people forget about Batman is that he’s not just a superhero: he’s also the World’s Greatest Detective. He uses his brains as often as his fists. And when it comes to a mystery, he’s not content to just get a vague sense of the story, or poke around the edges of a crime scene. Rather, he wants to find the killer and the smoking gun.

In the world of application performance, that means getting to code-level detail in a production environment. It does not mean using a tool to monitor infrastructure or a network monitor that skims the surface of the application layer. Those tools can get close, but “close” only works in horseshoes and hand grenades. What’s necessary is to find the true culprit: the method and class that’s causing a production outage or performance issue. And if you have the right tool for the job, you won’t be patrolling all night in order to do it. Rather, you can solve the mystery in literally minutes.

One lesson you might not want to take from the Caped Crusader is to cough up a huge P.O. for your APM solution. Not everyone can be a bazillionaire like Bruce Wayne. And it’s possible to arm yourself for battle without making your IT budget as big and bloated as the Penguin.

Don’t believe me? Check out the pricing for AppDynamics Pro for yourself—then send up a signal to take our 30-Day Trial. When you get a taste of what we can do, you may find yourself swinging from the rooftops.