The reason for this blog post is a real-life incident that one of our e-commerce customers shared with us this week. It's based on a use case that pretty much anyone can relate to: the moment your checkout transaction spectacularly fails. You sit there, staring at a big fat error message and thinking, "WTF? Did my transaction complete, or did the company steal my money?" A minute later you're walking a support team through exactly what happened: "I just clicked Checkout and got an error… honestly… I waited and never got a response."
What's different in this story is that the support team had access to AppDynamics while they were talking to the customer on the phone, and the customer got to find out the real reason their checkout failed. How often does that happen? Never, until now. Here is the story as documented by the customer.
I’ve got many years of performance geekery under my belt and I’ve learned many lessons during that time. One of the most important lessons is paying close attention to the distinction between data and information. Let’s take a look at how the dictionary defines each term:
Data – facts and statistics collected together for reference or analysis.
Information – facts provided or learned about something or someone.
What do these definitions reveal about data and information, and how do they apply to monitoring tools? Let's explore that together. I'll provide specific examples along the way to illustrate my points.
The Problems with Data
Data is fundamental to problem solving, but I don't want to have to dig through a pile of data while my business-critical, mission-critical, revenue-generating applications are down. To me, data is just like this picture…
In honor of the summer blockbuster The Dark Knight Rises, let’s take a look at one of the most celebrated heroes in history—Batman. (Okay, granted, he’s not real. But that doesn’t mean he can’t help us understand some best practices regarding application uptime and availability, right? Look, just bear with me here.)
1. Always have the right tools.
Batman knows his limitations: he doesn't have superpowers (unlike my esteemed colleague, App Man). He's just an ordinary guy with a weird-looking suit and five tons of cash. So what does he do? He buys batarangs, batcopters, bat shark repellent, and so on. He knows that to give himself an edge against the forces of darkness, he needs serious gadgets.
Similarly, Application Operations teams need to recognize that they shouldn't wade into a production outage or firefight without some kind of application management solution. It's amazing how often we talk to teams that rely solely on log files to manage a production application. Log files are not effective tools in a true firefight, and they're no way to get to root cause quickly when your end users experience poor performance. Get an APM solution that equips you with 10x visibility and code-level detail in production, at less than 2% overhead. Otherwise, your office may start to feel like the rough-and-tumble streets of Gotham.
2. The bad guys always come back.
How many times has Batman stuck the Joker in Arkham Asylum, only to see him escape and start inventing new death traps? But the Dark Knight doesn't grumble (well, actually, he does, but let's not worry about that right now); he just suits up and goes back on patrol.
Similarly, poorly performing business transactions, memory leaks, and slow SQL calls are never going to disappear entirely, no matter how many iterations your developers perform. Agile release cycles and complex code mean that no environment will ever be completely free of evil. That's why it's important to have the right web application monitoring solution to put your rogues' gallery on ice, time and time again.
3. War Room sessions are a waste of time.
“It’s the network.” “No, it’s the database.” “No, it’s the application.” How much time does your team spend pointing fingers and establishing innocence? And in the meantime, how many more issues are bound to terrorize your end users?
Batman doesn't sit in a fire circle holding hands with his colleagues. He doesn't parley with the police, and he doesn't make long speeches. He gets the job done. The right application performance management tool can do the same for your own attempts at application heroism: less time spent finger-pointing, more time spent driving down Mean-Time-to-Resolution.
4. Never stop until you find root cause.
One of the things that people forget about Batman is that he's not just a superhero: he's also the World's Greatest Detective. He uses his brains as often as his fists. And when it comes to a mystery, he's not content to just get a vague sense of the story or poke around the edges of a crime scene. Rather, he wants to find the killer and the smoking gun.
In the world of application performance, that means getting code-level detail in a production environment. It does not mean using an infrastructure monitoring tool or a network monitor that only skims the surface of the application layer. Those tools can get close, but "close" only counts in horseshoes and hand grenades. What's necessary is to find the true culprit: the class and method causing a production outage or performance issue. And if you have the right tool for the job, you won't be patrolling all night to do it. Instead, you can solve the mystery in minutes.
One lesson you might not want to take from the Caped Crusader is to cough up a huge P.O. for your APM solution. Not everyone can be a bazillionaire like Bruce Wayne. And it’s possible to arm yourself for battle without making your IT budget as big and bloated as the Penguin.
Don't believe me? Check out the pricing for AppDynamics Pro for yourself, then send up a signal and take our 30-Day Trial. When you get a taste of what we can do, you may find yourself swinging from the rooftops.