On Wednesday I delivered a keynote at WJAX in Munich. Everything went really well, but I was a little shocked at the response I got when I asked the audience “How many of you monitor the performance of your apps in production?” As I scanned the audience, I counted 9 raised hands among roughly 950 developers, meaning about 1% had visibility of how their applications actually performed in production. I know what you’re thinking: “But isn’t application performance in production the responsibility of Operations?” Well, it is and it isn’t. Most organizations think that when an application has an issue, it’s related to the infrastructure it runs on. That’s like saying that when a car crashes, it’s because a part failed, whereas in fact most accidents are caused by the driver. Yes, hardware fails occasionally, but application logic and configuration drive how infrastructure resources are used, which is why most issues today occur when new code is deployed in production.
Yesterday I got off the phone with a manager of web development who walked me through how and why his development team are using application monitoring in production. It might ring a few bells and help you understand the motivation behind this need, and how it can help an organization become proactive instead of reactive when managing application performance and availability in production.
The customer on this occasion was a B2B media publishing company that provides a content management system (CMS) platform for around 110 websites, many of them driven by user subscriptions. Performance and availability are therefore critical, given that they underpin the brand and user experience of the many companies that rely on this platform. Prior to AppDynamics, this customer used Pingdom and HP SiteScope to monitor application performance and availability in production. Even with this synthetic monitoring approach, they still had authors and end users complaining that the CMS platform was down or unavailable at times, which was a pretty embarrassing situation to be in.
Incident resolution, and with it Mean Time To Resolution, worked like this: users would complain and contact customer support, who would then contact Ops. Ops would perform basic infrastructure checks and restart the servers they thought were responsible. This manual process took between 20 minutes and an hour, depending on whether anyone in Ops was available. At the same time, Ops would copy GBs of log files, and sometimes thread dumps, to a pre-production server so developers could access the data and begin troubleshooting. Due to the access restrictions in production, it wasn’t uncommon for developers to wait a few hours for this data to become available. When the data did arrive, there was no guarantee that the log files and thread dumps would actually reveal the root cause. Bottom line: troubleshooting application performance in production was a manual, reactive, time-consuming and frustrating effort.
Today, things are a lot different. Every architect and developer has real-time visibility of application performance in production via their web browser. They don’t have to wait for log files, thread dumps or Ops to tell them something is wrong. Developers proactively check application performance in real time after every agile release to see what impact their code had in production. Alerts can also be sent to Ops and Dev when application performance deviates from its normal baseline, because not everyone looks at dashboards or consoles all day. This is a classic example of how application monitoring can help a customer proactively manage application performance, so they can find and fix potential bottlenecks without having to wait for the application to crash or slow down in production.
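To make the idea of alerting on deviation from a normal baseline a bit more concrete, here is a minimal sketch in Python. It is not how AppDynamics computes baselines; it simply assumes a baseline is the mean of recent response times and flags a new measurement that sits more than a few standard deviations away. The function name `deviates_from_baseline`, the `threshold` parameter and the sample numbers are all made up for illustration.

```python
from statistics import mean, stdev


def deviates_from_baseline(history_ms, current_ms, threshold=3.0):
    """Return True when current_ms is more than `threshold` standard
    deviations away from the mean of history_ms.

    Hypothetical helper for illustration only; real monitoring tools
    typically use richer baselines (time of day, day of week, etc.).
    """
    if len(history_ms) < 2:
        # Not enough data to estimate a spread, so never alert.
        return False
    baseline = mean(history_ms)
    spread = stdev(history_ms)
    if spread == 0:
        # Perfectly flat history: any change counts as a deviation.
        return current_ms != baseline
    return abs(current_ms - baseline) / spread > threshold


# Illustrative response times (milliseconds) from before a release.
recent_response_times = [100, 110, 90, 105, 95]

if deviates_from_baseline(recent_response_times, 400):
    print("ALERT: response time deviates from baseline")
```

In practice you would feed a check like this from whatever metrics store you already have and route the alert to both Ops and Dev, which is the point of the story above: nobody has to be watching a console when the release goes bad.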
Using this example, let’s look at the time differences for development to begin root cause analysis when a production issue occurs:
In this customer example, one of the developers downloaded AppDynamics Lite, showed his boss and ended up taking a free trial of AppDynamics Pro. A few weeks later they became a customer of AppDynamics. This is a great story to see and hear, but we’ve got a lot of work to do to help the other 99% of developers around the world get this production visibility.