Edmunds.com is the leading online resource for information about cars for people looking to buy or simply learn more about the car they already own. John Martin, Sr. Director of Production Engineering at Edmunds.com, knows that keeping the site performing well is important to the business. Martin’s goal is to keep response times for the website below 150 milliseconds, but in order to do so he needed more visibility into application performance.
Managing Performance with DevOps
Edmunds.com has a very strong DevOps culture. For Martin, this means that the development team is reliant on receiving consistent, actionable data from operations whenever a problem occurs. “Our development team is very data-driven,” said Martin, “so it’s necessary that whenever an event occurs in production that the operations teams are able to feed back data to development.”
Getting that data, however, was a problem for Martin and his team. The tools they had at hand made it difficult to determine where a problem had occurred and why.
“Originally we were using Wily Introscope for deep method inspections,” said Martin. “One of the reasons it didn’t pan out for us is the requirement that we tell it when an event is occurring and we needed to know more about that event.” The problem was that Martin’s team, like any operations team, could not anticipate the important events that would impact the performance of the production application, and so could not collect the relevant data to give to development. It was time to look for a new production monitoring tool.
Smart and Lightweight
Martin looked at a few different monitoring tools, including Dynatrace and AppDynamics. “Dynatrace had similar limitations as Wily did, which was that we had to tell it what we wanted it to monitor,” Martin said. AppDynamics, on the other hand, had built-in intelligence that automatically detected when a problem was occurring in the application and began collecting the relevant data. “AppDynamics intelligence just says, ‘Hey something interesting is going on, I’m going to collect more data for you,’” Martin said. “It was a far better solution than us having to go tell our APM tool to look for something.”
AppDynamics also fit Martin’s requirements for a production-ready tool. “It was a requirement for us that whatever application monitoring solution we chose that it introduce very little overhead in our production runways,” Martin said. “The idea of an application monitoring tool should be to monitor, not to intrude.”
Improving Performance to Protect Revenue
Within a matter of months, Martin found that AppDynamics had made a significant impact on how his team found and solved performance issues in their production environment. Because operations was able to give development relevant data almost immediately, troubleshooting time was dramatically reduced and uptime improved. “We used to have around 10 people working on a single problem for several days, which is very expensive both in lost time and in lost revenue,” Martin said. “Now we’re down to a few hours.”
The combination of productivity savings and revenue protection (avoiding the loss of customers to a slow transaction or an outage) saved Edmunds nearly $800,000 in its first year with AppDynamics. Assuming similar results in its second year, Edmunds will have saved over a million dollars by mid-2013.
Martin found that AppDynamics was a very effective troubleshooting tool for his team and his environment. “AppDynamics is right for anybody that 1) doesn’t have a clear picture of what’s going on in their application, and 2) has a very large or distributed infrastructure,” said Martin. “It’s absolutely necessary so you’re not having to figure out how to monitor all these multiple endpoints with rudimentary tools like JConsole or VisualVM. Those are great for a simple architecture, but in a distributed architecture you’ve got to think about something different,” he said. “AppDynamics is perfect for that.”
|Example Use Case|Before AppDynamics|After AppDynamics|Benefits|
|---|---|---|---|
|Reduction in production downtime|99.91% availability|99.95% availability|$167,475 saved in lost revenue|
|Reduction in average MTTR per Severity-1 incident|5 man-days|Reduced by 45% (conservative)|$307,521 in productivity savings|
|Reduction in time spent identifying performance defects in pre-production|2.5 man-days|Reduced by 35% (conservative)|$320,170 in productivity savings|
|Total 1st Year Savings| | |$795,166|
|Projected ROI Over 2 Years| | |$1,217,334|
|Customer satisfaction|Slow transactions & outages frustrate customers|Fewer incidents, better end user experience|Average response time under 150 ms|
|DevOps collaboration|Dev working on a hunch to solve problems|Ops can give actionable data to dev|Reduced MTTR by 45%|
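
The arithmetic behind the table can be checked with a short sketch. The dollar figures and availability percentages below are copied from the table; the 8,760-hour year and the helper name `annual_downtime_hours` are assumptions made here for illustration, not part of the original analysis.

```python
# Figures copied from the savings table; everything else is an assumption.
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year (8,760 hours)


def annual_downtime_hours(availability_pct: float) -> float:
    """Hours of downtime per year implied by an availability percentage."""
    return (100.0 - availability_pct) / 100.0 * HOURS_PER_YEAR


# First-year savings per use case, in US dollars (from the table).
first_year_savings = {
    "production downtime": 167_475,
    "Severity-1 MTTR": 307_521,
    "pre-production defects": 320_170,
}

total_savings = sum(first_year_savings.values())
downtime_before = annual_downtime_hours(99.91)
downtime_after = annual_downtime_hours(99.95)
hours_recovered = downtime_before - downtime_after

print(f"Total first-year savings: ${total_savings:,}")
print(f"Downtime before: {downtime_before:.2f} h/yr")
print(f"Downtime after:  {downtime_after:.2f} h/yr")
print(f"Uptime recovered: {hours_recovered:.2f} h/yr")
```

The three line items sum to the $795,166 total in the table. Under the 8,760-hour assumption, moving from 99.91% to 99.95% availability corresponds to roughly 7.9 hours of downtime per year falling to roughly 4.4 hours.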