With over 600 restaurants and over 3,000 employees, Pizza Pizza is the largest pizza chain in Ontario, Canada. Customers can order pizza over the phone, through the website, and even on their smartphones. A call center with more than 400 agents handles phone orders for all of the restaurants, with agents placing orders through an internal application. These three applications handle hundreds of thousands of transactions every week, and they must always be up and running to cater to the needs of Pizza Pizza’s hungry customers.
Amar Narain, Director of IT at Pizza Pizza, is responsible for making sure these applications always perform well, especially during peak hours. Unfortunately, a lack of visibility into the application layer made this a difficult task. The development team put new code in production every two weeks for a new promotion, and this often had a severe impact on application performance. Without visibility into what was causing performance bottlenecks, Narain and the developers could not effectively triage outages and slowdowns.
When an outage occurred, it was up to Narain and the developers to find the problem in the logs. With about 50 Tomcat instances, this exercise was extremely time-consuming – it often took 20-30 man-hours to find the point of failure, and even then the problem was far from being resolved. “We usually attempted to solve problems just by adding more resources – bringing in more Tomcat instances, or raising the number of connections to the databases,” Narain said.
These performance problems hurt both the customers who used the website and the call center agents who placed orders over the phone – the slower the request, the longer it took to place an order and the longer the phone call. “The average talk time on a call should be about two minutes and twenty seconds. Ours was 2:25-2:35, which may not sound like that much more, but ten seconds in this industry is huge,” Narain said. “Ten seconds can mean up to 20-30 less agents helping customers during our peak.”
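Narain's figure can be sanity-checked with some rough arithmetic. The sketch below is not Pizza Pizza's own calculation: it assumes, for illustration, that all 400 agents are on calls at peak and that the number of calls an agent can handle scales inversely with talk time. The function name and constants are hypothetical.

```python
def equivalent_agents_lost(agents: int, target_seconds: float, actual_seconds: float) -> float:
    """Return the agents' worth of capacity lost when calls run past the target time."""
    # Calls handled per agent scale with 1 / talk time, so a floor of `agents`
    # people at the slower pace does the work of agents * target / actual
    # people at the target pace.
    effective_agents = agents * target_seconds / actual_seconds
    return agents - effective_agents


TARGET = 2 * 60 + 20  # 2:20 target talk time quoted in the article
AGENTS = 400          # assumption: roughly 400 agents on calls at peak

for extra in (5, 10, 15):  # covers the 2:25-2:35 range Narain describes
    lost = equivalent_agents_lost(AGENTS, TARGET, TARGET + extra)
    print(f"+{extra}s per call ~ {lost:.1f} agents of lost capacity")
```

Under these assumptions, ten extra seconds per call works out to roughly 27 agents of lost capacity, which is in line with the 20-30 range Narain cites.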
Furthermore, this talk time would spike every week when developers released new code to production. “Almost every Friday evening we would either see a slowdown – talk times would usually be 2:47-2:50 – or something would crash.”
While these crashes usually only affected a couple of agents, sometimes they could be much more damaging. “We had a couple of incidents in December and January that were pretty costly,” Narain said. “One outage in January took down our entire call center for one hour on a Friday night. That outage probably cost us $100,000-150,000. That’s what pushed us to really understand what was happening.”
The first APM solution that Narain and his team looked at was Oracle’s Real User Experience Insight (RUEI), but they found it to be too expensive for their budget. Then they discovered AppDynamics, and in January of 2012 they decided to buy.
The time to value for their team was less than a week; within a matter of days they were beginning to optimize the database queries that were eating up much of their resources. “One of our problems was how queries from the call center were interacting with the customer database,” Narain said. “The transactions would read and commit even though they weren’t writing any data. This caused unnecessary overhead on the application. Once we optimized those SQL statements we saw a major difference in the feedback we got from call center managers – they have seen a significant improvement in response time when they’re taking orders.”
With AppDynamics, Narain and his team have been able to get out fixes for performance problems much faster. While it used to take 20-30 man-hours to find where the application had failed, it now takes only 15-30 minutes. “Before, our solutions were dependent on user feedback and logs and focused only on the current incident, rather than other business process behaviors. Now we can actually find the root cause of the problem and fix it,” Narain said.
Furthermore, Pizza Pizza’s agile release cycles no longer impact application performance. The developers and DBA use AppDynamics to find problems in the code they push to production every week, preventing slowdowns and outages before they occur in the production environment. “When the development team first saw the tool, you could see the excitement and confidence in their faces, knowing that now they have a bird’s eye view of the systems,” Narain said. “With AppDynamics we can catch problems before they cause an outage. And that means we’re saving money, and keeping our customers happy. That’s what matters.”