With up to a billion page views every month, Fox News is one of the largest news websites in the United States. New content is constantly being uploaded on their content management system (CMS) and delivered on the website via Fox’s delivery system. In addition, Fox News provides an API application that feeds content to other websites that distribute its content.
Fox News had ongoing production issues for close to two years before they began looking for an application performance management (APM) tool. “The system was a complete disaster,” said Ryan Jairam, System Administrator at Fox News. Sometimes transactions would take up to 5 minutes to return from the database, and this could have a severe impact on the rest of the application.
Solving these problems with nothing more than log files was often impossible. So Jairam and his team did what they could to reduce the effects of these long-running transactions on the rest of their application. When they first noticed a performance problem, they’d restart the application. This would solve the problem temporarily. When it came back, they’d file a support ticket with the vendor and ask their developers to take a thread dump for analysis.
Meanwhile, they’d use caching to reduce the number of executions of the long-running transactions. “I basically cached the hell out of it,” Jairam said. “We used Squid to create really long cache times, which helped while we were troubleshooting the problem.” However, this aggressive caching created a different set of problems at Fox News. “Since we had to empty the cache once it became stale, we actually created tools that would listen for new RSS feed content, and automatically flush the cache,” he said. “It was terrible, but it worked.”
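The feed-driven cache flush Jairam describes can be illustrated with a short sketch. This is not Fox News's actual tooling, which the case study does not show. It is a minimal Python example under stated assumptions: the feed is plain RSS 2.0, a new `<guid>` (or `<link>`) means fresh content has arrived, and the hypothetical `purge` callable stands in for whatever mechanism removes a URL from the cache. The names `flush_if_new`, `new_item_keys`, and `cached_urls` are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, List, Set


def new_item_keys(feed_xml: str, seen: Set[str]) -> List[str]:
    """Return identifiers of RSS items not seen before and record them in `seen`."""
    root = ET.fromstring(feed_xml)
    fresh: List[str] = []
    for item in root.iter("item"):
        # Prefer <guid>; fall back to <link> when an item carries no guid.
        key = (item.findtext("guid") or item.findtext("link") or "").strip()
        if key and key not in seen:
            seen.add(key)
            fresh.append(key)
    return fresh


def flush_if_new(
    feed_xml: str,
    seen: Set[str],
    cached_urls: Iterable[str],
    purge: Callable[[str], None],
) -> int:
    """Purge every cached URL when the feed contains at least one new item.

    `purge` is a hypothetical hook (an assumption, not a real library call);
    the caller decides how a URL is actually evicted from the cache.
    Returns the number of new items found.
    """
    fresh = new_item_keys(feed_xml, seen)
    if fresh:
        for url in cached_urls:
            purge(url)
    return len(fresh)
```

A scheduler would call `flush_if_new` each time it fetches the feed, keeping the same `seen` set between calls. With Squid, `purge` might send an HTTP `PURGE` request for the URL, which Squid honors only when its configuration explicitly permits that method. Whether the original tools worked this way is not stated in the source.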
Even with this caching strategy, Jairam admits he spent too much time firefighting. “We spent countless man-hours developing these workarounds when we could have been doing other things,” he said. Problems would occur every couple of weeks, he estimated, when new code was released into production. The day after a major release, Jairam and a team of sys admins and developers would often dedicate up to 12 hours to fixing problems that had arisen from the new code. Upgrades to the CMS and delivery systems were even more problematic, often requiring a week or more of dedicated firefighting before the system became stable. Eventually Jairam and his team decided they needed a tool that would help speed up these processes and ultimately alleviate the burden of incessant firefighting.
Jairam had used several different tools to manage his applications before he purchased AppDynamics. Some were written in-house, and he still uses them today to a lesser extent. Others, like OPNET Panorama, turned out to be unsuitable for their environment: “The overhead was so much it was killing me. I had to take it out of production.”
Jairam and his team began to look for an APM tool that was fit for a high-volume production environment like theirs. Then they found AppDynamics.
The troubleshooting process became much easier after installing AppDynamics. Problems from minor code changes that used to take a team of four to five employees a whole day to fix now take one person a matter of minutes to resolve. Even larger changes, like upgrading the application, which would have required a full week of work before, now take only a couple of hours to “get it to a usable state.” With AppDynamics in place, Fox News estimates productivity savings of almost $165K per year.
In addition, AppDynamics has helped significantly reduce the volume of support tickets coming to Jairam’s office. “We used to have six or seven support tickets a day about our applications,” he said. “Now we only have one or two per week, if any, and 99% of those are user error. We’ve had weeks where we don’t have any.” This dramatic reduction has also freed up IT personnel, saving over $70K in labor costs.
An important benefit for Jairam and his team is visibility into remote web service calls. “We have a whole new level of visibility we didn’t have before,” Jairam said. “We can see now when our service providers aren’t meeting their SLAs and are causing our performance to degrade.”
For example, an outage with their search provider once caused publishing to hang. “Our CMS vendor couldn’t tell what the issue was,” Jairam said. “But almost immediately after looking at AppDynamics we could see that the feeds to our search service provider was down. When we contacted them they confirmed there was an outage at one of their datacenters.” To remedy the problem, Jairam and his team temporarily bypassed the search feeds, freeing up more computing resources for the rest of the application.
Jairam has big plans in the future for AppDynamics. He hopes to begin optimizing code and testing database changes using performance data from AppDynamics, and he even hopes to automate some workarounds by leveraging AppDynamics workflow execution so he doesn’t have to wake up in the middle of the night if something crashes. All in all, AppDynamics has drastically changed how his team deals with performance problems. “It’s been a complete night and day difference here,” he said. “Now we have time to focus on new things and not just fight fires all day.”
| Example Use Case | Before AppDynamics | After AppDynamics | Benefits |
| --- | --- | --- | --- |
| Troubleshooting production issues plaguing ops and dev for 2 years | Five employees troubleshooting and using caching band-aid workarounds | Improved Mean-Time-to-Resolution (MTTR) from weeks to hours through simplicity and deep visibility such as AppDynamics’ 3-Clicks to Root Cause | Productivity savings totaling $164,700 |
| Reduction in outage and support tickets | 6-7 tickets filed per day totaling 35 tickets per week | 1-2 tickets per week and some weeks without any | Estimated productivity savings: $70,700 |
| Inability to identify performance bottleneck for delivery system | No visibility or had to create home-grown scripts | Ability to find bottlenecks with external web service calls | Gained visibility into 3rd party content and web service performance for better vendor accountability, triaging and workarounds |
| Reduced page view loss | 10-15% webpage abandonment due to slow website or stale content that wasn’t refreshed from their app internal content queue on FoxNews.com homegrown scripts | Easily identified bottlenecks in the app to help optimize code | Improvement in site performance and fresher content reduced page view loss. Supports more viewership. FoxNews has roughly 1 billion page views per month. |
| Total Annual ROI | | | $235,400 |