ASRC Federal

Government Agency Cuts Help Desk Incidents and Repair Time by More than Half with AppDynamics

To keep operations running, the US Government relies on the work of thousands of individuals, contractors, and agencies, all of whom strive to provide prompt benefits and payments to federal workers.

One Federal Civilian Agency responsible for these payments, the focus of this report, depends heavily on an application to perform case management, make payments of $5 billion annually, and, among other things, guide an army of 1,200 claims examiners who review submissions and generate weekly federal benefits payments.

Too frequently this mission-critical application would slow or suddenly stop, impacting the productivity of each examiner. The result would be missed quotas, financial waste, accounting charge-backs, and even payment benefits not being disbursed at all.

To help remedy this untenable situation, the agency turned to AppDynamics for application performance help.

Challenge: Increase application performance and quality while optimizing inefficient incident response and inefficient code

Supporting the IT infrastructure and applications within a government agency requires a team of project managers, systems architects, software engineers, technical support, and help desk staff. The ASRC Federal Data Solutions team manages more than 40 mission-critical applications across several programs within the agency. Many of these legacy, multi-tier applications include database integration, middleware, and Java components. All require ongoing application maintenance updates, resulting in an average of 50 releases per week throughout the lifecycle environment.

Overseeing some of these efforts is ASRC Federal Data Solutions Project Manager Ed Hulbert.

“The case management and payment application is the most challenging one we deal with,” says Hulbert. “It’s a multi-tier application with a cluster of Java application servers and a Java client component that runs on 1,200 end user PCs distributed across 16 different sites throughout the United States over several types of WAN connections. When there’s a performance issue, there are plenty of variables involved that could be the root cause.”

With so many teams required to support the applications, responding to slowdowns or outages was an arduous and inefficient process.

Continues Hulbert, “Our previous resolution process was highly reactive. Typically, we were alerted to outages or performance issues by the end users. To meet quotas and provide service to their customers, these performance-based examiners require the application to have 99.9% uptime Monday through Friday.”

The incident-resolution process typically unfolded as follows:

  • Send an email notification to 180+ people across multiple teams and contractors
  • Assemble a conference call with 15 people from various teams (staffing and schedules often required this step to be repeated multiple times)
  • Have individual groups (Network Operations, Windows, Middle Tier, and Database) review their area of support and report back
  • Repeat the previous steps until the issue is resolved

Shahram Ghaffari, the ASRC Federal Data Solutions Architect and technical team lead, has painful memories of the previous process.

Recalls Ghaffari, “Without end-to-end visibility, it was very difficult to pinpoint a definitive root cause. We really needed an application performance management (APM) solution. Our previous APM solution was essentially me, spending up to twelve hours looking through transaction logs, heap dumps, etc., to try to determine the issue. Many times we couldn’t determine the actual cause.”


Solution: Proactive application performance monitoring, resolution and optimization across all platforms

After considering several APM products, the ASRC Federal Data Solutions team evaluated AppDynamics Application Performance Management. One area of major interest was the solution’s small footprint and low performance overhead.

“The runtime impact for AppDynamics was very low, and system maintenance requirements were low as well,” says Ghaffari.

The agency quickly began to see results.

“The AppDynamics solution worked out of the box, with almost no configuration needed to gain end-to-end visibility,” adds Hulbert. “After a quick install, we began discovering the problem calls and transactions. Our technical staff was very pleasantly surprised with the AppDynamics ease of setup, integration, and intuitive troubleshooting capability out of the box.”

Benefits: Rapid ROI, Proactive Incident Management and Application Quality Improvement

Right from the start, developers, administrators, and users within the agency’s programs liked what they saw with AppDynamics. Today, the agency uses AppDynamics collaboratively across its support, development, and management teams, which include multiple contractors.

“Every team now has visibility into the system—we’re all looking at the same symptoms at the same time and that visibility is priceless,” says Hulbert.

Ghaffari echoed these sentiments. “With AppDynamics APM, we have end-to-end visibility of what’s going on across the entire system, all the way down to the performance on claim examiners’ desktops. We’ve even been able to provide performance dashboards to the individual site managers to let them see if something is wrong on their network, even before their users start complaining. Extending this visibility to remote sites not only empowers them, it avoids many support calls.”

Another area where the agency has seen dramatic results is the overall quality of application development.

According to Ghaffari, “Prior to using AppDynamics, our developers lacked feedback on how code changes might impact production. Today, these teams access detailed application data for ongoing optimization and performance improvement, often allowing them to catch application performance issues prior to production release.”

This feedback has dramatically reduced the reactive support required in production.

“In the end, we were able to expand this implementation into our CITRIX infrastructure, launching the same application in that environment. AppDynamics gives us the visibility there now too, to monitor application performance while users are connected remotely all over the country,” adds Hulbert.

And what about the original incident support process mentioned earlier? Today, the agency has dramatically streamlined and automated incident response. Here’s an update on how this APM solution is benefiting the Incident Management Team:

  • Focused alerts, with timely access to detailed application root-cause data
  • Elimination of “all-hands” bridge calls and email traffic/scheduling issues
  • Early engagement of key application-team resources for faster MTTR (Mean Time To Repair)
  • Baseline trending for performance
  • Proactive notification of end users before they notify operations or the Help Desk
  • Much faster and sometimes immediate updates to Management when there is a problem, many times pinpointing exactly where the issue resides

While these improvements are impressive, ASRC Federal Data Solutions Project Manager Ed Hulbert sums up with his favorite, and most important, AppDynamics benefit: “The biggest change is now we don’t have system downtime.”

