For competitive and confidentiality reasons, our customer has asked us not to identify them by name.

When your business is executing major engineering and construction projects around the world that add up to $3 billion in annual revenue, software plays a critical role in managing those projects through their lifecycle; the company's revenue depends on it. And like the projects themselves, those software systems tend to be large-scale and complex, in this case built largely (but not exclusively) on Oracle and IBM products. Unfortunately, these systems don't provide adequate tools or visibility to quickly get to the root causes of issues: just "a lovely, horrific log file to dig through," in the words of the Systems Integrator responsible for performance. AppDynamics, however, offered the visibility and functionality to keep the software running and keep the company focused on what it does best: engineering and building big, exciting projects.
The application’s memory usage was a “black box,” as described by the customer, and they needed a way to get a deeper look into it. They shopped around for an application or platform that would give them the visibility they needed, and landed on AppDynamics.
“After using AppDynamics to analyze our applications’ memory usage and performance side-by-side with the identical system that was running stably, it became very apparent that the problem lay with the garbage collection (GC) process of the JVM,” the customer explained. “Specifically, its inability to keep up with the creation and destruction of the objects in memory. In our case, there were no minor GCs being run, and major GCs were occurring on almost a per-minute basis, until ultimately the memory heap would run out and the application would crash.”
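The symptoms the customer describes (no minor GCs, near-constant major GCs, eventual heap exhaustion) can also be confirmed from the JVM itself. A minimal sketch, assuming a modern HotSpot JVM; `app.jar` is a placeholder for the actual application, not a name from the customer's system:

```shell
# JDK 9+ unified GC logging with timestamps (older JVMs: -verbose:gc -XX:+PrintGCDetails)
java -Xlog:gc*:file=gc.log:time,uptime -jar app.jar

# Capture a heap dump when the heap finally runs out, for offline analysis
java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -jar app.jar
```

A GC log like this shows the frequency and pause time of each collection, which is exactly the pattern (major collections nearly every minute) that pointed to the diagnosis here.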
This diagnosis also explained the performance complaints, since a major garbage collection is typically a stop-the-world event: all application processing is suspended while the collection takes place.
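The churn the customer describes, object creation outpacing collection, can be reproduced and observed through the JVM's own management API. A minimal sketch; the class and method names are illustrative, not from the customer's system:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcPressure {
    // Allocate many short-lived buffers to create allocation churn,
    // keeping a small fraction alive so some objects are promoted.
    // Returns how many objects were retained.
    static int churn(int iterations) {
        List<byte[]> survivors = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            byte[] chunk = new byte[1024];   // short-lived garbage
            if (i % 1000 == 0) {
                survivors.add(chunk);        // long-lived minority
            }
        }
        return survivors.size();
    }

    public static void main(String[] args) {
        int retained = churn(100_000);
        // Per-collector statistics exposed by the JVM itself: how often
        // each collector ran and how long it paused the application.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("retained=" + retained);
    }
}
```

Watching `getCollectionCount()` climb while `getCollectionTime()` grows is the programmatic equivalent of the behavior AppDynamics surfaced: collections happening constantly, each one pausing the application.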
But this still didn’t explain why this system was bogging down and crashing while its supposedly identical twin wasn’t. As it turns out, the two were not 100% identical. The AppDynamics correlation analysis tools pointed the finger at a particular set of (proprietary) integrations, which were not actually running on the “identical” system.
“In the end, it just came down to telling that vendor, ‘you’re wrong, you’re causing the problem,’” the customer said, since they were able to show the AppDynamics data that precisely pinpointed the issue. “Fortunately, we could schedule that integration for when our typical daily end-user load is not high, so it didn’t cause the GC to fall behind and ultimately run out the heap and crash the application.”
And with AppDynamics now in place, the next problem didn’t take long to find.
“With AppDynamics, we were able to monitor all aspects of our application servers on a real-time basis for all tiers of the platform,” the customer said. “Add to this the ability to monitor multiple servers and applications simultaneously, along with the built-in correlation analysis tools, and finally the root cause of the slow end-user experience was easily determined. The firewall throughput was too small for the load coming from the DMZ into the internal network.”

AppDynamics diagnosis. New, higher-capacity firewall installed. Problem solved.
Getting to the root cause of performance problems in sprawling, heterogeneous networks is a huge challenge, but one that software-driven businesses absolutely must master. AppDynamics provided this global engineering-construction firm with the visibility and analytical tools to solve perplexing, persistent issues even when the application vendors couldn’t or wouldn’t.
Ease of use and historical data are two AppDynamics benefits this customer cites in particular.
“I fell in love with AppDynamics just because it was so easy to use out of the box,” the customer said. “I didn’t have to spend a lot of time tweaking it.”

“The best thing for me about AppDynamics is the ability to capture an issue historically,” the customer continued, “to give us the chance to use the time range window and then go and trace it all the way to the SQL call. Then we can look at other logs. You can’t touch that with the other tools that we have. I love how easy it is to get to that without having to do a lot of programming, coding, and filtering, to quickly resolve what the problem was.”