How to Triage a Busy Thread Count Alert in 14 Minutes

July 03 2014
 



This is a real example of troubleshooting a production application issue, provided by an AppDynamics customer. What you are about to see is a combination of runtime analytics, adaptive data collection, intelligent alerting, and a proven problem-solving workflow. The time from first alert to DBA handoff was only 14 minutes.

5:26 p.m. – Operations receives an email alert about Busy Threads breaching a threshold. AppDynamics detected the incident automatically and sent the alert when the Busy Threads JMX metric shot up to 182.
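As a rough illustration of the kind of check that sits behind such an alert, the sketch below scans sampled busy thread counts and reports the first one above a fixed threshold. Only the value 182 comes from this incident. The threshold of 150, the earlier sample values, and the names `Sample` and `first_breach` are assumptions made for the example, and this is not a description of how AppDynamics implements its alerting.

```python
from dataclasses import dataclass

# Hypothetical threshold; the configured value is not stated in this article.
BUSY_THREAD_THRESHOLD = 150


@dataclass
class Sample:
    time: str          # wall-clock label for the sample
    busy_threads: int  # busy thread count read at that time


def first_breach(samples, threshold=BUSY_THREAD_THRESHOLD):
    """Return the first sample whose busy thread count exceeds the threshold, or None."""
    for sample in samples:
        if sample.busy_threads > threshold:
            return sample
    return None


# The first two readings are invented for illustration; 182 is the reported value.
samples = [
    Sample("5:24 p.m.", 41),
    Sample("5:25 p.m.", 97),
    Sample("5:26 p.m.", 182),
]

breach = first_breach(samples)
if breach is not None:
    print(f"ALERT: Busy Threads = {breach.busy_threads} at {breach.time}")
```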

AppDynamics sends notifications detailing busy thread counts

5:34 p.m. – Details from AppDynamics show that call volume is down, response time is up, errors are up and network I/O is down. Initial suspicion is that the load balancer may be throttling traffic due to poor performance.

Screenshots showing response time, throughput, and errors, followed by network and CPU metrics.
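The symptom pattern noted at 5:34 p.m. (call volume down, response time up, errors up, network I/O down) can be expressed as a comparison of current readings against a baseline. The sketch below is only an illustration: every number, the 10% tolerance, and the metric names are hypothetical and are not taken from the incident.

```python
def direction(current, baseline, tolerance=0.10):
    """Classify a metric as 'up', 'down', or 'flat' relative to its baseline."""
    if baseline == 0:
        return "flat" if current == 0 else "up"
    change = (current - baseline) / baseline
    if change > tolerance:
        return "up"
    if change < -tolerance:
        return "down"
    return "flat"


# Hypothetical (baseline, current) readings; not measured values from this incident.
readings = {
    "calls_per_min": (1200, 400),
    "response_time_ms": (250, 4000),
    "errors_per_min": (2, 35),
    "network_io_kb_s": (900, 300),
}

pattern = {name: direction(cur, base) for name, (base, cur) in readings.items()}

# Falling call volume together with rising response time suggests that requests
# are stalling inside the application rather than arriving in greater numbers.
stalling = (pattern["calls_per_min"] == "down"
            and pattern["response_time_ms"] == "up")

print(pattern, "stalling:", stalling)
```

A pattern like this narrows the search but does not identify the cause, which is why the next steps look at where the transactions are actually spending their time.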

5:38 p.m. – Following company procedure, the server is disabled in the load balancer so that it receives no more traffic. Recycling the application server is considered as a possible temporary fix.

5:40 p.m. – Details from AppDynamics show that transactions are backing up because of a database issue, so there is no need to recycle the application server. The issue is handed off to the DBA team with full application context for resolution.

 

Screenshot showing the transaction map.

Screenshot showing problematic JDBC call as the culprit.
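The idea of isolating a slow transaction to a single call can be sketched as a breakdown of where one transaction spends its time. The segment names and timings below are hypothetical and chosen only to mirror the situation described here, in which a JDBC call dominates; `slowest_segment` is an illustrative helper, not part of any AppDynamics API.

```python
def slowest_segment(segments):
    """Return (name, share_of_total) for the segment that consumes the most time."""
    if not segments:
        raise ValueError("no segments to analyze")
    total = sum(segments.values())
    name = max(segments, key=segments.get)
    return name, segments[name] / total


# Hypothetical timings in milliseconds for one slow transaction.
segments = {
    "servlet": 40,
    "business_logic": 110,
    "jdbc_query": 3850,
}

name, share = slowest_segment(segments)
print(f"{name} accounts for {share:.0%} of transaction time")
```

When one segment accounts for nearly all of the elapsed time, restarting the application server is unlikely to help, and the team that owns that segment is the right place for the handoff.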


Later that day: The DBA team fixes the issue and application response time returns to normal. All nodes are restored to the load balancer rotation.

This is an example of a scenario that IT Operations teams deal with regularly. Without AppDynamics in place to provide fault domain isolation, this type of problem usually ends up in a long conference call in which all support personnel for the application must participate until service has been restored. There is no need to waste significant company resources anymore. Stop the “all hands on deck” madness and see how AppDynamics can help your company today.

Jim Hirschauer
Jim Hirschauer is a Technology Evangelist for AppDynamics. He has an extensive background working in highly available, business-critical, large enterprise IT operations environments. Jim has been interested in application performance testing and monitoring since he was a Systems Administrator working in a retail bank. His passion for performance analysis led him down a path where he would design, implement and manage the cloud computing monitoring architecture for a top 10 investment bank. During his tenure at the investment bank, Jim created new processes and procedures that increased overall code release quality and dramatically improved end-user experience.
