Best Practice for ITSM Professionals Using Monitoring and Alerting

September 11, 2014

Learn best practices to use application monitoring and alerting to improve ITSM and lessen reliance on a CMDB.

I’ve only been with AppDynamics a few months, and I wish I had known 10 years ago what I know now. Thanks to technology, businesses have moved on in leaps and bounds, but the fundamental enterprise IT challenges remain the same, just with increased complexity.

Ten years ago, as the Director of IT Business and Service Management at a large European investment bank, I was tasked with IT governance and the control environment. My main goal was to ensure IT had full visibility of the service it provided to the business.

The goal of the program was to ensure IT managed every business-critical application issue:

  • Restoring service while informing the customer about the problem and its business impact
  • Notifying the customer of application issues and the expected mean time to resolution (MTTR)

This seems so easy when condensed into two sentences, and it really should be in the modern world of IT in an investment bank. However, as those who have experienced it know, it is anything but, and the tasks were more accurately:

  • Knowing all the technical intricacies for every business service
  • Knowing the underlying configuration items (CIs) that supported each technical service
  • Monitoring the performance and status of every CI
  • Knowing the impact on the business service every time the configuration of the IT estate changed

Historically, in an ideal world

To help with our role, we deployed an application discovery and dependency-mapping tool, which continually monitored the topology of our estate. This tool populated our configuration management database (CMDB) with all the changes, and also reconciled them against the approved state of the estate, informing us of any deviations.
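The reconciliation step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular discovery tool works: the `reconcile` function, the `Deviation` record, and the CI names and attributes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deviation:
    ci: str       # name of the configuration item
    kind: str     # "unapproved", "missing", or "changed"
    detail: str   # human-readable explanation


def reconcile(approved: dict[str, dict], discovered: dict[str, dict]) -> list[Deviation]:
    """Compare the discovered estate against the approved state and list deviations."""
    deviations = []
    # CIs that discovery found but nobody approved.
    for ci in sorted(discovered.keys() - approved.keys()):
        deviations.append(Deviation(ci, "unapproved", "found in the estate but not approved"))
    # CIs that were approved but discovery could not find.
    for ci in sorted(approved.keys() - discovered.keys()):
        deviations.append(Deviation(ci, "missing", "approved but not found by discovery"))
    # CIs present on both sides whose attributes differ.
    for ci in sorted(approved.keys() & discovered.keys()):
        for attr in sorted(approved[ci].keys() | discovered[ci].keys()):
            want, got = approved[ci].get(attr), discovered[ci].get(attr)
            if want != got:
                deviations.append(
                    Deviation(ci, "changed", f"{attr}: approved={want!r}, discovered={got!r}")
                )
    return deviations


# Hypothetical three-tier service: approved state versus what discovery reports.
approved = {
    "web-01": {"depends_on": "app-01"},
    "app-01": {"depends_on": "db-01"},
    "db-01": {"depends_on": None},
}
discovered = {
    "web-01": {"depends_on": "app-02"},
    "app-01": {"depends_on": "db-01"},
    "app-02": {"depends_on": "db-01"},
}

for d in reconcile(approved, discovered):
    print(f"{d.kind:<10} {d.ci}: {d.detail}")
```

Even in this toy form, the weakness is visible: the output is only as good as the approved state, which someone has to keep current by hand.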

We implemented monitoring tools on all of the CIs to ensure proper performance. To make sure we were notified, we configured the tools to alert us any time there was a performance issue with any of the CIs, which in theory would update the status of the technical and business services. The IT service owner would then confirm the service was restored and create a problem record.

Once the problem record was created, the IT team would analyze the issue, look for its root cause, and create (or log) an error. This ideal procedure would foster a balanced IT situation within the bank. However, the situation was anything but ideal.

In principle, this all seems relatively simple, but the maintenance and manual control of the environment were unachievable:

  • The CMDB was not updated accurately
  • The alerting system was not continuously integrated
  • The technical services were not updated with any changes
  • Often, the root cause analysis was not confirmed
  • It was unlikely the errors were logged

Why was this the case, considering we had (in essence) deployed an out-of-the-box ITSM environment based on ITIL best practice? Simply put, here’s why:

  • Alerting was based on static thresholds
  • The estate changed rapidly and we couldn’t model the CMDB quickly enough
  • Lack of dynamic baselines resulted in storms of inaccurate alerts and made root cause analysis impossible
  • Without knowing the root cause, we couldn’t correctly log the errors
  • No changes were made without authenticating the errors

How AppDynamics helps IT

Don’t get me wrong: we weren’t completely incompetent, and we still had manual governance and control over the critical business processes and services. However, all we had was a state-of-the-art ITSM solution adhering to ITIL best practice, while we went about our day jobs in pretty much the same way as before. It was like having a Ferrari sitting in your garage collecting dust.

So this brings me back to where I am today, working at AppDynamics and a little smarter than I was 10 years ago. With AppDynamics, you can:

  • Monitor business transactions at a code level
  • Provide a continuously updated topology of the business service
  • Receive alerts based on dynamic baselines
  • Using the AppDynamics flow map, update the business and technical services to improve the overall quality
  • Easily see the root cause within the environment
  • Update the problem records in a service management toolset

If we had had AppDynamics at the bank, our lives would have been much easier and the bank would have been performing optimally, instead of suffering the bottlenecks and broken flows we had mapped out.
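To make the difference between static thresholds and dynamic baselines concrete, here is a minimal Python sketch. It is not the AppDynamics baselining algorithm, just one simple way to express the idea: compare each measurement with what is normal for that hour of the day. The function names, the 250 ms threshold, and the response times are all invented for illustration.

```python
from statistics import mean, stdev


def static_alert(value_ms, threshold_ms=250.0):
    """Alert whenever the response time exceeds one fixed threshold."""
    return value_ms > threshold_ms


def baseline_alert(value_ms, history_ms, sigmas=3.0, min_spread_ms=5.0):
    """Alert when the value is far above what is normal for this hour.

    `history_ms` holds response times seen at the same hour on previous days.
    `min_spread_ms` stops a very quiet history from making the check too twitchy.
    """
    typical = mean(history_ms)
    spread = max(stdev(history_ms), min_spread_ms)
    return value_ms > typical + sigmas * spread


# Hypothetical response times (ms) for the same hour on five previous days.
history = {
    "03:00": [100, 105, 98, 102, 101],   # quiet overnight period
    "09:00": [300, 310, 295, 305, 298],  # normal morning peak
}
today = {"03:00": 220, "09:00": 307}

for hour, value in today.items():
    print(
        f"{hour}  {value} ms  "
        f"static={static_alert(value)}  baseline={baseline_alert(value, history[hour])}"
    )
```

With these made-up numbers, the static threshold stays silent at 03:00, when the response time has more than doubled, and fires at 09:00, when the load is perfectly normal for that hour. The baseline check does the opposite in both cases, which is the behavior you want if alerts are to lead you to a root cause.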

This is the benefit of next-generation application intelligence tools. They make the important measurable, not the measurable important. Please check out my white paper on dynamic ticketing with our integration with ServiceNow here.

    Keith O'Kelly
    Keith is a Tech Evangelist located in Europe
