Despite a substantial investment in more than 50 monitoring tools, there was little insight into what was being monitored or why an application or service was experiencing a problem. As a result, application issues tended to set off a cascade of finger-pointing.
Michael Makar, the senior IT manager in charge of enterprise monitoring and performance testing, remembered what it was like: “Executive management would get contacted by vice presidents or a country office director complaining that they couldn’t use a mobile approval application, or they were locked out of an HR application. Applications and services were down all the time. There was a lack of visibility across the different teams, and there was no management reporting. The outage calls could take days and it could take weeks for a root cause analysis to identify an issue with an application.” Without an effective way of communicating about performance issues or resolving them, a culture of blame took root.