Every day we rely on services provided by other people: making a phone call, getting a car fixed, or ordering a pizza. We want those things to happen as quickly as possible, because time often means money. If you take your car to a Mercedes dealer, you will understand this point better than anyone. Our productivity (and often happiness) is therefore controlled, every day, by different organizations and people. When things slow down or don’t happen we get upset, frustrated, and sometimes rant on Twitter like these folk:
If your application follows SOA design principles, is heavily distributed, and relies on third-party service providers, then you’ve probably been frustrated at some point when it slowed down or crashed. The problem is this: your end user experience and quality of service (QoS) are only as good as the QoS of your service providers. Unless you monitor QoS you can’t measure it, and if you can’t measure it, you can’t manage your service providers or your end user experience. For example, take a look at this customer e-commerce application, which has 7 JVMs, 1 database and 7 external web service providers:
This customer recently had a slowdown with their e-commerce production application. After a few minutes browsing AppDynamics, they identified that one of their web service providers was having latency issues (AppDynamics automatically baselines performance and flags deviations for each web service provider, as shown in the above screenshot). The customer called their service provider, and sure enough the service provider admitted to having issues. A few hours later the service provider called back and said “we fixed the problem, everything should be back to normal,” yet the customer could clearly see latency issues still occurring in AppDynamics. So they sent their service provider a screenshot showing the evidence. The service provider then checked again, and called back a few minutes later saying “Yes, sorry, a few customers are still being impacted.” Without this level of visibility, many organizations are simply blind to how external service providers impact their end user experience and business.
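To make the idea of baselining and flagging deviations concrete, here is a minimal sketch in Python. It is not how AppDynamics computes its baselines; it simply assumes a baseline of "mean plus k standard deviations" over past latencies. The function name `flag_deviations`, the parameter `k`, and the sample numbers are all made up for illustration.

```python
from statistics import mean, stdev


def flag_deviations(baseline_ms, recent_ms, k=3.0):
    """Return recent latencies above mean + k * stdev of the baseline.

    Assumption: a simple statistical threshold stands in for whatever
    baselining a real monitoring product performs.
    """
    threshold = mean(baseline_ms) + k * stdev(baseline_ms)
    return [sample for sample in recent_ms if sample > threshold]


# Hypothetical latencies (milliseconds) for one web service provider.
baseline = [210, 190, 205, 220, 198, 202, 215]
recent = [208, 1450, 199, 2300]

slow_calls = flag_deviations(baseline, recent)
print(slow_calls)  # the two calls that took over a second
```

A check like this, run per provider, is what lets you say "the latency is still there" with evidence rather than a hunch.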
Being able to troubleshoot slow performance in minutes is helpful, but what about being able to report the exact service level you receive from each of your service providers over a period of time? Did your service improve over time or did it regress? How many outages or severity 1 incidents did your service providers cause this week for your application?
Take the screenshot below from AppDynamics, which plots the maximum response time for five different web services consumed by an application over the last week. You can see that three of the five web services (denoted by pink, blue and turquoise lines) consistently deliver sub-second response times and provide a great service level. However, the other two web services (red and green lines) show performance spikes with response times between 14 and 22 seconds. The green web service in particular is very inconsistent and shows several performance spikes in two days.
Below is the response time of another web service (PayPal) for a customer application over the last 3 months. Notice the spikes in response time and look at the deviation between average and maximum response time over the time period. What’s impressive is that despite the occasional service blip, the PayPal service’s response time has gradually improved by about 14%, from 450 milliseconds to around 385 milliseconds. It has also been very stable over the last few weeks and has delivered a consistent service (small deviation between average and maximum response time).
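The arithmetic behind those two observations is easy to reproduce. The sketch below uses the 450 ms and 385 ms figures quoted above; the helper names and the five sample latencies are hypothetical, and the max-to-average ratio is just one possible way to express "consistency."

```python
def percent_improvement(before_ms, after_ms):
    """Percentage reduction in response time from before_ms to after_ms."""
    return (before_ms - after_ms) / before_ms * 100


def max_to_avg_ratio(samples_ms):
    """Ratio of the worst response time to the average; near 1.0 means consistent."""
    return max(samples_ms) / (sum(samples_ms) / len(samples_ms))


improvement = percent_improvement(450, 385)
print(f"{improvement:.1f}% faster")  # 14.4% faster

# Hypothetical recent samples (milliseconds) for a stable service.
ratio = max_to_avg_ratio([380, 390, 385, 400, 395])
print(f"max/avg ratio: {ratio:.2f}")
```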
If your application relies on one or more third-party web services, you should check and report each week what level of service you are receiving. That way, you can truly understand your service provider QoS and its impact on your end user experience and application performance. You can also keep your service providers honest, with complete visibility of whether QoS is improving or degrading over time as service outages occur and are fixed.
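If you already collect response times per provider, a weekly report can be a few lines of code. The following is a minimal sketch, assuming you have a list of `(service_name, response_time_ms)` pairs for the week and an agreed threshold in milliseconds. The function `weekly_qos_report`, the service names, and the 1000 ms threshold are illustrative assumptions, not part of any AppDynamics API.

```python
from collections import defaultdict


def weekly_qos_report(samples, sla_ms):
    """Summarize one week of calls per service.

    samples: iterable of (service_name, response_time_ms) pairs.
    sla_ms:  assumed response-time threshold; slower calls count as breaches.
    """
    by_service = defaultdict(list)
    for service, elapsed_ms in samples:
        by_service[service].append(elapsed_ms)

    report = {}
    for service, times in by_service.items():
        report[service] = {
            "calls": len(times),
            "avg_ms": sum(times) / len(times),
            "max_ms": max(times),
            "sla_breaches": sum(1 for t in times if t > sla_ms),
        }
    return report


# Hypothetical measurements for two providers.
samples = [
    ("payments", 400), ("payments", 380),
    ("shipping", 900), ("shipping", 14000), ("shipping", 700),
]
report = weekly_qos_report(samples, sla_ms=1000)
for service, stats in report.items():
    print(service, stats)
```

Comparing these summaries week over week shows whether a provider is improving or regressing, which is the conversation you want to have with data in hand.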
The next time you experience a slowdown or outage in your application, check your external web services before you start to troubleshoot your own. The last thing you want to be doing is debugging your own code when someone else’s service and code could be causing the issue. With AppDynamics you can monitor, measure, and manage the QoS from each of your web service providers. You can get started right now by downloading AppDynamics Lite (our free edition) for a single JVM or IIS web server, or you can request a 30-day trial of AppDynamics Pro (our commercial edition) for Java or .NET applications with multiple JVMs and IIS web servers.