This blog post was prompted by a real-life incident that one of our e-commerce customers shared with us this week. It centers on a scenario pretty much anyone can relate to: the moment your checkout transaction spectacularly fails. You sit there staring at a big fat error message and think, “WTF, did my transaction complete, or did the company just steal my money?” A minute later you’re walking a support team through exactly what happened: “I just clicked Checkout and got an error… honestly… I waited and never got a response.”
What’s different in this story is that the support team had access to AppDynamics while they were on the phone with the customer, so the customer got to find out the real reason their checkout failed. How often does that happen? Never, until now. Here is the story as documented by our customer.
“Here is the complete application map of the e-commerce website that the application support team manages. Looks pretty complex, right?
End user calls the help desk
At around 10.45 a.m. on Monday, 15th January 2013, a customer (end user) calls the e-commerce help desk to report that their Checkout transaction has just failed. With the customer still on the phone, our support engineer logs in to AppDynamics, checks the health of the Checkout business transaction for that time period and sees the following:
Our engineer noticed 53 errors relating to the Checkout transaction, so he drilled down into those failed requests.
Finding the customer’s Checkout transaction
Our engineer then filtered these Checkout errors by the customer’s username, ROB*******, which revealed the exact Checkout transaction that had failed.
Isolating the Checkout latency
And here is exactly how the customer’s checkout transaction executed across our infrastructure:
Almost instantly, you can see the bottleneck: a call to our remote payment service provider, PayPal. We then drilled into this transaction to get more detail:
In the screenshot above, you can see the error message “Read timed out” on the CheckoutBillingDA invocation. You can also see that the User Principal matches the details of the customer on the phone.
Finding the root cause of the failed Checkout
One click on the transaction call graph confirmed the root cause (circled below): a remote HTTP call to PayPal that timed out.
Case closed. On this occasion the customer’s payment was clearly not completed or authorized, so we asked them to retry. In the meantime, we’ll be reaching out to our payment service provider to understand why certain requests are timing out.”
Having spent a year in application support for a leading European travel portal earlier in my career, I can tell you that dealing with end-user complaints isn’t fun. You can check log files and infrastructure metrics and ask every man and his dog (normally the DBA), but more often than not you still have no idea why end-user transactions are slow or timing out. With Application Performance Management (APM) solutions like AppDynamics, application support teams finally have visibility into both end-user experience and application performance.