Speaker: Leonid Igolnik, VP of Engineering at Taleo
I’m Leonid Igolnik, and here at Taleo I’m responsible for the engineering of our Small- to Mid-Size Business (SMB) product line. My responsibilities include software engineering, quality assurance and product operations.
Taleo is one of the largest Software-as-a-Service vendors in the world today. We provide talent management applications to 4,000 customers. The Taleo Business Edition application processes about 5-6 million transactions a day with an average response time of under 500 milliseconds, which is part of our Service Level Agreement (SLA). A lot of those transactions are applicants reviewing career websites and applying for jobs. When companies are hiring it’s very important for them to attract the right talent, and we make sure the system is available to them.
Some of the challenges we’re facing, both as a product line and as an engineering organization, are about the growth rate. Over the three years I’ve been here with the company and with the product, we’ve grown from about 790 customers to 4,000 customers. With this growth rate we have to make sure we continue serving our customers and deliver the quality of service that’s expected of us. Maintaining application performance and continuing to scale the application are obviously some of the areas we’re always looking at seriously.
If you’re running a large-scale Java application with multiple components, multiple tiers, frequent release cycles, and if things like being able to troubleshoot problems quickly are important to you, I think this tool is great. Being in the Software-as-a-Service field we obviously have to support our support team, and when escalations come in it’s important for me to be able to look at the problems they’re reporting and quickly get to the root cause so we can fix the software and put the patch back into production.
One of the interesting things we’ve seen so far with the product is the ability to respond to escalations more quickly. For example, a few weeks back one of the key partners in our ecosystem, which is an extremely important part of our business, escalated an API performance issue to us. Using AppDynamics we were able to take the escalation on Tuesday, look at the metrics, identify the root cause Wednesday, and have the patch in production Thursday. For me to roll into the office Friday morning, look at my tool and confirm that the problem was actually solved before I got on the phone call with the partner was absolutely awesome.
I think what we found interesting was that different teams that work for me found different uses for the product and see different value in it. For example, the Operations team enjoys the Metrics Browser and some of the capabilities of collecting custom metrics and overlaying them on top of the standard metrics the tool already collects. Things like CPU versus Business Transactions, or Number of Logged-in Users versus Memory. Those are all important to us because we have a high-volume operation and we can use those for capacity planning as well as troubleshooting. On the other hand, the software engineering team likes the fact that we get actionable intelligence out of the tool in the Request Snapshots. We can look at particular Business Transactions that may not be performing well and very quickly understand where the time is lost and whether it’s the SQL or the Java code. It’s the dream you always have as a software engineer of running your profiler in your production application without degrading the performance by 50% like a typical profiler does.
As a large-scale SaaS vendor, we have been collecting a lot of metrics for a while now. We have logs and custom tools we’ve built around our products to look at them. What we really liked about AppDynamics is that it’s a tool that brings all those metrics into one place. It allows us to make sense of the madness of gigabytes and gigabytes of logs that our systems produce every day.