TAG | business transaction

Greg Howard

The Power of the Business Transaction

In our last post, we talked about the importance of business transactions for applications in the cloud. They’re also crucial for managing highly distributed applications. But what is a business transaction?
Consider a business transaction to be a user-generated action within your system. The best practice for determining the performance of your application isn’t to measure CPU usage, but to track the flow of a transaction that your customer, the end user, has requested.

Read the Full Post…


Greg Howard

A Holiday Greeting in Verse

’Twas nearly 2011, a brand-new year
But app owners stared sadly at their beer.

They had a problem, and it seemed just enormous
They craved a surefire solution to application performance.

Read the Full Post…


Here’s a guest blog from one of AppDynamics’ international partners, Stefan Zoltai from sysPerform. Stefan wanted to write about how he used AppDynamics to solve a performance problem for a major telecom company in Switzerland—and we said, sure!  Take it away, Stefan…

I’d like to talk about how we used AppDynamics for a major production troubleshooting exercise—and how AppDynamics passed with flying colors.

Swisscom is the leading telecommunications company in Switzerland with about 5.7 million mobile customers and 1.8 million broadband connections. Swisscom is present on the Swiss market with a full portfolio of wireless, wire- and IP-based data and voice-based communication services.

Swisscom’s (Internet) Messaging had engaged sysPerform to assist with the analysis of their Tomcat 6 / Java 1.6 based WebMail application. WebMail has been under scrutiny for about a year now—ever since it manifested both performance and stability problems. Prior analysis efforts, conducted with a number of available tools, did not lead to the determination of the actual root cause(s) since the aforementioned problems only occurred in production under load and could not be reproduced in other environments. WebMail is rated at a throughput of 300 tx/sec.

We realized immediately that without a deep, detailed view into the application’s runtime, in production and under load, we would not be able to determine the actual root cause.

To analyze the application, we selected AppDynamics’ application performance management solution.  Since this solution has been developed specifically for high throughput, distributed production environments, we were able to obtain a high-level overview of the application as well as conduct a deep root cause analysis down to code-level execution without generating measurable overhead. Again, we did all of this at 300 transactions per second of throughput.

Thanks to AppDynamics’ ability to create a dynamic baseline of application performance, we were able to isolate the major bottlenecks on the first day and discuss a solution with the developers at Swisscom.  We were able to quickly learn the application’s performance and stability characteristics — and after only 5 days of development, we deployed a specific, major fix to address the main issue and massively improve performance.  At the moment, we are continuing our analysis efforts since stability and performance are the focus of an ongoing quality process.

[UPDATE: For Swisscom's perspective on the use of AppDynamics, check out Mika Borner's blog]

This example clearly demonstrates that operating a modern, distributed application without an adequate monitoring solution is effectively the same as “flying blind.” Between 60% and 80% of all performance problems are caused by the application itself and need to be analyzed from the inside out. We can confirm these numbers from many of our other engagements with similar customers. External causes like hardware or network issues have become increasingly rare; it’s the problems deep inside the application that truly matter.

Intelligent application performance management, however, is not an end in itself; it must also be evaluated in economic terms. Our experience indicates that an APM solution shows an ROI within just a few months. One reason for such a quick ROI is the extremely fast root cause analysis described above.

If you’re reading this in Switzerland, feel free to contact me with questions!

– Stefan Zoltai, Founder, SysPerform GmbH

Email: sz@sysperform.ch

Twitter: http://twitter.com/SysPerform


In my last post, I wrote about how using Business Transactions as a management unit is critical for managing modern-day applications efficiently. Sticking to this train of thought, I will focus on how this applies to various aspects of application management. The first area I want to cover is monitoring and troubleshooting.

In a highly distributed system, there are 100s of CPUs, 100s of JVMs/CLRs, and millions of lines of code running. Now, if you want to attach service levels to every component of that system, you would either be eyeballing dashboards most of the time or trying to maintain the configuration associated with alerting.

As I mentioned in the last post, if the business grows, it will require more capacity and more infrastructure, which means newer pieces (and ones that are moving around rapidly). Add the configuration associated with lines of code and you can pick either “daunting” or “impossible” to describe the task at hand. Long testing and staging cycles have become a thing of the past.

At the same time, the DevOps tribe is adapting to and embracing the new application landscape rapidly. Their biggest need is the need for speed, which translates into efficiency driven by intelligence in every aspect of application management in production. This is where using Business Transactions can make monitoring more efficient and easier to accomplish.

But wait a minute – am I really suggesting that you watch Business Transaction service levels (which are your business’ bottom line) instead of getting overwhelmed with alerts based on 1000s of key metrics? Doesn’t that mean you are ignoring things that can be going wrong by not giving them enough attention?

Au contraire! In fact, you can actually focus on more with this approach versus traditional monitoring techniques. Let me explain.

Traditional monitoring has been all about averages. You look at some service or some method and its average over time, then set up alerts associated with it. Doing so is great for catching slow performance degradation or systemic outages. But it won’t help you catch outliers where there is no pattern associated with errors or slow requests.

Let’s look at a couple of examples to understand this better. For brevity I will talk only about slow requests (but the same argument can also be applied to errors).

1) A frequent cause of slow requests, resulting in a poor user experience, pertains to a transaction associated with user input.  For example, in a shopping cart application, the user might add a particular item to his or her cart—which results in the application slowing down.

2) Here’s another one. Sometimes one particular node, which is part of a big cluster of, say, 50 nodes, has an issue servicing requests but might be doing OK on CPU and memory usage.

In both of these cases, averages would hide the problem. Here’s an example.

Scenario
Over a period of time, an online checkout application experienced 5,000 Total Checkouts. Of those checkouts, 4,800 were Normal Checkouts and 200 were Slow Checkouts.

In cases like these, it is very likely that the response times of the normal and slow checkouts, taken together, average out to a value that looks perfectly normal. But the bottom line is that 200 users had a bad checkout, and that needs to be fixed. By focusing on business transactions only and reducing the points of monitoring, you are actually able to identify and address more performance concerns than you might have otherwise.
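To put rough numbers on the scenario, here is a minimal Python sketch. The post only gives the counts (4,800 normal and 200 slow checkouts), so the response times and the alert threshold below are assumptions picked purely for illustration.

```python
# Counts come from the scenario; every timing below is an assumed, illustrative value.
NORMAL_MS = 200            # assumed response time of a normal checkout
SLOW_MS = 3000             # assumed response time of a slow checkout
ALERT_THRESHOLD_MS = 1000  # assumed threshold for "too slow"

response_times = [NORMAL_MS] * 4800 + [SLOW_MS] * 200

# What an average-based alert sees.
average_ms = sum(response_times) / len(response_times)
average_alert_fires = average_ms > ALERT_THRESHOLD_MS

# What you see when every request is checked against the threshold.
slow_count = sum(1 for t in response_times if t > ALERT_THRESHOLD_MS)

print(f"average response time: {average_ms:.0f} ms")
print(f"average-based alert fires: {average_alert_fires}")
print(f"slow checkouts: {slow_count}")
```

With these assumed timings the average comes out at 312 ms, far below the threshold, so an alert on the average stays quiet even though 200 customers waited three seconds.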

So what exactly are we watching here?

a) Response time for a transaction – this is averaged over all requests for this transaction. We don’t need to watch the key methods being executed in the request since the overall response time for the request is the single indicator. A method-level drill down is needed only when the response time is slow.

b) Number of slow requests over time – If we set up thresholds over the transaction and watch every request, we are able to identify exactly how many requests were outliers. Of course, for this to be useful, the system needs to be able to collect diagnostic information as slow requests happen and not afterwards. This also ensures that when the request is being serviced by a problematic node, it is represented appropriately.
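As a rough illustration of point (b), the sketch below checks every request against a threshold and counts the slow ones per minute and per node. It is not how AppDynamics implements this; the `Request` record, the `slow_request_report` helper, the node names, and all timings are hypothetical and exist only for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Request:
    timestamp_s: int    # seconds since the start of the observation window
    node: str           # hypothetical node name
    response_ms: float  # measured response time


def slow_request_report(requests, threshold_ms):
    """Count requests slower than threshold_ms per one-minute bucket and per node."""
    per_minute = Counter()
    per_node = Counter()
    for r in requests:
        if r.response_ms > threshold_ms:
            per_minute[r.timestamp_s // 60] += 1
            per_node[r.node] += 1
    return per_minute, per_node


# Synthetic traffic: 50 nodes, one request per node per second for two minutes.
# Assumption: node-07 serves a slow request every tenth second; all else is fast.
requests = []
for t in range(120):
    for n in range(50):
        node = f"node-{n:02d}"
        is_slow = node == "node-07" and t % 10 == 0
        requests.append(Request(t, node, 2500.0 if is_slow else 150.0))

per_minute, per_node = slow_request_report(requests, threshold_ms=1000.0)
overall_avg = sum(r.response_ms for r in requests) / len(requests)

print(f"overall average: {overall_avg:.1f} ms")
print(f"slow requests per minute: {dict(per_minute)}")
print(f"slow requests per node: {dict(per_node)}")
```

In this synthetic run the overall average barely moves off 150 ms, while the per-request counts point straight at the one problematic node.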

We’re not the only ones who believe that monitoring Business Transactions is critical to better performance in production. The customers and prospects we speak to are serious about having a system in place that can watch all requests instead of just watching averages in their system. It’s becoming a real-world demand with real-world benefits.

The last thing I want to mention before signing off is that AppDynamics does all of the above. We follow all requests to monitor SLAs (not just averages, but actual numbers), and we do so at extremely low overhead, enabling us to jump in and get diagnostic data as bad requests happen!

That wasn’t too much of a plug, right? Until next time…
