TAG | application monitoring

On Wednesday I delivered a keynote at WJAX in Munich. Everything went really well, but I was a little shocked by the response when I asked the audience, “How many of you monitor the performance of your apps in production?” As I scanned the room, I counted 9 raised hands out of roughly 950 developers, meaning only about 1% had visibility of how their applications actually performed in production. I know what you’re thinking: “But isn’t application performance in production the responsibility of Operations?” Well, it is and it isn’t. Most organizations assume that when an application has an issue, it’s related to the infrastructure it runs on. That’s like saying that when a car crashes, it’s because a part on the car failed, when in fact most accidents are caused by the driver. Yes, hardware fails occasionally, but application logic and configuration drive how infrastructure resources are used, which is why most issues today occur when new code is deployed in production.

The majority of us in IT are specialists, with the exception of a few VPs of engineering who are “special” in their own “special” world of being “special.” What I mean is that no single person has the skills or experience to do everything well in IT. IT is too big to explain or summarize in a few words, other than to say it takes a lot of different people with different skills to keep it ticking along. Despite applications being the living, breathing entities of the business, a large portion of folk in IT have little understanding of how applications are built, how they execute, and how they consume resources across the IT infrastructure. Many people simply don’t care, because their responsibilities have nothing to do with applications. That’s fine, but the reality is that everyone in IT should keep one eye on the business. The whole reason IT exists is to make the business more competitive and more profitable. When that happens, IT gets more budget and is allowed to innovate more. IT and the business need each other to survive, which is why, when applications slow down or break, both parties bitch at each other.

Operations need better visibility

Unfortunately for both the business and IT, the people (Operations) who manage the performance and availability of applications in production aren’t application experts. They aren’t stupid, either; their skill sets span the many technologies and platforms that underpin applications. They manage a lot of things that application developers take for granted, like networks, databases, storage and virtualization. While Operations monitor the health of these infrastructure components, they often get bombarded with crap from the business when end users and business transactions are hit by slow performance, even though all their system monitoring shows everything is fine. This lack of understanding between the Business and Operations arises because the two parties see things from different perspectives.

What happens when mission-critical Java applications slow down or keep crashing in production? The vast majority of IT Operations (Ops) teams today bury their heads in log files. Why? Because that’s what they’ve been doing since IBM invented the mainframe. Diving into the weeds feels good; everyone feels productive looking at log entries, hoping that one will eventually explain the unexplainable. IT Ops may also look at system and network metrics, which tell them how server resources and network bandwidth are being consumed. Again, looking at lots of metrics feels good, but what is causing those server and network metrics to change in the first place? Answer: the application.

IT Ops monitor the infrastructure that applications run on, but they lack visibility into how applications actually work and use that infrastructure. To get this visibility, Ops must monitor the application runtime. A quick way to get started is to use the free tools that come with the runtime. In the case of Java applications, both JConsole and VisualVM ship with the standard JDK and have proved popular choices for monitoring Java applications. When we built AppDynamics Lite, we felt there was a void of free application monitoring solutions for IT Ops; the market had plenty of tools aimed at developers, but many were just too verbose and intrusive for IT Ops to use in production. If we look at how JConsole, VisualVM and AppDynamics Lite compare, we’ll see just how different free application monitoring solutions can be.
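JConsole and VisualVM get much of what they display from the JVM’s built-in platform MXBeans, exposed over JMX. As a minimal sketch of what that runtime data looks like, the Java program below reads a few of the same kinds of counters (heap usage, live threads, garbage collection counts) in-process through `java.lang.management`. The class name `RuntimeSnapshot` is just for illustration, and this is not a description of how AppDynamics Lite collects its data.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public final class RuntimeSnapshot {
    // Bytes of heap currently in use, from the platform MemoryMXBean
    static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    // Number of live threads (daemon and non-daemon) in this JVM
    static int liveThreadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    // Total collections across all garbage collectors.
    // getCollectionCount() returns -1 when the count is undefined, so only positive counts are added.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("heap used: " + heapUsedBytes() + " bytes");
        System.out.println("live threads: " + liveThreadCount());
        System.out.println("GC collections so far: " + totalGcCount());
    }
}
```

The tools add remote JMX connections, history and charts on top, but the raw numbers come from beans like these, which describe the runtime rather than the individual business transactions flowing through it.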

Steve Waterworth

Agent Intelligence

How intelligent is your monitoring agent?

The agent should do as little processing locally as possible, so that its CPU and memory footprint, and therefore its impact on application performance, stays minimal. On the other hand, offloading some processing to the agent results in less network traffic and better scalability for the monitoring management server.
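One way to strike this balance is for the agent to keep small, fixed-size aggregates per transaction and ship a summary once per reporting interval, instead of one message per call. The Java sketch below is a hypothetical illustration of that idea; the `AgentAggregator` class and its `record`/`drain` methods are invented for this example and are not any vendor’s agent API. Recording a call costs a map lookup and a few field updates, and the data sent over the network scales with the number of distinct transactions rather than with call volume.

```java
import java.util.HashMap;
import java.util.Map;

public final class AgentAggregator {
    // Fixed-size running statistics for one transaction name
    static final class Stats {
        long count;
        long totalMillis;
        long maxMillis;

        double averageMillis() {
            return count == 0 ? 0.0 : (double) totalMillis / count;
        }
    }

    private Map<String, Stats> current = new HashMap<>();

    // Called on the application thread: a map lookup and three field updates, no I/O
    synchronized void record(String transaction, long durationMillis) {
        Stats s = current.computeIfAbsent(transaction, k -> new Stats());
        s.count++;
        s.totalMillis += durationMillis;
        s.maxMillis = Math.max(s.maxMillis, durationMillis);
    }

    // Called once per reporting interval: hands back the accumulated stats
    // (what would be sent to the management server) and starts a fresh window
    synchronized Map<String, Stats> drain() {
        Map<String, Stats> snapshot = current;
        current = new HashMap<>();
        return snapshot;
    }

    public static void main(String[] args) {
        AgentAggregator agent = new AgentAggregator();
        agent.record("Checkout", 120);
        agent.record("Checkout", 80);
        agent.record("Login", 35);
        // Three recorded calls become two summaries
        agent.drain().forEach((name, s) -> System.out.printf(
                "%s count=%d avg=%.1fms max=%dms%n", name, s.count, s.averageMillis(), s.maxMillis));
    }
}
```

A production agent would probably avoid a single lock on the application’s hot path; `synchronized` keeps the sketch short and easy to verify.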

One of our partners recently outlined a logical framework for evaluating Application Performance Management (APM) vendors to help prospects and customers choose the right APM solution for their needs.

Below is an extract along with a link to the full article:

“We work with a lot of customers helping them with performance and scalability of their mission-critical web applications. One of the common requests we hear, is to have a logical framework around how to evaluate the different commercial Application Performance Management products in the market. We have a very easy strategy with our tools and partners at codecentric: We are always partner with the best available solution in the market! We have been a partner with Quest Software (PerformaSure, Foglight), then Wily (Introscope) until they were acquired by CA, then dynaTrace and now AppDynamics is our main partner for APM solutions. We have a lot of knowledge on all of these tools and competition and try to be as neutral a possible. With this article I will try to give my view of how to compare APM solutions for production environments and also my view on how some of the available solutions compare.”

Read the full post by  at Code-Centric:

http://blog.codecentric.de/en/comparing-appdynamics-vs-dynatrace-ca-wily-precise-and-hp/

The Amazon AWS outage has raised questions about whether AWS (and the cloud in general) is ready to host revenue-critical production applications. For many popular sites like Reddit and Zuora, the outage lasted more than a day, and it cast doubt on cloud computing as a whole.

But before we write off the cloud, let’s review a few lessons we can learn from this outage.

Some survived, many did not
The number one lesson is that not EVERY application running in AWS died. Netflix, one of the biggest web apps running in AWS, survived the outage without any issues, while sites like Reddit and Zuora were down for more than a day. So why did some survive when many did not? Simply because many of these companies forgot that the cloud is not a magical solution to everything: as you move to the cloud, you still have to implement the architectural techniques that have been perfected over years in the physical data center world.
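One of those data center techniques is redundancy with failover: run copies of a service in more than one failure domain (in AWS terms, more than one Availability Zone or Region) and route around the one that goes down. The Java sketch below shows only the client-side half of that idea under simplifying assumptions. Each `Supplier` is a hypothetical stand-in for a call to a replica in a different zone, and nothing here describes how Netflix or any other company actually built its systems.

```java
import java.util.List;
import java.util.function.Supplier;

public final class ZoneFailover {
    // Tries each replica in order and returns the first successful response.
    // Each Supplier is a hypothetical stand-in for a call to a copy of the service in another zone.
    static <T> T callWithFailover(List<Supplier<T>> replicas) {
        RuntimeException last = null;
        for (Supplier<T> replica : replicas) {
            try {
                return replica.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and fall through to the next zone
            }
        }
        throw new IllegalStateException("all replicas failed", last);
    }

    public static void main(String[] args) {
        Supplier<String> zoneA = () -> { throw new RuntimeException("zone A unavailable"); };
        Supplier<String> zoneB = () -> "served from zone B";
        System.out.println(callWithFailover(List.of(zoneA, zoneB))); // prints "served from zone B"
    }
}
```

Failover only helps if the second zone really is independent and has the capacity to absorb the load, which is exactly the kind of architectural planning the cloud does not do for you.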

Greg Howard

The Power of the Business Transaction

In our last post, we talked about the importance of business transactions for applications in the cloud. They’re also crucial for managing highly distributed applications. But what is a business transaction?
Consider a business transaction to be a user-generated action within your system. The best practice for determining the performance of your application isn’t to measure CPU usage, but to track the flow of a transaction that your customer, the end user, has requested.
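To make the idea concrete, here is a hypothetical Java sketch of tracking one business transaction end to end. The trace is named after the user’s action (“Checkout”), carries a correlation ID that would be passed along with each downstream call, and collects the time spent in each tier. The class and method names are invented for illustration, and the sketch assumes each segment records only the time spent in that tier itself, excluding its downstream calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public final class BusinessTransactionTrace {
    // One hop of the transaction through a tier; durationMillis excludes time spent in downstream calls
    static final class Segment {
        final String tier;
        final long durationMillis;

        Segment(String tier, long durationMillis) {
            this.tier = tier;
            this.durationMillis = durationMillis;
        }
    }

    final String name;          // the user-facing action, e.g. "Checkout"
    final String correlationId; // sent with each downstream call so segments can be stitched together
    final List<Segment> segments = new ArrayList<>();

    BusinessTransactionTrace(String name) {
        this.name = name;
        this.correlationId = UUID.randomUUID().toString();
    }

    void addSegment(String tier, long durationMillis) {
        segments.add(new Segment(tier, durationMillis));
    }

    // Since each segment holds only its own tier's time, the end-to-end time is their sum
    long totalMillis() {
        return segments.stream().mapToLong(s -> s.durationMillis).sum();
    }

    // The tier that contributed the most time: where to start looking when this transaction is slow
    String slowestTier() {
        Segment worst = null;
        for (Segment s : segments) {
            if (worst == null || s.durationMillis > worst.durationMillis) {
                worst = s;
            }
        }
        return worst == null ? null : worst.tier;
    }

    public static void main(String[] args) {
        BusinessTransactionTrace checkout = new BusinessTransactionTrace("Checkout");
        checkout.addSegment("web", 40);
        checkout.addSegment("order-service", 25);
        checkout.addSegment("database", 310);
        System.out.println(checkout.name + " [" + checkout.correlationId + "] took "
                + checkout.totalMillis() + " ms; slowest tier: " + checkout.slowestTier());
    }
}
```

The question this answers is “which tier made Checkout slow for the user?” rather than “which server has high CPU?”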

Greg Howard

15,000 Downloaders Can’t Be Wrong

It’s 2011, and AppDynamics got exactly what we wanted as a holiday gift: 15,000 downloads of our free Java troubleshooting tool, AppDynamics Lite.

To commemorate the 15,000th downloader, we emailed all of our Lite users and offered to send one of our patented “I Take No Crap From My App” T-shirts in exchange for a public screenshot of their installation in action.

Greg Howard

Gartner on Application Uptime & Availability

Attendees of the 2010 Gartner Data Center Conference exhibited a lot of interest around Application Performance Management, as evidenced by the packed room where Gartner analyst Donna Scott discussed application availability in her session “Uptime All the Time.”

Here’s a guest post from one of AppDynamics’ international partners, Stefan Zoltai from sysPerform. Stefan wanted to write about how he used AppDynamics to solve a performance problem for a major telecom company in Switzerland, and we said, sure! Take it away, Stefan…

I’d like to talk about how we used AppDynamics for a major production troubleshooting exercise, and how AppDynamics passed with flying colors.

Swisscom is the leading telecommunications company in Switzerland with about 5.7 million mobile customers and 1.8 million broadband connections. Swisscom is present on the Swiss market with a full portfolio of wireless, wire- and IP-based data and voice-based communication services.

Swisscom’s (Internet) Messaging had engaged sysPerform to assist with the analysis of their Tomcat 6 / Java 1.6-based WebMail application. WebMail has been under scrutiny for about a year now, ever since it began showing both performance and stability problems. Earlier analysis efforts, conducted with a number of available tools, had not identified the actual root cause(s), because the problems only occurred in production under load and could not be reproduced in other environments. WebMail is rated at a throughput of 300 tx/sec.

We realized immediately that without a deep, detailed view into the application’s runtime, in production and under load, we would not be able to determine the actual root cause.

To analyze the application, we selected AppDynamics’ application performance management solution. Because this solution was developed specifically for high-throughput, distributed production environments, it let us obtain a high-level overview of the application and also conduct deep root cause analysis down to code-level execution, without generating measurable overhead. Again, we did all of this at 300 transactions per second.

Thanks to AppDynamics’ ability to create a dynamic baseline of application performance, we isolated the major bottlenecks on the first day and discussed a solution with the developers at Swisscom. We quickly learned the application’s performance and stability characteristics, and after only 5 days of development, we deployed a specific, major fix that addressed the main issue and massively improved performance. We are continuing our analysis, since stability and performance are the focus of an ongoing quality process.
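A dynamic baseline learns what “normal” looks like from recent behavior instead of relying on a fixed, hand-set threshold. The Java sketch below uses one simple, generic approach: a rolling mean plus `k` standard deviations over the last `windowSize` response times. It is an illustration under that assumption, not AppDynamics’ actual baselining algorithm, and the class name and parameters are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public final class DynamicBaseline {
    private final int windowSize; // how many recent samples form the baseline
    private final double k;       // how many standard deviations above the mean count as abnormal
    private final Deque<Double> window = new ArrayDeque<>();

    DynamicBaseline(int windowSize, double k) {
        if (windowSize < 2) {
            throw new IllegalArgumentException("windowSize must be at least 2");
        }
        this.windowSize = windowSize;
        this.k = k;
    }

    // Checks the sample against the current baseline, then adds it to the rolling window
    boolean isAnomalous(double responseMillis) {
        boolean anomalous = false;
        if (window.size() >= 2) { // need at least two samples before judging anything
            double mean = mean();
            anomalous = responseMillis > mean + k * stdDev(mean);
        }
        window.addLast(responseMillis);
        if (window.size() > windowSize) {
            window.removeFirst(); // drop the oldest sample so the baseline tracks recent behavior
        }
        return anomalous;
    }

    private double mean() {
        double sum = 0;
        for (double v : window) {
            sum += v;
        }
        return sum / window.size();
    }

    // Population standard deviation of the samples in the window
    private double stdDev(double mean) {
        double squares = 0;
        for (double v : window) {
            squares += (v - mean) * (v - mean);
        }
        return Math.sqrt(squares / window.size());
    }

    public static void main(String[] args) {
        DynamicBaseline baseline = new DynamicBaseline(50, 3.0);
        double[] samples = {100, 110, 90, 105, 95, 400, 101};
        for (double ms : samples) {
            // only the 400 ms sample is flagged
            System.out.println(ms + " ms -> " + (baseline.isAnomalous(ms) ? "ABNORMAL" : "normal"));
        }
    }
}
```

Because flagged samples still enter the window, a sustained slowdown gradually becomes the new baseline; whether that is desirable is a design decision a real implementation would need to make deliberately.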

[UPDATE: For Swisscom's perspective on the use of AppDynamics, check out Mika Borner's blog]

This example clearly demonstrates that operating a modern, distributed application without an adequate monitoring solution is effectively “flying blind.” 60%-80% of all performance problems are caused by the application itself and need to be analyzed from the inside out. We can confirm these numbers from many other engagements with similar customers. External causes like hardware or network issues have become increasingly rare; it’s the problems deep inside the application that truly matter.

Intelligent application performance management, however, is not an end in itself; it must also be evaluated in economic terms. Our experience indicates that an APM solution delivers ROI within just a few months. One of the reasons for such a quick ROI is the extremely fast root cause analysis described above.

If you’re reading this in Switzerland, feel free to contact me with questions!

– Stefan Zoltai, Founder, SysPerform GmbH

Email: sz@sysperform.ch

Twitter: http://twitter.com/SysPerform
