Archive for March 2010

Jyoti Bansal

Monitoring versus Management

Many people are confused by the terms “monitoring” and “management.”  This is because only application monitoring solutions have typically been available in the marketplace, so that’s what people are used to seeing.  If a solution comes along that calls itself “management,” buyers don’t necessarily know the difference.

AppDynamics has embraced the APM (Application Performance Management) label, but we’re aware that people may consider us a monitoring solution. And that’s fine: if monitoring is what they’re looking for, they’ll get it, plus the bonus package.

So what’s the difference?

A monitoring solution collects data points from hardware/software systems and displays them. If you’re monitoring 100 systems and each system has 1,000 metrics, the tool will collect and display those 100,000 metrics. Then it’s up to the user to look through the data manually, work out whether problems are occurring, and determine what the root cause might be. It’s possible to set up alerts that send notifications when a certain metric crosses a threshold.
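To make the monitoring model concrete, here is a minimal Python sketch of static threshold alerting. It assumes metrics arrive as a simple name-to-value dictionary; the `ThresholdRule` class, the metric names, and the limits are all hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass


@dataclass
class ThresholdRule:
    """Hypothetical static alert rule: fire when a metric exceeds a fixed limit."""
    metric: str
    limit: float


def check_thresholds(metrics: dict[str, float], rules: list[ThresholdRule]) -> list[str]:
    """Return one alert message per rule whose metric is above its limit.

    Metrics without a matching rule are collected but never examined,
    and a firing rule says nothing about cause or remedy.
    """
    alerts = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is not None and value > rule.limit:
            alerts.append(f"ALERT {rule.metric}={value} exceeds {rule.limit}")
    return alerts


if __name__ == "__main__":
    # Hypothetical snapshot from one monitored server.
    snapshot = {"cpu_percent": 93.0, "heap_used_mb": 1800.0, "response_ms": 240.0}
    rules = [ThresholdRule("cpu_percent", 90.0), ThresholdRule("response_ms", 500.0)]
    for alert in check_thresholds(snapshot, rules):
        print(alert)
```

The alert tells the operator that CPU is high, but not why, whether users are affected, or what to do next; that interpretation is still left to a person.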

In short, you get a lot of data. But you have to piece it together yourself, figure out what story it’s trying to tell, and then take your own action. It’s useful, but only as useful as a ride from a friend who drops you off a mile from your destination and leaves you to walk the rest of the way.

Tech blogger Lori MacVittie writes on the subject:

“For a very long time now APM (Application Performance Management) has been a misnomer. It’s always really been application performance monitoring, with very little management occurring outside of triggering manual processes requiring the attention of operators and developers…it has rarely been the case that APM solutions have really been about, well, managing application performance. Certainly they’ve attempted to provide the data necessary to manually manage applications and do an excellent job of correlation and even in some cases deep trouble-shooting of root-cause performance problems. But real-time, dynamic, on-demand performance management? Not so much.”

A performance management solution closes that gap. It collects data points from hardware/software systems, analyzes them intelligently, and proactively identifies problems before they affect business users. It identifies the level and nature of the business impact, pinpoints the root cause of the problem for rapid resolution, and simplifies or even automates remediation.

For example, let’s say your application needs extra resources during peak loads.  A true APM solution will know your application’s historical performance, and it will know when to provision additional resources.  In this instance, the tool is acting on your behalf, helping manage the application without manual intervention.
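As a rough illustration of provisioning from historical performance, the Python sketch below forecasts the next hour’s load as the highest peak previously seen in that hour, then works out how many extra machines to add. The per-machine capacity, the 20% headroom, and the naive “highest past peak” forecast are all assumptions for illustration; a real tool would use a better forecasting model and call an actual provisioning API, both deliberately left out here.

```python
import math


def machines_needed(expected_rps: float, rps_per_machine: float, headroom: float = 0.2) -> int:
    """Machines required to serve expected_rps with a safety margin.

    headroom=0.2 (an assumed default) means provisioning for 20% more load than expected.
    """
    if rps_per_machine <= 0:
        raise ValueError("rps_per_machine must be positive")
    return math.ceil(expected_rps * (1 + headroom) / rps_per_machine)


def plan_capacity(history: dict[int, list[float]], hour: int,
                  rps_per_machine: float, current_machines: int) -> int:
    """Return how many extra machines to add before the given hour of day.

    history maps hour-of-day (0-23) to past peak request rates seen in that
    hour. The forecast is simply the highest past peak (a hypothetical rule).
    """
    past_peaks = history.get(hour, [])
    if not past_peaks:
        return 0  # no history for this hour, so nothing to act on
    forecast = max(past_peaks)
    target = machines_needed(forecast, rps_per_machine)
    return max(0, target - current_machines)


if __name__ == "__main__":
    # Hypothetical noon peaks (requests/second) from previous days.
    history = {12: [800.0, 950.0, 1000.0]}
    extra = plan_capacity(history, hour=12, rps_per_machine=90.0, current_machines=10)
    print(f"provision {extra} extra machines before noon")
```

The point of the sketch is the decision, not the arithmetic: the tool acts on history before the peak arrives, instead of alerting after response times have already degraded.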

Application management also helps by warning users ahead of time that something bad is about to happen, allowing proactive remediation. For example, by knowing exactly when performance deviates from the standard baseline, it can alert the user when memory leaks threaten to bring down a machine. That allows the application owner to create a workflow that directs traffic away from the machine, restarts the machine, and then brings traffic back to it.
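The baseline-plus-workflow idea can be sketched in Python as follows. “Deviation from baseline” is modeled here as a simple z-score test against the mean and standard deviation of past samples, which is only one of many possible definitions and not necessarily what any APM product does. The `drain`, `restart`, and `restore` callables are hypothetical hooks into a load balancer and process manager.

```python
import statistics
from typing import Callable


def deviates_from_baseline(history: list[float], latest: float, k: float = 3.0) -> bool:
    """True if latest is more than k standard deviations above the historical mean."""
    if len(history) < 2:
        return False  # not enough samples to establish a baseline
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest > mean  # flat baseline: any increase counts as a deviation
    return (latest - mean) / stdev > k


def remediate(host: str,
              drain: Callable[[str], None],
              restart: Callable[[str], None],
              restore: Callable[[str], None]) -> None:
    """Move traffic away from host, restart it, then send traffic back.

    The order matters: draining first means no user request is routed
    to a machine while it is restarting.
    """
    drain(host)
    restart(host)
    restore(host)


if __name__ == "__main__":
    steps: list[str] = []
    heap_mb = [510.0, 495.0, 505.0, 500.0, 490.0]  # hypothetical normal heap samples
    if deviates_from_baseline(heap_mb, 900.0):  # a leak pushes usage far above baseline
        remediate("app-01",
                  drain=lambda h: steps.append(f"drain {h}"),
                  restart=lambda h: steps.append(f"restart {h}"),
                  restore=lambda h: steps.append(f"restore {h}"))
    print(steps)
```

In practice the hooks would call real infrastructure; passing them in as functions keeps the detection logic separate from whatever load balancer or orchestration tool an organization happens to use.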

Knowing where dangers lie ahead of time–and having a clear path to not only root cause analysis, but root cause problem resolution–is a far cry from a typical monitoring solution, which flashes an array of confusing alerts when it’s already too late to avoid hitting the iceberg.

There’s nothing wrong with monitoring.  But until it’s matched with true performance management, it’s going to offer extremely limited utility to the application owner.


Jyoti Bansal

The Three Faces of the Hybrid Cloud

Most people, thankfully, have acknowledged that the cloud is real. For groups such as development and testing, it’s not only real but highly successful; we’re seeing our customers and prospects adopt public clouds, or build and deploy private clouds, at a much more rapid pace than even a year ago. The real question is this: when will companies start deploying their complex production environments in the cloud?

Some pundits believe that enterprises will ditch their data centers and move their complex production applications completely into the cloud. But this is not pragmatic. We have talked to hundreds of enterprises and the people responsible for their complex production applications, and from these conversations we have gained a sense of the overarching strategy that many companies are adopting. What we are seeing is a recurring scenario in which enterprises do move production environments to the cloud, but through a physical-cloud “hybrid” approach rather than a “pure cloud” approach.

Here are the hybrid approaches that I believe we will begin to see in abundance:

1. Hybrid Physical-Cloud SOA: The business decides that it needs quick time-to-market for some new functionality, but IT folks can’t procure new machines quickly enough to meet the deadline. The business groups then decide to use the cloud as capacity for the new functionality. These new services in the cloud communicate with the existing services and functionality in the data center.

2. Cloud Bursting: Companies use the cloud for temporary capacity in case of a spike. For example, let’s say an eCommerce greeting card company needs 100 machines on Mother’s Day but doesn’t want to buy them, so it gets that capacity temporarily in the cloud.

3. Cloud Failover: Companies use the cloud for failover and disaster recovery.

We are definitely seeing companies begin to move services to the cloud, but they don’t want to risk critical production apps responsible for $2 billion in revenue. For now, hybrid is the way to go.

But only for now…


The debate over who owns an application in production continues to unfold. What I’ve found interesting is that, after a period in which writers and bloggers looked to IT Operations as application owners, I’ve started to detect a bit of a backlash.

The argument for IT Operations owning the application is simple:

  • Development should stay focused on creating new applications and adding features, not maintaining what’s already in production
  • If you ask developers, they’ll always say they’d rather spend their time on innovation than on production support or bug fixes
  • IT Operations generally oversees the production infrastructure (servers, storage, networks, etc.), so it is the natural caretaker of production applications as well

Analysts often point out that this is easier said than done: what is sometimes called the “required cooperation” between operations and development is often difficult to obtain. I’ve also seen it suggested that putting production applications in the hands of operations is a utopian dream that leads to performance issues and SLA violations.

If I were to describe my own view of “natural selection” with regard to managing application performance, it would lean toward collaboration. The development team is likely to help support applications as long as IT Operations can provide the information and data they need to fix root-cause issues quickly. Because much development is now done in agile environments, this sort of teamwork is becoming less a philosophical choice and more a business necessity.

If you look through blogs and Twitter, you’ll find some interesting grassroots movements such as #devops and #agileoperations. These communities acknowledge the need to break down the traditional walls between Dev and Ops and to radically restructure those relationships around shared goals and outcomes.

One devops proponent, James Turnbull at Kartar.net, explains the problem:

“So … why should we merge or bring together the two realms?  Well there are lots of reasons but first and foremost because what we’re doing now is broken.  Really, really broken.  In many shops the relationship between development (or engineering) and operations is dysfunctional to the point of occasional toxicity.”

(I love the phrase “occasional toxicity”…)

He goes on to add:

“DevOps is all about trying to avoid that epic failure and working smarter and more efficiently at the same time. It is a framework of ideas and principles designed to foster cooperation, learning and coordination between development and operational groups. In a DevOps environment, developers and sysadmins build relationships, processes, and tools that allow them to better interact and ultimately better service the customer.

“DevOps is also more than just software deployment – it’s a whole new way of thinking about cooperation and coordination between the people who make the software and the people who run it.  Areas like automation, monitoring, capacity planning & performance, backup & recovery, security, networking and provisioning can all benefit from using a DevOps model to enhance the nature and quality of interactions between development and operations teams.”

The question that comes naturally out of these conversations, I believe, is this: does IT Operations have the tools it needs to facilitate this collaboration with its peers in development? Traditionally, it hasn’t. The tools either provided little deep visibility into the application or, when they did provide deep visibility, were too complicated for IT Operations to understand and use. Creating those tools, and encouraging that collaboration, is one of my own company’s guiding principles.

Applications are becoming more complex and distributed, and development increasingly takes place in the context of agile release cycles. So the real question isn’t “who owns the app?” but how best to foster the collaborative process that enables dev and ops both to build out applications and to resolve their performance problems, and to do so in record time.
