TAG | apm

Last week I flew into Las Vegas for #Interop fully suited and booted in my big blue costume (no joke). I’d been invited to speak in a vendor debate on User eXperience (UX): Monitor the Application or the Network? NetScout represented the Network, AppDynamics (and me) represented the Application, and “Compuware dynaTrace Gomez” sat on the fence representing both. Moderating was Jim Frey from EMA, who did a great job introducing the subject, asking the questions and keeping the debate flowing.

At the start each vendor gave their usual intro and company pitch, followed by their own definition on what User Experience is.

Defining User Experience

So at this point you’d probably expect me to blabber on about how application code and agents are critical for monitoring the UX? Wrong. For me, users experience “Business Transactions”–they don’t experience applications, infrastructure, or networks. When a user complains, they normally say something like “I can’t Login” or “My checkout timed out.” I can honestly say I’ve never heard them say –  ”The CPU utilization on your machine is too high” or “I don’t think you have enough memory allocated.”

Now think about that from a monitoring perspective. Do most organizations today monitor business transactions? Or do they monitor application infrastructure and networks? The truth is the latter, normally with several toolsets. So the question “Monitor the Application or the Network?” is really the wrong question for me. Unless you monitor business transactions, you are never going to understand what your end users actually experience.

Monitoring Business Transactions

So how do you monitor business transactions? The reality is that both Application and Network monitoring tools are capable, but most solutions have been designed not to–just so they provide a more technical view for application developers and network engineers. This is wrong, very wrong and a primary reason why IT never sees what the end user sees or complains about. Today, SOA means applications are more complex and distributed, meaning a single business transaction could traverse multiple applications that potentially share services and infrastructure. If your monitoring solution doesn’t have business transaction context, you’re basically blind to how application infrastructure is impacting your UX.

The debate then switched to how monitoring the UX differs from an application and network perspective. Simply put, application monitoring relies on agents, while network monitoring relies on sniffing network traffic passively. My point here was that you can either monitor user experience with the network or you can manage it with the application. For example, with network monitoring you only see business transactions and the application infrastructure, because you’re monitoring at the network layer. In contrast, with application monitoring you see business transactions, application infrastructure, and the application logic (hence why it’s called application monitoring).

Monitor or Manage the UX?

Both application and network monitoring can identify and isolate UX degradation, because they see how a business transaction executes across the application infrastructure. However, you can only manage UX if you can understand what’s causing the degradation. To do this you need deep visibility into the application run-time and logic (code). Operations telling a Development team that their JVM is responsible for a user experience issue is a bit like Fedex telling a customer their package is lost somewhere in Alaska. Identifying and Isolating pain is useful, but one could argue it’s pointless without being able to manage and resolve the pain (through finding the root cause).

Netscout made the point that with network monitoring you can identify common bottlenecks in the network that are responsible for degrading the UX. I have no doubt you could, but if you look at the most common reason for UX issues, it’s related to change–and if you look at what changes the most, it’s application logic. Why? Because Development and Operations teams want to be agile, so their applications and business remains competitive in the marketplace. Agile release cycles means application logic (code) constantly changes. It’s therefore not unusual for an application to change several times a week, and that’s before you count hotfixes and patches. So if applications change more than the network, then one could argue it’s more effective for monitoring and managing the end user experience.

UX and Web Applications

We then debated which monitoring concept was better for web-based applications. Obviously, network monitoring is able to monitor the UX by sniffing HTTP packets passively, so it’s possible to get granular visibility on QoS in the network and application. However, the recent adoption of Web 2.0 technologies (ajax, GWT, Dojo) means application logic is now moving from the application server to the users browser. This means browser processing time becomes a critical part of the UX. Unfortunately, Network monitoring solutions can’t monitor browser processing latency (because they monitor the network), unlike application monitoring solutions that can use techniques like client-side instrumentation or web-page injection to obtain browser latency for the UX.

The C Word

We then got to the Cloud and which made more sense for monitoring UX. Well, network monitoring solutions are normally hardware appliances which plug direct into a network tap or span port. I’ve never asked, but I’d imagine the guys in Seattle (Amazon) and Redmond (Windows Azure) probably wouldn’t let you wheel a network monitoring appliance into their data-centre. More importantly, why would you need to if you’re already paying someone else to manage your infrastructure and network for you? Moving to the Cloud is about agility, and letting someone else deal with the hardware and pipes so you can focus on making your application and business competitive. It’s actually very easy for application monitoring solutions to monitor UX in the cloud. Agents can piggy back with application code libraries when they’re deployed to the cloud, or cloud providers can embed and provision vendor agents as part of their server builds and provisioning process.

What’s interesting also is that Cloud is highlighting a trend towards DevOps (or NoOps for a few organizations) where Operations become more focused on applications vs infrastructure. As the network and infrastructure becomes abstracted in the Public Cloud, then the focus naturally shifts to the application and deployment of code. For private clouds you’ll still have network Ops and Engineering teams that build and support the Cloud platform, but they wouldn’t be the people who care about user experience. Those people would be the Line of Business or application owners which the UX impacts.

In reality most organizations today already monitor the application infrastructure and network. However, if you want to start monitoring the true UX, you should monitor what your users experience, and that is business transactions. If you can’t see your users’ business transactions, you can’t manage their experience.

What are your thoughts on this?

AppDynamics is an application monitoring solution that helps you monitor business transactions and manage the true user experience. To get started sign-up for a 30-day free trial here.

I did have an hour spare at #Interop after my debate to meet and greet our competitors, before flying back to AppDynamics HQ. It was nice to see many of them meet and greet the APM Caped Crusader.

App Man.

Link to this post:

, , , , , , , , , , , , ,

The most enjoyable part of my job at AppDynamics is to witness and evangelize customer success. What’s slightly strange is that for this to happen, an application has to slow down or crash.

It’s a bittersweet feeling when End Users, Operations, Developers and many Businesses suffer application performance pain. Outages cost the business money, but sometimes they cost people their jobs–which is truly unfortunate. However, when people solve performance issues, they become overnight heroes with a great sense of achievement, pride, and obviously relief.

To explain the complexity of managing application performance, imagine your application is 100 haystacks that represent tiers, and somewhere a needle is hurting your end user experience. It’s your job to find the needle as quickly as possible! The problem is, each haystack has over half a million pieces of hay, and they each represent lines of code in your application. It’s therefore no surprise that organizations can take days or weeks to find the root cause of performance issues in large, complex, distributed production environments.

End User Experience Monitoring, Application Mapping and Transaction profiling will help you identify unhappy users, slow business transactions, and problematic haystacks (tiers) in your application, but they won’t find needles. To do this, you’ll need x-ray visibility inside haystacks to see which pieces of hay (lines of code) are holding the needle (root cause) that is hurting your end users. This X-Ray visibility is known as “Deep Diagnostics” in application monitoring terms, and it represents the difference between isolating performance issues and resolving them.

For example, AppDynamics has great End User Monitoring, Business Transaction Monitoring, Application Flow Maps and very cool analytics all integrated into a single product. They all look and sound great (honestly they do), but they only identify and isolate performance issues to an application tier. This is largely what Business Transaction Management (BTM) and Network Performance Management (NPM) solutions do today. They’ll tell you what and where a business transaction slows down, but they won’t tell you the root cause so you can resolve the issues.

Why Deep Diagnostics for Production Monitoring Matters

A key reason why AppDynamics has become very successful in just a few years is because our Deep Diagnostics, behavioral learning, and analytics technology is 18 months ahead of the nearest vendor. A bold claim? Perhaps, but it’s backed up by bold customer case studies such as Edmunds.com and Karavel, who compared us against some of the top vendors in the application performance management (APM) market in 2011. Yes, End User Monitoring, Application Mapping and Transaction Profiling are important–but these capabilities will only help you isolate performance pain, not resolve it.

AppDynamics has the ability to instantly show the complete code execution and timing of slow user requests or business transactions for any Java or .NET application, in production, with incredibly small overhead and no configuration. We basically give customers a metal detector and X-Ray vision to help them find needles in haystacks. Locating the exact line of code responsible for a performance issue means Operations and Developers solve business pain faster, and this is a key reason why AppDynamics technology is disrupting the market.

Below is a small collection of needles that customers found using AppDynamics in production. The simple fact is that complete code visibility allows customers to troubleshoot in minutes as opposed to days and weeks. Monitoring with blind spots and configuring instrumentation are a thing of the past with AppDynamics.

Needle #1 – Slow SQL Statement

Industry: Education
Pain: Key Business Transaction with 5 sec response times
Root Cause: Slow JDBC query with full-table scan

Needle #2 – Slice of Death in Cassandra

Industry: SaaS Provider
Pain: Key Business Transaction with 2.5 sec response times
Root Cause: Slow Thrift query in Cassandra

Needle #3 – Slow & Chatty Web Service Calls

Industry: Media
Pain: Several Business Transactions with 2.5 min response times
Root Cause: Excessive Web Service Invocation (5+ per trx)

Needle #4 -Extreme XML processing

Industry: Retail/E-Commerce
Pain: Key Business Transaction with 17 sec response times
Root Cause: XML serialization over the wire.

Needle #5 – Mail Server Connectivity

Industry: Retail/E-Commerce
Pain: Key Business Transaction with 20 sec response times
Root Cause: Slow Mail Server Connectivity

 Needle #6 – Slow ResultSet Iteration

Industry: Retail/E-Commerce
Pain: Several Business Transactions with 30+ sec response times
Root Cause: Querying too much data

Needle #7 – Slow Security 3rd Party Framework

Industry: Education
Pain: All Business Transactions with > 3 sec response times
Root Cause: Slow 3rd party code

Needle #8 – Excessive SQL Queries

Industry: Education
Pain: Key Business Transactions with 2 min response times
Root Cause: Thousands of SQL queries per transaction

Needle #9 – Commit Happy

Industry: Retail/E-Commerce
Pain: Several Business Transactions with 25+ sec response times
Root Cause: Unnecessary use of commits and transaction management.

Needle #10 – Locking under Concurrency

Industry: Retail/E-Commerce
Pain: Several Business Transactions with 5+ sec response times
Root Cause: Non-Thread safe cache forces locking for read/write consistency

 Needle #11 – Slow 3rd Party Search Service

Industry: SaaS Provider
Pain: Key Business Transaction with 2+ min response times
Root Cause: Slow 3rd Party code

 Needle #12 – Connection Pool Exhaustion

Industry: Financial Services
Pain: Several Business Transactions with 7+ sec response times
Root Cause: DB Connection Pool Exhaustion caused by excessive connection pool invocation & queries

Needle #13 – Excessive Cache Usage

Industry: Retail/E-Commerce
Pain: Several Business Transactions with 50+ sec response times
Root Cause: Cache Sizing & Configuration

If you want to manage and troubleshoot application performance in production, you should seriously consider AppDynamics. We’re the fastest growing on-premise and SaaS based APM vendor in the market right now. You can download our free product AppDynamics Lite or take a free 30-day trial of AppDynamics Pro – our commercial product.

Now go find those needles that are hurting your end users!

App Man.

Link to this post:

, , , , , , , , , , , , , , , , ,

Michael Shinn

Spot the Difference in AppDynamics 3.4

Okay, admit it. The title Spot the Difference in AppDynamics 3.4 caught your attention, especially after we just recently announced our free End User Monitoring features, and now you’re here to find out what other insanely cool stuff we’ve been working on to enhance the experience for APM aficionados, customers and people like you! I’m proud to say that AppDynamics continues to innovate by leaps and bounds which enables our customers to be more successful in how they manage application performance and availability.

Here’s an example of us staying ahead of the curve. I was scouring Twitter feeds several weeks back and found this tweet from a Java applications guy,


Hmmmm, a proverbial question indeed.

So what exactly changes with an application release?

Flashback to 2008…I ran into a painfully similar situation back at LG. We had a team of consultants working on Java performance optimizations for eight months, executing test cases, refactoring application code, disabling trigger objects and even redesigning some of the use case workflows. Well, it turns out the build and deployment team pushed the entire application onto a completely different set of infrastructure, and when the new test results came back we almost choked when we saw 20-30% regressions in application performance. That night the performance team drank itself into oblivion, and you can imagine what happened next.

For the next two weeks everyone on the project was now heads down trying to identify what changes caused such a massive slowdown to the system. Remember, we didn’t have an APM solution like AppDynamics, and had to resort to using four different performance monitoring tools, since we didn’t have a single solution that provided us with complete end-to-end visibility. Those were tumultuous times when Ops, Dev and IT teams engaged in heated arguments about who was at fault.

All of this could have been easily avoided if we had something like AppDynamics 3.4 Agile Release Analysis to compare application releases visually. Having this type of comparative analysis capability comes in handy when you’re trying to understand why a user login transaction, for instance, is slower with your latest agile release than what it was a week ago. You could always conduct code diffs on your releases, but sometimes it’s incredibly valuable to be able to visualize performance infrastructure changes at a glance rather than having to spend hours or even days manually verifying what differences actually occurred.

Agile Release Analysis

AppDynamics release analysis rolls up application performance metrics so you can compare KPIs of not only the application, but also individual business transactions, as well as their flow and execution. The application might’ve have gotten slower, but being able to identify which piece of the overall application or specific transaction is affecting performance is much more useful to operations and IT.

Speaking of comparing Application Flow Maps, in order for this feature to be truly tenable, we needed to improve the visibility challenges customers were facing with our original flow map. We had the right idea, but it was too rigid. Monitoring several thousand tiers became blinding for some customers, leaving App Ops with a cluttered forest, and no trees. So to stay true to our company mission with offering customers with maximum visibility with minimal effort, we improved the viewing experience so you can now navigate your entire application just like you’d navigate the entire world with Google Maps. The next time you’re notified there’s an issue with a transaction that’s traversing through a particular node in a cluster, you can feel confident it’ll be easy to spot and zoom in on. The application flow maps now change color in real-time depending on the SLA and performance of application tiers and the flows that connect them. For example, here is a screenshot of the new flow map with baseline comparison enabled:

Zoom in and out of your App Flow Map

 

Introducing Role-Based Perspectives

Let’s assume the role of a database administrator for a moment. Sure, they might look at Java or .NET code once in a blue moon, but their technical domain is managing the backend of the application. So in 3.4 we decided to streamline the troubleshooting process from a role-based perspective to help our respective backend admins “get to the point” and view information pertinent to their role.

“Show me the top SQL calls for my database.” Done.

“Okay, now which business transactions do my SQL queries impact?” You can think of it as a “Where Used” look up that allows the DBA to analyze what they’re interested in from a role-based perspective and see how queries are contributing to business transaction performance. The Backend Specialists – DBA, Message Broker, ESB and Security Administers – can now say with conviction, “I optimized my backend service that was impacting the user experience of several mission critical business transactions.” That’s what I call getting bang for the buck performance optimizations!

Reporting on Application AND Business Activity.

AppDynamics 3.4 also includes a new PDF reporting engine so you can generate, save and share reports detailing application health metrics. We’ve introduced several prepackaged reports as well as the ability to create your own custom report. All of the aforementioned cool 3.4 features offer some amazing technical benefits that speak to those in DevOps, but at the end of the day, someone is going to ask how this is impacting your business’s bottom line. So in 3.4, we now allow you to build a dashboard that monitors business activity for your most revenue-critical apps. Albeit, our technology is pretty slick as it is to auto-discover and baseline your application’s performance without the need for manual instrumentation. However, there may be users or scenarios where you may want to monitor the activity and revenue of your application, rather than say its performance or throughput. No problem – you can exploit “Information Points” in AppDynamics to track any business metric or value which is part of a business transaction. Our intuitive dashboard builder lets you expose any metric with drag and drop widgets meaning you can create powerful business views in minutes.

There are a number of other powerful features and product enhancements in AppDynamics 3.4 you can start exploring today by registering here for a 30-day free trial. If you’re already an AppDynamics customer, we look forward to hearing more feedback on how we can improve the experience even further and want to say thanks again! We hope you’re as delighted as we are about our latest release.

 

 

Link to this post:

, , , , , , , ,

App Man

Code Deadlock – A Usual Suspect

Imagine you’re an operations guy and you’ve just received a phone call or alert notifying you that the application your responsible for is running slow. You bring up your console, check all related processes, and notice one java.exe process isn’t using any CPU but the other Java processes are.  The average sys admin at this point would just kill and restart the Java process, cross their fingers, and hope everything returns back to normal (this actually does work most of the time). An experienced sys admin might perform a kill -3 on the Java process, capture a thread dump, and pass this back to dev for analysis. Now suppose your application returns back to normal–end users stop complaining, you pat yourself on the back and beat your chest, and basically resume what you were doing before you were rudely interrupted.

The story I’ve just told may seem contrived, but I’ve witnessed it several times with customers over the years. The stark reality is that no one in operations has the time or visibility to figure out the real business impact behind issues like this. Therefore, little pressure is applied to development to investigate data like thread dumps so that root causes can be found and production slowdowns can be avoided again in future. It’s true restarting a JVM or CLR will solve a fair few issues in production, but it’s only a temporary fix over the real problems that exist within the application logic and configuration.

Now imagine for one minute that operations could actually figure out the business impact of production issues, along with identifying the root cause, and communicate this information to Dev so problems could be fixed rapidly. Sounds too good to be true, right? Well, a few weeks ago an AppDynamics customer did just that and the story they told was quite compelling.

Code Deadlock in a distributed E-Commerce Application

The customer application in question was a busy e-commerce retail website in the US. The architecture was heavily distributed with several hundred application tiers that included JVMs, LDAP servers, CMS server, message queues, databases and 3rd party web services. Here is a quick glimpse of what that architecture looked like from a high level:

Detecting Code Deadlock

If we look at the AppDynamics problem pane (right) as the customer saw things, it shows the severity of their issues. During the day the application was experiencing just over 4,000 business transactions per minute, which works out at just under 1 million transactions a day. Approximately 2.5% of these transactions were impacted by the slowdown, which was the result of the 92 code deadlocks you see here that occurred during peak hours.

AppDynamics is able to dynamically baseline the performance of every business transaction type before classifying each execution as normal, slow, very slow or stalled depending on its deviation from its unique performance baseline. This is critical for understanding the true business impact of every issue or slowdown because operations can immediately see how many user requests were impacted relative to the total requests being processed by the application.

From this pane, operations were able to drill down into the 92 code deadlocks and see the events that took place as each code deadlock occurred. As you can see from the screenshot (below left), the sys admins during the slowdown kept restarting the JVMs (as shown) to try and make the issues go away. Unfortunately, this didn’t work given that the application was experiencing high concurrency under peak load.

By drilling into each Code Deadlock event, operations were able to analyze the various thread contentions and locate the root cause of the issue. The root cause of the slowdown turned out to be an application cache which wasn’t thread-safe. If you look at the screenshot below, showing the final execution of the threads in deadlock accessing the cache, you can see that one thread was trying to remove an item, another was trying to get an item, and the last thread was trying to put an item. 3 threads were trying to do a put, get and remove at the same time! This caused a deadlock to occur on cache access, thus causing the related JVM to hang until those threads were released via a restart.

 Analyzing Thread Dumps

Below you can see the thread dump that AppDynamics collected for one of the code deadlocks, which clearly shows where each thread was deadlocked. By copying the full thread dumps to clipboard, operations were able to see the full stack trace of each thread, thus identifying which business transactions, classes, and methods were responsible for cache access.

The root cause for this production slowdown may have been identified and passed to dev for resolution, but the most compelling conclusion from this customer story was related to them identifying the real business impact that occurred. The application was clearly running slow, but what did the end user experience during the slowdown and what impact would this have had on the business?

What was the Actual Business Impact?

The screenshot below shows all business transactions that were executing on the e-commerce web application during the five hour window before, during, and after the slowdown occurred.

Here are some hard hitting facts for the two most important business transactions inside this e-commerce application:

  • 46,463 Checkouts processed
    • 482 returned an error, 1325 were slow, 576 were very slow and 111 stalled.
  • 3,956 Payments processed
    • 12 returned an error, 242 were slow, 96 were very slow and 79 stalled

Error – transaction failed with an exception. Slow – the business transaction deviated from its baseline by more than 3 standard deviations. Very Slow – the business transaction deviated from its baseline by more than 4 standard deviations. Stalled – the transaction timed out.

If you take these raw facts and assume the average revenue per order is $100, then the potential revenue risk/impact of this slowdown was easily into six digits when you consider the end user experience for checkout and payment. Even if you take the 482 Errors and 111 Stalls relating to the Checkout business transaction alone – this still equates to around $60,000 of revenue at risk. And that’s a fairly conservative estimate!

If you add up all the errors, slow, very slow and stalls you see in the screenshot above, you start to picture how serious this issue was in production. The harsh reality is that incidents like this happen everyday in production environments, but no one has visibility into the true business impact of them, meaning little pressure is applied to development to fix “glitches.”

Agile isn’t about Change, It’s about Results

If development teams want to be truly agile, they need to forget about constant change and focus on what impact their releases has on the business. The next time your application slows down or crashes in production, ask yourself one question: “What impact did that just have on the business?” I guarantee just thinking about that answer will make you feel cold. If development teams found out more often the real business impact of their work, they’d learn pretty quickly how fast, reliable and robust their application code really is.

I’m pleased to say no developers were injured or fired during the making of this real-life customer story; they were simply educated on what impact their non-thread safe cache had on the business. Failure is OK–that’s how we learn and build better applications.

App Man.

Link to this post:

, , , , , , , ,

Michael Shinn

APM vs NPM. 2nd Round K.O.

Round Two – Last time I wrote a blog comparing APM versus network-based APM tools, which I still consider NPM at it’s core regardless of what some critics and competitors claim. Let me make one thing clear though, NPM is great for equipping IT network administrators to see how fast or slow data is traveling through the pipes of their application. Unfortunately, network-based APM tools simply cannot provide App Ops granular visibility into the application runtime when isolating bottlenecks go beyond the system level and it’s final destination – the end user’s browser.

I find several of the blogs and YouTube clips from such NPM vendors quite comical as they try to throw punches at APM companies. Their arguments are centered primarily against agent-based approaches being an inadequate APM solution due to today’s fickle and distributed application architectures. It’s not like I haven’t heard it before.

The amusing thing about it…they’re completely right! In fact, we couldn’t agree more, and that’s why Jyoti Bansal founded AppDynamics to address these perennial shortcomings legacy APM vendors have been ignoring. Even the smallest businesses next to the largest enterprises have complex applications that have outpaced their App Ops teams’ current set of monitoring tools. That’s why AppDynamics is reinventing and reigniting the application performance management space by enabling IT operations to monitor complex, modern applications running in the cloud or the data center. So let me respond to those claims they’ve made.

The Claims

“Agents have high deployment and ongoing maintenance burden.”
Legacy APM: TRUE
AppDynamics: FALSE. No manual instrumentation required. It’s automatic.

“Agents are invasive which can perturb the systems being monitored.”
Legacy APM: TRUE
AppDynamics: FALSE. Our customers see less than 1-2% overhead in production. 

“Performance management vendors have over promised and under delivered for decades.”
Legacy APM: TRUE
AppDynamics: FALSE. Things are going well thanks. Check our customer list and 400% growth.

All AppDynamics. The next-gen of APM.

Example FedEx App with application performance issues

I drew a parallel in my previous post that using NPM concepts to monitor application performance is like inspecting Fedex packages en-route to figure out why operations at a hub came to a screeching halt. Remember, even if the package contents is visible from afar, it doesn’t explain why the hub conveyors, which electronically guide packages to their appropriate destination chute is broken, nor can it identify why cargo operations have stalled. In other words, good luck trying to gather anything beyond the scope of the application’s infrastructure. Using network monitoring tools to collect even the most basic system health metrics such as CPU utilization, memory usage, thread pool consumption and thrashing? Time to throw in the towel.

And what about End User Monitoring?

What’s becoming just as important as being able to monitor server side processing and network time is the ability to monitor end user performance. When NPM tools are only able to see the last packet sent from the server, how does that help you understand the browser’s performance? It doesn’t since once again, this kind of analysis is only feasible higher up the stack at the Application Layer. And just to clarify when I say Application Layer, I mean application execution time, not “network process time to application” as defined by OSI Layer 7.

On the other hand, injected agents residing in that layer can insert JavaScript into the Web page to determine the execution time spent in the browser. This is becoming more of a concern for App Ops and Dev Ops now that 80-90% of the end-user response time is spent on the frontend executing JavaScript, rendering markup and stylesheets. As business logic continues it’s migration to the browser while increasing it’s processing burden, the client is looking more and more like the new server. Network monitoring tools must move to an agent-based approach if they are to truly deliver the monitoring visibility needed for the application and end user experience, otherwise their visibility will remain between a rock and a hard place.

On top of that, what about those customers running their applications in a public cloud? Are you going to convince your cloud provider to install a network appliance into their infrastructure? I highly doubt it. With AppDynamics, we have partnerships with cloud providers such as Amazon EC2, Azure, RightScale and Opsource allowing developers and operations to easily deploy AppDynamics with a flick of a switch and monitor their applications in production 24/7.

Once again, next-gen APM triumphs over NPM based application performance on not just the server side, but also the browser. AppDynamics is embracing this and fully aware of the technical and business significance of monitoring end user performance. We’re delighted to offer this kind of end-to-end visibility to our customers who will now be able to monitor application performance from the end users’ browser to the backend application tiers (databases, mainframes), all through a single pane of glass view.

Link to this post:

, , , , , , ,

Michael Shinn

APM vs NPM. Round One.

Another three-letter acronym I see frequently mixed in with APM is NPM which stands for Network Performance Management. At first glance they look very similar. The distinction appears very subtle with just a one letter difference, but it speaks volumes because their core technologies and approaches to monitoring application performance are fundamentally different.

Application Performance Management tools typically use agents that live in the application run-time which capture performance details of how application logic computes across the application infrastructure.

In contrast, Network Performance Monitoring tools are agent-less appliances that sit on the network analyzing content and traffic by capturing packets flowing through the network. They can measure response times and find errors for your applications by understanding the wire protocols across the application tiers. Like agent based APM solutions, they can automatically discover your application topology, but NPM tools lack deep code-level diagnostics which is paramount to helping App Ops and Dev teams solve problems. Here’s why.

Imagine your application’s architecture is FedEx providing package delivery services to it’s customers in a timely manner. If you were to apply the concept of network monitoring to this analogy, using an NPM tool would help track how long it takes for packages to travel in transit from one hub to the next, examining it’s contents and the expected arrival time to it’s final destination.

Now let’s say that a package was being sent from Paris to Los Angeles but hit a snag at the London hub along the way. Operations at London came to a screeching hault for some inexplicable reason and your task is to figure out why and how to fix it ASAP so operations are running smooth again. In this case, APM is the favorable solution over NPM here. APM tools not only empower you with the ability to see how long it takes for your package to travel to different locations, but also how the package is processed at the facility and why operations came to a standstill. At this juncture, you probably wouldn’t even care to analyze the package’s contents since you already know what’s inside and it’s still in London.

When applying this concept to the world of business applications, NPM tools won’t provide you with visibility into application logic inefficiencies, especially when experiencing a performance issue or service level breach to key business transactions. NPM performance metrics are derived from captured packets and the network protocols the servers support, but if you’re making inefficient, iterative business logic calls in your code, how will capturing and analyzing packets help you triage the problem? Its the same story for identifying the root cause of memory leaks or thread synchronization issues; execution and visibility like this occurs at the application layer, not at the network layer. Thus, I would argue that NPM compliments APM but doesn’t serve as a viable replacement.

Now before I go enumerating my theories as to why I see this trend, let me just say while I respect their hustle, nice try but not quite. While scouring the web I found many NPM companies trying to crash the popular APM party by marketing their network performance monitoring utilities as an application monitoring and management solution. It can be confusing for a first time buyer trying to evaluate APM tools when you see NPM companies using phrases like, “network-based APM” or “gather data already on your network” in their product messaging.

One of the reasons for this messaging is because NPM is an immensely crowded market. Here’s a list of NPM tools compiled by some IT folks at Stanford. There’s over 300 network monitoring tools, and that’s only going back five years to 2007!! Take your pick. Second, applications and the tools to monitor their performance and availability speaks to customers at a higher level from a technical and business perspective. If I’m in charge of some transaction intensive application, my customers are directly interfacing with the Application Layer in order to complete business transactions which takes place at the highest layer on the stack. Customers aren’t 5-6 levels deep at the Network Layer interacting with packets or the routers and switches they pass through.

By the way, have you seen Gartner’s APM spend analysis? It’s a $2 billion market and growing. This most certainly is a valid reason why NPM solutions are now positioning themselves as an APM solution to customers.

If you have absolutely nothing to monitor any part of your applications infrastructure, then something is better than nothing. However, at some point you’ll need better visibility at the application layer regardless how fast packets are traveling through the pipes. NPM won’t necessarily provide the range of visibility your App Ops and Dev teams need to diagnose, troubleshoot, and idenfity problem root causes.

Those who live day in and day out the networking world will naturally gravitate to what they’re familiar with, but if you haven’t ventured out into the evolving APM market, I highly recommend it. Even though you can achieve some level of performance monitoring with NPM, you’ll find yourself running into limitations in application visibility pretty quick. But as the old saying goes, “If all you have is a hammer, then everything looks like a nail.”

I’m curious to know what are your thoughts are. Which class of tools would you choose? Why?

 

Link to this post:

, , , , , ,

The X-Ray competition winner from last quarter came from an online travel company in France called Karavel who kindly documented their success with AppDynamics Pro. Karavel has been using AppDynamics extensively with custom dashboards, pro-actively alerting and also for optimizing slow business transactions in their production environment.

Here is Karavel’s X-Ray case study as they documented it:

Read the Full Post…

Link to this post:

, , , , , , , ,

I have yet to meet anyone in Dev or Ops who likes alerts. I’ve also yet to meet anyone who was fast enough to acknowledge an alert, so they could prevent an application from slowing down or crashing. In the real world alerts just don’t work, nobody has the time or patience anymore, alerts are truly evil and no-one trusts them. The most efficient alert today is an angry end user phone call, because Dev and Ops physically hear and feel the pain of someone suffering :)

Why? There is little or no intelligence in how a monitoring solution determines what is normal or abnormal for application performance. Today, monitoring solutions are only as good as the users that configure them, which is bad news because humans make mistakes, configuration takes time, and time is something many of us have little of.

Its therefore no surprise to learn that behavioral learning and analytics are becoming key requirements for modern application performance monitoring (APM) solutions. In fact, Will Capelli from Gartner recently published a report on IT Operational Analytics and pattern based strategies in the data center. The report covered the role of Complex Event Processing (CEP), behavior learning engines (BLEs) and analytics as a means for monitoring solutions to deliver better intelligence and quality information to Dev and Ops. Rather than just collect, store and report data, monitoring solutions must now learn and make sense of the data they collect, thus enabling them to become smarter and deliver better intelligence back to their users.

Change is constant for applications and infrastructure thanks to agile cycles, therefore monitoring solutions must also change so they can adapt and stay relevant. For example, if the performance of a business transaction in an application is 2.5 secs one week, and that drops to 200ms the week after because of a development fix. 200ms should become the new performance baseline for that same transaction, otherwise the monitoring solution won’t learn or alert of any performance regression. If the end user experience of a business transaction goes from 2.5 secs to 200ms, then end user expectations change instantly, and users become used to an instant response. Monitoring solutions have to keep up with user expectations, otherwise IT will become blind to the one thing that impacts customer loyalty and experience the most.

Read the Full Post…

Link to this post:

, , , , , , , , ,

App Man

AppDynamics Growth – Not Bad for 3 Years

AppDynamics has experienced significant growth over the past three years, here’s a quick summary of our key highlights.

 

Embed this image on your site:

Link to this post:

, , , , , , ,

When I joined AppDynamics less than a year ago, we were situated in a 6,000 sq ft “cozy” office on 2nd and Brannan. On my first day I was greeted with a MacBook Pro and was asked to find a spare desk amongst the boxes and carnage of a typical startup environment. To my left was a relentless engineering and UI team, and to my right was a fired up sales and marketing team, and a quietly confident Founder and CEO, Jyoti Bansal who made all of this happen. Across the office was a shiny gold bell mounted on the wall, which rang every time AppDynamics closed a new customer. In the last year I can honestly say that shiny bell hasn’t stopped ringing, and is the biggest adrenaline boost one can get while working.

Read the Full Post…

Link to this post:

, , , , , , , , , , , , ,

Older posts >>