Unscheduled downtime happens all the time. In 2012 alone dozens of big-name websites went down for hours at a time, from GoDaddy.com to, well, almost anything running on Amazon. Every time an outage occurs, the Internet has a field day speculating about everything from the cause of the outage to who got fired for it. Some more transparent companies will publish a post-mortem on their blog with the details, which is nice, but there’s one question that’s almost never answered: how much did the downtime cost?
Various analysts & institutions have taken a stab at trying to answer this question with exhaustive research and surveys. The results are not very conclusive – it seems that the answer depends on the industry, the size of the company, the type of application, and even the year. We took a shot at breaking it all down for you in the infographic below: How much does downtime really cost you?
Summary: How much does downtime cost?
Downtime is really expensive – especially if you work at a financial services institution that relies on transaction-based fees like credit card transaction fees or trading fees. But there are costs associated with downtime that we don’t imagine – for example, the human cost of having to spend every day firefighting instead of focusing on other projects, or the damage to your company’s brand, which can impact customer loyalty. It may be worth the effort to find out what downtime costs your organization, and how much of it you can afford.
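One rough way to start answering that question is a back-of-the-envelope estimate. The sketch below is illustrative only: the function name, the inputs, and every figure in the usage example are assumptions, not numbers taken from the infographic or from any study.

```python
def estimate_downtime_cost(
    revenue_per_hour: float,
    outage_hours: float,
    responders: int,
    loaded_hourly_rate: float,
    revenue_at_risk: float = 1.0,
) -> float:
    """Rough outage cost: lost revenue plus the labor spent firefighting.

    revenue_at_risk is the fraction of revenue that actually stops during
    the outage (1.0 means all of it). Brand damage and lost customer
    loyalty are real costs too, but they are not modeled here.
    """
    lost_revenue = revenue_per_hour * outage_hours * revenue_at_risk
    labor_cost = responders * loaded_hourly_rate * outage_hours
    return lost_revenue + labor_cost


# Hypothetical inputs: $10,000/hour in revenue, a 3-hour outage,
# 5 engineers at a loaded rate of $100/hour.
cost = estimate_downtime_cost(10_000, 3, 5, 100)
print(f"Estimated cost: ${cost:,.0f}")
```

Even a crude figure like this gives you something to weigh against the cost of preventing the outage in the first place.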
Automation sets apart organizations at the top of their game from the rest of the pack. The limiting factor in most organizations is that they are usually too busy putting out fires and keeping up with all of their other obligations to expend the effort required to envision and build out their automation strategy. With that in mind I have created a small list of the automation tasks that I feel provide the most value to an organization. Along with these tasks I explain the type of effort involved and reward associated with each one. All of the information presented is based upon my 15 years of troubleshooting within enterprise operations and applications environments.
1. Collect troubleshooting metrics – To me this is the no-brainer of automation tasks. For each particular type of problem you always want certain information to help resolve that issue. This is also the easiest of my top three to implement but may provide less value than the other two. Here are some examples…
- Hung/Unresponsive JVM/CLR – Initiate and store thread dump, restart application on offending node.
- Slow transactions plus high server CPU utilization – Collect process listing to determine CPU contributors and spin up extra instance of application.
- Transactions throwing excessive errors – Search log files for errors and send the list to the appropriate personnel; based upon error type, possibly probe individual components deeper (see #2 below).
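The examples above boil down to a lookup from symptom to diagnostic steps. Here is a minimal sketch of that idea, assuming hypothetical symptom and step names; it is not how any particular product implements runbooks, and the `execute` callback is a placeholder for whatever scripts you already have.

```python
from typing import Callable, Dict, List

# Hypothetical symptom names mapped to the diagnostic steps from the list above.
# The step names are labels for illustration; wire them to your own scripts.
RUNBOOK: Dict[str, List[str]] = {
    "hung_jvm": ["capture_thread_dump", "restart_application"],
    "slow_high_cpu": ["capture_process_listing", "start_extra_instance"],
    "excessive_errors": ["search_logs_for_errors", "email_error_list"],
}


def run_runbook(symptom: str, node: str, execute: Callable[[str, str], None]) -> List[str]:
    """Run every step registered for a symptom, in order, on the given node.

    `execute` is supplied by the caller and does the real work (for example,
    shelling out to a diagnostic tool). Returns the steps that were run.
    """
    steps = RUNBOOK.get(symptom, [])
    for step in steps:
        execute(step, node)
    return list(steps)


# Usage with a stand-in executor that only records what it would do.
log: List[str] = []
run_runbook("hung_jvm", "node-7", lambda step, node: log.append(f"{step} on {node}"))
print(log)
```

Order matters here: the evidence (thread dump, process listing) is collected before the remedial step, because a restart destroys the state you wanted to inspect.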
2. Probe application components – This one is really useful for figuring out difficult application problems but requires more effort to set up than #1. The basic concept is that you need to find out from the application support team what steps they would take manually to troubleshoot their application if it were slow or broken. The usual responses are things like “Check this log file for this word”, “Run this query against this database and if the output is -3 then I know what the problem is”, “Hit this URL to see what the response looks like. If the page does not return properly I know there is a problem with this component”, etc…
It may seem like a lot of work to set up this type of automated probing at first but once it is set up it becomes an invaluable troubleshooting tool. Imagine that the application response times get slow so these troubleshooting measures are automatically invoked and you get an email with the exact root cause within minutes of performance degradation. With this type of automation there is usually a known resolution based upon the probing results so that could be automated too. How handy is that?
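To make the idea concrete, here is a small sketch of two of those manual checks turned into probes. Everything in it is an assumption for illustration: the function names, the table name `job_status`, the keyword, and the in-memory SQLite database standing in for the application's real database.

```python
import sqlite3
from typing import Callable, Dict, List


def log_contains(lines: List[str], keyword: str) -> bool:
    """Probe: "check this log file for this word"."""
    return any(keyword in line for line in lines)


def query_returns(conn: sqlite3.Connection, sql: str, bad_value: int) -> bool:
    """Probe: "run this query; if the output is -3 I know what the problem is"."""
    row = conn.execute(sql).fetchone()
    return row is not None and row[0] == bad_value


def run_probes(probes: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every probe and return the names of those that found a problem."""
    return [name for name, probe in probes.items() if probe()]


# Stand-in data: an in-memory database and a few log lines.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_status (code INTEGER)")
conn.execute("INSERT INTO job_status VALUES (-3)")
log_lines = ["INFO started", "INFO request served"]

findings = run_probes({
    "payment log shows timeouts": lambda: log_contains(log_lines, "TimeoutException"),
    "batch job status is -3": lambda: query_returns(conn, "SELECT code FROM job_status", -3),
})
print(findings)
```

A URL check could be added as one more probe in the same dictionary. The list of findings is what would go into the email, and each finding name can later be mapped to its known resolution.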
3. Alert the business to changing conditions – This is where the IT staff has the ability to really make an impression and impact on the business. One of the most overlooked aspects of monitoring and automation is the ability to gather, baseline, alert, and act based upon pure business metrics. Here’s an example scenario…
Bob’s business has an e-commerce website that sells many different products. They use AppDynamics Pro to track the quantity of each item sold along with the total revenue of all sales throughout the business day (using the information point functionality). One day Bob gets an alert that the latest and greatest widgets are selling way below their typical volume. The automation engine has already searched the prices of major competitors and found that one competitor lowered its prices and is undercutting Bob’s business. With this actionable information Bob is able to immediately match the pricing of the competition, and sales rates return to normal before too much damage is done.
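The "selling way below their typical volume" check in this scenario can be sketched as a simple baseline comparison. This is a generic illustration, not a description of how AppDynamics computes its baselines; the function name, the two-standard-deviation threshold, and the sales figures are all assumptions.

```python
from statistics import mean, stdev
from typing import List


def is_below_baseline(history: List[float], current: float, sigmas: float = 2.0) -> bool:
    """Return True when `current` is more than `sigmas` standard deviations
    below the mean of `history` (needs at least two historical points)."""
    baseline = mean(history)
    spread = stdev(history)
    return current < baseline - sigmas * spread


# Hypothetical hourly unit sales for one product at the same hour on prior days.
widget_sales = [120, 115, 130, 125, 118, 122]
print(is_below_baseline(widget_sales, 121))  # typical hour
print(is_below_baseline(widget_sales, 60))   # sharp drop
```

Comparing against the same hour on previous days, rather than a fixed number, keeps normal daily swings in sales from triggering the alert.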
Obviously business operations automation can be the most complex but also the most rewarding. Reaching out to the business and having a conversation about their activities and processes can seem daunting but I can tell you from personal experience that the business is more than willing to participate in initiatives that save them time and money. This type of conversation also normally leads to the business asking about other ways to collaborate to make them more effective so it is a great way to improve the overall level of communication with the business.
AppDynamics Pro enables a new level of automation because it knows exactly what is going on inside of your applications from both a business and IT perspective. Your level of automation is limited only by your imagination. I recommend that you start out small with a single automation use case and build outward from there. You can use AppDynamics Pro for free to try out our application runbook automation functionality on your specific use case by clicking here.
Welcome, let’s start with a quick introduction of who you are and what your role is at PagerDuty.
My name is Alex, I’m PagerDuty’s CEO and co-founder.
And what beer will you be drinking tonight?
Guinness! It’s the office favorite.
What problems does your solution or service solve for customers?
PagerDuty provides centralized, highly-targeted alerting, escalation, and incident management. We integrate with the monitoring systems you have in place to provide a single place to manage all of your alerting. When things go down, we wake you up.
What types of technology trends are you seeing within your customer base?
We have a great mix of customers, from large enterprises (we have 15 of the Fortune 100 as customers), to mid-size companies like Splunk, Citrix, and Box, to two-person startups, which gives us an interesting perspective. The most obvious trend that we’re seeing is companies looking to automate the notification piece of ‘monitoring and alerting’ right off the bat. With IaaS providers like Rackspace and AWS, the need to do a costly NOC build-out has been almost entirely eliminated. Folks who do have a NOC in place are looking to automate what they can, while allowing on-site engineers to tackle the problems they’re best equipped to solve.
What application performance pain and challenges do you typically see within customer accounts?
All our customers have mission-critical infrastructure in place, and for many of them, slow applications directly equate to lost revenue. When company performance depends on applications being available and responsive, having an APM solution deployed isn’t a luxury, it’s a necessity.
Why does partnering with AppDynamics make good sense?
AppDynamics is the enterprise APM solution, and PagerDuty is the enterprise alerting solution. There couldn’t be a more natural fit – when AppDynamics detects an issue, PagerDuty wakes you up.
What’s your favorite thing about AppDynamics?
The ROI! We’re big believers in actively monitoring revenue-critical production applications. Anything you can deploy and see a 2x, 4x, 10x return, quickly, that’s pretty great in my book.
How can someone find out more about PagerDuty?
Just sign up for a free trial at pagerduty.com. Or shoot us an email at email@example.com and a member of our sales staff will reach out to set up a demo.
Which software companies inspire you?
B2B companies that make something people want and that solves a clear need. We like companies that can tackle hair-on-fire problems. AppDynamics, obviously, and Splunk come to mind.
Who is your favorite superhero and why?
Spider-Man. He’s powerful yet flawed – a real human superhero.
Today AppDynamics announced integration with PagerDuty, a SaaS-based provider of IT alerting and incident management software that is changing the way IT teams are notified, and how they manage incidents in their mission-critical applications. By combining AppDynamics’ granular visibility of applications with PagerDuty’s reliable alerting capabilities, customers can make sure the right people are proactively notified when business impact occurs, so IT teams can get their apps back up and running as quickly as possible.
You’ll need both a PagerDuty license and an AppDynamics license to get started – if you don’t already have them, you can sign up for free trials of PagerDuty and AppDynamics online. Once you complete this simple installation, you’ll start receiving incidents in PagerDuty created by AppDynamics out-of-the-box policies.
Once an incident is filed it will have the following list view:
When the ‘Details’ link is clicked, you’ll see the details for this particular incident including the Incident Log:
If you are interested in learning more about the event itself, simply click ‘View message’ and all of the AppDynamics event details are displayed, showing which policy was breached, the violation value, the severity, etc.:
Let’s walk through some examples of how our customers are using this integration today.
Say Goodbye to Irrelevant Notifications
Are you on a group email alias at work that sends you several, maybe even dozens, of notifications a day that aren’t relevant to your responsibilities or are intended for other people on your team? I know I am. Imagine a world where notifications go only to the people whose role they concern, and only to those who are actually on call. With AppDynamics & PagerDuty you can now build in alerting logic that routes specific alerts to specific teams and only sends messages to the people that are actually on call. App response time way above the normal value? Send an alert to the app support engineer who is on call, not to all of their colleagues. Not having to sift through a bunch of irrelevant alerts means that when one does come through you can be sure it requires YOUR attention right away.
If you are only sending a notification and assigning an incident to one person, what happens if that person is out of the office or doesn’t have access to the internet / phone to respond to the alert? Well, the good thing about the power of PagerDuty is that you can build in automatic escalations. So, if you have a trigger in AppDynamics to fire off a PagerDuty alert when a node is down, and the infrastructure manager isn’t available, you can automatically escalate and re-assign / alert a backup employee or admin.
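The routing and escalation behavior described above can be sketched in a few lines. This is a conceptual illustration only, not PagerDuty's API or data model: the `Responder` class, the alert type names, and the `available` flag (a stand-in for "acknowledged the alert in time") are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Responder:
    name: str
    available: bool  # stand-in for "acknowledged the alert in time"


# Hypothetical escalation chains: alert type -> people to try, in order.
ESCALATION: Dict[str, List[Responder]] = {
    "app_response_time": [Responder("app-support-on-call", True)],
    "node_down": [
        Responder("infrastructure-manager", False),
        Responder("backup-admin", True),
    ],
}


def notify(alert_type: str) -> Optional[str]:
    """Return the first responder in the chain who takes the alert.

    Only the chain for this alert type is contacted, so other teams are
    never paged. Returns None if the chain is missing or nobody responds.
    """
    for responder in ESCALATION.get(alert_type, []):
        if responder.available:
            return responder.name
    return None


print(notify("node_down"))
```

In the usage line, the infrastructure manager is unavailable, so the node-down alert falls through to the backup admin.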
The Sky is Falling! Oh Wait – We’re Just Conducting Maintenance…
Another potentially annoying situation for IT teams is the flood of alerts that get fired off during a maintenance window. PagerDuty has the concept of a maintenance window so your team doesn’t get a bunch of doomsday messages during maintenance. You can even set up a maintenance window with one click if you prefer to go that route.
Either way, no new incidents will be created during this time period… meaning your team will be spared having to open, read, and file the alerts and update / close out the newly-created incidents in the system.
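Conceptually, a maintenance window is just a time range during which incoming events do not become incidents. The sketch below illustrates that rule under stated assumptions; the function names and the example window are hypothetical, and it does not represent PagerDuty's implementation.

```python
from datetime import datetime
from typing import List, Tuple

Window = Tuple[datetime, datetime]


def in_maintenance(now: datetime, windows: List[Window]) -> bool:
    """True if `now` falls inside any window (start inclusive, end exclusive)."""
    return any(start <= now < end for start, end in windows)


def should_create_incident(event_time: datetime, windows: List[Window]) -> bool:
    """Suppress incident creation for events that arrive during maintenance."""
    return not in_maintenance(event_time, windows)


# Hypothetical window: 02:00 to 04:00 on 1 March 2013.
windows = [(datetime(2013, 3, 1, 2, 0), datetime(2013, 3, 1, 4, 0))]
print(should_create_incident(datetime(2013, 3, 1, 3, 15), windows))  # during maintenance
print(should_create_incident(datetime(2013, 3, 1, 9, 0), windows))   # normal hours
```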
We’re confident this integration of the leading application performance management solution with the leading IT incident management solution will save your team time and make them more productive. Check out the AppDynamics and PagerDuty integration today!