TAG | Disaster Recovery
I’m fed up with reading about Cloud outages, largely because all applications are created and managed by the most dangerous species on the planet: the human being. Failure is inevitable in everything the human being creates or touches, and for this reason alone I see no news in the word “outage” appearing in IT articles, with or without Cloud mentioned.
What gets me the most is that applications, infrastructure and data centers were slowing down and blowing up long before “Clouds” became fashionable. They just didn’t make the news every other week when applications resided in “data centers”. Ah, the good old days. Just ask anyone who works in operations or help desk/app support whether they’ve ever worked a 38-hour week; I guess the vast majority will either laugh or slap you. If everything worked according to plan, IT would be a really dull place to work, help desk would be replaced with OK desk, and we’d have nothing to talk about in the office or pub.
If you are running mission-critical applications and do not have a strategy to deal with failure, you are putting your whole organization at risk. You may think that your application cannot fail, but at some point everything fails. It may not be the software running your application that fails; it could be the hardware, the network, or even a natural disaster in your area that takes your application down. When such a failure happens, no matter how rare, your customers will still expect the same level of service, not to mention the preservation of their data. Without a failover strategy and a tested backup infrastructure, you will be out of service for an unknown period of time, which will lead to angry customers and lost revenue.
Most of you have some failover strategy in place. I’m sure many of you have spent large amounts of time and money making your application resilient to failure because you understand your app’s importance. But you may still be missing one key component, without which you remain at risk. That component is monitoring, or more specifically, Application Performance Management (APM). While mission-critical applications rely on an APM system to monitor application performance and health, APM is often forgotten on the failover systems. APM needs to go hand in hand with any failure testing plan to ensure that your company’s strategy will work in a real emergency.