IT Operations and the Team USA Speed Skating Disaster

In case you weren’t aware, Team USA speed skating came home from the Olympic Games in Sochi with zero medals. Not just zero gold medals, but zero medals in total (not counting short track, which netted only one silver medal). Now, I’m not a big speed skating fan, but what happened during the Games reminded me of what I had seen all too often while working in operations at some very large enterprises. So that’s the topic for today’s blog: what IT organizations can learn from the USA speed skating meltdown in Sochi (and vice versa).

As the Team USA speed skaters were turning in poor performance after poor performance on the ice, their coaches were trying to figure out why their athletes were not competing at the level they expected and how to fix the issue. The same thing happens in IT organizations when application performance goes wrong. The business leans on the IT organization, which then tries to figure out what is going wrong so it can fix the issue.

Team USA decided that their new suits could be the reason no medals had been won yet and requested to change back to the suits they had been using before the switch. Did they not test the new suits at all? Of course they tested them. The important question is: how did they perform these tests?

Testing Parameters

Altitude, air density, humidity levels, varying velocity, body positions, body shapes, etc… There are a ton of different factors that may or may not have been accounted for during the testing of these suits. It’s not possible to test every variant of every parameter before using the suits in a race, just like it’s not possible to properly test an application for every scenario before it gets released into production.
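To get a feel for why exhaustive testing is out of reach, here is a minimal Python sketch that counts the test cases produced by even a handful of parameters. The parameter names and value lists are purely hypothetical and chosen for illustration; they are not the values Team USA or any suit manufacturer actually tested.

```python
from itertools import product

# Hypothetical test dimensions; every name and value here is an assumption.
parameters = {
    "altitude_m": [0, 500, 1000, 1500],
    "humidity_pct": [20, 40, 60, 80],
    "speed_kph": [40, 45, 50, 55, 60],
    "body_position": ["upright", "crouched", "cornering"],
    "body_shape": ["small", "medium", "large"],
}


def count_combinations(params):
    """Count the test cases needed if every value is combined with every other."""
    total = 1
    for values in params.values():
        total *= len(values)
    return total


def all_combinations(params):
    """Lazily yield each combination as a dict of parameter name to value."""
    names = list(params)
    for combo in product(*params.values()):
        yield dict(zip(names, combo))


total_cases = count_combinations(parameters)
print(f"{total_cases} test cases for {len(parameters)} parameters")
```

Five coarse parameters already multiply out to hundreds of cases, and each extra parameter or finer-grained value multiplies the total again. The same arithmetic applies to browsers, data volumes, and load patterns in an application test plan.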

The fact is, at some point you have to transition from testing to real-life use. For Team USA, that means using the new suits in races. For IT professionals, it means deploying your application into production.

Cover Up the Air Vent (Blame It On the Database)

The initial reaction by Team USA was to guess that the performance problem was a result of the back air vent system creating turbulent air flow during the race. They proceeded to cover up this air vent with no improvement in results. This is the IT equivalent of blaming application performance problems on the database or the network. This is a natural reaction in the absence of data that can help isolate the location of the problem.

Isolating the bottleneck in a production application can be easy. Here we see there is a very slow call to the database.
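As a rough illustration of replacing guesswork with data, the following Python sketch times each tier of a request and reports the slowest one. It is a toy under stated assumptions: the tier names, the `timed` helper, and the simulated delays are all hypothetical, and this is not how AppDynamics or any other monitoring product works internally.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per tier (tier names are hypothetical).
timings = {}


@contextmanager
def timed(tier):
    """Record how long the enclosed block takes under the given tier name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[tier] = timings.get(tier, 0.0) + elapsed


def handle_request():
    """Simulated request; the sleeps stand in for real work in each tier."""
    with timed("web"):
        time.sleep(0.005)
    with timed("app"):
        time.sleep(0.01)
    with timed("database"):
        time.sleep(0.1)  # deliberately slow to mimic a slow query


handle_request()
slowest = max(timings, key=timings.get)
for tier, seconds in sorted(timings.items(), key=lambda item: -item[1]):
    print(f"{tier:10s} {seconds * 1000:7.1f} ms")
print(f"Slowest tier: {slowest}")
```

With per-tier measurements in hand, blaming the database becomes a conclusion drawn from data instead of a first guess.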

When it was obvious that simply closing the air vent did not have the desired effect, Team USA decided it was time to switch back to their old suits. In the IT world we call this the backout plan: roll back the code to the last known good version and hope for the best.
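The "last known good version" idea can be sketched in a few lines of Python. The `Release` record, its `healthy` flag, and the version numbers below are assumptions made up for this example; a real deployment pipeline would track release health in its own way.

```python
from dataclasses import dataclass


@dataclass
class Release:
    """Hypothetical deployment record; both fields are illustrative."""
    version: str
    healthy: bool  # True if the release ran in production without incident


def last_known_good(history):
    """Return the most recent healthy release, or None if there is none."""
    for release in reversed(history):
        if release.healthy:
            return release
    return None


# Oldest first; the newest release is the one misbehaving in production.
history = [
    Release("1.4.0", healthy=True),
    Release("1.4.1", healthy=True),
    Release("1.5.0", healthy=False),
]

target = last_known_good(history)
print(f"Rolling back to {target.version}")
```

Note that this only picks a rollback target. It says nothing about whether the new release actually caused the problem, which is exactly the trap Team USA fell into with the suits.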

It’s All About the Race Results (aka Production)

No matter how well you test speed skating suits or application releases, the measure of success or failure is how well you do on race day (or in production for the IT crowd). As Team USA found out, you can’t predict how things will go in production from your tests alone. Just as an IT organization holds off on application changes before major events (like end-of-year financials, Black Friday, Cyber Monday, etc…), Team USA should have proven their new suits in some races leading up to their biggest event instead of making a suit change right before it.

I feel bad for the amazing athletes of Team USA Speed Skating who had to deal with so much drama during the Olympics. I would imagine it was difficult to perform at the highest level with a major distraction like their new suits. In the end, there was no difference in results between the new suits and the old ones. The suits were just a distraction from whatever the real issue was.

IT professionals have the luxury of tools that help them find and fix problems in production.

Fortunately for IT professionals, we have tools that help us find problems in production instead of just having to guess what the root cause of the issue might be. If you’re a fan of USA Speed Skating, sorry for bringing up a sore subject, but hopefully there is a lesson that has been learned for next time. If you’re an IT professional without the proper monitoring tools to figure out the root cause of your production issues, you need to sign up today for a free trial of AppDynamics and see what you’ve been missing.

Jim Hirschauer

Jim Hirschauer is a Technology Evangelist for AppDynamics. He has an extensive background working in highly available, business-critical, large enterprise IT operations environments. Jim has been interested in application performance testing and monitoring since he was a Systems Administrator working in a retail bank. His passion for performance analysis led him down a path where he would design, implement, and manage the cloud computing monitoring architecture for a top-10 investment bank. During his tenure at the investment bank, Jim created new processes and procedures that increased overall code release quality and dramatically improved end user experience.