Operations (Ops) have a tough job. Let’s face it: delivering 99.9% availability is a tall order for anyone who manages the most complicated things in the world – applications. To say Ops purely manage infrastructure is slightly inaccurate; in reality, they manage the deployment and impact of anything (e.g. apps) that runs on infrastructure. Sadly for Ops, this impact translates to slowdowns, outages, and everything in between. The slightest hiccup and Ops are cast into the spotlight as the people who must troubleshoot and resolve the issue. Sounds like the perfect job, right?
The current DevOps movement highlights key challenges that development (Dev) and Ops teams face every day. The simple fact is that Dev wants to innovate as quickly as possible to drive a competitive edge for the business. They leverage Agile processes to ship frequently, which means Ops must continually deal with the one thing that impacts their 99.9% availability target the most – change. Ops knows change is inevitable, but they want to control it to protect their availability goals and reputation. Protecting availability means putting necessary roadblocks in the way of Dev: insisting on release entry/exit criteria, knowledge transfer and good old ITIL processes. Pretty much the list of things that drive a typical Dev team mad.
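To put the 99.9% target in perspective, it helps to turn it into a downtime budget. The sketch below is plain arithmetic, not taken from any particular tool; the function name `downtime_budget_minutes` and the 30-day and 365-day periods are assumptions chosen for illustration.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float) -> float:
    """Return the minutes of downtime allowed in a period at a given availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Hypothetical reporting periods: a 30-day month and a 365-day year.
monthly_budget = downtime_budget_minutes(99.9, 30)
yearly_budget = downtime_budget_minutes(99.9, 365)

print(f"30 days at 99.9%: {monthly_budget:.1f} minutes")   # about 43.2 minutes
print(f"365 days at 99.9%: {yearly_budget:.1f} minutes")   # about 525.6 minutes
```

On these assumptions, a single bad release that takes an application down for an hour uses up more than a month's budget, which goes some way to explaining why Ops wants to control change.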
For those who’ve worked in Ops, it’s no secret that the majority of issues occur almost immediately after changes are made in production environments. Whether these issues relate to release procedures or the application code itself is a good question. DevOps talks about how Dev “tosses” a release over the wall to Ops, who pick it up and deploy it using their own methodology. Whilst some issues are indeed caused by Ops error, it’s fair to say many are not and are purely down to how the application code runs in the production environment.
I see two missing pieces that prevent Ops from becoming more Agile with Dev:
1. Common Deployment process
2. Common Operational process
Point 1 seems to be addressed by innovative solutions like Chef and Puppet that automate the deployment of releases and patches across Dev and Ops environments. You obviously still need people and process change to embrace these new solutions. Point 2 is about managing the release and keeping it operational once it’s alive and breathing in production. You could argue release success today is determined simply by whether the application stays up after a deployment or change. Should the application later start to hurt the business or become periodically slow, it becomes the responsibility of Ops to troubleshoot and resolve this pain. This puts them in a tricky position – they’re forced to deal with these issues in a reactive mode without having the application expertise or Dev knowledge to find the root cause.
Dev needs to be part of this operational process so they understand what impact their release had on the business. If releases hurt the business, then Dev needs to step up and improve quality so that future releases go more smoothly. With no feedback loop on release impact from Ops to Dev, the Agility vs. Availability conflict will only get worse. The more Ops continue to fight fires on application releases, the more the relationship with Dev falls apart.
Next Week: A Common Operational process for DevOps