How Full Cycle Developers Can Avoid the ‘Fog of Development’

As software engineers, we strive to hone our craft and learn new skills. And with a revolution underway in how we deliver and maintain the platforms we create, these character traits are particularly useful. Change happens fast—from developers being highly specialized, to dabbling in multiple parts of the stack, to today’s blurring of operational lines in a hybrid cloud world. One emerging IT role is that of full cycle developer, a term minted by Netflix last year to describe the reorganization of its Edge Engineering team.

So what’s a full cycle developer? As Netflix explains, it’s a developer who applies engineering discipline to all areas of the software life cycle, one who “thinks and acts” like an SWE, SRE or SDET. A full cycle developer may be called upon to create software and write test cases to solve business problems, automate operational aspects of the system, or deftly perform some other essential task.

For many organizations, however, this type of transformation is easier said than done. Problems arise when software engineers are expected to know the entire tech stack even as the increasingly distributed nature of enterprise applications makes it very difficult for one person to understand the whole system.

In my career, I’ve learned database, caching, build, load balancing, infrastructure automation, security, performance testing, public cloud, and container orchestration stacks. And compared to how applications were built only a decade ago, I’ve noticed that modern technology stacks are more purpose-built for a specific use case or environment. This rising complexity of systems and platforms spotlights the importance of software engineers’ never-stop-learning mantra, but it also shows the growing need for automated systems that help manage and maintain these intricate environments.

Lost in the Fog

Military leaders are familiar with the fog of war, the uncertainty in situational awareness that occurs in a military campaign. Software development isn’t war, of course, but developers often find themselves experiencing a similar uncertainty—one I call the “fog of development”—which can occur when developing a small piece of a system or platform. For instance, a software engineer might be part of a scrum team working on a dozen endpoints in a platform with hundreds of endpoints. With so many moving pieces, it’s easy to get lost in the software delivery battle.

Operate What You Build

This brings us back to full cycle developers and Netflix’s strong culture of innovation. When an organizational or team-structure change occurs at Netflix, the software industry takes note, such as when the streaming giant’s Edge Engineering team reorganized last year to enable more accountability and ownership of platforms it develops and maintains. This team, inspired by the principles of the DevOps movement, adopted the motto “operate what you build.” Simply put, if you develop a system, you’re also responsible for operating and supporting it.

The goal here may seem simple, but the amount of skill, discipline and infrastructure needed to make this approach successful is not trivial. Essentially, a full cycle developer can participate in any area of the systems development life cycle (SDLC), from design and development to deployment and support of the code they write.

The old adage, “so much to do and so little time,” certainly rings true with developers today, particularly those tasked with staying on top of rapid, constant change. Team dynamics, skill sets, and learning curves all impact developers’ productivity. In a recent Stack Overflow survey of more than 100,000 developers, roughly three-fourths of respondents said they expect new coworkers to be fully up to speed within three months. Another interesting finding: developers are “lifelong learners,” with nearly 90% of respondents saying they have taught themselves a new language, framework or tool outside of their formal education.

AIOps: Because We Can’t Do It Alone

With so many platforms to master, an emerging operate-what-you-build ethos, an upsurge in metrics to monitor and extract insights from, shortened time spans on projects, and new responsibilities for full cycle developers, it’s no wonder the fog of development is a real phenomenon.

Artificial intelligence for IT operations (AIOps) can help bridge and support these initiatives, enabling full cycle developers to rely on system automation to help tune their platforms. A key promise of AIOps is the ability to provide systematic self-healing, including autonomous A/B tests when a bottleneck is detected. Look to AppDynamics to provide leadership in the AIOps space, as we’re preparing for the next leap forward in automated software monitoring, observability and remediation.

Online Satisfaction Is $ In Your Company’s Pocket

This week Target announced that it will now match major online competitors’ prices all year long – not just during the holiday season. Talk about competitive edge. “Showrooming” is just one of the latest ways for retailers to make sure that THEY are the ones you spend your money with. But it doesn’t stop there.

Now that over 39% of all holiday shopping is done online – and the percentage is ever-growing – customer satisfaction is a big tell on how successful a company might be.  FORESEE recognizes this correlation, and has been indexing customer satisfaction since 2005.

Here is my “Spark Notes” version of the Holiday 2012 Index:

What is “Satisfaction Score”?

The “E-Retailer Satisfaction Index” is an analysis of customer satisfaction with 100 of the largest online retailers by sales volume (according to Internet Retailer).  The Index measures the satisfaction of online retail experience during the holiday season by “Satisfaction Scores” – which are extrapolated from 24,000 customer surveys between Thanksgiving and Christmas.

The indicators of customer satisfaction with a retail website are:
– Increased revenues
– Loyalty to the site
– Likelihood of recommending the retailer to others

And findings show that satisfied customers are:
– 65% more committed to the brand
– 69% more likely to recommend the retailer
– 67% more likely to purchase from the retailer the next time

Naughty vs. Nice Numbers

The U.S. Holiday Edition 2012 Satisfaction Index comes in at an average Satisfaction Score of 78 (on a 100-point scale) – with Amazon scoring the highest (for the second consecutive year) at a score of 88, and scoring 72.  This is a large gap considering that FORESEE estimates every one-point change on the Satisfaction Scale is a 14% change in the log of revenues generated – WOW!

Below are the FORESEE standout retailers based on Satisfaction Score.
Keep an eye out for companies in blue – they are AppDynamics customers!

1. Best of the Best Award (Criteria: Top Score)
– Amazon (See Graph Below)

2. Leaders Award (Criteria: Score ≥ 80)
[Graph: Leaders Award retailers, Satisfaction Score of 80 or higher]

It is awesome to have 4 AppDynamics customers as top satisfaction performers!
Netflix has been a leader 7 out of the 8 years that FORESEE has been indexing Satisfaction Scores.

3. Most Improved Since Last Year Award (Criteria: Increase of 3 points or more)

[Graph: Most Improved Since Last Year retailers]

Besides making the Most Improved Since Last Year list, Overstock has been able to
achieve speedy mean time to resolution, from days or hours down to minutes,
when solving problems in their production environment by using AppDynamics.
See more achievements on video here.

4.  Most Improved Over Time Award (Criteria: Significant Increase over 5 years)

[Graph: Most Improved Over Time retailers]

Congratulations to Staples for making the Most Improved Over Time list! It’s great to see
another AppDynamics customer killing it against other big names in retail.

“Largest Declines Over Time” (Criteria: Significant Decreases over 5 years)
[Graph: Largest Declines Over Time retailers]

So What Does This All Mean?

The explanations of these findings lie in the retailers’ ability to understand the customers’ expectations. In a nutshell, if they meet and exceed customer expectations in online merchandise, functionality, prices, and content, then the retailer satisfaction score will be high. “Only when they understand this can they begin to prioritize the site enhancements and make business decisions that can generate the greatest return on their investment,” explains the article. Meeting expectations satisfies the customer – and therefore makes you money.

In this day and age, agile releases are the way to differentiate your site and keep up with the competition. Have you stopped to think, are my agile releases satisfactory for my customer? (And if you really want to differentiate your site, have you asked, are my agile releases going to exceed my customers’ expectations?) Moving forward with this in mind will increase your site’s revenues, boost site loyalty, and make users more likely to recommend your site to others – who doesn’t want that?!

Don’t be guilty of failing to prioritize the customer. Build to please, and that will surely make you money.

See the full Holiday 2012 Edition of FORESEE’s E-Retailer Satisfaction Index here.

*All graphs and facts come from FORESEE E-Retail Satisfaction Index (U.S. Holiday Edition 2012)

AppJam 2012: How Netflix Operates & Monitors in the Cloud

Ariel Tseitlin, Director of Cloud Solutions, Netflix
As an early adopter of many cloud technologies and the record-holder of the largest public cloud deployment, Netflix is often seen as a leading innovator in cloud computing. In this session Ariel will discuss the key drivers, challenges and design principles behind Netflix’s journey to the cloud. Ariel will also cover how Netflix operates in the cloud, including which tools and mechanisms they use to manage the performance and availability of their highly distributed applications.



AppDynamics Pumps up the Jam in San Francisco

It’s been a week since we hosted AppJam Americas, our first North American user conference in San Francisco. With me as master of ceremonies, and a minor wardrobe malfunction at the start (see video at the end of this post), the entire day was a huge success for us and our customers. One thing that stuck in my mind was that applications today have become way more complex to manage, and strategic monitoring has become key to mastering that complexity. Simply put, SOA+Virtualization+Big Data+Cloud+Agile != Easy.

The day started with Jyoti Bansal, our CEO and Founder, outlining his vision to be the world’s #1 solution for managing modern web applications. The simple facts are that applications have become more dynamic, distributed and virtual. All of these factors have increased their operational complexity, and log files and legacy monitoring solutions are ill-suited to the task.

Jyoti then outlined our core design principles around Business Transaction Monitoring, Self-learning, intelligence and the need to keep app management simple. He then suggested what the audience could expect from AppJam: “AppJam is about sharing knowledge, learning best practices, guiding our direction and Jamming.” (We’re pretty sure by “jamming” he meant “partying.”)

With the intro from Jyoti done, it was time for me to nose dive the stage and introduce our first customer speaker – Ariel Tseitlin from Netflix.

How Netflix Operates & Monitors in the Cloud

With 27 million customers around the world, Netflix’s growth over the past three years has been meteoric. In fact, they found that they couldn’t build data centers fast enough. Hence, they moved to the public cloud in AWS for better agility.

In his session, Ariel talked about Netflix’s architecture in the cloud and how they built their own PaaS in terms of apps and clusters on top of Amazon’s IaaS. One unique thing Netflix does is bake their OS, middleware, apps and monitoring agents into a single image rather than using a tool like Chef or Puppet to manage application configuration and deployment separately from the underlying OS, middleware and tools. Everything is automated and managed at the instance level, with developers given the freedom and responsibility to deploy whenever they want to. That’s pretty cool stuff when you consider that developers now manage their own capacity and auto-scaling within the Cloud.

Ariel then talked about the assumption that failure is inevitable in the Cloud, with the need to plan and design around the fact that every part of the application can and will fail at some point. Testing for failure through “monkey theory” and Netflix’s “Simian Army” allows them to simulate failure at every level of the application, from randomly killing instances to taking out entire availability zones in AWS.
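To make the failure-testing idea concrete, here is a minimal sketch of what random instance termination and a zone outage might look like against an in-memory model of a cluster. This is not Netflix’s Simian Army code; the Instance and Cluster classes, the zone names and the helper functions are all hypothetical, and nothing here talks to AWS.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical stand-in for one running server."""
    instance_id: str
    zone: str
    alive: bool = True


@dataclass
class Cluster:
    """Hypothetical in-memory model of an application cluster."""
    instances: list = field(default_factory=list)

    def healthy(self):
        """Return the instances that are still running."""
        return [inst for inst in self.instances if inst.alive]


def kill_random_instance(cluster, rng):
    """Terminate one randomly chosen healthy instance and return it (or None)."""
    candidates = cluster.healthy()
    if not candidates:
        return None
    victim = rng.choice(candidates)
    victim.alive = False
    return victim


def kill_zone(cluster, zone):
    """Simulate losing a whole availability zone; return how many instances died."""
    killed = 0
    for inst in cluster.healthy():
        if inst.zone == zone:
            inst.alive = False
            killed += 1
    return killed


def build_demo_cluster():
    """Build nine instances spread evenly over three made-up zones."""
    zones = ["zone-a", "zone-b", "zone-c"]
    return Cluster([Instance(f"i-{n:03d}", zones[n % 3]) for n in range(9)])


if __name__ == "__main__":
    cluster = build_demo_cluster()
    victim = kill_random_instance(cluster, random.Random())
    print("terminated", victim.instance_id, "in", victim.zone)
    print("zone outage terminated", kill_zone(cluster, "zone-b"), "more")
    print(len(cluster.healthy()), "instances still healthy")
```

The point of a drill like this is the check that follows it: after each simulated failure, the service should still answer requests from the instances that remain.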

From a monitoring perspective, Netflix uses internally developed tools and AppDynamics, which are also baked into their AWS images. Doing so allows developers to live and die by monitoring in production through automated alerts and problem discovery. What’s perhaps different is that Netflix focuses their monitoring at the service level (e.g. app cluster), rather than at the infrastructure level–so they’re really not interested in CPU or memory unless it’s impacting their end users or business transactions.

Finally, Ariel spoke about AppDynamics at Netflix, touching on the fact they monitor over 1 million metrics per minute across 400+ business transactions and 300+ application services, giving them proactive alerts with URL drill-down into business transaction latency and errors from self-learned baselines. Overall, it was a great session for those looking to migrate and operate their application in the Cloud.
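For readers wondering what alerting from a self-learned baseline might mean in practice, here is a rough sketch. It assumes the simplest possible approach, a mean and standard deviation learned from past response times, and it is not how AppDynamics computes its baselines; the function names and the three-sigma threshold are assumptions made for illustration.

```python
from statistics import mean, stdev


def learn_baseline(samples):
    """Return (mean, standard deviation) of historical response times in ms."""
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value more than `threshold` standard deviations above the mean."""
    avg, sd = baseline
    if sd == 0:
        # A perfectly flat history: anything slower than the mean stands out.
        return value > avg
    return (value - avg) / sd > threshold


if __name__ == "__main__":
    # Hypothetical response times (ms) for one business transaction.
    history = [100, 110, 90, 105, 95, 100, 102, 98]
    baseline = learn_baseline(history)
    for latest in (105, 250):
        status = "ALERT" if is_anomalous(latest, baseline) else "ok"
        print(f"{latest} ms: {status}")
```

The appeal of a learned baseline is that nobody has to hand-pick a static threshold for each of hundreds of business transactions.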

When Big Data Meets SOA

Next up was Bob Hartley, development manager from FamilySearch, who gave an excellent talk about managing SOA and Big Data behind the world’s largest genealogy architecture. With almost 3 billion names indexed and 550+ million high-resolution digital images, FamilySearch has over 20 petabytes of data, which needs to be managed by their Java and Node.js distributed architecture spanning 5,000 servers. What’s scary is that this architecture and data are growing at a rapid pace, meaning application performance and scalability are fundamental to the success of FamilySearch.

After a brief intro, Bob started to talk about his Big Data architecture in terms of what technologies they were using to manage search queries, images, and people records. These included clusters of Apache Lucene, Solr, and custom map-reduce, combined with traditional relational database technology such as Oracle, MySQL, and Postgres.

Bob then talked about his team’s mission – to enable business agility through visibility, responsiveness, standardization, and vendor independence. At the top of this list was to provide joy for customers and stakeholders through delivering features that matter faster.

Bob also emphasized the need for repeatable, reliable and automated processes, as well as the need to monitor everything so his team could manage the performance of their SOA and Big Data application through continuous agile release cycles. FamilySearch has gone from a 3-month release cycle to a continuous delivery model in which changes can be deployed in just 40 minutes. That’s pretty mind-blowing stuff when you consider the size and complexity of their environment!

What’s interesting is that Release != Deploy at FamilySearch; they incrementally roll out new features to different sets of users using flags, allowing them to test and tease features before making them available to everyone. Monitoring is at the heart of their continuous release cycle, with Dev and Ops using baselines and trending to determine the impact of new features on application performance and scalability.
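A flag-based incremental rollout of this kind can be sketched in a few lines. The example below assumes a percentage rollout keyed on a hash of the user ID, which is one common way to do it; it is not FamilySearch’s implementation, and the function names and the flag name are made up for illustration.

```python
import hashlib


def bucket_for(user_id, flag_name):
    """Map a user deterministically to a bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(user_id, flag_name, rollout_percent):
    """Return True when the user falls inside the flag's rollout percentage."""
    return bucket_for(user_id, flag_name) < rollout_percent


if __name__ == "__main__":
    # Hypothetical flag: the code is deployed to everyone, released to 10%.
    users = [f"user-{n}" for n in range(1000)]
    released = sum(is_enabled(u, "new-search", 10) for u in users)
    print(f"{released} of {len(users)} users see the new feature")
```

Because the bucket depends only on the user and the flag, a user who sees the feature at 10% keeps seeing it as the percentage is raised, which keeps before-and-after monitoring comparisons clean.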

In terms of the evaluation process, the company looked at 20 different APM vendors over a 6-month period before finally settling on AppDynamics due to our dynamic discovery, baselining, trending, and alerting of business transactions. As Bob said, “AppDynamics gave us valuable performance data in less than one day. The closest competitors took over 2 weeks just to install their tools.”

Today, a single AppDynamics management server is used in production to monitor over 5,000 servers, 40+ application services, and 10 million business transactions a day. Since deployment, FamilySearch has managed to find dozens of problems they’ve had for years, and has managed to scale their application by 10x without increasing server resources. They’ve also seen MTTD drop from days to minutes and MTTR drop from months to hours and minutes.

Bob finished his talk with his lessons learned for managing SOA, Big Data and Agile applications: “Keep Architecture Simple,” “Speed of delivery is essential,” “Systems will eventually fail,” and “Working with SOA, Big Data and Agile is hard.”

How AppDynamics is accelerating DevOps culture

After lunch, John Martin, Senior Director of Production Engineering, spoke about DevOps culture at his company and how AppDynamics has become central to driving team collaboration. After a brief architecture overview outlining his SOA environment of 30 application services, John outlined what DevOps meant to him and his team – “DevOps is really about Collaboration – the most challenging issues we faced were communication.” Openly honest and deeply passionate throughout his session, John talked about three key challenges his team faced over the years that were responsible for the move to DevOps:

1. Infrastructure Growth

2. Communication Failure

3. Go Faster & Be Efficient

In 2005 the company had just 30 servers; by the end of this year that figure will have risen to 2,500. Through release automation using tools such as Bladelogic and Chef, John and his team are now able to perform a release in minutes versus the 8 hours it took back in 2005.

John gave an example of communication failure in which development was preparing for a major release using a new CMS platform. This release was performance-tested just two weeks prior to go-live. Unfortunately the new platform showed massive scalability limitations, causing Ops to work around the clock to over-provision resources as a tactical fix. Fortunately the release was delivered on time and the business was happy. However, they suffered as a technology organization due to finding architecture flaws so late in the game – “We needed a clear picture of what went wrong and how we were going to prevent such breakdown in future.”

Another mistake with a release in 2010 forced a major rethink between development and operations. It was this occurrence that caused the company to get really serious about DevOps. In fact, the technical leads got together and reorganized specialized teams within Dev and Ops to resolve deployment issues and shed preconceptions on who should do what. The result was improved relationships, better tooling, and a clearer perspective on how future projects could work.

John then touched on the tools that were accelerating DevOps culture, specifically Splunk for log files and AppDynamics for application monitoring. “AppDynamics provides a way for Dev and Ops to speak the same language. We’ve saved hundreds of hours in pre-release tests and discovered many new hotspots like the performance of our inventory business transaction which increased by 111%.” In fact, within the first year, AppDynamics generated an ROI of $795,166 with year 2 savings estimated at a further $420k. John laughed, “As you can see, AppDynamics wasn’t a bad investment.”

John ended his session with 5 tips for ensuring that DevOps succeeds in an organization: Be honest, communicate early and often, educate, criticize constructively, and create champions. Overall, a great session on why DevOps is needed in today’s IT teams.

Zero to Production APM in 30 days (while sending half a billion messages per day)

The final customer session of the day came from Kevin Siminski, Director of Infrastructure Operations at ExactTarget, and it was definitely worth waiting for. Kevin actually kicked off his talk by describing a weekly product tech sync meeting which he had with his COO. The meeting was full of different stakeholders from development and operations who were discussing a problem that they were currently experiencing in production.

“I literally got my laptop out, brought up the AppDynamics UI and in one minute we’d found the root cause of the problem,” Kevin said. Not a bad way to make his point about why Application Performance Management (APM) is so valuable in 2012.

Kevin then gave a brief intro to ExactTarget and the challenges of powering some of the world’s top brands like Nike and BestBuy. ExactTarget’s .NET messaging environment is highly virtualized, with over 5,000 machines that generate north of 500 million messages per day across multiple terabytes of databases.

Kevin then touched on the role of his global operations team and how his team’s responsibility had shifted over the last four years. “My team went from just triaging system alerts to taking a more proactive approach on how we managed emails and our business. Today my team actively collaborates with development, infrastructure and support teams.” All these teams are now focused and aligned on innovation, stability, performance and high availability.

Kevin then outlined his 30-day implementation plan for deploying AppDynamics across his entire environment using a single dedicated systems engineer and an AppDynamics SaaS management server for production. Week 1 was spent onboarding the IT-security team, reviewing config mgmt and testing agent deployment to validate network and security paths. Week 2 involved deploying agents to a few of the production IIS pools and validating data collection on the AppDynamics management server. Week 3 saw all agents pushed to every IIS pool with the collection mechanism set to disabled. The config mgmt team then took over and “owned” the deployment process for go-live. Week 4 saw all services and AppDynamics agents enabled during a production change window with all metrics closely monitored throughout the week to ensure no impact or unacceptable overhead.

AppDynamics’ first mission was to monitor the ExactTarget application as it underwent an upgrade to its mission-critical database from SQL Server 2003 to 2008. It was a high-risk migration as Kevin’s team was unable to assess the full risk due to legacy application components, so with all hands on deck they watched AppDynamics as the migration happened in real time. As the switch was made, application calls per minute and response time remained constant but application errors began to spike. By drilling down on these errors in AppDynamics, the dev team was quickly able to locate where they were coming from and resolve the application exceptions.

Today, AppDynamics is used for DevOps collaboration and feedback loops so engineers get to see the true impact of their releases in a production environment, a process that was requested by a product VP outside of Kevin’s global operations team. Overall, Kevin relayed an incredible story of how APM can be deployed rapidly across the enterprise to achieve tangible results in just 30 days.

A nice, surprising statistic that I realized later that evening was that the total number of servers being monitored by AppDynamics across our four customer speakers was well over 20,000 nodes. Having been in the APM market for almost 10 years, I’m struggling to think of another vendor with such successful large-scale production deployments.

Here’s a link to the photo gallery of AppJam 2012 Americas. A big thank you to our customers for attending and we’ll see you all next year!

For those keen to see my stage nosedive here you go: