How Full Cycle Developers Can Avoid the ‘Fog of Development’

As software engineers, we strive to hone our craft and learn new skills. And with a revolution underway in how we deliver and maintain the platforms we create, those instincts are particularly useful. Change happens fast—from developers being highly specialized, to dabbling in multiple parts of the stack, to today’s blurring of operational lines in a hybrid cloud world. One emerging IT role is that of the full cycle developer, a term coined by Netflix last year to describe the reorganization of its Edge Engineering team.

So what’s a full cycle developer? As Netflix explains, it’s a developer who applies engineering discipline to all areas of the software life cycle, one who “thinks and acts” like an SWE, SRE or SDET. A full cycle developer may be called upon to create software and write test cases to solve business problems, automate operational aspects of the system, or deftly perform some other essential task.

For many organizations, however, this type of transformation is easier said than done. Problems arise when software engineers are expected to know the entire tech stack, even as the increasingly distributed nature of enterprise applications makes it very difficult for one person to understand the whole system.

In my career, I’ve learned database, caching, build, load balancing, infrastructure automation, security, performance testing, public cloud, and container orchestration stacks. And compared to how applications were built only a decade ago, I’ve noticed that modern technology stacks are more purpose-built for a specific use case or environment. This rising complexity of systems and platforms spotlights the importance of software engineers’ never-stop-learning mantra, but it also shows the growing need for automated systems that help manage and maintain these intricate environments.

Lost in the Fog

Military leaders are familiar with the fog of war, the uncertainty in situational awareness that occurs in a military campaign. Software development isn’t war, of course, but developers often find themselves experiencing a similar uncertainty—one I call the “fog of development”—which can occur when developing a small piece of a system or platform. For instance, a software engineer might be part of a scrum team working on a dozen endpoints in a platform with hundreds of endpoints. With so many moving pieces, it’s easy to get lost in the software delivery battle.

Operate What You Build

This brings us back to full cycle developers and Netflix’s strong culture of innovation. When an organizational or team-structure change occurs at Netflix, the software industry takes note, such as when the streaming giant’s Edge Engineering team reorganized last year to enable more accountability and ownership of platforms it develops and maintains. This team, inspired by the principles of the DevOps movement, adopted the motto “operate what you build.” Simply put, if you develop a system, you’re also responsible for operating and supporting it.

The goal here may seem simple, but the amount of skill, discipline and infrastructure required to make this approach successful is not trivial. Essentially, a full cycle developer can participate in any area of the systems development life cycle (SDLC)—from design and development to deployment and support of the code they write.

The old adage, “so much to do and so little time,” certainly rings true with developers today, particularly those tasked with staying on top of rapid, constant change. Team dynamics, skill sets, and learning curves all impact developers’ productivity. In a recent Stack Overflow survey of more than 100,000 developers, roughly three-fourths of respondents said they expect new coworkers to be fully up to speed within three months. Another interesting finding: developers are “lifelong learners,” with nearly 90% of respondents saying they have taught themselves a new language, framework or tool outside of their formal education.

AIOps: Because We Can’t Do It Alone

With so many platforms to master, an emerging operate-what-you-build ethos, an upsurge in metrics to monitor and extract insights from, shortened time spans on projects, and new responsibilities for full cycle developers, it’s no wonder the fog of development is a real phenomenon.

Artificial intelligence for IT operations (AIOps) can help bridge and support these initiatives, enabling full cycle developers to rely on system automation to help tune their platforms. A key promise of AIOps is the ability to provide systematic self-healing, including autonomous A/B tests when a bottleneck is detected. Look to AppDynamics to provide leadership in the AIOps space, as we’re preparing for the next leap forward in automated software monitoring, observability and remediation.

AWS re:Invent Recap: Freedom for Builders

AWS re:Invent continues to build momentum. Amazon’s annual user conference, now in its seventh year, took place last week in Las Vegas with more than 50,000 builders on hand to share stories of cloud successes and challenges. The atmosphere was exciting and fast-paced, with Amazon once again raising the bar in the public cloud space. And AWS, which unveiled a plethora of new capabilities before and during the show, continues to delight and innovate at a rapid pace.

Are We Builders?

In his opening keynote, AWS CEO Andy Jassy shared a new theme that reverberated throughout the event: software engineers are now “builders,” not “developers.” Indeed, the internet recently has been ablaze with discussion and debate on this moniker shift.

Whether you see yourself as a builder or developer, Jassy helped categorize builders into different personas for enterprises that either like or dislike guardrails. (In AWS, a guardrail is a high-level rule that prevents deployment of resources that don’t conform to policies, thereby providing ongoing governance for the overall environment.)

If you like guardrails, you can easily implement machine learning labeling with SageMaker. Not a fan of guardrails? AWS still focuses on core compute, which lets you scale, for example, the exact amount of GPU acceleration a machine learning process needs with Amazon Elastic Inference. Still, the question remains: Are we builders or developers? The debate likely won’t end soon, but one thing is certain: the lower bar of entry for application infrastructure gives us the freedom to pick the best building blocks for our AWS environments.

Simplifying Cloud Migration

Depending on where you are in your cloud migration journey, the move to a public cloud vendor could cannibalize your development stack. AWS certainly has lowered the bar of entry, making diverse application infrastructure available to more individuals and organizations. In fact, you potentially could build a robust API without writing a single line of code on AWS.

Freedom to Transform

Another builder-related term popular at re:Invent was “freedom.” The most notable example was the series of database announcements, including Amazon Timestream, a managed time series database—check out the hashtag #databasefreedom to learn more. Regardless of whether you agree or disagree with Jassy’s “builder” designation, today’s enterprise is ripe for transformation. Partnering with AppDynamics can help you become an Agent of Transformation.

AWS in Your Datacenter

Some big news from the show made migrating your datacenter to AWS even more tangible. AWS has partnered with VMware to introduce AWS Outposts, a combined software and hardware offering that pairs AWS infrastructure with the established management capabilities of VMware—an interesting proposition that addresses the one place AWS was not reaching: the physical datacenter. Microsoft Azure has a similar product in Azure Stack. Innovation in both platforms is sure to drive greater competition.

A Driverless Future

The dream of autonomous vehicles has been building ever since the first automobile hit the road. For re:Invent attendees who used Lyft to get around, this dream is now far more real. Lyft partnered with Aptiv to provide autonomous transport up and down the Vegas Strip, allowing those who opted in to be taken from session to session in a self-driving car.

Amazon Web Services also garnered a lot of buzz by introducing AWS DeepRacer, a scaled-down autonomous model race car. This isn’t just another toy, however. DeepRacer is a great tool for teaching machine learning concepts such as reinforcement learning, and for mastering underlying AWS services such as Amazon SageMaker. Between the Aptiv-powered autonomous rides and DeepRacer itself, many attendees will no doubt be inspired to study up on ML concepts.

AppD at re:Invent


AppDynamics Senior Solutions Architect Subarno Mukherjee led an AWS session called “Five Ways Application Insights Impact Migration Success.” Subarno offered great insights into the importance of the customer and user experience, and how it impacts the success of your public cloud migration.

AppDynamics ran ongoing presentations in our booth throughout the show, covering the shift of cloud workloads to serverless and containers, maturing DevOps capabilities and processes, and the impending shift to AIOps.

Attendees showed a lot of interest in our Lambda Beta Program, too. Feel free to take a look at our program and, if interested, sign up today!

AppD Social Team

The AppDynamics social team was in full swing at re:Invent. From handing out prizes and swag to helping expand the AppDynamics community, we had a great time meeting our current and future customers. AppDynamics and AWS also co-hosted a fantastic happy hour at The Yardbird after Thursday’s sessions.

See You Next Year

We were thrilled to be a part of this year’s re:Invent. The great conversations and shared insights were fantastic. We’re looking forward to expanding our AWS ecosystem and partnerships—and to returning to re:Invent in 2019!



How to Monitor Code You Didn’t Write

Almost no one writes all their own code. Whether you are an engineer in IT operations or a developer, you are most likely working with code written by someone outside your organization. As a result, one question we hear repeatedly is: how do you monitor third-party code?

The answer is that it depends on whether you have access to the runtime, to both the runtime and the source code, or to neither. The good news is that in every case, monitoring code you didn’t write is easier than you might think.

Standalone Software

Commercial off-the-shelf (COTS) software provides a complete, self-contained solution running in the equivalent of its own box in your environment. While you generally have access to the runtime, it can be difficult to instrument because you don’t know which business transactions are most meaningful. With AppDynamics, the challenge of figuring out what to monitor is made easier with pre-built support for many popular COTS solutions such as SAP, Atlassian Confluence, and IBM Integration Bus.

For products where we do not provide out-of-the-box visibility, you can write an extension that will provide insight into what is going on inside the software. For example, by ingesting metrics such as the number of requests per minute or the associated disk usage provided by the COTS software, you can correlate unexpected changes to baseline behavior to the corresponding business transactions.
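As an illustration, a script-style extension can be as small as a program that polls the COTS software's own stats interface and prints metric lines for an agent to pick up. The sketch below is a minimal, hypothetical example: the metric path, the `fetchRequestsPerMinute` helper, and the `name=...,value=...` output convention are illustrative assumptions, not a specific product API.

```java
// Minimal sketch of a script-style monitoring extension. Assumes an agent
// that reads "name=...,value=..." lines from stdout and a hypothetical
// COTS admin API that exposes a request counter.
public class CotsMetricsExtension {

    // Placeholder for a call into the COTS software's own stats API.
    static long fetchRequestsPerMinute() {
        return 42; // hypothetical value, would come from the COTS product
    }

    // Format one metric line in the name=...,value=... convention.
    static String metricLine(String path, long value) {
        return "name=" + path + ",value=" + value;
    }

    public static void main(String[] args) {
        System.out.println(metricLine(
            "Custom Metrics|MyCOTS|Requests per Minute",
            fetchRequestsPerMinute()));
    }
}
```

Once a metric like this flows in, it can be baselined like any other, so an unexpected jump in requests per minute can be correlated to the affected business transactions.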



Remote Services

In contrast to COTS software, third-party remote services can only be accessed via the network and offer no access to the runtime whatsoever. By monitoring the endpoints—calls to and responses from a service—AppDynamics will give you a clear indication of whether a particular service is responsible for slowness in an application. We also have additional capabilities that help you monitor the health of these endpoints through Service Availability Monitoring or Synthetic Monitoring. If your third-party service offers visibility metrics, we can integrate those into your AppDynamics dashboard. We can also set up alerts that trigger when the average response time rises above a certain level, so you can easily ensure compliance with SLAs.
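The alerting idea reduces to simple arithmetic: average the recent endpoint latencies and compare the result against the SLA threshold. The Java sketch below shows only that logic; the class and method names are made up for illustration and are not the AppDynamics configuration itself.

```java
import java.util.List;

// Illustrative sketch of an average-response-time SLA check: compute the
// mean latency of recent endpoint calls and flag a breach when the mean
// crosses a configured threshold.
public class SlaCheck {

    // Mean of the latency samples, in milliseconds.
    static double averageMillis(List<Long> samples) {
        if (samples.isEmpty()) return 0.0;
        long sum = 0;
        for (long s : samples) sum += s;
        return (double) sum / samples.size();
    }

    // True when the rolling average exceeds the SLA threshold.
    static boolean breachesSla(List<Long> samples, double thresholdMillis) {
        return averageMillis(samples) > thresholdMillis;
    }

    public static void main(String[] args) {
        List<Long> latencies = List.of(120L, 450L, 900L, 300L);
        System.out.println("avg=" + averageMillis(latencies)
            + "ms breach=" + breachesSla(latencies, 400.0));
    }
}
```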



Third-party libraries are another common scenario. What is the best way to troubleshoot issues with a threading library, a logging library, or a client library for a cache? If a library is integrated with your code, your business transactions will flow through it. If there are any performance issues with your application, the third-party code will, in most cases, show up in a snapshot and you will be able to see if it is to blame. In some more advanced cases, however, additional instrumentation may be necessary. Please see documentation on how to Configure Instrumentation for more information.



Third-party software may seem like a black box, but extracting performance insights from someone else’s code can be surprisingly simple. AppDynamics offers pre-built support for standalone software and provides visibility into the behavior of third-party services by observing the health of their endpoints. Many third-party libraries are automatically incorporated into AppDynamics’ snapshots. Our belief is that it shouldn’t matter who wrote the code: business and application performance should be transparent all across your networked environment.



What kind of developer are you? [QUIZ]

Are you a code maverick? Maybe you’re a buzzword bandit quick to use “SoMoClo” and other trendy terms? Or maybe you’re just not sure which stereotypical bucket you fall under. That’s why we created this highly scientific personality quiz to assess your true developer personality.

This quick test asks about your food preferences, tools you use, and even your TV show habits. What can we say, it’s pretty accurate.

Are you a developer looking for a new and exciting challenge? You’re in luck, we’re hiring!

Buzzword Bingo

It seems like in every article, tweet, or blog post I read, someone has a different definition of the same buzzwords – especially in technology. Mentioning cloud or big data on a tech blog is like bringing sand to the beach. That’s one of the reasons why we made The Real DevOps of Silicon Valley – to make fun of the hype. I got to thinking… has anyone taken the time to shed some light on these ambiguous terms? I investigated on Urban Dictionary, and this is what I found…


IT According to Urban Dictionary
(Not kidding, look it up…)



cloud com·put·ing, noun.

“Utilizing the resonance of water molecules in clouds when disturbed by wireless signals to transmit data around the globe from cloud to cloud. I use cloud computing so I don’t have to worry about viruses, I only have to worry about birds flying through my cloud.”



ag·ile, adj.

“Agile is a generalized term for a group of anti-social behaviors used by office workers to avoid doing any work while simultaneously giving the appearance of being insanely busy. Agile methods include visual distraction, subterfuge, camouflage, psycho-babble, buzzwords, deception, disinformation, and ritual humiliation. It has nothing to do with the art and practice of software engineering.”


big da·ta, noun.

“Modern day version of Big Brother. Online searches, store purchases, Facebook posts, Tweets or Foursquare check-ins, cell phone usage, etc. is creating a flood of data that, when organized and categorized and analyzed, reveals trends and habits about ourselves and society at large.”


dev·ops, adj.

“When developers and operations get together to drink beer and color on whiteboards to avoid drama in the War Room.  Also a buzzword for recruiters to use to promote overpaid dev or ops jobs.”



soft·ware, noun.

“The parts of a computer that can’t be kicked, but ironically deserve it most.”


i·t, noun.

“The word the Knights of Ni cannot hear or say.”
(Monty Python & the Holy Grail reference)


QCon: Enough with the theory, already—Let’s see the code

San Francisco’s QCon was expecting a smaller crowd, but ended up bursting at the seams: the event sold out weeks ahead of time and in many sessions it was standing room only.

Targeted at architects and operations folks as much as developers, QCon was heavy on the hot topics of the day: SOA, agile, and DevOps. But if there was a consistent trend throughout the three days, it was “No more theory. Show us the practice.”

Jesper Boeg’s talk, for example—“Raising the Bar: Using Kanban and Lean to Super Optimize Your Agile Implementation”—was peppered with some good sound bites (“If it hurts, do it more often and bring the pain forward”). But it also stressed the meat: Boeg demonstrated a “deployment pipeline” that represented an automated implementation of the build, deploy, test, and release process—a way to find and eliminate bottlenecks in agile delivery.

Similarly, John Allspaw started high in his talk—sharing his ideas on the areas of ownership and specialization between Ops and Dev, a typical DevOps presentation—but backed up the theory with code-level discussions of how logging, metrics, and monitoring work at Etsy. (His blog entry on the subject and complete QCon slides can be found on his blog, Kitchen Soap.)

Adrian Cockcroft, who is leading a trailblazing public cloud deployment of production-level applications at Netflix, also wrapped theory around juicy substance. He “showed the code” and the screenshots of his company’s app scaling and monitoring tools (you can find his complete slide presentation here).

Not everyone took the time to drill down, though. Tweets from QCon attendees showed that the natives got restless in talks that stayed too high level:

“OK, just because you can draw a block diagram out of something doesn’t mean it makes sense.”

“Ok, we get it. Your company is very interesting, now get to the nerd stuff.”

“These sessions are high-level narratives. Show me the code, guys! Devil’s in the details.”

At the same time, they would shower plaudits and congratulations on speakers who gave them what they wanted: something new to learn.

When the Twitter stream started to compare QCon’s activities with an event happening concurrently in the city, Cloud Expo, the nature of the attendees was drawn into sharp relief:

“At #cloudexpo people used laptops during sessions to check email… At #qconsf they are writing code.”

When it comes to agile, SOA, DevOps, and other problems of the day, people are ready for answers.

What’s Under Your Hood? APM for the Non Java Guru: ORMs & Slow SQL

It’s time for another update to our series on Top Application Performance Challenges. Last time I looked at Java’s synchronization mechanism as a source for performance problems. This time around I take on what is likely the Performance Engineer’s bread and butter … slow database access!

Behind this small statement lies a tricky and multifaceted discussion. For now, I’m going to focus on just one particular aspect – the Object Relational Mapper (ORM).

The ORM has become a method of choice for bringing together the two foundational technologies that we base business applications on today—object-oriented applications (Java, .NET) and relational databases (Oracle, MySQL, PostgreSQL, etc.). For many developers, this technology can seem like a godsend, eliminating the need for them to drill down into the intricacies of how these two technologies interact. But at the same time, ORMs can place an additional burden on applications, significantly impacting performance while everything looks fine on the surface.

Here’s my two cents on ORMs and why developers should take a longer look under the hood:

In the majority of cases, the time and resources taken to retrieve data are orders of magnitude greater than what’s required to process it. It is no surprise that performance considerations should always include the means and ways of accessing and storing data.

I already mentioned the two major technology foundations on which we build business applications today: object-oriented applications, used to model and execute the business logic, and relational databases, used to manage and store the data. Object-oriented programs unite data and logic into object instances; relational databases, on the other hand, isolate data into columns and tables, joined by keys and indexes.

This leaves a pretty big gap to bridge, and it falls upon the application to do the legwork. Since bridging this gap is something many applications must do, enter the ORM as a convenient, reusable framework. Hibernate, for instance, is quite likely the most popular ORM out there.

While intuitive for an application developer to use (ORMs do hide the translation complexities), an ORM can also be a significant weight on an application’s performance.

Let me explain.

Take a Call Graph from AppDynamics and follow the execution path of a transaction, method by method, from the moment a user request hits the application until calls to the database are issued to retrieve the requested information. Then size up the layers of code this path has to go through to get to the data. If your application has implemented an ORM like Hibernate, I assure you, you’ll be surprised how much stuff is actually going on in there.

No finger-pointing intended. A developer will emphasize that just using the ORM component (without having to understand how it’s working) greatly increases his or her productivity. Point taken. It does pay, however, to review an ORM’s data access strategy.

I recently worked with a company and saw transactions (single user requests) that each bombarded the database with 3,000+ distinct calls. Seems a little excessive? You would be right. Moreover, nobody knew that this was happening, and certainly nobody intended it to be so.

In many cases, simple configuration settings or a different ‘fetch’ method offered by the ORM itself can affect performance significantly. Whether, for instance, the ORM accesses each row of the customer table individually to fill an array of customers in your code, or constructs a single query that encompasses the entire expected result set and retrieves it in one fell swoop, makes a big difference.
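The difference is easy to see in miniature. The sketch below is not Hibernate code; it simply counts the queries a hypothetical data layer would issue under each strategy: the classic "N+1" pattern of row-by-row access versus a single encompassing query.

```java
// Illustrative sketch of why the ORM's fetch strategy matters: loading N
// customers one row at a time issues N+1 database calls, while a single
// bulk query issues just one.
public class FetchStrategyDemo {

    // Row-by-row ("lazy") access: 1 query for the id list, then 1 per row.
    static int lazyRowByRow(int customerCount) {
        int queries = 1;                 // fetch the list of customer ids
        for (int i = 0; i < customerCount; i++) {
            queries++;                   // one more query per customer row
        }
        return queries;
    }

    // Bulk access: one query retrieves the entire expected result set.
    static int singleBulkQuery(int customerCount) {
        return 1;
    }

    public static void main(String[] args) {
        System.out.println("row-by-row: " + lazyRowByRow(3000) + " queries");
        System.out.println("bulk:       " + singleBulkQuery(3000) + " query");
    }
}
```

At 3,000 customers per transaction, the row-by-row strategy produces exactly the kind of 3,000+ call storm described above, while the bulk strategy stays at one.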

Seems obvious, right? But do you actually know what your ORM really does when retrieving the information?

You might be surprised.

AppDynamics Lite Turns 5000!

It’s a big day for us here at AppDynamics as we mark (and pass!) the 5000th download of AppDynamics Lite! Released just a few months ago, AppDynamics Lite is our free APM tool for troubleshooting Java performance in production.  We hear every day from companies who have deployed Lite in production environments and are using it to solve real problems.  That’s why we built it, and it’s very rewarding to hear about IT Operations professionals and developers getting real value.

We’re very pleased with the wave of enthusiasm that the tool has been able to generate in such a short time.   Here’s just one example: one of our sales reps was speaking to a company this morning, and he asked how they had heard of us.  The prospect said, “I downloaded and used your Lite tool, which my team thought was phenomenal. One of our business requirements was that our potential vendors could give us immediate access to their product and demonstrate their unique approach to application performance management.  You met this requirement–no one else we talked to did. And therefore, you immediately made our short list.”

This conversation represents another goal of ours: removing the friction that typically exists between software companies and evaluators. With Lite, companies can download it whenever they want and put it immediately into action.  No waiting, no negotiating…just instant access.  This strategy works for us because, as a startup, we believe that if someone becomes familiar with our solution, they’ll see that it’s considerably better than the legacy offerings on the market.

The success of the tool continues to demonstrate that there’s unquestionably a need from both dev and ops teams to have greater insight and understanding of performance as they deploy applications across cloud, virtual and physical environments. Companies big and small are taking advantage of AppDynamics Lite already; you might even recognize some of the names (Ikea, Nokia, MasterCard, Dell, FranklinCovey and Samsung among others).

The great thing about AppDynamics Lite (besides the fact that it takes two minutes to download and it’s free) is that with thousands of users worldwide, we’ve been able to gather a lot of product feedback in just a few months. Lite has now been deployed on every conceivable combination of Java application servers and application stacks. This means our core agent technology has been battle-tested in many environments, and we’ve been able to incorporate all that feedback into our commercial product. This is quite different from the usual slow feedback loop of the traditional enterprise software go-to-market approach. The benefit to our customers and partners is that our technology is mature beyond its years.

As we’ve always promised, we’ll continue to offer AppDynamics Lite for free. Our goal is to empower developers and IT operations teams to combat one-off performance issues as they occur, while giving them a first-hand look at the AppDynamics approach to application performance management.

Exceptional Performance Improvement

This post is part of our series on the Top Application Performance Challenges.

Over the past couple of weeks we helped a few DevOps groups troubleshoot performance problems in their production Java-based applications. While the applications themselves bore little resemblance to one another, they all shared one common problem. They were full of runtime exceptions.

When I brought the exceptions to the attention of some of the teams I got a variety of different answers.

  • “Oh yeah, we know all about that one, just some old debug info … I think”
  • “Can you just tell the APM tool to just ignore that exception because it happens way too much”
  • “I have no idea where that exception comes from, I think it’s the vendor/contractor/off shore team’s/someone else’s code”
  • “Sometimes that exception indicates an error, sometimes we use it for flow control”

The underlying attitude was always the same – because the exception did not affect the functionality of (read: did not break) the application, it was deemed unimportant. But what about performance?

In one high-volume production application (100s of calls/second) we observed an NPE (Null Pointer Exception) rate of more than 1,000 exceptions per minute (17,800 per 15 minutes). This particular NPE occurred in the exact same line of code and propagated some 20 calls up the stack before a catch clause handled it. The pseudo code for the line read: if the string equaled null and the length of the trimmed string was zero, then do x; otherwise just exit. When we read the line out loud, the problem was immediately obvious – you cannot trim a null string. While the team agreed that they needed to fix the code, they remained skeptical that simply changing ‘=’ to ‘!’ would really improve performance.
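In Java terms, the offending condition and its one-character fix look something like the sketch below (a reconstruction for illustration, not the customer's actual code). With `==`, a null string passes the first check, so `&&` goes on to evaluate `trim()` on null and throws an NPE on every call; flipping `==` to `!=` makes the null check guard the trim instead.

```java
// Reconstruction of the problematic line and its fix.
public class NpeFix {

    // Buggy version: throws NullPointerException whenever s is null,
    // because s == null being true forces evaluation of s.trim().
    static boolean isBlankBuggy(String s) {
        return s == null && s.trim().length() == 0;
    }

    // Fixed version: the null check now protects the trim call.
    static boolean isBlankFixed(String s) {
        return s != null && s.trim().length() == 0;
    }

    public static void main(String[] args) {
        try {
            isBlankBuggy(null);
        } catch (NullPointerException e) {
            System.out.println("buggy version threw NPE");
        }
        System.out.println("fixed(null)  = " + isBlankFixed(null));
        System.out.println("fixed(\"  \") = " + isBlankFixed("  "));
    }
}
```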

Using basic simulations, we estimate this error rate imposes a 3-4% slowdown on basic response time at a minimum. You heard me: a 3-4% slowdown, at a minimum.

And these response issues get compounded in multi-threaded application server environments. When multiple pooled threads slow down to cope with error handling, the application has fewer resources available to process new requests. At the JVM level, the errors cause more I/O, more objects created in the heap, and more garbage collection. In a multi-threaded environment, the error threatens more than response times; it threatens load capacity.

And if the application is distributed across multiple JVMs … well, you get the picture.

The web contains many articles and discussions on the performance impact of Java’s exception handling. The examples in these posts provide sufficient data to show that a real performance penalty exists for throwing and catching exceptions. Doing the fix really does improve the performance!
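You can get a feel for the penalty with a toy comparison. The sketch below checks whether a string is numeric two ways: by catching a NumberFormatException on every miss, and by a plain character scan. Absolute timings vary by JVM and hardware, so no numbers are claimed here; the point is the gap that opens up when the exceptional path fires constantly.

```java
// Rough, self-contained sketch of the cost of throwing and catching
// exceptions on a hot path.
public class ExceptionCost {

    // Exception-driven check: every non-numeric input pays for a throw,
    // a stack-trace fill, and a catch.
    static boolean viaException(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Plain scan: no exceptions involved.
    static boolean viaScan(String s) {
        if (s == null || s.isEmpty()) return false;
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isDigit(s.charAt(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String bad = "not-a-number";

        long t0 = System.nanoTime();
        for (int i = 0; i < 100_000; i++) viaException(bad);
        long exceptional = System.nanoTime() - t0;

        t0 = System.nanoTime();
        for (int i = 0; i < 100_000; i++) viaScan(bad);
        long scanning = System.nanoTime() - t0;

        System.out.println("exception path: " + exceptional + " ns");
        System.out.println("scan path:      " + scanning + " ns");
    }
}
```

Note the scan version rejects negative numbers that `parseInt` would accept; it is a sketch of the cost difference, not a drop-in replacement.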

Runtime exceptions happen. When they occur frequently, they appreciably slow down your application. The slowness becomes contagious to all transactions being served by the application. Don’t mute them. Don’t ignore them. Don’t dismiss them. Don’t convince yourself they are harmless. If you want a simple way to improve your application’s performance, start by fixing these regularly occurring exceptions. Who knows, you just might help everyone’s code run a little faster.

APM for the Non Java Guru – Threading & Synchronization Issues

This is the first post in a series on the Top Application Performance Challenges.

Of the many issues affecting the performance of Java/.NET applications, synchronization ranks near the top. Issues arising from synchronization are often hard to recognize, and their impact on performance can become significant. What’s more, they are often, at least in principle, avoidable.

The fundamental need to synchronize lies with Java’s support for concurrency, implemented by allowing the execution of code by separate threads within the same process. Separate threads can share the same resources: objects in memory. While this is a very efficient way to get more work done (while one thread waits for an IO operation to complete, another thread gets the CPU to run a computation), it also exposes the application to interference and consistency problems.

The JVM/CLR does not guarantee an execution order of the code running in concurrent threads. If multiple threads reference the same object, there is no telling what state that object will be in at a given moment in time. The repercussions of that simple fact can be enormous: for example, one thread running calculations can return wrong results because a concurrent thread accesses and modifies the same shared bits of information at the same time.

To prevent such a scenario (a program needs to execute correctly, after all), a programmer uses the “synchronized” keyword in his/her program to force order on concurrent thread execution. Marking code as “synchronized” prevents more than one thread at a time from acquiring the same object’s lock.

In practice, however, this simple mechanism comes with substantial side effects. Modern business applications are typically highly multi-threaded. Many threads execute concurrently, and consequently “contend” heavily for shared objects. Contention occurs when a thread wants to access a synchronized object that is already held by another thread. All threads contending effectively “block,” halting their execution until they can acquire the object. Synchronization effectively forces concurrent processing back into sequential execution.
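The effect is easy to demonstrate in miniature. In the sketch below, four threads hammer one synchronized method: the final count stays correct, but every increment funnels through a single lock, so the threads effectively take turns rather than running in parallel. (Remove the `synchronized` keyword and the count would usually come up short due to lost updates.)

```java
// Minimal sketch of lock contention: correctness is preserved, but all
// threads serialize on one monitor.
public class ContendedCounter {

    private long count = 0;

    // Every thread contends for this object's single lock.
    synchronized void increment() {
        count++;
    }

    long value() {
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        ContendedCounter counter = new ContendedCounter();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 100_000; i++) counter.increment();
            });
            threads[t].start();
        }
        for (Thread t : threads) t.join();  // join gives a consistent view
        System.out.println("count = " + counter.value()); // always 400000
    }
}
```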

With just a few metrics we can show the effects of synchronization on an application’s performance. For instance, take a look at the graph below.

While increasing load (number of users = blue), we see that at some point midway the response time (yellow) takes an upward curve, while at the same time resource usage (CPU = red) somewhat increases to eventually plateau and even recedes. It almost looks like the application runs with the “handbrake on,” a classic, albeit high-level, symptom of an application that has been “over-synchronized.”

With every new version of the JVM/CLR, improvements are made to mitigate this issue. While helpful, however, these improvements can’t fully resolve the issue or undo its negative impact on application performance.

Also, developers have come to adopt “defensive” coding practices, synchronizing large pieces of code to prevent possible problems. In large development organizations, this problem is further magnified, as no one developer or team has full ownership of an application’s entire code base. Erring on the side of safety can quickly escalate, with large portions of synchronized code significantly limiting an application’s potential throughput.

It is often too arduous a task to maintain a locking strategy fine-grained enough to ensure that only the necessary minimum of execution paths is synchronized. New approaches to better manage state in a concurrent environment, such as ReadWriteLock, are available in newer versions of Java, but they are not widely adopted yet. These approaches promise a higher degree of concurrency, but it will always be up to the developer to implement and use the mechanism correctly.
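For completeness, here is a minimal sketch of the ReadWriteLock approach mentioned above, using java.util.concurrent.locks.ReentrantReadWriteLock. Many readers may hold the read lock concurrently; only a writer takes the lock exclusively, so read-heavy workloads avoid the single-monitor bottleneck. The CachedConfig class itself is a made-up example.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Read-mostly shared state guarded by a ReentrantReadWriteLock instead of
// one coarse synchronized monitor.
public class CachedConfig {

    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private String value = "initial";

    String read() {
        lock.readLock().lock();      // shared: many readers at once
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    void write(String newValue) {
        lock.writeLock().lock();     // exclusive: one writer, no readers
        try {
            value = newValue;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        CachedConfig config = new CachedConfig();
        System.out.println(config.read());
        config.write("updated");
        System.out.println(config.read());
    }
}
```

The trade-off is exactly the one the paragraph above describes: more concurrency, but the developer owns the responsibility of pairing every lock with its unlock (hence the finally blocks).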

Is synchronization, then, always going to result in a high MTTR?

New technologies exist on the horizon that may lend some relief.  Software Transactional Memory Systems (STM), for example, might become a powerful weapon for dealing with synchronization issues. They may not be ready for prime time yet, but given what we’ve seen with database systems, they might be the key to taming the concurrency challenges affecting applications today. Check out JVSTM, Multiverse and Clojure for examples of STMs.

For now, the best development organizations are the ones that can walk the fine line between code review/rewrite burdens and concessions to performance. APM tools can help quite a lot in such scenarios, allowing teams to monitor application execution under high load (aka “in production”) and quickly pinpoint the execution times of highly contended objects, database connections being a prime example. With the right APM in place, the ability to identify thread synchronization issues is greatly increased—and overall MTTR will drop dramatically.