Why Idempotency and Ephemerality Matter in a Cloud-Native World

September 19, 2018

Idempotency and ephemerality are important pillars in Cloud Native. Putting these seemingly disparate terms together might have been unthinkable a few years ago. Fast forward to today, and they represent the Cloud Native world we live in.

The first time I heard “idempotent” and “ephemeral,” I had no idea what they meant. Perhaps I should have, though, because idempotent and ephemeral patterns are not new to the computing world.

In mathematics and computer science, idempotence is the property that an operation produces the same outcome no matter how many times you execute it. Ephemerality is the quality of being short-lived or transitory. In a Cloud Native environment, we expect consistency and portability while presuming that infrastructure, including transient and disposable containers, will likely be impermanent. These shifts are influencing providers across multiple layers of the OSI Model.
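To make the definition concrete, here is a minimal Python sketch contrasting an idempotent operation with a non-idempotent one. The function names and the "replicas" key are hypothetical, chosen only for illustration; they are not taken from any particular tool.

```python
def set_replicas(state: dict, count: int) -> dict:
    """Idempotent: declares the desired end state, so repeating it changes nothing."""
    new_state = dict(state)  # copy so the caller's dict is not mutated
    new_state["replicas"] = count
    return new_state


def add_replica(state: dict) -> dict:
    """Not idempotent: every call moves the state one step further."""
    new_state = dict(state)
    new_state["replicas"] = new_state.get("replicas", 0) + 1
    return new_state


start = {"replicas": 1}

# Applying the idempotent operation once or twice gives the same result.
once = set_replicas(start, 3)
twice = set_replicas(once, 3)
assert once == twice

# Applying the non-idempotent operation twice gives a different result than once.
assert add_replica(start) != add_replica(add_replica(start))
```

The "set to a desired value" style is the one that tolerates retries, which is why it shows up so often when infrastructure may disappear and be recreated.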


Again, “idempotent” threw me the first time I heard it. I had not felt so perplexed by the meaning of a word since childhood. I was a staff developer and excited to use JMS for the first time. My team was starting to modernize a financial services client’s system to be event-based, implementing ActiveMQ as the message broker. We were determining transaction boundaries and suffering from duplicate messages that were wreaking havoc on our final calculations. One of the project’s senior developers suggested I make the endpoint “idempotent.” I gave him a blank look as if he were speaking pig Latin.

Someone recommended I get a copy of what soon became one of my favorite books, Enterprise Integration Patterns. The book’s authors, Gregor Hohpe and Bobby Woolf, describe a lot of the system-to-system design patterns that we depend on today. In the case of the financial services client, the design pattern was Idempotent Receiver (Consumer). We deployed into one of the client’s data centers and, barring a severe application infrastructure fault, we were under the impression that the center’s infrastructure would be there indefinitely.

By today’s standards, our application infrastructure was fragile. Our duplicate check implementation was very stateful: if the service had stopped, we would have been open to duplicates again. In the next iteration of our application, we designed the idempotent service to be more robust, something we would’ve done sooner had we not been under the impression that our infrastructure was more stable than ephemeral.
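As a rough illustration of the Idempotent Receiver idea, the sketch below keeps its duplicate check in a durable store so that it survives a service restart. It is not the implementation from the project described above: the class name, table names, and the choice of SQLite are assumptions made for the example, and amounts are kept as integer cents to avoid floating-point drift.

```python
import sqlite3


class IdempotentReceiver:
    """Hypothetical receiver that applies each message at most once."""

    def __init__(self, db_path: str) -> None:
        self.conn = sqlite3.connect(db_path)
        # The primary key on message_id is what rejects duplicates.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS processed (message_id TEXT PRIMARY KEY)"
        )
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS ledger "
            "(id INTEGER PRIMARY KEY CHECK (id = 1), total_cents INTEGER NOT NULL)"
        )
        self.conn.execute("INSERT OR IGNORE INTO ledger (id, total_cents) VALUES (1, 0)")
        self.conn.commit()

    def handle(self, message_id: str, amount_cents: int) -> bool:
        """Apply the message once; return False if it was already processed."""
        try:
            # One transaction: record the ID and apply the effect together,
            # so a crash cannot leave one without the other.
            with self.conn:
                self.conn.execute(
                    "INSERT INTO processed (message_id) VALUES (?)", (message_id,)
                )
                self.conn.execute(
                    "UPDATE ledger SET total_cents = total_cents + ? WHERE id = 1",
                    (amount_cents,),
                )
            return True
        except sqlite3.IntegrityError:
            # Duplicate message ID: the transaction was rolled back, nothing changed.
            return False

    def total_cents(self) -> int:
        row = self.conn.execute("SELECT total_cents FROM ledger WHERE id = 1").fetchone()
        return row[0]
```

Because the set of processed message IDs lives in the database file rather than in process memory, a new instance of the receiver pointed at the same file still rejects a redelivered message. In an ephemeral environment that file would itself need to sit on storage that outlives the instance.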


Cloud providers are looking to capture more workloads while providing their clients with better ROI. When Google launched its Preemptible Virtual Machine in 2015, it was responding to a demand for lower-cost instances. Due to the short-lived nature of these instances, I was having a hard time wrapping my head around the workload that would be appropriate for a Preemptible Virtual Machine, or even an Amazon Spot Instance, which offers spare compute capacity in the AWS cloud at steep discounts.

The rise of preemptible or spot instances shows the upsurge in ephemeral computing. As enterprises grapple with the nuances of cloud cost, one avenue for lower-cost services is to have the compute live for a shortened, or ephemeral, period.

Prior to preemptible or spot instances, there was a baseline understanding that compute capacity would exist for a finite period of time. Because of cloud availability, though, some organizations treated traditional instances as if they would last indefinitely. For a planned hardware upgrade, the cloud vendor would give its customers advance notice to switch workloads over to another instance. For unplanned events like outages, a service designed to be multi-region or multi-zone would suffice.

Today, it feels like we are designing workloads to cope with Chaos Monkey at every level, including infrastructure. There’s a growing understanding that our workload infrastructure likely won’t be there in a predictable format. As a result, we’re building more robust services to cope with this unpredictability. These changes spotlight the importance of keeping workloads portable in case we have to switch to another instance, region, zone or provider.
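One way to picture a workload that copes with disappearing infrastructure is a job that checkpoints its progress, so a replacement instance can pick up where the previous one stopped. The sketch below is hypothetical: the function names, the JSON checkpoint format, and the preempt_after parameter (which only simulates an instance being reclaimed) are assumptions for illustration, and the checkpoint path is assumed to live on storage that outlives the instance.

```python
import json
import os
from typing import Optional, Sequence


def load_checkpoint(path: str) -> dict:
    """Return saved progress, or a fresh state if no checkpoint exists yet."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_index": 0, "total": 0}


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves a half-written file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def run(items: Sequence[int], checkpoint_path: str,
        preempt_after: Optional[int] = None) -> dict:
    """Sum the items, resuming from the last checkpoint.

    preempt_after simulates the instance being reclaimed after that many
    items have been processed in this run.
    """
    state = load_checkpoint(checkpoint_path)
    done_this_run = 0
    while state["next_index"] < len(items):
        if preempt_after is not None and done_this_run >= preempt_after:
            break  # simulated preemption: stop without finishing
        state["total"] += items[state["next_index"]]
        state["next_index"] += 1
        # Progress and partial result are saved together in one file.
        save_checkpoint(checkpoint_path, state)
        done_this_run += 1
    return state
```

Running the job again after an interruption continues from the saved index, and running it once more after completion changes nothing, so the job as a whole is idempotent even though the instance underneath it is ephemeral.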

Software is Eating (Feeding) the Cloud Native World

Marc Andreessen’s famous quote—“software is eating the world”—proves equally true in the Cloud Native space. The most prolific push for generic hardware has been led by public cloud vendors. Similar to enterprises making the move to x86, cloud vendors have been pushing to make all parts of their stack as generic as possible. In case of failure or expansion, vendors can swap a generic part in and out with ease.

With the generic hardware approach, a good amount of logic moves to the software stack. The rationale here is that if hardware is ephemeral, reconstituting the compute, storage, and even networking would be both seamless and consistent with software-defined storage and networking. Applying this to the public/hybrid cloud market, a software-driven solution that’s robust, scalable and portable becomes a core component of Cloud Native.

Save Us, Software!

Configuration control and consistency are moving down the stack: from application to application infrastructure, and now down to infrastructure. With advances in software-defined infrastructure (SDI), the trifecta of load-balancing, clustering and replication can be applied to multiple parts of the stack.


Medium has a very well-written article on the different layers of software-powered networking, from software-defined networking (SDN) to container networking. With the ever-widening adoption of the Container Network Interface (CNI), containerized applications can have a more consistent approach to network connectivity. For example, with Cisco Application Centric Infrastructure as a robust SDN platform, coupled with a service mesh, enterprises have a consistent and recreatable way of discovering and participating in services. AppDynamics can provide insight into this increasingly complex networking landscape as well.


Not long ago, storage in the cloud world was viewed as non-ephemeral. But as offerings, practices and architectures have begun to shift for some cloud storage products, there’s now a delineation between ephemeral and non-ephemeral storage. Although one of the pillars of a twelve-factor application is to run stateless processes, some sort of state needs to be written somewhere, and a popular place is to disk. Advances in software-defined storage (SDS), with projects such as Ceph and Gluster, provide object and file storage capabilities, respectively. Similar to the delineation of SDN and CNI, there is SDS and Container Storage Interface (CSI). For example, Portworx, a popular cloud-native storage vendor, coupled with commodity cloud or on-premises storage, allows for greater portability and storage consistency from the infrastructure to the container/application level.

One More ‘y’ Term

A successful Cloud Native implementation requires another key component: observability. Without proper visibility into the stack, it’s nearly impossible to know when and how to react to an ephemeral infrastructure action to maintain idempotency.

Idempotency + Ephemerality + Observability = Cloud Native

Despite the inherent challenges with observability, insight into the system is crucial. Relating changes in ephemeral infrastructure to overall sentiment and KPIs can be a challenge as well. With AppDynamics, it’s much easier to validate and advance your investment in the software-defined world.

AppDynamics provides insight on KPIs for a cloud migration/infrastructure change.


AppDynamics delivers deep insights into containers running across an enterprise infrastructure.

A Look to the Future

Every month it seems like a new project is accepted into the Cloud Native Computing Foundation, which is very exciting. As enterprises march toward infrastructure nirvana, where organizations can recreate robust and consistent infrastructure in an ephemeral world, Cloud Native computing will be an important part of the equation. With the power to create cloud computing almost anywhere, it’s important not to lose sight of non-functional requirements such as security. Cisco’s Tetration Platform, which addresses security and operational challenges for a multicloud data center, can protect hybrid cloud workloads holistically.

Look to AppDynamics for help with navigating the Cloud Native world!

Ravi Lachhman
Ravi Lachhman is an evangelist at AppDynamics focusing on the Cloud and DevOps spaces. Prior to AppDynamics, Ravi spent time at Mesosphere, Red Hat, and IBM helping enterprises and the federal sector design the next generation of distributed platforms. When not helping to further the technology communities, Ravi enjoys traveling the world, especially with his stomach.
