
How We Simplified Synthetic User Experience Monitoring Using Ephemeral Containers in Kubernetes



Summary
Learn how AppDynamics helps execute existing synthetic user monitoring workloads at scale and more cost-effectively using a cloud-native, “Lambda-like” Kubernetes architecture.

No application monitoring strategy is complete without synthetic monitoring. By simulating end-user journeys through a website or web application, or calls against an API, synthetic monitoring plays a critical role in proactively alerting IT teams to performance problems, so that the business can ensure its sites and apps stay available and responsive for the best user experience.

That’s why AppDynamics is proud to offer some of the leading monitoring capabilities available today, along with some recent improvements. In today’s digital world, businesses rely on their online assets more than ever to meet unparalleled demand for high-performing digital experiences across websites, mobile apps, IoT devices, and more. With so many of our customers moving toward SaaS-based microservices to keep up with this record usage, we saw a need to revamp our architecture for a few reasons.

For example, our offering required a Windows-based implementation, which brought expensive licensing costs as well as security and performance issues caused by heavyweight Windows agents. For hosted agents managed by AppDynamics, we maintain an in-house agent fleet manager built for fleet capacity management, but in the long run, such components demand ongoing maintenance and engineering effort.

To keep up with our customers, we now need to monitor and measure the availability of both their internal and public-facing microservice APIs at scale, with minimal maintenance and infrastructure licensing costs.

Enter our engineering team’s latest project: Simple Synth.

The Big Shift to Simplified Synthetics

Simplified Synthetics, or “Simple Synth,” is an effort driven by our engineering team to revamp our solution and deployment architecture so that Kubernetes clusters spin up ephemeral Docker containers. These containers run on Linux hosts and execute short-lived customer workloads for web and API monitoring, letting us scrap our long-running Windows-based agents along with their licensing costs and other complexities.

This new and improved architecture has helped our engineering team improve product performance and reduce maintenance. Our new agent platform can now run measurements as frequently as every five seconds to support advanced monitoring use cases, enhancing the user experience for our customers.

Let’s take a look under the hood.

The New “Lambda-Like,” Kubernetes-Based Architecture

The Simple Synth architecture involves multiple Kubernetes (k8s) clusters deployed across various cloud providers and locations. Each cluster runs a local component called “Heimdall,” developed by our synthetics team, which communicates with remote cloud-based server components.

Heimdall acts as an orchestrator: it fetches the web/API measurements for its location/cluster and runs each one as an individual k8s job. (We chose jobs over bare pods because a k8s job has built-in fault tolerance, such as retrying failed pods, that a simple pod lacks.) As a highly available local orchestrator, Heimdall is built on the reactive Spring WebFlux framework, so it has a small deployment footprint yet scales to handle high volumes. Heimdall communicates with k8s to submit jobs, retrieve job-specific information, and manage the complete lifecycle of each measurement container.
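The post doesn't publish Heimdall's job spec. As a minimal sketch, assuming one plain `batch/v1` Job per measurement, the manifest below shows where the fault-tolerance settings live: `backoffLimit` caps pod retries, `activeDeadlineSeconds` enforces the measurement timeout, and `ttlSecondsAfterFinished` lets Kubernetes clean up finished jobs. The function name, naming scheme, image reference, and resource values are hypothetical.

```python
import json
from typing import Any, Dict


def build_measurement_job(
    measurement_id: str,
    image: str,
    timeout_seconds: int = 30,
    cpu: str = "500m",
    memory: str = "512Mi",
) -> Dict[str, Any]:
    """Return a batch/v1 Job manifest (as a plain dict) for one measurement.

    Hypothetical helper: the naming scheme and defaults are illustrative only.
    """
    resources = {"cpu": cpu, "memory": memory}
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            # Assumes measurement_id is already a valid DNS-1123 label fragment.
            "name": f"synth-{measurement_id}",
            "labels": {"app": "synth-measurement"},
        },
        "spec": {
            # Retry a failed measurement pod once before marking the Job failed.
            "backoffLimit": 1,
            # Hard wall-clock limit for the whole Job, retries included.
            "activeDeadlineSeconds": timeout_seconds,
            # Let Kubernetes delete the finished Job (and its pods) after 60 s.
            "ttlSecondsAfterFinished": 60,
            "template": {
                "spec": {
                    # Never restart in place; the Job controller creates a new pod instead.
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "measurement",
                            "image": image,
                            "resources": {
                                "requests": dict(resources),
                                "limits": dict(resources),
                            },
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    job = build_measurement_job("abc123", "registry.example.com/synth/api-test:1.0")
    print(json.dumps(job, indent=2))
```

The printed JSON can be submitted with `kubectl create -f` or any Kubernetes client. With `restartPolicy: Never`, a failed attempt leaves its pod behind and the Job controller starts a fresh one, so each retry begins from a clean container.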

Heimdall also uses open-source Hazelcast as its distributed in-memory data grid. Because Heimdall runs as a multi-node, highly available service, Hazelcast’s distributed collections let its nodes share data with higher performance.
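The post doesn't say which Hazelcast collections Heimdall uses or for what. One common pattern, shown here purely as an assumption, is an atomic put-if-absent on a distributed map, so that exactly one Heimdall replica claims each measurement. Hazelcast's Java `IMap` implements `ConcurrentMap` and therefore offers `putIfAbsent`; the Python class below is only an in-process stand-in for that behavior, and `ClaimRegistry` and `try_claim` are hypothetical names.

```python
import threading
from typing import Dict, Optional


class ClaimRegistry:
    """In-process stand-in for a distributed map such as a Hazelcast IMap.

    Only the atomic put-if-absent behavior is modeled; replication, entry TTLs,
    and member failure handling of a real data grid are out of scope.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._owners: Dict[str, str] = {}

    def put_if_absent(self, key: str, value: str) -> Optional[str]:
        """Store value if key is absent and return None; else return the current value."""
        with self._lock:
            existing = self._owners.get(key)
            if existing is None:
                self._owners[key] = value
            return existing


def try_claim(registry: ClaimRegistry, measurement_id: str, replica: str) -> bool:
    """Return True if this Heimdall replica won the right to schedule the measurement."""
    return registry.put_if_absent(measurement_id, replica) is None
```

Because the check and the write happen atomically, two replicas racing for the same measurement cannot both schedule it; the loser simply skips it.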


Each of these agent-side k8s jobs is based on “Lambda-like” Docker images, which are chosen dynamically based on metadata from the server. This lets Simple Synth run different types of lightweight container images depending on the specific business need.

For example, our web page measurement container includes headless Chrome and Puppeteer, along with an industry-first “Visual Completion Time” tracking component that instruments web page performance. Similarly, our API test container runs a separate API test image with comparable instrumentation for API testing.
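The selection step can be pictured as a lookup from the measurement type in the server's metadata to a container image. This is a minimal sketch: the metadata fields, type names, and image references are hypothetical, since the post doesn't describe the metadata format.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical registry of measurement images, keyed by measurement type.
IMAGE_BY_TYPE: Dict[str, str] = {
    "browser": "registry.example.com/synth/browser-puppeteer:1.0",
    "api": "registry.example.com/synth/api-test:1.0",
}


@dataclass(frozen=True)
class MeasurementMetadata:
    """Hypothetical subset of the metadata the server sends with each measurement."""

    measurement_id: str
    measurement_type: str
    timeout_seconds: int = 30


def select_image(meta: MeasurementMetadata) -> str:
    """Pick the container image for a measurement; unknown types fail loudly."""
    try:
        return IMAGE_BY_TYPE[meta.measurement_type]
    except KeyError:
        raise ValueError(
            f"no image registered for measurement type {meta.measurement_type!r}"
        ) from None
```

Under this design, adding a new measurement type (the post mentions network tests) means publishing a new image and adding one registry entry, without changing the orchestrator.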


As load increases, Heimdall deployments scale up according to autoscaling rules, which in turn increases the number of measurements that can run at each location. We use the open-source Kubernetes “cluster-autoscaler” to provide reactive, elastic capacity management for the nodes/VMs underlying each k8s cluster. Relying on this open-source k8s ecosystem removed the need for in-house fleet management software.
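The post doesn't spell out its autoscaling rules. Assuming a standard Kubernetes Horizontal Pod Autoscaler drives the Heimdall deployment, its documented core rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default). The sketch below implements that rule; the min/max replica bounds are hypothetical, and HPA's stabilization windows and pod-readiness handling are left out.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Simplified HPA rule: ceil(current * current_metric / target_metric), clamped."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Within the tolerance band: keep the current replica count to avoid flapping.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, three replicas averaging 90% CPU against a 60% target give a ratio of 1.5, so the rule asks for ceil(4.5) = 5 replicas. If the new pods cannot be scheduled on existing nodes, cluster-autoscaler responds to the pending pods by adding nodes.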

Why Not Lambda, Fargate, or Any Other Cloud-Specific Solution?

Rather than use Lambda or Fargate directly as managed services from Amazon Web Services (AWS), we built the custom ephemeral pod architecture described above for a few reasons:

Cloud-Agnostic Deployment

Fargate, or any similar cloud provider offering, ties us to a specific cloud provider. For synthetic monitoring, the ability to add test locations across the globe and across different cloud providers is key to expanding our business.

On-Premises Solution

Building for both SaaS and on-prem from a single code base minimizes development and testing effort, which helps us achieve operational excellence. It also lets us deliver both breadth and depth in product innovation.

Cost

Fargate, AWS Lambda, and other cloud-based container services charge based on fixed block durations, for example per hour or per minute of container runtime. Across the approximately 3 million tests per day in our current synth offering, we’ve found that 70% of synthetic monitoring jobs currently have a 30-second timeout, so these services charge more than necessary even for jobs that need only 30 seconds. According to AWS billing details, Fargate and AWS Lambda start billing from the container image download, and because our synth agent images bundle the browser, these larger images can increase time- or data-based billing.
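To make the cost argument concrete, the function below computes a task's billed duration under a simple block-based pricing model, counting image download time as billable per the post's observation. The model, parameter names, and the 10-second pull time used in the example are hypothetical, not actual AWS prices or measurements.

```python
import math


def billed_seconds(
    runtime_s: float,
    image_pull_s: float,
    block_s: float,
    minimum_s: float = 0.0,
) -> float:
    """Billed duration under a simple block-based pricing model (hypothetical).

    runtime_s:     time the measurement actually runs
    image_pull_s:  image download time, billed because billing starts at download
    block_s:       billing granularity (e.g. 60 for per-minute blocks)
    minimum_s:     minimum charge per task, if any
    """
    total = max(runtime_s + image_pull_s, minimum_s)
    # Round up to a whole number of billing blocks.
    return math.ceil(total / block_s) * block_s


if __name__ == "__main__":
    billed = billed_seconds(runtime_s=30, image_pull_s=10, block_s=60)
    print(billed, billed / 30)  # billed seconds, and ratio to the 30 s of useful work
```

With per-minute blocks, a 30-second measurement plus a 10-second pull is billed as 60 seconds, twice the useful runtime; with per-hour blocks the same task would be billed as 3,600 seconds.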

Availability

Provider-specific services are often not available in every location, even within a single cloud provider. Fargate, for example, is not available in all AWS regions.

Other Limitations

Cloud-specific solutions offer only a limited set of CPU and memory combinations, none finely tuned to Simple Synth use cases such as browser or API tests. These limited combinations force us to pick the next larger size, which again adds cost overhead.
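The over-provisioning effect can be sketched as choosing the smallest allowed size that covers a request. The size menu below is illustrative only and is not any provider's actual list of CPU/memory combinations; the helper name is hypothetical.

```python
from typing import List, Tuple

Size = Tuple[float, float]  # (vCPU, memory in GiB)

# Illustrative menu of allowed (vCPU, GiB) combinations; not any provider's real list.
ALLOWED_SIZES: List[Size] = [
    (0.25, 0.5), (0.25, 1.0),
    (0.5, 1.0), (0.5, 2.0),
    (1.0, 2.0), (1.0, 4.0),
    (2.0, 4.0), (2.0, 8.0),
]


def smallest_fitting_size(cpu: float, mem_gib: float, sizes: List[Size] = ALLOWED_SIZES) -> Size:
    """Return the smallest allowed size covering the request (least vCPU, then least memory)."""
    candidates = [s for s in sizes if s[0] >= cpu and s[1] >= mem_gib]
    if not candidates:
        raise ValueError(f"no allowed size fits {cpu} vCPU / {mem_gib} GiB")
    # Tuples compare element by element, so min() prefers fewer vCPUs, then less memory.
    return min(candidates)
```

A hypothetical browser test needing 0.6 vCPU and 1.2 GiB lands on (1.0, 2.0), paying for 0.4 vCPU and 0.8 GiB it never uses, whereas a Kubernetes pod can request `600m` of CPU and just the memory it needs.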

Finally, these services don’t let us choose compute-optimized machines. Based on Simple Synth R&D, we’ve established that synth payloads run much more efficiently on compute-optimized machines because of their high clock rates and large CPU caches.

Benefits and Results So Far

To sum up, Simple Synth has proven to be a highly available, elastic, cloud-agnostic solution for synthetic monitoring agents. Our internal tests have shown that the platform:

  • Is available for out-of-the-box deployment for both SaaS and on-prem
  • Supports high-frequency measurement execution (as low as every five seconds at an agent location)
  • Optimizes cost by moving away from Windows (saving approximately 45% of our infrastructure cost) and by replacing multiple long-running VMs with low-footprint measurement containers (expected to save 60-70% of our infrastructure cost in later phases)
  • Can quickly onboard new AppDynamics features with its Lambda-like architecture and different Docker images for each measurement type (API, network tests, etc.)
  • Offers natural sandbox capability, since each measurement runs as an individual short-lived container
  • Improves monitoring and operationalization through control over CPU and memory specifications, burstable limits, and more, which helps us create a better platform for our customer workloads
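“Burstable limits” map onto Kubernetes resource requests and limits: a pod whose containers request less than their limits falls into the Burstable quality-of-service (QoS) class and may use spare node capacity up to those limits. The function below is a simplified, stdlib-only rendering of the documented QoS rules, considering only CPU and memory; it is an illustration, not necessarily how Simple Synth configures its pods.

```python
from typing import Mapping, Sequence

# One container's resources, e.g. {"requests": {"cpu": "250m"}, "limits": {"cpu": "1"}}
ContainerResources = Mapping[str, Mapping[str, str]]


def qos_class(containers: Sequence[ContainerResources]) -> str:
    """Simplified Kubernetes QoS classification, considering CPU and memory only.

    Quantities are compared as strings, so "500m" and "0.5" would count as
    different values; a real implementation would parse them first.
    """
    any_set = False
    all_guaranteed = True
    for resources in containers:
        requests = dict(resources.get("requests", {}))
        limits = resources.get("limits", {})
        for name in ("cpu", "memory"):
            # Kubernetes defaults a missing request to the limit when a limit is set.
            if name in limits and name not in requests:
                requests[name] = limits[name]
            if name in requests:
                any_set = True
            # Guaranteed needs a limit for every resource, equal to its request.
            if name not in limits or requests.get(name) != limits[name]:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"
```

For instance, a measurement container requesting `250m` CPU with a `1` CPU limit is Burstable, while one whose requests equal its limits is Guaranteed.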

 

Further, for both hosted agents managed by AppDynamics and on-prem deployments managed by our customers, we have now eliminated the need for a subject matter expert (SME) to manage, scale, and upgrade the synthetic agents. Any DevOps professional with k8s knowledge can now operate these Docker-based synthetic agents.

Going Forward

Many of the above features are industry-first implementations for synthetic monitoring, making AppDynamics future-ready to handle the evolving monitoring use cases of our customers at scale. The ability to run Lambda-like, short-lived, lightweight containers at scale is what makes this architecture unique, and what we believe will truly transform our customer experience.

Our engineering team is now working on plans to further optimize cost using spot instances, which offer discounts of up to 90% off on-demand prices but can be terminated at short notice. Short-lived synthetic measurement containers (with an average timeout of 30 seconds, based on our current production workload) are ideal for running on this spare capacity. When a termination notice arrives, the affected node can be tainted to prevent further containers from being scheduled on it, and a replacement node can be spun up to maintain the fleet size.
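The tainting step can be sketched as building a node patch that adds a `NoSchedule` taint. `NoSchedule` stops new pods from landing on the node while letting already running measurements finish, which suits containers that complete in about 30 seconds; `NoExecute` would evict them instead. The taint key and helper name are hypothetical, and detecting the termination notice and requesting the replacement node are not shown.

```python
from typing import Any, Dict, List

# Hypothetical taint key; any key your cluster tooling agrees on would do.
SPOT_TAINT_KEY = "example.com/spot-terminating"


def termination_taint_patch(existing_taints: List[Dict[str, str]]) -> Dict[str, Any]:
    """Build a node patch body that adds a NoSchedule taint.

    A JSON merge patch replaces spec.taints as a whole list, so existing taints
    are carried over; re-running the function does not add a duplicate taint.
    """
    taints = [t for t in existing_taints if t.get("key") != SPOT_TAINT_KEY]
    taints.append({"key": SPOT_TAINT_KEY, "value": "true", "effect": "NoSchedule"})
    return {"spec": {"taints": taints}}
```

The equivalent one-off CLI call would be `kubectl taint nodes <node> example.com/spot-terminating=true:NoSchedule`.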

Stay tuned for more updates on our synthetic monitoring offering.

Co-written by Karthikeyan Ramasamy, Senior Technical Leader, and Ashish Mishra, Senior Leader – R&D, Engineering at AppDynamics

AppDynamics LLC (“AppDynamics”) reserves the right to change this Product Roadmap at any time, for any reason and without notice or compensation to the recipient. Notwithstanding any forward-looking statements contained herein (such as estimated completion dates), this Product Roadmap is not a guarantee of future product features and should not be relied upon in making any purchasing decisions. Actual product results may vary from forward-looking statements due to changes in AppDynamics and customer technologies, factors related to the economy and target markets, acquisitions of other companies, the hiring and termination of personnel and other factors.