Flexible OpenTelemetry data generation for effective testing (Part 2)

April 18, 2023
 

In this second (and final) part, we continue to show how our OpenTelemetry data generator provides a seamless product-validation experience for all teams working on AppDynamics Cloud.


In the first part of this two-part blog series, we provided a high-level overview of OpenTelemetry™ (OTel), a complete telemetry system for monitoring both modern, distributed architectures in the cloud and more traditional on-prem applications. OpenTelemetry is fast becoming the de facto standard for observability telemetry, and AppDynamics supports OTel-based application and infrastructure monitoring across all of its offerings.

This gives us many opportunities to optimize how we test our products that consume OpenTelemetry data. In this second part, we continue exploring how we developed our open-source OpenTelemetry data generator and show how to use it. We strongly recommend reading part one, if you haven’t already, before continuing.

Execution modes

When we first started building our OpenTelemetry data generator, the goal was for testing teams to add the tool as a dependency and call it from their test code. Over time, however, two new requirements emerged:

  • Sales/demo engineers didn’t want to write code to demo the product.
  • Teams wanted a way to generate load continuously, 24/7. In this scenario, their tests would run periodically throughout the day to validate that their services respond correctly to the deviations encoded in the MELT (metrics, events, logs and traces) data generated from the definitions.

To meet the no-code requirement, we introduced a CLI mode: you write the definitions once, based on your requirements, and then run a JAR file with those definitions whenever you want to start the load.

The requirement to run continuously was harder to meet, because it meant running multiple instances of the load generator inside a web service. That alone wasn’t a problem, but our decision to use JEL for expression evaluation, while powerful, created one: in the standalone context, all the expression methods have to be implemented as static methods, and they are not thread safe. Still, the requirement was fair, so we introduced the idea of a request ID at the very top level of the generator. Every thread started by an instance of the generator is identified by that instance’s request ID, and the IDs and threads are tracked in a global map. So if you use this tool in a multi-threaded context, you can be confident that concurrent instances won’t corrupt each other’s state.
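The post doesn’t show the generator’s internals, so here is a minimal sketch of the general idea, assuming that each worker thread is bound to its generator instance’s request ID before it evaluates any expressions. The class and method names (`RequestScopedExpressions`, `bind`, `unbind`, `nextSequence`, `runDemo`) are hypothetical, and the code uses plain static Java methods rather than JEL’s API. It shows how a method that must stay static can still keep separate state per request by looking up the calling thread in a global map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: static "expression methods" that keep per-request state
// by resolving the calling thread to a request ID through a global map.
public class RequestScopedExpressions {

    // Global map: worker thread -> request ID of the generator instance it serves.
    private static final Map<Thread, String> THREAD_TO_REQUEST = new ConcurrentHashMap<>();

    // Per-request state that a plain static field would otherwise share across instances.
    private static final Map<String, AtomicLong> SEQUENCE_BY_REQUEST = new ConcurrentHashMap<>();

    // Called by a worker thread before it evaluates any expressions.
    static void bind(String requestId) {
        THREAD_TO_REQUEST.put(Thread.currentThread(), requestId);
        SEQUENCE_BY_REQUEST.computeIfAbsent(requestId, id -> new AtomicLong());
    }

    // Called when the worker finishes, so pooled threads don't keep stale bindings.
    static void unbind() {
        THREAD_TO_REQUEST.remove(Thread.currentThread());
    }

    // A static method of the kind an expression evaluator would call;
    // it finds its state through the caller's request ID.
    public static long nextSequence() {
        String requestId = THREAD_TO_REQUEST.get(Thread.currentThread());
        if (requestId == null) {
            throw new IllegalStateException("thread is not bound to a request ID");
        }
        return SEQUENCE_BY_REQUEST.get(requestId).incrementAndGet();
    }

    // Simulates two generator instances, each running several worker threads.
    public static Map<String, Long> runDemo(int threadsPerRequest, int callsPerThread) {
        THREAD_TO_REQUEST.clear();
        SEQUENCE_BY_REQUEST.clear();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String requestId : new String[] {"request-A", "request-B"}) {
                for (int t = 0; t < threadsPerRequest; t++) {
                    futures.add(pool.submit(() -> {
                        bind(requestId);
                        try {
                            for (int i = 0; i < callsPerThread; i++) {
                                nextSequence();
                            }
                        } finally {
                            unbind();
                        }
                    }));
                }
            }
            for (Future<?> f : futures) {
                try {
                    f.get(); // surface any failure from a worker
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for workers", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("worker failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        Map<String, Long> result = new TreeMap<>();
        SEQUENCE_BY_REQUEST.forEach((id, seq) -> result.put(id, seq.get()));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(runDemo(3, 1000)); // prints {request-A=3000, request-B=3000}
    }
}
```

Because each request ID owns its own counter, the two simulated instances each end at exactly 3,000 even though their threads share one pool and call the same static method. A `ThreadLocal` alone would not be enough here, since several threads belonging to the same request need to share that request’s state.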

The results

When we began developing the data generation tool, we thought it would be used only by test engineering teams for end-to-end testing. However, as we shared more information through tech talks and dedicated IM spaces, we received more and more inquiries about using the tool in other ways. Today, almost all AppDynamics Cloud engineering squads use it for end-to-end tests and validations in production environments, a significant number of teams use it for component tests, and some squads use it to build continuous flows that validate the state of pre-production environments (separately from the regular CI pipeline). Development engineering teams even use it to validate new changes and troubleshoot new features in their local environments.

In fact, we estimate the tool is now used in over 1,000 tests, most of them end-to-end tests. This makes sense: the OpenTelemetry MELT schema is fairly large, and end-to-end tests covering various scenarios and deviations need to send a lot of data, so end-to-end testing is where this tool is most useful.

Although we never formally claimed this tool could be used for performance or load testing (and we’re evaluating how to extend it for that purpose), some teams have used it for performance testing in particular scenarios. We did, however, validate that the tool handles some moderately sized loads and measured its resource consumption:

  • Five metrics from 50,000 entities, reported every 15 seconds, consumed about 570 MB of memory.
  • Three log types, reported by 20 to 30 entities with copy counts of 2,000 to 8,000 (a total of 330,000 logs) every 30 seconds, consumed about 300 MB of memory.
  • 19 spans across three traces, with copy counts ranging from 8,000 to 50,000 (a total of 150,000 spans) every 15 seconds, consumed about 550 MB of memory.
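To put these figures on a common scale, the short sketch below converts each test into an average per-second rate. It assumes each metric produces one data point per entity per reporting interval and that the full log and span totals are sent once per interval; the class and method names are illustrative only, not part of the generator.

```java
// Illustrative arithmetic only: average records per second for the tests above,
// assuming each full batch is sent once per reporting interval.
public class ThroughputFromBenchmarks {

    // Average records per second for a batch sent once every intervalSeconds.
    public static double perSecond(long recordsPerReport, int intervalSeconds) {
        return (double) recordsPerReport / intervalSeconds;
    }

    public static void main(String[] args) {
        // Metrics: 5 metrics x 50,000 entities, assuming one data point each per interval.
        long metricPoints = 5L * 50_000;
        System.out.printf("metrics: %d points per report, %.0f points/s%n",
                metricPoints, perSecond(metricPoints, 15));
        // Logs: 330,000 logs every 30 seconds.
        System.out.printf("logs: %.0f logs/s%n", perSecond(330_000, 30));
        // Spans: 150,000 spans every 15 seconds.
        System.out.printf("spans: %.0f spans/s%n", perSecond(150_000, 15));
    }
}
```

Under those assumptions, the three tests average roughly 16,667 metric data points, 11,000 log records and 10,000 spans per second.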

Try the open source release!

Given strong adoption inside Cisco AppDynamics, the shortage of tools for flexible, expansive generation of simulated OpenTelemetry data, and significant demand from our community, we decided to make this tool publicly available.

The repo is available here. The detailed documentation is available here. And while we’re continuously maintaining and enhancing the capabilities of the telemetry generator, please feel free to make any requests here. We encourage you to contribute to this project and recommend you start with the contributing guidelines.

Special thanks to Aarushi Singh, Severin Neumann, Ashish Tyagi and Girish Grover.

Manpreet Singh is a Senior Engineer at Cisco AppDynamics and implements behind-the-scenes services that power our products. She has over ten years of experience in software testing and development of large-scale data processing and analytics products. Manpreet works out of the Bengaluru office in India and likes learning about macroeconomics.
