We're excited to partner with AWS Distro for OpenTelemetry.
As an emerging standard for cloud-native monitoring, OpenTelemetry will expand AppDynamics’ market-leading full stack observability into a broader set of systems, further enriching traditional AI and ML capabilities.
The (Brief) History of OpenTelemetry
Today, telemetry data plays a vital role in providing technologists with a complete picture of app behavior and performance. However, collecting such data from many different systems presents a significant challenge. Enter OpenTelemetry — a nascent project of the Cloud Native Computing Foundation (CNCF) and the result of a merger between the OpenTracing and OpenCensus projects.
In 2019, OpenTracing (a CNCF vendor-agnostic tracing project) and OpenCensus (a Google-led vendor-agnostic tracing and metrics library) merged to form OpenTelemetry with the goal of simplifying the telemetry ecosystem by providing a unified set of instrumentation libraries and specifications for observability telemetry. The result is a complete telemetry system suitable for monitoring both modern distributed services and more traditional environments. By offering a single set of APIs and libraries, it simplifies the collection and transfer of telemetry data used by technologists.
As a similarly vendor-agnostic service, OpenTelemetry is compatible with a growing list of leading OSS and commercial backends, including Jaeger and Zipkin.
How does OpenTelemetry work?
Observability, or the ability to monitor a system’s performance by virtue of the external data it produces, is generally achieved by utilizing four fundamental data groups:
- Metrics provide time-based numerical measurements on elements of the application ecosystem. For instance, a metric might report that a service handles an average of 255 transactions per second.
- Events track individual actions, producing a time-stamped inventory of data related to operations that are defined by the user.
- Logging collects application-generated structured or unstructured text added to help troubleshoot. There is much debate about the effectiveness of log data in a microservices context, as it is difficult to scale and often considered overly expensive.
- Tracing, or spans, provides an understanding of how requests flow through the system. This can be used to understand interacting parts and their behaviors.
These data items can help provide technologists with a richer picture of application health and performance. The four categories are inter-connected, meaning metrics can be used to locate misbehaving distributed traces, and logs associated with trace data can be used to find out exactly what's causing an issue, etc.
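To make that interconnection concrete, here is a minimal sketch in plain Python (it does not use any OpenTelemetry package). It assumes spans and log lines both carry a shared trace ID, so a slow trace can be followed to its logs. The record layouts and the names `SpanRecord`, `LogRecord`, `slow_traces`, and `logs_for_trace` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SpanRecord:
    trace_id: str
    name: str
    duration_ms: float


@dataclass
class LogRecord:
    trace_id: str
    message: str


def slow_traces(spans, threshold_ms):
    """Return trace IDs of spans slower than the threshold, in first-seen order."""
    seen = []
    for span in spans:
        if span.duration_ms > threshold_ms and span.trace_id not in seen:
            seen.append(span.trace_id)
    return seen


def logs_for_trace(logs, trace_id):
    """Return the messages of all log records tagged with the given trace ID."""
    return [log.message for log in logs if log.trace_id == trace_id]


# Hypothetical sample data: one fast request and one slow one.
spans = [
    SpanRecord("a1", "GET /cart", 42.0),
    SpanRecord("b2", "GET /checkout", 1870.0),
]
logs = [
    LogRecord("a1", "cart loaded"),
    LogRecord("b2", "payment gateway timeout, retrying"),
]

for trace_id in slow_traces(spans, threshold_ms=500):
    print(trace_id, logs_for_trace(logs, trace_id))
```

A latency metric crossing a threshold would play the role of the trigger here: it tells you to look for slow traces, and the shared trace ID then leads to the relevant log lines.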
Understanding the OpenTelemetry ecosystem
The OpenTelemetry API is used by app developers to instrument their code. Library authors use it to integrate instrumentation within their libraries. The API is further divided into four parts:
The Tracer API enables the generation of spans. A span is a named, timed operation that represents a contiguous segment of work in a trace.
With the Metric API, developers can access different instruments like counters and observers.
The Context API carries context information, such as W3C Trace Context, alongside a request so that telemetry from different services can be tied together.
Lastly, there is also a set of semantic conventions included in the OpenTelemetry API that defines rules for naming spans and attributes, as well as for associating errors with spans.
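The idea of a span as "a named, timed operation" can be sketched with the Python standard library alone. This is not the OpenTelemetry Tracer API: the `Span` class, the `start_span` helper, and the attribute keys below are hypothetical stand-ins, shown only to illustrate what a span records and how an error can be associated with it.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    name: str
    start_ns: int
    end_ns: Optional[int] = None
    attributes: dict = field(default_factory=dict)

    @property
    def duration_ns(self):
        """Elapsed time in nanoseconds, or None while the span is still open."""
        if self.end_ns is None:
            return None
        return self.end_ns - self.start_ns


# Finished spans collect here; a real SDK would hand them to an exporter.
finished_spans = []


@contextmanager
def start_span(name, attributes=None):
    """Time the enclosed block and record it as a span (hypothetical helper)."""
    span = Span(name=name, start_ns=time.monotonic_ns(),
                attributes=dict(attributes or {}))
    try:
        yield span
    except Exception as exc:
        # Associate the error with the span, then let it propagate.
        span.attributes["error"] = True
        span.attributes["error.message"] = str(exc)
        raise
    finally:
        span.end_ns = time.monotonic_ns()
        finished_spans.append(span)


with start_span("load-user", {"user.id": 42}) as span:
    time.sleep(0.01)  # stand-in for real work

print(span.name, span.duration_ns, span.attributes)
```

Using a context manager means the span is always closed, even when the wrapped code raises, which is the property that makes spans reliable for timing.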
Key components for OpenTelemetry integration
For the implementation of the OpenTelemetry API, there's the OpenTelemetry SDK, with three parts that are comparable to the aforementioned APIs. OpenTelemetry provides one API and SDK per language, with the latter allowing for additional configuration like transaction sampling and request filtering.
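As an illustration of what sampling and request filtering can mean in practice, here is a small sketch in plain Python. It is an assumption-laden example, not the SDK's actual implementation: `should_record`, `should_sample`, and the ignored paths are hypothetical. The sampler decides deterministically from the trace ID, so every service that sees the same trace makes the same decision.

```python
# Hypothetical endpoints whose requests we do not want to trace.
IGNORED_PATHS = {"/healthz", "/metrics"}


def should_record(path: str) -> bool:
    """Request filtering: skip noisy endpoints such as health checks."""
    return path not in IGNORED_PATHS


def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Ratio sampling: keep roughly `ratio` of all traces.

    The low 64 bits of the trace ID are treated as a pseudo-random number
    and compared against the requested share of the 64-bit range.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    low_bits = int(trace_id_hex, 16) & 0xFFFFFFFFFFFFFFFF
    return low_bits < ratio * 2**64


trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"  # example 128-bit trace ID
print(should_record("/checkout") and should_sample(trace_id, 0.75))
```

Deciding from the trace ID rather than from a fresh random number avoids traces where some spans were kept and others dropped.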
The in-process Exporter is an important component, as this is what enables developers to choose which backend they want the telemetry to be sent to once translated to standard OTLP or proprietary formats.
The OpenTelemetry Collector gathers the data, then processes and exports that data using proprietary or vendor-agnostic implementations.
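The gather, process, export flow described above can be sketched as a tiny pipeline in plain Python. None of this is the Collector's or the SDK's real interface: `Exporter`, `ConsoleExporter`, `InMemoryExporter`, `Pipeline`, and the two processors are hypothetical names, and spans are represented as plain dictionaries for brevity.

```python
import json
from typing import List, Protocol


class Exporter(Protocol):
    """Anything with an export() method can act as a backend (hypothetical)."""

    def export(self, spans: List[dict]) -> None: ...


class ConsoleExporter:
    """Writes each span as one JSON line to stdout."""

    def export(self, spans):
        for span in spans:
            print(json.dumps(span, sort_keys=True))


class InMemoryExporter:
    """Keeps spans in a list, which is handy for tests."""

    def __init__(self):
        self.spans = []

    def export(self, spans):
        self.spans.extend(spans)


class Pipeline:
    """Receive a batch, run it through processors, hand it to every exporter."""

    def __init__(self, processors, exporters):
        self.processors = list(processors)
        self.exporters = list(exporters)

    def receive(self, spans):
        batch = list(spans)
        for process in self.processors:
            batch = process(batch)
        for exporter in self.exporters:
            exporter.export(batch)


def add_environment(spans):
    """Processor: tag every span with a (hypothetical) environment name."""
    return [{**span, "environment": "staging"} for span in spans]


def redact_email(spans):
    """Processor: remove a sensitive attribute before data leaves the process."""
    return [{k: v for k, v in span.items() if k != "user.email"} for span in spans]


memory = InMemoryExporter()
pipeline = Pipeline([add_environment, redact_email], [ConsoleExporter(), memory])
pipeline.receive([{"name": "GET /cart", "user.email": "a@example.com"}])
```

Because the pipeline only depends on the `export` method, swapping one backend for another means changing the exporter list and nothing else, which is the vendor-neutral property the text describes.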
Benefits of OpenTelemetry
The primary benefit of OpenTelemetry is that it provides a unified standard for creating and ingesting telemetry data, much like container orchestration standards set years ago by Kubernetes. OpenTelemetry provides a consistent collection mechanism and format, without locking technologists to a specific vendor.
By combining metrics, events, logging, and tracing (MELT), OpenTelemetry gives developers and site reliability engineers a more complete picture of app performance than basic monitoring previously provided.
Observability helps achieve business objectives
With observability, the goal is to enable developers, security analysts, and product managers to better understand and fix problems in the app (reliability, security, etc.) that could have a negative impact on the business.
The telemetry data can be used to ascertain if the system is operating as it should be or if the event logs reveal the threat of a potential intrusion or attempted DDoS attack.
Having this watchful eye over the system at all times means that any such issues can be flagged instantly and addressed before any disruption in service is caused.
OpenTelemetry simplifies observability integration
Before OpenTelemetry, there wasn't a centralized and flexible observability solution. IT professionals had to stick with specific vendors and backends.
OpenTelemetry aims to address this challenge by providing a standardized format for the collection of telemetry data in order to improve observability, enabling developers to effectively manage and debug issues.
With language-specific integrations for popular web frameworks, it can automatically capture relevant traces and metrics, in addition to handling context propagation.
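Context propagation is easiest to picture through the W3C Trace Context `traceparent` header, which has the form `version-traceid-parentid-flags`. The sketch below uses only the Python standard library and handles only headers shaped like version `00`. The function names (`new_traceparent`, `parse_traceparent`, `child_traceparent`) are hypothetical; an OpenTelemetry integration would do this work for you.

```python
import re
import secrets

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


def new_traceparent(sampled=True):
    """Start a new trace: random trace ID and span ID."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def parse_traceparent(header):
    """Return the header's fields as a dict, or None if it is malformed."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 1),
    }


def child_traceparent(incoming):
    """Continue the caller's trace: same trace ID, fresh span ID."""
    context = parse_traceparent(incoming)
    if context is None:
        return new_traceparent()  # no usable context, so start a new trace
    flags = "01" if context["sampled"] else "00"
    return f"00-{context['trace_id']}-{secrets.token_hex(8)}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outgoing = child_traceparent(incoming)
print(parse_traceparent(outgoing)["trace_id"])
```

A service would read the header from an incoming request, then attach the child header to any outgoing calls, so that every hop shares one trace ID.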
OpenTelemetry's automatic instrumentation agents can also collect telemetry from some apps without requiring any code changes. All of the collected data can then be sent to a backend of the developer's choice with no specific vendor requirements.
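One common way to collect telemetry without touching application code is to wrap existing functions at runtime. The sketch below shows that general technique in plain Python; it is an assumption about the approach, not a description of how any particular OpenTelemetry agent works, and the mechanism varies by language. The `instrument` helper, the `recorded` list, and the `shop` object are hypothetical.

```python
import functools
import time
import types

# Timings collected by the wrapper: (function name, duration in nanoseconds).
recorded = []


def instrument(module, function_name):
    """Replace module.function_name with a timing wrapper.

    Returns the original function so the patch can be undone.
    """
    original = getattr(module, function_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.monotonic_ns()
        try:
            return original(*args, **kwargs)
        finally:
            recorded.append((function_name, time.monotonic_ns() - start))

    setattr(module, function_name, wrapper)
    return original


# Stand-in for third-party library code that we cannot edit.
shop = types.SimpleNamespace(checkout=lambda cart: sum(cart))

instrument(shop, "checkout")       # done once, at startup
total = shop.checkout([5, 10])     # application code is unchanged
print(total, recorded[0][0])
```

The calling code still invokes `shop.checkout` exactly as before; only the object it resolves to has changed.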
The future of OpenTelemetry
The industry stands to benefit from the consolidation of observability ecosystems. Vendor-neutral open standards are vital for increased adoption, and OpenTelemetry is going to make integrating observability more viable.
Still in the beta phase, the OpenTelemetry project is only just starting out. At the moment, spans and traces are the most thoroughly developed components. However, the project also currently covers metrics with “manual code change” for some languages, and might eventually integrate logging.
The team is working toward a generally available, production-quality release in the second half of 2020. This will result in the stabilization of core components, in addition to the integration of automated instrumentation through agents, additional language compatibility, and improved metrics capabilities, all of which are in active development. The project is, and will likely remain, focused on collecting technical health information from systems in a standard way.
Backward compatibility is a key feature of the project, allowing OpenTelemetry to be implemented with applications that use either the OpenTracing or OpenCensus systems, and providing developers with an easy path to make the transition. OpenTelemetry is one of the Cloud Native Computing Foundation’s biggest open source projects, with growing developer support for the ecosystem.
With ample opportunity to get involved in the project through GitHub, OpenTelemetry is well poised to dominate the cloud-native telemetry landscape.
While integrating observability into an IT system isn't a goal on its own, it should be viewed as a vital component in the quest to achieve business objectives. Helping to relate how infrastructure (whether compute, storage, or network) and supporting services influence an application’s health is the first step towards positively impacting user experiences and business outcomes.