Telemetry Data - The way of data transmission and collection

shyamrangaraju9
Jul 8, 2023

Updated: Aug 16, 2023

What is Telemetry data?

In observability, telemetry data is the output acquired from system sources. When these results are combined, they create a picture of the relationships and dependencies that exist within a distributed system.

What is OpenTelemetry?

OpenTelemetry is focused on data - specifically, the data and data stream (telemetry) needed to best understand, troubleshoot, and improve our applications. Data is only useful if it can be aggregated, analyzed, and visualized at scale.

OpenTelemetry also enables a natural correlation across those data sources, rather than expecting us to attempt that correlation ourselves.

Why is OpenTelemetry important?

OpenTelemetry is important because it standardizes the way telemetry data is collected and transmitted to backend platforms. It bridges visibility gaps by providing a common instrumentation format across all services. SREs/SROs don’t have to re-instrument code or install a different proprietary agent every time the backend platform changes. OpenTelemetry also keeps working as new technologies emerge, whereas commercial solutions depend on their vendors building new integrations to stay interoperable.

OpenTelemetry Architecture

OpenTelemetry's architecture is made up of several components that work together. A high-level look at each of them shows what OpenTelemetry has to offer in terms of data collection and customer analytics.

Figure: OTel architecture (image credit: OpenTelemetry official site)

API - The APIs are language-specific interfaces that provide the fundamentals for integrating OpenTelemetry. Application code calls them to create telemetry, so new instrumentation can be added without tying the app to a particular backend.
SDK - The SDK is another language-specific component of the architecture. It sits between the API and the exporter and implements the API. Because configuration lives in the SDK, transaction sampling and request filtering are both easier to handle.
Collector - Technically, this is the only component that isn't essential. Even so, the Collector makes setting up an OpenTelemetry architecture faster and easier. It receives application telemetry and forwards it to the backend with a lot of flexibility. The Collector can be deployed in two ways: as an agent alongside the application or as a standalone service.
Exporter - An exporter determines which backend the telemetry is sent to. The backend configuration is independent of instrumentation, so it's simple to switch backends, and there's no need to re-instrument code.
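To make the split between instrumentation and exporters concrete, here is a minimal sketch in plain Python. It is a simplified model and not the OpenTelemetry API: `Span`, `Tracer`, `ListExporter`, and `NameOnlyExporter` are hypothetical names invented for this illustration. The point is only that the instrumented function never mentions a backend.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical, heavily simplified span."""
    name: str
    attributes: dict = field(default_factory=dict)


class ListExporter:
    """Hypothetical backend A: keeps whole spans in a list."""

    def __init__(self):
        self.received = []

    def export(self, span):
        self.received.append(span)


class NameOnlyExporter:
    """Hypothetical backend B: stores only the span names."""

    def __init__(self):
        self.names = []

    def export(self, span):
        self.names.append(span.name)


class Tracer:
    """Stand-in for the API/SDK layer that instrumented code talks to."""

    def __init__(self, exporter):
        self._exporter = exporter

    def record(self, name, **attributes):
        self._exporter.export(Span(name, attributes))


def handle_checkout(tracer):
    # Instrumented application code: no backend is named here.
    tracer.record("checkout", items=3)


backend_a = ListExporter()
backend_b = NameOnlyExporter()
handle_checkout(Tracer(backend_a))
handle_checkout(Tracer(backend_b))  # same instrumentation, different backend
```

Switching from `backend_a` to `backend_b` changes only the object handed to `Tracer`; `handle_checkout` stays untouched.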

Now that we have covered OpenTelemetry and its architecture, let’s look at its components under the MELT approach and at its benefits.

Components of OpenTelemetry

OpenTelemetry uses the MELT approach to standardize data visibility. MELT separates telemetry into four data groups: metrics, events, logs, and traces.

Benefits of OpenTelemetry

OpenTelemetry has a number of features that can help technology businesses in several ways:

Ease of Use: It is business-friendly and helps teams meet their objectives. It gives SREs/SROs a complete toolkit that helps them find, report, and fix bugs faster, which saves time.
Instrumentation: Instrumentation is the act of adding observability code to our application. This can be done with direct calls to the OpenTelemetry API within our code, or by including a dependency that calls the API and hooks into our project, like a middleware for an HTTP server.
Consistency: It offers a de facto standard for adding observability to cloud-native apps by providing a consistent method for gathering telemetry data and delivering it to a backend without modifying instrumentation. Instead of wrangling with their instrumentation, SREs and product teams can focus on designing new application features.
Streamlined Observability: SREs/SROs can monitor application usage and performance metrics collected with OpenTelemetry from any device or web browser, through the backend they choose. This makes tracking and analyzing observability data in real time much easier.

Now that we understand the components and benefits, let’s focus on the objectives of the OpenTelemetry Collector.

OpenTelemetry Collector Objectives

The OpenTelemetry Collector is a Go binary that does exactly what its name implies: it collects data and sends it to a backend. But a lot of functionality lies in between.
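The "in between" part can be pictured as a pipeline: data is received, passed through a chain of processing steps, and then handed to one or more exporters. The sketch below models that flow in plain Python. It is not Collector code or Collector configuration; the processor functions, the record layout, and the `env` tag are assumptions made up for this example.

```python
def drop_health_checks(batch):
    """Hypothetical filter step: discard noisy health-check records."""
    return [record for record in batch if record.get("name") != "GET /healthz"]


def add_environment(batch):
    """Hypothetical enrichment step: tag every record with an environment."""
    return [{**record, "env": "staging"} for record in batch]


def run_pipeline(received, processors, exporters):
    """Pass received records through each processor, then to each exporter."""
    batch = list(received)
    for process in processors:
        batch = process(batch)
    for export in exporters:
        export(batch)
    return batch


backend = []  # stands in for a real backend
incoming = [{"name": "GET /healthz"}, {"name": "GET /orders"}]
result = run_pipeline(
    incoming,
    processors=[drop_health_checks, add_environment],
    exporters=[backend.extend],
)
```

Because the steps are just an ordered list, filtering or enrichment can be added or removed without touching the applications that produce the data.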

Limitations of OpenTelemetry

Deploying auto-instrumentation can be complicated, especially the first time we work with it, and it adds load to our application. Including OpenTelemetry in our application increases the amount of code and the attention we have to pay to traces. It can also require changes to the infrastructure where our application runs, which means changing the way we deploy it.

Considering the limitations, components, and benefits of OpenTelemetry, let's focus on the best practices that help us avoid issues during implementation.

OpenTelemetry best practices

Keep Initialization Separate from Instrumentation: One of the biggest benefits of OpenTelemetry is that it enables vendor-agnostic instrumentation through its API. All telemetry calls made inside an application go through the OpenTelemetry API, which is independent of any vendor being used. Hence, we should keep the provider configuration at the top level of our application or service, usually at the application’s entry point. This keeps OpenTelemetry initialization separate from the instrumentation calls and lets us choose the best tracing backend for our use case without changing any instrumentation code. With the provider configuration separated from the instrumentation, we can switch providers with nothing more than a flag or an environment variable.
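As a rough illustration of this practice, the sketch below keeps all exporter knowledge inside a single `init_telemetry` function that would be called at the entry point. Everything here is hypothetical and stands in for the real SDK setup: the `DEMO_TRACES_EXPORTER` variable, the two exporter classes, and the `trace` helper are invented names for this example.

```python
import os


class ConsoleExporter:
    """Hypothetical exporter that records formatted lines."""

    def __init__(self):
        self.lines = []

    def export(self, span_name):
        self.lines.append(f"console: {span_name}")


class NoopExporter:
    """Hypothetical exporter that discards everything."""

    def export(self, span_name):
        pass


_EXPORTERS = {"console": ConsoleExporter, "none": NoopExporter}
_active_exporter = NoopExporter()


def init_telemetry(environ=os.environ):
    """Call once at the entry point; the only code that knows about exporters."""
    global _active_exporter
    kind = environ.get("DEMO_TRACES_EXPORTER", "none")  # assumed variable name
    _active_exporter = _EXPORTERS[kind]()
    return _active_exporter


def trace(span_name):
    """Instrumentation call used throughout the code base."""
    _active_exporter.export(span_name)


def place_order():
    trace("place_order")  # no exporter or vendor mentioned here
    return "ok"


# Entry point: choose the exporter from configuration, then run the app.
exporter = init_telemetry({"DEMO_TRACES_EXPORTER": "console"})
place_order()
```

Passing a different value for the variable swaps the exporter while `place_order` stays exactly the same, which is the property the practice is meant to protect.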

Know the Configuration Knobs: OpenTelemetry tracing supports two strategies to get traces out of an application, a “SimpleSpanProcessor” and a “BatchSpanProcessor.” The SimpleSpanProcessor will submit a span every time a span is finished, but the BatchSpanProcessor buffers the spans until a flush event occurs. Flush events can occur when a buffer is filled or when a timeout is reached.

The BatchSpanProcessor has a number of properties:

Max Queue Size: The maximum number of spans buffered in memory. Any span beyond this limit is discarded.
Schedule Delay: The time between flushes. This ensures that we don’t get into a flush loop during times of heavy traffic.
Max per batch: The maximum number of spans submitted during each flush.
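To see how these three settings interact, here is a small single-threaded model of a batching processor. It is only a sketch of the behavior described above, not the SDK's BatchSpanProcessor: the class name, the explicit `tick(now)` step that replaces the background worker, and the exact flush conditions are assumptions made for illustration.

```python
from collections import deque


class BatchProcessorModel:
    """Toy model of a batching span processor (not the real SDK class)."""

    def __init__(self, export, max_queue_size=4, schedule_delay=5.0, max_batch_size=2):
        self._export = export
        self._queue = deque()
        self.max_queue_size = max_queue_size
        self.schedule_delay = schedule_delay
        self.max_batch_size = max_batch_size
        self.dropped = 0
        self._last_flush = 0.0

    def on_end(self, span):
        """Buffer a finished span, or drop it when the queue is full."""
        if len(self._queue) >= self.max_queue_size:
            self.dropped += 1
            return
        self._queue.append(span)

    def tick(self, now):
        """One worker step: flush if a full batch is waiting or the delay elapsed."""
        batch_ready = len(self._queue) >= self.max_batch_size
        timed_out = now - self._last_flush >= self.schedule_delay
        if not self._queue or not (batch_ready or timed_out):
            return 0
        count = min(self.max_batch_size, len(self._queue))
        batch = [self._queue.popleft() for _ in range(count)]
        self._export(batch)
        self._last_flush = now
        return len(batch)


exported = []
processor = BatchProcessorModel(exported.append)

for i in range(5):            # five spans arrive before the worker runs
    processor.on_end(f"span-{i}")  # the fifth exceeds max_queue_size and is dropped

processor.tick(now=1.0)       # full batch waiting: flush two spans
processor.tick(now=2.0)       # full batch waiting: flush two more
processor.on_end("span-5")
processor.tick(now=3.0)       # one span, delay not reached: nothing happens
processor.tick(now=7.0)       # delay elapsed: flush the partial batch
```

In this run one span is lost because the queue was full, which is the trade-off to keep in mind when sizing the queue for bursty traffic.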

Look for Examples and Tests: OpenTelemetry is still a very young project, so the documentation for most of the libraries is still sparse. For example, the configuration options for the BatchSpanProcessor discussed above are not even documented in the Go OpenTelemetry SDK! Often the only way to find examples is to search the source code and its tests.

Use Auto-Instrumentation by Default… but Be Careful of Performance Overhead: OpenTelemetry supports auto-tracing for Java, Ruby, Python, JavaScript, and PHP. Auto-tracing automatically captures traces and metrics for commonly used libraries such as:

HTTP Clients
HTTP Servers & Frameworks
Database Clients (Redis, MySQL, Postgres, etc.)

Auto-instrumentation significantly lowers the barrier to adopting observability, but we need to monitor it closely because it adds overhead to program execution. During normal execution, the program calls the HTTP client or database driver directly. Auto-instrumentation wraps these functions with additional functionality that costs time, memory, and CPU. Because of this, it’s important to benchmark our application with auto-tracing enabled and with it disabled to verify that the performance is acceptable.
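The wrapping and the benchmark can both be sketched with the standard library. The decorator below is a hand-written stand-in for what an auto-instrumentation package does around a driver call; `instrument`, `query_database`, and the `recorded` list are hypothetical names, and the measured overhead will differ from that of a real agent.

```python
import functools
import time
import timeit

recorded = []  # stands in for spans handed to a span processor


def instrument(func):
    """Wrap func so every call records a (name, duration) pair."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            recorded.append((func.__name__, time.perf_counter() - start))

    return wrapper


def query_database(user_id):
    """Stand-in for a database driver call."""
    return {"id": user_id}


traced_query = instrument(query_database)

# Benchmark the same call with and without the wrapper.
calls = 10_000
plain_seconds = timeit.timeit(lambda: query_database(1), number=calls)
traced_seconds = timeit.timeit(lambda: traced_query(1), number=calls)
overhead_per_call = (traced_seconds - plain_seconds) / calls
print(f"approximate overhead per call: {overhead_per_call * 1e6:.2f} microseconds")
```

The absolute number matters less than the comparison: running the same measurement against realistic workloads shows whether the added cost is acceptable for our latency budget.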

Unit Test Tracing Using Memory Span Exporters: Most of the time, unit testing focuses on program logic and ignores telemetry. But occasionally it is useful to verify that the correct metadata is present, including tags, metric counts, and trace metadata. A mock or stub implementation that records calls is necessary to achieve this. Tests are able to configure their own test exporter. Remember that OpenTelemetry separates instrumentation from exporting, which allows the production code to use a different exporter from the tests. This lets us test that our code is setting metrics and metric tags correctly.
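A test along these lines might look like the sketch below. To keep it self-contained it uses a hand-rolled `InMemoryExporter` and a made-up `charge_card` function rather than the SDK's own test exporter, so the class, the function, and the attribute names are all assumptions for illustration.

```python
import unittest


class InMemoryExporter:
    """Hypothetical test exporter that records every span it is given."""

    def __init__(self):
        self.spans = []

    def export(self, name, attributes):
        self.spans.append({"name": name, "attributes": dict(attributes)})


def charge_card(amount_cents, exporter):
    """Code under test; the exporter is injected so tests can swap it."""
    exporter.export("charge_card", {"amount_cents": amount_cents, "currency": "USD"})
    return amount_cents > 0


class ChargeCardTelemetryTest(unittest.TestCase):
    def test_span_carries_expected_attributes(self):
        exporter = InMemoryExporter()

        self.assertTrue(charge_card(1250, exporter))

        self.assertEqual(len(exporter.spans), 1)
        span = exporter.spans[0]
        self.assertEqual(span["name"], "charge_card")
        self.assertEqual(span["attributes"]["amount_cents"], 1250)
        self.assertEqual(span["attributes"]["currency"], "USD")


# Run the test case programmatically so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ChargeCardTelemetryTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Production code would pass a real exporter into the same function, so the test exercises the actual instrumentation path without sending data anywhere.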

Measuring Critical KPIs and Tracking Code via the CI/CD Pipeline: Additionally, we can use traces to measure critical KPIs in our application and add a task to the CI/CD process that compares the traces from the older and newer releases. This comparison helps us evaluate significant changes.

Major use cases for OpenTelemetry
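One possible shape for such a CI/CD task is a script that derives a KPI from span durations of two releases and flags a regression. The sketch below assumes the durations have already been pulled from the tracing backend; the sample numbers, the choice of 95th-percentile latency as the KPI, and the 10% threshold are illustrative assumptions only.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def regression_ratio(old_durations, new_durations, pct=95):
    """Relative change of the chosen percentile between two releases."""
    old_value = percentile(old_durations, pct)
    new_value = percentile(new_durations, pct)
    return (new_value - old_value) / old_value


# Hypothetical span durations in milliseconds for one endpoint.
old_release = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
new_release = [100, 115, 125, 135, 150, 160, 175, 185, 200, 240]

THRESHOLD = 0.10  # assumed policy: fail if p95 grows by more than 10%
ratio = regression_ratio(old_release, new_release)
exceeds_threshold = ratio > THRESHOLD
print(f"p95 change: {ratio:.1%}, regression: {exceeds_threshold}")
```

A pipeline step could exit with a non-zero status when `exceeds_threshold` is true, which turns the trace comparison into a release gate.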

OpenTelemetry is used in back- and front-end applications:

At the front end of our application, OpenTelemetry can do the following:

Detect faulty logic or incorrect user input, which can cause JavaScript errors.
Find poorly implemented JavaScript, which makes our UI extremely slow despite having fast APIs.
Locate geo-specific lag requiring geo-distribution.

At the back end of our application, OpenTelemetry can do the following:

Detect faulty logic or incorrect user input, which leads to exceptions being thrown.
Identify improperly implemented calls from the back end (for example, requests to infrastructure like databases or downstream APIs) that lead to longer response times.
Uncover poorly performing code on an API, which also leads to a longer response time.

For infrastructure, OpenTelemetry can be used to:

Perform version audits to check for known vulnerabilities and make sure configurations are working.
Identify configuration changes leading to performance degradation.
Check for domain name system (DNS) misconfigurations that cause apps to be inaccessible.

Now that we have covered observability and OpenTelemetry, we need to understand who can implement it and why they need to. We will discuss this in the next post.