Concepts & Considerations

Overview

Signal Attributes

io.Insights provides observability data in the three forms (called signals) defined by the OpenTelemetry standard: metrics (time series), traces (structured sequences of events) and logs (streams of free-form events). These have different semantics and usage, but all three share the concept of "attributes" (also known in certain systems as "fields", "labels", "metadata", "tags" and so on).

Attributes are key-value pairs attached to a particular signal that provide additional detail about what happened. For example, a tracing span can have attributes such as "user": "Peter", "method": "ShowPortfolio", "tracingAppName": "client-list" and so on, to describe details about who performed the action, in which app, and so on. These attributes are used for querying, reporting and analysis in your observability backend.

io.Insights derives a rich set of attributes for each signal it publishes, whether through the default instrumentation or via the Insights API. The APIs also allow client code to add any necessary attributes, and you can add them via configuration using the "additionalAttributes" and "additionalResourceAttributes" properties.

There are two categories of attributes:

  • Resource attributes - have the same value for the entire run of the platform. They describe the entity producing the data (e.g., the platform name, version, machine and user).
  • Per-signal attributes - are different for each measurement, span or log entry and can be specified dynamically. They describe the specific event (e.g., the name of the app that was started, the error message, the duration of the operation).

Different visualization systems may display the two categories differently. You can use the "addResourceAttributesToAttributes" setting in the configuration to force all resource attributes to be published as per-signal attributes as well, if this works better for your system.
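As an illustrative sketch, the configuration properties mentioned above might be combined as follows. The enclosing structure and the key-value map shapes are assumptions for this example; only the three property names come from the text:

```json
{
  "additionalResourceAttributes": {
    "deployment.environment": "production",
    "region": "emea"
  },
  "additionalAttributes": {
    "businessUnit": "wealth-management"
  },
  "addResourceAttributesToAttributes": true
}
```

With "addResourceAttributesToAttributes" enabled, the resource-level values would also appear on each individual measurement, span and log entry, at the cost of some duplication in the exported data.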

ℹ️ For details on the common resource-level attributes published by io.Insights, see the Signals > Overview > Common Resource Attributes section.

Publishing Architecture

Each signal type published by io.Insights has its own publishing mechanism:

  • Metrics and traces are published by each app and component separately - each web app, native app or platform component sends its own HTTP requests with observability data to the configured OpenTelemetry collector endpoint.
  • Logs from web apps are delegated to the platform, which bridges them to the OpenTelemetry log exporter. Server-side components like io.Manager and the Gateway publish their own logs directly.

This distinction means you can often debug an app's metrics or tracing logic using the browser Dev Tools within io.Connect Desktop, while log-related issues may require examining the platform's logging pipeline.

Graceful Shutdown

When io.Connect Desktop is shut down, there may be remaining telemetry data that hasn't been exported yet. io.Insights supports configurable grace periods that allow the platform to wait for any remaining data to be published before completing the shutdown sequence. This is particularly important in production environments where data loss is unacceptable.

ℹ️ For details on configuring the shutdown grace period, see the Configuration > io.Connect Desktop section.

Exporting to Files

io.Insights supports exporting raw observability data to JSON files which can later be manually uploaded into an observability backend or analysis tool. This is useful for environments where direct connectivity to a collector is not available, or for offline analysis. File export supports log4js file rotation properties to limit file size and number of backup files.

Additionally, traces can be exported in a log format directed to a file or any other logging storage. See Traces as Logs for more information.

Metrics

An OpenTelemetry metric is represented as a set of time series, where each entry consists of a timestamp, a set of attributes (sometimes called fields or labels) and a numeric value. Filtering and aggregating these series allows one to analyze the behavior and performance of a system or set of systems over time, build dashboards, reports, alerts and so on.

io.Insights metrics are published both by the io.Connect Desktop and io.Connect Browser platforms, which emit a predefined set of default platform metrics describing platform behavior and performance, and by client apps, which can use the Insights API to publish their own custom metrics.

Differences Between Backends

Different observability backends have different but related concepts of what a metric is and how it's represented, which may map differently to OpenTelemetry metrics. For instance, one OpenTelemetry metric can be represented as multiple metrics in the backend. In Prometheus, an OpenTelemetry Histogram metric has a separate metric for the buckets (<basename>_bucket{le=…}), the sum (<basename>_sum) and the count of measurements (<basename>_count). Metric names can also differ between OpenTelemetry/io.Insights and the backend. For example, in Prometheus you might have a suffix for the unit (<basename>_milliseconds), and backend-specific attributes might be derived from well-known OpenTelemetry attributes (e.g., in Prometheus, exported_job comes from service.name and exported_instance comes from service.instance.id).

When integrating io.Insights with your observability stack, be aware of these differences and consult the documentation of your chosen backend to understand how the OpenTelemetry data will be represented. For a list of vendors that natively support the OpenTelemetry standard, visit the OpenTelemetry official site.

Modality and Cardinality

In most metrics backends, every unique combination of attribute values is linked to a separate time series. This means that the number of possible attribute value combinations determines the maximum number of time series that can be created in the backend.

Modality represents the number of possible values each attribute can have. For example, if the "application" attribute can have as many values as the number of apps you have, and the "user" attribute is limited to the number of unique users, the maximum number of time series could be the product of these two values. A Boolean attribute has a modality of 2 (true and false), a numeric attribute's modality is constrained by the range of values it represents, and so on.
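The worst-case arithmetic is simple to sketch. The attribute names and counts below are hypothetical:

```javascript
// Worst-case number of time series = product of the modalities of all attributes.
// A sketch of the arithmetic described above; the attribute names are examples.
function worstCaseSeriesCount(modalities) {
  return Object.values(modalities).reduce((product, m) => product * m, 1);
}

const modalities = {
  application: 50, // one value per app
  user: 2000,      // one value per unique user
  success: 2       // Boolean attribute: true/false
};

console.log(worstCaseSeriesCount(modalities)); // 50 * 2000 * 2 = 200000
```

Adding a single Boolean attribute to a metric doubles its worst-case series count, which is why even low-modality attributes deserve scrutiny.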

It's important to be aware of the modalities of your metric attributes to avoid overloading your backend, as time series databases such as Prometheus are sensitive to the number of series they must index. This can be managed by a judicious choice of attributes, adjusting data retention, or reducing attribute modality.

Special attention should be paid to attributes that have a high or unlimited modality, since they can cause significant growth in the number of time series. By default, io.Insights has several such attributes:

  • The "service.instance.id" resource attribute (sometimes represented in the backend as "instance" or "exported_instance"). This attribute represents the instance of the platform that generated this time series and changes with each restart.
  • The "applicationInstance" attribute in the "app_cpu" and "app_memory" metrics, which represents the instance of the measured app and is unique per app run.

When integrating io.Insights in your enterprise, you need to decide whether your storage capabilities, data retention strategy and analytical needs require these attributes to be retained as unique values or have their modality reduced. For instance, reducing the modality of "service.instance.id" will reduce your ability to correlate which events occurred within the same run of the platform, since the instance ID will no longer be unique, but will significantly reduce your storage requirements.

Other attributes that can benefit from such analysis are "user", "osUser", "sid", "machineName" and similar attributes that naturally have a high modality in a large enterprise.

ℹ️ For details on reducing attribute modality using the "reduceModality" configuration property, see the Configuration > io.Connect Desktop section.

Side-by-Side Publishing

It's possible to publish the same metric in different forms simultaneously. For instance, you can publish the "platform_startup" metric both as a Histogram (for distribution analysis) and as a Gauge (for last-value dashboards), each with its own name and independent publishing settings. This is achieved by defining multiple entries with the same "type" but different "name" values in the metrics configuration.
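A hedged sketch of what such a metrics configuration could look like, assuming the configuration is an array of entries; the "kind" property is hypothetical, and only the same-"type"/different-"name" pattern comes from the text above:

```json
[
  { "type": "platform_startup", "name": "platform_startup_histogram", "kind": "histogram" },
  { "type": "platform_startup", "name": "platform_startup_last", "kind": "gauge" }
]
```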

Traces

The Traces module provides out-of-the-box tracing for io.Connect Desktop and the io.Connect APIs, as well as an API for client apps to instrument their own code. Tracing data is published as OpenTelemetry spans that describe events during the execution of the system.

A span consists of a name, start and end timestamps, an auto-generated ID, a trace ID, an optional parent span ID, a status (in progress, succeeded or errored) and a set of attributes. All of this information can be used to analyze, reproduce or correlate the execution flow that occurred and to reason about the system when diagnosing issues or analyzing system and user behavior.

Each span can have a parent span, and all the spans connected to the same root (parentless) span form a tree-like structure called a trace, representing a connected set of events. Each span is part of exactly one trace. A new trace is created whenever a span is created without a parent span; spans created with a parent span are added to their parent's trace.

A trace can contain spans published by separate apps, systems and machines, which allows reasoning about the behavior of distributed systems. This is why OpenTelemetry tracing is referred to as "distributed tracing".

ℹ️ For details on configuring traces in io.Connect Desktop, see the Configuration > io.Connect Desktop section.

ℹ️ For details on the available provided instrumentation and the trace spans published by the platform and the io.Connect libraries, see the Signals > Traces > io.Connect Desktop section.

ℹ️ For details on creating custom spans and using the Tracing API, see the Insights API > JavaScript section.

Active Span

The concept of an active span is central to how tracing works. The active span is the span in whose context the code is currently executing. If a new span is created while an active span exists, the active span implicitly becomes the parent of the new span, and the newly created span becomes the new active span. When that span finishes, its parent becomes the active span again, and so on recursively up to the root span. When the root span ends, there is no longer an active span, and creating a new span will start a new trace.
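The mechanism can be illustrated with a minimal sketch of active-span bookkeeping. This is not the io.Insights implementation, just a model of the behavior described above:

```javascript
// Minimal sketch of active-span bookkeeping (illustration only).
// A stack tracks the active span: new spans are parented to the current top;
// ending a span reactivates its parent.
const activeStack = [];
let traceCounter = 0;

function startSpan(name) {
  const parent = activeStack[activeStack.length - 1] ?? null;
  const span = {
    name,
    parent,
    // No active span -> this span is a root and starts a new trace.
    traceId: parent ? parent.traceId : `trace-${++traceCounter}`
  };
  activeStack.push(span); // the new span becomes the active span
  return span;
}

function endSpan(span) {
  if (activeStack[activeStack.length - 1] === span) {
    activeStack.pop(); // the parent becomes the active span again
  }
}

const root = startSpan("handleClick");     // starts a new trace
const child = startSpan("interop.invoke"); // nested under "handleClick"
endSpan(child);
endSpan(root);
const next = startSpan("unrelated");       // no active span left -> new trace
console.log(child.traceId === root.traceId, next.traceId !== root.traceId); // true true
```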

One consequence of this mechanism is that any traced user code that calls into io.Connect APIs will properly show the io.Connect API spans as children of its own spans, and vice versa - io.Connect APIs calling into user callbacks that include tracing logic will properly nest the user spans into the io.Connect ones. User and platform instrumentations integrate seamlessly.

Asynchronous Code Considerations

Asynchronous code that uses Promises, callbacks, setTimeout, async/await and so on presents challenges with respect to managing the active span. A sequentially written piece of code might execute at several different moments in time, interspersed with other logic that might create its own spans (which would have become active in the meantime).

There are three approaches to solving this:

  • Using a Context Manager (e.g., Zone.js) which saves and restores the active context when asynchronous code is stopped and resumed. This generally works by patching the Promise implementation and requires a build step that converts await statements into Promises.
  • Manually restoring the tracing state by saving the TracingState object from the span callback and restoring it via the currentTracingState property after an async call returns.
  • Passing propagation info explicitly by extracting propagation info from the parent span and providing it to subsequent span creation calls.

ℹ️ For details on how to use these approaches in your code, see the Insights API > JavaScript section.

Propagation

In the context of tracing, propagation refers to the extraction, transmission and consumption of the active trace and span information between systems. This allows logically related operations to generate spans with the correct parent-child relationships, even when they span different apps, machines or backends.

For example, imagine an app that traces some internal logic and then calls a backend endpoint. The app can extract information about the currently active trace and span and send it to the backend (e.g., in HTTP headers). The backend can then use this information to create its own tracing spans as children of the app's span, resulting in a single trace that shows the full distributed execution flow.

io.Insights uses the W3C Trace Context standard for propagation information (traceparent and tracestate). The io.Connect APIs (Interop, Contexts, Intents and so on) automatically handle propagation between apps, so traces created by one app's io.Connect API call will correctly include spans from the responding app.
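The W3C "traceparent" header itself is a simple dash-separated string of version, trace ID, span ID and flags. As a sketch (the trace and span IDs below are the examples from the W3C specification):

```javascript
// Sketch of the W3C Trace Context "traceparent" header used for propagation:
// version-traceId-spanId-flags, all lowercase hex.
function buildTraceparent(traceId, spanId, sampled) {
  return `00-${traceId}-${spanId}-${sampled ? "01" : "00"}`;
}

function parseTraceparent(header) {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  // Bit 0 of the flags byte is the "sampled" flag.
  return { version, traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

const header = buildTraceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true);
console.log(header); // 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```

A backend receiving this header can create its spans with the parsed trace ID and use the span ID as the parent, yielding a single distributed trace.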

ℹ️ For details on extracting and injecting propagation info in your code, see the Insights API > JavaScript section.

Filtering

An especially important concept in the io.Insights Traces module is the filtering configuration. It's used to control tracing behavior for specific spans and circumstances, allowing you to determine:

  • Whether tracing is enabled or disabled for a specific operation.
  • The level of detail (verbosity) of the attributes added to the span.
  • Whether the span will also be exposed as a metric or log entry.
  • Whether the span can start a new trace when none is currently active, or should only be recorded as part of an existing trace.
  • Whether the span can be nested in existing traces or must always start a new trace.
  • The minimum duration for the span to be sampled.

Whenever a span is created, the library reads the filtering configuration and resolves matching entries from the "filters" array, based on two pieces of information:

  • The span source (its name, e.g., "interopio.api.interop.invoke").
  • The filtering context (an object of key-value pairs provided by the code creating the span, the platform, and the library itself).

The matching algorithm works similarly to CSS rules: the entry's "source" must be equal to, or a prefix of, the span's source, and all properties in the entry's "context" must match the span's filtering context. Multiple matching filter entries are merged in a first-wins manner, together with an optional "defaults" fallback.
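The matching and merging rules can be sketched as follows. This is an illustration of the described algorithm, not the library's actual code, and the nested "settings" object is a simplification:

```javascript
// Sketch of filter resolution: an entry matches when its "source" is equal to
// or a prefix of the span's source and all of its "context" properties match;
// matching entries are merged first-wins, with "defaults" as the fallback.
function resolveFilter(filters, defaults, spanSource, spanContext) {
  const matching = filters.filter((entry) =>
    spanSource.startsWith(entry.source ?? "") &&
    Object.entries(entry.context ?? {}).every(([k, v]) => spanContext[k] === v)
  );
  // First-wins merge: earlier entries take precedence over later ones,
  // and every matching entry takes precedence over the defaults.
  return Object.assign({}, defaults, ...matching.reverse().map((e) => e.settings));
}

const filters = [
  { source: "interopio.api.interop", context: {}, settings: { enabled: false } },
  { source: "interopio.api", context: {}, settings: { enabled: true, level: "INFO" } }
];
const resolved = resolveFilter(filters, { level: "WARN" }, "interopio.api.interop.invoke", {});
// First match wins: "enabled" stays false from the first entry,
// while "level" comes from the second entry (no earlier entry sets it).
console.log(resolved);
```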

ℹ️ For details on configuring trace filters in io.Connect Desktop, see the Configuration > io.Connect Desktop section.

ℹ️ For details on providing filtering context when creating spans, see the Insights API > JavaScript section.

Sampling

Sampling is an intermediate step during the lifecycle of tracing spans that occurs before the spans are published to the OpenTelemetry collector. It allows the library to examine, modify or drop spans before they are exported. Sampling is configured separately from filtering and is particularly useful for controlling tracing that doesn't use the io.Insights filtering configuration, such as auto-instrumentation.

Sampling rules can match spans by their name, attributes and context, and can set a probability (between 0 and 1) for whether the span will be exported. This provides a way to reduce the volume of exported trace data without disabling tracing entirely.
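A minimal sketch of probability-based sampling as described above (the rule shape is hypothetical):

```javascript
// Sketch of sampling: a rule matches on span name prefix and attributes;
// "probability" is the chance that a matched span is exported.
function shouldExport(rules, span, random = Math.random) {
  const rule = rules.find((r) =>
    span.name.startsWith(r.name ?? "") &&
    Object.entries(r.attributes ?? {}).every(([k, v]) => span.attributes[k] === v)
  );
  if (!rule) return true;             // no matching rule: export as usual
  return random() < rule.probability; // export with the configured probability
}

const rules = [{ name: "http.request", probability: 0.1 }];
// Deterministic "random" values make the outcome visible:
console.log(shouldExport(rules, { name: "http.request.get", attributes: {} }, () => 0.05)); // true
console.log(shouldExport(rules, { name: "http.request.get", attributes: {} }, () => 0.5));  // false
```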

ℹ️ For details on configuring sampling rules, see the Configuration > io.Connect Desktop section.

Verbosity Levels

Each tracing span has a verbosity level that controls how much detail is recorded in its attributes. The level is determined by the filtering configuration and can be one of: "OFF", "LOWEST", "DIAGNOSTIC", "DEBUG", "INFO", "WARN", "HIGHEST".

When adding data to a span, you specify the level at which the data should be recorded. If the specified level is lower than the span's configured level, the data is silently ignored. This allows you to instrument your code with detailed diagnostic information that is only captured when needed, without impacting performance or storage in normal operation.
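A sketch of the verbosity check, using the level order listed above:

```javascript
// Sketch of the verbosity check (illustration only).
// Levels in ascending order, as listed in the documentation above.
const LEVELS = ["OFF", "LOWEST", "DIAGNOSTIC", "DEBUG", "INFO", "WARN", "HIGHEST"];

// Data is recorded only when its level is at or above the span's configured level.
function isRecorded(dataLevel, spanLevel) {
  return LEVELS.indexOf(dataLevel) >= LEVELS.indexOf(spanLevel);
}

console.log(isRecorded("DIAGNOSTIC", "INFO")); // false - too detailed for an INFO span
console.log(isRecorded("WARN", "INFO"));       // true
```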

⚠️ Spans with "DIAGNOSTIC" level will record as much information as possible, including request data, method arguments and similar information which may contain personally identifiable information (PII) or other sensitive data. It's advised to use this level cautiously and only on a limited set of spans, users or development environments.

Provided Instrumentations

In addition to the tracing APIs, io.Insights is used to instrument the io.Connect APIs and the io.Connect Desktop platform. This means that you will get observability into the operations of the platform and your apps' usage of the io.Connect APIs out of the box, without writing any instrumentation code.

For example, operations like invoking Interop methods, updating shared Contexts, raising Intents, starting apps and loading Workspaces are automatically traced. These provided instrumentations use source strings with "interopio.api" and "interopio.desktop" prefixes.

ℹ️ For a full list of the provided instrumentations and their span attributes, see the Signals > Traces section.

User Journey

io.Connect Desktop publishes a User Journey trace that describes over time which app was focused by the user and for how long. This is useful for analyzing user navigation patterns, understanding workflows and identifying areas for improvement.

The User Journey trace can be configured in one of two structural modes:

  • Nested - each subsequent focus span is a child of the previous one, forming a deep hierarchy. This is useful when analyzing traces with tools that rely on hierarchical structure.
  • Sibling - each subsequent focus span is a child of the same initial root span, forming a flat structure. This is more suitable for timeline-based visualizations.

Apps can add custom marker spans to the User Journey trace to describe what workflow the user is starting, what data they are working with and so on. These markers provide additional context for later analysis.

ℹ️ For details on configuring the User Journey trace, see the Configuration > io.Connect Desktop section.

ℹ️ For details on adding User Journey markers from your code, see the Insights API > JavaScript section.

Click Stream

io.Connect Desktop publishes a Click Stream trace that describes the user's interaction with each web app's DOM elements. By default, it tracks only "click" events, although this can be reconfigured. Each app has its own separate Click Stream trace.

The Click Stream span attributes include information to identify the element that the user interacted with, such as the element ID, CSS class selectors and tag name. The Click Stream trace can also capture spans from the event listeners of the recorded events, as long as those listeners were attached after io.Insights was initialized.

As with User Journey traces, apps can add custom marker spans to the Click Stream trace using the Insights API.

Traces as Metrics

Tracing information can also be published as counter metrics, allowing any code instrumented with tracing to provide information about how many times it was invoked and how long it took. This is achieved through the filtering configuration, which can enable the following derived metrics for any traced span:

  • insights_trace_count - a counter that measures how many times a span was hit.
  • insights_trace_result - similar to insights_trace_count, but includes the final span status as an attribute (success or error).
  • insights_trace_duration - a histogram that measures how long the span took from start to completion.

Using the Traces as Metrics feature, you can use the tracing instrumentation purely to generate metrics, without publishing any tracing information (via the "onDisabledSpans" filter properties). For this feature to work, both the Traces and Metrics modules must be enabled.

ℹ️ For details on configuring Traces as Metrics using filter properties, see the Configuration > io.Connect Desktop section.

Traces as Logs

Traces can also be exposed as log entries via the io.Connect logging API. This means you can write them to a file or any other logging storage mechanism without needing a tracing backend, or in addition to one.

One advantage of Traces as Logs over regular tracing is that the log entries are created in near real time, as opposed to the regular OpenTelemetry export mechanism which occurs at a configurable interval. This means that crash conditions in the app or platform will be recorded, even if the trace data never gets a chance to be exported via the normal OpenTelemetry pipeline.

The format of the log entries is:

<trace-id>-<span-id>-<parent-span-id> <span-name> <span-status> <span-duration-in-ms> <span-attributes-as-json>

Where span status is 0 for unknown, 1 for success and 2 for error.
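Producing an entry in this format can be sketched as follows (how a missing parent span ID is rendered is an assumption):

```javascript
// Sketch of the Traces as Logs entry format shown above:
// <trace-id>-<span-id>-<parent-span-id> <name> <status> <duration-ms> <attributes-json>
function formatSpanLogEntry(span) {
  const ids = `${span.traceId}-${span.spanId}-${span.parentSpanId ?? ""}`;
  return `${ids} ${span.name} ${span.status} ${span.durationMs} ${JSON.stringify(span.attributes)}`;
}

const entry = formatSpanLogEntry({
  traceId: "4bf92f3577b34da6a3ce929d0e0e4736",
  spanId: "00f067aa0ba902b7",
  parentSpanId: "b7ad6b7169203331",
  name: "interopio.api.interop.invoke",
  status: 1, // 0 = unknown, 1 = success, 2 = error
  durationMs: 12,
  attributes: { method: "ShowPortfolio" }
});
console.log(entry);
```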

Traces can be exposed as logs without the Logs module being enabled, as they use the io.Connect logging API directly. The Logs module is only needed if you want to publish the log entries over OpenTelemetry as well.

ℹ️ For details on configuring Traces as Logs using filter properties, see the Configuration > io.Connect Desktop section.

Logs

The Logs module of io.Insights bridges the existing io.Connect logging mechanisms with the OpenTelemetry log exporter, allowing platform and client app logs to be published as standard OpenTelemetry log records. Unlike metrics and traces, the Logs module doesn't provide its own API for generating log data - instead, it captures log entries produced by the io.Connect logging infrastructure (based on log4js) and forwards them to the OpenTelemetry pipeline.

This means that by enabling OpenTelemetry log publishing, all log data from the platform and its apps can be collected, filtered and exported to an OpenTelemetry-compatible backend without any changes to the app code.

ℹ️ For details on the log payload structure and the common log attributes, see the Signals > Logs section.

ℹ️ For details on configuring OpenTelemetry log publishing, see the Configuration > io.Connect Desktop section.

Log Sources

In io.Connect Desktop, the logging architecture routes logs from several sources through the log4js pipeline:

  • Platform logs - internal logging from the platform itself, using the "default" and "glue-logger" log4js categories.
  • App logs - client web apps publish logs through the io.Connect Logger API. These log entries are sent to the platform via an internal Interop method, where they are routed through the same log4js pipeline the platform uses.
  • Gateway logs - logs from the io.Connect Gateway process, using the "gw" log4js category.
  • Captured output - the platform can capture console messages, network request errors and unhandled JavaScript errors from web apps, as well as stdout/stderr from native apps, and route them through the logging pipeline.

A custom @interopio/log4js-otel appender is included in the platform's log4js configuration to forward matching log entries to the OpenTelemetry log exporter.
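As an illustration, a log4js configuration wiring in such an appender might look like this. The appender options and category layout are assumptions; only the @interopio/log4js-otel appender name comes from the text:

```json
{
  "appenders": {
    "file": { "type": "file", "filename": "platform.log" },
    "otel": { "type": "@interopio/log4js-otel" }
  },
  "categories": {
    "default": { "appenders": ["file", "otel"], "level": "info" }
  }
}
```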

Log Filtering and Sensitive Data

The Logs module supports a filtering configuration (through the "filters" and "defaults" properties) that allows you to control which log entries are published based on their category, severity and body content. You can also use the "hideRegex" property to redact sensitive data from log entries before they are published, and the "allowedAttributes" property to restrict which attributes are included in each log record.

⚠️ Take care when enabling log publishing in production environments. Logs may contain sensitive information such as authentication tokens, passwords or other credentials. Use the "hideRegex" filtering property to mask sensitive patterns and carefully review the log categories and severity levels that are forwarded to the OpenTelemetry exporter.
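A hedged sketch combining these properties (the value shapes are assumptions for illustration; only the property names come from the text above):

```json
{
  "defaults": { "enabled": true },
  "filters": [
    { "category": "gw", "severity": "warn", "enabled": true }
  ],
  "hideRegex": "(password|token)[=:]\\S+",
  "allowedAttributes": ["category", "severity", "app"]
}
```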

ℹ️ For details on configuring log filters and sensitive data masking, see the Configuration > io.Connect Desktop section.