Selection Related
What languages does APM support?
Application Performance Management (APM) follows the OpenTelemetry protocol standard and, in theory, supports applications written in any language. See Application Access Overview to complete the integration.
Does application integration require code modifications?
APM supports multiple integration schemes. Due to the differences in programming languages, the integration workload varies significantly.
The Java language has the most mature automatic integration scheme, with automatic event tracking for commonly used architectures and components. The event tracking coverage is very high, achieving zero code intrusion during integration.
Languages such as Python, Node.js, PHP, and .NET also have automatic integration schemes, but their event tracking coverage is lower than Java's, so you might need to add custom event tracking in code.
Languages such as C++ and Erlang currently do not have automatic integration schemes, so code modifications are needed for manual integration.
What frameworks and components does APM support?
In theory, all frameworks and components are supported. Automatic integration schemes automatically perform event tracking on commonly used frameworks and components. For other frameworks and components, you can modify the code to add custom event tracking. See the integration scheme for the specific language.
Can applications deployed on other clouds be integrated?
APM supports hybrid cloud deployment scenarios. As long as network connectivity is ensured, applications deployed on other clouds or in on-premises data centers can be integrated. You can report data over the Internet through a public network access point, or report to a private access point through a Tencent Cloud VPC.
Who provides the agent and SDKs needed for application integration?
For the Java language, APM provides the Tencent Cloud Enhanced OpenTelemetry Java Agent (TencentCloud-OTel Java Agent), with significant enhancements in event tracking density, advanced diagnostics, performance protection, and enterprise-level capabilities. For other languages, the agents and SDKs needed for application integration are provided by the open-source community. Tencent Cloud does not participate in the evolution and iteration of open-source agents and SDKs.
Do agents need to be embedded into container images?
No. As long as the application can access the agent file, the agent does not need to be embedded in the container image.
Is APM compatible with the OpenTracing protocol?
After the OpenTracing standard merged with the OpenCensus standard, the OpenTelemetry protocol standard was born. APM follows the OpenTelemetry protocol standard, so it is compatible with the OpenTracing protocol.
Should I choose the OpenTelemetry solution or the SkyWalking solution?
OpenTelemetry provides a unified observability standard. Compared with the SkyWalking solution, the OpenTelemetry solution has a richer ecosystem and a more active community, and it supports more languages and frameworks. OpenTelemetry is therefore the preferred solution for integrating applications. If you are already familiar with SkyWalking, or want to migrate quickly from open-source SkyWalking to Tencent Cloud APM, you can also integrate through the SkyWalking solution.
If SkyWalking was previously used, how do I migrate to APM?
APM is already compatible with the SkyWalking protocol standard. You only need to modify the report address and fill in the token of the business system in the Resource parameters to complete the migration.
If the open-source OpenTelemetry scheme was previously used, how do I migrate to APM?
APM is already compatible with the OpenTelemetry protocol standard. You only need to modify the report address and fill in the token of the business system in the Resource parameters to complete the migration.
Is the reporting of metrics and logs supported via the OpenTelemetry protocol standard?
The OpenTelemetry protocol standard defines three types of telemetry data: traces, metrics, and logs. APM supports the reporting of trace data and calculates various metrics on the server side from that trace data. APM currently does not support the reporting of metric or log data, so it is recommended to disable the Metric Exporter and Log Exporter when you integrate through the OpenTelemetry scheme.
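As a minimal sketch of the recommendation above, the following helper prepares process environment variables that turn off the metric and log exporters while leaving traces on. It assumes the agent or SDK in use honors the standard OpenTelemetry variables OTEL_TRACES_EXPORTER, OTEL_METRICS_EXPORTER, and OTEL_LOGS_EXPORTER; check the integration document for your language before relying on it. The function name is illustrative only.

```python
import os


def otel_exporter_env(base_env=None):
    """Return a copy of the environment with only the trace exporter enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["OTEL_TRACES_EXPORTER"] = "otlp"   # keep reporting trace data
    env["OTEL_METRICS_EXPORTER"] = "none"  # reported metrics are not supported by APM
    env["OTEL_LOGS_EXPORTER"] = "none"     # reported logs are not supported by APM
    return env


if __name__ == "__main__":
    # Print only the OpenTelemetry-related settings.
    for key, value in sorted(otel_exporter_env({}).items()):
        print(f"{key}={value}")
```

The returned dictionary can be passed as the env argument of subprocess.run when launching the instrumented application.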
Integration Related
Do I need to download the agents and SDK myself?
For Java, Python, Node.js, and .NET applications deployed on TKE, APM provides an automatic integration scheme that injects agents automatically after the application is deployed to TKE, which makes integration quick. In all other cases, download and install the agents and SDKs yourself.
After the probe is installed, the log contains the error "Failed to export spans".
The most likely cause is a missing or incorrectly filled-in token. See the access documentation and fill in the token correctly. For the OpenTelemetry solution, the token must be set in the Resource attributes together with the application name (service.name) and other instance attributes.
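The sketch below shows one way to assemble Resource attributes in the key=value,key=value form accepted by the standard OTEL_RESOURCE_ATTRIBUTES environment variable. The attribute key token is an assumption made for illustration; use the exact key given in the access documentation. The service name and token values are placeholders.

```python
from urllib.parse import quote


def build_resource_attributes(service_name, token, extra=None):
    """Build an OTEL_RESOURCE_ATTRIBUTES value from a service name and a token."""
    if not service_name:
        raise ValueError("service.name must not be empty")
    if not token:
        raise ValueError("token must not be empty")
    attrs = {"service.name": service_name, "token": token}  # "token" key is assumed
    attrs.update(extra or {})
    # Percent-encode values so that commas or equals signs cannot break the list.
    return ",".join(f"{key}={quote(str(value), safe='')}" for key, value in attrs.items())


if __name__ == "__main__":
    print(build_resource_attributes("order-service", "my-token", {"host.name": "10.0.0.8"}))
```

Failing fast on an empty token surfaces the problem at startup instead of as an export error later.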
When the automatic integration scheme is used in a TKE environment, can applications in the same TKE cluster report to different business systems?
Yes. When you install the tencent-opentelemetry-operator in a TKE cluster, you must specify a default business system. To report to a specific business system, add the annotation cloud.tencent.com/apm-token to the application workload. If the workload does not set cloud.tencent.com/apm-token, it reports to the default business system.
When the automatic integration scheme is used in a TKE environment, can applications in the same TKE cluster report to different regions?
No. When you install the tencent-opentelemetry-operator in a TKE cluster, you must specify the access point. The access point cannot be specified at the application level.
Installation of the tencent-opentelemetry-operator on the APM console failed.
The installation and updating of the operator are managed by the TKE Application Center. Issues such as lack of permissions, insufficient resources, cluster failures, or unsupported cluster versions can all cause the installation to fail. Go to the TKE Application Center to check the cause of the installation failure.
The tencent-opentelemetry-operator was installed successfully, but access still failed.
Ensure that the language and framework versions meet the requirements. Additionally, ensure that the annotation is added to spec.template.metadata.annotations of the workload, not to metadata.annotations.
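To illustrate the difference between the two annotation locations, the sketch below models a workload as a plain Python dictionary and reports where the cloud.tencent.com/apm-token annotation was found. The helper names and the sample workload are hypothetical; the same placement rule applies to any other annotation that the integration document asks for.

```python
APM_TOKEN_ANNOTATION = "cloud.tencent.com/apm-token"


def _annotations(metadata):
    """Return the annotations of a metadata block, tolerating missing keys."""
    return (metadata or {}).get("annotations") or {}


def locate_apm_annotation(workload):
    """Return 'pod-template', 'workload-only', or 'missing'."""
    template = ((workload.get("spec") or {}).get("template")) or {}
    if APM_TOKEN_ANNOTATION in _annotations(template.get("metadata")):
        return "pod-template"   # spec.template.metadata.annotations: correct place
    if APM_TOKEN_ANNOTATION in _annotations(workload.get("metadata")):
        return "workload-only"  # metadata.annotations: not applied to the Pods
    return "missing"


if __name__ == "__main__":
    deployment = {
        "kind": "Deployment",
        "metadata": {"name": "order-service"},
        "spec": {
            "template": {
                "metadata": {"annotations": {APM_TOKEN_ANNOTATION: "my-token"}}
            }
        },
    }
    print(locate_apm_annotation(deployment))
```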
Will updating the tencent-opentelemetry-operator affect already integrated applications?
No.
After the tencent-opentelemetry-operator is updated, applications cannot be integrated.
After you submit the update, the Operator takes a few minutes to update. During this time, newly launched application instances cannot be integrated, because the Operator cannot capture Pod creation events and inject probes until the update completes. Wait for the Operator update to finish, then recreate the application Pods to integrate the application.
Is cross-region access possible?
Yes. As long as the network is connected, cross-region access can be achieved. You can use public network access or apply for a Private Link to interconnect the VPC networks.
Is cross-region access possible with automatic access in a TKE environment?
Yes. When installing the Operator in the APM console, specify the correct reporting region.
Can the application instance name be customized?
Yes. You can customize the application instance name by setting the host.name attribute of the Resource during access. In most scenarios, the IP address can be used as the application instance name. If the system contains duplicate IP addresses, use another unique identifier as the instance name, such as Host IP + Container Name.
When you access Python applications through OpenTelemetry-Python, the instance name seen on the console is not the IP address.
The OpenTelemetry-Python scheme does not automatically obtain and use the IP address as the instance name. Set it explicitly through the host.name attribute of the Resource.
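A minimal sketch of preparing the host.name value in Python follows. It uses only the standard library, so it builds the attribute rather than calling the OpenTelemetry SDK; the helper names, the "-" separator between IP address and container name, and the fallback address are all assumptions made for illustration.

```python
import socket


def local_ip():
    """Best-effort lookup of the local outbound IPv4 address."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            # Connecting a UDP socket sends no packet; it only selects a route.
            sock.connect(("192.0.2.1", 9))
            return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # fallback when no route is available


def host_name_attribute(ip=None, container_name=None):
    """Return a Resource attribute dict that sets host.name."""
    name = ip or local_ip()
    if container_name:
        # Add the container name when IP addresses alone are not unique.
        name = f"{name}-{container_name}"
    return {"host.name": name}


if __name__ == "__main__":
    print(host_name_attribute())
    print(host_name_attribute("10.0.0.8", "web"))
```

The resulting dictionary can be merged into the Resource attributes that the application already sets, such as service.name and the token.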
Can Java applications directly use the OpenTelemetry community agent?
Yes, but it is not recommended. APM provides the Tencent Cloud Enhanced OpenTelemetry Java Agent (TencentCloud-OTel Java Agent), with significant enhancements in event tracking density, advanced diagnostics, performance protection, and enterprise-level capabilities. It is recommended to use the TencentCloud-OTel Java Agent. Using the OpenTelemetry community agent directly may result in some feature deficiencies.
Console Features Related
What is a business system?
A business system is used for the classified management of applications. Each business system has a unique token, which must be specified when the application is integrated. You can set parameters such as storage duration and reporting quotas at the business system level. Permission management and cost allocation can also be implemented based on business systems. Monitoring data between different business systems is completely isolated.
How should business systems be divided?
Since monitoring data between different business systems is completely isolated, business systems can be divided based on the principle of isolation. If there is no possible call relationship between two groups of applications, you can manage them by assigning them to two different business systems. Dividing business systems by different environments is a typical use case; you can create a business system for the development environment, test environment, and production environment, respectively. Another typical use case is to divide business systems by business domains as long as there are no mutual calls between these business domains.
What is an application?
In APM, the application is the most important entity. Multiple processes integrated with the same application name appear as multiple instances under the same application. Therefore, an application is a logical combination. In a microservices architecture, it can be equivalent to a service containing multiple peer instances.
Can the same application name be used in multiple business systems?
Yes.
Can logs be output to CLS via the log association feature?
The log association feature links log queries with traces, making it easier to locate issues. To use it, integrate Tencent Cloud CLS and output the trace_id field in the log body.
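The sketch below shows one way to put a trace_id field into every log line with Python's standard logging module. The current_trace_id context variable is a hypothetical stand-in: in a real application the value would come from the active span provided by your tracing agent or SDK, and the log format would follow whatever your CLS collection rule expects.

```python
import contextvars
import io
import logging

# Hypothetical holder for the trace ID of the request being handled.
current_trace_id = contextvars.ContextVar("current_trace_id", default="")


class TraceIdFilter(logging.Filter):
    """Attach the current trace ID to every log record."""

    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True


def build_logger(stream):
    """Create a logger whose lines contain a trace_id field."""
    logger = logging.getLogger("apm-log-demo")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
    )
    handler.addFilter(TraceIdFilter())
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger(buffer)
    current_trace_id.set("4bf92f3577b34da6a3ce929d0e0e4736")
    log.info("order created")
    print(buffer.getvalue(), end="")
```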
Why can't the application diagnosis feature be used with integrated Go applications?
The application diagnosis feature is an enhanced capability of Tencent Cloud Enhanced OpenTelemetry Java Agent (TencentCloud-OTel Java Agent) and is currently only available for Java applications.
Is a new dependency required to print TraceId in logs?
If you use the Tencent Cloud Enhanced OpenTelemetry Java Agent (TencentCloud-OTel Java Agent) for integration, you do not need to introduce any new dependencies. You only need to modify the pattern in the log configuration file to inject TraceId and SpanId into the logs. For details, see OpenTelemetry Enhanced Edition Java Probe.
Data Related
Data reporting is successful, but some services appear gray in the APM topology map.
If no requests are received by a service within the selected time period, we consider the service inactive from the service provider's perspective and display it in gray.
Why is P99 greater than the maximum time?
APM calculates percentiles with an algorithm that assumes values are linearly distributed, so percentiles are estimates and are not always precise, whereas the maximum time is always exact. This algorithm is consistent with the one Prometheus uses to calculate percentiles and is widely adopted in the industry. For more details, see Prometheus percentile error. When the sample size is relatively small, P99 may exceed the maximum time.
Why is there only trace data but no metric data?
It might be caused by a missing span type. Ensure that each reported span has span.kind set to one of Client, Server, Consumer, Producer, or Internal.
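As a rough illustration of the check described above, the following sketch scans spans for a missing or unrecognized span.kind. Spans are modeled as plain dictionaries with a hypothetical kind key, and the comparison ignores case; real spans produced by an agent or SDK carry the kind in their own data structure.

```python
VALID_SPAN_KINDS = {"CLIENT", "SERVER", "CONSUMER", "PRODUCER", "INTERNAL"}


def spans_missing_kind(spans):
    """Return the spans whose kind is absent or not one of the five valid kinds."""
    return [
        span
        for span in spans
        if str(span.get("kind") or "").upper() not in VALID_SPAN_KINDS
    ]


if __name__ == "__main__":
    sample = [
        {"name": "GET /orders", "kind": "Server"},
        {"name": "SELECT orders", "kind": "Client"},
        {"name": "custom-step"},  # no kind set
    ]
    for span in spans_missing_kind(sample):
        print("span without a valid kind:", span["name"])
```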
Why does a call chain become disconnected?
The following situations may cause a call chain disconnection:
2. Tencent Cloud Enhanced OpenTelemetry Java Agent limits the maximum number of spans reported per second to 5,000. The excess is discarded.
3. The automatic event tracking mechanism of the agent or SDK does not cover the related framework or component, and enhancement is needed via custom event tracking.
What is Apdex?
Apdex, short for Application Performance Index, is an industry standard developed by the Apdex Alliance for assessing application performance. From the user's point of view, the Apdex standard quantifies user satisfaction with application response time into a score ranging from 0 to 1.
What are the calculation rules of Apdex?
First, an Apdex threshold for application response time is set based on an assessment of the application's performance. Each actual response time then falls into one of three performance categories:
Satisfied: Application response time is less than or equal to the Apdex threshold.
Tolerating: Application response time is greater than the Apdex threshold but less than or equal to four times the Apdex threshold.
Frustrated: Application response time is greater than four times the Apdex threshold.
Apdex = (Number of Satisfied + Number of Tolerating / 2) / Total Sample Size
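The formula above can be sketched in a few lines of Python. The function name and the sample response times are illustrative; the threshold and the response times only need to share the same unit.

```python
def apdex(response_times, threshold):
    """Compute the Apdex score for a list of response times."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if not response_times:
        raise ValueError("at least one sample is required")
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    # Frustrated samples count only toward the total sample size.
    return (satisfied + tolerating / 2) / len(response_times)


if __name__ == "__main__":
    # Response times in milliseconds with a 500 ms Apdex threshold:
    # 3 satisfied, 2 tolerating, 1 frustrated.
    times_ms = [200, 400, 500, 1000, 2000, 2500]
    print(round(apdex(times_ms, 500), 3))
```

With these samples the score is (3 + 2 / 2) / 6, which is about 0.667.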
Does APM support trace sampling?
Why is SQL truncated in database call analysis?
In the database call analysis and SQL analysis views, SQL statements are saved as dimension information in the metric data of APM to enable statistical analysis by SQL statement. To keep metric aggregation results readable when dimension values are very long, APM limits each dimension value to 256 characters, so any part of a SQL statement beyond 256 characters is truncated.
SQL truncation applies only to the metric data of APM. The complete statement is still saved in the corresponding span of each call. You can find the corresponding trace and span through correlation query, and obtain the complete SQL statement in the link detail view.
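The behavior described above can be pictured with the following sketch: the metric dimension keeps at most 256 characters, while the span keeps the statement in full. This only illustrates the documented limit and is not APM's implementation; the function name and the db.statement attribute key are assumptions.

```python
MAX_DIMENSION_LENGTH = 256  # documented limit for dimension values


def record_database_call(sql):
    """Return the metric dimension and the span attributes for one SQL call."""
    metric_dimension = sql[:MAX_DIMENSION_LENGTH]  # truncated for aggregation
    span_attributes = {"db.statement": sql}        # complete statement is kept
    return metric_dimension, span_attributes


if __name__ == "__main__":
    long_sql = "SELECT * FROM orders WHERE id IN (" + ", ".join(["?"] * 120) + ")"
    dimension, attributes = record_database_call(long_sql)
    print(len(long_sql), len(dimension), len(attributes["db.statement"]))
```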
In API analysis, why are some contents in the API name replaced with identifiers such as {LONG_NUM}?
In the API analysis view, the API name is saved as dimension information in the metric data of APM to enable statistical analysis based on the API name. The API name is reported by the probe or SDK, which has very high flexibility but is also very prone to dimensional divergence. When the API names in an application have a relatively high cardinality, dimensional divergence will occur, and statistical analysis based on the API name will not be able to provide valuable data, affecting the user experience of APM.
Therefore, APM has introduced multiple preset convergence rules that merge APIs of the same type, which reduces the cardinality of API names and keeps statistical analysis meaningful. When part of an API name is replaced, a preset convergence rule has taken effect, and you can view statistical data based on the converged API name.
API name convergence applies only to the metric data of APM. The complete API name is still saved in the corresponding span of each call. You can find the corresponding trace and span through correlation query, and obtain the complete API name in the link tracing and link detail views.
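To show how a convergence rule reduces cardinality, here is a sketch of one hypothetical rule that replaces long numeric path segments with {LONG_NUM}. The actual preset rules of APM are not described in this document; the regular expression and the five-digit minimum are assumptions chosen purely for illustration.

```python
import re

# Hypothetical rule: a path segment made of five or more digits is a long number.
LONG_NUM_SEGMENT = re.compile(r"(?<=/)\d{5,}(?=/|$)")


def converge_api_name(api_name):
    """Replace long numeric path segments with the {LONG_NUM} identifier."""
    return LONG_NUM_SEGMENT.sub("{LONG_NUM}", api_name)


if __name__ == "__main__":
    raw_names = [
        "/orders/20240517001/items",
        "/orders/20240517002/items",
        "/orders/20240517003/items",
    ]
    converged = {converge_api_name(name) for name in raw_names}
    # Three distinct raw names collapse into one converged name.
    print(len(raw_names), "->", len(converged), converged)
```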
Billing Related
How is APM charged?
Can APM be tried for free?
Yes. New users have a 15-day trial period, with a reporting limit of 100 million spans within those 15 days and a trace storage duration of 7 days.
In what situations is the package (prepaid) billing mode appropriate?
The following conditions must be met:
1. 10% trace sampling is acceptable. In the package mode, traces are sampled at 10%, with all exceptional traces preserved and metric accuracy unaffected.
2. Users have specific expectations regarding the quantities of agents.
3. The average data reporting volume of the agent is quite large, leading to higher costs in a pay-as-you-go mode.
What does the agent in the package mode mean?
One agent corresponds to one application process accessing APM.
What does Agent x Hour mean?
Agent x Hour is the billing unit in the package mode. Each application process integrated with APM consumes 1 Agent x Hour per hour. For example, 10 application processes integrated with APM consume 10 x 24 = 240 Agent x Hours per day.
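The arithmetic above can be sketched as a small helper for estimating consumption over a longer period. The function name and the 30-day period are illustrative assumptions, not billing terms.

```python
def agent_hours(process_count, hours):
    """Agent x Hours consumed by a number of application processes over some hours."""
    if process_count < 0 or hours < 0:
        raise ValueError("process_count and hours must not be negative")
    return process_count * hours


if __name__ == "__main__":
    per_day = agent_hours(10, 24)          # 10 processes for one day
    per_30_days = agent_hours(10, 24 * 30)  # the same processes for 30 days
    print(per_day, per_30_days)
```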
Can package plans be stacked for purchase?
Yes, as long as each package is used within its validity period.
How long is the trace data stored?
During the trial period, it is stored for 7 days by default. After billing officially starts, you can choose any duration within 30 days as needed.
How long is the metric data stored?
30 days.