Metric Name | Metric Definition |
Engine execution time | Reflects when the first task began executing on the Spark engine (the moment a task first preempted the CPU for execution). |
CU consumption | Reflects the actual resource consumption of a task, calculated by aggregating the runtime of all Spark task executors. Because a Spark task runs in parallel across multiple CUs and each CU's runtime is accumulated serially, CU consumption is greater than the engine execution time. |
Data scan size | Sums the input bytes across all Spark stages. |
Total output size | Sums the output bytes across all Spark stages. |
Data shuffle size | Sums the shuffle read bytes across all Spark stages. |
Number of output files | (This metric requires the Spark engine kernel to be upgraded to a version after November 16, 2024.) The total number of files written by tasks through statements such as INSERT. |
Number of output small files | (This metric requires the Spark engine kernel to be upgraded to a version after November 16, 2024.) Small files are defined as output files smaller than 4 MB (controlled by the parameter spark.dlc.monitorFileSizeThreshold; default 4 MB, configurable at the engine or task level). This metric counts the total number of small files written by tasks through statements such as INSERT. |
Parallel task | Displays tasks that execute in parallel, making it easier to analyze affected tasks (up to 200 entries). |
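As a hedged illustration of why CU consumption exceeds engine execution time: the numbers and variable names below are invented for the sketch, and the wall-clock time is approximated as the longest executor runtime.

```python
# Illustrative numbers only; real values come from the task's metrics page.
# CU consumption serially accumulates every executor's runtime, while the
# engine execution time is closer to the wall-clock duration of the job.
executor_runtimes_sec = [120, 118, 125, 122]  # four hypothetical parallel CUs

cu_consumption_sec = sum(executor_runtimes_sec)    # accumulated across CUs
engine_execution_sec = max(executor_runtimes_sec)  # rough wall-clock proxy

# With parallel execution, accumulated CU time always exceeds wall time.
assert cu_consumption_sec > engine_execution_sec
```

Here four executors each run about two minutes in parallel, so the job finishes in roughly two minutes of wall time while consuming about eight CU-minutes.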
Insight Type | Algorithm Description (Continuously Improving and Adding New Algorithms) |
Resource preemption | A SQL task is delayed for more than 1 minute after its stage is submitted, or the delay exceeds 20% of the total runtime (the threshold formula adjusts dynamically based on task runtime and data volume). |
Shuffle exception | Stage execution encounters shuffle-related error stack information. |
Slow task | Task duration in a stage is greater than twice the average duration of other tasks in the same stage (the threshold formula dynamically adjusts based on task runtime and data volume). |
Data skew | Task shuffle data is greater than twice the average shuffle data size of other tasks (the threshold formula dynamically adjusts based on task runtime and data volume). |
Disk or memory insufficiency | The error stack during stage execution contains OOM, insufficient disk space, or COS bandwidth limitation errors related to insufficient disk or memory. |
Excessive small file output | (This insight type requires the Spark engine kernel to be upgraded to a version after November 16, 2024.) See the metric Number of output small files above; excessive small file output is flagged if either of the following conditions is met: 1) for partitioned tables, any single partition outputs more than 200 small files; for non-partitioned tables, the total number of small files exceeds 200; 2) the table, partitioned or not, outputs more than 3,000 files with an average file size below 4 MB. |
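The two small-file conditions can be sketched as a simple check. This is a minimal sketch, not the engine's actual implementation: the function name and the data shape are assumptions, and the 4 MB constant mirrors the documented default of spark.dlc.monitorFileSizeThreshold.

```python
SMALL_FILE_BYTES = 4 * 1024 * 1024  # documented default threshold (4 MB)

def excessive_small_files(partition_file_sizes):
    """Hypothetical check mirroring the two documented conditions.

    partition_file_sizes maps a partition key (use a single key such as
    None for a non-partitioned table) to a list of output file sizes in bytes.
    """
    all_sizes = [s for sizes in partition_file_sizes.values() for s in sizes]
    # Condition 1: more than 200 small files in any partition
    # (for a non-partitioned table, this is the total small-file count).
    for sizes in partition_file_sizes.values():
        if sum(1 for s in sizes if s < SMALL_FILE_BYTES) > 200:
            return True
    # Condition 2: more than 3,000 output files with an average size under 4 MB.
    if len(all_sizes) > 3000 and sum(all_sizes) / len(all_sizes) < SMALL_FILE_BYTES:
        return True
    return False

# A partition with 201 one-byte files trips condition 1:
assert excessive_small_files({"p1": [1] * 201})
# 100 files of 10 MB each trip neither condition:
assert not excessive_small_files({None: [10 * 1024 * 1024] * 100})
```

Lowering spark.dlc.monitorFileSizeThreshold (at the engine or task level) would shrink SMALL_FILE_BYTES here and make condition 1 less likely to fire.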