QoSAgent
Last updated: 2024-02-05 16:28:54
QoS Agent is a Tencent Cloud add-on that extends Kubernetes quality-of-service (QoS) capabilities. It helps raise cluster resource utilization while keeping workloads stable.
Note:
QoS capabilities are supported only on native nodes. They have no effect on non-native nodes or on workloads that do not run on native nodes.

Kubernetes Objects Deployed in a Cluster

| Kubernetes Object Name | Type | Default Resource Occupation | Namespace |
|---|---|---|---|
| avoidanceactions.ensurance.crane.io | CustomResourceDefinition | - | - |
| nodeqoss.ensurance.crane.io | CustomResourceDefinition | - | - |
| podqoss.ensurance.crane.io | CustomResourceDefinition | - | - |
| timeseriespredictions.prediction.crane.io | CustomResourceDefinition | - | - |
| kube-system | Namespace | - | - |
| all-be-pods | PodQOS | - | kube-system |
| qos-agent | ClusterRole | - | - |
| qos-agent | ClusterRoleBinding | - | - |
| crane-agent | Service | - | kube-system |
| qos-agent | ServiceAccount | - | kube-system |
| qos-agent | DaemonSet | - | kube-system |

Feature Overview

| Feature | Description |
|---|---|
| Priority of CPU Usage | Setting a CPU usage priority guarantees resources for high-priority tasks during resource contention by suppressing low-priority tasks. |
| CPU Burst | Lets latency-sensitive applications temporarily use CPU beyond their limit, keeping them stable. |
| CPU Hyperthreading Isolation | Prevents low-priority threads running on the same physical CPU core from affecting the L2 cache used by high-priority container threads. |
| Memory QoS Enhancement | Improves overall memory performance and allows flexible limits on container memory usage. |
| Network QoS Enhancement | Improves overall network performance and allows flexible limits on container network usage. |
| Disk IO QoS Enhancement | Improves overall disk performance and allows flexible limits on container disk usage. |

QoS Agent Permission

Note:
The Permission Scenarios section lists only the permissions related to the component's core features. For the complete list, see Permission Definition.

Permission Description

This component requests only the minimum permissions its current features need to operate.

Permission Scenarios

| Feature | Involved Object | Operation Permissions |
|---|---|---|
| Reading PodQOS, NodeQOS, time series, and other configurations | podqoss / nodeqoss / avoidanceactions | get/list/watch/update |
| Viewing information about pods on the current node | pods | get/list/watch |
| Enabling isolation based on PodQOS / modifying node resources to increase offline resources | pods/status | update/patch |
| Adding a taint to the node | nodes | get/list/watch/update |
| Sending events about isolation status and resource interference | events | All permissions |

Permission Definition

rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - pods/status
  verbs:
  - update
  - patch
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
  - update
- apiGroups:
  - ""
  resources:
  - nodes/status
  - nodes/finalizers
  verbs:
  - update
  - patch
- apiGroups:
  - ""
  resources:
  - pods/eviction
  verbs:
  - create
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - "*"
- apiGroups:
  - "ensurance.crane.io"
  resources:
  - podqoss
  - nodeqoss
  - avoidanceactions
  verbs:
  - get
  - list
  - watch
  - update
- apiGroups:
  - "prediction.crane.io"
  resources:
  - timeseriespredictions
  - timeseriespredictions/finalizers
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
- apiGroups:
  - "topology.crane.io"
  resources:
  - "noderesourcetopologies"
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
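These rules follow standard Kubernetes RBAC semantics: a request is allowed if any single rule lists its API group, resource, and verb. As a minimal sketch of that evaluation, the Python below checks requests against a hand-copied subset of the rules above. The PolicyRule type and the is_allowed helper are hypothetical illustrations, not part of any Kubernetes client library, and the sketch ignores resourceNames and nonResourceURLs.

```python
# Minimal sketch of Kubernetes RBAC rule evaluation, applied to a hand-copied
# subset of the QoS Agent rules. Simplified: resourceNames and nonResourceURLs
# are ignored, and "*" is treated as "match any value".
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class PolicyRule:  # hypothetical type for illustration
    api_groups: Tuple[str, ...]
    resources: Tuple[str, ...]
    verbs: Tuple[str, ...]


# Subset of the Permission Definition ("" is the core API group).
QOS_AGENT_RULES = (
    PolicyRule(("",), ("pods",), ("get", "list", "watch")),
    PolicyRule(("",), ("pods/status",), ("update", "patch")),
    PolicyRule(("",), ("nodes",), ("get", "list", "watch", "update")),
    PolicyRule(("",), ("pods/eviction",), ("create",)),
    PolicyRule(("",), ("events",), ("*",)),
    PolicyRule(
        ("ensurance.crane.io",),
        ("podqoss", "nodeqoss", "avoidanceactions"),
        ("get", "list", "watch", "update"),
    ),
)


def _covers(allowed: Tuple[str, ...], requested: str) -> bool:
    # A wildcard grants every value; otherwise the value must be listed exactly.
    return "*" in allowed or requested in allowed


def is_allowed(rules: Iterable[PolicyRule], api_group: str, resource: str, verb: str) -> bool:
    # RBAC is additive (there are no deny rules): one matching rule is enough.
    return any(
        _covers(r.api_groups, api_group)
        and _covers(r.resources, resource)
        and _covers(r.verbs, verb)
        for r in rules
    )


print(is_allowed(QOS_AGENT_RULES, "", "pods/status", "patch"))  # True
print(is_allowed(QOS_AGENT_RULES, "", "pods", "delete"))        # False
```

Subresources such as pods/status and pods/eviction are matched separately from pods, which is why the rules list them explicitly.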

Deployment Methods

1. Log in to the Tencent Kubernetes Engine console and choose Cluster in the left navigation bar.
2. In the cluster list, click the ID of the target cluster to open its details page.
3. In the left sidebar, select Add-on management, then click Create on the Add-on management page.
4. On the Create Add-on management page, select the QoS Agent checkbox.
5. Click Complete to install the add-on.
Note:
Clusters may use different cgroup drivers, so after deployment you must manually select the driver that matches your cluster:
1. In your cluster's add-on list, find the deployed QoS Agent and click Update configuration on the right.
2. On the QoS Agent add-on configuration page, open the drop-down list next to the cgroupDrive option and select the cgroup driver your cluster uses.
3. Click Complete.

FAQs

How do I confirm a cluster's cgroup driver?

A cluster's cgroup driver is either cgroupfs or systemd. To confirm which one it uses:
1. On the cluster's Basic information page, check the runtime component to see whether the cluster uses Docker or containerd.
2. If the runtime is Docker, run docker info on any node in the cluster and check the Cgroup Driver field.
3. If the runtime is containerd, open /etc/containerd/config.toml on any node in the cluster. If it contains SystemdCgroup = true, the driver is systemd; otherwise, it is cgroupfs.
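The two node-level checks above can be sketched in Python. The helper names driver_from_docker_info and driver_from_containerd_config are hypothetical; they only parse text you have already collected from a node (the output of docker info, or the contents of /etc/containerd/config.toml) and do not run Docker or read files themselves. The containerd snippet in the usage lines is an illustrative excerpt, not a complete configuration.

```python
# Hypothetical helpers that apply the two checks to text already collected
# from a node; they do not run docker or read files themselves.
import re
from typing import Optional


def driver_from_docker_info(output: str) -> Optional[str]:
    """Return the "Cgroup Driver" value from `docker info` output, or None."""
    match = re.search(r"^\s*Cgroup Driver:\s*(\S+)", output, re.MULTILINE)
    return match.group(1) if match else None


def driver_from_containerd_config(toml_text: str) -> str:
    """Return "systemd" if an uncommented SystemdCgroup = true line exists, else "cgroupfs"."""
    for line in toml_text.splitlines():
        code = line.split("#", 1)[0].strip()  # drop TOML comments
        if re.fullmatch(r"SystemdCgroup\s*=\s*true", code):
            return "systemd"
    return "cgroupfs"


# Illustrative excerpt of a containerd config.toml.
sample_cfg = """
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
"""
print(driver_from_docker_info(" Cgroup Driver: cgroupfs\n"))  # cgroupfs
print(driver_from_containerd_config(sample_cfg))              # systemd
```

Commented-out lines such as # SystemdCgroup = true are ignored, so they do not switch the result to systemd.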

How do I select the target workloads or nodes?

You can select specific resource objects by label (labelSelector) or by scope (scopeSelector).
Note:
When both selectors are specified, they are combined with a logical AND: an object must satisfy both.

labelSelector

labelSelector selects objects by their labels. A typical workflow: the application team adds a specific label to its workloads and shares that label with the operations team. When creating a PodQOS, the operations team references the label in the labelSelector field, so that different applications receive different QoS capabilities.
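Assuming labelSelector follows the standard Kubernetes LabelSelector semantics (matchLabels and matchExpressions, all ANDed together), the matching can be sketched as below. The function matches_label_selector is a hypothetical illustration, not QoS Agent's implementation, and treating an empty selector as "match every pod" is a choice made only in this sketch.

```python
# Sketch of standard Kubernetes label-selector semantics, assuming
# PodQOS.spec.labelSelector follows them. matchLabels and matchExpressions
# are ANDed; an empty selector matches every pod in this sketch.
from typing import Mapping


def matches_label_selector(labels: Mapping[str, str], selector: Mapping) -> bool:
    # Each matchLabels pair must be present with exactly that value.
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    # Each matchExpressions entry must also hold.
    for expr in selector.get("matchExpressions", []):
        key, op = expr["key"], expr["operator"]
        values = expr.get("values", [])
        if op == "In":
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            ok = key not in labels or labels[key] not in values  # absent key matches
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


selector = {"matchLabels": {"app-type": "offline"}}
print(matches_label_selector({"app-type": "offline", "team": "batch"}, selector))  # True
print(matches_label_selector({"app-type": "online"}, selector))                    # False
```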

scopeSelector

scopeSelector consists of one or more matchExpressions, which are combined with a logical AND. Each matchExpression has three fields: scopeName, operator, and values (the values for that scopeName).
scopeName is one of three types: QOSClass, Priority, or Namespace.
QOSClass selects workloads by QoS class. values can be one or more of Guaranteed, Burstable, and BestEffort.
Priority selects workloads by priority. values are individual priorities or priority ranges, for example ["1000", "2000-3000"].
Namespace selects workloads by namespace. values can list one or more namespaces.
operator is either In or NotIn. If it is left blank, it defaults to In.
The following PodQOS selects BestEffort pods labeled app-type=offline and sets their CPU priority to 7:
apiVersion: ensurance.crane.io/v1alpha1
kind: PodQOS
metadata:
  name: offline-task
spec:
  allowedActions:
    - eviction
  resourceQOS:
    cpuQOS:
      cpuPriority: 7
  scopeSelector:
    matchExpressions:
      - operator: In
        scopeName: QOSClass
        values:
          - BestEffort
  labelSelector:
    matchLabels:
      app-type: offline

