QoS Agent is an add-on that Tencent Cloud has enhanced around quality of service (QoS). It provides a set of capabilities that raise the utilization of cluster resources while keeping workloads stable.
Note:
QoS capabilities are supported only on native nodes. If your nodes are not native nodes, or your workloads do not run on native nodes, these capabilities do not take effect.
Kubernetes Objects Deployed in a Cluster
Kubernetes Object Name | Type | Requested Resources | Namespace |
avoidanceactions.ensurance.crane.io | CustomResourceDefinition | - | - |
nodeqoss.ensurance.crane.io | CustomResourceDefinition | - | - |
podqoss.ensurance.crane.io | CustomResourceDefinition | - | - |
timeseriespredictions.prediction.crane.io | CustomResourceDefinition | - | - |
kube-system | Namespace | - | - |
all-be-pods | PodQOS | - | kube-system |
qos-agent | ClusterRole | - | - |
qos-agent | ClusterRoleBinding | - | - |
crane-agent | Service | - | kube-system |
qos-agent | ServiceAccount | - | kube-system |
qos-agent | DaemonSet | - | kube-system |
Feature Overview
Feature | Description |
CPU Usage Priority | Setting a CPU usage priority ensures that high-priority tasks receive sufficient resources during resource contention by suppressing low-priority tasks. |
CPU Burst | CPU Burst temporarily lets latency-sensitive applications use CPU beyond their limit, which keeps them stable. |
CPU Hyperthreading Isolation | Prevents the L2 cache of high-priority container threads from being affected by low-priority threads running on the same physical CPU core. |
Memory QoS Enhancement | Comprehensively improves memory performance and allows flexible limits on the memory usage of containers. |
Network QoS Enhancement | Comprehensively improves network performance and allows flexible limits on the network usage of containers. |
Disk IO QoS Enhancement | Comprehensively improves disk performance and allows flexible limits on the disk usage of containers. |
QoS Agent Permission
Note:
The Permission Scenarios section lists only the permissions related to the core features of the component. For the complete permission list, refer to the Permission Definition section.
Permission Description
The permissions granted to this component are the minimum it requires for its current features to operate.
Permission Scenarios
Scenario | Resource | Verbs |
Reading podqos, nodeqos, time series, and other configurations | podqoss / nodeqoss / avoidanceactions | get/list/watch/update |
Viewing the pod information of the current node | pod | get/list/watch |
Enabling isolation capabilities based on PodQOS / modifying node resources to increase offline resources | pod status | update/patch |
Adding a taint to the node | node | get/list/watch/update |
Sending events based on the status of isolation and resource interference | event | All Permissions |
Permission Definition
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - pods/status
  verbs:
  - update
  - patch
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
  - update
- apiGroups:
  - ""
  resources:
  - nodes/status
  - nodes/finalizers
  verbs:
  - update
  - patch
- apiGroups:
  - ""
  resources:
  - pods/eviction
  verbs:
  - create
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - "*"
- apiGroups:
  - "ensurance.crane.io"
  resources:
  - podqoss
  - nodeqoss
  - avoidanceactions
  verbs:
  - get
  - list
  - watch
  - update
- apiGroups:
  - "prediction.crane.io"
  resources:
  - timeseriespredictions
  - timeseriespredictions/finalizers
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
- apiGroups:
  - "topology.crane.io"
  resources:
  - "noderesourcetopologies"
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
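To cross-check the Permission Scenarios table against the definition, the following minimal Python sketch shows how such a rule list is evaluated. It assumes the usual Kubernetes RBAC behavior: a request is allowed if any single rule covers its API group, resource, and verb, "*" acts as a wildcard, and subresources such as pods/status must be listed explicitly. The names RULES, covers, and allows are hypothetical, and RULES copies only a subset of the rules above.

```python
# Subset of the rules above, written as plain Python data for illustration.
RULES = [
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]},
    {"apiGroups": [""], "resources": ["pods/status"], "verbs": ["update", "patch"]},
    {"apiGroups": [""], "resources": ["events"], "verbs": ["*"]},
    {
        "apiGroups": ["ensurance.crane.io"],
        "resources": ["podqoss", "nodeqoss", "avoidanceactions"],
        "verbs": ["get", "list", "watch", "update"],
    },
]


def covers(allowed, wanted):
    """True if the list names the wanted value or contains the '*' wildcard."""
    return "*" in allowed or wanted in allowed


def allows(rules, api_group, resource, verb):
    """True if at least one rule covers the group, the resource, and the verb."""
    return any(
        covers(rule["apiGroups"], api_group)
        and covers(rule["resources"], resource)
        and covers(rule["verbs"], verb)
        for rule in rules
    )
```

For example, allows(RULES, "", "pods", "get") is true, while allows(RULES, "", "pods", "delete") is false because no rule grants delete on pods.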
Deployment Methods
2. In the Cluster list, click the ID of the desired cluster to open its details page.
3. Select Add-on management from the left-side menu, and click Create on the Add-on management page.
4. On the Create Add-on management page, select the checkbox for QoS Agent.
5. Click Complete to install the add-on.
Note:
Because the cgroup driver can differ between clusters, you need to select the matching driver manually after the deployment is complete. The steps are as follows:
1. In the Add-on management page of your cluster, locate the successfully deployed QoS Agent, and click Update configuration on the right.
2. On the add-on configuration page of QoS Agent, open the dropdown list to the right of the cgroupDrive option, and choose the value that matches your cluster.
3. Click Complete.
FAQs
How to confirm the cgroupDrive of a cluster?
The cgroupDrive of a cluster is either cgroupfs or systemd. To confirm which one applies:
First, determine the container runtime of the cluster. On the "basic information" page of the cluster, check the runtime component to see whether the cluster runs docker or containerd.
If the runtime is docker, execute docker info on any node in the cluster and check the value of the Cgroup Driver field.
If the runtime is containerd, check the file /etc/containerd/config.toml on any node in the cluster. If it contains the field SystemdCgroup = true, the driver is systemd; otherwise, it is cgroupfs.
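The two checks above can be sketched in Python as follows. This is a minimal illustration rather than an official tool: the function names are hypothetical, the first function expects the plain-text output of docker info, and the second applies the rule described above to the contents of /etc/containerd/config.toml with a simple line match that treats commented-out lines as absent.

```python
import re
from typing import Optional


def driver_from_docker_info(output: str) -> Optional[str]:
    """Return the value of the 'Cgroup Driver' field from `docker info` text output."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Cgroup Driver":
            return value.strip()
    return None  # field not present in the given output


# Matches an uncommented "SystemdCgroup = true" line anywhere in the file.
_SYSTEMD_TRUE = re.compile(r"^\s*SystemdCgroup\s*=\s*true\b", re.MULTILINE)


def driver_from_containerd_config(config_text: str) -> str:
    """Return 'systemd' if SystemdCgroup = true is set, otherwise 'cgroupfs'."""
    return "systemd" if _SYSTEMD_TRUE.search(config_text) else "cgroupfs"


# Usage on a node (assumes the file exists and is readable):
#   with open("/etc/containerd/config.toml") as f:
#       print(driver_from_containerd_config(f.read()))
```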
How to select the target workloads or nodes?
You can select specific resource objects by label or by scope.
Note:
When both of the following selectors are specified, they are combined with a logical "and", i.e. all conditions must be met.
labelSelector
The labelSelector filters resources by matching the labels on the object. Typically, the business team attaches a specific label to the designated workloads and passes that label to the operations team. When creating a PodQOS, the operations team references the label through the labelSelector field, which grants different QoS capabilities to different businesses.
scopeSelector
The scopeSelector is composed of multiple MatchExpressions, which are combined with a logical "and". Each MatchExpression has three fields: ScopeName, Operator, and the Values that correspond to the ScopeName.
ScopeName has three types: QOSClass, Priority, and Namespace.
QOSClass selects workloads that belong to a specific QOSClass. The Values can be one or more of Guaranteed, Burstable, and BestEffort.
Priority selects workloads that have a specific Priority. The Values can be specific priority values or ranges of priorities, such as ["1000", "2000-3000"].
Namespace selects workloads that belong to a specific Namespace. The Values can be one or more namespaces.
Operator has two types: In and NotIn. If left blank, the default is In.
The following example selects BestEffort Pods that carry the label app-type=offline and assigns them a CPU priority of 7:
apiVersion: ensurance.crane.io/v1alpha1
kind: PodQOS
metadata:
  name: offline-task
spec:
  allowedActions:
  - eviction
  resourceQOS:
    cpuQOS:
      cpuPriority: 7
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: QOSClass
      values:
      - BestEffort
  labelSelector:
    matchLabels:
      app-type: offline
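The selection semantics described in this FAQ can be sketched in Python as follows. This is an illustrative model, not the agent's implementation: the Pod class and all function names are hypothetical, priority ranges such as "2000-3000" are assumed to be inclusive, and priority values are assumed to be non-negative. OFFLINE_TASK_SPEC mirrors the selectors of the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Hypothetical, simplified Pod holding only the fields the selectors use."""
    namespace: str
    qos_class: str  # Guaranteed, Burstable, or BestEffort
    priority: int
    labels: dict = field(default_factory=dict)


def priority_in(priority, values):
    """True if priority equals a listed value or lies in a 'low-high' range (assumed inclusive)."""
    for value in values:
        if "-" in value:
            low, high = value.split("-", 1)
            if int(low) <= priority <= int(high):
                return True
        elif int(value) == priority:
            return True
    return False


def expression_matches(pod, expr):
    """Evaluate one MatchExpression against a Pod."""
    scope, values = expr["scopeName"], expr["values"]
    if scope == "QOSClass":
        hit = pod.qos_class in values
    elif scope == "Priority":
        hit = priority_in(pod.priority, values)
    elif scope == "Namespace":
        hit = pod.namespace in values
    else:
        raise ValueError(f"unknown scopeName: {scope}")
    operator = expr.get("operator") or "In"  # a blank operator defaults to In
    if operator == "In":
        return hit
    if operator == "NotIn":
        return not hit
    raise ValueError(f"unknown operator: {operator}")


def pod_selected(pod, spec):
    """A Pod is selected only if labelSelector AND every MatchExpression match."""
    wanted = spec.get("labelSelector", {}).get("matchLabels", {})
    labels_ok = all(pod.labels.get(k) == v for k, v in wanted.items())
    exprs = spec.get("scopeSelector", {}).get("matchExpressions", [])
    return labels_ok and all(expression_matches(pod, e) for e in exprs)


OFFLINE_TASK_SPEC = {
    "scopeSelector": {"matchExpressions": [
        {"operator": "In", "scopeName": "QOSClass", "values": ["BestEffort"]},
    ]},
    "labelSelector": {"matchLabels": {"app-type": "offline"}},
}
```

With this model, a BestEffort Pod labeled app-type=offline is selected, while a Burstable Pod with the same label, or a BestEffort Pod without the label, is not.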