Sensitivity adjustment for HPA scale-out is not supported by versions earlier than K8s 1.18.
--horizontal-pod-autoscaler-downscale-stabilization-window
parameter of kube-controller-manager
controls the scale-in time window, which is five minutes by default, that is, a scale-in can be performed at least five minutes after the workload reduction.In this design logic, users cannot customize the speed of HPA scaling. However, different business scenarios may have different requirements for scaling sensitivity:
HPA is updated on K8s 1.18, where scaling sensitivity control is added to v2beta2, but the version number of v2beta2 remains unchanged.
During HPA scaling, the fixed algorithm is first used to calculate the desired number of replicas:
Desired number of replicas = ceil[current number of replicas * (current metric / desired metric)]
Here, if "current metric / desired metric" is close to 1 (which is within the default tolerance of 0.1, that is, the ratio ranges between 0.9 and 1.1), no scaling is performed; otherwise, jitters may cause frequent scaling.
Note:Tolerance is determined by the
--horizontal-pod-autoscaler-tolerance
parameter ofkube-controller-manager
. It defaults to 0.1, that is, 10%.
Scaling speed adjustment described in this document doesn't mean adjusting the algorithm for calculating the desired number of replicas. It doesn't increase/decrease the scaling ratio or quantity, but only controls the scaling speed. The implementation should deliver the following effect: controlling the maximum custom ratio/number of Pods that can be added/released in a custom time period allowed by HPA.
In this update, the behavior
field is added to HPA Spec, which contains the scaleUp
and scaleDown
fields for scaling control. For more information, see HPAScalingRules v2beta2 autoscaling.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
name: web
spec:
minReplicas: 1
maxReplicas: 1000
metrics:
- pods:
metric:
name: k8s_pod_rate_cpu_core_used_limit
target:
averageValue: "80"
type: AverageValue
type: Pods
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: web
behavior: # This is the key point.
scaleDown:
stabilizationWindowSeconds: 300 # When a scale-in is needed, observe for five minutes first. If it is still needed, perform the scale-in.
policies:
- type: Percent
value: 100 # Allow for releasing all
periodSeconds: 15
scaleUp:
stabilizationWindowSeconds: 0 # Perform a scale-out when needed
policies:
- type: Percent
value: 100
periodSeconds: 15 # Up to one time the current number of Pods can be added every 15 seconds.
- type: Pods
value: 4
periodSeconds: 15 # Up to four Pods can be added every 15 seconds.
selectPolicy: Max # Use the larger value of the two calculated based on the above two scale-out policies
behavior
configuration is default, which means it will be added by default if not specified.scaleUp
and scaleDown
. selectPolicy
determines which policy to use for scaling.selectPolicy
is Max
by default, that is, different calculation results are evaluated and the largest number of Pods is selected for scaling.stabilizationWindowSeconds
is the stable window period, that is, scaling is performed only when the metric is below or above the threshold for the stable window period. This is to avoid frequent scaling caused by jitters. For a scale-out, the stable window defaults to 0, indicating to perform the scale-out immediately; for a scale-in, it defaults to five minutes.policies
defines the scaling policy. type
can be Pods
or Percent
, indicating the maximum number or ratio of replicas that can be added every periodSeconds
.If you need to quickly scale out your application, you can use the following HPA configuration:
behavior:
scaleUp:
policies:
- type: Percent
value: 900
periodSeconds: 15 # Up to nine times the current number of replicas can be added every 15 seconds.
The above configuration indicates that nine times the current number of replicas are added immediately, within the maxReplicas
limit though.
Suppose there is only one Pod, the traffic surges, and the metric constantly exceeds nine times the threshold, a scale-out will be performed quickly, during which the number of Pods will change as follows:
1 -> 10 -> 100 -> 1000
If no scale-in policy is configured, a scale-in will be performed after the global default time window (which is five minutes by default).
When the traffic peak is over and the concurrent volume drops significantly, if the default scale-in policy is used, the number of Pods will drop a few minutes later. If another traffic peak comes unexpectedly after the scale-in, the scale-out will be fast but still take some time. If the traffic surges to a really high level, the backend may fail to keep up, causing some requests to fail. In this case, you can add a scale-in policy for HPA by configuring behavior
as follows:
behavior:
scaleUp:
policies:
- type: Percent
value: 900
periodSeconds: 15 # Up to nine times the current number of replicas can be added every 15 seconds.
scaleDown:
policies:
- type: Pods
value: 1
periodSeconds: 600 # Only one Pod can be released every ten minutes.
In the above sample, the scaleDown
configuration is added, specifying that only one Pod can be released every ten minutes. This greatly slows down the scale-in, during which the number of Pods will change as follows:
1000 -> ... (10 minutes later) -> 999
In this way, key businesses will be able to handle traffic surges, and the requests won't fail.
If you want to make scale-outs slow and stable for general applications, add the following behavior
configuration to HPA:
behavior:
scaleUp:
policies:
- type: Pods
value: 1
periodSeconds: 300 # Only one Pod can be added every five minutes.
Suppose there is only one Pod and the metric constantly exceeds the threshold, the number of Pods will change as follows during the scale-out:
1 -> 2 -> 3 -> 4
If you want to prevent key applications from an automatic scale-in after a scale-out and need to determine the scale-in conditions by manual intervention or a self-developed controller, you can use the following behavior
configuration to disable automatic scale-in:
behavior:
scaleDown:
selectPolicy: Disabled
By default, the time window for scale-in is five minutes. If you need to extend the time window to avoid exceptions caused by traffic peaks, you can specify the time window for scale-in by configuring behavior
as follows:
behavior:
scaleDown:
stabilizationWindowSeconds: 600 # Perform a scale-in ten minutes later
policies:
- type: Pods
value: 5
periodSeconds: 600 # Up to five Pods can be released every ten minutes.
In the above sample, when the load drops, a scale-in will be performed 600 seconds (ten minutes) later, and up to five Pods can be released every ten minutes.
Some applications often undergo frequent scale-outs due to data spikes, and the added Pods may be a waste of resources. In data processing pipelines, the desired number of replicas depends on the number of events in the queue. When a large number of events are heaped in the queue, a fast but not too sensitive scale-out is desired, as the heap may last only a short time and disappear even if no scale-out is performed.
The default scale-out algorithm executes a scale-out after a short period of time. You can add a time window to avoid resource waste after a scale-out caused by spikes. Below is the sample behavior
configuration:
behavior:
scaleUp:
stabilizationWindowSeconds: 300 # A scale-out is performed after a 5-minute time window.
policies:
- type: Pods
value: 20
periodSeconds: 60 # Up to 20 Pods can be added every minute.
In the above sample, a scale-out is performed after a 5-minute time window. If the metric falls below the threshold during this window, no scale-out is performed. If the metric constantly exceeds the threshold, a scale-out is performed, and up to 20 Pods can be added every minute.
This is because HPA has many API versions:
kubectl api-versions | grep autoscaling
autoscaling/v1
autoscaling/v2beta1
autoscaling/v2beta2
The version number is irrelevant to the version for creation (which is automatically converted).
If kubectl is used, during API discovery, various types of resources and version information returned by the API server will be cached. Some resources are available in multiple versions; if the version to get is not specified, the default version will be used, which is v1 for HPA. If the operation is performed on some platform UIs, the result will depend on the platform implementation method. In the TKE console, the default version is v2beta1:
Just specify the complete resource name containing the version information:
kubectl get horizontalpodautoscaler.v2beta2.autoscaling php-apache -o yaml
# kubectl edit horizontalpodautoscaler.v2beta2.autoscaling php-apache
Add the following configuration:
behavior:
scaleUp:
policies:
- type: Percent
value: 900
periodSeconds: 10
It indicates that up to nine times the current number of Pods can be added every ten seconds. In actual tests, it happens that the scale-out is slow when the threshold is greatly exceeded.
Generally, it's due to the calculation period and metric latency:
--horizontal-pod-autoscaler-sync-period
parameter of kube-controller-manager
).--metrics-relist-interval
parameter in Prometheus Adapter determines the monitoring metric refresh period (which can be queried in Prometheus); the sum of the two is the longest period for a monitoring data update.Generally, extreme HPA sensitivity is not necessary, and a certain latency is acceptable. In highly sensitive scenarios, you can use Prometheus to shorten the monitoring metric scrape interval and --metrics-relist-interval
of the Prometheus Adapter.
This document describes how to use new HPA features to control the scaling speed so as to meet the requirements in different scenarios. It also provides some common scenarios and configuration samples that can be used as needed.
Was this page helpful?