Cluster autoscaling in Kubernetes allows for flexible resource management by automatically adding or removing nodes and pods based on the workload.
For autoscaling to function, the cluster must meet the following requirements:
The minimum number of worker nodes is 2.
The maximum number of nodes in a group cannot exceed the maximum allowable number of worker nodes in the cluster. For example, if the cluster limit is set to 100 nodes and one of the groups is already using 30 nodes, the new group can have a maximum of 70 nodes.
To enable autoscaling during cluster creation, select the Autoscaling option in the Worker nodes configuration section.
If the cluster has already been created, you can enable autoscaling as follows:
Navigate to the Resources tab.
Enable the autoscaling option for the group.
The following parameters are used to manage the creation and deletion of nodes:
Time a node must remain idle before it is removed: 5 minutes.
Maximum number of nodes that can be removed simultaneously: 1.
Maximum wait time for node creation: 3 minutes.
Maximum inactivity time before node reduction can begin: 3 minutes.
Delay after adding a node before node reduction can begin: 3 minutes.
Maximum time allowed for creating a new node: 3 minutes.
These parameters determine how quickly nodes are added or removed based on the current workload. For instance, if the workload decreases and a node remains idle for more than 5 minutes, it will be removed. However, node removal happens gradually to avoid a sudden reduction in resources. When the workload increases, new nodes are added, but this process is also controlled to prevent long delays, with a maximum time limit of 3 minutes.
In this way, autoscaling aims to maintain a balance between resource availability and cost-efficiency.
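For reference, if your platform runs the upstream Kubernetes Cluster Autoscaler, settings of this kind correspond to its startup flags. The flag names below are the upstream ones, shown purely as an illustration with the values listed above; your provider's own parameter names may differ:
--scale-down-unneeded-time=5m     # idle time before a node is removed
--max-empty-bulk-delete=1         # nodes that may be removed at the same time
--scale-down-delay-after-add=3m   # pause after a scale-up before scale-down resumes
--max-node-provision-time=3m      # how long to wait for a new node to come up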
Autoscaling works only if resource requests and limits are specified in the deployment, because utilization is calculated relative to the requested resources. Below is an example of a deployment file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
spec:
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
        - name: example-container
          image: example-image
          resources:
            requests:
              cpu: "200m"
              memory: "256Mi"
            limits:
              cpu: "400m"
              memory: "512Mi"
CPU is specified in millicores (m), where 1000m equals one core. Memory is specified in mebibytes (Mi) or gibibytes (Gi).
The Horizontal Pod Autoscaler (HPA) is used to dynamically adjust the number of pods based on resource consumption, such as CPU or memory. HPA scales pods to maintain a specified level of resource usage and adapt to changing workload conditions. Below is an example of an HPA manifest:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50
HPA parameters:
apiVersion: the API version used to create the resource; in this case, autoscaling/v1.
kind: the type of resource being created, here HorizontalPodAutoscaler.
metadata: metadata for the object, including the name of the HPA (example-hpa).
scaleTargetRef: specifies the object to scale, with the following details:
  apiVersion: the API version of the resource being scaled.
  kind: the type of resource to scale, in this case Deployment.
  name: the name of the resource to scale (example-deployment).
minReplicas: the minimum number of pod replicas to maintain; here it is set to 1.
maxReplicas: the maximum number of pod replicas to scale up to; in this case, 5.
targetCPUUtilizationPercentage: the CPU usage percentage at which the number of replicas is adjusted. In this example it is set to 50, meaning replicas are added or removed to keep average CPU utilization at 50% of the pods' CPU requests.
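The same object can also be created imperatively with kubectl, which is convenient for quick experiments; this produces an HPA equivalent to the manifest above:
kubectl autoscale deployment example-deployment --cpu-percent=50 --min=1 --max=5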
To set up autoscaling, let’s create an Nginx deployment with defined resource limits:
deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "500m"
              memory: "1Gi"
This deployment creates an Nginx pod with the resource requests and limits that HPA needs in order to compute utilization.
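Before applying the manifest, you can have kubectl validate it client-side without creating anything in the cluster:
kubectl apply -f deployment.yaml --dry-run=client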
To demonstrate pod autoscaling with Nginx, use the HorizontalPodAutoscaler (HPA) object. HPA automatically adjusts the number of pod replicas based on resource utilization.
hpa.yaml:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 1
  maxReplicas: 4
  targetCPUUtilizationPercentage: 20
In this example, HPA adjusts the number of replicas for nginx-deployment based on CPU utilization, maintaining it at 20%; the deliberately low target makes scaling easy to trigger in this demonstration.
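Note that HPA reads CPU usage from the Kubernetes metrics API, which is typically served by the metrics-server add-on; many managed Kubernetes offerings install it by default. You can confirm that metrics are available with:
kubectl top pods
If the command prints per-pod CPU and memory figures, HPA has the data it needs.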
To distribute load evenly among pods, it is recommended to use a Service object of the LoadBalancer type.
service.yaml:
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: LoadBalancer
This service exposes the pods on port 80 and balances incoming traffic across all replicas.
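The load-testing command later in this guide needs the external IP of the load balancer. Once the service is applied (see the commands below), you can look it up with:
kubectl get service nginx-service
The address appears in the EXTERNAL-IP column; it may read <pending> for a short while until the cloud load balancer is provisioned.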
To apply the YAML files, use the following commands:
kubectl apply -f deployment.yaml
kubectl apply -f hpa.yaml
kubectl apply -f service.yaml
These commands will deploy the configurations for the Nginx deployment, HPA, and load balancer, creating all the necessary components for autoscaling.
Before generating load, ensure all resources are created and in the Running state. To observe autoscaling in action, generate load on the service using the hey tool, which sends HTTP requests to test performance.
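For example, check that the pods are up:
kubectl get pods
If hey is not installed, it can be fetched with Go (this assumes a Go toolchain is available; the tool lives at github.com/rakyll/hey):
go install github.com/rakyll/hey@latest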
Generate Load:
hey -z 10m -c 20 http://<load_balancer_ip>
This command generates 10 minutes of load with 20 parallel connections to the Nginx service, increasing resource usage and triggering scaling of pod replicas and, if necessary, worker nodes.
After some time, you can observe the scaling:
Check node scaling in the cluster:
kubectl get nodes
Check pod scaling, which will increase up to 4 replicas as defined in hpa.yaml:
kubectl get pods
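To watch scaling as it happens, you can also leave a watch running:
kubectl get hpa nginx-hpa --watch
The TARGETS column shows current versus target CPU utilization, and REPLICAS climbs toward 4 while the load persists, then falls back after the load stops.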