
Cluster Autoscaling

Updated on 01 October 2025

Cluster autoscaling in Kubernetes allows for flexible resource management by automatically adding or removing nodes and pods based on the workload.

Cluster Requirements

The maximum number of nodes in a group cannot exceed the maximum allowable number of worker nodes in the cluster. For example, if the cluster limit is set to 100 nodes and one of the groups is already using 30 nodes, the new group can have a maximum of 70 nodes.
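The headroom rule above is simple subtraction; a quick sketch using the example numbers from this paragraph (these are illustrative values, not queried from a real cluster):

```shell
# Example values from the scenario above.
CLUSTER_MAX_NODES=100   # maximum worker nodes allowed in the cluster
NODES_IN_USE=30         # nodes already consumed by existing groups

# Maximum size a new node group may be given.
echo $((CLUSTER_MAX_NODES - NODES_IN_USE))
```

This prints 70, the largest node count the new group can be configured with.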

Enabling Autoscaling

For a New Cluster

To enable autoscaling during cluster creation, select the Autoscaling option in the Worker nodes configuration section.

The minimum number of nodes at the cluster creation stage is 1.


For an Existing Cluster

If the cluster has already been created, you can enable autoscaling as follows:

  1. Navigate to the Resources tab.
  2. Click on the three dots next to the node group and select Edit group.
  3. Enable the autoscaling option for the group.

The minimum number of nodes is 0.


Autoscaling Parameters

The following parameters are used to manage the creation and deletion of nodes:

--scale-down-unneeded-time 5m0s
--max-scale-down-parallelism 1
--provisioning-request-max-backoff-time 2m0s
--max-inactivity 3m0s
--scale-down-delay-after-add 2m0s
--max-node-provision-time 10m0s
--scan-interval 2m0s
--scale-up-from-zero=true
--scale-down-unready-enabled=true
--scale-down-unready-time=30m0s

  • scale-down-unneeded-time: how long a node must remain unneeded before it can be removed (here, 5 minutes).
  • max-scale-down-parallelism: maximum number of nodes that can be removed simultaneously (here, 1).
  • provisioning-request-max-backoff-time: maximum backoff time between retries of a node provisioning request (here, 2 minutes).
  • max-inactivity: maximum time since the autoscaler's last recorded activity before it is automatically restarted (here, 3 minutes).
  • scale-down-delay-after-add: delay after a node is added before scale-down evaluation resumes (here, 2 minutes).
  • max-node-provision-time: maximum time allowed for provisioning a new node (here, 10 minutes).
  • scan-interval: how often the cluster state is re-evaluated for scale-up or scale-down (here, 2 minutes).
  • scale-up-from-zero: allows a node group to be scaled up from zero nodes.
  • scale-down-unready-enabled: allows nodes in the unready state to be scaled down.
  • scale-down-unready-time: how long an unready node must be unneeded before it is eligible for removal (here, 30 minutes).

These parameters determine how quickly nodes are added or removed based on the current workload. For instance, if the workload decreases and a node remains unneeded for more than 5 minutes, it will be removed. However, node removal happens gradually, at most one node at a time, to avoid a sudden reduction in resources. When the workload increases, new nodes are added, and this process is also bounded: a new node must be provisioned within 10 minutes.

In this way, autoscaling aims to maintain a balance between resource availability and cost-efficiency.

Configuring Resources for Pods

Autoscaling works only if requests and limits are specified in the deployment. Below is an example of a deployment file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
spec:
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: example-container
        image: example-image
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
          limits:
            cpu: "400m"
            memory: "512Mi"
  • CPU requests and limits are measured in millicores (m), where 1000m equals one core.
  • Memory requests and limits are measured in mebibytes (Mi) or gibibytes (Gi).

Using nodeSelector and nodeAffinity

If a pod manifest uses nodeSelector or nodeAffinity, Kubernetes schedules the pod only on nodes that match the specified conditions. This also affects autoscaling: the autoscaler can create a new node only in a node group where the pod can be scheduled.

If no suitable node group exists, the pod will remain in the Pending state, even if autoscaling is enabled in the cluster.

Example: nodeSelector

nodeSelector is suitable for simple cases where pods must run only on nodes with a specific label. For example, if an application should run only in a particular node group, you can use the following manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      nodeSelector:
        workload-type: background
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "sleep infinity"]
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "200m"
            memory: "256Mi"

In this example, the pod can be scheduled only on nodes with the label workload-type=background. If no suitable nodes exist in the cluster, the autoscaler will be able to add a new node only to a group that uses this label.

Example: nodeAffinity

nodeAffinity is used when more flexible scheduling rules are required. Unlike nodeSelector, it allows you to define expressions with different operators and combine multiple conditions.

Below is an example equivalent to the previous nodeSelector, but implemented using nodeAffinity:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: workload-type
                operator: In
                values:
                - api
      containers:
      - name: api
        image: nginx:latest
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
          limits:
            cpu: "400m"
            memory: "512Mi"

In this case, the pod will be scheduled only on nodes with the label workload-type=api. This example uses the In operator, which allows specifying one or more acceptable values for a label.

nodeAffinity also supports other operators, for example:

  • In: the label value must be in the specified list
  • NotIn: the label value must not be in the specified list
  • Exists: the label must be present on the node
  • DoesNotExist: the label must not be present
  • Gt: the label value must be greater than the specified value (for numeric values)
  • Lt: the label value must be less than the specified value (for numeric values)

These operators allow defining more advanced scheduling rules compared to nodeSelector.

Use nodeSelector when simple matching by one or more fixed values is sufficient. For more complex placement requirements, use nodeAffinity.
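As an illustration of more complex placement rules, several operators can be combined within a single matchExpressions list, where the conditions are ANDed together. The fragment below is a sketch (the workload-type label and the api value are the hypothetical examples used above); the extra Exists condition matters because NotIn on its own also matches nodes that lack the label entirely:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        # The node must carry the workload-type label...
        - key: workload-type
          operator: Exists
        # ...and its value must not be "api".
        - key: workload-type
          operator: NotIn
          values:
          - api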

Configuring Horizontal Pod Autoscaler (HPA)

The Horizontal Pod Autoscaler (HPA) is used to dynamically adjust the number of pods based on resource consumption, such as CPU or memory. HPA scales pods to maintain a specified level of resource usage and adapt to changing workload conditions. Below is an example of an HPA manifest:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50

HPA parameters:

  • apiVersion: Specifies the API version used to create the resource. In this case, it is autoscaling/v1.

  • kind: The type of resource being created, here it is HorizontalPodAutoscaler.

  • metadata: Contains metadata information, including the name of the HPA (example-hpa).

  • scaleTargetRef: Specifies the object to scale, with the following details:

    • apiVersion: The API version of the resource being scaled.

    • kind: The type of resource to scale, in this case, Deployment.

    • name: The name of the resource to scale (example-deployment).

  • minReplicas: The minimum number of pod replicas to maintain, here it is set to 1.

  • maxReplicas: The maximum number of pod replicas to scale up to, in this case, 5.

  • targetCPUUtilizationPercentage: The CPU usage percentage at which the number of replicas will be adjusted. In this example, it is set to 50, meaning the number of replicas will be adjusted to maintain an average CPU utilization of 50%.
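The autoscaling/v1 API supports only CPU utilization as a scaling target. If you also need to scale on memory, the autoscaling/v2 API can be used instead. Below is a sketch of an equivalent manifest extended with a memory metric (it reuses the example-deployment name from above; the 70% memory target is an illustrative value):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa-v2
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
  # Scale out when average CPU utilization across pods exceeds 50%.
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  # Also scale out when average memory utilization exceeds 70%.
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70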

Practical Example of Autoscaling

Creating an Nginx Deployment for Autoscaling

To set up autoscaling, let’s create an Nginx deployment with defined resource limits:

deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: "250m"
            memory: "512Mi"
          limits:
            cpu: "500m"
            memory: "1Gi"

This deployment creates an Nginx pod with specified resource limits, essential for the proper functioning of HPA.

Setting Up Horizontal Pod Autoscaling for Nginx

To demonstrate pod autoscaling with Nginx, use the HorizontalPodAutoscaler (HPA) object. HPA automatically adjusts the number of pod replicas based on resource utilization.

hpa.yaml:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 1
  maxReplicas: 4
  targetCPUUtilizationPercentage: 20

In this example, HPA adjusts the number of replicas for nginx-deployment to keep average CPU utilization at around 20%.

Setting Up a Load Balancer

To distribute load evenly among pods, it is recommended to use a Service object of the LoadBalancer type.

service.yaml:

apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: LoadBalancer

This service exposes the pods through a single external entry point and distributes incoming traffic across them.

Applying the Configuration

To apply the YAML files, use the following commands:

kubectl apply -f deployment.yaml
kubectl apply -f hpa.yaml
kubectl apply -f service.yaml

These commands will deploy the configurations for the Nginx deployment, HPA, and load balancer, creating all the necessary components for autoscaling.

Demonstrating Node and Pod Scaling Under Load

Before generating load, ensure all resources are created and in the Running state. To observe autoscaling in action, generate load on the service using the hey tool, which sends HTTP requests to test performance.

Generate Load:

hey -z 10m -c 20 http://<load_balancer_ip>

This command generates 10 minutes of load with 20 parallel connections to the Nginx service, increasing resource usage and triggering scaling of pod replicas and, if necessary, worker nodes.

After some time, you can observe the scaling:

  • Check node scaling in the cluster:

kubectl get nodes

  • Check pod scaling, which will increase up to 4 replicas as defined in hpa.yaml:

kubectl get pods