Cluster Autoscaling

Updated on 28 November 2024

Cluster autoscaling in Kubernetes allows for flexible resource management by automatically adding or removing nodes and pods based on the workload.

Cluster Requirements

For autoscaling to function, the cluster must meet the following requirements:

  • The minimum number of worker nodes is 2.

  • The maximum number of nodes in a group cannot exceed the maximum allowable number of worker nodes in the cluster. For example, if the cluster limit is set to 100 nodes and one of the groups is already using 30 nodes, the new group can have a maximum of 70 nodes.
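The node-limit rule above is plain subtraction. As a minimal sketch, the helper below (a hypothetical name, taking illustrative inputs rather than values read from any API) computes how many nodes a new group may still request:

```shell
# Hypothetical helper: nodes still available to a new group.
# Both arguments are illustrative inputs, not values read from any API.
remaining_group_capacity() {
  local cluster_limit="$1"
  local used_nodes="$2"
  local remaining=$(( cluster_limit - used_nodes ))
  # Never report a negative capacity if the cluster is already over its limit.
  if [ "$remaining" -lt 0 ]; then remaining=0; fi
  echo "$remaining"
}

remaining_group_capacity 100 30   # prints 70
```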

Enabling Autoscaling for a New Cluster

To enable autoscaling during cluster creation, select the Autoscaling option in the Worker nodes configuration section.


Enabling Autoscaling for an Existing Cluster

If the cluster has already been created, you can enable autoscaling as follows:

  1. Navigate to the Resources tab.

  2. Click on the three dots next to the node group and select Edit group.

  3. Enable the autoscaling option for the group.


Autoscaling Functionality

The following parameters are used to manage the creation and deletion of nodes:

  • scale-down-unneeded-time: Time a node must remain idle before being removed (in this case, 5 minutes).

  • max-scale-down-parallelism: Maximum number of nodes that can be removed simultaneously (in this case, 1).

  • provisioning-request-max-backoff-time: Maximum wait time for node creation (3 minutes).

  • max-inactivity: Maximum inactivity time before node reduction can begin (3 minutes).

  • scale-down-delay-after-add: Delay time after adding a node before reduction can begin (3 minutes).

  • max-node-provision-time: Maximum time allowed for creating a new node (3 minutes).

These parameters determine how quickly nodes are added or removed based on the current workload. For instance, if the workload decreases and a node remains idle for more than 5 minutes, it will be removed. However, node removal happens gradually to avoid a sudden reduction in resources. When the workload increases, new nodes are added, but this process is also controlled to prevent long delays, with a maximum time limit of 3 minutes.

In this way, autoscaling aims to maintain a balance between resource availability and cost-efficiency.
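To make the timing concrete, here is a simplified model of the scale-down check. It assumes only two of the conditions described above matter: the node has been idle for at least scale-down-unneeded-time, and at least scale-down-delay-after-add has passed since the last node was added. The function and variable names are hypothetical, and the real autoscaler evaluates more conditions than this sketch does.

```shell
# Simplified model: a node is a scale-down candidate only when both timers have elapsed.
# Values mirror the parameters listed above; names are hypothetical.
SCALE_DOWN_UNNEEDED_SECONDS=300          # scale-down-unneeded-time: 5 minutes
SCALE_DOWN_DELAY_AFTER_ADD_SECONDS=180   # scale-down-delay-after-add: 3 minutes

can_scale_down() {
  local idle_seconds="$1"
  local seconds_since_last_add="$2"
  if [ "$idle_seconds" -ge "$SCALE_DOWN_UNNEEDED_SECONDS" ] &&
     [ "$seconds_since_last_add" -ge "$SCALE_DOWN_DELAY_AFTER_ADD_SECONDS" ]; then
    echo "eligible"
  else
    echo "not eligible"
  fi
}

can_scale_down 360 600   # idle 6 min, last node added 10 min ago: eligible
can_scale_down 240 600   # idle only 4 min: not eligible
```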

Configuring Resources for Pods

Autoscaling works only if resource requests and limits are specified for the containers in the deployment. Below is an example of a deployment file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
spec:
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: example-container
        image: example-image
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
          limits:
            cpu: "400m"
            memory: "512Mi"
  • CPU requests and limits are measured in millicores (m), where 1000m equals one core.
  • Memory requests and limits are measured in mebibytes (Mi) or gibibytes (Gi), where 1Mi is 1024 × 1024 bytes and 1Gi is 1024Mi.
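The Mi and Gi suffixes are binary units, so the values in the manifest are slightly larger than the same figures in decimal megabytes. The sketch below converts such a quantity to bytes. The helper name is hypothetical, and only the Mi and Gi suffixes are handled.

```shell
# Convert a Kubernetes memory quantity with a Mi or Gi suffix to bytes.
# Only these two binary suffixes are handled; the helper name is hypothetical.
memory_to_bytes() {
  local quantity="$1"
  case "$quantity" in
    *Mi) echo $(( ${quantity%Mi} * 1024 * 1024 )) ;;
    *Gi) echo $(( ${quantity%Gi} * 1024 * 1024 * 1024 )) ;;
    *)   echo "unsupported quantity: $quantity" >&2; return 1 ;;
  esac
}

memory_to_bytes 256Mi   # prints 268435456
memory_to_bytes 512Mi   # prints 536870912
```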

Configuring Horizontal Pod Autoscaler (HPA)

The Horizontal Pod Autoscaler (HPA) is used to dynamically adjust the number of pods based on resource consumption, such as CPU or memory. HPA scales pods to maintain a specified level of resource usage and adapt to changing workload conditions. Below is an example of an HPA manifest:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50

HPA parameters:

  • apiVersion: Specifies the API version used to create the resource. In this case, it is autoscaling/v1.

  • kind: The type of resource being created, here it is HorizontalPodAutoscaler.

  • metadata: Contains metadata information, including the name of the HPA (example-hpa).

  • scaleTargetRef: Specifies the object to scale, with the following details:

    • apiVersion: The API version of the resource being scaled.

    • kind: The type of resource to scale, in this case, Deployment.

    • name: The name of the resource to scale (example-deployment).

  • minReplicas: The minimum number of pod replicas to maintain, here it is set to 1.

  • maxReplicas: The maximum number of pod replicas to scale up to, in this case, 5.

  • targetCPUUtilizationPercentage: The target average CPU utilization across all pods, expressed as a percentage of each pod's requested CPU. In this example, it is set to 50, meaning the number of replicas will be adjusted to keep average CPU usage at about 50% of the requested value.
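The replica count that HPA aims for follows the formula from the Kubernetes documentation: desired = ceil(currentReplicas × currentUtilization / targetUtilization), bounded by minReplicas and maxReplicas. The sketch below reproduces that arithmetic with integers. The function name is hypothetical, and the sketch does not model the tolerance band or stabilization behavior that the real controller applies.

```shell
# desired = ceil(current_replicas * current_utilization / target_utilization),
# then clamped to [min_replicas, max_replicas]. Integer-only ceiling division.
desired_replicas() {
  local current_replicas="$1"
  local current_utilization="$2"
  local target_utilization="$3"
  local min_replicas="$4"
  local max_replicas="$5"
  local desired=$(( (current_replicas * current_utilization + target_utilization - 1) / target_utilization ))
  if [ "$desired" -lt "$min_replicas" ]; then desired="$min_replicas"; fi
  if [ "$desired" -gt "$max_replicas" ]; then desired="$max_replicas"; fi
  echo "$desired"
}

desired_replicas 2 90 50 1 5    # ceil(3.6) = 4
desired_replicas 4 100 50 1 5   # ceil(8) = 8, capped at maxReplicas = 5
desired_replicas 3 10 50 1 5    # ceil(0.6) = 1
```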

Practical Example of Autoscaling

Creating an Nginx Deployment for Autoscaling

To set up autoscaling, let’s create an Nginx deployment with defined resource limits:

deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: "250m"
            memory: "512Mi"
          limits:
            cpu: "500m"
            memory: "1Gi"

This deployment creates an Nginx pod with explicit resource requests and limits. The CPU request is essential for HPA, which calculates utilization as a percentage of the requested value.

Setting Up Horizontal Pod Autoscaling for Nginx

To demonstrate pod autoscaling with Nginx, use the HorizontalPodAutoscaler (HPA) object. HPA automatically adjusts the number of pod replicas based on resource utilization.

hpa.yaml:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 1
  maxReplicas: 4
  targetCPUUtilizationPercentage: 20

In this example, HPA adjusts the number of replicas for nginx-deployment based on CPU utilization, maintaining it at 20%.
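Because utilization is measured against the CPU request, the 20% target translates into an absolute per-pod figure. A minimal sketch of that arithmetic, using the 250m request from deployment.yaml and a hypothetical helper name:

```shell
# Average per-pod CPU usage that the HPA aims for, in millicores.
# request_millicores comes from deployment.yaml (250m), target_percent from hpa.yaml (20).
target_cpu_millicores() {
  local request_millicores="$1"
  local target_percent="$2"
  echo $(( request_millicores * target_percent / 100 ))
}

target_cpu_millicores 250 20   # prints 50
```

In other words, once average usage rises noticeably above roughly 50m per pod, HPA can be expected to add replicas, up to the maximum of 4.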

Setting Up a Load Balancer

To distribute load evenly among pods, it is recommended to use a Service object of the LoadBalancer type.

service.yaml:

apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: LoadBalancer

This service exposes the Nginx pods on port 80 and balances incoming traffic across all pods that match the app: nginx label.

Applying the Configuration

To apply the YAML files, use the following commands:

kubectl apply -f deployment.yaml
kubectl apply -f hpa.yaml
kubectl apply -f service.yaml

These commands will deploy the configurations for the Nginx deployment, HPA, and load balancer, creating all the necessary components for autoscaling.
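After applying the files, it can help to confirm that everything was created. The sketch below wraps two standard kubectl calls in a helper. The function name is hypothetical, and the object names are the ones used in the manifests above.

```shell
# Wait for the Deployment to roll out, then list the three objects by name.
# The function name is hypothetical; the object names come from the manifests above.
check_nginx_stack() {
  kubectl rollout status deployment/nginx-deployment --timeout=120s &&
    kubectl get deployment/nginx-deployment hpa/nginx-hpa service/nginx-service
}
```

Run check_nginx_stack once the apply commands have finished. The HPA may report its CPU target as unknown for a short while, until metrics for the new pods become available. This assumes a metrics source such as metrics-server is running in the cluster.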

Demonstrating Node and Pod Scaling Under Load

Before generating load, ensure all resources are created and in the Running state. To observe autoscaling in action, generate load on the service using the hey tool, which sends HTTP requests to test performance.

Generate Load:

hey -z 10m -c 20 http://<load_balancer_ip>

This command generates 10 minutes of load with 20 parallel connections to the Nginx service, increasing resource usage and triggering scaling of pod replicas and, if necessary, worker nodes.
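The <load_balancer_ip> placeholder in the command above can be read from the Service status. The sketch below assumes the provider publishes an IP address rather than a hostname. The function name is hypothetical.

```shell
# Print the external IP assigned to nginx-service, if one has been provisioned.
# Some providers publish a hostname instead of an IP; that case is not handled here.
load_balancer_ip() {
  kubectl get service nginx-service \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}
```

With this helper defined, the load test can be started as hey -z 10m -c 20 "http://$(load_balancer_ip)". If the output is empty, the load balancer is probably still being provisioned.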

After some time, you can observe the scaling:

  • Check node scaling in the cluster:

kubectl get nodes

  • Check pod scaling, which will increase up to 4 replicas as defined in hpa.yaml:

kubectl get pods
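To follow the scaling as it happens rather than polling by hand, the two helpers below may be convenient. The function names are hypothetical. The nginx-hpa name and the app=nginx label come from the manifests above.

```shell
# Follow HPA decisions live, and list only the pods that belong to the Nginx Deployment.
# Function names are hypothetical; nginx-hpa and app=nginx come from the manifests above.
watch_hpa() {
  # Prints a new line each time the HPA status changes; stop with Ctrl+C.
  kubectl get hpa nginx-hpa --watch
}

list_nginx_pods() {
  # -o wide adds a NODE column, showing how replicas spread across worker nodes.
  kubectl get pods -l app=nginx -o wide
}
```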