How to Set Up Load Balancing with Nginx

Technical writer

18.11.2024

Modern applications must handle many requests simultaneously and still return correct responses to users under heavy load. There are two main ways to scale an application:

Vertical Scaling: add more RAM or CPU power by renting or purchasing a more powerful server. It is easy during the early stages of the application’s development, but it has drawbacks, such as cost and the limitations of modern hardware.
Horizontal Scaling: add more instances of the application. Set up a second server, deploy the same application on it, and somehow distribute traffic between these instances.

Horizontal scaling can be cheaper and is less restricted by hardware limits, because you can simply add more instances of the application. The trade-off is that user requests now have to be distributed between those instances.

Load Balancing is the process of distributing application requests (network traffic) across multiple servers or application instances.

A Load Balancer is a middleware program between the user and a group of applications. The general logic is as follows:

The user accesses the website through a domain name, which resolves to the IP address of the load balancer.
Based on its configuration, the load balancer determines which application instance should handle the user's traffic.
The user receives a response from the appropriate application instance.

Load Balancing Advantages

Improved Application Availability: Load balancers can detect server failures. If one of the servers goes down, the load balancer automatically sends traffic to another server, so users continue to be served.
Scalability: One of the main tasks of a load balancer is to distribute traffic across multiple instances of the application. This enables horizontal scaling by adding more application instances, increasing the overall system performance.
Enhanced Security: Load balancers can include security features such as traffic monitoring, request filtering, and routing through firewalls and other mechanisms, which help improve the application's security.

Using Nginx for Network Traffic Load Balancing

There are quite a few applications that can act as a load balancer, but one of the most popular is Nginx.

Nginx is a versatile web server known for its high performance, low resource consumption, and wide range of capabilities. Nginx can be used as:

A web server
A reverse proxy and load balancer
A mail proxy server
And much more.

You can learn more about Nginx's capabilities on its website. Now, let's move on to the practical setup.

Installing Nginx on Ubuntu

Nginx can be installed on all popular Linux distributions, including Ubuntu, CentOS, and others. In this article, we will be using Ubuntu. To install Nginx, use the following commands:

sudo apt update
sudo apt install nginx

To verify that the installation was successful, you can use the command:

systemctl status nginx

The output should show active (running).

On Ubuntu, the main Nginx configuration file is /etc/nginx/nginx.conf, and site configurations are stored in the /etc/nginx/sites-available/ directory. This directory includes the default file, which we will use for our configuration.

Example Nginx Configuration

First, install the nano text editor if it is not already present:

sudo apt install nano

Now, open the default configuration file:

cd /etc/nginx/sites-available/
sudo nano default

Place the following configuration inside:

upstream application {
    server 10.2.2.11;  # IP addresses of the servers to distribute requests between
    server 10.2.2.12;
    server 10.2.2.13;
}

server {
    listen 80;  # Port on which Nginx accepts incoming requests

    location / {
        # Forward requests to the "application" upstream group defined above
        proxy_pass http://application;
    }
}

Setting Up Load Balancing in Nginx

To configure load balancing in Nginx, you need to define two blocks in the configuration:

upstream — Defines the server addresses between which the network traffic will be distributed. Here, you specify the IP addresses, ports, and, if necessary, load balancing methods. We will discuss these methods later.
server — Defines how Nginx will receive requests. Usually, this includes the port, domain name, and other parameters.

The proxy_pass directive specifies where requests should be forwarded. Here it refers to the upstream block defined earlier.

In this way, Nginx acts not only as a load balancer but also as a reverse proxy. A reverse proxy is a server that sits between the client and the backend application instances. It forwards client requests to the backend and can provide additional features such as SSL/TLS termination, logging, and more.

Load Balancing Methods

Round Robin

There are several methods for load balancing. By default, Nginx uses the Round Robin algorithm, which is quite simple. For example, if we have three applications (1, 2, and 3), the load balancer will send the first request to the first application, then the second request to the second application, the third request to the third application, and then continue the cycle, sending the next request to the first one again.

Let’s look at an example. I have deployed two applications and configured load balancing with Nginx for them:

upstream application {
    server 172.25.208.1:5002;  # first
    server 172.25.208.1:5001;  # second
}

Let’s see how this works in practice:

The first request goes to the first server.
The second request goes to the second server.
Then it goes back to the first server, and so on.

However, this algorithm has a limitation: it ignores server capacity and current load, so a backend instance may sit idle simply because it is waiting for its turn.
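The rotation described in this section can be sketched in a few lines of Python. This is only an illustration of the idea, not Nginx's implementation; the function name round_robin is hypothetical, and the addresses are the two test applications from the example.

```python
from itertools import cycle


def round_robin(servers):
    """Return an endless iterator that yields servers in a fixed rotation."""
    return cycle(servers)


servers = ["172.25.208.1:5002", "172.25.208.1:5001"]  # first, second
picker = round_robin(servers)

# Each incoming request takes the next server in the rotation.
first_five = [next(picker) for _ in range(5)]
print(first_five)  # alternates between the two addresses, starting with the first
```

Note that the choice depends only on the order of arrival: nothing in the sketch looks at how busy a server is.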

Round Robin with Weights

To avoid idle servers, we can use numerical priorities. Each server gets a weight, which determines how much traffic will be directed to that specific application instance. This way, we ensure that more powerful servers will receive more traffic.

In Nginx, the priority is specified using server weight as follows:

upstream application {
    server 10.2.2.11 weight=5;
    server 10.2.2.12 weight=3;
    server 10.2.2.13 weight=1;
}

With this configuration, the server at address 10.2.2.11 will receive the most traffic because it has the highest weight.

For our two test applications, the weighted configuration looks like this:

upstream application {
    server 172.25.208.1:5002 weight=3;  # first
    server 172.25.208.1:5001 weight=1;  # second
}

This approach is more flexible than the standard Round Robin, but it still has a drawback. We can set weights manually based on server power, but requests still differ in execution time: some are complex and slow, while others are fast and lightweight.
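To make the effect of weights concrete, here is a simplified Python sketch of a "smooth" weighted round robin, in which each server accumulates credit in proportion to its weight. It is an illustration under stated assumptions and is not claimed to match Nginx's internals (for example, it ignores failures); the function name is hypothetical.

```python
def smooth_weighted_round_robin(weights, n_requests):
    """Return the servers chosen for n_requests, given a {server: weight} mapping."""
    current = {server: 0 for server in weights}
    total = sum(weights.values())
    chosen = []
    for _ in range(n_requests):
        # Every server earns credit proportional to its weight.
        for server, weight in weights.items():
            current[server] += weight
        # The server with the most credit handles the request...
        best = max(current, key=current.get)
        # ...and pays back one full round of credit.
        current[best] -= total
        chosen.append(best)
    return chosen


weights = {"10.2.2.11": 5, "10.2.2.12": 3, "10.2.2.13": 1}
picks = smooth_weighted_round_robin(weights, 9)
counts = {server: picks.count(server) for server in weights}
print(counts)  # {'10.2.2.11': 5, '10.2.2.12': 3, '10.2.2.13': 1}
```

Over nine requests (the sum of the weights), each server is picked exactly as many times as its weight, and the picks are interleaved rather than sent in bursts.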

Least Connections

What if we move away from Round Robin? Instead of simply distributing requests in order, we can base the distribution on certain parameters, such as the number of active connections to the server.

The Least Connections algorithm ensures an even distribution of load between application instances by considering the number of active connections to each server. To configure it, simply add least_conn; in the upstream block:

upstream application {
    least_conn;
    server 10.2.2.11;
    …
}
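The selection rule itself is small enough to sketch in Python. The function below is a hypothetical illustration, not Nginx code: it assumes we already know the number of active connections per server, and the connection counts are made up for the example. The optional weights argument divides each count by the server's weight, which is one plausible way to combine the two.

```python
def pick_least_connections(active, weights=None):
    """Pick the server with the fewest active connections relative to its weight."""
    if weights is None:
        weights = {server: 1 for server in active}
    return min(active, key=lambda server: active[server] / weights[server])


# Made-up snapshot of active connections per server.
active = {"10.2.2.11": 12, "10.2.2.12": 4, "10.2.2.13": 9}

choice = pick_least_connections(active)
print(choice)  # 10.2.2.12

weighted_choice = pick_least_connections(
    active, weights={"10.2.2.11": 5, "10.2.2.12": 1, "10.2.2.13": 1}
)
print(weighted_choice)  # 10.2.2.11, because 12 / 5 is lower than 4 / 1
```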

Let’s return to our example.

To test how this algorithm works, I wrote a script that sends 500 requests concurrently and checks which application each request is directed to.

Here is the output of that script:

Additionally, this algorithm can be combined with weights, just like Round Robin. In this case, the weights set the relative share of connections for each address. For example, with weights of 1 and 5, the address with a weight of 5 will receive roughly five times as many connections as the address with a weight of 1.

Here’s an example of such a configuration:

upstream application {
    least_conn;
    server 10.2.2.11 weight=5;
    …
}

For our two test applications, the configuration is:
upstream loadbalancer {
    least_conn;
    server 172.25.208.1:5002 weight=3; # first
    server 172.25.208.1:5001 weight=1; # second
}

And here’s the output of the script:

As we can see, the number of requests to the first server is exactly three times higher than to the second.
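The test script itself is not shown in this article. A minimal sketch of such a script, assuming each backend returns a short body that identifies it, might look like the following. The function names, the URL, and the worker count are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_backend_id(url):
    """Return the response body, assumed to identify the backend that answered."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode().strip()


def tally_responses(fetch, url, total=500, workers=50):
    """Call fetch(url) `total` times from a thread pool and count the answers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return Counter(pool.map(fetch, [url] * total))


# Example call (hypothetical load balancer address):
# print(tally_responses(fetch_backend_id, "http://localhost/"))
```

Passing the fetch function as an argument keeps the counting logic separate from the network call, so the tally can be checked with a stub before it is pointed at a real load balancer.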

IP Hash

This method works based on the client’s IP address. It guarantees that all requests from a specific address will be routed to the same instance of the application, as long as that instance is available. The algorithm calculates a hash of the client’s IP address (for IPv4, the first three octets) and uses the result as the key for choosing a server.
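The idea of keying the choice on the client address can be sketched in Python. The Nginx documentation describes the hashing key as the first three octets of an IPv4 address; the hash function used here (CRC32) and the function name are assumptions for illustration only, not what Nginx uses internally.

```python
import zlib


def pick_by_ip(client_ip, servers):
    """Map a client IPv4 address to a server using its first three octets."""
    key = ".".join(client_ip.split(".")[:3])
    # crc32 stands in for Nginx's internal hash function (illustrative assumption).
    return servers[zlib.crc32(key.encode()) % len(servers)]


servers = ["10.2.2.11", "10.2.2.12", "10.2.2.13"]

a = pick_by_ip("203.0.113.7", servers)
b = pick_by_ip("203.0.113.7", servers)    # same client again
c = pick_by_ip("203.0.113.250", servers)  # different client in the same /24 network
print(a == b == c)  # True
```

Because only the first three octets form the key, all clients in the same /24 network land on the same server, which is one reason the distribution can become uneven.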

This approach can be useful in blue-green deployment scenarios, where we update each backend version sequentially. We can direct all requests to the backend with the old version, then update the new one and direct part of the traffic to it. If everything works well, we can direct all users to the new backend version and update the old one.

Example configuration:

upstream app {
    ip_hash;
    server 10.2.2.11;
    …
}

With this configuration, all requests in our example come from the same client IP address and therefore go to the same application instance.

Error Handling

When configuring a load balancer, it's also important to detect server failures and, if necessary, stop directing traffic to "down" application instances.

To allow the load balancer to mark a server address as unavailable, define two additional parameters on the server lines in the upstream block: fail_timeout and max_fails.

fail_timeout: This parameter specifies the time window during which the given number of failed attempts must occur for the server address to be marked as unavailable. It is also how long the server then stays marked as unavailable.
max_fails: This parameter sets the number of failed attempts within that window after which the server is considered "down."

Example configuration:

upstream application {
    server 10.2.0.11 max_fails=2 fail_timeout=30s;
    …
}
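The interplay of the two parameters can be illustrated with a small Python sketch. It models the behavior as described here (a number of failures within a window marks the server as down for the same period) and is not Nginx's implementation; the class name and the injectable clock are assumptions made so the example can run without waiting.

```python
import time


class ServerHealth:
    """Track failures for one server, in the spirit of max_fails and fail_timeout."""

    def __init__(self, max_fails=2, fail_timeout=30.0, clock=time.monotonic):
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.clock = clock
        self.failures = []      # timestamps of recent failed attempts
        self.down_until = 0.0   # the server is skipped until this moment

    def record_failure(self):
        now = self.clock()
        # Keep only failures that happened within the last fail_timeout seconds.
        self.failures = [t for t in self.failures if now - t < self.fail_timeout]
        self.failures.append(now)
        if len(self.failures) >= self.max_fails:
            self.down_until = now + self.fail_timeout
            self.failures.clear()

    def is_available(self):
        return self.clock() >= self.down_until


fake_now = [0.0]  # a fake clock, so the example does not have to sleep
health = ServerHealth(max_fails=2, fail_timeout=30.0, clock=lambda: fake_now[0])

health.record_failure()
after_one = health.is_available()      # True: one failure is below max_fails
health.record_failure()
after_two = health.is_available()      # False: two failures within 30 seconds
fake_now[0] = 31.0
after_timeout = health.is_available()  # True: the fail_timeout period has passed
print(after_one, after_two, after_timeout)  # True False True
```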

Now, let's see how this works in practice. We will "take down" one of the test backends and add the appropriate configuration.

The first backend instance from the example is now disabled. Nginx redirects traffic only to the second server.

Comparative Table of Traffic Distribution Algorithms

Algorithm	Pros	Cons
Round Robin	Simple and lightweight algorithm. Evenly distributes load across applications. Scales well.	Does not account for server performance differences. Does not consider the current load of applications.
Weighted Round Robin	Allows setting different weights for servers based on performance.	Does not account for the current load of applications. Manual weight configuration may be required.
Least Connections	Sends new requests to the application with the fewest active connections, so it adapts to the current load.	Can lead to uneven load distribution with many slow clients.
Weighted Least Connections	Distributes load according to both weights and active connection counts, so server performance is taken into account.	Manual weight configuration may be required.
IP Hash	Ties each client IP address to a specific server. Ensures session persistence on the same server.	Does not account for the current load of applications. Does not consider server performance. Can result in uneven load distribution with many clients from the same IP.

Conclusion

In this article, we explored the topic of load balancing. We learned about the different load balancing methods available in Nginx and demonstrated them with examples.