Modern applications must handle many requests simultaneously and still return correct information to users, even under heavy load. There are two main ways to scale an application:
Vertical Scaling: add more RAM or CPU power by renting or purchasing a more powerful server. It is easy during the early stages of the application’s development, but it has drawbacks, such as cost and the limitations of modern hardware.
Horizontal Scaling: add more instances of the application. Set up a second server, deploy the same application on it, and somehow distribute traffic between these instances.
Horizontal scaling can be cheaper and less restrictive in terms of hardware, since you can simply add more instances of the application. However, user requests must then be distributed between those instances.
Load Balancing is the process of distributing application requests (network traffic) across multiple devices.
A Load Balancer is a middleware program between the user and a group of applications. The general logic is as follows:
The user accesses the website through a specific domain, which resolves to the IP address of the load balancer.
Based on its configuration, the load balancer determines which application instance should handle the user's traffic.
The user receives a response from the appropriate application instance.
Improved Application Availability: Load balancers have the functionality to detect server failures. If one of the servers goes down, the load balancer can automatically redirect traffic to another server, ensuring uninterrupted service for users.
Scalability: One of the main tasks of a load balancer is to distribute traffic across multiple instances of the application. This enables horizontal scaling by adding more application instances, increasing the overall system performance.
Enhanced Security: Load balancers can include security features such as traffic monitoring, request filtering, and routing through firewalls and other mechanisms, which help improve the application's security.
There are quite a few applications that can act as a load balancer, but one of the most popular is Nginx.
Nginx is a versatile web server known for its high performance, low resource consumption, and wide range of capabilities. Nginx can be used as a web server, a reverse proxy, and a load balancer, among other roles.
You can learn more about Nginx's capabilities on its website. Now, let's move on to the practical setup.
Nginx can be installed on all popular Linux distributions, including Ubuntu, CentOS, and others. In this article, we will be using Ubuntu. To install Nginx, use the following commands:
sudo apt update
sudo apt install nginx
To verify that the installation was successful, you can use the command:
systemctl status nginx
The output should show active (running).
The configuration files for Nginx are located in the /etc/nginx/sites-available/ directory, including the default file that we will use for writing our configuration.
First, we need to install nano:
sudo apt install nano
Now, open the default configuration file:
cd /etc/nginx/sites-available/
sudo nano default
Place the following configuration inside:
upstream application {
    server 10.2.2.11; # IP addresses of the servers to distribute requests between
    server 10.2.2.12;
    server 10.2.2.13;
}
server {
    listen 80; # Port on which Nginx listens for incoming requests
    location / {
        # Forward incoming requests to the upstream group defined above
        proxy_pass http://application;
    }
}
To configure load balancing in Nginx, you need to define two blocks in the configuration:
upstream: Defines the server addresses between which the network traffic will be distributed. Here, you specify the IP addresses, ports, and, if necessary, load balancing methods. We will discuss these methods later.
server: Defines how Nginx will receive requests. Usually, this includes the port, domain name, and other parameters.
The proxy_pass directive specifies where the requests should be forwarded. It refers to the upstream block mentioned earlier.
In this way, Nginx is used not only as a load balancer but also as a reverse proxy. A reverse proxy is a server that sits between the client and backend application instances. It forwards requests from clients to the backend and can provide additional features such as SSL certificates, logging, and more.
There are several methods for load balancing. By default, Nginx uses the Round Robin algorithm, which is quite simple. For example, if we have three applications (1, 2, and 3), the load balancer will send the first request to the first application, then the second request to the second application, the third request to the third application, and then continue the cycle, sending the next request to the first one again.
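The cycle described above can be illustrated with a minimal Python sketch. It is only a model of the idea, not Nginx's implementation, and the backend names are hypothetical placeholders for applications 1, 2, and 3.

```python
from collections import Counter
from itertools import cycle


def round_robin(servers, request_count):
    """Assign each request to the next server in a repeating cycle."""
    rotation = cycle(servers)
    return [next(rotation) for _ in range(request_count)]


# Hypothetical backend names standing in for applications 1, 2, and 3.
servers = ["app-1", "app-2", "app-3"]
assignments = round_robin(servers, 7)

print(assignments)           # the 4th and 7th requests wrap around to app-1
print(Counter(assignments))  # requests handled per backend
```

With seven requests, the first backend handles three and the other two handle two each, because the cycle restarts after every third request.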
Let’s look at an example. I have deployed two applications and configured load balancing with Nginx for them:
upstream application {
    server 172.25.208.1:5002; # first
    server 172.25.208.1:5001; # second
}
Let’s see how this works in practice:
However, this algorithm has a limitation: backend instances may be idle simply because they are waiting for their turn.
To avoid idle servers, we can use numerical priorities. Each server gets a weight, which determines how much traffic will be directed to that specific application instance. This way, we ensure that more powerful servers will receive more traffic.
In Nginx, the priority is specified using server weight as follows:
upstream application {
    server 10.2.2.11 weight=5;
    server 10.2.2.12 weight=3;
    server 10.2.2.13 weight=1;
}
With this configuration, the server at address 10.2.2.11 will receive the most traffic because it has the highest weight.
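To see how weights translate into traffic shares, here is a simplified Python sketch of a "smooth" weighted round robin, one common way to interleave weighted picks. It is an illustration under that assumption rather than a copy of Nginx's source code; the function name is hypothetical, and the weights mirror the configuration above.

```python
from collections import Counter


def weighted_round_robin(weights, request_count):
    """Pick servers in proportion to their weights, interleaving the picks."""
    total = sum(weights.values())
    current = {server: 0 for server in weights}
    picks = []
    for _ in range(request_count):
        # Every server earns credit equal to its weight on each round.
        for server, weight in weights.items():
            current[server] += weight
        # The server with the most credit handles the request...
        chosen = max(current, key=current.get)
        # ...and pays back the total weight so the others can catch up.
        current[chosen] -= total
        picks.append(chosen)
    return picks


weights = {"10.2.2.11": 5, "10.2.2.12": 3, "10.2.2.13": 1}
picks = weighted_round_robin(weights, 900)
print(Counter(picks))
```

In this model, 900 requests split 500, 300, and 100, matching the 5:3:1 ratio of the weights.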
This approach is more reliable than the standard Round Robin, but it still has a drawback. We can manually specify weights based on server power, but requests can still differ in execution time. Some requests might be more complex and slower, while others are fast and lightweight.
upstream application {
    server 172.25.208.1:5002 weight=3; # first
    server 172.25.208.1:5001 weight=1; # second
}
What if we move away from Round Robin? Instead of simply distributing requests in order, we can base the distribution on certain parameters, such as the number of active connections to the server.
The Least Connections algorithm ensures an even distribution of load between application instances by considering the number of active connections to each server. To configure it, simply add least_conn; in the upstream block:
upstream application {
    least_conn;
    server 10.2.2.11;
    …
}
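The selection rule itself is simple enough to model in a few lines of Python. The sketch below is an illustration only: the connection counts are made-up starting values, the function name is hypothetical, and connections never close during the run, which a real server would not guarantee.

```python
def pick_least_connections(active):
    """Return the server with the fewest active connections (first wins ties)."""
    return min(active, key=active.get)


# Hypothetical snapshot: connections currently open on each backend.
active = {"10.2.2.11": 4, "10.2.2.12": 1, "10.2.2.13": 3}

assigned = []
for _ in range(5):
    server = pick_least_connections(active)
    active[server] += 1  # the new request opens one more connection
    assigned.append(server)

print(assigned)
print(active)
```

The least loaded backend absorbs new requests until its connection count catches up with the others, after which requests spread out again.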
Let’s return to our example.
To test how this algorithm works, I wrote a script that sends 500 requests concurrently and checks which application each request is directed to.
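A script of this kind could be sketched as follows. This is not the original script, only a minimal stand-in built on the Python standard library. It assumes that each backend answers with a short body identifying itself (for example "first" or "second"). The file name check_balancer.py and the URL in the comment are hypothetical.

```python
import sys
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_backend_name(url):
    """Send one GET request and return the body, assumed to name the backend."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode().strip()


def count_backends(fetch, url, total=500, workers=50):
    """Call fetch(url) `total` times from a thread pool and tally the answers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return Counter(pool.map(fetch, [url] * total))


# Runs only when a URL is passed on the command line, for example:
#   python check_balancer.py http://localhost/
if __name__ == "__main__" and len(sys.argv) > 1:
    tally = count_backends(fetch_backend_name, sys.argv[1])
    for backend, hits in tally.most_common():
        print(f"{backend}: {hits}")
```

Note that the pool keeps at most `workers` requests in flight at once, so with the defaults the 500 requests are sent in overlapping batches of up to 50.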
Here is the output of that script:
Additionally, this algorithm can be used together with weights for the addresses, similar to Round Robin. In this case, the weights indicate the relative number of connections to each address. For example, with weights of 1 and 5, the address with a weight of 5 will receive five times as many connections as the address with a weight of 1.
Here’s an example of such a configuration:
upstream application {
    least_conn;
    server 10.2.2.11 weight=5;
    …
}
upstream loadbalancer {
    least_conn;
    server 172.25.208.1:5002 weight=3; # first
    server 172.25.208.1:5001 weight=1; # second
}
And here’s the output of the script:
As we can see, the number of requests to the first server is exactly three times higher than to the second.
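The 3:1 split can be reproduced with a small Python model of weighted least connections. The sketch rests on two assumptions: the balancer picks the server with the lowest ratio of active connections to weight, and every connection stays open for the whole run. The function name is hypothetical, and the code is not taken from Nginx.

```python
from fractions import Fraction


def pick_weighted_least_connections(active, weights):
    """Return the server with the lowest connections-to-weight ratio."""
    return min(active, key=lambda s: Fraction(active[s], weights[s]))


weights = {"first": 3, "second": 1}
active = {"first": 0, "second": 0}

# Assumption: all 400 connections remain open until the run ends.
for _ in range(400):
    active[pick_weighted_least_connections(active, weights)] += 1

print(active)  # {'first': 300, 'second': 100}
```

Under real traffic, connections close at different times, so the observed ratio would normally be close to the weights rather than always exact.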
The IP Hash method works based on the client’s IP address. It guarantees that all requests from a specific address are routed to the same application instance, unless that instance is unavailable. Nginx uses the first three octets of the client’s IPv4 address, or the entire IPv6 address, as the hashing key.
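The idea of mapping a client address to a fixed backend can be sketched in Python. This is an illustration under stated assumptions: it keys on the first three octets of an IPv4 address, as the Nginx documentation describes for ip_hash, but it uses SHA-256 as a stand-in hash because Nginx's internal hash function is different. The function name and sample addresses are hypothetical.

```python
import hashlib


def pick_by_ip_hash(client_ip, servers):
    """Map an IPv4 client address to one server, stable across requests."""
    # Key on the first three octets, so a whole /24 network shares a backend.
    key = ".".join(client_ip.split(".")[:3])
    # SHA-256 is only a stand-in here; Nginx uses its own hash function.
    digest = hashlib.sha256(key.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["10.2.2.11", "10.2.2.12", "10.2.2.13"]

for ip in ["203.0.113.7", "203.0.113.200", "198.51.100.1"]:
    print(ip, "->", pick_by_ip_hash(ip, servers))
```

The first two sample addresses share their first three octets, so they always land on the same backend. The third may or may not, depending on its hash.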
This approach can be useful in blue-green deployment scenarios, where we update each backend version sequentially. We can direct all requests to the backend with the old version, then update the new one and direct part of the traffic to it. If everything works well, we can direct all users to the new backend version and update the old one.
Example configuration:
upstream app {
    ip_hash;
    server 10.2.2.11;
    …
}
With this configuration, in our example, all requests will now go to the same application instance:
When configuring a load balancer, it's also important to detect server failures and, if necessary, stop directing traffic to "down" application instances.
To allow the load balancer to mark a server address as unavailable, you can set two additional parameters on the server entries in the upstream block: fail_timeout and max_fails.
fail_timeout: The time window during which the specified number of failed attempts must occur for the server to be marked as unavailable. It is also the period for which the server is then considered unavailable. The default is 10 seconds.
max_fails: The number of failed attempts to communicate with the server, within the fail_timeout window, after which the server is considered "down." The default is 1.
Example configuration:
upstream application {
    server 10.2.0.11 max_fails=2 fail_timeout=30s;
    …
}
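How max_fails and fail_timeout interact can be modelled with a short Python sketch. It is a simplified illustration, not Nginx's actual logic: the class name is hypothetical, and timestamps are passed in explicitly so the example does not depend on the real clock.

```python
class BackendHealth:
    """Track failures for one server, loosely modelled on max_fails and fail_timeout."""

    def __init__(self, max_fails=2, fail_timeout=30.0):
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.failures = []     # timestamps of recent failed attempts
        self.down_until = 0.0  # time until which the server is skipped

    def record_failure(self, now):
        # Keep only failures that happened inside the fail_timeout window.
        self.failures = [t for t in self.failures if now - t < self.fail_timeout]
        self.failures.append(now)
        if len(self.failures) >= self.max_fails:
            # Too many failures in the window: skip the server for fail_timeout.
            self.down_until = now + self.fail_timeout
            self.failures.clear()

    def is_available(self, now):
        return now >= self.down_until


health = BackendHealth(max_fails=2, fail_timeout=30.0)
health.record_failure(now=0.0)
print(health.is_available(now=1.0))   # True: only one failure so far
health.record_failure(now=5.0)
print(health.is_available(now=6.0))   # False: two failures within 30 s
print(health.is_available(now=36.0))  # True: the 30 s penalty has passed
```

Once the penalty expires, the server becomes eligible for traffic again, and a new series of failures is needed to take it out of rotation.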
Now, let's see how this works in practice. We will "take down" one of the test backends and add the appropriate configuration.
The first backend instance from the example is now disabled. Nginx redirects traffic only to the second server.
Algorithm | Pros | Cons |
Round Robin | | |
Weighted Round Robin | | |
Least Connection | | |
Weighted Least Connection | | |
IP Hash | | |
In this article, we explored the topic of load balancing. We learned about the different load balancing methods available in Nginx and demonstrated them with examples.