How to Set Up Physical Streaming Replication with PostgreSQL on Ubuntu

Technical writer

PostgreSQL

20.12.2024

Reading time: 8 min

Streaming replication is a common method for horizontally scaling relational databases. It involves one or more copies of the same database cluster operating on different devices. The primary database cluster handles both read and write operations, while the replicas are read-only. We can also use streaming replication to provide high availability: if the primary database cluster or server fails unexpectedly, the replicas can continue handling read operations, or one of them can be promoted to become the new primary cluster.

PostgreSQL, a popular relational database, supports both logical and physical replication:

Logical replication streams high-level changes from the primary cluster to replicas, allowing you to replicate changes to a single database or table.
Physical replication, on the other hand, streams changes from the Write-Ahead Log (WAL) files, copying the entire cluster's state rather than specific areas. This method ensures that all changes to the primary cluster are replicated.

This guide will help you set up physical streaming replication with PostgreSQL on Ubuntu 22.04 across two separate devices, each running PostgreSQL 17 clusters. One device will host the primary cluster, and the other will serve as the replica.

Hostman offers a cloud PostgreSQL for your projects.

Prerequisites

To follow this tutorial, you will need:

Two separate devices running Ubuntu 22.04: One will act as the primary server and the other as the replica.
Firewall settings that allow HTTP/HTTPS traffic and traffic on port 5432 (the default port for PostgreSQL 17).
PostgreSQL 17 installed and running on both servers.

Step 1: Configuring the Primary Database to Accept Connections

The first step is to configure the primary database to allow connections from the replica(s). By default, PostgreSQL only accepts connections from localhost (127.0.0.1). To change this behavior, you need to modify the listen_addresses configuration parameter in the primary database.

On the primary server, open the PostgreSQL configuration file postgresql.conf, located in the /etc/postgresql/17/main/ directory:

sudo nano /etc/postgresql/17/main/postgresql.conf

Once the file is open, find the listen_addresses variable and change its value from localhost to the IP address of the primary server. Remove the # symbol at the beginning of the line as well:

listen_addresses = 'your_primary_IP_address'

Save the changes and exit the file.

The primary database is now ready to accept connections from other devices using the specified IP address. Next, you need to create a user role with the appropriate permissions that the replica will use to connect to the primary database.

Step 2: Creating a Replication Role with Permissions

Next, you need to create a dedicated role in the primary database with permissions for database replication. The replica will use this role to connect to the primary database. Creating a specific role for replication is crucial for security, as the replica will only have permission to copy data, not modify it.

Connect to the database cluster:

sudo -u postgres psql

Create a replication role:

Use the CREATE ROLE command to set up a role for replication:

CREATE ROLE test WITH REPLICATION PASSWORD 'testpassword' LOGIN;

This will output:

CREATE ROLE

We have now created the test role with the password testpassword, which has replication permissions for the database cluster.

Configure access for replication:

PostgreSQL has a special pseudo-database, replication, which replicas use to connect. To allow access, edit the pg_hba.conf file. Exit the PostgreSQL prompt by typing:

\q

Then open the configuration file using nano or your preferred editor:

sudo nano /etc/postgresql/17/main/pg_hba.conf

Add a rule for the replica:

Append the following line to the end of the pg_hba.conf file:

host  replication   test  your-replica-IP/32  md5

host: Enables non-local connections over plain or SSL-encrypted TCP/IP sockets.
replication: Specifies the special pseudo-database used for replication.
test: Refers to the previously created replication role.
your-replica-IP/32: Restricts access to the specific IP address of your replica.
md5: Sets the authentication method to password-based.

If you plan to create multiple replicas, repeat this step for each additional replica, specifying its IP address.

Restart the primary database cluster:

To apply these changes, restart the primary cluster:

sudo systemctl restart postgresql@17-main

If the primary cluster restarts successfully, it is properly configured and ready to stream data once the replica connects. Next, proceed with configuring the replica cluster.

Step 3: Backing Up the Primary Cluster to the Replica

During the setup of physical replication with PostgreSQL, you need to perform a physical backup of the primary cluster’s data directory to the replica’s data directory. Before doing this, you must clear the replica’s data directory of all existing files. On Ubuntu, the default data directory for PostgreSQL is /var/lib/postgresql/17/main/.

To find the data directory, you can run the following command on the replica database:

SHOW data_directory;

Once you locate the data directory, run the following command to clear all files:

sudo -u postgres rm -r /var/lib/postgresql/17/main/*

Since the files in the default data directory are owned by the postgres user, you need to run the command as postgres using sudo -u postgres.

Note: If a file in the directory is corrupted and the command does not work (this is very rare), you can remove the main directory entirely and recreate it with the correct permissions:

sudo -u postgres rm -r /var/lib/postgresql/17/main
sudo -u postgres mkdir /var/lib/postgresql/17/main
sudo -u postgres chmod 700 /var/lib/postgresql/17/main

Now that the replica’s data directory is cleared, you can physically back up the primary server’s data files. PostgreSQL provides a useful utility called pg_basebackup to simplify this process. It even allows you to promote the server to standby mode using the -R option.

Run the following pg_basebackup command on the replica:

sudo -u postgres pg_basebackup -h primary-ip-addr -p 5432 -U test -D /var/lib/postgresql/17/main/ -Fp -Xs -R

-h: Specifies the remote host. Enter the IP address of your primary server.
-p: Specifies the port number for connecting to the primary server. By default, PostgreSQL uses port 5432.
-U: Specifies the user role to connect to the primary cluster (the role created in the previous step).
-D: Specifies the backup's destination directory, which is your replica's cleared data directory.
-Fp: Ensures the backup is output in plain format (instead of a tar file).
-Xs: Streams the contents of the WAL file during the backup from the primary database.
-R: Creates a file named standby.signal in the replica’s data directory, signaling that the replica should operate in standby mode. It also adds the connection information for the primary server to the postgresql.auto.conf file. This configuration file is read each time the standard postgresql.conf is read, but the values in the .auto.conf file override those in the regular configuration file.

When you run this command, you will be prompted to enter the password for the replication role created earlier. The time required to copy all the files depends on the size of your primary database cluster.

At this point, your replica now has all the necessary data files from the primary server to begin replication. Next, you need to configure the replica to start in standby mode and proceed with replication.

Step 4: Restarting and Testing Clusters

After successfully creating a backup of the primary cluster’s data files on the replica, you need to restart the replica database cluster and switch it to standby mode. To restart the replica, run the following command:

sudo systemctl restart postgresql@17-main

Once the replica has restarted in standby mode, it should automatically connect to the primary database cluster on the other machine. To check whether the replica is connected and receiving the stream from the primary server, connect to the primary database cluster with the following command:

sudo -u postgres psql

Next, query the pg_stat_replication table on the primary cluster as follows:

SELECT client_addr, state FROM pg_stat_replication;

The output should look something like this:

client_addr     | state
----------------+-----------
your_replica_IP | streaming

If you see this result, the streaming replication from the primary server to the replica is correctly set up.

Conclusion

You now have two Ubuntu 22.04 servers with PostgreSQL 17 clusters, and streaming replication is configured between the servers. Any changes made in the primary database cluster will be reflected in the replica cluster. You can add more replicas if your databases need to handle higher traffic.

To learn more about physical streaming replication, including how to configure synchronous replication to prevent the loss of critical data, refer to the official PostgreSQL documentation.

PostgreSQL

20.12.2024

Reading time: 8 min

Similar

PostgreSQL

How to Set Up Physical Streaming Replication with PostgreSQL on Ubuntu

Prerequisites

Step 1: Configuring the Primary Database to Accept Connections

Step 2: Creating a Replication Role with Permissions

Step 3: Backing Up the Primary Cluster to the Replica

Step 4: Restarting and Testing Clusters

Conclusion

Similar

Upgrading PostgreSQL Version

How to Migrate a PostgreSQL Database to Another Server

Installing PostgreSQL on Debian

Do you have questions, comments, or concerns?

Do you have questions,
comments, or concerns?