Logical Replication in PostgreSQL

Technical writer

PostgreSQL

11.12.2024

Reading time: 7 min

When deploying applications, having more than one copy of the database is always beneficial. After creating copies, it is essential to ensure they are all synchronized. The process of synchronizing database copies is called replication.

Logical replication in PostgreSQL refers to the synchronization of copies without being tied to a specific physical data representation on a disk. It is independent of processor architecture, platform, or database system version. Synchronization is performed based on a replication identifier, which is typically the primary key.

Logical replication uses a publish-and-subscribe model.

Replication Process

In general, the replication process consists of the following steps:

Creating one or more publications on the publisher node.
Subscribing one or more subscribers to one or more publications.

Copying a snapshot of the publisher's database to the subscriber. This step is also known as the table synchronization phase. It is possible to create multiple table synchronization workers to reduce the time required for this phase. However, there can only be one synchronization process for each table.

Sending the subsequent changes made on the publisher node to the subscriber node. These changes are applied in the commit order to ensure transactional consistency.
The subscriber node fetches changes as they occur in the publisher's database in real-time, ensuring that the subscriber and publisher databases remain synchronized.

This mechanism ensures up-to-date data consistency across the replicated databases.

Logical Replication in Practice

Suppose you want to set up logical replication on a single host. To achieve this, use different ports—for example, the publisher will operate on port 5432, and the subscriber on port 5431.

Edit the Configuration File

Start by editing the PostgreSQL configuration file:

sudo nano /etc/postgresql/10/main/postgresql.conf

Uncomment the wal_level parameter and set it to logical. It should look like this:

wal_level = logical

Save and close the configuration file, then restart PostgreSQL:

sudo systemctl restart postgresql

Export Global Objects

On the master, execute the following command for the main database:

pg_dumpall --database=postgres --host=192.168.1.2 --no-password --globals-only --no-privileges | psql

The pg_dumpall command exports databases in script format.
The --database parameter specifies the database used for connecting and exporting global objects and locating other databases. By default, it uses the postgres database.
The --globals-only parameter ensures only global objects are exported, excluding the database contents.

For detailed information, consult the PostgreSQL documentation.

Export Schema on the Replica

On the replica, run:

pg_dump --dbname=db_name --host=192.168.1.2 --no-password --create --schema-only | psql

Prepare Data for Testing

Create a test table with two columns:

CREATE TABLE table1(x int primary key, y int);

The x column will store the primary key.
The y column will store integer values.

Insert a sample row:

INSERT INTO table1 VALUES(10, 11);

At this point, the table contains a single row where the primary key is 10 and the value is 11. This minimal dataset is enough to verify synchronization.

Create a Publication on the Master

Create a publication that replicates the desired table:

CREATE PUBLICATION my_publication FOR TABLE table1;

The FOR TABLE parameter allows you to specify which tables to replicate.
You can limit the changes to be published or include additional tables later.

To create a publication for all existing and future tables, use the ALL TABLES parameter. For more details, refer to the PostgreSQL documentation.

The publication named my_publication is ready. Now it’s time to create a subscription on port 5431.

Recreate the Table on the Subscriber Node

On the subscriber, create the same table structure as on the publisher:

CREATE TABLE table1(x int primary key, y int);

Create a Subscription on the Replica

Create a subscription named my_subscription:

CREATE SUBSCRIPTION my_subscription 
CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION my_publication;

Verify Synchronization

Query the table on the subscriber:

SELECT * FROM table1;

This command will display the rows synchronized from the publisher. Initially, it should return the row added earlier (11 with the primary key 10).

How It Works

The CREATE SUBSCRIPTION command creates a subscription for the current database, which begins receiving logical changes from the publication my_publication.
Upon execution, a logical replication worker is created to fetch changes from the publisher.
On the publisher side, a walsender process starts to read the WAL (Write-Ahead Log), decode changes, and send them to the subscriber.

To test the synchronization, add additional rows on the publisher:

INSERT INTO table1 VALUES(20, 21), (30, 31);

Verify that the subscriber displays these rows:

SELECT * FROM table1;

If you have multiple servers, additional configuration is required.

Allow Connections on the Publisher

On the main server, edit the configuration file to listen on the private IP address:

sudo nano /etc/postgresql/10/main/postgresql.conf

Locate the listen_addresses parameter and modify it to include the private IP address of the master:

listen_addresses = 'localhost, MASTER_PRIVATE_IP'

Configure Access Control

Edit the pg_hba.conf file on the publisher to allow incoming connections from the replica:

sudo nano /etc/postgresql/10/main/pg_hba.conf

Add the following line, replacing REPLICA_PRIVATE_IP with the actual private IP address of the replica:

host replication postgres REPLICA_PRIVATE_IP/32 md5

Look for the comment:

# If you want to allow non-local connections, you need to add more.

Add your new rule below this line.

Firewall Configuration

On the publisher, allow traffic from the replica to port 5432:

sudo ufw allow from REPLICA_PRIVATE_IP to any port 5432

Apply Changes

Restart PostgreSQL to apply all changes:

sudo systemctl restart postgresql

Troubleshooting Issues

If replication doesn’t seem to work, check the PostgreSQL logs on the replica for possible errors. The log file is typically located at: /var/log/postgresql/postgresql-10-main.log.

Common Issues and Solutions:

Private Network Not Enabled. Ensure both servers are in the same private network or correctly configured for cross-network access.
Incorrect IP Address Configuration. Verify that the server is listening on the correct private network IP address.
wal_level Not Set to logical. Double-check the wal_level parameter in the PostgreSQL configuration.
Firewall Blocking Connections. Confirm that the firewall is not blocking incoming connections on the required port (e.g., 5432).
Mismatch in Table or Field Names. Ensure that table and column names match the publisher and subscriber exactly.
Table Not Included in the Publication. Verify that the table is added to the publication on the publisher.

After addressing these issues, replication should resume automatically. If not, drop the existing subscription and recreate it:

DROP SUBSCRIPTION my_subscription;

Physical Replication Overview

PostgreSQL supports two types of replication: logical (discussed above) and physical replication. Here's a brief overview of physical replication.

Key Features of Physical Replication:

Introduced in PostgreSQL 9.0. Physical replication synchronizes databases at the file level.
Block-Level Synchronization. Changes are tracked using precise block addresses and replicated byte-by-byte.
Write-Ahead Log (WAL). Changes from the master are transmitted via WAL and applied on the standby server.

Limitations of Physical Replication:

No Partial Database Replication: You cannot replicate only a portion of the database.
High Overhead: All changes are transmitted, potentially increasing network load.
Platform Restrictions: Physical replication requires identical server platforms, including CPU architecture (e.g., Windows to Windows or Linux to Linux).
Version Compatibility: Databases on different PostgreSQL versions cannot synchronize.

Conclusion

This guide covered setting up and managing logical replication in PostgreSQL, including troubleshooting common issues. We also briefly touched on physical replication, highlighting its characteristics and limitations.

For simplified database management, consider cloud database services like Hostman, which offers managed PostgreSQL and other database solutions to streamline deployment and scaling.

PostgreSQL

11.12.2024

Reading time: 7 min

Similar

PostgreSQL

Logical Replication in PostgreSQL

Replication Process

Logical Replication in Practice

How It Works

Troubleshooting Issues

Physical Replication Overview

Conclusion

Similar

Upgrading PostgreSQL Version

How to Migrate a PostgreSQL Database to Another Server

Installing PostgreSQL on Debian

Do you have questions, comments, or concerns?

Do you have questions,
comments, or concerns?