When deploying applications, having more than one copy of the database is always beneficial. After creating copies, it is essential to ensure they are all synchronized. The process of synchronizing database copies is called replication.
Logical replication in PostgreSQL refers to the synchronization of copies without being tied to a specific physical data representation on a disk. It is independent of processor architecture, platform, or database system version. Synchronization is performed based on a replication identifier, which is typically the primary key.
Logical replication uses a publish-and-subscribe model.
In general, the replication process consists of the following steps:
Creating one or more publications on the publisher node.
Subscribing one or more subscribers to one or more publications.
Copying a snapshot of the publisher's database to the subscriber. This step is also known as the table synchronization phase. It is possible to create multiple table synchronization workers to reduce the time required for this phase. However, there can only be one synchronization process for each table.
Sending the subsequent changes made on the publisher node to the subscriber node. These changes are applied in the commit order to ensure transactional consistency.
The subscriber node fetches changes as they occur in the publisher's database in real-time, ensuring that the subscriber and publisher databases remain synchronized.
This mechanism ensures up-to-date data consistency across the replicated databases.
Suppose you want to set up logical replication on a single host. To achieve this, use different ports—for example, the publisher will operate on port 5432, and the subscriber on port 5431.
Edit the Configuration File
Start by editing the PostgreSQL configuration file:
sudo nano /etc/postgresql/10/main/postgresql.conf
Uncomment the wal_level
parameter and set it to logical. It should look like this:
wal_level = logical
Save and close the configuration file, then restart PostgreSQL:
sudo systemctl restart postgresql
Export Global Objects
On the master, execute the following command for the main database:
pg_dumpall --database=postgres --host=192.168.1.2 --no-password --globals-only --no-privileges | psql
The pg_dumpall
command exports databases in script format.
The --database
parameter specifies the database used for connecting and exporting global objects and locating other databases. By default, it uses the postgres
database.
The --globals-only
parameter ensures only global objects are exported, excluding the database contents.
For detailed information, consult the PostgreSQL documentation.
Export Schema on the Replica
On the replica, run:
pg_dump --dbname=db_name --host=192.168.1.2 --no-password --create --schema-only | psql
Prepare Data for Testing
Create a test table with two columns:
CREATE TABLE table1(x int primary key, y int);
The x
column will store the primary key.
The y
column will store integer values.
Insert a sample row:
INSERT INTO table1 VALUES(10, 11);
At this point, the table contains a single row where the primary key is 10 and the value is 11. This minimal dataset is enough to verify synchronization.
Create a Publication on the Master
Create a publication that replicates the desired table:
CREATE PUBLICATION my_publication FOR TABLE table1;
The FOR TABLE
parameter allows you to specify which tables to replicate.
You can limit the changes to be published or include additional tables later.
To create a publication for all existing and future tables, use the ALL TABLES
parameter. For more details, refer to the PostgreSQL documentation.
The publication named my_publication
is ready. Now it’s time to create a subscription on port 5431.
Recreate the Table on the Subscriber Node
On the subscriber, create the same table structure as on the publisher:
CREATE TABLE table1(x int primary key, y int);
Create a Subscription on the Replica
Create a subscription named my_subscription
:
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION my_publication;
Verify Synchronization
Query the table on the subscriber:
SELECT * FROM table1;
This command will display the rows synchronized from the publisher. Initially, it should return the row added earlier (11 with the primary key 10).
The CREATE SUBSCRIPTION
command creates a subscription for the current database, which begins receiving logical changes from the publication my_publication
.
Upon execution, a logical replication worker is created to fetch changes from the publisher.
On the publisher side, a walsender process starts to read the WAL (Write-Ahead Log), decode changes, and send them to the subscriber.
To test the synchronization, add additional rows on the publisher:
INSERT INTO table1 VALUES(20, 21), (30, 31);
Verify that the subscriber displays these rows:
SELECT * FROM table1;
If you have multiple servers, additional configuration is required.
Allow Connections on the Publisher
On the main server, edit the configuration file to listen on the private IP address:
sudo nano /etc/postgresql/10/main/postgresql.conf
Locate the listen_addresses
parameter and modify it to include the private IP address of the master:
listen_addresses = 'localhost, MASTER_PRIVATE_IP'
Configure Access Control
Edit the pg_hba.conf
file on the publisher to allow incoming connections from the replica:
sudo nano /etc/postgresql/10/main/pg_hba.conf
Add the following line, replacing REPLICA_PRIVATE_IP
with the actual private IP address of the replica:
host replication postgres REPLICA_PRIVATE_IP/32 md5
Look for the comment:
# If you want to allow non-local connections, you need to add more.
Add your new rule below this line.
Firewall Configuration
On the publisher, allow traffic from the replica to port 5432:
sudo ufw allow from REPLICA_PRIVATE_IP to any port 5432
Apply Changes
Restart PostgreSQL to apply all changes:
sudo systemctl restart postgresql
If replication doesn’t seem to work, check the PostgreSQL logs on the replica for possible errors. The log file is typically located at: /var/log/postgresql/postgresql-10-main.log
.
Common Issues and Solutions:
Private Network Not Enabled. Ensure both servers are in the same private network or correctly configured for cross-network access.
Incorrect IP Address Configuration. Verify that the server is listening on the correct private network IP address.
wal_level Not Set to logical. Double-check the wal_level
parameter in the PostgreSQL configuration.
Firewall Blocking Connections. Confirm that the firewall is not blocking incoming connections on the required port (e.g., 5432).
Mismatch in Table or Field Names. Ensure that table and column names match the publisher and subscriber exactly.
Table Not Included in the Publication. Verify that the table is added to the publication on the publisher.
After addressing these issues, replication should resume automatically. If not, drop the existing subscription and recreate it:
DROP SUBSCRIPTION my_subscription;
PostgreSQL supports two types of replication: logical (discussed above) and physical replication. Here's a brief overview of physical replication.
Key Features of Physical Replication:
Introduced in PostgreSQL 9.0. Physical replication synchronizes databases at the file level.
Block-Level Synchronization. Changes are tracked using precise block addresses and replicated byte-by-byte.
Write-Ahead Log (WAL). Changes from the master are transmitted via WAL and applied on the standby server.
Limitations of Physical Replication:
No Partial Database Replication: You cannot replicate only a portion of the database.
High Overhead: All changes are transmitted, potentially increasing network load.
Platform Restrictions: Physical replication requires identical server platforms, including CPU architecture (e.g., Windows to Windows or Linux to Linux).
Version Compatibility: Databases on different PostgreSQL versions cannot synchronize.
This guide covered setting up and managing logical replication in PostgreSQL, including troubleshooting common issues. We also briefly touched on physical replication, highlighting its characteristics and limitations.
For simplified database management, consider cloud database services like Hostman, which offers managed PostgreSQL and other database solutions to streamline deployment and scaling.