Sign In
Sign In

Setting up replication in MySQL

Setting up replication in MySQL
Hostman Team
Technical writer
MySQL
19.06.2024
Reading time: 19 min

When working with databases, having multiple copies of your data can be beneficial. This provides redundancy in case one of the servers fails, as well as improves availability, scalability, and overall performance of the database. The practice of synchronizing data across multiple separate databases is called replication.

MySQL is a relational database management system and the most popular open-source relational database today. It has several built-in replication features that allow you to maintain multiple copies of your data.

In this guide, we will set up two servers with MySQL: the original database, the "source", and its "replica". We will also look at how MySQL performs replication.

Note: Historically, this type of replication was called "master-slave". In a post published in July 2020, the MySQL team acknowledged the negative connotation of this term and announced plans to update the program and accompanying documentation to use more inclusive language.

However, this process is not fully complete. While the documentation and most commands in MySQL version 8 have been updated to refer to servers as "source" and its "replicas", there are still places with the old terminology. By default, this guide will use the more modern terms and commands, but in some cases, it is unavoidable to refer to the concepts of master and slave.

Prerequisites

  • Two cloud servers running Ubuntu 20.04. Both should have a non-root user with sudo privileges and a firewall configured using the UFW utility.

  • MySQL installed on each server. In this guide, we will use the 8.0.25 version.

On Hostman, you can install hosted MySQL on your cloud server with one click.

The process described here involves designating MySQL on one server as the original database, the "source", and then setting up MySQL on another server as its copy, the "replica".

This guide also describes transferring existing data from the source to the replica. This process involves creating a snapshot of the source database and transferring the resulting file to the replica. For this, we recommend setting up SSH keys on the source server and then ensuring that the source's public key is copied to the replica.

How Replication Works in MySQL

In MySQL, replication involves the source database recording every data change in one or more databases into a special file known as the binary log. Once the replica starts, two parallel threads are created. The first, called the IO thread, connects to the source MySQL instance and reads events from the binary log line by line, then copies them to a local file on the replica server called the relay log. The second thread, called the SQL thread, reads events from the relay log and applies them to the replica as quickly as possible.

Recent versions of MySQL support two methods of data replication. The difference between them lies in how the replicas track which database events from the source they have already processed.

The traditional replication method is called position-based binary log file replication. If you set up a MySQL replica using this method, you must provide it with a set of coordinates from the binary log. These consist of the name of the log file on the source that the replica should read from and a specific position in that log. This position represents the first event in the source database that the replica should copy.

Since replicas receive a copy of the entire binary log from the source database, without the correct coordinates, they will start copying every recorded database event. This can cause issues if you want to replicate data only from a certain point in time or duplicate only a subset of the source data.

Position-based binary log file replication is suitable for many cases but can become cumbersome in more complex systems. This led to the development of a new MySQL replication method, sometimes called transaction-based replication. This method involves creating a Global Transaction Identifier (GTID) for each transaction, or isolated piece of work, that the source MySQL instance performs.

The mechanism of this replication is similar to position-based binary log file replication: each time a transaction occurs in the source, MySQL assigns and records a GTID for it in the binary log along with the transaction itself. The GTID and transaction are then sent to the replicas for processing.

Transaction-based replication has several advantages over the traditional method. For example, both the source and its replicas maintain the GTID, so if the source or a replica detects a transaction with an already processed GTID, they skip it. This helps ensure consistency between the source and its replicas. Additionally, with transaction-based replication, replicas do not need to know the binary log coordinates of the next database event. Starting new replicas or changing the order of replicas in the replication chain becomes much simpler.

Keep in mind that this is just an overview of how MySQL handles replication; MySQL provides many settings that you can adapt to your needs. In this guide, we set up position-based binary log file replication. If you want to set up a different replication environment, check the official MySQL documentation.

Step 1 — Configuring the Source Server's Firewall for MySQL Replication

Your firewalls on both servers should be configured using UFW. This will help protect both your servers; however, now the source's firewall will block any connection attempts from the MySQL replica.

To change this, you need to enable a new rule in UFW that will allow connections from the replica through the source's firewall. You can do this by running the following command on the source server. This command allows any connections originating from the replica's IP address (in this example, replica_server_ip) to the default MySQL port, 3306:

sudo ufw allow from replica_server_ip to any port 3306

Be sure to replace replica_server_ip with the actual IP address of your replica server.

After this, you don't need to change anything in the replica's firewall since this server will not receive any incoming connections, and UFW does not block outgoing connections to the source MySQL server. Now, let's move on to configuring the source.

Step 2 — Configuring the Source Database for MySQL Replication

To get your source MySQL database to start synchronizing data, you need to make a few changes to its configuration.

In Ubuntu 20.04, the default MySQL server configuration file is named mysqld.cnf and is located in the /etc/mysql/mysql.conf.d/ directory. Open this file on the source server in any text editor. Here, we will use nano:

sudo nano /etc/mysql/mysql.conf.d/mysqld.cnf

Find the bind-address directive in the file. By default, it looks like this:

/etc/mysql/mysql.conf.d/mysqld.cnf
...
bind-address        = 127.0.0.1
...

127.0.0.1 is the IPv4 address representing localhost. If you specify this value for the bind-address directive, MySQL will listen for connections only on the localhost address. In other words, this MySQL instance will only accept connections originating from the server it is installed on.

Remember that you are turning another MySQL instance into a replica of this one, so the replica needs to be able to read all new data written to the source. Therefore, you need to configure the source MySQL server to listen for connections on an address accessible to the replica, such as the source's public IP address.

Replace 127.0.0.1 with the source server's IP address. After this, the bind-address directive will look like this, but with your server's IP address instead of source_server_ip:

...
bind-address        = source_server_ip
...

Then, find the server-id directive, which defines the identifier by which MySQL distinguishes servers within the replication system. Each server in the system, including the source and all its replicas, must have its unique server-id value. By default, this directive is commented out and looks like this:

/etc/mysql/mysql.conf.d/mysqld.cnf
...
# server-id         = 1

Uncomment this line by removing the # sign. You can choose any number as the value for this directive, but remember that the number must be unique and cannot match other server-id values in your replication group. For simplicity, in the following example, this value remains the default (1):

/etc/mysql/mysql.conf.d/mysqld.cnf
...
server-id         = 1
...

Find the log_bin directive under the server-id line. It specifies the name and location of the MySQL binary log.

By default, this directive is commented out, so binary logging is disabled. To know when and how to start replication, your replica server needs to read the source's binary log, so uncomment this line to enable logging on the source. After this, it will look like this:

/etc/mysql/mysql.conf.d/mysqld.cnf
...
log_bin                   = /var/log/mysql/mysql-bin.log
...

Finally, scroll down to the end of the file to find the commented binlog_do_db directive:

/etc/mysql/mysql.conf.d/mysqld.cnf
...
# binlog_do_db      = include_database_name

Remove the # sign to uncomment this line, and replace include_database_name with the name of the database you want to replicate. In this example, the binlog_do_db directive points to a database named db. If you already have an existing database on the source that you want to replicate, use its name instead of db:

/etc/mysql/mysql.conf.d/mysqld.cnf
...
binlog_do_db      = db

Note: If you are replicating more than one database, you can add another binlog_do_db directive for each database you need to add to the group. In this guide, we will continue with replicating just one database, but if you have several, it will look something like this:

/etc/mysql/mysql.conf.d/mysqld.cnf
...
binlog_do_db      = db
binlog_do_db      = db_1
binlog_do_db      = db_2

Alternatively, you can specify which MySQL databases should not be duplicated by adding a binlog_ignore_db directive for each of them:

/etc/mysql/mysql.conf.d/mysqld.cnf
...
binlog_ignore_db      = db_to_ignore
...

After making these changes, save and close the file. If you used nano, press CTRL + X, Y, and then ENTER.

Restart MySQL by running the following command:

sudo systemctl restart mysql

Now this MySQL instance is ready to act as the source database, which your MySQL server will replicate. However, before configuring the replica, we need to perform a few more steps on the source to ensure that replication works correctly. To begin with, we need to create a special MySQL user account that will perform all the replication-related actions.

Step 3 — Creating a User Account for MySQL Replication

Each replica in a MySQL environment connects to the source database using a username and password. Replicas can connect using any MySQL user that exists in the source database and has the appropriate privileges, but in this guide, we will create a special user account for this purpose.

Start by opening the MySQL shell:

sudo mysql

Note: If you have set up a password-authenticated user account, you can connect to MySQL using the following command:

mysql -u username -p

Replace username with your user name and enter the password when prompted.

Keep in mind that some operations in this guide, including those performed on the replica server, require elevated privileges. For this reason, connecting with administrator privileges may be more convenient. If you still want to use a MySQL user with limited rights, they must at least be granted the CREATE USER, RELOAD, REPLICATION CLIENT, REPLICATION SLAVE, and REPLICATION_SLAVE_ADMIN privileges.

Create a new MySQL user. In this example, a user named replica_user is created, but you can use any name you prefer. Be sure to replace replica_server_ip with the public IP address of your replica server and set a more secure password instead of the default password:

CREATE USER 'replica_user'@'replica_server_ip' IDENTIFIED WITH mysql_native_password BY 'password';

Note that this command specifies that replica_user will use the mysql_native_password plugin for authentication. You can instead use the default mechanism in MySQL, caching_sha2_password, but this will require setting up an encrypted connection between the source and the replica. This type may be suitable for production environments, but setting up encrypted connections is beyond the scope of this guide. If you wish, you can find instructions for setting up a replication environment with encrypted connections in the MySQL documentation.

After creating the new user, grant them the appropriate privileges. At a minimum, such a user should have the REPLICATION SLAVE permissions:

GRANT REPLICATION SLAVE ON *.* TO 'replica_user'@'replica_server_ip';

Next, it is recommended to run the FLUSH PRIVILEGES command. This will clear all cached memory on the server for the previous CREATE USER and GRANT statements:

FLUSH PRIVILEGES;

With this, we have finished setting up the replication user in the source MySQL. However, do not exit the MySQL shell yet. You will need it in the next step to obtain important information about the binary log of the source database, so leave it open for now.

Step 4 — Obtaining Binary Log Coordinates from the Source Database

As mentioned in the "How Replication Works in MySQL" section, MySQL performs replication by line-by-line copying of database events from the source's binary log and executing each event on the replica. When using position-based binary log file replication, you must provide the replica with a set of coordinates detailing the name of the source binary log and the specific position within that file. The replica then uses these coordinates to determine the point in the log file from which it should start copying database events and track which events it has already processed.

In this step, we will see how to obtain the current coordinates of the source database's binary log to start copying data on the replicas from the last line in the log. To avoid issues related to other users changing the data while obtaining the coordinates, you will need to lock the database so that no one can read or write information. This will not take long but will pause your database's operation.

Execute the following command in the open MySQL shell on the source server. It will close all open tables in each database in the source instance and lock them:

FLUSH TABLES WITH READ LOCK;

Then execute the following command to output information about the current state of the source's (master's) binary log:

SHOW MASTER STATUS;

This position is where the replica will start copying database events. Record (remember or write down) the File and Position values; you will need them later when you start the replication.

Your next steps depend on whether there are any existing data in the source database that you want to transfer to your replicas. Proceed to the appropriate subsection.

If Your Source Has No Data to Transfer

If you have just installed this MySQL or there is no existing data that you want to transfer to the replica, you can unlock the tables at this point:

UNLOCK TABLES;

You can create a database for master-slave replication while you still have the MySQL shell open. As in the example from Step 2, the following command will create a database named db:

CREATE DATABASE db;

After that, close the MySQL shell:

exit

Now you can proceed to the next step.

If Your Source Has Data to Transfer

You can transfer existing data from the source database to the replica by creating a snapshot (copy) of the database using the mysqldump utility. However, your database is still locked. If you make any new changes in the same window or close it, it will automatically unlock, which could lead to problems. Unlocking the tables means that data in the database may change. This could potentially lead to discrepancies between your snapshot and the binary log coordinates you just obtained.

Therefore, you should open a new terminal window or tab on your local computer. This will allow you to create a database snapshot without unlocking MySQL.

In the new window or tab, open another SSH session to the server hosting the source MySQL instance:

ssh username@source_server_ip

Then, in the new tab or window, export your database using mysqldump. In this example, a dump file named db.sql is created from the database named db. Make sure to use the name of your database. Also, remember to run this command in the bash shell, not in the MySQL shell:

sudo mysqldump -u root db > db.sql

Now you can close this window or tab and return to the original window where the MySQL shell should still be open. In the MySQL command line, unlock the databases to make them writable again:

UNLOCK TABLES;

Exit the MySQL shell:

exit

You can now send the snapshot to your replica server. Assuming you have already set up SSH keys on the source server and added the source's public key to the authorized_keys file of your replica, you can securely send the snapshot using the scp command:

scp db.sql username@replica_server_ip:/tmp/

Remember to replace username with the name of the Ubuntu administrative user created on the replica server and change replica_server_ip to the IP address of your replica server. Also, note that this command places the snapshot in the replica server's /tmp directory.

After sending the snapshot to the replica server, connect to it via SSH:

ssh username@replica_server_ip

Open the MySQL shell:

sudo mysql

Create the database that you will copy from the source:

CREATE DATABASE db;

You do not need to create tables or load any sample data into this database. Everything will be filled in automatically when you import the database using the snapshot you just created. Instead, exit the MySQL shell:

exit

Import the snapshot:

sudo mysql db < /tmp/db.sql

Now your master-slave replica contains all the existing data from the source database. Let's move on to the final step to configure the replica server to start replication.

Step 5 — Configuring the Replica

Now we need to change the configuration of the replica, similar to how you changed the source database. Open the MySQL configuration file, mysqld.cnf, on the replica server:

sudo nano /etc/mysql/mysql.conf.d/mysqld.cnf

As mentioned earlier, each MySQL instance in the environment must have a unique server-id value. Find the server-id directive on the replica, uncomment it, and change its value to any positive integer different from the source database:

/etc/mysql/mysql.conf.d/mysqld.cnf
server-id           = 2

Next, update the log_bin and binlog_do_db values to match the values you set in the source server's configuration file:

/etc/mysql/mysql.conf.d/mysqld.cnf
...
log_bin             = /var/log/mysql/mysql-bin.log
...
binlog_do_db        = db
...

Finally, add the relay-log directive, defining the location of the replica's relay log file. Include this line at the end of the configuration file:

/etc/mysql/mysql.conf.d/mysqld.cnf
...
relay-log           = /var/log/mysql/mysql-relay-bin.log

After making these changes, save and close the file. Then restart MySQL on the replica to apply the new settings:

sudo systemctl restart mysql

After restarting the MySQL service, the replication is ready to start.

Step 6 — Starting and Testing Replication

By this point, both of your MySQL instances are fully configured to begin master-slave replication. To start it, open the MySQL shell on the replica server:

sudo mysql

In the command line, execute the command below; it sets several MySQL replication parameters simultaneously. After running this command and starting replication, the server will attempt to connect to the IP address in SOURCE_HOST using the login and password (SOURCE_USER and SOURCE_PASSWORD, respectively). It will also look for the binary log named SOURCE_LOG_FILE and begin reading it from position SOURCE_LOG_POS.

Make sure to replace source_server_ip with the IP address of the source server. Similarly, replica_user and password should match the replication user you created in Step 3; mysql-bin.000001 and 899 should reflect the binary log coordinates obtained in Step 4.

You can enter this command in a text editor before running it on the server to make it easier to replace all the necessary information:

CHANGE REPLICATION SOURCE TO 
SOURCE_HOST='source_server_ip', 
SOURCE_USER='replica_user', 
SOURCE_PASSWORD='password', 
SOURCE_LOG_FILE='mysql-bin.000001', 
SOURCE_LOG_POS=899;

Now activate the replica server:

START REPLICA;

If you entered all the information correctly, this instance will start replicating all changes made to the db database on the source.

You can get all the details about the current status of the replica with the following command. The \G modifier restructures the text, making it more readable:

SHOW REPLICA STATUS\G;

This command outputs a lot of useful information that can be used during debugging and troubleshooting.

Note: If the replica has connection issues or replication unexpectedly stops, an event in the source's binary log might be blocking replication. In such cases, try running the SET GLOBAL SQL_SLAVE_SKIP_COUNTER command to skip a certain number of events following the binary log position specified in the previous command. For example, to skip only the first event:

SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;

Then you will need to restart the replica:

START REPLICA;

If you ever need to stop replication, run this command on the replica server:

STOP REPLICA;

Your replica now copies data from the source. All changes made to the source database will be reflected in the MySQL replica instance. To test, let's create a test table and verify the successful execution of replication.

Open the MySQL shell on the source server:

sudo mysql

Select the database for replication:

USE db;

Create a table in it. In this example, we have a table named example_table with one column example_column:

CREATE TABLE example_table (
example_column varchar(30)
);

If desired, you can add some data:

INSERT INTO example_table VALUES 
('First row'), 
('Second row'), 
('Third row');

After this, return to the MySQL shell on your replica server and select the copied database:

USE db;

Then run the SHOW TABLES statement to display a list of all tables in the selected database:

SHOW TABLES;

If MySQL replication is working correctly, you will see the newly created table in the list.

Additionally, if you added data to the table on the source, you can check it by running a similar query:

SELECT * FROM example_table;

In SQL, the asterisk (*) means "all columns". Essentially, this query tells MySQL to display every column from example_table. If MySQL replication is working correctly, this operation will show this data.

If none of these operations output the table or data from the source, there might be an error somewhere in the configuration. To find the problem, you can try running the SHOW REPLICA STATUS\G operation. Additionally, you can refer to the MySQL documentation for troubleshooting replication.

Conclusion

With this guide, we have set up a position-based binary log replication environment in MySQL with one source and one replica. But remember, this is just one way to set up replication in MySQL. MySQL offers several different master-slave synchronization options that you can choose and configure according to your needs. There are also several third-party tools, such as Galera Cluster, which can be used to extend the built-in MySQL replication features.

MySQL
19.06.2024
Reading time: 19 min

Similar

MySQL

How to Find and Delete Duplicate Rows in MySQL with GROUP BY and HAVING Clauses

Duplicate entries may inadvertently accumulate in databases, which are crucial for storing vast amounts of structured data. These duplicates could show up for a number of reasons, including system errors, data migration mistakes, or repeated user submissions. A database with duplicate entries may experience irregularities, sluggish performance, and erroneous reporting. Using the GROUP BY and HAVING clauses, as well as a different strategy that makes use of temporary tables, we will discuss two efficient methods for locating and removing duplicate rows in MySQL. With these techniques, you can be sure that your data will always be accurate, clean, and well-organized. Database duplication in MySQL tables can clog your data, resulting in inaccurate analytics and needless storage. Locating and eliminating them is a crucial database upkeep task. This is a detailed guide on how to identify and remove duplicate rows. If two or more columns in a row have identical values, it is called a duplicate row. For instance, rows that have the same values in both the userName and userEmail columns of a userDetails table may be considered duplicates. Benefits of Removing Duplicate Data The advantage of eliminating duplicate data is that duplicate entries can slow down query performance, take up extra storage space, and produce misleading results in reports and analytics. The accuracy and speed of data processing are improved by keeping databases clean, which is particularly crucial for databases that are used for critical applications or are expanding. Requirements Prior to starting, make sure you have access to a MySQL database or have MySQL installed on your computer. The fundamentals of general database concepts and SQL queries. One can execute SQL commands by having access to a MySQL client or command-line interface. To gain practical experience, you can create a sample database and table that contains duplicate records so that you can test and comprehend the techniques for eliminating them. Creating a Test Database Launch the MySQL command-line tool to create a Test Database. mysql -u your_username -p Create a new database called test_dev_db after entering your MySQL credentials. CREATE DATABASE test_dev_db; Then, switch to this newly created database:. USE test_dev_db; Add several rows, including duplicates, to the userDetails table after creating it with the CREATE TABLE query and INSERT query below. CREATE TABLE userDetails ( userId INT AUTO_INCREMENT PRIMARY KEY, userName VARCHAR(100), userEmail VARCHAR(100) ); INSERT INTO userDetails (userName, userEmail) VALUES (‘Alisha’, ‘[email protected]’), (‘Bobita, ‘[email protected]’), (‘Alisha’, ‘[email protected]’), (‘Alisha’, ‘[email protected]’); Using GROUP BY and HAVING to Locate Duplicates Grouping rows according to duplicate-defining columns and using HAVING to filter groups with more than one record is the simplest method for finding duplicates. Now that you have duplicate data, you can use SQL to determine which rows contain duplicate entries. MySQL's GROUP BY and HAVING clauses make this process easier by enabling you to count instances of each distinct value. An example of a table structure is the userDetails table, which contains the columns userId, userName, and userEmail. The GROUP BY clause is useful for counting occurrences and identifying duplicates because it groups records according to specified column values. The HAVING clause  allows duplicate entries in groups formed by GROUP BY to be found by combining groups based on specific criteria. Table userDetails Structure userId userName userEmail 1 Alisha  [email protected] 2 Bobita  [email protected] 3 Alisha  [email protected] 4 Alisha  [email protected] In the above table userDetails, records with identical userName and userEmail values are considered duplicates. Finding Duplicates Query for find the duplicate entries: SELECT userName, userEmail, COUNT(*) as count FROM userDetails GROUP BY userName, userEmail HAVING count > 1; Rows are grouped by userName and userEmail in the aforementioned query, which also counts entries within the group and eliminates groups with a single entry (no duplicates). Explanation: SELECT userName, userEmail, COUNT(*) as count: Retrieves the count of each combination of username and userEmail, as well as their unique values. GROUP BY userName, userEmail: Records are grouped by username and user email using the GROUP BY userName, userEmail function COUNT (*): Tallies the rows in each set. HAVING occurrences > 1: Recurring entries are identified by displaying only groups with more than one record. This query will return groups of duplicate records based on the selected columns. userName userEmail count Alisha [email protected] 3 Eliminating Duplicate Rows After finding duplicates, you may need to eliminate some records while keeping the unique ones. Joining the table to itself and removing rows with higher userId values is one effective method that preserves the lowest userId for every duplicate. Use the SQL query to remove duplicate rows while keeping the lowest userId entry. DELETE u1 FROM userDetails u1 JOIN userDetails u2 ON u1. userName = u2. userName AND u1. userEmail = u2. userEmail AND u1. userId > u2. userId ; Explanation: u1 & u2: Aliases for the userDetails table to ease a self-join. ON u1. userName = u2. userName AND u1. userEmail = u2. userEmail: Matches rows with identical userName, userEmail. AND u1. userId > u2. userId: Removes rows with higher userId values, keeping only the row with the smallest userId. Because this action cannot be undone, it is advised that you backup your data before beginning the deletion procedure. Confirming Duplicate Removal To confirm that all duplicates have been removed, repeat the Step 1 identification query. SELECT userName, userEmail, COUNT(*) as count FROM userDetails GROUP BY userName, userEmail HAVING count > 1; All duplicates have been successfully eliminated if this query yields no rows. Benefits of Employing GROUP BY and HAVING The GROUP BY and HAVING clauses serve as vital instruments for the aggregation of data and the filtration of grouped outcomes. These functionalities are especially useful for detecting and handling duplicate entries or for condensing extensive datasets. Below are the primary benefits of employing these clauses. Efficient Identification of Duplicates Data Aggregation and Summarization Filtering Aggregated Results with Precision Versatility Across Multiple Scenarios Compatibility and Simplicity Enhanced Query Readability Support for Complex Aggregations The GROUP BY and HAVING clauses serve as essential instruments for data aggregation, identifying duplicates, and filtering results. Their effectiveness, ease of use, and adaptability render them crucial for database management and data analysis activities, allowing users to derive insights and handle data proficiently across a variety of applications. Identifying Duplicates Using a Temporary Table When dealing with large datasets, it can be easier and more efficient to separate duplicates using a temporary table before deleting them. Creating the Table Make a temporary table to store duplicate groups according to predetermined standards (e.g. A. username, along with userEmail. CREATE TEMPORARY TABLE temp_view_duplicates AS SELECT username, userEmail, MIN (userId) AS minuid FROM userDetails GROUP BY username, userEmail, HAVING COUNT(*) > 1; Explanation: CREATE TEMPORARY TABLE temp_view_duplicates AS: Creates a temporary table named temp_view_duplicates. SELECT userName, userEmail, MIN(userId) AS minuid: Groups duplicates by userName and userEmail, keeping only the row with the smallest userId. GROUP BY userName, userEmail: Groups rows by userName, userEmail. HAVING COUNT(*) > 1: Filters only groups with more than one row, identifying duplicates. This temporary table will now contain one representative row per duplicate group (the row with the smallest id). Deleting Duplicates from the Main Table Now that we have a list of unique rows with duplicates in the temp_view_duplicates table, we can use the temporary table to remove duplicates while keeping only the rows with the smallest userId. Use the following DELETE command: DELETE FROM userDetails WHERE (username, userEmail) IN ( SELECT username, userEmail FROM temp_view_duplicates ) AND userId NOT IN ( SELECT minuid FROM temp_view_duplicates ); Explanation: WHERE (username, userEmail,) IN: Targets only duplicate groups identified in temp_view_duplicates. AND userId NOT IN (SELECT minuid FROM temp_view_duplicates): Ensures that only duplicate rows (those with higher userId values) are deleted. Verifying Results To confirm that duplicates have been removed, query the userDetails table: SELECT * FROM userDetails; Only unique rows should remain. Temporary tables (CREATE TEMPORARY TABLE) are automatically dropped when the session ends, so they don’t persist beyond the current session. When making extensive deletions, think about utilizing a transaction to safely commit or undo changes as necessary. Key Advantages of Using a Temporary Table Lower Complexity: By isolating duplicates, the removal process is simpler and clearer. Enhanced Efficiency: It's faster for large datasets, as it avoids repeated joins. Improved Readability: Using a temporary table makes the process more modular and easier to understand. Conclusion Eliminating duplicate records is essential for maintaining a well-organized database, improving performance, and ensuring accurate reporting. This guide presented two approaches: Direct Method with GROUP BY and HAVING Clauses: Ideal for small datasets, using self-joins to delete duplicates. Temporary Table Approach: More efficient for larger datasets, leveraging temporary storage to streamline deletion. Choose the method that best fits your data size and complexity to keep your database clean and efficient.
19 November 2024 · 8 min to read
MySQL

Installing MariaDB on Ubuntu 22.04

MariaDB is an open-source relational database management system, which has made it a popular alternative to MySQL. It is often included in LAMP stacks (which consist of Linux, Apache, MySQL, and PHP, sometimes with Python and Perl compilers). This document provides a brief guide to setting up MariaDB. The setup involves three steps: updating the package index, installing the mariadb-server, and activating a security script restricting unauthorized access to the remote host.   The sequence of commands looks like this: sudo apt update sudo apt install mariadb-server sudo mysql_secure_installation For this tutorial, we will use a Hostman cloud server with Ubuntu 22.04 installed. Before diving into this guide, we recommend performing initial setup steps, such as creating a user account with sudo privileges and setting basic UFW firewall rules. Step 1: Installing MariaDB Repositories are regularly updated to include stable versions of utilities. Distributions, on the other hand, include outdated releases that were current at the time of the build, so they need to be updated to avoid compatibility issues. This procedure is executed with the following command: sudo apt update Now we can install the package we need: sudo apt install mariadb-server The installation proceeds without prompting for a password or making any other configuration changes. Using MariaDB in this form on the server is not advisable, as it will operate in an insecure mode. To rectify this situation, we will apply the mysql_secure_installation script that comes with the software. This script will restrict access to the server and eliminate unauthorized accounts. Step 2: Setting Up MariaDB The security script modifies the insecure options that are set by default. For instance, it creates test accounts and allows remote connections using the root account. This potentially poses a risk of hacking and unauthorized access to the information stored in the created database. To run the script, use the following command: sudo mysql_secure_installation This command will initiate a series of prompts that allow you to change the database's security parameters.  The first prompt relates to the root account, and the system will request the password for the active database. Press Enter. This indicates that there is currently no protection. Enter current password for root (enter for none): Switch to unix_socket authentication [Y/n] Enter n and press Enter Change the root password? [Y/n] Enter Y and press Enter. New password: Re-enter new password: Enter and re-enter the new root user password and press Enter. Answer Yes (Y) to all the following prompts. Remove anonymous users? [Y/n] Disallow root login remotely? [Y/n] Remove test database and access to it? [Y/n] Reload privilege tables now? [Y/n] The system will remove the test database and anonymous accounts, disable access through the root account, and load the updated rules.  The installation and configuration of the MariaDB package are complete, and you can now begin using it. Alternatively, you can proceed to an optional step to create an administrator account to enable password access. Step 3: Create an Administrator Account By default, on Ubuntu, MariaDB is installed using the unix_socket plugin, which means that password prompts do not apply. In most cases, this approach provides a high level of security. However, it also complicates administrative tasks, such as those done through phpMyAdmin. When starting or stopping the server or managing logs, the root account is used. That is why we did not change its data. However, during a package update, these settings may change on their own, so it's better to enable password authentication right away. As an example, we will create an account named admin and give it the same privileges as the root account. First, open the MariaDB command line: sudo mariadb Next, create the new user: GRANT ALL ON *.* TO 'admin'@'localhost' IDENTIFIED BY 'password' WITH GRANT OPTION; Replace admin and password with any preferred combinations.  After creating the account, flush the privileges while keeping the settings in the current session: FLUSH PRIVILEGES; Now you can close the shell: exit; Next, you should test MariaDB to ensure that the settings are correct. Step 4: Diagnostics When the MariaDB is installed from the official repository, it automatically configures the settings to ensure that the MariaDB module starts automatically. However, it's still a good practice to manually check its status: sudo systemctl status mariadb The output on the screen will look something like this: If the utility is not running, you will need to start it manually and also enable the service: sudo systemctl enable mariadb sudo systemctl start mariadb After forcibly starting the service, you can make a test connection to the database using mysqladmin. It allows you to interact with the database with administrative rights, execute commands, and change settings. Here’s an example of connecting and displaying the version number: sudo mysqladmin version The output on the screen will look like this: If access was configured using the administrator password, you can use the command: mysqladmin -u admin -p version The current version output confirms that the database is running and functioning, and that the user has access to its contents. Conclusions We have completed an overview of the installation and configuration for the MariaDB database management system. We discussed methods to protect against unauthorized access to the database and the creation of a new user who will have access to information equal to that of the root user.
07 November 2024 · 5 min to read
MySQL

How to Calculate Age From Date of Birth Using TIMESTAMPDIFF in MySQL

Calculating age from a date of birth (DOB) is a common requirement in many applications. In MySQL, this can be efficiently achieved using the TIMESTAMPDIFF function. This tutorial will guide you through the process of calculating age using TIMESTAMPDIFF, handling edge cases, and integrating the query into applications. Prerequisites Before diving into the tutorial, ensure you have: A MySQL database set up and accessible. Basic knowledge of SQL queries and MySQL functions. A table with a date of birth column to work with. Overview of TIMESTAMPDIFF Function The TIMESTAMPDIFF function calculates the difference between two dates based on the specified unit of time (e.g., years, months, days). For calculating age, you will use TIMESTAMPDIFF to find the difference in years between the current date and the date of birth. TIMESTAMPDIFF(unit, datetime1, datetime2) unit: The unit of time for the result (e.g., YEAR, MONTH, DAY). datetime1: The first date (usually the date of birth). datetime2: The second date (usually the current date). Writing the Basic TIMESTAMPDIFF Query To calculate age from a date of birth, use the following query: SELECT TIMESTAMPDIFF(YEAR, date_of_birth, CURDATE()) AS ageFROM users; In this query: YEAR specifies that the result should be in years. date_of_birth is the column containing the date of birth. CURDATE() returns the current date. Handling Edge Cases When calculating age, consider the following edge cases: Leap Years Leap years do not significantly affect age calculations, as TIMESTAMPDIFF accurately accounts for these in its calculations. Birthdays on February 29 For individuals born on February 29, TIMESTAMPDIFF will handle their age calculation correctly, but be aware of potential issues if you use functions that do not recognize leap years. Different Date Formats Ensure that the date format stored in the database matches MySQL's date format (YYYY-MM-DD). If you encounter format issues, use the STR_TO_DATE function to convert strings to date formats. Practical Examples and Use Cases Here are some practical examples of using TIMESTAMPDIFF: Example 1: Calculate Age for a Specific User SELECT TIMESTAMPDIFF(YEAR, '1990-05-15', CURDATE()) AS age; This query calculates the age of someone born on May 15, 1990. Example 2: Age Calculation for All Users SELECT name, TIMESTAMPDIFF(YEAR, date_of_birth, CURDATE()) AS ageFROM users; This query retrieves names and ages of all users from the users table. Integrating the Query in Applications To integrate this query into an application: In a PHP Application: $query = "SELECT TIMESTAMPDIFF(YEAR, date_of_birth, CURDATE()) AS age FROM users";$result = mysqli_query($conn, $query); In a Python Application:  query = "SELECT TIMESTAMPDIFF(YEAR, date_of_birth, CURDATE()) AS age FROM users"  cursor.execute(query) Ensure that your application handles database connections securely and efficiently. Performance Considerations The TIMESTAMPDIFF function is optimized for performance, but be mindful of the following: Indexes: Ensure that the date_of_birth column is indexed to speed up queries. Query Optimization: For large datasets, consider optimizing queries to improve performance. Troubleshooting Common Issues Here are some common issues and their solutions: Incorrect Results Issue: Age calculation is incorrect. Solution: Ensure that dates are correctly formatted and the date_of_birth column contains valid date values. Query Errors Issue: Syntax or execution errors. Solution: Verify that the SQL syntax is correct and that you are using valid MySQL functions. Conclusion Calculating age from a date of birth using TIMESTAMPDIFF in MySQL is straightforward and efficient. By following the steps outlined in this tutorial, you can accurately determine age and handle various edge cases. Integrate these calculations into your applications and optimize performance for the best results.
10 September 2024 · 4 min to read

Do you have questions,
comments, or concerns?

Our professionals are available to assist you at any moment,
whether you need help or are just unsure of where to start.
Email us
Hostman's Support