
Working with SQL Databases in Python: A Detailed Guide

Hostman Team
Technical writer
Python SQL
17.07.2024
Reading time: 12 min

As the volume of data grows rapidly, processing it efficiently becomes essential in almost every field. Databases act as repositories that allow information not only to be stored but also to be managed effectively. In this tutorial, we will learn how to work with databases in Python.

SQL (Structured Query Language) plays a central role in organizing databases. This query language provides a convenient way to interact with data, allowing you to create, modify, and retrieve information. The combination of Python and SQL offers convenient tools for automating processes and enables developers to create flexible and scalable applications.

Working with SQL in Python is done through database management systems (DBMS). There are many DBMSs, each with its own characteristics and purposes, from classical relational databases such as MySQL and PostgreSQL to NoSQL solutions such as MongoDB; the choice of a suitable DBMS depends on the specific project requirements. In this guide, we will look at:

  • SQLite

  • MySQL

  • PostgreSQL

In this guide, we will go through the key steps of interacting with SQL databases in Python, providing detailed instructions on how to use basic operations with examples from different DBMS. Let's start with installing the libraries and connecting to the database.

Installing the Libraries

Before working with SQL databases in Python, you need to install the appropriate libraries. Each database has its own package; let's look at some of them.

To work with SQLite, you need the sqlite3 library. This library is included in the standard Python library, so there's no need to install it separately. However, for working with MySQL and PostgreSQL, we need to install the libraries.

For MySQL:

pip install mysql-connector-python

For PostgreSQL:

pip install psycopg2

A general way to install for most databases:

pip install SQLAlchemy

The SQLAlchemy library provides an abstraction layer over various DBMSs, allowing developers to interact with different systems without changing the core code. It also includes an ORM (Object-Relational Mapping) mechanism, which represents database records as Python objects: instead of writing raw SQL queries, you interact with Python classes that map to tables in the database.
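To make the ORM idea concrete, here is a minimal, hypothetical sketch (the Product class and its fields are illustrative, not part of this guide's schema; assumes SQLAlchemy 1.4 or later):

```python
from sqlalchemy import create_engine, Column, Integer, String, Float
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

# Illustrative model: the class maps to a "products" table
class Product(Base):
    __tablename__ = 'products'
    product_id = Column(Integer, primary_key=True)
    product_nm = Column(String, nullable=False)
    price = Column(Float)

engine = create_engine('sqlite:///:memory:')  # in-memory DB for the example
Base.metadata.create_all(engine)              # emits CREATE TABLE for all models

with Session(engine) as session:
    session.add(Product(product_nm='Laptop', price=999.99))
    session.commit()
    print(session.query(Product).first().product_nm)  # Laptop
```

Note that no SQL string appears in the code; SQLAlchemy generates the INSERT and SELECT statements from the mapped class.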

After installing the appropriate libraries, we will be able to establish a connection to the SQL database directly from the Python runtime environment.

Connecting to the Database

Establishing a connection to the SQL database is an important step before starting to work with the information repository. This step ensures interaction between your Python code and the physical information storage. The process of connecting depends on the DBMS used.

For interacting with SQLite, you need to import the sqlite3 library and then connect to the database stored in a file:

import sqlite3 as sq
connection = sq.connect('your_database_name.db')

If the database file does not exist, it will be created automatically.
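Since SQLite creates the file on demand, you can also experiment without touching the filesystem at all: the special name :memory: keeps the whole database in RAM for the lifetime of the connection:

```python
import sqlite3 as sq

# ':memory:' creates a temporary database that disappears when the connection closes
connection = sq.connect(':memory:')
cursor = connection.cursor()
cursor.execute('SELECT sqlite_version()')
print(cursor.fetchone()[0])  # version of the bundled SQLite library
connection.close()
```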

For interacting with MySQL in Python, the mysql.connector library is used. Here is an example of using this library to establish a connection:

import mysql.connector as con

connection = con.connect(
      host='your_actual_host',
      user='your_actual_user',
      password='your_actual_password',
      database='your_actual_database'
)

In this example:

  • your_actual_host — your MySQL host address;

  • your_actual_user — your username for database access;

  • your_actual_password — your user password;

  • your_actual_database — the name of the database you want to interact with.

For interacting with PostgreSQL in Python, the psycopg2 library is used. Here is an example of using psycopg2 to create a connection:

import psycopg2 as ps

connection = ps.connect(
     host='your_actual_host',
     user='your_actual_user',
     password='your_actual_password',
     database='your_actual_database'
)

When using SQLAlchemy, the connection can be established as follows:

from sqlalchemy import create_engine

database_url = 'sqlite:///your_database.db'
engine = create_engine(database_url)

If you use SQLAlchemy, the URL is a standard way of representing connection parameters. Depending on the type of database, it can include information about the host, user, password, port, and other parameters.
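For reference, the URLs for the DBMS covered in this guide look roughly like this (the hosts, ports, and credentials below are placeholders, not real connection details):

```python
# Placeholder URLs; substitute your own host, port, user, and password
sqlite_url = 'sqlite:///your_database.db'
mysql_url = 'mysql+mysqlconnector://user:password@localhost:3306/your_database'
postgres_url = 'postgresql+psycopg2://user:password@localhost:5432/your_database'
```

The part before :// names the dialect and, after the plus sign, the driver that SQLAlchemy should use.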

After successfully creating the connection, the next step is to create a cursor object. The cursor provides the program with an interface to move through the results of a query, retrieve information, and make changes to records. This step is independent of the database used, as it provides a common interface for executing SQL queries:

cursor = connection.cursor()

Now everything is ready to execute SQL queries in Python and interact with the database.

When you finish working with the cursor, for example, after performing all operations, it is important to close both the cursor and the connection:

cursor.close()
connection.close()

This is critical for efficient resource management and for preventing resource leaks.
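With SQLite, part of this cleanup can be delegated to context managers. Used as a context manager, a sqlite3 connection commits on success and rolls back on error (note that it does not close itself), while contextlib.closing guarantees the close. A small sketch:

```python
import sqlite3
from contextlib import closing

# closing() guarantees connection.close() even if an exception is raised;
# the inner 'with connection' commits on success and rolls back on error
with closing(sqlite3.connect(':memory:')) as connection:
    with connection:
        connection.execute('CREATE TABLE t (x INTEGER)')
        connection.execute('INSERT INTO t (x) VALUES (1)')
    print(connection.execute('SELECT x FROM t').fetchone())  # (1,)
```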

After successfully establishing a connection to the database, you are ready to start executing SQL queries and various operations.

Creating Tables and Schemas in DBMS

Before interacting with the database, you will need to create an SQL table in Python. Tables are entities where each row corresponds to a specific record, and columns define various attributes of this record. This is an important stage of database design, where the structure and types of records that will be stored in the database are defined.

Here is an example of creating a table for SQLite:

products_tb = '''
CREATE TABLE IF NOT EXISTS products (
     product_id INTEGER PRIMARY KEY AUTOINCREMENT,
     product_nm TEXT NOT NULL,
     price REAL CHECK (price >= 0),
     stock_quantity INTEGER CHECK(stock_quantity >= 0)
)
'''

This SQL query defines the products table with fields:

  • product_id, a unique product identifier serving as the primary key;

  • product_nm, a text field for the product name. NOT NULL indicates that this field must contain information;

  • price, the product price, which can contain decimal values;

  • stock_quantity, the quantity of products in stock; this field can only contain integers.

The combination of all these fields creates a table where each record (row) represents a separate product, and each column contains information about that product.

After creating the table, you need to execute the SQL query in Python and apply the changes:

cursor.execute(products_tb)
connection.commit()

After completing these steps, you will have a products table created in the SQLite database, ready for use. We can open the database in an SQLite database browser, such as DB Browser for SQLite, and see the created table.
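You can also confirm the table exists without leaving Python by querying SQLite's system catalog, sqlite_master (the snippet below uses an in-memory database so it is fully self-contained):

```python
import sqlite3

connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
cursor.execute('''
CREATE TABLE IF NOT EXISTS products (
    product_id INTEGER PRIMARY KEY AUTOINCREMENT,
    product_nm TEXT NOT NULL,
    price REAL CHECK (price >= 0),
    stock_quantity INTEGER CHECK (stock_quantity >= 0)
)
''')
# sqlite_master lists every table, index, and trigger in the database
cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
print(cursor.fetchall())
```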

Note that a table only needs to be created once; the IF NOT EXISTS check makes the statement safe to re-run.

Similarly, here is an SQL query to create a table for MySQL:

products_tb = '''
CREATE TABLE IF NOT EXISTS products (
     product_id INT AUTO_INCREMENT PRIMARY KEY,
     product_nm VARCHAR(255) NOT NULL,
     price DECIMAL(10, 2),
     stock_quantity INT
)
'''

cursor.execute(products_tb)

connection.commit()

And here is the creation of the products table in PostgreSQL:

products_tb = '''
    CREATE TABLE IF NOT EXISTS products (
        product_id SERIAL PRIMARY KEY,
        product_nm VARCHAR(255) NOT NULL,
        price DECIMAL(10, 2),
        stock_quantity INT
    );
'''

cursor.execute(products_tb)

connection.commit()

All these examples demonstrate creating tables for different DBMS. You can adapt the SQL queries depending on your database structure requirements.

Executing SQL Queries with Python

After successfully creating tables, we proceed to perform basic operations known as CRUD (Create, Read, Update, Delete). These operations allow us to add, read, update, and delete information in the database. Let’s delve into each of these operations in detail.

CREATE Operation

The CREATE operation in SQL is responsible for adding new records to a table. We use the INSERT statement to add records by specifying values for each column. Here are examples of adding a new record for each database:

Add information about a new product to the table we created in the previous section:

new_product = ('Laptop', 999.99, 10)

SQL query to add records for SQLite:

request_to_insert_data = '''
INSERT INTO products (product_nm, price, stock_quantity) VALUES (?, ?, ?);
'''

SQL query to add records for MySQL and PostgreSQL:

request_to_insert_data = '''
INSERT INTO products (product_nm, price, stock_quantity) VALUES (%s, %s, %s);
'''

To execute the SQL query for any database:

cursor.execute(request_to_insert_data, new_product)
connection.commit()

The cursor.execute() command executes the query using the given SQL statement and data, adding a new record to the database. The connection.commit() command applies the changes.
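When several records need to be added at once, the cursor also provides executemany(), which runs the same parameterized query for every tuple in a sequence (shown here for SQLite; with MySQL or PostgreSQL only the placeholder style changes):

```python
import sqlite3

connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
cursor.execute('''
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY AUTOINCREMENT,
    product_nm TEXT NOT NULL,
    price REAL,
    stock_quantity INTEGER
)
''')

new_products = [
    ('Laptop', 999.99, 10),
    ('Mouse', 19.99, 120),
    ('Keyboard', 49.99, 45),
]
# One parameterized INSERT, executed once per tuple
cursor.executemany(
    'INSERT INTO products (product_nm, price, stock_quantity) VALUES (?, ?, ?)',
    new_products,
)
connection.commit()
print(cursor.rowcount)  # number of rows inserted by executemany
```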

READ Operation

The READ operation retrieves data from the table. We use the SELECT statement to perform this procedure. The syntax is the same for all three databases (SQLite, MySQL, and PostgreSQL):

request_to_read_data = "SELECT * FROM products"

cursor.execute(request_to_read_data)

data = cursor.fetchall()

for row in data:
    print(row)

This code forms an SQL query to select all data from the products table, executes it with the cursor, fetches the result, and prints each row.
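A SELECT can be narrowed with a WHERE clause and parameters just like any other statement, and fetchone() returns a single row instead of the whole result set. A self-contained sketch:

```python
import sqlite3

connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
cursor.execute('CREATE TABLE products (product_nm TEXT, price REAL)')
cursor.execute("INSERT INTO products VALUES ('Laptop', 999.99)")

# Parameterized filter: only rows matching the placeholder value are returned
cursor.execute('SELECT product_nm, price FROM products WHERE product_nm = ?',
               ('Laptop',))
row = cursor.fetchone()
print(row)  # a single tuple, or None when nothing matches
```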

UPDATE Operation

The UPDATE operation modifies existing records using the UPDATE statement. Let’s change the price of the product with product_id = 1 to a new value of 109.99:

new_price = 109.99
product_id_to_update = 1

SQL query for updating records for SQLite:

request_to_update_data = "UPDATE products SET price = ? WHERE product_id = ?"

cursor.execute(request_to_update_data, (new_price, product_id_to_update))

connection.commit()

SQL query for updating records for MySQL and PostgreSQL:

request_to_update_data = "UPDATE products SET price = %s WHERE product_id = %s"

cursor.execute(request_to_update_data, (new_price, product_id_to_update))

connection.commit()
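After an UPDATE, cursor.rowcount reports how many rows were actually modified, which is a quick way to detect a filter that matched nothing:

```python
import sqlite3

connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
cursor.execute('CREATE TABLE products (product_id INTEGER PRIMARY KEY, price REAL)')
cursor.execute('INSERT INTO products (product_id, price) VALUES (1, 99.99)')

cursor.execute('UPDATE products SET price = ? WHERE product_id = ?', (109.99, 1))
connection.commit()
print(cursor.rowcount)  # 1: exactly one row matched and was updated

cursor.execute('UPDATE products SET price = ? WHERE product_id = ?', (109.99, 42))
print(cursor.rowcount)  # 0: no row with product_id = 42
```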

DELETE Operation

The DELETE operation removes records from the table using the DELETE statement.

Example for SQLite:

product_id_to_delete = 1
request_to_delete_data = "DELETE FROM products WHERE product_id = ?"
cursor.execute(request_to_delete_data, (product_id_to_delete,))
connection.commit()

For MySQL and PostgreSQL:

product_id_to_delete = 1
request_to_delete_data = "DELETE FROM products WHERE product_id = %s"
cursor.execute(request_to_delete_data, (product_id_to_delete,))

connection.commit()

As previously mentioned, it is essential to terminate the interaction with the database correctly. After completing all CRUD operations, remember to close the cursor and connection:

cursor.close()
connection.close()

Database Search Optimization

Indexes are data structures associated with the values of one or more columns of a table. They provide fast access to specific values, significantly improving query performance. Creating an index adds a sorted structure over the values of the selected columns, which reduces the number of rows that have to be scanned to answer a query.

Creating an index to optimize searches on the product_nm column of the products table. Note that the prefix length in product_nm(255) is MySQL-specific syntax; in SQLite and PostgreSQL, write simply (product_nm):

request_to_index = "CREATE INDEX idx_product_nm ON products (product_nm(255))"
cursor.execute(request_to_index)
connection.commit()
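In SQLite you can check that a query really uses the index with EXPLAIN QUERY PLAN (the sketch below uses the plain (product_nm) form, since SQLite does not accept MySQL-style prefix lengths):

```python
import sqlite3

connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
cursor.execute('CREATE TABLE products (product_nm TEXT, price REAL)')
cursor.execute('CREATE INDEX idx_product_nm ON products (product_nm)')

cursor.execute("EXPLAIN QUERY PLAN SELECT * FROM products WHERE product_nm = 'Laptop'")
rows = cursor.fetchall()
for row in rows:
    print(row)  # the plan should mention idx_product_nm
```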

Ensuring Data Integrity

Transactions ensure data integrity. They guarantee that a series of operations are executed atomically, meaning all changes are applied successfully, or none are applied at all. Here’s how to work with transactions in SQLite:

try:
	update_query = "UPDATE products SET price = ? WHERE product_id = ?"
	new_price = 123.456
	product_id_to_update = 1
	cursor.execute(update_query, (new_price, product_id_to_update))
	
	connection.commit()
	
	print("Transaction completed successfully.")

except Exception as e:
	connection.rollback()
	print(f"Error occurred: {str(e)} Transaction is rolled back.")

finally:
	connection.close()

This example uses a try-except-finally block to handle transactions. If any operation in the try block raises an exception, the transaction is rolled back using connection.rollback(). Otherwise, changes are committed with commit().
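The same pattern extends to several statements that must succeed or fail together, for example moving stock between two products; a self-contained sketch:

```python
import sqlite3

connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
cursor.execute('CREATE TABLE products (product_id INTEGER PRIMARY KEY, stock_quantity INTEGER)')
cursor.executemany('INSERT INTO products VALUES (?, ?)', [(1, 10), (2, 0)])
connection.commit()

try:
    # Both updates belong to one transaction: either both apply or neither does
    cursor.execute('UPDATE products SET stock_quantity = stock_quantity - 5 WHERE product_id = 1')
    cursor.execute('UPDATE products SET stock_quantity = stock_quantity + 5 WHERE product_id = 2')
    connection.commit()
except Exception:
    connection.rollback()  # undo the first update if the second one failed

print(cursor.execute('SELECT stock_quantity FROM products ORDER BY product_id').fetchall())
# [(5,), (5,)]
```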

Automating Reactions to Events

Triggers are a special kind of stored procedure that activates automatically when certain events occur in the database. They automate reactions to data changes, such as inserting new records, updating existing ones, or deleting records.

In SQLite, triggers can be activated for INSERT, UPDATE, and DELETE events. For example, we have an orders table with fields order_id, product_nm, and quantity, and we want to create a trigger that decreases the inventory in the inventory table each time a new order is added:

trigger_request = '''
CREATE TRIGGER decrease_inventory
AFTER INSERT ON orders
BEGIN
     UPDATE inventory
     SET stock_quantity = stock_quantity - NEW.quantity
     WHERE product_nm = NEW.product_nm;
END;
'''
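Here is a self-contained demonstration of this trigger in SQLite; the orders and inventory tables are created just for the example:

```python
import sqlite3

connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
cursor.execute('CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_nm TEXT, quantity INTEGER)')
cursor.execute('CREATE TABLE inventory (product_nm TEXT, stock_quantity INTEGER)')
cursor.execute("INSERT INTO inventory VALUES ('Laptop', 10)")

cursor.execute('''
CREATE TRIGGER decrease_inventory
AFTER INSERT ON orders
BEGIN
    UPDATE inventory
    SET stock_quantity = stock_quantity - NEW.quantity
    WHERE product_nm = NEW.product_nm;
END;
''')

# Inserting an order fires the trigger and reduces the stock automatically
cursor.execute("INSERT INTO orders (product_nm, quantity) VALUES ('Laptop', 3)")
connection.commit()
print(cursor.execute("SELECT stock_quantity FROM inventory WHERE product_nm = 'Laptop'").fetchone())
# (7,)
```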

In MySQL, the syntax for a similar trigger is almost the same as in SQLite; the main difference is the mandatory FOR EACH ROW clause:

trigger_request = '''
CREATE TRIGGER decrease_inventory
AFTER INSERT ON orders
FOR EACH ROW
BEGIN
     UPDATE inventory
     SET stock_quantity = stock_quantity - NEW.quantity
     WHERE product_nm = NEW.product_nm;
END;
'''

In PostgreSQL, the syntax differs slightly. First, create a function for the trigger:

function_request = '''
CREATE OR REPLACE FUNCTION decrease_inventory()
RETURNS TRIGGER AS $$
BEGIN
     UPDATE inventory
     SET stock_quantity = stock_quantity - NEW.quantity
     WHERE product_nm = NEW.product_nm;
     RETURN NEW;
END;
$$ LANGUAGE plpgsql;
'''

Then create the trigger itself (EXECUTE FUNCTION requires PostgreSQL 11 or later; on older versions, use EXECUTE PROCEDURE instead):

trigger_request = '''
CREATE TRIGGER decrease_inventory
AFTER INSERT ON orders
FOR EACH ROW
EXECUTE FUNCTION decrease_inventory();
'''

Conclusion

In this guide, we explored the basics of interacting with databases using Python. You now know how to use Python to connect to an SQL database, create tables, and use CRUD operations to manage data effectively. We also covered indexing, transactions, and using triggers for automation.

This guide serves as an introduction to SQL with Python. Next steps might include learning more complex queries, optimizing performance, and using advanced features specific to each DBMS.


Similar

Microservices

Database Connection in Python, Go, and JavaScript

Databases are an essential part of almost any project today. Database interactions are especially familiar to system and database administrators, DevOps/SRE professionals, and software developers. While administrators typically deploy one or multiple database instances and configure the necessary connection parameters for applications, developers need to connect directly to the database within their code. This article explores how to connect to databases using different programming languages. Prerequisites We will provide examples for connecting to MySQL, PostgreSQL, Redis, MongoDB, and ClickHouse databases using Python, Go, and JavaScript. To follow this guide, you will need: A database deployed on a server or in the cloud. Installed environments for Python, Go, and JavaScript, depending on your application programming language. Additionally for Python: pip installed. Additionally for JavaScript: Node.js and npm installed. Database Connection in Python MySQL and Python For connecting to MySQL databases, we can use a Python driver called MySQL Connector. Install the driver using pip: pip install mysql-connector-python Initialize a new connection: Import the mysql.connector library and the Error class to handle specific connection errors. Create a function named create_connection, passing the database address (host), user name (user), and user password (password). To establish the connection, define a class called create_connection that receives the variable names containing the database connection details. 
import mysql.connector from mysql.connector import Error def create_connection(host_name, user_name, user_password): connection = None try: connection = mysql.connector.connect( host="91.206.179.29", user="gen_user", password="m-EE6Wm}z@wCKe" ) print("Successfully connected to MySQL Server!") except Error as e: print(f"The error '{e}' occurred") return connection def execute_query(connection, query): cursor = connection.cursor() try: cursor.execute(query) connection.commit() print("Query executed successfully") except Error as e: print(f"The error '{e}' occurred") connection = create_connection("91.206.179.29", "gen_user", "m-EE6Wm}z@wCKe") Run the script. If everything works correctly, you will see the "Successfully connected to MySQL Server!" message. If any errors occur, the console will display error code and description. Create a new table: Connect to the database using the connection.database class, specifying the name of the database. Note that the database should already exist. To create a table, initialize a variable create_table_query containing the SQL CREATE TABLE query. For data insertion, initialize another variable insert_data_query with the SQL INSERT INTO query. To execute each query, use the execute_query class, which takes the database connection string and the variable containing the SQL query. connection.database = 'test_db' create_table_query = """ CREATE TABLE IF NOT EXISTS users ( id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(100) NOT NULL, age INT NOT NULL ) """ execute_query(connection, create_table_query) insert_data_query = """ INSERT INTO users (name, age) VALUES ('Alice', 30), ('Bob', 25) """ execute_query(connection, insert_data_query) if connection.is_connected(): connection.close() print("Connection closed") Run the script. PostgreSQL and Python Python offers several plugins for connecting to PostgreSQL, but the most popular one is psycopg2, which we will use here. 
Psycopg2 is one of the most frequently used Python plugins for PostgreSQL connections. One of its key advantages is its support for multithreading which allows you to maintain the database connection across multiple threads. Install psycopg2 using pip (if not already installed): pip install psycopg2-binary Connect to PostgreSQL. Import the Python psycopg2 package and create a function create_new_conn, using the try block. Establish the connection with the psycopg2.connect function, which requires the database name, user name, password, and database address as input. To initialize the connection, use the create_new_conn() function. Here’s the full code example for connecting to a database: import psycopg2 from psycopg2 import OperationalError def create_new_conn(): conn_to_postgres = None while not conn_to_postgres: try: conn_to_postgres = psycopg2.connect( default_db="default_db", default_user="gen_user", password_for_default_user="PasswordForDefautUser9893#", db_address="91.206.179.128" ) print("The connection to PostgreSQL has been successfully established!") except OperationalError as e: print(e) return conn_to_postgres conn_to_postgres = create_new_conn() Run the script: python3 connect_to_postgres.py If successful, you will see the "The connection to PostgreSQL has been successfully established!" message. . Next, create a table named books, which will have three columns. Use the cursor class for SQL expressions, such as creating database objects. If the query involves adding or modifying data, you must call the conn_to_postgres.commit() function afterward to apply the changes. 
import psycopg2 from psycopg2 import OperationalError def create_new_conn(): conn_to_postgres = None while not conn_to_postgres: try: conn_to_postgres = psycopg2.connect( default_db="default_db", default_user="gen_user", password_for_default_user="PasswordForDefautUser9893#", db_address="91.206.179.128" ) except OperationalError as e: print(e) return conn_to_postgres conn_to_postgres = create_new_conn() cursor = conn_to_postgres.cursor() cursor.execute(""" CREATE TABLE books ( book_id INT PRIMARY KEY NOT NULL, book_name VARCHAR(255) NOT NULL, book_author VARCHAR(255) NOT NULL ) """) conn_to_postgres.commit() print("Table Created successfully") Run the script: python3 create_table.py Now, let’s run INSERT INTO to add a new line: cursor.execute(""" INSERT INTO books (book_id,book_name,book_author) VALUES (1, 'Long Walk to Freedom', 'Nelson_Mandela') """) The full code is below: import psycopg2 from psycopg2 import OperationalError def create_new_conn(): conn_to_postgres = None while not conn_to_postgres: try: conn_to_postgres = psycopg2.connect( default_db="default_db", default_user="gen_user", password_for_default_user="PasswordForDefautUser9893#", db_address="91.206.179.128" ) except OperationalError as e: print(e) return conn_to_postgres conn_to_postgres = create_new_conn() cursor = conn_to_postgres.cursor() cursor.execute(""" INSERT INTO books (book_id,book_name,book_author) VALUES (1, 'Long Walk to Freedom', 'Nelson_Mandela') """) conn_to_postgres.commit() conn_to_postgres.close() print("Data inserted successfully") Run the script: python3 insert-data.py Redis and Python Redis belongs to the class of NoSQL databases, where data is stored in memory rather than on hard drives. It uses a key-value format for data storage. Redis has a wide range of applications, from data storage and caching to serving as a message broker. We will use the redis-py (or simply redis) library for connecting to Redis. 
Install the Redis library using pip: pip install redis Connecting to a Redis instance: Use a try block structure for connection, specifying the function redis.StrictRedis where you provide the Redis address, port, and user password. import redis try: connect_to_redis_server = redis.StrictRedis( redis_db_host=91.206.179.128, redis_db_port=6379, redis_user_password='PasswordForRedis6379') print connect_to_redis_server connect_to_redis_server.ping() print 'Successfully connected to Redis Server!' except Exception as ex: print 'Error:', ex exit('Failed to connect to Redis server.') Run the script: python3 connect_to_redis.py If successful, you will see a message like "Successfully connected to Redis Server!". Unlike relational databases, Redis stores data in a key-value format. The key uniquely identifies the corresponding value. Use the set method to create a new record. The example below creates a record with the key City and the value Berlin: print('Create new record:', connect_to_redis_server.set("City", "Berlin")) Use the get method to retrieve the value associated with a key: print('Print record using record key:', connect_to_redis_server.get("City")) Use the delete method to remove a record by its key: print('Delete record with key:', connect_to_redis_server.delete("City")) The complete code fragment is below. import redis try: connect_to_redis_server = redis.StrictRedis( redis_db_host=91.206.179.128, redis_db_port=6379, redis_user_password='PasswordForRedis6379') print ('New record created:', connect_to_redis_server.set("City", "Berlin")) print ('Print created record using record key', connect_to_redis_server.get("City")) print ('Delete created record with key :', connect_to_redis_server.delete("City")) except Exception as ex: print ('Error:', ex) MongoDB and Python MongoDB is another widely used NoSQL database that belongs to the document-oriented category. Data is organized as JSON-like documents. 
To connect to a MongoDB database with Python, the recommended library is PyMongo, which provides a synchronous API. Install the PyMongo plugin: pip3 install pymongo Connect to MongoDB server using the following Python code. Import the pymongo module and use the MongoClient class to specify the database server address. To establish a connection to the MongoDB server, use a try block for error handling: import pymongo connect_to_mongo = pymongo.MongoClient("mongodb://91.206.179.29:27017/") first_db = connect_to_mongo["mongo-db1"] try: first_db.command("serverStatus") except Exception as e: print(e) else: print("Successfully connected to MongoDB Server!") connect_to_mongo.close() Run: python3 connect_mongodb.py If the connection is successfully established, the script will return the message: "Successfully connected to MongoDB Server!" Add data to MongoDB. To add data, you need to create a dictionary. Let's create a dictionary named record1, containing three keys: record1 = { "name": "Alex", "age": 25, "location": "London" } To insert the dictionary data, use the insert_one method in MongoDB. insertrecord = collection1.insert_one(record1) import pymongo connect_to_mongo = pymongo.MongoClient("mongodb://91.206.179.29:27017/") db1 = connect_to_mongo["newdb"] collection1 = db1["userdata"] record1 = { "name": "Alex", "age": 25, "location": "London" } insertrecord = collection1.insert_one(record1) print(insertrecord) Run the script: python3 connect_mongodb.py ClickHouse and Python ClickHouse is a columnar NoSQL database where data is stored in columns rather than rows. It is widely used for handling analytical queries. Install the ClickHouse driver for Python. There is a dedicated plugin for ClickHouse called clickhouse-driver. Install the driver using the pip package manager: pip install clickhouse-driver Connect to ClickHouse. To initialize a connection with ClickHouse, you need to import the Client class from the clickhouse_driver library. 
To execute SQL queries, use the client.execute function. You also need to specify the engine. For more details on supported engines in ClickHouse, you can refer to the official documentation. We'll use the default engine, MergeTree. Next, create a new table called users and insert two columns with data. To list the data to be added to the table, use the tuple data type. After executing the necessary queries, make sure to close the connection to the database using the client.disconnect() method. The final code will look like this: from clickhouse_driver import Client client = Client(host=91.206.179.128', user='root', password='P@$$w0rd123', port=9000) client.execute(''' CREATE TABLE IF NOT EXISTS Users ( id UInt32, name String, ) ENGINE = MergeTree() ORDER BY id ''') data = [ (1, 'Alice'), (2, 'Mary') ] client.execute('INSERT INTO Users (id, name) VALUES', data) result = client.execute('SELECT * FROM Users') for row in result: print(row) client.disconnect() Database Connection in Go Go is one of the youngest programming languages, developed in 2009 by Google.  It is widely used in developing microservice architectures and network utilities. For example, services like Docker and Kubernetes are written in Go. Go supports integrating all popular databases, including PostgreSQL, Redis, MongoDB, MySQL, ClickHouse, etc. MySQL and Go For working with the MySQL databases in Go, use the go-sql-driver/mysql driver. Create a new directory for storing project files and navigate into it: mkdir mysql-connect && cd mysql-connect Create a go.mod file to store the dependencies: go mod init golang-connect-mysql Download the MySQL driver using the go get command: go get -u github.com/go-sql-driver/mysql Create a new file named main.go. 
Specify the database connection details in the dsn variable: package main import ( "database/sql" "fmt" "log" _ "github.com/go-sql-driver/mysql" ) func main() { dsn := "root:password@tcp(localhost:3306)/testdb" db, err := sql.Open("mysql", dsn) if err != nil { log.Fatal(err) } defer db.Close() if err := db.Ping(); err != nil { log.Fatal(err) } fmt.Println("Successfully connected to the database!") query := "INSERT INTO users (name, age) VALUES (?, ?)" result, err := db.Exec(query, "Alex", 25) if err != nil { log.Fatal(err) } lastInsertID, err := result.LastInsertId() if err != nil { log.Fatal(err) } fmt.Printf("Inserted data with ID: %d\n", lastInsertID) } PostgreSQL and Go To connect to PostgreSQL, use the pq driver. Before installing the driver, let's prepare our environment. Create a new directory for storing the project files and navigate into it: mkdir postgres-connect && cd postgres-connect Since we will be working with dependencies, we need to create a go.mod file to store them: go mod init golang-connect-postgres Download the pq driver using the go get command: go get github.com/lib/pq Create a new file named main.go. In addition to importing the pq library, it is necessary to add the database/sql library as Go does not come with official database drivers by default. The database/sql library consists of general, independent interfaces for working with databases. It is also important to note the underscore (empty identifier) when importing the pq module: _ "github.com/lib/pq" The empty identifier is used to avoid the "unused import" error, as in this case, we only need the driver to be registered in database/sql. The fmt package is required to output data to the standard output stream, for example, to the console. To open a connection to the database, the sql.Open function is used, which takes the connection string (connStr) and the driver name (postgres). 
The connection string specifies the username, database name, password, and host address: package main import ( "database/sql" "fmt" "log" _ "github.com/lib/pq" ) func main() { connStr := "user=golang dbname=db_for_golang password=Golanguserfordb0206$ host=47.45.249.146 sslmode=disable" db, err := sql.Open("postgres", connStr) if err != nil { log.Fatal(err) } defer db.Close() err = db.Ping() if err != nil { log.Fatal(err) } fmt.Println("Successfully connected to PostgreSQL!") } Compile and run: go run main.go If everything works correctly, the terminal will display the message Successfully connected to PostgreSQL! Now, let's look at an example of how to insert data into a table.  First, we need to create a table in the database. When using Hostman cloud databases, you can copy the PostgreSQL connection string displayed in the "Connections" section of the Hostman web interface. Make sure that the postgresql-client utility is installed on your device beforehand. Enter the psql shell and connect to the previously created database: \c db_for_golang Create a table named Cities with three fields — city_id, city_name, and city_population: CREATE TABLE Cities ( city_id INT PRIMARY KEY, city_name VARCHAR(45) NOT NULL, city_population INT NOT NULL); Grant full privileges to the created table for the user: GRANT ALL PRIVILEGES ON TABLE cities TO golang; The function db.Prepare is used to prepare data. It specifies the query for insertion in advance. To insert data, use the function stmt.Exec. In Go, it's common to use plain SQL without using the ORM (Object-Relational Mapping) approach. stmt, err := db.Prepare("INSERT INTO Cities(city_id, city_name, city_population) VALUES($1, $2, $3)") if err != nil { log.Fatal(err) } defer stmt.Close() _, err = stmt.Exec(1, "Toronto", 279435) if err != nil { log.Fatal(err) } fmt.Println("Data inserted successfully!") } If all works correctly, you will see: Data inserted successfully! 
Redis and Go To connect to Redis, you need to use the go-redis driver. Сreate a new directory: mkdir connect-to-redis && cd connect-to-redis Prepare the dependency file: go mod init golang-connect-redis And optimize them: go mod tidy Download the go-redis module: go get github.com/go-redis/redis/v8 To connect to Redis, use the redis.Options function to specify the address and port of the Redis server. Since Redis does not use authentication by default, you can leave the Password field empty and use the default database (database 0): package main import ( "context" "fmt" "log" "github.com/go-redis/redis/v8" ) func main() { rdb := redis.NewClient(&redis.Options{ Addr: "91.206.179.128:6379", Password: "", DB: 0, }) ctx := context.Background() _, err := rdb.Ping(ctx).Result() if err != nil { log.Fatalf("Couldn't connect to Redis: %v", err) } fmt.Println("Successfully connected to Redis!") } You should see the message «Successfully connected to Redis!» MongoDB and Go To work with MongoDB, we'll use the mongo driver. Create a new directory to store the project structure: mkdir connect-to-mongodb && cd connect-to-mongodb Initialize the dependency file: go mod init golang-connect-mongodb Download the mongo library: go get go.mongodb.org/mongo-driver/mongo Connect to MongoDB using the options.Client().ApplyURI method. It takes a connection string such as mongodb://91.206.179.29:27017, where 91.206.179.29 is the MongoDB server address and 27017 is the port for connecting to MongoDB. The options.Client().ApplyURI string is used only for specifying connection data. 
To check the connection status, you can use another function, client.Ping, which shows the success or failure of the connection:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	clientOptions := options.Client().ApplyURI("mongodb://91.206.179.29:27017")

	client, err := mongo.Connect(context.TODO(), clientOptions)
	if err != nil {
		log.Fatalf("Couldn't connect to MongoDB server: %v", err)
	}

	fmt.Println("Successfully connected to MongoDB!")

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	err = client.Ping(ctx, nil)
	if err != nil {
		log.Fatalf("Could not ping MongoDB server: %v", err)
	}

	fmt.Println("Ping MongoDB server successfully!")
}
```

You should see the messages:

```
Successfully connected to MongoDB!
Ping MongoDB server successfully!
```

MongoDB uses collections to store data. You can create collections using the .Collection function. Below, we will create a database called first-database and a collection called first-collection. The collection will have a new document containing three keys: user-name, user-age, and user-email.

```go
collection := client.Database("first-database").Collection("first-collection")

document := map[string]interface{}{
	"user-name":  "Alice",
	"user-age":   25,
	"user-email": "[email protected]",
}

insertResult, err := collection.InsertOne(ctx, document)
if err != nil {
	log.Fatalf("Couldn't insert new document: %v", err)
}

fmt.Printf("Inserted new document with ID: %v\n", insertResult.InsertedID)

if err := client.Disconnect(ctx); err != nil {
	log.Fatalf("Could not disconnect from MongoDB: %v", err)
}

fmt.Println("Disconnected from MongoDB!")
```

If successful, you will see the Inserted new document message with the document ID.

ClickHouse and Go

To work with ClickHouse, use the clickhouse-go driver.
Create a new directory to store the project files and navigate to it:

```
mkdir clickhouse-connect && cd clickhouse-connect
```

Create a go.mod file to store the dependencies:

```
go mod init golang-connect-clickhouse
```

Download the ClickHouse driver using the command:

```
go get github.com/ClickHouse/clickhouse-go/v2
```

Create a new file named main.go, where you will specify the connection data for ClickHouse. Note that the driver is imported with a blank identifier (_) because we only need it to register itself with database/sql:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
	dsn := "tcp://localhost:9000?username=user1&password=PasswordForuser175465&database=new_db"

	db, err := sql.Open("clickhouse", dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}

	log.Println("Connected to ClickHouse!")
}
```

Database Connection in JavaScript

In JavaScript, all connections to external services are made using the Node.js platform. Make sure that you have Node.js and the npm package manager installed on your device.

MySQL and JavaScript

To work with MySQL, use the mysql2 driver. Create a directory where we will store the project files:

```
mkdir js-mysql-connect && cd js-mysql-connect
```

Initialize the project:

```
npm init -y
```

Install the mysql2 library:

```
npm install mysql2
```

Use the following code to connect to MySQL:

```javascript
const mysql = require('mysql2');

const connection_to_mysql = mysql.createConnection({
  host: 'localhost',
  user: 'root',
  password: 'PasswordForRoot74463',
  database: 'db1',
});

connection_to_mysql.connect((err) => {
  if (err) {
    console.error('Error connecting to MySQL:', err.message);
    return;
  }
  console.log('Successfully connected to MySQL Server!');

  connection_to_mysql.end((endErr) => {
    if (endErr) {
      console.error('Error closing the connection_to_mysql:', endErr.message);
    } else {
      console.log('Connection closed.');
    }
  });
});
```

PostgreSQL and JavaScript

Connecting to PostgreSQL is done using the pg library.
Create a directory where we will store the project files:

```
mkdir js-postgres-connect && cd js-postgres-connect
```

Initialize the project:

```
npm init -y
```

Install the pg library:

```
npm install pg
```

To connect to PostgreSQL, first import the pg library. Then create a configuration object with the database address, username, password, database name, and port, and pass it to the pg.Client class. We will create a table called cities and add two records to it using the queryDatabase function, which contains the SQL queries:

```javascript
const pg = require('pg');

const config = {
  host: '91.206.179.29',
  user: 'gen_user',
  password: 'PasswordForGenUser56467$',
  database: 'default_db',
  port: 5432,
};

const client = new pg.Client(config);

client.connect(err => {
  if (err) throw err;
  else {
    queryDatabase();
  }
});

function queryDatabase() {
  const query = `
    DROP TABLE IF EXISTS cities;
    CREATE TABLE cities (id serial PRIMARY KEY, name VARCHAR(80), population INTEGER);
    INSERT INTO cities (name, population) VALUES ('Berlin', 3645000);
    INSERT INTO cities (name, population) VALUES ('Paris', 2161000);
  `;

  client
    .query(query)
    .then(() => {
      console.log('Table created successfully!');
      client.end(console.log('Closed client connection'));
    })
    .catch(err => console.log(err))
    .then(() => {
      console.log('Finished execution, exiting now');
      process.exit();
    });
}
```

Use this command to run the code:

```
node connect-to-postgres.js
```

Redis and JavaScript

To work with Redis, use the ioredis library. Create a directory to store the project files:

```
mkdir js-redis-connect && cd js-redis-connect
```

Initialize the project:

```
npm init -y
```

Install the ioredis library:

```
npm install ioredis
```

To connect to Redis, import the ioredis library. Then create a constant named redis and specify the Redis server address.
Inserting data, i.e., creating key-value objects, is done using an asynchronous function named setData, which takes two values, key and value, corresponding to the data format of the Redis system.

```javascript
const Redis = require('ioredis');

const redis = new Redis({
  host: '91.206.179.29',
  port: 6379,
  password: 'UY+p8e?Kxmqqfa',
});

async function setData(key, value) {
  try {
    await redis.set(key, value);
    console.log('Data successfully set');
  } catch (error) {
    console.error('Error setting data:', error);
  }
}

async function getData(key) {
  try {
    const value = await redis.get(key);
    console.log('Data retrieved');
    return value;
  } catch (error) {
    console.error('Error getting data:', error);
  }
}

(async () => {
  await redis.select(1);
  await setData('user', 'alex');
  await getData('user');
  redis.disconnect();
})();
```

Run:

```
node connect-to-redis.js
```

MongoDB and JavaScript

To work with MongoDB, use the mongodb driver. Create a directory for storing the project files:

```
mkdir js-mongodb-connect && cd js-mongodb-connect
```

Initialize the project:

```
npm init -y
```

Install the mongodb library:

```
npm install mongodb
```

To connect to MongoDB, import the mongodb library. Specify the database address in the constant uri and pass the address into the MongoClient class.

```javascript
const { MongoClient } = require('mongodb');

const uri = "mongodb://91.206.179.29:27017";
const client = new MongoClient(uri, { useNewUrlParser: true, useUnifiedTopology: true });

async function connectToDatabase() {
  try {
    await client.connect();
    console.log("Successfully connected to MongoDB!");

    const database = client.db("myDatabase");
    const collection = database.collection("myCollection");

    const documents = await collection.find({}).toArray();
    console.log("Documents found:", documents);
  } catch (error) {
    console.error("Error connecting to MongoDB:", error);
  } finally {
    await client.close();
    console.log("Connection closed.");
  }
}

connectToDatabase();
```

ClickHouse and JavaScript

To work with ClickHouse, use the @clickhouse/client driver.
Create a directory where we will store the project files:

```
mkdir js-clickhouse-connect && cd js-clickhouse-connect
```

Initialize the project:

```
npm init -y
```

Install the @clickhouse/client library:

```
npm install @clickhouse/client
```

To connect to ClickHouse, use the code below, where we set the connection details and execute a simple SQL query that returns the first 10 records from the system table system.tables:

```javascript
const { createClient } = require('@clickhouse/client');

const client = createClient({
  host: 'http://localhost:8123',
  username: 'default',
  password: 'PasswordforDefaultUser45435',
  database: 'default',
});

async function connectAndQuery() {
  try {
    const rows = await client
      .query({
        query: 'SELECT * FROM system.tables LIMIT 10',
        format: 'JSON',
      })
      .then((result) => result.json());

    console.log('Successfully connected to ClickHouse Server!');
    console.log('Query results:', rows);
  } catch (error) {
    console.error('Error connecting to ClickHouse or running the query:', error);
  } finally {
    console.log('Done.');
  }
}

connectAndQuery();
```

Conclusion

In today's article, we thoroughly explored how to connect to PostgreSQL, Redis, MongoDB, MySQL, and ClickHouse databases using Python, Go, and JavaScript. These languages can be used to create both web applications and microservices that utilize databases in their operation.
18 February 2025 · 23 min to read
Python

How to Parse HTML with Python

Parsing is the automatic search for various patterns (based on pre-defined structures) in text data sources to extract specific information. Although parsing is a broad term, it most commonly refers to the process of collecting and analyzing data from remote web resources.

In the Python programming language, you can create programs for parsing data from external websites using two key tools:

- A standard HTTP request package
- External HTML markup processing libraries

However, data processing capabilities are not limited to just HTML documents. Thanks to a wide range of external libraries in Python, you can organize parsing for documents of any complexity, whether they are arbitrary text, popular markup languages (e.g., XML), or even rare programming languages.

If there is no suitable parsing library available, you can implement it manually using low-level methods that Python provides by default, such as simple string searching or regular expressions. Although, of course, this requires additional skills.

This guide will cover how to organize parsers in Python. We will focus on extracting data from HTML pages based on specified tags and attributes. We run all the examples in this guide using the Python 3.10.12 interpreter on a Hostman cloud server with Ubuntu 22.04 and pip 22.0.2 as the package manager.

Structure of an HTML Document

Any document written in HTML consists of two types of tags:

- Opening: enclosed in less-than (<) and greater-than (>) symbols, e.g., <div>.
- Closing: the same, but with a forward slash (/), e.g., </div>.

Each tag can have various attributes, the values of which are written in quotes after the equals sign. Some commonly used attributes include:

- href: a link to a resource, e.g., href="https://hostman.com".
- class: the class of an object, e.g., class="surface panel panel_closed".
- id: the identifier of an object, e.g., id="menu".
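As a quick illustration of the low-level approach mentioned earlier, Python's standard re module can pull tag text and attribute values out of simple markup without any parsing library. This is only a naive sketch for small, well-formed fragments; real pages need a proper parser:

```python
import re

html = '<h1 class="h1_bright">This is a heading</h1><a href="https://hostman.com">Hostman</a>'

# Naive patterns: capture text between tags and the value of an href attribute.
# These break on nested or malformed tags, which is why parser libraries exist.
headings = re.findall(r'<h1[^>]*>(.*?)</h1>', html)
links = re.findall(r'href="([^"]*)"', html)

print(headings)  # ['This is a heading']
print(links)     # ['https://hostman.com']
```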
Each tag, with or without attributes, is an element (object) of the so-called DOM (Document Object Model) tree, which is built by practically any HTML interpreter (parser). This builds a hierarchy of elements, where nested tags are child elements to their parent tags.

For example, in a browser, we access elements and their attributes through JavaScript scripts. In Python, we use separate libraries for this purpose. The difference is that after parsing the HTML document, the browser not only constructs the DOM tree but also displays it on the monitor.

```html
<!DOCTYPE html>
<html>
<head>
    <title>This is the page title</title>
</head>
<body>
    <h1>This is a heading</h1>
    <p>This is a simple text.</p>
</body>
</html>
```

The markup of this page is built with tags in a hierarchical structure without specifying any attributes:

- html
  - head
    - title
  - body
    - h1
    - p

Such a document structure is more than enough to extract information. We can parse the data by reading the data between opening and closing tags. However, real website tags have additional attributes that specify both the specific function of the element and its special styling (described in separate CSS files):

```html
<!DOCTYPE html>
<html>
<body>
    <h1 class="h1_bright">This is a heading</h1>
    <p>This is simple text.</p>

    <div class="block" href="https://hostman.com/products/cloud-server">
        <div class="block__title">Cloud Services</div>
        <div class="block__information">Cloud Servers</div>
    </div>

    <div class="block" href="https://hostman.com/products/vps-server-hosting">
        <div class="block__title">VPS Hosting</div>
        <div class="block__information">Cloud Infrastructure</div>
    </div>

    <div class="block" href="https://hostman.com/services/app-platform">
        <div class="block__title">App Platform</div>
        <div class="block__information">Apps in the Cloud</div>
    </div>
</body>
</html>
```

Thus, in addition to explicitly specified tags, the required information can be refined with specific attributes, extracting only the necessary elements from the DOM tree.
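The way a parser walks these opening tags can be sketched with Python's standard html.parser module, which invokes a callback for every element it encounters. Here we simply collect tag names and their attributes (a minimal sketch, not a full DOM tree):

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Records each opening tag together with its attributes."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs, e.g. [('class', 'block')]
        self.tags.append((tag, dict(attrs)))

markup = '<div class="block"><div class="block__title">Cloud Services</div></div>'
collector = TagCollector()
collector.feed(markup)

print(collector.tags)
# [('div', {'class': 'block'}), ('div', {'class': 'block__title'})]
```

The high-level libraries reviewed below build on exactly this kind of event-driven parsing, but expose the result as a searchable tree instead of raw callbacks.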
HTML Data Parser Structure

Web pages can be of two types:

- Static: the HTML markup remains unchanged while the site is loaded and viewed. Parsing does not require emulating the browser's behavior.
- Dynamic: the HTML markup is modified by JavaScript while the site is loaded and viewed (single-page applications, SPA). Parsing requires emulating the browser's behavior.

Parsing static websites is relatively simple: after making a remote request, the necessary data is extracted from the received HTML document.

Parsing dynamic websites requires a more complex approach. After making a remote request, both the HTML document itself and the JavaScript scripts controlling it are downloaded to the local machine. These scripts usually perform several remote requests automatically, loading additional content and modifying the HTML document while the page is viewed. Because of this, parsing dynamic websites requires emulating the browser's behavior and user actions on the local machine. Without this, the necessary data simply won't load.

Most modern websites load additional content using JavaScript scripts in one way or another. The variety of technical implementations of modern websites is so large that they can't be classified as entirely static or entirely dynamic. Typically, general information is loaded initially, while specific information is loaded later.

Most HTML parsers are designed for static pages. Systems that emulate browser behavior to generate dynamic content are much less common.

In Python, libraries (packages) intended for analyzing HTML markup can be divided into two groups:

- Low-level processors: compact but syntactically complex packages that parse HTML (or XML) syntax and build a hierarchical tree of elements.
- High-level libraries and frameworks: large but syntactically concise packages with a wide range of features to extract formalized data from raw HTML documents.
This group includes not only compact HTML parsers but also full-fledged systems for data scraping. Often, these packages use low-level parsers (processors) from the first group as their core for parsing.

Several low-level libraries are available for Python:

- lxml: a low-level XML syntax processor that is also used for HTML parsing. It is based on the popular libxml2 library written in C.
- html5lib: a Python library for HTML syntax parsing, written according to the HTML specification by WHATWG (the Web Hypertext Application Technology Working Group), which is followed by all modern browsers.

However, using high-level libraries is faster and easier; they have simpler syntax and a wider range of functions:

- BeautifulSoup: a simple yet flexible library that parses HTML and XML documents by creating a full DOM tree of elements and extracting the necessary data.
- Scrapy: a full-fledged framework for parsing data from HTML pages, consisting of autonomous "spiders" (web crawlers) with pre-defined instructions.
- Selectolax: a fast HTML page parser that uses CSS selectors to extract information from tags.
- Parsel: a Python library with a specific selector syntax that allows you to extract data from HTML, JSON, and XML documents.
- requests-html: a Python library that closely mimics browser CSS selectors written in JavaScript.

This guide will review several of these high-level libraries.

Installing the pip Package Manager

We can install all parsing libraries (as well as many other packages) in Python through the standard package manager, pip, which needs to be installed separately.

First, update the list of available repositories:

```
sudo apt update
```

Then, install pip using the APT package manager:

```
sudo apt install python3-pip -y
```

The -y flag will automatically confirm all terminal prompts during the installation.
To verify that pip was installed correctly, check its version:

```
pip3 --version
```

The terminal will display the pip version and the installation path:

```
pip 22.0.2 from /usr/lib/python3/dist-packages/pip (python 3.10)
```

As shown, this guide uses pip version 22.0.2.

Installing the HTTP Requests Package

Usually, the default Python interpreter includes the Requests package, which allows making requests to remote servers. We will use it in the examples of this guide. However, in some cases it might not be installed. Then you can manually install requests via pip:

```
pip install requests
```

If the system already has it, you will see the following message in the terminal:

```
Requirement already satisfied: requests in /usr/lib/python3/dist-packages (2.25.1)
```

Otherwise, the command will add requests to the list of packages available for import in Python scripts.

Using BeautifulSoup

To install BeautifulSoup version 4, use pip:

```
pip install beautifulsoup4
```

After this, the library will be available for import in Python scripts. However, it also requires the previously mentioned low-level HTML processors to work properly.

First, install lxml:

```
pip install lxml
```

Then install html5lib:

```
pip install html5lib
```

Later, you can specify one of these processors as the core parser for BeautifulSoup in your Python code.
Create a new file in your home directory:

```
nano bs.py
```

Add the following code:

```python
import requests
from bs4 import BeautifulSoup

# Request to the website 'https://hostman.com'
response = requests.get('https://hostman.com')

# Parse the HTML content of the page using the 'html5lib' parser
page = BeautifulSoup(response.text, 'html5lib')

# Extract the title of the page
pageTitle = page.find('title')
print(pageTitle)
print(pageTitle.string)
print("")

# Extract all <a> links on the page
pageParagraphs = page.find_all('a')

# Print the content of the first 3 links (if they exist)
for i, link in enumerate(pageParagraphs[:3]):
    print(link.string)

print("")

# Find all div elements with a class starting with 'socials--'
social_links_containers = page.find_all('div', class_=lambda c: c and c.startswith('socials--'))

# Collect the links from these divs
for container in social_links_containers:
    links = container.find_all('a', href=True)
    for link in links:
        href = link['href']
        # Ignore links related to Cloudflare's email protection
        if href.startswith('/cdn-cgi/l/email-protection'):
            continue
        print(href)
```

Now run the script:

```
python bs.py
```

This will produce the following console output:

```
<title>Hostman - Cloud Service Provider with a Global Cloud Infrastructure</title>
Hostman - Cloud Service Provider with a Global Cloud Infrastructure

Partners
Tutorials
API

https://wa.me/35795959804
https://twitter.com/hostman_com
https://www.facebook.com/profile.php?id=61556075738626
https://github.com/hostman-cloud
https://www.linkedin.com/company/hostman-inc/about/
https://www.reddit.com/r/Hostman_com/
```

Of course, instead of html5lib, you can specify lxml:

```python
page = BeautifulSoup(response.text, 'lxml')
```

However, it is best to use html5lib as the processor. Unlike lxml, which is specifically designed for working with XML markup, html5lib has full support for modern HTML5 standards.
Despite its concise syntax, the BeautifulSoup library does not support browser emulation, meaning it cannot dynamically load content.

Using Scrapy

The Scrapy framework is implemented in a more object-oriented manner. In Scrapy, website parsing is based on three core entities:

- Spiders: classes that contain information about parsing details for specified websites, including URLs, element selectors (CSS or XPath), and page browsing mechanisms.
- Items: variables for storing extracted data, which are more complex forms of Python dictionaries with a special internal structure.
- Pipelines: intermediate handlers for extracted data that can modify items and interact with external software (such as databases).

You can install Scrapy through the pip package manager:

```
pip install scrapy
```

After that, you need to initialize a parser project, which creates a separate directory with its own folder structure and configuration files:

```
scrapy startproject parser
```

Now, you can navigate to the newly created directory:

```
cd parser
```

Check the contents of the current directory:

```
ls
```

It has a general configuration file and a directory with project source files:

```
parser scrapy.cfg
```

Move to the source files directory:

```
cd parser
```

If you check its contents:

```
ls
```

You will see both special Python scripts, each performing its own function, and a separate directory for spiders:

```
__init__.py items.py middlewares.py pipelines.py settings.py spiders
```

Let's open the settings file:

```
nano settings.py
```

By default, most parameters are commented out with the hash symbol (#). For the parser to work correctly, you need to uncomment some of these parameters without changing the default values specified in the file:

- USER_AGENT
- ROBOTSTXT_OBEY
- CONCURRENT_REQUESTS
- DOWNLOAD_DELAY
- COOKIES_ENABLED

Each specific project will require a more precise configuration of the framework. You can find all available parameters in the official documentation.
After that, you can generate a new spider:

```
scrapy genspider hostmanspider hostman.com
```

After running the above command, the console should display a message about the creation of a new spider:

```
Created spider 'hostmanspider' using template 'basic' in module: parser.spiders.hostmanspider
```

Now, if you check the contents of the spiders directory:

```
ls spiders
```

You will see the source files for the new spider:

```
__init__.py  __pycache__  hostmanspider.py
```

Let's open the script file:

```
nano spiders/hostmanspider.py
```

And fill it with the following code:

```python
from pathlib import Path  # Package for working with files
import scrapy  # Package from the Scrapy framework


class HostmanSpider(scrapy.Spider):  # Spider class inherits from the Spider class
    name = 'hostmanspider'  # Name of the spider

    def start_requests(self):
        urls = ["https://hostman.com"]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        open("output", "w").close()  # Clear the content of the 'output' file
        someFile = open("output", "a")  # Open the 'output' file for appending
        dataTitle = response.css("title::text").get()  # Extract the title from the server response using a CSS selector
        dataA = response.css("a").getall()  # Extract all links from the server response using a CSS selector
        someFile.write(dataTitle + "\n\n")
        for i in range(3):  # Write only the first 3 links
            someFile.write(dataA[i] + "\n")
        someFile.close()
```

You can now run the created spider with the following command:

```
scrapy crawl hostmanspider
```

Running the spider will create an output file in the current directory.
To view the contents of this file, you can use:

```
cat output
```

The content of this file will look something like this:

```
Hostman - Cloud Service Provider with a Global Cloud Infrastructure

<a href="/partners/" itemprop="url" class="body4 medium nd-link-primary"><span itemprop="name">Partners</span></a>
<a href="/tutorials/" itemprop="url" class="body4 medium nd-link-primary"><span itemprop="name">Tutorials</span></a>
<a href="/api-docs/" itemprop="url" class="body4 medium nd-link-primary"><span itemprop="name">API</span></a>
```

More detailed information on extracting data using selectors (both CSS and XPath) can be found in the official Scrapy documentation.

Conclusion

Data parsing from remote sources in Python is made possible by two main components:

- A package for making remote requests
- Libraries for parsing data

These libraries range from simple ones, suitable only for parsing static websites, to more complex ones that can emulate browser behavior and, consequently, parse dynamic websites. In Python, the most popular libraries for parsing static data are:

- BeautifulSoup
- Scrapy

These tools, similar to JavaScript functions (e.g., getElementsByClassName() using CSS selectors), allow us to extract data (attributes and text) from the DOM tree elements of any HTML document.
11 February 2025 · 13 min to read
Python

Dunder Methods in Python: Purpose and Application

Dunder methods (double underscore methods) are special methods in the Python programming language that are surrounded by two underscores at the beginning and end of their names. This naming convention is intended to prevent name conflicts with other user-defined functions.

Each dunder method corresponds to a specific Python language construct that performs a particular data transformation operation. Here are some commonly used dunder methods:

- __init__(): initializes an instance of a class, acting as a constructor.
- __repr__(): returns a representative value of a variable in Python expression format.
- __eq__(): compares two variables.

Whenever the Python interpreter encounters any syntactic construct, it implicitly calls the corresponding dunder method with all necessary arguments. For example, when Python encounters the addition symbol in the expression a + b, it implicitly calls the dunder method a.__add__(b), where the addition operation is performed internally. Thus, dunder methods implement the core mechanics of the Python language.

Importantly, these methods are accessible not only to the interpreter but also to ordinary users. Moreover, you can override the implementation of each dunder method within custom classes.

In this guide, we'll explore the existing dunder methods in Python and provide examples. The demonstrated scripts were run using the Python 3.10.12 interpreter installed on a Hostman cloud server running Ubuntu 22.04.

To run the examples from this article, place each script in a separate file with a .py extension (e.g., some_script.py). Then you can execute the file with the following command:

```
python some_script.py
```

Creation, Initialization, and Deletion

Creation, initialization, and deletion are the main stages of an object's lifecycle in Python. Each of these stages corresponds to a specific dunder method.
| Syntax | Dunder Method | Result | Description |
| --- | --- | --- | --- |
| a = C(b, c) | C.__new__(b, c) | C | Creation |
| a = C(b, c) | C.__init__(b, c) | None | Initialization |
| del a | a.__del__() | None | Deletion |

The general algorithm for these methods has specific characteristics:

- Creation: The __new__() method is called with a set of arguments. The first argument is the class of the object (not the object itself); its name is not regulated and can be arbitrary. The remaining arguments are the parameters specified when creating the object in the calling code. The __new__() method must return a class instance, i.e., a new object.
- Initialization: Immediately after __new__() returns a new object, the __init__() method is automatically called, and the created object is initialized inside it. The first argument is the object itself, passed as self; the remaining arguments are the parameters specified when creating the object. The name of the first argument is regulated and must be the keyword self.
- Deletion: Explicit deletion of an object using the del keyword triggers the __del__() method. Its only argument is the object itself, accessed through the self keyword.

Thanks to the ability to override the dunder methods responsible for an object's lifecycle, you can create unique implementations for custom classes:

```python
class Initable:
    instances = 0  # class variable, not an object variable

    def __new__(cls, first, second):
        print(cls.instances)
        cls.instances += 1
        return super().__new__(cls)  # call the base class's object creation method

    def __init__(self, first, second):
        self.first = first  # object variable
        self.second = second  # another object variable

    def __del__(self):
        print("Annihilated!")


inited = Initable("Initialized", 13)  # output: 0

print(inited.first)  # output: Initialized
print(inited.second)  # output: 13

del inited  # output: Annihilated!
```

Thanks to these hook-like methods, you can manage not only the internal state of an object but also external resources.
Comparison

Objects created in Python can be compared with one another, yielding either a positive or negative result. Each comparison operator is associated with a corresponding dunder method.

| Syntax | Dunder Method | Result | Description |
| --- | --- | --- | --- |
| a == b | a.__eq__(b) | bool | Equal |
| a != b | a.__ne__(b) | bool | Not equal |
| a > b | a.__gt__(b) | bool | Greater than |
| a < b | a.__lt__(b) | bool | Less than |
| a >= b | a.__ge__(b) | bool | Greater than or equal |
| a <= b | a.__le__(b) | bool | Less than or equal |
| hash(a) | a.__hash__() | int | Hashing |

Note that the `is` and `is not` operators compare object identity and do not call __eq__(). We can replace each comparison operation with the corresponding dunder method:

```python
a = 5
b = 6
c = "This is a regular string"

print(a == b)  # Output: False
print(a is b)  # Output: False ('is' compares identity, not equality)
print(a.__eq__(b))  # Output: False

print(a != b)  # Output: True
print(a is not b)  # Output: True
print(a.__ne__(b))  # Output: True
print(not a.__eq__(b))  # Output: True

print(a > b)  # Output: False
print(a < b)  # Output: True
print(a >= b)  # Output: False
print(a <= b)  # Output: True

print(a.__gt__(b))  # Output: False
print(a.__lt__(b))  # Output: True
print(a.__ge__(b))  # Output: False
print(a.__le__(b))  # Output: True

print(hash(a))  # Output: 5
print(a.__hash__())  # Output: 5
print(c.__hash__())  # Example output: 1745008793
```

The __ne__() method returns the inverted result of the __eq__() method. Because of this, there's often no need to redefine __ne__(), since the primary comparison logic is usually implemented in __eq__().

Additionally, some comparison operations implicitly performed by Python when manipulating list elements require hash computation. For this purpose, Python provides the special dunder method __hash__().
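When a class needs the full set of ordering operators, the standard library's functools.total_ordering decorator can derive the remaining ones automatically from __eq__() and a single ordering method. A minimal sketch with a hypothetical Version class (the class name and fields are illustrative, not from the article):

```python
from functools import total_ordering

@total_ordering
class Version:
    """Only __eq__ and __lt__ are written by hand; total_ordering derives the rest."""
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        return (self.major, self.minor) < (other.major, other.minor)

a = Version(1, 4)
b = Version(2, 0)

print(a < b)   # True
print(a <= b)  # True  (derived automatically)
print(a > b)   # False (derived automatically)
print(a >= b)  # False (derived automatically)
```

This saves writing __gt__(), __ge__(), and __le__() by hand, at the cost of a small dispatch overhead.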
By default, any user-defined class already implements the methods __eq__(), __ne__(), and __hash__():

class Comparable:
    def __init__(self, value1, value2):
        self.value1 = value1
        self.value2 = value2

c1 = Comparable(4, 3)
c2 = Comparable(7, 9)

print(c1 == c1)  # Output: True
print(c1 != c1)  # Output: False
print(c1 == c2)  # Output: False
print(c1 != c2)  # Output: True

print(c1.__hash__())  # Example output: -2146408067
print(c2.__hash__())  # Example output: 1076316

In this case, the default __eq__() method compares instances by identity, without considering the internal variables defined in the __init__() constructor. The same applies to the default __hash__() method, whose result is based on the object's identity and therefore varies between runs.

Python's mechanics are designed such that overriding the __eq__() method automatically removes the default __hash__() method:

class Comparable:
    def __init__(self, value1, value2):
        self.value1 = value1
        self.value2 = value2

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.value1 == other.value1 and self.value2 == other.value2
        return False

c1 = Comparable(4, 3)
c2 = Comparable(7, 9)

print(c1 == c1)  # Output: True
print(c1 != c1)  # Output: False
print(c1 == c2)  # Output: False
print(c1 != c2)  # Output: True

print(c1.__hash__())  # TypeError: __hash__ has been set to None
print(c2.__hash__())  # TypeError: __hash__ has been set to None

Therefore, overriding the __eq__() method requires also overriding the __hash__() method with a new hashing algorithm:

class Comparable:
    def __init__(self, value1, value2):
        self.value1 = value1
        self.value2 = value2

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.value1 == other.value1 and self.value2 == other.value2
        return False

    def __ne__(self, other):
        return not self.__eq__(other)

    def __gt__(self, other):
        return self.value1 + self.value2 > other.value1 + other.value2

    def __lt__(self, other):
        return self.value1 + self.value2 < other.value1 + other.value2

    def __ge__(self, other):
        return self.value1 + self.value2 >= other.value1 + other.value2

    def __le__(self, other):
        return self.value1 + self.value2 <= other.value1 + other.value2

    def __hash__(self):
        return hash((self.value1, self.value2))  # Returns the hash of a tuple of two numbers

c1 = Comparable(4, 3)
c2 = Comparable(7, 9)

print(c1 == c1)  # Output: True
print(c1 != c1)  # Output: False
print(c1 == c2)  # Output: False
print(c1 != c2)  # Output: True

print(c1 > c2)   # Output: False
print(c1 < c2)   # Output: True
print(c1 >= c2)  # Output: False
print(c1 <= c2)  # Output: True

print(c1.__hash__())  # Example output: -1441980059
print(c2.__hash__())  # Example output: -2113571365

Thus, by overriding comparison methods, you can use standard syntactic constructs with custom classes just as with built-in data types, regardless of their internal complexity.

Conversion in Python

In Python, all built-in types can be converted from one to another. Similar conversions can be added to custom classes, taking into account the specifics of their internal implementation.

Syntax        Dunder Method        Result      Description
str(a)        a.__str__()          str         String
bool(a)       a.__bool__()         bool        Boolean
int(a)        a.__int__()          int         Integer
float(a)      a.__float__()        float       Floating-point number
bytes(a)      a.__bytes__()        bytes       Byte sequence
complex(a)    a.__complex__()      complex     Complex number

By default, a custom class can be converted to only a few basic types:

class Convertible:
    def __init__(self, value1, value2):
        self.value1 = value1
        self.value2 = value2

some_variable = Convertible(4, 3)

print(str(some_variable))   # Example output: <__main__.Convertible object at 0x1229620>
print(bool(some_variable))  # Output: True

However, by overriding the corresponding dunder methods, you can implement conversions from a custom class to any built-in data type:

class Convertible:
    def __init__(self, value1, value2):
        self.value1 = value1
        self.value2 = value2

    def __str__(self):
        return str(self.value1) + str(self.value2)

    def __bool__(self):
        return self.value1 == self.value2

    def __int__(self):
        return self.value1 + self.value2

    def __float__(self):
        return float(self.value1) + float(self.value2)

    def __bytes__(self):
        return bytes(self.value1) + bytes(self.value2)

    def __complex__(self):
        return complex(self.value1) + complex(self.value2)

someVariable = Convertible(4, 3)

print(str(someVariable))      # Output: 43
print(bool(someVariable))     # Output: False
print(int(someVariable))      # Output: 7
print(float(someVariable))    # Output: 7.0
print(bytes(someVariable))    # Output: b'\x00\x00\x00\x00\x00\x00\x00'
print(complex(someVariable))  # Output: (7+0j)

Thus, implementing conversion dunder methods allows objects of custom classes to behave like built-in data types, enhancing their completeness and versatility.

Element Management in Python

Just like lists, any custom class in Python can be made iterable. Python provides corresponding dunder methods for retrieving and manipulating elements.

Syntax                 Dunder Method         Description
len(a)                 a.__len__()           Length
iter(a), for i in a    a.__iter__()          Iterator
a[b]                   a.__getitem__(b)      Retrieve an element
a[b]                   a.__missing__(b)      Retrieve a non-existent dictionary key
a[b] = c               a.__setitem__(b, c)   Set an element
del a[b]               a.__delitem__(b)      Delete an element
b in a                 a.__contains__(b)     Check whether an element exists
reversed(a)            a.__reversed__()      Elements in reverse order
next(a)                a.__next__()          Retrieve the next element

Even though the internal implementation of an iterable custom class can vary, element management is handled through Python's standard container interface rather than custom methods.
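Before the full example below, it may help to see that __iter__() is often written as a generator function. A minimal sketch (the Pair class here is a hypothetical illustration, not part of the original code):

```python
class Pair:
    def __init__(self, first, second):
        self.first = first
        self.second = second

    def __len__(self):
        # Called by the built-in len()
        return 2

    def __iter__(self):
        # A generator function: each yield produces the next element
        yield self.first
        yield self.second

p = Pair(1, 2)
print(len(p))   # 2
print(list(p))  # [1, 2]
```

Because __iter__() is a generator, Python builds the iterator object for us; there is no need to track an index manually as the more explicit example below does.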
class Iterable:
    def __init__(self, e1, e2, e3, e4):
        self.e1 = e1
        self.e2 = e2
        self.e3 = e3
        self.e4 = e4
        self.index = 0

    def __len__(self):
        length = 0  # renamed from "len" to avoid shadowing the built-in
        if self.e1:
            length += 1
        if self.e2:
            length += 1
        if self.e3:
            length += 1
        if self.e4:
            length += 1
        return length

    def __iter__(self):
        for i in range(4):  # yield each of the four slots in order
            if i == 0:
                yield self.e1
            if i == 1:
                yield self.e2
            if i == 2:
                yield self.e3
            if i == 3:
                yield self.e4

    def __getitem__(self, item):
        if item == 0:
            return self.e1
        elif item == 1:
            return self.e2
        elif item == 2:
            return self.e3
        elif item == 3:
            return self.e4
        else:
            raise IndexError("Out of range")

    def __setitem__(self, item, value):
        if item == 0:
            self.e1 = value
        elif item == 1:
            self.e2 = value
        elif item == 2:
            self.e3 = value
        elif item == 3:
            self.e4 = value
        else:
            raise IndexError("Out of range")

    def __delitem__(self, item):
        if item == 0:
            self.e1 = None
        elif item == 1:
            self.e2 = None
        elif item == 2:
            self.e3 = None
        elif item == 3:
            self.e4 = None
        else:
            raise IndexError("Out of range")

    def __contains__(self, item):
        if self.e1 == item:
            return True
        elif self.e2 == item:
            return True
        elif self.e3 == item:
            return True
        elif self.e4 == item:
            return True
        else:
            return False

    def __reversed__(self):
        return Iterable(self.e4, self.e3, self.e2, self.e1)

    def __next__(self):
        if self.index >= 4:
            self.index = 0
        if self.index == 0:
            element = self.e1
        if self.index == 1:
            element = self.e2
        if self.index == 2:
            element = self.e3
        if self.index == 3:
            element = self.e4
        self.index += 1
        return element

someContainer = Iterable(-2, 54, 6, 13)

print(someContainer.__len__())  # Output: 4

print(someContainer[0])  # Output: -2
print(someContainer[1])  # Output: 54
print(someContainer[2])  # Output: 6
print(someContainer[3])  # Output: 13

someContainer[2] = 117
del someContainer[0]

print(someContainer[2])  # Output: 117

for element in someContainer:
    print(element)  # Output: None, 54, 117, 13

print(117 in someContainer)  # Output: True

someContainerReversed = someContainer.__reversed__()
for element in someContainerReversed:
    print(element)  # Output: 13, 117, 54, None

print(someContainer.__next__())  # Output: None
print(someContainer.__next__())  # Output: 54
print(someContainer.__next__())  # Output: 117
print(someContainer.__next__())  # Output: 13
print(someContainer.__next__())  # Output: None

It's important to understand the difference between the __iter__() and __next__() methods, both of which facilitate iteration:

- __iter__() returns an iterator over the object's elements at a given point.
- __next__() returns one element at a time, tracking the position with an internal index.

A particularly interesting dunder method is __missing__(), which is only relevant in custom classes inherited from the base dictionary type dict. This method allows you to override the default dict behavior when attempting to retrieve a non-existent key:

class dict2(dict):
    def __missing__(self, item):
        return "Sorry but I don't exist..."

someDictionary = dict2(item1=10, item2=20, item3=30)

print(someDictionary["item1"])  # Output: 10
print(someDictionary["item2"])  # Output: 20
print(someDictionary["item3"])  # Output: 30
print(someDictionary["item4"])  # Output: Sorry but I don't exist...

Arithmetic Operations

Arithmetic operations are the most common type of data manipulation. Python provides corresponding syntactic constructs for addition, subtraction, multiplication, and division. Most often, left-handed methods are used, which perform the calculation on behalf of the left operand.

Syntax     Dunder Method         Description
a + b      a.__add__(b)          Addition
a - b      a.__sub__(b)          Subtraction
a * b      a.__mul__(b)          Multiplication
a / b      a.__truediv__(b)      Division
a % b      a.__mod__(b)          Modulus
a // b     a.__floordiv__(b)     Floor division
a ** b     a.__pow__(b)          Exponentiation

If the left operand does not know how to perform the operation, Python automatically falls back to a right-handed method, which calculates the value on behalf of the right operand. However, in this case, the operands must be of different types.
Syntax     Dunder Method          Description
a + b      a.__radd__(b)          Addition
a - b      a.__rsub__(b)          Subtraction
a * b      a.__rmul__(b)          Multiplication
a / b      a.__rtruediv__(b)      Division
a % b      a.__rmod__(b)          Modulus
a // b     a.__rfloordiv__(b)     Floor division
a ** b     a.__rpow__(b)          Exponentiation

It is also possible to override in-place arithmetic operations. In this case, the dunder methods do not return a new value but modify the existing variables of the left operand.

Syntax     Dunder Method          Description
a += b     a.__iadd__(b)          Addition
a -= b     a.__isub__(b)          Subtraction
a *= b     a.__imul__(b)          Multiplication
a /= b     a.__itruediv__(b)      Division
a %= b     a.__imod__(b)          Modulus
a //= b    a.__ifloordiv__(b)     Floor division
a **= b    a.__ipow__(b)          Exponentiation

By overriding the corresponding dunder methods, you can define specific behaviors for your custom class during arithmetic operations:

class Arithmetic:
    def __init__(self, value1, value2):
        self.value1 = value1
        self.value2 = value2

    def __add__(self, other):
        return Arithmetic(self.value1 + other.value1, self.value2 + other.value2)

    def __radd__(self, other):
        return Arithmetic(other + self.value1, other + self.value2)

    def __iadd__(self, other):
        self.value1 += other.value1
        self.value2 += other.value2
        return self

    def __sub__(self, other):
        return Arithmetic(self.value1 - other.value1, self.value2 - other.value2)

    def __rsub__(self, other):
        return Arithmetic(other - self.value1, other - self.value2)

    def __isub__(self, other):
        self.value1 -= other.value1
        self.value2 -= other.value2
        return self

    def __mul__(self, other):
        return Arithmetic(self.value1 * other.value1, self.value2 * other.value2)

    def __rmul__(self, other):
        return Arithmetic(other * self.value1, other * self.value2)

    def __imul__(self, other):
        self.value1 *= other.value1
        self.value2 *= other.value2
        return self

    def __truediv__(self, other):
        return Arithmetic(self.value1 / other.value1, self.value2 / other.value2)

    def __rtruediv__(self, other):
        return Arithmetic(other / self.value1, other / self.value2)

    def __itruediv__(self, other):
        self.value1 /= other.value1
        self.value2 /= other.value2
        return self

    def __mod__(self, other):
        return Arithmetic(self.value1 % other.value1, self.value2 % other.value2)

    def __rmod__(self, other):
        return Arithmetic(other % self.value1, other % self.value2)

    def __imod__(self, other):
        self.value1 %= other.value1
        self.value2 %= other.value2
        return self

    def __floordiv__(self, other):
        return Arithmetic(self.value1 // other.value1, self.value2 // other.value2)

    def __rfloordiv__(self, other):
        return Arithmetic(other // self.value1, other // self.value2)

    def __ifloordiv__(self, other):
        self.value1 //= other.value1
        self.value2 //= other.value2
        return self

    def __pow__(self, other):
        return Arithmetic(self.value1 ** other.value1, self.value2 ** other.value2)

    def __rpow__(self, other):
        return Arithmetic(other ** self.value1, other ** self.value2)

    def __ipow__(self, other):
        self.value1 **= other.value1
        self.value2 **= other.value2
        return self

a1 = Arithmetic(4, 6)
a2 = Arithmetic(10, 3)

add = a1 + a2
sub = a1 - a2
mul = a1 * a2
truediv = a1 / a2
mod = a1 % a2
floordiv = a1 // a2
pow = a1 ** a2

radd = 50 + a1
rsub = 50 - a2
rmul = 50 * a1
rtruediv = 50 / a2
rmod = 50 % a1
rfloordiv = 50 // a2
rpow = 50 ** a2

a1 -= a2
a1 *= a2
a1 /= a2
a1 %= a2
a1 //= a2
a1 **= a2

print(add.value1, add.value2)            # Output: 14 9
print(sub.value1, sub.value2)            # Output: -6 3
print(mul.value1, mul.value2)            # Output: 40 18
print(truediv.value1, truediv.value2)    # Output: 0.4 2.0
print(mod.value1, mod.value2)            # Output: 4 0
print(floordiv.value1, floordiv.value2)  # Output: 0 2
print(pow.value1, pow.value2)            # Output: 1048576 216

print(radd.value1, radd.value2)            # Output: 54 56
print(rsub.value1, rsub.value2)            # Output: 40 47
print(rmul.value1, rmul.value2)            # Output: 200 300
print(rtruediv.value1, rtruediv.value2)    # Output: 5.0 16.666666666666668
print(rmod.value1, rmod.value2)            # Output: 2 2
print(rfloordiv.value1, rfloordiv.value2)  # Output: 5 16
print(rpow.value1, rpow.value2)            # Output: 97656250000000000 125000

In real-world scenarios, arithmetic dunder methods are among the most frequently overridden. Therefore, it is good practice to implement left-handed and right-handed methods together.

Bitwise Operations

In addition to standard mathematical operations, Python allows you to override the behavior of custom classes during bitwise transformations.

Syntax     Dunder Method        Description
a & b      a.__and__(b)         Bitwise AND
a | b      a.__or__(b)          Bitwise OR
a ^ b      a.__xor__(b)         Bitwise XOR
a >> b     a.__rshift__(b)      Right shift
a << b     a.__lshift__(b)      Left shift

Similar to arithmetic operations, bitwise transformations can be performed on behalf of the right operand.

Syntax     Dunder Method        Description
a & b      a.__rand__(b)        Bitwise AND
a | b      a.__ror__(b)         Bitwise OR
a ^ b      a.__rxor__(b)        Bitwise XOR
a >> b     a.__rrshift__(b)     Right shift
a << b     a.__rlshift__(b)     Left shift

Naturally, bitwise operations can also be performed in place, modifying the left operand instead of returning a new object.

Syntax     Dunder Method        Description
a &= b     a.__iand__(b)        Bitwise AND
a |= b     a.__ior__(b)         Bitwise OR
a ^= b     a.__ixor__(b)        Bitwise XOR
a >>= b    a.__irshift__(b)     Right shift
a <<= b    a.__ilshift__(b)     Left shift

By overriding these dunder methods, any custom class can perform familiar bitwise operations on its contents seamlessly.
class Bitable:
    def __init__(self, value1, value2):
        self.value1 = value1
        self.value2 = value2

    def __and__(self, other):
        return Bitable(self.value1 & other.value1, self.value2 & other.value2)

    def __rand__(self, other):
        return Bitable(other & self.value1, other & self.value2)

    def __iand__(self, other):
        self.value1 &= other.value1
        self.value2 &= other.value2
        return self

    def __or__(self, other):
        return Bitable(self.value1 | other.value1, self.value2 | other.value2)

    def __ror__(self, other):
        return Bitable(other | self.value1, other | self.value2)

    def __ior__(self, other):
        self.value1 |= other.value1
        self.value2 |= other.value2
        return self

    def __xor__(self, other):
        return Bitable(self.value1 ^ other.value1, self.value2 ^ other.value2)

    def __rxor__(self, other):
        return Bitable(other ^ self.value1, other ^ self.value2)

    def __ixor__(self, other):
        self.value1 ^= other.value1  # fixed: the original mistakenly used |= here
        self.value2 ^= other.value2
        return self

    def __rshift__(self, other):
        return Bitable(self.value1 >> other.value1, self.value2 >> other.value2)

    def __rrshift__(self, other):
        return Bitable(other >> self.value1, other >> self.value2)

    def __irshift__(self, other):
        self.value1 >>= other.value1
        self.value2 >>= other.value2
        return self

    def __lshift__(self, other):
        return Bitable(self.value1 << other.value1, self.value2 << other.value2)

    def __rlshift__(self, other):
        return Bitable(other << self.value1, other << self.value2)

    def __ilshift__(self, other):
        self.value1 <<= other.value1
        self.value2 <<= other.value2
        return self

b1 = Bitable(5, 3)
b2 = Bitable(7, 2)

resultAnd = b1 & b2
resultOr = b1 | b2
resultXor = b1 ^ b2
resultRshift = b1 >> b2
resultLshift = b1 << b2

resultRand = 50 & b1
resultRor = 50 | b2
resultRxor = 50 ^ b1
resultRrshift = 50 >> b2
resultRlshift = 50 << b1

b1 &= b2
b1 |= b2
b1 ^= b2
b1 >>= b2
b1 <<= b2

print(resultAnd.value1, resultAnd.value2)          # Output: 5 2
print(resultOr.value1, resultOr.value2)            # Output: 7 3
print(resultXor.value1, resultXor.value2)          # Output: 2 1
print(resultRshift.value1, resultRshift.value2)    # Output: 0 0
print(resultLshift.value1, resultLshift.value2)    # Output: 640 12
print(resultRand.value1, resultRand.value2)        # Output: 0 2
print(resultRor.value1, resultRor.value2)          # Output: 55 50
print(resultRxor.value1, resultRxor.value2)        # Output: 55 49
print(resultRrshift.value1, resultRrshift.value2)  # Output: 0 12
print(resultRlshift.value1, resultRlshift.value2)  # Output: 1600 400

In addition to operations involving two operands, Python provides dunder methods for transformations involving a single operand.

Syntax     Dunder Method      Description
-a         a.__neg__()        Negation (unary minus)
~a         a.__invert__()     Bitwise inversion
+a         a.__pos__()        Unary plus

The + operator typically does not affect the value of the variable, so many classes override __pos__() to perform alternative transformations.

Object Information Extraction

Python offers several dunder methods to retrieve additional information about an object.

Syntax     Dunder Method      Description
str(a)     a.__str__()        Returns the object's value as a user-friendly string
repr(a)    a.__repr__()       Returns the object's formal representation

__str__() returns a user-friendly string representation of the variable's value. __repr__() returns a more detailed, code-like representation, often suitable for recreating the original object via eval(). It is therefore important for a custom class to be able to provide this additional information about itself.
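The distinction between the two is easy to see on a built-in type before applying it to a custom class; a short illustrative snippet:

```python
text = "hello\nworld"

print(str(text))   # user-friendly: the text prints across two lines
print(repr(text))  # code-like: 'hello\nworld' with quotes and escapes kept
```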
class Human:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __str__(self):
        return self.name + " (" + str(self.age) + " years old)"

    def __repr__(self):
        return "Human(" + repr(self.name) + ", " + str(self.age) + ")"

someHuman = Human("John", 35)
someOtherHuman = eval(repr(someHuman))  # recreates an equivalent object from its representation

print(str(someHuman))        # Output: John (35 years old)
print(repr(someHuman))       # Output: Human('John', 35)
print(str(someOtherHuman))   # Output: John (35 years old)
print(repr(someOtherHuman))  # Output: Human('John', 35)

Conclusion

A distinctive feature of Python dunder methods is the two underscore characters at the beginning and end of the name, which prevents naming conflicts with other user-defined functions. Unlike regular methods, dunder methods allow you to define unique behavior for a custom class when using standard Python operators responsible for:

- Arithmetic operations

- Iteration and access to elements

- Creation, initialization, and deletion of objects

Additional dunder attributes provide auxiliary information about Python program entities, which can simplify the implementation of custom classes.
10 February 2025 · 21 min to read
