When working with SQL (Structured Query Language), you may encounter an object called an "index." In this article, we'll discuss indexes and practical examples of working with them.
An index is an object created for one or more columns in a database table to enhance performance, specifically to speed up searching and retrieving the necessary data from the database.
To understand what an index is, consider a real-life analogy — a book. A book has a table of contents or an index, which helps us quickly find the section we need. Without an index, we'd have to search for the section manually, which would take much longer. Database indexing works similarly by enabling fast data retrieval.
An index in SQL is a structure built for table columns and view objects. It consists of keys constructed from one or more columns in a table. The keys are stored in a balanced tree structure (a tree-like structure designed for quick data access). This structure starts from a root node at the top of the hierarchy and continues to leaf nodes at the bottom.
The key advantage of indexes is the speed of data lookup, achieved because the index is built from a structure (a balanced tree) optimized for searching.
It's also important to note that whenever new data is added, or old data is removed, the tree structure of the index is recalculated. This means the more data and indexes stored in a database, the more trees need to be recalculated. For example, if a table has five indexes and 10,000 records, each new entry will trigger the recalculation of all five indexes.
There are several types of indexes:
Unique Index: Ensures all values are unique. Adding a duplicate value will cause an error.
Non-Unique Index: Allows duplicate values.
Simple Index:Built on a single column.
Composite Index: Built on multiple columns, and the column order matters.
B-tree Index: Represented by a root node and leaf nodes.
Partial Index: Created from a subset of table rows based on a specific condition.
There are also clustered and non-clustered indexes. Let's examine these more closely using PostgreSQL 15 as an example.
A clustered index in SQL sorts the data rows in a table and stores the data within the index's leaves. The key feature of a clustered index is that all the values are sorted in a specific order, either ascending or descending. Data in a table is physically sorted only if the table has a clustered index.
With a clustered index, data is physically organized on a disk, speeding up data retrieval when accessed sequentially. However, reorganizing the data can be costly and may require rebuilding. Unlike other indexes, a clustered index is created for the entire table, not just one or more columns. A table can have only one clustered index.
A non-clustered index is applied to non-key columns and does not physically order the data. Instead, it stores the index separately from the actual data. The leaves of a non-clustered index contain only the columns included in the index. This means additional operations are required to retrieve the necessary data. Non-clustered indexes can't be sorted like clustered ones, but multiple non-clustered indexes can be created for a table. You can also add unique constraints to a non-clustered index.
Clustered Index |
Non-Clustered Index |
Sorts and physically stores data according to the sort rule |
Does not physically sort the data; uses pointers to access data |
Data is stored in the leaf nodes of the index |
Does not store data in the leaf nodes |
Occupies more disk space |
Occupies less disk space |
Faster access to data |
Slower access to data |
No additional disk space needed |
Requires additional disk space for index storage |
Increases performance for data retrieval |
Applied only to columns used in joins or queries |
Here are examples using PostgreSQL 15. Let’s create a table named movies
that contains information about films. The table includes two columns: id
(the unique identifier for each film) and title
(the film's name). The SQL code to create this table is as follows:
CREATE TABLE movies (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL
);
To create an index for the id
column, use:
CREATE INDEX cluster_id ON movies (id);
You can then create a clustered index using the following command:
CLUSTER movies USING cluster_id;
This clustered index speeds up data retrieval.
To create a non-clustered index for the title
column:
CREATE INDEX non_clustered_index ON movies (title);
Consider a table orders
that stores information about orders in an online store. We'll create this table with several necessary columns:
CREATE TABLE orders (
order_id INT PRIMARY KEY,
client_id INT,
client_name VARCHAR(100) NOT NULL,
client_address VARCHAR(255) NOT NULL,
client_city VARCHAR(100) NOT NULL,
client_country VARCHAR(100) NOT NULL,
client_ip_address inet
);
To create an index, use the CREATE INDEX
command. The general syntax is:
CREATE INDEX <index_name> ON <table_name> (<column1>, <column2>);
For example, to create an index for the order_id
column in the orders
table:
CREATE INDEX index_for_order ON orders (order_id);
You can create indexes for multiple columns:
CREATE INDEX index_for_order ON orders (order_id, client_id);
To create a unique index:
CREATE UNIQUE INDEX index_for_order ON orders (order_id);
To create a partial index:
CREATE INDEX clients_ip ON orders (client_ip_address)
WHERE NOT (client_ip_address > '10.26.74.1' AND client_ip_address < '10.26.74.2');
You can delete indexes with the DROP INDEX
command:
DROP INDEX index_for_order;
To rename an index:
ALTER INDEX index_for_order RENAME TO new_index_for_order;
While indexes are useful, there are rules to follow for their effective use:
Avoid indexes in small tables.
Avoid indexes in tables with frequent data changes (inserts/updates).
Avoid indexes on columns that will undergo complex data selection queries.
Avoid indexes on columns that frequently contain NULL values.
Use indexes on columns that are frequently searched.
In this article, we explored what an SQL index is and how it is used in databases. Proper use of indexing in SQL can significantly improve the performance of queries in your database.