Sign In
Sign In

SQL COUNT() Function

SQL COUNT() Function
Hostman Team
Technical writer
SQL
13.09.2024
Reading time: 10 min

SQL, also known as the Structured Query Language, is one of the most widely used tools for managing information in relational databases. It offers numerous functions and tools for analyzing and organizing data. One often overlooked but highly useful feature of SQL is the use of counters. The COUNT() function serves as this counter, allowing you to count the number of rows in a table or calculate the number of unique values in a specific column. Despite its apparent simplicity, COUNT() can sometimes pose challenges, especially when dealing with large datasets or unfamiliar data structures.

In this article, we'll explore how to use the COUNT() function in SQL to count rows. Through practical examples, we will see how this function can simplify database analysis. By mastering this material, you'll be able to use COUNT() effectively for data analysis and processing.

The COUNT() Function

A counter in an SQL table, or a "row counter," is a functionality that allows you to count the number of rows in a table that meet a specific condition. This can be useful in various situations, from determining the number of records in a table to more complex analytical queries that count unique records based on certain criteria.

One of the most common uses of the row counter in SQL is through the COUNT() function. This built-in function calculates the number of rows in a table or the number of unique values in a column. It can be applied to a single column or to the entire table.

The syntax for the COUNT() function in SQL is as follows:

SELECT COUNT(column_name)
FROM table_name
WHERE condition;
  • column_name: The name of the column whose rows you want to count.

  • table_name: The name of the table from which the data is taken.

  • condition: An optional condition to filter the rows before counting.

If your goal is to count all rows in a table, regardless of their specific values, you can use the * symbol instead of specifying a column name, as shown below:

SELECT COUNT(*)
FROM table_name;

Usage Examples

Let's look at several examples of how to use the COUNT() operator in SQL.

For this tutorial, we will use the SQLite database management system. You can download the DB Browser for SQLite, a free, open-source program for creating and editing SQLite databases.

Start the DB Browser for SQLite and go to the "SQL" tab, where we will write queries to create tables and count rows.

Counting Total Records in a Table

Suppose you have a table named Purchases that contains order information:

PurchaseID

BuyerID

Product

1

101

Apple

2

102

Banana

3

103

Cherry

4

101

Banana

5

104

Cherry

To create this table, use the following query:

CREATE TABLE Purchases (
    PurchaseID INTEGER PRIMARY KEY,
    BuyerID INTEGER,
    Product TEXT
);

After running the SQL query, go to the "DB Structure" tab to verify that the table was successfully created.

Now, let's populate the table with data by running the following query:

INSERT INTO Purchases (PurchaseID, BuyerID, Product) VALUES (1, 101, 'Apple');
INSERT INTO Purchases (PurchaseID, BuyerID, Product) VALUES (2, 102, 'Banana');
INSERT INTO Purchases (PurchaseID, BuyerID, Product) VALUES (3, 103, 'Cherry');
INSERT INTO Purchases (PurchaseID, BuyerID, Product) VALUES (4, 101, 'Banana');
INSERT INTO Purchases (PurchaseID, BuyerID, Product) VALUES (5, 104, 'Cherry');

Now, if you want to know the total number of orders, you can use a simple COUNT() with the * symbol:

SELECT COUNT(*)
FROM Purchases;

This will count and return the total number of records in the Purchases table. The result of this query will be 5.

Counting Unique Records in a Column

Suppose you have a Buyers table, and you want to know how many unique countries are represented among your customers' addresses:

BuyerID

FirstName

BuyerCountry

101

John

Canada

102

Alice

USA

103

Bob

USA

104

Diego

Mexico

105

Peter

Canada

Create the Buyers table as follows:

CREATE TABLE Buyers (
    BuyerID INTEGER PRIMARY KEY,
    FirstName TEXT,
    BuyerCountry TEXT
);

And fill it with data:

INSERT INTO Buyers (BuyerID, FirstName, BuyerCountry) VALUES (101, 'John', 'Canada');
INSERT INTO Buyers (BuyerID, FirstName, BuyerCountry) VALUES (102, 'Alice', 'USA');
INSERT INTO Buyers (BuyerID, FirstName, BuyerCountry) VALUES (103, 'Bob', 'USA');
INSERT INTO Buyers (BuyerID, FirstName, BuyerCountry) VALUES (104, 'Diego', 'Mexico');
INSERT INTO Buyers (BuyerID, FirstName, BuyerCountry) VALUES (105, 'Peter', 'Canada');

You can see that the BuyerCountry column includes several countries, but some are repeated (e.g., Canada). If you want to count the number of unique countries, you can use COUNT() with the DISTINCT keyword:

SELECT COUNT(DISTINCT BuyerCountry)
FROM Buyers;

This will count and return the number of unique countries in the BuyerCountry column of the Buyers table. The result of this query will be 2.

Counting with a Condition (Data Filtering)

Sometimes, you only want to count rows that meet a certain condition. Orders in this table are made by customers from different countries:

PurchaseID

BuyerID

Product

BuyerCountry

1

101

Apple

Canada

2

102

Banana

USA

3

103

Cherry

USA

4

101

Banana

Canada

5

104

Cherry

Canada

We already have this Purchases table, but it lacks the BuyerCountry column. Let’s add it:

ALTER TABLE Purchases
ADD BuyerCountry TEXT;

And populate it with data:

UPDATE Purchases
SET BuyerCountry = 'Canada'
WHERE BuyerID IN (101, 104);

UPDATE Purchases
SET BuyerCountry = 'USA'
WHERE BuyerID IN (102, 103);

Suppose you want to know how many orders were made by customers from Canada. You can use COUNT() with the WHERE condition:

SELECT COUNT(*)
FROM Purchases
WHERE BuyerCountry = 'Canada';

This SQL query will count and return only the orders made by customers from Canada. The result of this query will be 3.

Counting with Data Grouping (GROUP BY)

When you need to count records within a specific group, the GROUP BY function is used.

Example table Purchases:

PurchaseID

BuyerID

Product

1

101

Apple

2

102

Banana

3

103

Cherry

4

101

Banana

5

104

Cherry

6

101

Apple

Let’s modify the existing Purchases table by adding a new row:

INSERT INTO Purchases (PurchaseID, BuyerID, Product)
VALUES (6, 101, 'Apple');

For example, if you want to find out how many orders each customer has placed, you can run this query:

SELECT BuyerID, COUNT(*)
FROM Purchases
GROUP BY BuyerID;

This query groups the orders by BuyerID and counts the number of orders for each customer. The result will be:

BuyerID

COUNT(*)

101

3

102

1

103

1

104

1

Combining COUNT and AVG

Imagine you have a Ratings table with product ratings by users:

RatingID

ProductID

UserID

Rating

1

101

1

4.0

2

102

2

3.5

3

101

3

4.5

4

103

1

5.0

5

101

2

3.0

6

102

3

4.5

Create the table:

CREATE TABLE Ratings (
    RatingID INT PRIMARY KEY,
    ProductID INT,
    UserID INT,
    Rating DECIMAL(2,1)
);

INSERT INTO Ratings (RatingID, ProductID, UserID, Rating)
VALUES 
(1, 101, 1, 4.0),
(2, 102, 2, 3.5),
(3, 101, 3, 4.5),
(4, 103, 1, 5.0),
(5, 101, 2, 3.0),
(6, 102, 3, 4.5);

To find out how many ratings each product received and their average rating, you can run this query:

SELECT ProductID, COUNT(*), AVG(Rating)
FROM Ratings
GROUP BY ProductID;

The result will be:

ProductID

COUNT(*)

AVG(Rating)

101

3

3.83

102

2

4.0

103

1

5.0

Using COUNT with HAVING

If you want to filter the count results, use HAVING. Suppose you have a large Ratings table, and you want to find which products received more than 100 ratings:

SELECT ProductID, COUNT(*)
FROM Ratings
GROUP BY ProductID
HAVING COUNT(*) > 100;

In this case, the result will be 0, as no product has received more than 100 ratings in the current data. The HAVING clause is used after GROUP BY.

Using COUNT with JOIN

We have two tables: Orders and Customers.

Our Orders table:

OrderID

ProductID

CustomerID

1

201

1

2

202

1

3

203

2

4

204

3

You can create it like this:

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    ProductID INT,
    CustomerID INT
);

INSERT INTO Orders (OrderID, ProductID, CustomerID)
VALUES 
(1, 201, 1),
(2, 202, 1),
(3, 203, 2),
(4, 204, 3);

The Customers table:

CustomerID

FirstName

CustomerCountry

1

John

Canada

2

Jack

USA

3

Diego

Mexico

Create the Customers table:

CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    FirstName VARCHAR(255),
    CustomerCountry VARCHAR(255)
);


INSERT INTO Customers (CustomerID, FirstName, CustomerCountry)
VALUES 
(1, 'John', 'Canada'),
(2, 'Jack', 'USA'),
(3, 'Diego', 'Mexico');

To join the two tables and count how many orders were placed by customers from Canada, run the following query:

SELECT COUNT(*)
FROM Orders
LEFT JOIN Customers ON Customers.CustomerID = Orders.CustomerID
WHERE Customers.CustomerCountry = 'Canada';

This query will return the total number of orders made by customers from Canada.

Sorting Data with ORDER BY

Suppose you have an Employees table with employee details:

EmployeeID

FirstName

LastName

Salary

1

Diego

Alonso

55000

2

Peter

Smith

75000

3

Alexa

Brown

70000

4

James

Smith

80000

5

Maria

Alonso

90000

Create the Employees table:

CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    FirstName VARCHAR(255),
    LastName VARCHAR(255),
    Salary INT
);

INSERT INTO Employees (EmployeeID, FirstName, LastName, Salary)
VALUES 
(1, 'Diego', 'Alonso', 55000),
(2, 'Peter', 'Smith', 75000),
(3, 'Alexa', 'Brown', 70000),
(4, 'James', 'Smith', 80000),
(5, 'Maria', 'Alonso', 90000);

You want to know how many employees earn more than 60,000, grouped by salary, and sorted in descending order. Run the query:

SELECT Salary, COUNT(*)
FROM Employees
WHERE Salary > 60000
GROUP BY Salary
ORDER BY Salary DESC;

The result will be:

Salary

COUNT(*)

90000

1

80000

1

75000

1

70000

1

Counting with Multiple Conditions (AND/OR)

Suppose you have a Sales table with product sales data:

SaleID

Product

Price

Quantity

1

Apple

20

5

2

Banana

15

10

3

Cherry

10

15

4

Apple

20

20

5

Banana

15

25

Create the Sales table:

CREATE TABLE Sales (
    SaleID INT PRIMARY KEY,
    Product VARCHAR(255),
    Price INT,
    Quantity INT
);


INSERT INTO Sales (SaleID, Product, Price, Quantity)
VALUES 
(1, 'Apple', 20, 5),
(2, 'Banana', 15, 10),
(3, 'Cherry', 10, 15),
(4, 'Apple', 20, 20),
(5, 'Banana', 15, 25);

To find how many times the product Apple was sold in quantities less than 10 or Banana in quantities greater than 20, use this query:

SELECT COUNT(*)
FROM Sales
WHERE (Product = 'Apple' AND Quantity < 10) OR (Product = 'Banana' AND Quantity > 20);

This query will return 2 based on the specified conditions.

Key Points When Using COUNT()

  • Handling NULL Values: COUNT(column_name) excludes rows with NULL values. To include them, use COUNT(*).

  • Using DISTINCT: COUNT() with DISTINCT counts only unique values in a column.

  • GROUP BY: Often used with COUNT() to count the number of rows in each group.

  • Integer Results: COUNT() always returns an integer result, regardless of the argument’s data type.

  • Subqueries: COUNT() can be used in subqueries to count rows meeting specific criteria.

  • Triggers: COUNT() can be used in SQL triggers to count modified rows.

  • Large Data: For large datasets, COUNT() queries can take time, so optimization may be necessary.

  • JOIN Queries: Complex relationships in JOIN queries require a solid understanding of the database structure for accurate counting.

Conclusion

We've covered the main features of the COUNT() function and applied it to practical SQL examples. With this knowledge, you can effectively process and analyze data, regardless of its size or complexity.

Hostman provides a SQL database hosting for your needs.

SQL
13.09.2024
Reading time: 10 min

Similar

SQL

How To Use Nested Queries in SQL

Nested queries, usually referred to as subqueries, are a fundamental feature of SQL that empower users To execute advanced data retrieval and analysis. By embedding one query within another, tasks that might otherwise be challenging or unfeasible utilizing a unique query can be efficiently executed. This tutorial outlines the concept of nested queries through the use of a sample database and explores their various applications for extracting meaningful insights. Creating the Sample Database To demonstrate the potential of nested searches, assume a database called Company. It consists of two primary tables: Employees and Departments. The Employees table contains information about individual employees, while the Departments table provides data about the departments they are associated with. This structured setup serves as the foundation for demonstrating how several types of nested queries can address specific problems. -- Create the database called Company CREATE DATABASE Company ; USE Company ; -- Create the Departments table CREATE TABLE Departments ( department_id INT PRIMARY KEY, department_name VARCHAR(50), location VARCHAR(50) ); -- Insert data into Departments INSERT INTO Departments VALUES (101, 'Sales', 'New York'), (102, 'HR', 'Chicago'), (103, 'IT', 'San Francisco'); -- Create the Employees table CREATE TABLE Employees ( employee_id INT PRIMARY KEY, name VARCHAR(50), department_id INT, salary DECIMAL(10, 2), hire_date DATE, FOREIGN KEY (department_id) REFERENCES Departments(department_id) ); -- Insert data into Employees INSERT INTO Employees VALUES (1, 'Alice', 101, 60000, '2020-01-15'), (2, 'Bob', 102, 55000, '2018-03-22'), (3, 'Charlie', 101, 70000, '2019-11-01'), (4, 'David', 103, 50000, '2021-06-10'), (5, 'Eve', 102, 45000, '2017-07-19'); The tables should look like this: The Departments table The Employees table Applications of Nested Queries Single-Row Subqueries A frequent scenario for single-row subqueries is extracting employees' wages that surpass the company's overall average. In this scenario, an inner query computes the overall average wage, while an outer query retrieves the employees earning above this benchmark. Such queries are particularly effective for leveraging aggregate functions like AVG, MAX, or MIN. -- Select the name and salary of employees SELECT name, salary FROM Employees -- Where the salary is greater than the average salary of all employees WHERE salary > (SELECT AVG(salary) FROM Employees); Multi-Row Subqueries Another practical application involves listing employees who work in departments based in a specific location, such as New York. Here, the inner query identifies the relevant department IDs, and the outer query selects employees linked to these departments. Multi-row subqueries depend on operators like IN, ANY, or ALL to compare sets of values and reach the desired results.. -- Select the name of employees SELECT name FROM Employees -- Where the department ID is in the list of department IDs from the Departments table WHERE department_id IN (SELECT department_id FROM Departments WHERE location = 'New York'); Correlated Subqueries Correlated subqueries support more dynamic comparisons by tying the inner query to each row of the outer query. For locating employees earning more than the average wage within their respective departments, the inner query computes the department-specific average, and the outer query selects employees based on this criterion. While highly flexible, correlated subqueries could be computationally intensive. SELECT name FROM Employees e1 -- Where the salary is greater than the average salary of employees in the same department WHERE salary > (SELECT AVG(salary) FROM Employees e2 WHERE e1.department_id = e2.department_id); Subqueries in the FROM Clause Subqueries can be employed in the FROM clause to generate temporary result sets, often referred to as derived tables. For example, locating departments with average salaries above $50,000 entails calculating department-level salary averages in the inner query and filtering the results in the outer query. This approach is particularly useful for organizing intermediate data before applying further analysis. SELECT department_name, avg_salary FROM (SELECT department_id, AVG(salary) AS avg_salary FROM Employees GROUP BY department_id) AS avg_table -- Join the average salary table with the Departments table on department ID JOIN Departments ON avg_table.department_id = Departments.department_id -- Filter the results to include only departments with an average salary greater than 50,000 WHERE avg_salary > 50000; Data Validation with Nested Queries Nested queries are valuable for validating data integrity. For example, identifying employees associated with non-existent departments involves comparing employee department IDs against a list of valid IDs retrieved by the inner query. This technique helps ensure referential accuracy and detect potential anomalies in the data. SELECT name FROM Employees WHERE department_id NOT IN (SELECT department_id FROM Departments); Conditional Logic with Subqueries Combining subqueries with conditional logic allows for more nuanced insights. For example, to identify departments with at least one employee earning more than $60,000, the inner query selects relevant department IDs based on salary criteria, and the outer query gathers the related department names. This method highlights meaningful relationships between tables through filtering and comparison. SELECT DISTINCT department_name FROM Departments WHERE department_id IN ( -- Select the department ID from the Departments table SELECT department_id FROM Employees WHERE salary > 60000 ); Best Practices for Using Nested Queries Optimize for Performance: When working with huge datasets, evaluate the performance of nested searches and consider indexing commonly used columns to increase efficiency. Simplify Complex Queries: Prevent excessive nesting by leveraging common table expressions (CTEs) or temporary tables, which improve readability and simplify debugging. Validate Inner Queries: Run inner queries independently to ensure they produce the expected results before integrating them into outer queries. Utilize Joins Where Possible: In some scenarios, joins can achieve similar outcomes as nested queries but with better performance. For instance, filtering employees in specific departments can often be implemented using joins. Reduce Correlated Subqueries: Since correlated subqueries execute for each row in the outer query, consider replacing them with joins or CTEs to improve performance. Conclusion Nested queries are a versatile tool in SQL, offering solutions to sophisticated data retrieval challenges through advanced filtering, aggregation, and comparison techniques. Using the Company database as a reference, this discussion has showcased the utility of various types of nested queries in solving real-world problems. By practicing these techniques and adhering to best practices, you can enhance your SQL proficiency and craft efficient, maintainable queries. Hostman provides pre-configured and ready-to-use cloud SQL databases.
25 December 2024 · 6 min to read
SQL

SQL Constraints

When working with SQL tables, you'll often need to set constraints on the data types stored in a specific table. For instance, if you have a table with employee data, it's logical that some fields should not contain null values. You can apply such a constraint to the SQL values with a simple command. You can also require that entered values be unique or that data be checked against certain conditions. In this article, we'll look at how to do this and cover all possible types of constraints, but first, let's start with some terminology. What Are SQL Constraints? An SQL constraint is a rule that we apply to fields in SQL, determining which values are allowed and which are not. After adding a constraint, the program will check whether it's possible to insert, update, or delete data in the table based on the user-defined constraints. If not, the operation will not be executed, and the program will return an error. Now, let's explore all possible types of constraints in SQL databases, and for clarity, we'll provide examples that could be practically useful for you. Adding SQL Constraints You can create SQL constraints using the following commands: PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, and NOT NULL. NOT NULL Constraint The NOT NULL constraint ensures that a column must have a value for every record, meaning the value cannot be null. This prevents empty values from being stored in the column. Let's create a table with a column that has this constraint: CREATE TABLE Countries (Country VARCHAR(46) NOT NULL,Capital VARCHAR(46)) Here, we allow the name of the country's capital to be omitted, but the name of the country must always be provided. Let's try to add a record that violates this rule: INSERT INTO Countries VALUES (null, 'Madrid') It will result in this error: Column 'Country' cannot be null However, this record will not trigger an error because we did not prohibit leaving the capital name (Capital) column empty: INSERT INTO Countries VALUES ('Spain', null) The NOT NULL constraint can be useful for columns with contact information when we need to require the user to enter their email address or phone number, for example. Therefore, such mandatory fields often use the NOT NULL constraint to ensure that the user enters a specific value: CREATE TABLE Subscribers (SubscriberName VARCHAR(46) NOT NULL,SubscriberContact VARCHAR(46) NOT NULL,) In this case, we require users to obligatorily enter their name and email address, setting a 64-character limit for each field in the table. Specifying character limits for certain fields can also be useful to prevent the addition of invalid data. This operation is often applied for efficiency so the database size is not unnecessarily inflated. UNIQUE Constraint This constraint’s name fully reflects its essence. The UNIQUE constraint ensures that no two values in the specified column will be the same. Let’s take a look at a table where UNIQUE is used: CREATE TABLE Workers1 (WorkerName VARCHAR(46) NOT NULL,WorkerDate DATE,WorkerContact INTEGER UNIQUE) We created an employee table, where we will add the employee's name (the field cannot be empty, as we have already set the familiar NOT NULL constraint), the hire date (in date format, indicated by the DATE data type), and the phone number. The phone number must be unique, as indicated by the UNIQUE constraint. Let’s insert the following data into our table: INSERT INTO Workers1 VALUES ('John Smith', DATE '2018-05-10', 375123456789) Now, when trying to insert a row with the same phone number: INSERT INTO Workers1 VALUES ('James Smith', DATE '2020-06-11', 375123456789) The program will throw an error: Duplicate entry 375123456789 for key 'uniqueconstraint.WorkerContact' The UNIQUE constraint is ideal for columns that should not contain duplicate values. For example, each of us has a unique social security number. Therefore, if a table contains a column for SSN, it should use the UNIQUE constraint. This is necessary to avoid two people having the same numbers, which could be inserted by mistake or intentionally. CHECK Constraint The CHECK constraint is used to validate values based on a specific condition. Consider the following example: CREATE TABLE Customers1 (CustomerName1 VARCHAR(46),CustomerName2 VARCHAR(46),CustomerEmail VARCHAR(56),CustomerAge INTEGER CHECK (CustomerAge>17)) We have set an age constraint where the age must be greater than 17. Now, let’s see what happens when a customer enters the following data: INSERT INTO Customers1 VALUES ('John', 'Smith, '[email protected]', 17) That’s what we’ll see: Check constraint 'checkconstraint_chk_1' is violated The CHECK constraint can be used to implement custom rules. For example, if a table should only store data for adults, we could use the CHECK constraint for the CustomerAge column (CustomerAge > 17, as shown in the example above). Another example: if the table should only store data for Cyprus citizens, we could use CHECK for a new column CustomerCountry: CHECK (CustomerCountry = 'Cyprus'). PRIMARY KEY Constraint PRIMARY KEY is one of the table key constraints in SQL, specifically the primary key. It is used to create an identifier that corresponds to each row in the table. Additionally, a PRIMARY KEY can only apply to one column in a table (since it serves as the identifier). Therefore, each PRIMARY KEY value must be unique, and null values are not allowed in the column defined by the PRIMARY KEY. To better understand this, let’s look at the following example: CREATE TABLE Workers2 (id INTEGER PRIMARY KEY,WorkerName1 VARCHAR(46),WorkerName2 VARCHAR(46),WorkerAge INTEGER CHECK (WorkerAge>17)) As we can see, the PRIMARY KEY constraint allows us to assign an employee ID, so that each record can be referenced using a unique numeric key. Also, notice the familiar CHECK constraint in the age column. FOREIGN KEY constraint A FOREIGN KEY constraint creates a reference to a PRIMARY KEY from another table. Therefore, a column with a FOREIGN KEY points to the PRIMARY KEY column from another table, linking the current table to it through this constraint. To better understand what this key does, let’s look at an example of a FOREIGN KEY constraint linked to the PRIMARY KEY from the previously created table: CREATE TABLE WorkersTaxes (WorkerTax INTEGER,Worker_id INTEGER,FOREIGN KEY (Worker_id) REFERENCES Workers2(id)) So, we needed to create a table for calculating workers' taxes. To link this table (WorkersTaxes) with the workers' table (Workers2), we used a FOREIGN KEY reference, which identifies workers based on the PRIMARY KEY from the Workers2 table. This way, we achieved data consistency, and now each employee can easily be identified in both tables by the linked keys. Other constraints We should also note that the SQL Standard sometimes includes DEFAULT as a constraint; however, DEFAULT does not restrict the type of data entered, so technically it is not considered a constraint. Nevertheless, it is important to mention here because it enables a crucial function: setting default values when users do not input data. This can be useful, for example, to avoid potential errors when data is not entered. Let’s consider the following example: CREATE TABLE Customers2 (CustomerName1 VARCHAR(46) NOT NULL,CustomerName2 VARCHAR(46) NOT NULL,CustomerAge INTEGER DEFAULT 18,) Now, if the customer does not provide an age, it will be automatically set. And we required the customer to enter their first and last name using the already familiar NOT NULL constraint. We hope you now understand how to use each SQL constraint and the benefits they offer. Good luck with your work!
13 December 2024 · 6 min to read
SQL

How to Select Data in SQL

In the modern world, where information is becoming an increasingly valuable resource, databases (DBs) remain an integral part of any information system, and the ability to retrieve data from them with maximum efficiency becomes a decisive factor in successfully working with these systems. SQL (Structured Query Language) is a specialized programming language for managing records stored in relational databases. Within SQL, there are many operators and methods that allow developers to retrieve the required information from a DB. This article is a practical guide for those who want to learn how to select data from an SQL table. In this guide, we will explore the syntax of the SELECT statement, learn how to filter data using WHERE, and examine how to aggregate data using GROUP BY and HAVING. Basics of the SELECT Statement SQL, being an incredibly flexible language for managing data, offers many tools for working with information stored in databases. One of the most important and widely used tools is the SELECT statement, which allows users to retrieve information from a DB. This statement allows us to select the specific columns from a table and apply various operations to the data. The syntax of the SELECT statement is simple and easy to understand. It begins with the keyword SELECT, followed by a list of columns from which we will retrieve data and the table name from which we will extract the data. Here’s what it looks like in practice: SELECT field1, field2FROM data_table; In this example, field1 and field2 are the specific columns we want to retrieve, and data_table is the name of the table from which we want to fetch the data. You can use the SELECT statement in many different ways. For example, if the task is to select all columns from a particular table, we can use the * symbol, which serves as a wildcard for all columns: SELECT * FROM StaffMembers; This query will return all the data contained in the StaffMembers table. In addition, we can use SELECT to retrieve only unique values from a specific column, excluding duplicate entries, which is especially useful when analyzing data: SELECT DISTINCT DivisionIDFROM StaffMembers; In this example, the query returns a list of unique DivisionID values from the StaffMembers table, removing all duplicate entries. The SELECT statement also allows the use of various aggregation functions, such as COUNT, SUM, AVG, and others. These functions are key for performing aggregate operations that help analyze large volumes of data to obtain totals, averages, or other types of aggregate statistics. For example, we can use the COUNT function to count the number of rows in a table: SELECT COUNT(StaffID)FROM StaffMembers; This query will return the total number of employees. Similarly, we can use other aggregation functions to calculate sums, averages, and other aggregate statistics for the data. Another useful operator is ORDER BY, which orders the results of a query according to specific criteria. This operator allows us to sort data either in ascending (ASC) or descending (DESC) order. If we do not specify the sort order explicitly, ascending order will be used by default. Here's how it looks in practice: SELECT *FROM StaffMembersORDER BY Surname DESC; In this example, the query results will be presented in sorted order, where the data will be sorted by employees' surnames in reverse alphabetical order, from the last name in the alphabet to the first. The SELECT statement plays an important role in SQL as it determines which specific data will be included in the query results. It can be used in conjunction with other operators, so let’s move on to the next key SQL query component—the WHERE clause, which allows us to set specific conditions for data selection. Using WHERE to Filter Data The WHERE clause in SQL provides data filtering based on specified conditions, allowing you to retrieve, update, or delete only the data that meets certain criteria. Without the WHERE clause, we would be forced to extract all data from the table and then manually filter it, for example, in an application, to perform specific tasks. This would be highly inefficient, especially for large databases. The WHERE clause can be used with various operators such as equality (=), inequality (<>), greater than (>), less than (<), greater than or equal to (>=), less than or equal to (<=), as well as more specialized operators like BETWEEN, which allows you to specify a range of values, LIKE, designed for pattern matching, and IN, which allows you to select data from a specific set. Let’s look at some examples of using WHERE to filter data. Using the WHERE clause with the equality operator (=): SELECT *FROM StaffMembers WHERE StaffID = 123456; In this case, the equality operator selects the record where the employee ID matches the number 123456. This is the simple equivalent of the equality operator in mathematics. Using WHERE with the greater than (>) or less than (<) operators: SELECT *FROM StaffMembers WHERE Wage > 60000; Here, the > operator is used, which filters out unnecessary data and returns information about employees whose wage exceeds 60,000. This operator can be useful when searching for records with values above or below a certain threshold. Using WHERE with BETWEEN: SELECT *FROM StaffMembers WHERE Wage BETWEEN 60000 AND 80000; The BETWEEN operator allows you to select records that fall within a specified range of values. In this case, it selects all employees whose salary is between 60,000 and 80,000, inclusive. This is useful when you need to extract a specific range of values. Using WHERE with LIKE and Wildcard Characters: SELECT *FROM StaffMembers WHERE StaffName LIKE '%ja%'; The LIKE operator is used for pattern matching. In SQL databases, two wildcard characters can represent patterns: % replaces zero or more characters, and _ replaces exactly one character. In this specific example, the query returns all records from the StaffMembers table where the staff names contain "ja." This approach is often used when the exact value is not known, or when multiple matches need to be found. These are just a few examples of the possibilities with WHERE in SQL. The variety of combinations and operators makes it a powerful tool when working with data. Next, we will look at the AND, OR, and NOT operators, which are commonly used together with WHERE to create more complex queries for databases. Using the AND, OR, and NOT Operators AND, OR, and NOT are key logical operators in SQL. They are used to combine or invert conditions in SQL operators such as WHERE, HAVING, and others. The AND operator is used to create a query that returns true only when both conditions being compared are true. Let’s consider an example: SELECT * FROM StaffMembers WHERE Wage > 60000 and ExperienceYears > 3; In this case, the AND operator links two selection criteria: a wage greater than 60,000 and more than 3 years of experience. The result of this query will be records from the table that satisfy both conditions simultaneously. The OR operator returns true if at least one of the conditions is true: SELECT * FROM StaffMembers WHERE Division = 'Production' OR Division = 'Advertising'; Here, the OR operator connects two selection conditions. The query will return records from the StaffMembers table where the employee belongs to either the 'Production' or 'Advertising' department. The NOT operator inverts the logical value of a condition, returning true if the condition is false, and false if the condition is true: SELECT * FROM StaffMembers WHERE NOT (Division = 'HR'); In this query, the NOT operator inverts the condition Division = 'HR'. The query will return all rows from the StaffMembers table where the department is not 'HR'. This allows you to create queries that exclude certain categories of data. These operators can be combined in any way to create complex conditions. For example: SELECT * FROM StaffMembers WHERE (Division = 'Production' OR Division = 'Advertising') AND ExperienceYears > 5; Here, the AND and OR operators are combined to create a more complex selection condition. The query will return only those records from the StaffMembers table where the department is either 'Production' or 'Advertising' and the employee has more than five years of experience. Aggregating Data with GROUP BY and HAVING In SQL, GROUP BY and HAVING are often used together to aggregate data and calculate various statistical measures based on grouping data by predefined criteria. Let’s take a closer look at the GROUP BY operator. It is used to group rows in the result set by the values of a specific column or group of columns.  After the grouping, we can use aggregation functions like COUNT, SUM, AVG, and others to calculate statistical data for each individual group. Example: SELECT ClientID, COUNT(PurchaseID)FROM PurchasesGROUP BY ClientID; In this example, we count the total number of purchases (PurchaseID) made by each client (ClientID). The HAVING operator is similar to WHERE, but the key difference is that HAVING is applied after the grouping has been done using GROUP BY. The main purpose of HAVING is to filter groups based on already computed aggregate values. This allows us to display only those groups that meet the criteria we set. For instance, after performing the aggregation, we can filter to only groups meeting a certain threshold. Example: SELECT ClientID, COUNT(PurchaseID)FROM PurchasesGROUP BY ClientIDHAVING COUNT(PurchaseID) > 3; In this example, we only see clients (ClientID) whose total number of orders exceeds three. Note that HAVING is applied in SQL queries exclusively after using GROUP BY. You cannot use HAVING without first grouping the data using GROUP BY. In general, the order of operations in SQL looks like this: FROM: Specify the data source. WHERE: Filter data before grouping. GROUP BY: Group rows into sets based on column values. HAVING: Filter groups after they’ve been created. SELECT: Specify which columns will appear in the query result. ORDER BY: Sort the results in the desired order. This sequence reflects the logic of query processing in SQL. The filtering conditions through WHERE are applied before grouping, which helps reduce the volume of data being processed. Conditions defined in HAVING apply to already formed data groups, allowing for more detailed analysis. The GROUP BY and HAVING operators are essential tools for data aggregation in SQL.  Their use provides extensive data analysis capabilities, allowing statistical data collection and the identification of patterns, trends, and relationships within the data. Using JOIN to Combine Tables Often, developers need to select data from two SQL tables. To accomplish this, the JOIN operator is used, allowing data from two or more sources to be combined based on matching values in specific columns. Tables in a database usually have linking columns that correlate with keys in other tables, thus enabling the linking of data. This allows for automatic synchronization of changes across related tables, which is an invaluable advantage when working with large databases where information is split across multiple tables. The structure of a query using JOIN looks like this: SELECT dataField(s)FROM tableAJOIN tableBON tableA.dataField = tableB.dataField; In this case, JOIN is used to combine two tables (tableA and tableB). The join is performed based on a common column (dataField). Additionally, the query includes the selection of specific columns (dataField(s)) that the developer wants to display in the final result. It is important to note that in SQL, there are different types of table joins, including: INNER JOIN: This allows us to retrieve only those rows that have matching records in both tables, meaning where the join condition is met: SELECT Purchases.PurchaseID, Clients.ClientNameFROM PurchasesINNER JOIN ClientsON Purchases.ClientID = Clients.ClientID; LEFT (OUTER) JOIN: This is used when we need to retrieve all rows from the left table (the one specified first in the query), and the matching rows from the right table. If there are no matching rows in the right table, the results for those rows will contain NULL values: SELECT Clients.ClientName, Purchases.PurchaseIDFROM ClientsLEFT JOIN PurchasesON Clients.ClientID = Purchases.ClientID; RIGHT (OUTER) JOIN: This works similarly to the LEFT JOIN, but in reverse. Here, we get all the records from the right table, supplemented with matching data from the left table. If no matches are found for records from the right table, NULL will be placed in the columns for the left table: SELECT Clients.ClientName, Purchases.PurchaseIDFROM ClientsRIGHT JOIN PurchasesON Clients.ClientID = Purchases.ClientID; FULL (OUTER) JOIN: This type of join gives us all rows from both tables that have corresponding records. In other words, it combines the LEFT and RIGHT JOINs. If there are rows in the first table with no matching rows in the second table, the columns from the second table will contain NULL for those rows. Similarly, if records from the second table do not have matches in the first table, the columns from the first table will contain NULL for those rows: SELECT Clients.ClientName, Purchases.PurchaseIDFROM ClientsFULL OUTER JOIN PurchasesON Clients.ClientID = Purchases.ClientID; It is worth noting that although the FULL (OUTER) JOIN is a standard SQL feature, not all SQL systems support it. For example, MySQL does not have built-in support for FULL (OUTER) JOIN, but you can emulate it using a combination of LEFT JOIN and UNION: SELECT Clients.ClientName, Purchases.PurchaseID FROM Clients LEFT JOIN Purchases ON Clients.ClientID = Purchases.ClientID UNION SELECT Clients.ClientName, Purchases.PurchaseID FROM Purchases LEFT JOIN Clients ON Clients.ClientID = Purchases.ClientID WHERE Clients.ClientID IS NULL; This query first performs a left outer join, attaching records from the Purchases table to the Clients table. Then, it joins records from Clients to Purchases that were not selected in the first query (i.e., those where ClientID is NULL). Finally, it combines the results of these two queries. In this section, we discussed different types of JOIN in SQL. Each of these joins provides flexibility in managing which data from related tables we want to see in the result set. Conclusion In this guide, we explored the use of SQL operators such as SELECT, WHERE, ORDER BY, JOIN, GROUP BY, and HAVING through practical examples. These operators offer users extensive capabilities for processing information, enabling complex analytical queries and extracting maximum value from stored data. We hope that you now have a clear understanding of how to use SQL to extract data from a database!
13 December 2024 · 12 min to read

Do you have questions,
comments, or concerns?

Our professionals are available to assist you at any moment,
whether you need help or are just unsure of where to start.
Email us
Hostman's Support