How to Use Grep and Regular Expressions in Linux
Hostman Team
Technical writer
Linux
11.02.2025
Reading time: 16 min

GREP (short for "global regular expression print") is one of the most popular utilities in the Linux operating system.

With it, you can search for phrases (sequences of characters) in multiple files simultaneously using regular expressions and filter the output of other commands, keeping only the necessary information.

This guide will cover how to search for specific expressions in a set of text files with various contents using the GREP utility.

All examples shown were run on a cloud server hosted by Hostman running Ubuntu version 22.04.

How GREP Works

The GREP command follows this structure:

grep [OPTIONS] [PATTERN] [SOURCES]

Where:

  • OPTIONS: Special parameters (flags) that activate certain mechanisms in the utility related to searching for expressions and displaying results.

  • PATTERN: A regular expression (or plain string) containing the phrase (pattern, template, sequence of characters) you want to find.

  • SOURCES: The path to the files where we will search for the specified expression.
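
For example, once we create the sample files in the Preparation section below, a minimal call could look like this:

grep 'dreams' texts/poem

This prints every line of texts/poem that contains the word "dreams".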

If the GREP command is used to filter the output of another command, its structure looks a bit different:

[COMMAND] | grep [OPTIONS] [PATTERN]

Where:

  • COMMAND: An arbitrary command with its own set of parameters whose output needs to be filtered.

  • The "pipe" symbol (|) is necessary to create a command pipeline, redirecting streams so that the output of an arbitrary command becomes the input for the GREP command.

Preparation

To understand the nuances of using GREP, it's best to start with small examples of searching for specific phrases. Therefore, we will first create a few text files and then test the GREP command on them.

Let’s first prepare a separate directory where the search will take place:

mkdir texts

Next, create the first file:

nano texts/poem

It will contain one of Langston Hughes's poems:

Hold fast to dreams  
For if dreams die  
Life is a broken-winged bird  
That cannot fly.  
Hold fast to dreams  
For when dreams go  
Life is a barren field  
Frozen with snow.

Now, create the second file:

nano texts/code.py

It will contain a simple Python script:

from datetime import date

dateNow = date.today()
print("Current time:", dateNow)

Finally, create the third file:

nano texts/page.html

This one will have simple HTML markup:

<html>
	<head>
		<title>Some Title</title>
	</head>

	<body>
		<div class="block">
			<p>There's gold here</p>
		</div>

		<div class="block">
			<p>A mixture of wax and clouds</p>
		</div>

		<div class="block block_special">
			<p>Today there's nothing</p>
		</div>
	</body>
</html>

Using files of different formats will let us explore the full range of the utility's features.

Regular Expressions

Regular expressions are the foundation of the GREP command. Unlike a regular string, regular expressions contain special characters that allow you to specify phrases with a certain degree of variability.

When using the GREP utility, regular expressions are placed within single quotes:

'^date[[:alpha:]]*'

Thus, the full command can look like this:

grep '^date[[:alpha:]]*' texts/*

In this case, the console output will be:

texts/code.py:dateNow = date.today()

Double quotes, on the other hand, let the shell expand variables inside the expression. For example, you can first store the search pattern in an environment variable:

PATTERN="^date[[:alpha:]]*"

And then use it in the GREP command:

grep "$PATTERN" ./texts/*

Additionally, backticks let you substitute the output of another command into the GREP call. For example, you can read a regular expression from a pre-prepared file:

grep `cat somefile` ./texts/*

Note that with the asterisk symbol (wildcard), you can specify all the files in the directory at once. However, the GREP command also allows you to specify just one file: 

grep '^date[[:alpha:]]' texts/code.py 

Regular expressions are a universal language used in many operating systems and programming languages, so studying them in full is a vast topic of its own.

However, it makes sense to briefly cover the main special characters and their functions. It’s important to note that regular expressions in Linux can work in two modes: basic (Basic Regular Expression, BRE) and extended (Extended Regular Expression, ERE). The extended mode is activated with the additional flag -E. The difference between the two modes lies in the number of available special characters and, consequently, the breadth of available functionality.
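
As a quick illustration of the difference, consider the plus sign applied to the sample files created above. In basic mode it is treated as a literal character, so the first command below finds nothing (none of our files contain the text "en+"), while in extended mode it acts as a repetition quantifier and matches the lines containing "en":

grep 'en+' texts/poem
grep -E 'en+' texts/poem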

Basic Syntax

Basic syntax provides a limited set of special characters for describing the general shape of a pattern, such as anchoring it to a line or a word boundary.

Start of a line — ^

The caret symbol indicates that the sought sequence of characters must be at the beginning of the line:

grep '^Hold' texts/*

The console output will be as follows:

texts/poem:Hold fast to dreams
texts/poem:Hold fast to dreams

End of a line — $

The dollar sign indicates that the sought sequence of characters must be at the end of the line:

grep '</p>$' texts/*

Output:

texts/page.html:                        <p>There's gold here</p>
texts/page.html:                        <p>A mixture of wax and clouds</p>
texts/page.html:                        <p>Today there's nothing</p>

Note that the console output preserves the original representation of the found lines as they appear in the files.

Start of a word — \<

The backslash and less-than symbol indicate that the sought phrase must be at the beginning of a word:

grep '\<br' texts/*

Output:

texts/poem:Life is a broken-winged bird

End of a word — \>

The backslash and greater-than symbol indicate that the sought sequence of characters must be at the end of a word:

grep 'en\>' texts/*

Output:

texts/poem:Life is a broken-winged bird
texts/poem:For when dreams go
texts/poem:Life is a barren field
texts/poem:Frozen with snow.

Start or end of a word — \b

You can specify the start or end of a word using the more universal sequence of characters — backslash and the letter b.

For example, this marks the beginning:

grep '\bdie' texts/*

Output:

texts/poem:For if dreams die

And this marks the end:

grep '<div\b' texts/*

In this case, the console terminal output will be as follows:

texts/page.html:                <div class="block">
texts/page.html:                <div class="block">
texts/page.html:                <div class="block block_special">

Any character — .

Certain characters in the sought phrases can be left unspecified using the dot symbol:

grep '..ere' texts/*

Output:

texts/page.html:                        <p>There's gold here</p>
texts/page.html:                        <p>Today there's nothing</p>

Extended Syntax

Unlike basic syntax, extended syntax lets you control how many times characters repeat and combine several patterns at once, expanding the range of possible matches.

Combining patterns — |

To avoid running the GREP command multiple times, you can specify several patterns in a single regular expression:

grep -E '^Hold|</p>$' texts/*

The result of running this command will be a combined console output containing the search results for the two separate regular expressions shown earlier.

texts/page.html:                        <p>There's gold here</p>
texts/page.html:                        <p>A mixture of wax and clouds</p>
texts/page.html:                        <p>Today there's nothing</p>
texts/poem:Hold fast to dreams
texts/poem:Hold fast to dreams

Repetition range — {n,m}

In some cases, certain characters in the sought phrase may vary in quantity. Therefore, in the regular expression, you can specify a range of the allowed number of specific characters.

grep -E 'en{1,2}' texts/*

Output:

texts/code.py:print("Current time:", dateNow)
texts/poem:Life is a broken-winged bird
texts/poem:For when dreams go
texts/poem:Life is a barren field
texts/poem:Frozen with snow.

However, frequently used repetition intervals are more conveniently written as special characters, thus simplifying the appearance of the regular expression.

One or more repetitions — +

A repetition interval from one to infinity can be expressed using the plus sign:

grep -E 'en+' texts/*

In this case, the console output will not differ from the previous example.

texts/code.py:print("Current time:", dateNow)
texts/poem:Life is a broken-winged bird
texts/poem:For when dreams go
texts/poem:Life is a barren field
texts/poem:Frozen with snow.

Zero or one repetition — ?

A repetition interval from 0 to 1 can be expressed using the question mark:

grep -E 'ss?' texts/*

As a result, this command will produce the following output in the console terminal:

texts/page.html:                <div class="block">
texts/page.html:                        <p>There's gold here</p>
texts/page.html:                <div class="block">
texts/page.html:                        <p>A mixture of wax and clouds</p>
texts/page.html:                <div class="block block_special">
texts/page.html:                        <p>Today there's nothing</p>
texts/poem:Hold fast to dreams
texts/poem:For if dreams die
texts/poem:Life is a broken-winged bird
texts/poem:Hold fast to dreams
texts/poem:For when dreams go
texts/poem:Life is a barren field
texts/poem:Frozen with snow.

Character set — [abc]

Instead of one specific character, you can specify an entire set enclosed in square brackets:

grep -E '[Hh]o[Ll]' texts/*

Output:

texts/poem:Hold fast to dreams
texts/poem:Hold fast to dreams

Character range — [a-z]

We can replace a large set of allowed characters with a range written using a hyphen:

grep -E 'h[a-z]+' texts/*

Output:

texts/page.html:<html>
texts/page.html:        <head>
texts/page.html:        </head>
texts/page.html:                        <p>There's gold here</p>
texts/page.html:                        <p>Today there's nothing</p>
texts/page.html:</html>
texts/poem:That cannot fly.
texts/poem:For when dreams go

Moreover, character sets and ranges can be combined:

grep -E 'h[abcd-z]+' texts/*

Since [abcd-z] covers the same characters as [a-z], the output will be identical to the previous example.

Each range is implicitly transformed into a set of characters:

  • [a-e] into [abcde]
  • [0-6] into [0123456]
  • [a-eA-F] into [abcdeABCDEF]
  • [A-Fa-e] into [ABCDEFabcde]
  • [A-Fa-e0-9] into [ABCDEFabcde0123456789]
  • [a-dA-CE-G] into [abcdABCEFG]
  • [acegi-l5-9] into [acegijkl56789]

Character type — [:alpha:]

Frequently used ranges can be replaced with predefined character types, whose names are specified in square brackets with colons:

  • [:lower:] - characters from a to z in lowercase
  • [:upper:] - characters from A to Z in uppercase
  • [:alpha:] - all alphabetic characters
  • [:digit:] - all digit characters
  • [:alnum:] - all alphabetic characters and digits

Note that a character type is a separate syntactic construct: to use it, you must enclose it in an additional pair of square brackets, which denote a set or range of characters, as in [[:alpha:]]:

grep -E '[[:alpha:]]+ere' texts/*

Output:

texts/page.html:                        <p>There's gold here</p>
texts/page.html:                        <p>Today there's nothing</p>

Filtering Output

To filter the output of another command, you need to write a pipe symbol after it, followed by the standard call to the GREP utility, but without specifying the files to search:

cat texts/code.py | grep 'import'

Like when searching in regular files, the console output will contain the lines with the matches of the specified phrases:

from datetime import date

In this case, the cat command extracts the file content and passes it to the input stream of the GREP utility.
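
The same approach works with the output of any command. For instance, this pipeline keeps only the entries of our directory listing whose names end in .py or .html:

ls texts | grep -E '\.py$|\.html$'

In our case, it leaves only code.py and page.html.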

Search Options

In addition to regular expressions, you can pass the GREP command extra options (flags) that refine how the search is performed.

Extended Regular Expressions (-E)

Activates the extended regular expressions mode, allowing the use of more special characters.
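
All of the examples in the Extended Syntax section above rely on this flag. For instance, alternation only works when it is enabled:

grep -E 'dreams|snow' texts/poem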

Case Insensitivity (-i)

Performs a search for a regular expression without considering the case of characters:

grep -E -i 'b[ar]' texts/*

The console output corresponding to this command will be:

texts/poem:Life is a broken-winged bird
texts/poem:Life is a barren field

You can also specify flags together in a single string:

grep -Ei 'b[ar]' texts/*

Whole Word (-w)

Performs a search so that the specified regular expression is a complete word (not just a substring) in the found line:

grep -w and texts/*

Note that quotes are not required when specifying a regular string without special characters.

The result of this command will be:

texts/page.html: <p>A mixture of wax and clouds</p>

Multiple Expressions (-e)

To avoid running the command multiple times, you can specify several expressions at once:

grep -e 'Hold' -e 'html' texts/*

The result of this command will be identical to this one:

grep -E 'Hold|html' texts/*

In both cases, the console terminal will display the following output:

texts/page.html:<html>
texts/page.html:</html>
texts/poem:Hold fast to dreams
texts/poem:Hold fast to dreams

Recursive Search (-r)

Performs a recursive search in the specified directory, descending into all nested subdirectories:

grep -r '[Ff]ilesystem' /root

The console terminal will display output containing file paths at different nesting levels relative to the specified directory:

/root/parser/parser/settings.py:#HTTPCACHE_STORAGE = "scrapy.extensions.httpcache.FilesystemCacheStorage"
/root/resize.log:Resizing the filesystem on /dev/vda1 to 3931904 (4k) blocks.
/root/resize.log:The filesystem on /dev/vda1 is now 3931904 (4k) blocks long.
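
The same flag also works with the texts directory created earlier; with a recursive search there is no need for the wildcard, since GREP descends into the directory itself:

grep -r 'dreams' texts

The console output will contain only the matches from texts/poem:

texts/poem:Hold fast to dreams
texts/poem:For if dreams die
texts/poem:Hold fast to dreams
texts/poem:For when dreams go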

Search for Special Characters (-F)

Treats the pattern as a fixed string, so characters that normally have a special meaning are searched for literally:

grep -F '[' texts/*

Without this flag, GREP would report an error such as:

grep: Invalid regular expression

An alternative to this flag would be using the escape character in the form of a backslash (\):

grep '\[' texts/*
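
Since none of our sample files actually contains a square bracket, you can see a match by piping in an arbitrary string that does:

echo 'items[0] = 1' | grep -F '['

The console output will simply repeat the line, because it contains the literal [ character:

items[0] = 1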

Including Files (--include)

Limits the search to files whose names match the given pattern:

grep --include='*.py' 'date' texts/*

The console output will be:

texts/code.py:from datetime import date
texts/code.py:dateNow = date.today()
texts/code.py:print("Current time:", dateNow)

We can also write this command without the wildcard by using an additional recursive search flag:

grep -r --include='*.py' 'date' texts

Excluding Files (--exclude)

Selectively excludes certain files from the list of search sources:

grep --exclude='*.py' 'th' texts/*

The console output will be:

texts/page.html: <p>Today there's nothing</p>
texts/poem:Frozen with snow.

Output Options

Some parameters of the GREP command affect only the output of search results, improving their informativeness and clarity.

Line Numbers (-n)

To increase the informativeness of the GREP results, you can add the line numbers where the search phrases were found:

grep -n '</p>$' texts/*

Each line in the output will be supplemented with the corresponding line number:

texts/page.html:8:                      <p>There's gold here</p>
texts/page.html:12:                     <p>A mixture of wax and clouds</p>
texts/page.html:16:                     <p>Today there's nothing</p>

Lines Before (-B)

Displays a specified number of lines before the lines with found matches:

grep -B3 'mix' texts/*

The number after the flag sets how many preceding lines are displayed in the console terminal:

texts/page.html-                </div>
texts/page.html-
texts/page.html-                <div class="block">
texts/page.html:                        <p>A mixture of wax and clouds</p>

Lines After (-A)

Displays a specified number of lines after the lines with found matches:

grep -A3 'mix' texts/*

The number after the flag sets how many subsequent lines are displayed in the console terminal:

texts/page.html:                        <p>A mixture of wax and clouds</p>
texts/page.html-                </div>
texts/page.html-
texts/page.html-                <div class="block block_special">

Lines Before and After (-C)

Displays a specified number of lines both before and after the lines with found matches:

grep -C3 'mix' texts/*

The number after the flag sets how many surrounding lines are displayed on each side of the match:

texts/page.html-                </div>
texts/page.html-
texts/page.html-                <div class="block">
texts/page.html:                        <p>A mixture of wax and clouds</p>
texts/page.html-                </div>
texts/page.html-
texts/page.html-                <div class="block block_special">

Line Count (-c)

Instead of listing the found lines, the GREP command will output only the number of matching lines:

grep -c 't' texts/*

The console output will contain the number of matching lines for each specified file:

texts/code.py:3
texts/page.html:5
texts/poem:4

If only one file is specified as the source:

grep -c 't' texts/poem

The console output will contain only the number:

4

File Names (-l)

This flag allows you to output only the names of the files in which matches were found:

grep -l 't' texts/*

The console output will be as follows:

texts/code.py
texts/page.html
texts/poem

Limit Output (-m)

Limits the number of lines output to the console terminal to the number specified next to the flag:

grep -m2 't' texts/*

The console output will be:

texts/code.py:from datetime import date
texts/code.py:dateNow = date.today()
texts/page.html:<html>
texts/page.html:                <title>Some Title</title>
texts/poem:Hold fast to dreams
texts/poem:That cannot fly.

As you can see, the limit applies to each file separately rather than to the output as a whole.

Exact Match of Whole Line (-x)

Matches only lines that are exactly equal to the specified pattern, with nothing before or after it:

grep -x 'Life is a broken-winged bird' texts/*

The console output will be:

texts/poem:Life is a broken-winged bird
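
By contrast, a pattern that matches only part of the line produces no output with this flag:

grep -x 'Life is a broken' texts/*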

Conclusion

The GREP command is one of the most flexible and precise tools in Linux for searching for expressions in large volumes of text data.

When using the command, you need to specify the following elements:

  • A specific set of options (flags) that configure the search and output mechanisms.
  • One or more regular expressions that describe the search phrase.
  • A list of sources (files and directories) where the search will be performed.

Additionally, the utility is used to filter the output of other commands by redirecting input and output streams.

The core of the GREP command is regular expressions. Unlike a simple string, they allow you to define a phrase with a certain degree of variability, making it match multiple similar entries.

There are two modes of operation for regular expressions:

  • Basic Mode: A limited set of special characters that allow you to formalize expressions only in general terms.
  • Extended Mode: A full set of special characters that allows you to formalize expressions with precision down to each character.

The extended mode provides complete flexibility and accuracy when working with regular expressions.

In rare cases where you only need to find matches for trivial patterns, you can limit yourself to the basic mode.
