GREP (short for "global regular expression print") is one of the most popular utilities in the Linux operating system.
With it, you can search for phrases (sequences of characters) in multiple files simultaneously using regular expressions and filter the output of other commands, keeping only the necessary information.
This guide will cover how to search for specific expressions in a set of text files with various contents using the GREP utility.
All examples shown were run on a cloud server hosted by Hostman running Ubuntu version 22.04.
The GREP command follows this structure:
grep [OPTIONS] [PATTERN] [SOURCES]
Where:
OPTIONS
: Special parameters (flags) that activate certain mechanisms in the utility related to searching for expressions and displaying results.
PATTERN
: A regular expression (or plain string) containing the phrase (pattern, template, sequence of characters) you want to find.
SOURCES
: The path to the files where we will search for the specified expression.
If the GREP command is used to filter the output of another command, its structure looks a bit different:
[COMMAND] | grep [OPTIONS] [PATTERN]
Where:
COMMAND
: An arbitrary command with its own set of parameters whose output needs to be filtered.
The "pipe" symbol (|) is necessary to create a command pipeline, redirecting streams so that the output of an arbitrary command becomes the input for the GREP command.
To understand the nuances of using GREP, it's best to start with small examples of searching for specific phrases. Therefore, we will first create a few text files and then test the GREP command on them.
Let’s first prepare a separate directory where the search will take place:
mkdir texts
Next, create the first file:
nano texts/poem
It will contain one of Langston Hughes's poems:
Hold fast to dreams
For if dreams die
Life is a broken-winged bird
That cannot fly.
Hold fast to dreams
For when dreams go
Life is a barren field
Frozen with snow.
Now, create the second file:
nano texts/code.py
It will contain a simple Python script:
from datetime import date
dateNow = date.today()
print("Current time:", dateNow)
Finally, create the third file:
nano texts/page.html
This one will have simple HTML markup:
<html>
<head>
<title>Some Title</title>
</head>

<body>
<div class="block">
<p>There's gold here</p>
</div>

<div class="block">
<p>A mixture of wax and clouds</p>
</div>

<div class="block block_special">
<p>Today there's nothing</p>
</div>
</body>
</html>
Files in different formats let us exercise the full range of the utility's features and see more clearly what the GREP command does.
Regular expressions are the foundation of the GREP command. Unlike a regular string, regular expressions contain special characters that allow you to specify phrases with a certain degree of variability.
When using the GREP utility, regular expressions are placed within single quotes:
'^date[[:alpha:]]*'
Thus, the full command can look like this:
grep '^date[[:alpha:]]*' texts/*
In this case, the console output will be:
texts/code.py:dateNow = date.today()
However, double quotes let the shell expand variables inside the expression before GREP receives it. For example, you can first create a shell variable with the search expression:
PATTERN="^date[[:alpha:]]*"
And then use it in the GREP command:
grep "$PATTERN" ./texts/*
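Additionally, backticks perform command substitution: the shell runs the enclosed command and passes its output to the GREP command. For example, you can read a regular expression from a pre-prepared file. Wrapping the substitution in double quotes keeps a pattern that contains spaces in one piece:
grep "`cat somefile`" ./texts/*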
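The quoting rules above can be checked with a small self-contained sketch. The temporary files, the variable names, and the sample lines are assumptions made for illustration. The `-F` flag (fixed string) is used only to make the single-quoted case unambiguous, and `-f` is the standard option for reading patterns from a file.

```shell
# Scratch file with two sample lines (hypothetical data for illustration).
sample=$(mktemp)
printf '%s\n' 'dateNow = date.today()' 'price is $pattern' > "$sample"

pattern='^date[[:alpha:]]*'

# Double quotes: the shell expands $pattern before grep sees it.
expanded=$(grep "$pattern" "$sample")

# Single quotes: grep receives the literal text $pattern.
# -F makes grep treat that text as a fixed string rather than a regex.
literal=$(grep -F '$pattern' "$sample")

# $(...) is the modern equivalent of backticks; the surrounding double
# quotes pass the file content to grep as a single argument.
patfile=$(mktemp)
printf '%s\n' "$pattern" > "$patfile"
from_file=$(grep "$(cat "$patfile")" "$sample")

# grep can also read the patterns from the file directly with -f.
with_f=$(grep -f "$patfile" "$sample")

printf '%s\n' "$expanded" "$literal" "$from_file" "$with_f"
rm -f "$sample" "$patfile"
```

Only the single-quoted call finds the line that contains the text `$pattern`; the other three calls find the line that begins with `dateNow`.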
Note that with the asterisk symbol (wildcard), you can specify all the files in the directory at once. However, the GREP command also allows you to specify just one file:
grep '^date[[:alpha:]]' texts/code.py
Because regular expressions are a universal language used in many operating systems and programming languages, their study is a separate vast topic.
However, it makes sense to briefly cover the main special characters and their functions. It’s important to note that regular expressions in Linux can work in two modes: basic (Basic Regular Expression, BRE) and extended (Extended Regular Expression, ERE). The extended mode is activated with the additional flag -E. The difference between the two modes lies in the number of available special characters and, consequently, the breadth of available functionality.
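As a rough illustration of the difference between the two modes, the sketch below feeds the same three lines to GREP with and without -E. The sample lines and variable names are assumptions chosen for the demonstration.

```shell
# Three sample lines passed to grep on standard input (hypothetical data).
input=$(printf '%s\n' 'a+' 'aa' 'b')

# Basic mode: "+" is an ordinary character, so only the literal text "a+" matches.
basic=$(printf '%s\n' "$input" | grep 'a+')

# Extended mode: "+" means "one or more of the preceding character".
extended=$(printf '%s\n' "$input" | grep -E 'a+')

# Interval expressions exist in both modes, but basic mode
# needs a backslash before each brace.
basic_interval=$(printf '%s\n' "$input" | grep 'a\{2\}')
extended_interval=$(printf '%s\n' "$input" | grep -E 'a{2}')

printf 'basic: %s\n' "$basic"
printf 'extended: %s\n' "$extended"
printf 'intervals: %s %s\n' "$basic_interval" "$extended_interval"
```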
Basic syntax covers the general constructs: where a phrase sits within a line or a word, and which positions may hold an arbitrary character.
The caret symbol indicates that the sought sequence of characters must be at the beginning of the line:
grep '^Hold' texts/*
The console output will be as follows:
texts/poem:Hold fast to dreams
texts/poem:Hold fast to dreams
The dollar sign indicates that the sought sequence of characters must be at the end of the line:
grep '</p>$' texts/*
Output:
texts/page.html: <p>There's gold here</p>
texts/page.html: <p>A mixture of wax and clouds</p>
texts/page.html: <p>Today there's nothing</p>
Note that the console output preserves the original representation of the found lines as they appear in the files.
The backslash and less-than symbol indicate that the sought phrase must be at the beginning of a word:
grep '\<br' texts/*
Output:
texts/poem:Life is a broken-winged bird
The backslash and greater-than symbol indicate that the sought sequence of characters must be at the end of a word:
grep 'en\>' texts/*
Output:
texts/poem:Life is a broken-winged bird
texts/poem:For when dreams go
texts/poem:Life is a barren field
texts/poem:Frozen with snow.
You can specify the start or end of a word using a more universal sequence of characters: a backslash and the letter b.
For example, this marks the beginning:
grep '\bdie' texts/*
Output:
texts/poem:For if dreams die
And this marks the end:
grep '<div\b' texts/*
In this case, the console output will be as follows:
texts/page.html: <div class="block">
texts/page.html: <div class="block">
texts/page.html: <div class="block block_special">
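The three word-boundary forms can be compared side by side on one small word list. This is a minimal sketch with made-up sample words. Note that `\<`, `\>`, and `\b` are extensions provided by GNU grep (as installed on Ubuntu), so other grep implementations may behave differently.

```shell
# Hypothetical word list: "cat" appears at the start, the end,
# and as a whole word.
words=$(printf '%s\n' 'category' 'concat' 'cat' 'bobcat food')

# \< anchors the match to the beginning of a word.
starts=$(printf '%s\n' "$words" | grep '\<cat')

# \> anchors the match to the end of a word.
ends=$(printf '%s\n' "$words" | grep 'cat\>')

# \b on both sides leaves only the standalone word.
whole=$(printf '%s\n' "$words" | grep '\bcat\b')

printf 'starts:\n%s\n' "$starts"
printf 'ends:\n%s\n' "$ends"
printf 'whole:\n%s\n' "$whole"
```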
Certain characters in the sought phrases can be left unspecified using the dot symbol:
grep '..ere' texts/*
Output:
texts/page.html: <p>There's gold here</p>
texts/page.html: <p>Today there's nothing</p>
Unlike basic syntax, extended syntax lets you combine several patterns and control how many times a character repeats without extra backslashes, thus expanding the range of possible matches.
To avoid running the GREP command multiple times, you can specify several patterns in a single regular expression:
grep -E '^Hold|</p>$' texts/*
The result of running this command will be a combined console output containing the search results for the two separate regular expressions shown earlier.
texts/page.html: <p>There's gold here</p>
texts/page.html: <p>A mixture of wax and clouds</p>
texts/page.html: <p>Today there's nothing</p>
texts/poem:Hold fast to dreams
texts/poem:Hold fast to dreams
In some cases, certain characters in the sought phrase may vary in quantity. Therefore, in the regular expression, you can specify a range of the allowed number of specific characters.
grep -E 'en{1,2}' texts/*
Output:
texts/code.py:print("Current time:", dateNow)
texts/poem:Life is a broken-winged bird
texts/poem:For when dreams go
texts/poem:Life is a barren field
texts/poem:Frozen with snow.
However, frequently used repetition intervals are more conveniently written as special characters, thus simplifying the appearance of the regular expression.
A repetition interval from one to infinity can be expressed using the plus sign:
grep -E 'en+' texts/*
In this case, the console output will not differ from the previous example.
texts/code.py:print("Current time:", dateNow)
texts/poem:Life is a broken-winged bird
texts/poem:For when dreams go
texts/poem:Life is a barren field
texts/poem:Frozen with snow.
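Because GREP prints whole lines, the two commands above look identical on these files even though the patterns differ. One way to see what each quantifier actually consumes is the `-o` flag, which prints only the matched part of each line. The sample string below is an assumption made for illustration.

```shell
# Hypothetical sample with one, two, and three trailing "n" characters.
sample='ten tenn tennn tn'

# {1,2}: "e" followed by one or two "n" characters.
interval=$(printf '%s\n' "$sample" | grep -oE 'en{1,2}')

# +: "e" followed by one or more "n" characters.
plus=$(printf '%s\n' "$sample" | grep -oE 'en+')

# ?: "ten" followed by an optional extra "n".
optional=$(printf '%s\n' "$sample" | grep -oE 'tenn?')

printf 'interval:\n%s\n' "$interval"
printf 'plus:\n%s\n' "$plus"
printf 'optional:\n%s\n' "$optional"
```

On the word `tennn`, the interval stops after two `n` characters, while the plus sign takes all three.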
A repetition interval from 0 to 1 can be expressed using the question mark:
grep -E 'ss?' texts/*
As a result, this command will produce the following console output:
texts/page.html: <div class="block">
texts/page.html: <p>There's gold here</p>
texts/page.html: <div class="block">
texts/page.html: <p>A mixture of wax and clouds</p>
texts/page.html: <div class="block block_special">
texts/page.html: <p>Today there's nothing</p>
texts/poem:Hold fast to dreams
texts/poem:For if dreams die
texts/poem:Life is a broken-winged bird
texts/poem:Hold fast to dreams
texts/poem:For when dreams go
texts/poem:Life is a barren field
texts/poem:Frozen with snow.
Instead of one specific character, you can specify an entire set enclosed in square brackets:
grep -E '[Hh]o[Ll]' texts/*
Output:
texts/poem:Hold fast to dreams
texts/poem:Hold fast to dreams
We can replace a large set of allowed characters with a range written using a hyphen:
grep -E 'h[a-z]+' texts/*
Output:
texts/page.html:<html>
texts/page.html: <head>
texts/page.html: </head>
texts/page.html: <p>There's gold here</p>
texts/page.html: <p>Today there's nothing</p>
texts/page.html:</html>
texts/poem:That cannot fly.
texts/poem:For when dreams go
Moreover, character sets and ranges can be combined:
grep -E 'h[abcd-z]+' texts/*
Each range is implicitly transformed into a set of characters:
[a-e] into [abcde]
[0-6] into [0123456]
[a-eA-F] into [abcdeABCDEF]
[A-Fa-e] into [ABCDEFabcde]
[A-Fa-e0-9] into [ABCDEFabcde0123456789]
[a-dA-CE-G] into [abcdABCEFG]
[acegi-l5-9] into [acegijkl56789]
Frequently used ranges can be replaced with predefined character types, whose names are specified in square brackets with colons:
| [:lower:] | characters from a to z in lowercase |
| [:upper:] | characters from A to Z in uppercase |
| [:alpha:] | all alphabetic characters |
| [:digit:] | all digit characters |
| [:alnum:] | all alphabetic characters and digits |
It is important to understand that the character type is a separate syntactic construct. This means that it must be enclosed in square brackets, which denote a set or range of characters:
grep -E '[[:alpha:]]+ere' texts/*
Output:
texts/page.html: <p>There's gold here</p>
texts/page.html: <p>Today there's nothing</p>
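The same double-bracket form applies to the other character types. The following sketch uses three invented lines to show `[[:digit:]]`, `[[:upper:]]`, and `[[:lower:]]` in action. The `-o` flag, which prints only the matched text, is used here just to make the result of the digit class visible.

```shell
# Hypothetical sample lines.
lines=$(printf '%s\n' 'order 66' 'Room 101' 'no digits here')

# Lines that contain at least one digit.
with_digits=$(printf '%s\n' "$lines" | grep -E '[[:digit:]]+')

# Lines that start with an uppercase letter followed by lowercase letters.
capitalized=$(printf '%s\n' "$lines" | grep -E '^[[:upper:]][[:lower:]]+')

# Only the digit sequences themselves.
numbers=$(printf '%s\n' "$lines" | grep -oE '[[:digit:]]+')

printf 'with digits:\n%s\n' "$with_digits"
printf 'capitalized:\n%s\n' "$capitalized"
printf 'numbers:\n%s\n' "$numbers"
```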
To filter the output of another command, you need to write a pipe symbol after it, followed by the standard call to the GREP utility, but without specifying the files to search:
cat texts/code.py | grep 'import'
As when searching in regular files, the console output will contain the lines that match the specified phrase:
from datetime import date
In this case, the cat command extracts the file content and passes it to the input stream of the GREP utility.
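Any command that writes to standard output can take the place of cat. The sketch below shows three variations on the same idea, under a few assumptions: the temporary file and log-like lines are invented, printf stands in for an arbitrary command, and `-v` is the standard flag that inverts the match.

```shell
# Hypothetical script file, mirroring the Python example above.
script=$(mktemp)
printf '%s\n' 'from datetime import date' 'dateNow = date.today()' > "$script"

# Same result as "cat file | grep", but the shell feeds the file
# to grep through input redirection.
redirected=$(grep 'import' < "$script")

# printf stands in for any command whose output needs filtering.
filtered=$(printf '%s\n' 'INFO start' 'ERROR disk full' 'INFO done' | grep 'ERROR')

# Pipelines can be chained: keep lines with "date",
# then drop the ones that also contain "import".
chained=$(grep 'date' "$script" | grep -v 'import')

printf '%s\n' "$redirected" "$filtered" "$chained"
rm -f "$script"
```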
In addition to regular expressions, you can pass options to the GREP command: flags that refine the search.
The -E flag activates the extended regular expressions mode, allowing the use of more special characters.
The -i flag performs a search for a regular expression without considering the case of characters:
grep -E -i 'b[ar]' texts/*
The console output corresponding to this command will be:
texts/poem:Life is a broken-winged bird
texts/poem:Life is a barren field
You can also specify flags together in a single string:
grep -Ei 'b[ar]' texts/*
The -w flag performs a search so that the specified regular expression matches a complete word (not just a substring) in the found line:
grep -w and texts/*
Note that quotes are not required when specifying a regular string without special characters.
The result of this command will be:
texts/page.html: <p>A mixture of wax and clouds</p>
The -e flag lets you specify several expressions at once, which avoids running the command multiple times:
grep -e 'Hold' -e 'html' texts/*
The result of this command will be identical to this one:
grep -E 'Hold|html' texts/*
In both cases, the console terminal will display the following output:
texts/page.html:<html>
texts/page.html:</html>
texts/poem:Hold fast to dreams
texts/poem:Hold fast to dreams
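The equivalence of the two forms can be verified with a short sketch. The temporary file and its three lines are assumptions for the demonstration. The last command shows another common use of -e: passing a pattern that begins with a hyphen, which GREP would otherwise read as an option.

```shell
# Hypothetical sample file with three lines.
file=$(mktemp)
printf '%s\n' 'Hold fast to dreams' '<html>' 'Frozen with snow.' > "$file"

# Two separate expressions given with -e.
with_e=$(grep -e 'Hold' -e 'html' "$file")

# One extended expression with alternation.
with_alt=$(grep -E 'Hold|html' "$file")

# -e marks the next argument as a pattern even if it starts with "-".
dash=$(printf '%s\n' '-v is a flag' 'plain' | grep -e '-v')

printf '%s\n' "$with_e"
printf '%s\n' "$dash"
rm -f "$file"
```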
The -r flag performs a recursive search in the specified directory to the maximum depth of nesting:
grep -r '[Ff]ilesystem' /root
The console terminal will display output containing file paths at different nesting levels relative to the specified directory:
/root/parser/parser/settings.py:#HTTPCACHE_STORAGE = "scrapy.extensions.httpcache.FilesystemCacheStorage"
/root/resize.log:Resizing the filesystem on /dev/vda1 to 3931904 (4k) blocks.
/root/resize.log:The filesystem on /dev/vda1 is now 3931904 (4k) blocks long.
The -F flag treats the pattern as a fixed string, so special characters are matched literally as characters of the search phrase:
grep -F '[' texts/*
Without this flag, you would encounter an error in the console terminal:
grep: Unmatched [, [^, [:, [., or [=
An alternative to this flag would be using the escape character in the form of a backslash (\):
grep '\[' texts/*
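Since none of the sample files contains a square bracket, the effect of -F is easier to see on a few invented lines. The sketch below is an illustration under that assumption and compares a regular search, a fixed-string search, and an escaped search.

```shell
# Hypothetical sample lines containing regex special characters.
data=$(printf '%s\n' '1.5' '125' 'a[0]')

# As a regex, the dot matches any character, so "125" also matches.
regex_dot=$(printf '%s\n' "$data" | grep '1.5')

# With -F the dot is an ordinary character.
fixed_dot=$(printf '%s\n' "$data" | grep -F '1.5')

# -F also makes brackets literal.
fixed_bracket=$(printf '%s\n' "$data" | grep -F '[0]')

# Without -F, escaping the opening bracket is enough:
# a closing bracket outside a set is already an ordinary character.
escaped_bracket=$(printf '%s\n' "$data" | grep '\[0]')

printf 'regex dot:\n%s\n' "$regex_dot"
printf 'fixed dot: %s\n' "$fixed_dot"
printf 'brackets: %s %s\n' "$fixed_bracket" "$escaped_bracket"
```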
The --include flag limits the search to files whose names match the specified glob pattern:
grep --include='*.py' 'date' texts/*
The console output will be:
texts/code.py:from datetime import date
texts/code.py:dateNow = date.today()
texts/code.py:print("Current time:", dateNow)
We can also write this command without the wildcard by using an additional recursive search flag:
grep -r --include='*.py' 'date' texts
The --exclude flag selectively excludes files whose names match the specified glob pattern from the list of search sources:
grep --exclude='*.py' 'th' texts/*
The console output will be:
texts/page.html: <p>Today there's nothing</p>
texts/poem:Frozen with snow.
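Both filters are most useful together with a recursive search over nested directories. The sketch below builds a small throwaway tree to show this. The directory layout and file contents are assumptions for illustration, and `-l` prints only the names of the matching files.

```shell
# Throwaway directory tree (hypothetical layout).
root=$(mktemp -d)
mkdir -p "$root/src/pkg"
printf '%s\n' 'from datetime import date' > "$root/src/pkg/code.py"
printf '%s\n' 'date of issue' > "$root/notes.txt"

# Recursive search limited to Python files at any depth.
only_py=$(grep -rl --include='*.py' 'date' "$root")

# Recursive search that skips Python files.
without_py=$(grep -rl --exclude='*.py' 'date' "$root")

printf 'included: %s\n' "$only_py"
printf 'excluded search found: %s\n' "$without_py"
rm -rf "$root"
```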
Some parameters of the GREP command affect only the output of search results, making them more informative and easier to read.
The -n flag adds the line numbers where the search phrases were found:
grep -n '</p>$' texts/*
Each line in the output will be supplemented with the corresponding line number:
texts/page.html:8: <p>There's gold here</p>
texts/page.html:12: <p>A mixture of wax and clouds</p>
texts/page.html:16: <p>Today there's nothing</p>
The -B flag displays a specified number of lines before the lines with found matches:
grep -B3 'mix' texts/*
After the flag, you specify the number of previous lines to be displayed in the console terminal:
texts/page.html- </div>
texts/page.html-
texts/page.html- <div class="block">
texts/page.html: <p>A mixture of wax and clouds</p>
The -A flag displays a specified number of lines after the lines with found matches:
grep -A3 'mix' texts/*
After the flag, you specify the number of subsequent lines to be displayed in the console terminal:
texts/page.html: <p>A mixture of wax and clouds</p>
texts/page.html- </div>
texts/page.html-
texts/page.html- <div class="block block_special">
The -C flag displays a specified number of lines both before and after the lines with found matches:
grep -C3 'mix' texts/*
After the flag, you specify the number of preceding and following lines to be displayed in the console terminal:
texts/page.html- </div>
texts/page.html-
texts/page.html- <div class="block">
texts/page.html: <p>A mixture of wax and clouds</p>
texts/page.html- </div>
texts/page.html-
texts/page.html- <div class="block block_special">
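Context flags combine well with line numbers. The sketch below uses invented input to show two details of the output format: matching lines are marked with a colon and context lines with a hyphen, and GREP prints a `--` separator between groups of lines that are not adjacent.

```shell
# Hypothetical five-line input with one match in the middle.
log=$(printf '%s\n' 'line one' 'line two' 'MATCH here' 'line four' 'line five')

# One line of context on each side, with line numbers.
context=$(printf '%s\n' "$log" | grep -n -C1 'MATCH')

# Two matches far apart: the groups are separated by "--".
spread=$(printf '%s\n' 'MATCH a' 'b' 'c' 'd' 'MATCH e' | grep -A1 'MATCH')

printf '%s\n' "$context"
printf '%s\n' "$spread"
```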
With the -c flag, instead of listing the found lines, the GREP command outputs only the number of matching lines:
grep -c 't' texts/*
The console output will contain the count of matching lines for each of the specified files:
texts/code.py:3
texts/page.html:5
texts/poem:4
If only one file is specified as the source:
grep -c 't' texts/poem
The console output will contain only the number:
4
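Note that the count refers to matching lines, not to individual occurrences: a line with several matches is counted once. If the total number of occurrences is needed, one possible approach is to print every match on its own line with `-o` and count the lines with wc. The sample sentence below is an assumption for illustration.

```shell
# Hypothetical one-line input in which "to" occurs twice.
text='to be or not to be'

# -c counts matching lines: the single line is counted once.
matching_lines=$(printf '%s\n' "$text" | grep -c 'to')

# -o prints each match on its own line, so wc -l counts occurrences.
occurrences=$(printf '%s\n' "$text" | grep -o 'to' | wc -l)

printf 'matching lines: %s\n' "$matching_lines"
printf 'occurrences: %s\n' "$occurrences"
```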
The -l flag outputs only the names of the files in which matches were found:
grep -l 't' texts/*
The console output will be as follows:
texts/code.py
texts/page.html
texts/poem
The -m flag limits the number of matching lines output to the console to the number specified next to the flag:
grep -m2 't' texts/*
The console output will be:
texts/code.py:from datetime import date
texts/code.py:dateNow = date.today()
texts/page.html:<html>
texts/page.html: <title>Some Title</title>
texts/poem:Hold fast to dreams
texts/poem:That cannot fly.
As you can see, the limiting number affects not the entire output but the lines of each file.
The -x flag selects only the lines that match the pattern in their entirety:
grep -x 'Life is a broken-winged bird' texts/*
The console output will be:
texts/poem:Life is a broken-winged bird
The GREP command in Linux is a flexible and precise tool for searching expressions in large volumes of text data.
When using the command, you specify three elements: options, a pattern, and the source files.
Additionally, the utility is used to filter the output of other commands by redirecting input and output streams.
The core of the GREP command is regular expressions. Unlike a simple string, they allow you to define a phrase with a certain degree of variability, making it match multiple similar entries.
There are two modes of operation for regular expressions: basic (BRE), which is the default, and extended (ERE), which is activated with the -E flag.
The extended mode provides the most flexibility and accuracy when working with regular expressions.
When you only need to find matches for simple patterns, you can limit yourself to the basic mode.