How to Use Grep and Regular Expressions in Linux

Hostman Team
Technical writer
Linux
11.02.2025
Reading time: 16 min

GREP (short for "global regular expression print") is one of the most popular utilities in the Linux operating system.

With it, you can search for phrases (sequences of characters) in multiple files simultaneously using regular expressions and filter the output of other commands, keeping only the necessary information.

This guide will cover how to search for specific expressions in a set of text files with various contents using the GREP utility.

All examples shown were run on a cloud server hosted by Hostman running Ubuntu version 22.04.

How Does GREP Work

The GREP command follows this structure:

grep [OPTIONS] [PATTERN] [SOURCES]

Where:

  • OPTIONS: Special parameters (flags) that activate certain mechanisms in the utility related to searching for expressions and displaying results.

  • PATTERN: A regular expression (or plain string) containing the phrase (pattern, template, sequence of characters) you want to find.

  • SOURCES: The path to the files where we will search for the specified expression.

If the GREP command is used to filter the output of another command, its structure looks a bit different:

[COMMAND] | grep [OPTIONS] [PATTERN]

Thus:

  • COMMAND: An arbitrary command with its own set of parameters whose output needs to be filtered.

  • The "pipe" symbol (|) is necessary to create a command pipeline, redirecting streams so that the output of an arbitrary command becomes the input for the GREP command.

Preparation

To understand the nuances of using GREP, it's best to start with small examples of searching for specific phrases. Therefore, we will first create a few text files and then test the GREP command on them.

Let’s first prepare a separate directory where the search will take place:

mkdir texts

Next, create the first file:

nano texts/poem

It will contain one of Langston Hughes's poems:

Hold fast to dreams  
For if dreams die  
Life is a broken-winged bird  
That cannot fly.  
Hold fast to dreams  
For when dreams go  
Life is a barren field  
Frozen with snow.

Now, create the second file:

nano texts/code.py

It will contain a simple Python script:

from datetime import date

dateNow = date.today()
print("Current time:", dateNow)

Finally, create the third file:

nano texts/page.html

This one will have simple HTML markup:

<html>
	<head>
		<title>Some Title</title>
	</head>

	<body>
		<div class="block">
			<p>There's gold here</p>
		</div>

		<div class="block">
			<p>A mixture of wax and clouds</p>
		</div>

		<div class="block block_special">
			<p>Today there's nothing</p>
		</div>
	</body>
</html>

Using files in different formats lets us exercise the full range of the GREP utility's features and see more clearly what each one does.

Regular Expressions

Regular expressions are the foundation of the GREP command. Unlike a regular string, regular expressions contain special characters that allow you to specify phrases with a certain degree of variability.

When using the GREP utility, regular expressions are placed within single quotes:

'^date[[:alpha:]]*'

Thus, the full command can look like this:

grep '^date[[:alpha:]]*' texts/*

In this case, the console output will be:

texts/code.py:dateNow = date.today()

However, using double quotes lets the shell expand variables inside the expression. For example, you can first create a shell variable with the search expression:

PATTERN="^date[[:alpha:]]*"

And then use it in the GREP command:

grep "$PATTERN" ./texts/*

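Additionally, command substitution (written with backticks or as $(...)) lets you use the output of another command within the GREP command. For example, you can read a regular expression from a pre-prepared file. Keep the substitution inside double quotes so that the shell does not split or expand the pattern:

grep "$(cat somefile)" ./texts/*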
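The three ways of supplying a pattern can be compared side by side. The sketch below is only an illustration: the scratch directory and the file name pattern.txt are assumptions made for the example, and the -f option (which makes GREP read patterns from a file, one per line) is shown as an alternative to command substitution.

```shell
# Hypothetical scratch data; the directory and file names are illustrative only.
workdir=$(mktemp -d)
printf 'from datetime import date\ndateNow = date.today()\n' > "$workdir/code.py"
printf '^date[[:alpha:]]*\n' > "$workdir/pattern.txt"

# 1. Pattern taken from a shell variable (double quotes allow expansion).
pattern='^date[[:alpha:]]*'
from_variable=$(grep "$pattern" "$workdir/code.py")

# 2. Pattern taken from command substitution, quoted so the shell leaves it intact.
from_substitution=$(grep "$(cat "$workdir/pattern.txt")" "$workdir/code.py")

# 3. Pattern read by grep itself with -f (one pattern per line in the file).
from_file=$(grep -f "$workdir/pattern.txt" "$workdir/code.py")

printf '%s\n' "$from_variable" "$from_substitution" "$from_file"
rm -rf "$workdir"
```

All three calls should print the same line, dateNow = date.today().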

Note that with the asterisk symbol (wildcard), you can specify all the files in the directory at once. However, the GREP command also allows you to specify just one file: 

grep '^date[[:alpha:]]' texts/code.py 

Because regular expressions are a universal language used in many operating systems and programming languages, their study is a separate vast topic. 

However, it makes sense to briefly cover the main special characters and their functions. It’s important to note that regular expressions in GREP can work in two modes: basic (Basic Regular Expression, BRE) and extended (Extended Regular Expression, ERE). The extended mode is activated with the additional flag -E. The difference between the two modes lies mainly in notation: in basic mode the characters ?, +, {, }, |, ( and ) are literal unless escaped with a backslash, while in extended mode they act as special characters directly. In GNU grep, which Ubuntu ships, both notations offer the same matching functionality.
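As a small illustration of the two modes, the sketch below runs the same repetition range in basic and extended notation. The sample words are taken from the poem used in this guide; the variable names are arbitrary.

```shell
sample=$(printf 'Frozen\nbarren\nHold\n')

# Basic mode: the braces must be escaped to act as a repetition range.
bre_out=$(printf '%s\n' "$sample" | grep 'en\{1,2\}')

# Extended mode: the same range is written without backslashes.
ere_out=$(printf '%s\n' "$sample" | grep -E 'en{1,2}')

# Basic mode without backslashes: the braces are ordinary characters,
# so only the literal text en{1,2} matches.
literal_out=$(printf 'en{1,2}\nFrozen\n' | grep 'en{1,2}')

printf 'BRE:\n%s\nERE:\n%s\nliteral:\n%s\n' "$bre_out" "$ere_out" "$literal_out"
```

The first two commands select the same lines (Frozen and barren), while the third matches only the line that literally contains en{1,2}.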

Basic Syntax

Basic syntax covers the constructs that need no escaping in either mode: anchors for the start and end of a line or word, and a placeholder for any single character.

Start of a line — ^

The caret symbol indicates that the sought sequence of characters must be at the beginning of the line:

grep '^Hold' texts/*

The console output will be as follows:

texts/poem:Hold fast to dreams
texts/poem:Hold fast to dreams

End of a line — $

The dollar sign indicates that the sought sequence of characters must be at the end of the line:

grep '</p>$' texts/*

Output:

texts/page.html:                        <p>There's gold here</p>
texts/page.html:                        <p>A mixture of wax and clouds</p>
texts/page.html:                        <p>Today there's nothing</p>

Note that the console output preserves the original representation of the found lines as they appear in the files.
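The two anchors can also be combined. The following sketch, which uses made-up sample lines rather than the files from this guide, matches lines that consist of a single word and counts empty lines.

```shell
# Four sample lines: the second is empty, the fourth is indented.
sample=$(printf 'Hold fast\n\nHold\n  Hold\n')

# ^ and $ together: the line must contain exactly "Hold" and nothing else.
exact_line=$(printf '%s\n' "$sample" | grep -n '^Hold$')

# ^$ matches only empty lines; -c counts them.
empty_count=$(printf '%s\n' "$sample" | grep -c '^$')

printf '%s\n%s\n' "$exact_line" "$empty_count"
```

Only the third line matches the first pattern, because the first line has extra text and the fourth starts with spaces.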

Start of a word — \<

The backslash and less-than symbol indicate that the sought phrase must be at the beginning of a word:

grep '\<br' texts/*

Output:

texts/poem:Life is a broken-winged bird

End of a word — \>

The backslash and greater-than symbol indicate that the sought sequence of characters must be at the end of a word:

grep 'en\>' texts/*

Output:

texts/poem:Life is a broken-winged bird
texts/poem:For when dreams go
texts/poem:Life is a barren field
texts/poem:Frozen with snow.

Start or end of a word — \b

You can specify the start or end of a word using the more universal sequence of characters — backslash and the letter b.

For example, this marks the beginning:

grep '\bdie' texts/*

Output:

texts/poem:For if dreams die

And this marks the end:

grep '<div\b' texts/*

In this case, the console terminal output will be as follows:

texts/page.html:                <div class="block">
texts/page.html:                <div class="block">
texts/page.html:                <div class="block block_special">
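To compare the word anchors directly, the sketch below applies them to three made-up words. It assumes GNU grep, where \<, \> and \b are available.

```shell
words=$(printf 'dream\ndreams\ndaydream\n')

# \< : "dream" must start a word.
starts=$(printf '%s\n' "$words" | grep '\<dream')

# \> : "dream" must end a word.
ends=$(printf '%s\n' "$words" | grep 'dream\>')

# \b on both sides: "dream" must be a whole word.
whole=$(printf '%s\n' "$words" | grep '\bdream\b')

printf 'starts:\n%s\nends:\n%s\nwhole:\n%s\n' "$starts" "$ends" "$whole"
```

The first pattern keeps dream and dreams, the second keeps dream and daydream, and the third keeps only dream.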

Any character — .

Certain characters in the sought phrases can be left unspecified using the dot symbol:

grep '..ere' texts/*

Output:

texts/page.html:                        <p>There's gold here</p>
texts/page.html:                        <p>Today there's nothing</p>
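To see exactly which characters the dots stand for, the -o option can be added so that GREP prints only the matched parts instead of whole lines. The sketch below uses the two paragraph lines from page.html without their indentation.

```shell
sample=$(printf '<p>There'"'"'s gold here</p>\n<p>Today there'"'"'s nothing</p>\n')

# -o prints each match on its own line; each dot consumes one character,
# which may be a letter or a space.
matched=$(printf '%s\n' "$sample" | grep -o '..ere')

printf '%s\n' "$matched"
```

The matches are There, then here preceded by a space (the first dot matched the space), and there.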

Extended Syntax

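Extended syntax adds alternation and repetition operators, which let you combine several patterns and state how many times a character may repeat. The examples below use the -E flag so that these operators can be written without backslashes. Character sets, ranges, and types, shown at the end of this section, work in both modes.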

Combining patterns — |

To avoid running the GREP command multiple times, you can specify several patterns in a single regular expression:

grep -E '^Hold|</p>$' texts/*

The result of running this command will be a combined console output containing the search results for the two separate regular expressions shown earlier.

texts/page.html:                        <p>There's gold here</p>
texts/page.html:                        <p>A mixture of wax and clouds</p>
texts/page.html:                        <p>Today there's nothing</p>
texts/poem:Hold fast to dreams
texts/poem:Hold fast to dreams

Repetition range — {n,m}

In some cases, certain characters in the sought phrase may vary in quantity. Therefore, in the regular expression, you can specify a range of allowed repetitions as {n,m}, written without spaces, where n is the minimum and m is the maximum number of repetitions of the preceding character.

grep -E 'en{1,2}' texts/*

Output:

texts/code.py:print("Current time:", dateNow)
texts/poem:Life is a broken-winged bird
texts/poem:For when dreams go
texts/poem:Life is a barren field
texts/poem:Frozen with snow.

However, frequently used repetition intervals are more conveniently written as special characters, thus simplifying the appearance of the regular expression.

One or more repetitions — +

A repetition interval from one to infinity can be expressed using the plus sign:

grep -E 'en+' texts/*

In this case, the console output will not differ from the previous example.

texts/code.py:print("Current time:", dateNow)
texts/poem:Life is a broken-winged bird
texts/poem:For when dreams go
texts/poem:Life is a barren field
texts/poem:Frozen with snow.

Zero or one repetition — ?

A repetition interval from 0 to 1 can be expressed using the question mark:

grep -E 'ss?' texts/*

As a result, this command will produce the following output in the console terminal:

texts/page.html:                <div class="block">
texts/page.html:                        <p>There's gold here</p>
texts/page.html:                <div class="block">
texts/page.html:                        <p>A mixture of wax and clouds</p>
texts/page.html:                <div class="block block_special">
texts/page.html:                        <p>Today there's nothing</p>
texts/poem:Hold fast to dreams
texts/poem:For if dreams die
texts/poem:Life is a broken-winged bird
texts/poem:Hold fast to dreams
texts/poem:For when dreams go
texts/poem:Life is a barren field
texts/poem:Frozen with snow.
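Because the sample files contain no doubled n, the three repetition operators above are hard to tell apart from their output. The sketch below uses made-up input and the -x flag (whole-line match) to make the difference visible.

```shell
sample=$(printf 'e\nen\nenn\nennn\n')

# {1,2}: "e" followed by one or two "n".
range_out=$(printf '%s\n' "$sample" | grep -E -x 'en{1,2}')

# +: "e" followed by one or more "n".
plus_out=$(printf '%s\n' "$sample" | grep -E -x 'en+')

# ?: "e" followed by zero or one "n".
optional_out=$(printf '%s\n' "$sample" | grep -E -x 'en?')

printf '{1,2}:\n%s\n+:\n%s\n?:\n%s\n' "$range_out" "$plus_out" "$optional_out"
```

With whole-line matching, {1,2} accepts en and enn, + accepts en, enn and ennn, and ? accepts e and en.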

Character set — [abc]

Instead of one specific character, you can specify an entire set enclosed in square brackets:

grep -E '[Hh]o[Ll]' texts/*

Output:

texts/poem:Hold fast to dreams
texts/poem:Hold fast to dreams

Character range — [a-z]

We can replace a large set of allowed characters with a range written using a hyphen:

grep -E 'h[a-z]+' texts/*

Output:

texts/page.html:<html>
texts/page.html:        <head>
texts/page.html:        </head>
texts/page.html:                        <p>There's gold here</p>
texts/page.html:                        <p>Today there's nothing</p>
texts/page.html:</html>
texts/poem:That cannot fly.
texts/poem:For when dreams go

Moreover, character sets and ranges can be combined:

grep -E 'h[abcd-z]+' texts/*

Each range is implicitly transformed into a set of characters:

  • [a-e] into [abcde]
  • [0-6] into [0123456]
  • [a-eA-F] into [abcdeABCDEF]
  • [A-Fa-e] into [ABCDEFabcde]
  • [A-Fa-e0-9] into [ABCDEFabcde0123456789]
  • [a-dA-CE-G] into [abcdABCEFG]
  • [acegi-l5-9] into [acegijkl56789]

Character type — [:alpha:]

Frequently used ranges can be replaced with predefined character types, whose names are specified in square brackets with colons:

  • [:lower:]: characters from a to z in lowercase
  • [:upper:]: characters from A to Z in uppercase
  • [:alpha:]: all alphabetic characters
  • [:digit:]: all digit characters
  • [:alnum:]: all alphabetic characters and digits

It is important to understand that the character type is a separate syntactic construct. This means that it must be enclosed in square brackets, which denote a set or range of characters:

grep -E '[[:alpha:]]+ere' texts/*

Output:

texts/page.html:                        <p>There's gold here</p>
texts/page.html:                        <p>Today there's nothing</p>
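The character types can be tried out on a few made-up strings. The sketch below sets LC_ALL=C so that the classes cover only ASCII characters, and uses -x to require that the whole line matches.

```shell
sample=$(printf 'abc\nABC\nabc123\n42\n')

# Lines made up only of lowercase letters.
lower_only=$(printf '%s\n' "$sample" | LC_ALL=C grep -E -x '[[:lower:]]+')

# Lines made up only of digits.
digits_only=$(printf '%s\n' "$sample" | LC_ALL=C grep -E -x '[[:digit:]]+')

# Letters followed by digits: two character types used in one expression.
letters_then_digits=$(printf '%s\n' "$sample" | LC_ALL=C grep -E -x '[[:alpha:]]+[[:digit:]]+')

printf '%s\n' "$lower_only" "$digits_only" "$letters_then_digits"
```

Each command selects exactly one line here: abc, 42 and abc123 respectively.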

Filtering Output

To filter the output of another command, you need to write a pipe symbol after it, followed by the standard call to the GREP utility, but without specifying the files to search:

cat texts/code.py | grep 'import'

Like when searching in regular files, the console output will contain the lines with the matches of the specified phrases:

from datetime import date

In this case, the cat command extracts the file content and passes it to the input stream of the GREP utility.
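The same approach works with any command that writes text to standard output. The sketch below filters the output of ls and also shows that, for a single file, piping through cat gives the same result as passing the file to GREP directly. The scratch directory is an assumption made for the example.

```shell
# Hypothetical scratch directory that mirrors the file names used in this guide.
workdir=$(mktemp -d)
printf 'from datetime import date\n' > "$workdir/code.py"
touch "$workdir/poem" "$workdir/page.html"

# Filter the output of ls: keep only names ending in .py.
python_files=$(ls "$workdir" | grep '\.py$')

# Reading through a pipe and reading the file directly select the same lines.
piped=$(cat "$workdir/code.py" | grep 'import')
direct=$(grep 'import' "$workdir/code.py")

printf '%s\n%s\n%s\n' "$python_files" "$piped" "$direct"
rm -rf "$workdir"
```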

Search Options

In addition to regular expressions, you can pass options to the GREP command: flags that refine the search.

Extended Regular Expressions (-E)

Activates the extended regular expressions mode, allowing the use of more special characters.

Case Insensitivity (-i)

Performs a search for a regular expression without considering the case of characters:

grep -E -i 'b[ar]' texts/*

The console output corresponding to this command will be:

texts/poem:Life is a broken-winged bird
texts/poem:Life is a barren field

You can also specify flags together in a single string:

grep -Ei 'b[ar]' texts/*
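The effect of -i is easier to see on input that differs only in case. A minimal sketch with made-up lines:

```shell
sample=$(printf 'Hold\nhold\nHOLD\nfold\n')

# Case-sensitive search: only the exact lowercase spelling matches.
sensitive_count=$(printf '%s\n' "$sample" | grep -c 'hold')

# Case-insensitive search: all three spellings of "hold" match.
insensitive_count=$(printf '%s\n' "$sample" | grep -c -i 'hold')

printf '%s\n%s\n' "$sensitive_count" "$insensitive_count"
```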

Whole Word (-w)

Performs a search so that the specified regular expression is a complete word (not just a substring) in the found line:

grep -w and texts/*

Note that quotes are not required when specifying a regular string without special characters.

The result of this command will be:

texts/page.html:                        <p>A mixture of wax and clouds</p>

Multiple Expressions (-e)

To avoid running the command multiple times, you can specify several expressions at once:

grep -e 'Hold' -e 'html' texts/*

The result of this command will be identical to this one:

grep -E 'Hold|html' texts/*

In both cases, the console terminal will display the following output:

texts/page.html:<html>
texts/page.html:</html>
texts/poem:Hold fast to dreams
texts/poem:Hold fast to dreams

Recursive Search (-r)

Performs a recursive search in the specified directory to the maximum depth of nesting:

grep -r '[Ff]ilesystem' /root

The console terminal will display output containing file paths at different nesting levels relative to the specified directory:

/root/parser/parser/settings.py:#HTTPCACHE_STORAGE = "scrapy.extensions.httpcache.FilesystemCacheStorage"
/root/resize.log:Resizing the filesystem on /dev/vda1 to 3931904 (4k) blocks.
/root/resize.log:The filesystem on /dev/vda1 is now 3931904 (4k) blocks long.

Search for Special Characters (-F)

Treats the pattern as a fixed string rather than a regular expression, so special characters are matched literally:

grep -F '[' texts/*

Without this flag, you would encounter an error in the console terminal:

grep: Invalid regular expression

An alternative to this flag would be using the escape character in the form of a backslash (\):

grep '\[' texts/*
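The flag also matters when a pattern is valid as a regular expression but means something broader than intended. In the sketch below, which uses made-up input, the dot in date.today() matches any character unless the pattern is treated as a fixed string or the dot is escaped.

```shell
sample=$(printf 'date.today()\ndateXtoday()\n')

# As a regular expression, the dot matches any character, so both lines match.
regex_out=$(printf '%s\n' "$sample" | grep 'date.today()')

# With -F the pattern is a literal string, so only the first line matches.
fixed_out=$(printf '%s\n' "$sample" | grep -F 'date.today()')

# Escaping the dot gives the same result without -F.
escaped_out=$(printf '%s\n' "$sample" | grep 'date\.today()')

printf 'regex:\n%s\nfixed:\n%s\nescaped:\n%s\n' "$regex_out" "$fixed_out" "$escaped_out"
```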

Including Files (--include)

Limits the search to files whose names match the given glob pattern:

grep --include='*.py' 'date' texts/*

The console output will be:

texts/code.py:from datetime import date
texts/code.py:dateNow = date.today()
texts/code.py:print("Current time:", dateNow)

We can also write this command without the wildcard by using an additional recursive search flag:

grep -r --include='*.py' 'date' texts
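The recursive form is most useful when matching files sit at different nesting levels. The sketch below builds a small, hypothetical directory tree (the names nested, tool.py and notes.txt are made up) and lists only the Python files that contain the word date.

```shell
# Hypothetical directory tree; both files contain the word "date".
workdir=$(mktemp -d)
mkdir -p "$workdir/texts/nested"
printf 'from datetime import date\n' > "$workdir/texts/nested/tool.py"
printf 'the date is unknown\n' > "$workdir/texts/nested/notes.txt"

# -r descends into subdirectories, --include keeps only *.py files,
# and -l prints file names instead of matching lines.
matches=$(cd "$workdir" && grep -r -l --include='*.py' 'date' texts)

printf '%s\n' "$matches"
rm -rf "$workdir"
```

Although notes.txt also contains the word, only texts/nested/tool.py is reported.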

Excluding Files (--exclude)

Selectively excludes certain files from the list of search sources:

grep --exclude='*.py' 'th' texts/*

The console output will be:

texts/page.html:                        <p>Today there's nothing</p>
texts/poem:Frozen with snow.

Output Options

Some parameters of the GREP command affect only the output of search results, improving their informativeness and clarity.

Line Numbers (-n)

To increase the informativeness of the GREP results, you can add the line numbers where the search phrases were found:

grep -n '</p>$' texts/*

Each line in the output will be supplemented with the corresponding line number:

texts/page.html:8:                      <p>There's gold here</p>
texts/page.html:12:                     <p>A mixture of wax and clouds</p>
texts/page.html:16:                     <p>Today there's nothing</p>

Lines Before (-B)

Displays a specified number of lines before the lines with found matches:

grep -B3 'mix' texts/*

After the flag, you specify the number of previous lines to be displayed in the console terminal:

texts/page.html-                </div>
texts/page.html-
texts/page.html-                <div class="block">
texts/page.html:                        <p>A mixture of wax and clouds</p>

Lines After (-A)

Displays a specified number of lines after the lines with found matches:

grep -A3 'mix' texts/*

After the flag, you specify the number of subsequent lines to be displayed in the console terminal:

texts/page.html:                        <p>A mixture of wax and clouds</p>
texts/page.html-                </div>
texts/page.html-
texts/page.html-                <div class="block block_special">

Lines Before and After (-C)

Displays a specified number of lines both before and after the lines with found matches:

grep -C3 'mix' texts/*

After the flag, you specify the number of preceding and following lines to be displayed in the console terminal:

texts/page.html-                </div>
texts/page.html-
texts/page.html-                <div class="block">
texts/page.html:                        <p>A mixture of wax and clouds</p>
texts/page.html-                </div>
texts/page.html-
texts/page.html-                <div class="block block_special">
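When several matches are far apart, the context options print each match with its surrounding lines as a separate group. The sketch below uses made-up input and assumes GNU grep, which by default puts a line containing two hyphens between non-adjacent groups.

```shell
sample=$(printf 'a\nmix\nb\nc\nd\nmix\ne\n')

# One line of context after each match; the two groups are not adjacent,
# so GNU grep separates them with "--".
context_out=$(printf '%s\n' "$sample" | grep -A1 'mix')

printf '%s\n' "$context_out"
```

The output consists of mix and b, then the separator line, then mix and e.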

Line Count (-c)

Instead of listing the found lines, the GREP command will output only the number of matching lines:

grep -c 't' texts/*

The console output will contain a separate count of matching lines for each specified file:

texts/code.py:3
texts/page.html:5
texts/poem:4

If only one file is specified as the source:

grep -c 't' texts/poem

The console output will contain only the number:

4
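Note that -c counts lines, not individual occurrences. If the total number of occurrences is needed, one common approach is to combine -o with wc -l, as in the following sketch with made-up input.

```shell
sample=$(printf 'to the tent\nno\nat\n')

# -c counts matching lines: two of the three lines contain "t".
line_count=$(printf '%s\n' "$sample" | grep -c 't')

# -o prints every match on its own line, so wc -l counts occurrences.
# tr removes the padding that some wc implementations add.
occurrence_count=$(printf '%s\n' "$sample" | grep -o 't' | wc -l | tr -d ' ')

printf '%s\n%s\n' "$line_count" "$occurrence_count"
```

Here the first line alone contains four occurrences, so the two counts differ: 2 lines versus 5 occurrences.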

File Names (-l)

This flag allows you to output only the names of the files in which matches were found:

grep -l 't' texts/*

The console output will be as follows:

texts/code.py
texts/page.html
texts/poem

Limit Output (-m)

Stops reading each file after the specified number of matching lines, which limits the output per file:

grep -m2 't' texts/*

The console output will be:

texts/code.py:from datetime import date
texts/code.py:dateNow = date.today()
texts/page.html:<html>
texts/page.html:                <title>Some Title</title>
texts/poem:Hold fast to dreams
texts/poem:That cannot fly.

As you can see, the limiting number affects not the entire output but the lines of each file.

Exact Match of Whole Line (-x)

Searches for an exact match of the entire line with no variability:

grep -x 'Life is a broken-winged bird' texts/*

The console output will be:

texts/poem:Life is a broken-winged bird

Conclusion

The GREP command in Linux is a flexible and precise tool for searching expressions in large volumes of text data.

When using the command, you need to specify the following elements:

  • A specific set of options (flags) that configure the search and output mechanisms.
  • One or more regular expressions that describe the search phrase.
  • A list of sources (files and directories) where the search will be performed.

Additionally, the utility is used to filter the output of other commands by redirecting input and output streams.

The core of the GREP command is regular expressions. Unlike a simple string, they allow you to define a phrase with a certain degree of variability, making it match multiple similar entries.

There are two modes of operation for regular expressions:

  • Basic Mode: The characters ?, +, {, }, |, ( and ) are treated literally unless escaped with a backslash.
  • Extended Mode: The same characters act as special characters directly, which keeps complex expressions easier to read.

In GNU grep, both modes provide the same matching capabilities and differ only in notation.

For patterns that rely on repetition or alternation, the extended mode is usually more convenient; for simple patterns, the basic mode is sufficient.

Linux
11.02.2025
Reading time: 16 min

Similar

Linux

How to Copy Files and Directories in Linux

When you first start working with Linux, one of the essential tasks you’ll encounter is file management. Whether you’re organizing your personal documents, migrating system files, or preparing comprehensive backups, knowing how to duplicate your files accurately is crucial. At the heart of this process is the cp command—a robust utility designed to replicate files and directories effortlessly. This guide is designed to help you master the cp command. We’ll explore everything from basic file copying to recursive directory replication, along with tips for preserving file metadata and preventing accidental data loss. With detailed examples, real-world scenarios, and best practices, you’ll soon be equipped to use cp like a seasoned Linux professional. Diving into the cp Command In Linux, the cp command functions as your primary tool for copying data. Its versatility allows you to handle everything from a single file copy to mirroring complex directory structures with nested subfolders. Unlike graphical file managers, the cp command works entirely from the terminal, giving you precise control over every aspect of the copy process. How It Works At its simplest, cp takes a source file (or directory) and duplicates it to a new location. Its flexibility, however, lies in its options—flags that let you modify its behavior to suit your needs. Whether you’re preserving file permissions, ensuring no accidental overwrites occur, or copying entire folder trees, cp has a flag for every scenario. Basic Command Structure The cp command follows a simple format. Here’s the canonical syntax: cp [options] source destination cp: The command to initiate a copy. [options]: Additional parameters (flags) that control the behavior of the copy process. source: The file or directory you wish to duplicate. destination: The target location or filename for the copy. This straightforward structure makes cp a favorite among system administrators and casual users alike. 
Exploring Key Options The true power of cp is unlocked through its myriad options. Let’s review some of the most useful ones: Recursive Copying (-r or -R): When you need to copy an entire directory—complete with all its subdirectories and files—the recursive flag is indispensable. It tells cp to traverse the directory tree, ensuring nothing is left behind. Interactive Mode (-i): Safety first! The interactive option prompts you before replacing an existing file. This extra step is critical when you’re working with important data, as it minimizes the risk of accidental overwrites. Force Copy (-f): Sometimes you need to override warnings and ensure the file is copied no matter what. The force flag does just that, replacing existing files without a prompt. Use this with caution. Preserve Attributes (-p): File integrity matters, especially when dealing with permissions, timestamps, and ownership information. The preserve flag ensures that the new copy retains all of these attributes, making it perfect for backups or sensitive system files. Verbose Output (-v): For a detailed view of what’s happening during the copy process, the verbose option prints each step to the terminal. This can be particularly helpful when copying large sets of files or debugging complex operations. Practical Examples: Copying Files Let’s now dive into some practical examples to see how these options come together in everyday tasks. Copying a Single File Imagine you have a file named notes.txt and you want to create a backup copy in the same directory. You can simply run: cp notes.txt notes_backup.txt This command creates an exact duplicate named notes_backup.txt. However, if a file by that name already exists and you want to avoid overwriting it without confirmation, you can use: cp -i notes.txt notes_backup.txt The -i flag ensures that you’re asked before any overwriting takes place. 
Transferring Files Between Folders If your goal is to move a file from one location to another, specify the destination directory. For instance, to move report.pdf to a directory called archive, use: cp report.pdf /home/username/archive/ Make sure that the destination directory already exists; cp will not create it for you. If it doesn’t, you can create it with the mkdir command beforehand. Copying Multiple Files at Once Sometimes, you might need to duplicate several files simultaneously. To copy file1.txt, file2.txt, and file3.txt into a directory named backup, you would type: cp file1.txt file2.txt file3.txt /home/username/backup/ This command handles multiple files in one go. If you’re dealing with many files that share a common pattern—say, all log files—you can use a wildcard: cp *.log /home/username/logs/ This instructs cp to copy every file ending with .log into the logs directory, streamlining the process when working with numerous files. Mastering Recursive Copying for Directories Often, the task isn’t limited to a single file but involves entire directories. Copying directories requires a recursive approach to capture every nested file and folder. Recursively Duplicating a Directory Suppose you want to duplicate a website’s content located in /var/www/html to create a backup. The command would be: cp -r /var/www/html /backup/html_backup Here, the -r flag tells cp to copy everything within /var/www/html—subdirectories, hidden files, and all. Combining Recursive and Preserve Options When backing up directories, it’s often crucial to maintain file permissions, timestamps, and other metadata. In such cases, combine the recursive flag with the preserve flag: cp -rp /var/www/html /backup/html_backup This command ensures that every file in /var/www/html is copied to /backup/html_backup with all its original attributes intact. It’s an ideal solution for sensitive data or system configurations. 
Tips, Tricks, and Advanced Techniques Now that you understand the basics, let’s explore some advanced strategies and best practices for using the cp command effectively. Combine Options for Enhanced Safety It’s common to use multiple options together to tailor the behavior of cp. For instance, to safely copy a directory while preserving file attributes and prompting for overwrites, you can use: cp -rpi /data/source_directory /data/destination_directory This powerful combination ensures a thorough and secure copy process. Handling File Names with Special Characters File names in Linux may include spaces or special characters. To ensure these names are handled correctly, enclose them in quotes. For example: cp "My Important Document.txt" "My Important Document Copy.txt" This prevents the shell from misinterpreting spaces as delimiters between different arguments. Avoiding Unintentional Overwrites For batch operations or automated scripts, you might want to ensure that existing files are never overwritten. The -n option (short for no-clobber) achieves this: cp -n *.conf /backup/configs/ This command copies configuration files only if a file with the same name doesn’t already exist in the destination, adding an extra layer of safety. Use Verbose Mode for Debugging When dealing with a large volume of files or troubleshooting a copy operation, the verbose flag (-v) can be immensely helpful: cp -rv /source/folder /destination/folder Verbose mode prints every file as it is processed, giving you a clear view of the ongoing operation and making it easier to identify any issues. Real-World Applications and Scenarios The cp command isn’t just for occasional use—it’s a vital tool in many professional settings. Here are a few real-world scenarios where mastering cp can make a significant difference: System Administration and Backups System administrators often use cp to create backups before making critical changes to system configurations. 
For instance: cp -rp /etc /backup/etc_backup This command creates a comprehensive backup of the /etc directory, preserving all system settings and permissions. In the event of an error or system failure, such backups are indispensable. Data Migration and Server Transfers When moving data between servers or different parts of a network, cp helps ensure that all files are transferred accurately. Combining cp with other tools like rsync can create robust solutions for data migration. Development and Testing Developers frequently duplicate directories to create test environments or sandbox copies of their projects. Whether you’re testing a new feature or debugging an issue, copying the entire project directory with preserved attributes can save you time and prevent potential errors. Best Practices for Using cp Effectively To wrap up, here are some key recommendations to keep in mind when using the cp command: Double-check Destination Paths: Always verify that the target directory exists to avoid errors during the copy process. Use Interactive Mode for Critical Files: When working with important data, the -i flag can prevent unintentional overwrites by asking for confirmation. Quote File Names with Spaces: Ensure that any file names containing spaces or special characters are enclosed in quotes. Plan Your Backup Strategy: Regularly back up essential directories using recursive and preserve options to maintain data integrity. Combine Options Thoughtfully: Mix and match flags such as -r, -p, and -v to tailor cp to your specific needs, ensuring safety and clarity in your file operations. Final Thoughts The Linux cp command is a cornerstone of effective file management. Its simplicity belies the powerful functionality hidden within its many options. By mastering cp, you not only streamline your workflow but also protect your data through careful handling of file attributes, recursive copying, and thoughtful automation. 
Whether you’re a novice stepping into the Linux world or an experienced user looking to refine your skills, the techniques and examples provided in this guide will serve as a reliable reference for your file duplication tasks. Remember to consult the manual page (man cp) for additional details and advanced options. Embrace the versatility of the cp command, and soon you’ll find that managing files and directories on Linux becomes second nature.
07 February 2025 · 8 min to read
Linux

How to Use SSH Keys for Authentication

Many cloud applications are built on the popular SSH protocol—it is widely used for managing network infrastructure, transferring files, and executing remote commands. SSH stands for Secure Socket Shell, meaning it provides a shell (command-line interface) around the connection between multiple remote hosts, ensuring that the connection is secure (encrypted and authenticated). SSH connections are available on all popular operating systems, including Linux, Ubuntu, Windows, and Debian. The protocol establishes an encrypted communication channel within an unprotected network by using a pair of public and private keys. Keys: The Foundation of SSH SSH operates on a client-server model. This means the user has an SSH client (a terminal in Linux or a graphical application in Windows), while the server side runs a daemon, which accepts incoming connections from clients. In practice, an SSH channel enables remote terminal management of a server. In other words, after a successful connection, everything entered in the local console is executed directly on the remote server. The SSH protocol uses a pair of keys for encrypting and decrypting information: public key and private key. These keys are mathematically linked. The public key is shared openly, resides on the server, and is used to encrypt data. The private key is confidential, resides on the client, and is used to decrypt data. Of course, keys are not generated manually but with special tools—keygens. These utilities generate new keys using encryption algorithms fundamental to SSH technology. More About How SSH Works Exchange of Public Keys SSH relies on symmetric encryption, meaning two hosts wishing to communicate securely generate a unique session key derived from the public and private data of each host. For example, host A generates a public and private key pair. The public key is sent to host B. Host B does the same, sending its public key to host A. 
Using the Diffie-Hellman algorithm, host A can create a key by combining its private key with the public key of host B. Likewise, host B can create an identical key by combining its private key with the public key of host A. This results in both hosts independently generating the same symmetric encryption key, which is then used for secure communication. Hence, the term symmetric encryption. Message Verification To verify messages, hosts use a hash function that outputs a fixed-length string based on the following data: The symmetric encryption key The packet number The encrypted message text The result of hashing these elements is called an HMAC (Hash-based Message Authentication Code). The client generates an HMAC and sends it to the server. The server then creates its own HMAC using the same data and compares it to the client's HMAC. If they match, the verification is successful, ensuring that the message is authentic and hasn't been tampered with. Host Authentication Establishing a secure connection is only part of the process. The next step is authenticating the user connecting to the remote host, as the user may not have permission to execute commands. There are several authentication methods: Password Authentication: The user sends an encrypted password to the server. If the password is correct, the server allows the user to execute commands. Certificate-Based Authentication: The user initially provides the server with a password and the public part of a certificate. Once authenticated, the session continues without requiring repeated password entries for subsequent interactions. These methods ensure that only authorized users can access the remote system while maintaining secure communication. Encryption Algorithms A key factor in the robustness of SSH is that decrypting the symmetric key is only possible with the private key, not the public key, even though the symmetric key is derived from both. 
Achieving this property requires specific encryption algorithms. There are three primary classes of such algorithms: RSA, DSA, and algorithms based on elliptic curves, each with distinct characteristics:

  • RSA: Developed in 1978, RSA is based on integer factorization. Since factoring large semiprime numbers (products of two large primes) is computationally difficult, the security of RSA depends on the size of the chosen factors. The key length ranges from 1024 to 16384 bits.

  • DSA: DSA (Digital Signature Algorithm) is based on discrete logarithms and modular exponentiation. While similar to RSA, it uses a different mathematical approach to link public and private keys. DSA key length is limited to 1024 bits.

  • ECDSA and EdDSA: These algorithms are based on elliptic curves, unlike DSA, which uses modular exponentiation. They assume that no efficient solution exists for the discrete logarithm problem on elliptic curves. Although the keys are shorter, they provide the same level of security.

Key Generation

Each operating system has its own utilities for quickly generating SSH keys. In Unix-like systems, the command to generate a key pair is:

ssh-keygen -t rsa

Here, the type of encryption algorithm is specified using the -t flag. Other supported types include:

  • dsa

  • ecdsa

  • ed25519

You can also specify the key length with the -b flag. However, be cautious, as the security of the connection depends on the key length:

ssh-keygen -b 2048 -t rsa

After entering the command, the terminal will prompt you to specify a file path and name for storing the generated keys. You can accept the default path by pressing Enter, which will create standard file names: id_rsa (private key) and id_rsa.pub (public key).

Thus, the public key will be stored in a file with a .pub extension, while the private key will be stored in a file without an extension.

Next, the command will prompt you to enter a passphrase.
While not mandatory (it is unrelated to the SSH protocol itself), using a passphrase is recommended to prevent unauthorized use of the key by a third-party user on the local Linux system. Note that if a passphrase is used, you must enter it each time you establish the connection.

To change the passphrase later, you can use:

ssh-keygen -p

Or, you can specify all parameters at once with a single command, where -P takes the old passphrase, -N the new one, and -f the path to the private key file:

ssh-keygen -p -P old_passphrase -N new_passphrase -f path_to_key_file

For Windows, there are two main approaches:

  • Using ssh-keygen from OpenSSH: The OpenSSH client provides the same ssh-keygen command as Linux, following the same steps.

  • Using PuTTY: PuTTY is a graphical application that allows users to generate public and private keys with the press of a button.

Installing the Client and Server Components

The primary tool for an SSH connection on Linux platforms (both client and server) is OpenSSH. While it is typically pre-installed on most operating systems, there may be situations (such as with Ubuntu) where manual installation is necessary.

The general command for installing SSH, followed by entering the superuser password, is:

sudo apt-get install ssh

However, in some operating systems, SSH may be divided into separate components for the client and server.

For the Client

To check whether the SSH client is installed on your local machine, simply run the following command in the terminal:

ssh

If SSH is supported, the terminal will display a description of the command. If nothing appears, you’ll need to install the client manually:

sudo apt-get install openssh-client

You will be prompted to enter the superuser password during installation. Once completed, SSH connectivity will be available.

For the Server

Similarly, the server-side part of the OpenSSH toolkit is required on the remote host.
To check if the SSH server is available on your remote host, try connecting locally via SSH:

ssh localhost

If the SSH daemon is running, the server responds, typically with a host key confirmation or a login prompt. If the connection is refused, you’ll need to install the SSH server:

sudo apt-get install openssh-server

As with the client, the terminal will prompt you to enter the superuser password. After installation, you can check whether SSH is active by running:

sudo service ssh status

Once connected, you can modify SSH settings as needed by editing the server configuration file:

/etc/ssh/sshd_config

For example, you might want to change the default port to a custom one. Don’t forget that after making changes to the configuration, you must manually restart the SSH service to apply the updates:

sudo service ssh restart

Copying an SSH Key to the Server

On Hostman, you can easily add SSH keys to your servers using the control panel.

Using a Special Copy Command

After generating a public SSH key, it can be used as an authorized key on a server. This allows quick connections without the need to repeatedly enter a password.

The most common way to copy the key is by using the ssh-copy-id command:

ssh-copy-id -i ~/.ssh/id_rsa.pub name@server_address

This command assumes you used the default paths and filenames during key generation. If not, simply replace ~/.ssh/id_rsa.pub with your custom path and filename.

  • Replace name with the username on the remote server.

  • Replace server_address with the host address.

If the usernames on both the client and server are the same, you can shorten the command:

ssh-copy-id -i ~/.ssh/id_rsa.pub server_address

If you set a passphrase during the SSH key creation, the terminal will prompt you to enter it. Otherwise, the key will be copied immediately.

In some cases, the server may be configured to use a non-standard port (the default is 22).
If that’s the case, specify the port using the -p flag:

ssh-copy-id -i ~/.ssh/id_rsa.pub -p 8129 name@server_address

Semi-Manual Copying

There are operating systems where the ssh-copy-id command may not be supported, even though SSH connections to the server are possible. In such cases, the copying process can be done manually using a series of commands:

ssh name@server_address 'mkdir -pm 700 ~/.ssh; echo ' $(cat ~/.ssh/id_rsa.pub) ' >> ~/.ssh/authorized_keys; chmod 600 ~/.ssh/authorized_keys'

This sequence of commands does the following:

  • Creates the .ssh directory on the server (if it doesn’t already exist) with permissions 700, so that only the owner can read, write, and enter it.

  • Creates or appends to the authorized_keys file, which stores the public keys of all authorized users. The public key from the local file (id_rsa.pub) is added to it.

  • Sets permissions 600 on the authorized_keys file so that it can only be read and written by the owner.

If the authorized_keys file already exists, the new key is simply appended to it.

Once this is done, future connections to the server can be made using the same SSH command, but now the authentication will use the public key added to authorized_keys:

ssh name@server_address

Manual Copying

Some hosting platforms offer server management through alternative interfaces, such as a web-based control panel. In these cases, there is usually an option to manually add a public key to the server. The web interface might even simulate a terminal for interacting with the server.

Regardless of the method, the remote host must contain a file named ~/.ssh/authorized_keys, which lists all authorized public keys.

Simply copy the client’s public key (found in ~/.ssh/id_rsa.pub by default) into this file. If the key pair was generated using a graphical application (typically PuTTY on Windows), you should copy the public key directly from the application and add it to the existing content in authorized_keys.
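When the key is added by hand in a server-side terminal, the same steps (create the directory, append the key, fix permissions) can be wrapped in a small function. The sketch below is only an illustration under stated assumptions: the function name add_authorized_key is hypothetical, the public key file is assumed to hold a single key on one line, and the grep check is an addition that skips the append when the exact key line is already present.

```shell
# Hypothetical helper (not part of OpenSSH): add_authorized_key PUBKEY_FILE [SSH_DIR]
# Appends the key to SSH_DIR/authorized_keys unless that exact line already exists.
add_authorized_key() {
    pubkey_file=$1
    ssh_dir=${2:-"$HOME/.ssh"}          # defaults to the current user's ~/.ssh
    auth_file="$ssh_dir/authorized_keys"

    [ -r "$pubkey_file" ] || { echo "cannot read $pubkey_file" >&2; return 1; }

    # Same permissions as in the commands above: 700 for the directory, 600 for the file.
    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir" || return 1
    touch "$auth_file" || return 1
    chmod 600 "$auth_file" || return 1

    key=$(cat "$pubkey_file")
    # -F: treat the key as a fixed string, -x: match whole lines only, -q: no output.
    if ! grep -qxF -- "$key" "$auth_file"; then
        printf '%s\n' "$key" >> "$auth_file"
    fi
}
```

For example, after pasting the client's public key into a temporary file on the server, `add_authorized_key /tmp/id_rsa.pub` would add it for the current user; running it a second time leaves the file unchanged.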
Connecting to a Server

To connect to a remote server on a Linux operating system, enter the following command in the terminal:

ssh name@server_address

Alternatively, if the local username is identical to the remote username, you can shorten the command to:

ssh server_address

The system will then prompt you to enter the password. Type it and press Enter. Note that the terminal will not display the password as you type it.

Just like with the ssh-copy-id command, you can explicitly specify the port when connecting to a remote server:

ssh name@server_address -p 8129

Once connected, you will have control over the remote machine via the terminal; any command you enter will be executed on the server side.

Conclusion

Today, SSH is one of the most widely used protocols in development and system administration. Therefore, having a basic understanding of its operation is crucial.

This article aimed to provide an overview of SSH connections, briefly explain the encryption algorithms (RSA, DSA, ECDSA, and EdDSA), and demonstrate how public and private key pairs can be used to establish secure connections with a personal server, ensuring that exchanged messages remain inaccessible to third parties.

We covered the primary commands for UNIX-like operating systems that allow users to generate key pairs and grant clients SSH access by copying the public key to the server, enabling secure connections.
30 January 2025 · 10 min to read
Linux

How to Download Files with cURL

Downloading content from remote servers is a regular task for both administrators and developers. Although there are numerous tools for this job, cURL stands out for its adaptability and simplicity. It’s a command-line utility that supports protocols such as HTTP, HTTPS, FTP, and SFTP, making it crucial for automation, scripting, and efficient file transfers.

You can run cURL directly on your computer to fetch files. You can also include it in scripts to streamline data handling, thereby minimizing manual effort and mistakes. This guide demonstrates various ways to download files with cURL. By following these examples, you’ll learn how to deal with redirects, rename files, and monitor download progress. By the end, you should be able to use cURL confidently for tasks on servers or in cloud setups.

Basic cURL Command for File Download

The curl command works with multiple protocols, but it’s primarily used with HTTP and HTTPS to connect to web servers. It can also interact with FTP or SFTP servers when needed.

By default, cURL retrieves a resource from a specified URL and displays it on your terminal (standard output). This is often useful for previewing file contents without saving them, particularly if you’re checking a small text file.

Example: To view the content of a text file hosted at https://example.com/file.txt, run:

curl https://example.com/file.txt

For short text documents, this approach is fine. However, large or binary files can flood the screen with unreadable data, so you’ll usually want to save them instead.

Saving Remote Files

Often, the main goal is to store the downloaded file on your local machine rather than see it in the terminal. cURL simplifies this with the -O (capital O) option, which preserves the file’s original remote name.

curl -O https://example.com/file.txt

This retrieves file.txt and saves it in the current directory under the same name.
This approach is quick and retains the existing filename, which might be helpful if the file name is significant.

Choosing a Different File Name

Sometimes, renaming the downloaded file is important to avoid collisions or to create a clear naming scheme. In this case, use the -o (lowercase o) option:

curl -o myfile.txt https://example.com/file.txt

Here, cURL downloads the remote file file.txt but stores it locally as myfile.txt. This helps keep files organized or prevents accidental overwriting. It’s particularly valuable in scripts that need descriptive file names.

Following Redirects

When requesting a file, servers might instruct your client to go to a different URL. Understanding and handling redirects is critical for successful downloads.

Why Redirects Matter

Redirects are commonly used for reorganized websites, relocated files, or mirror links. Without redirect support, cURL stops after receiving an initial “moved” response, and you won’t get the file.

Using -L or --location

To tell cURL to follow a redirect chain until it reaches the final target, use -L (or --location):

curl -L -O https://example.com/redirected-file.jpg

This allows cURL to fetch the correct file even if its original URL points elsewhere. If you omit -L, cURL will simply print the redirect message and end, which is problematic for sites with multiple redirects.

Downloading Multiple Files

cURL can also handle multiple file downloads at once, saving you from running the command repeatedly.

Using Curly Braces and Patterns

If filenames share a pattern, curly braces {} let you specify each name succinctly:

curl -O https://example.com/files/{file1.jpg,file2.jpg,file3.jpg}

cURL grabs each file in sequence, making it handy for scripted workflows.

Using Ranges

For a series of numbered or alphabetically labeled files, specify a range in brackets:

curl -O https://example.com/files/file[1-5].jpg

cURL automatically iterates through files file1.jpg to file5.jpg.
This is great for consistently named sequences of files.

Chaining Multiple Downloads

If you have different URLs for each file, you can chain them together:

curl -O https://example1.com/file1.jpg -O https://example2.com/file2.jpg

This approach downloads file1.jpg from the first site and file2.jpg from the second without needing multiple commands.

Rate Limiting and Timeouts

In certain situations, you may want to control the speed of downloads or prevent cURL from waiting too long for an unresponsive server.

Bandwidth Control

To keep your network from being overwhelmed or to simulate slow conditions, limit the download rate with --limit-rate:

curl --limit-rate 2M -O https://example.com/bigfile.zip

2M stands for 2 megabytes per second. You can also use K for kilobytes or G for gigabytes.

Timeouts

If a server is too slow, you may want cURL to stop after a set time. The --max-time flag does exactly that:

curl --max-time 60 -O https://example.com/file.iso

Here, cURL quits after 60 seconds, which is beneficial for scripts that need prompt failures.

Silent and Verbose Modes

cURL can adjust its output to show minimal information or extensive details.

Silent Downloads

For batch tasks or cron jobs where you don’t need progress bars, include -s (or --silent):

curl -s -O https://example.com/file.jpg

This hides progress and errors, which is useful for cleaner logs. However, troubleshooting is harder if there’s a silent failure.

Verbose Mode

In contrast, -v (or --verbose) prints out detailed request and response information:

curl -v https://example.com

Verbose output is invaluable when debugging issues like invalid SSL certificates or incorrect redirects.

Authentication and Security

Some downloads require credentials, or you might need a secure connection.
HTTP/FTP Authentication

When a server requires a username and password, use -u:

curl -u username:password -O https://example.com/protected/file.jpg

Directly embedding credentials can be risky, as they might appear in logs or process lists. Consider environment variables or .netrc files for more secure handling.

HTTPS and Certificates

By default, cURL verifies SSL certificates. If the certificate is invalid, cURL blocks the transfer. You can bypass this check with -k or --insecure, though it introduces security risks. Whenever possible, use a trusted certificate authority so that connections remain authenticated.

Using a Proxy

In some environments, traffic must route through a proxy server before reaching the target.

Downloading Through a Proxy

Use the -x or --proxy option to specify the proxy:

curl -x http://proxy_host:proxy_port -O https://example.com/file.jpg

Replace proxy_host and proxy_port with the relevant details. cURL forwards the request to the proxy, which then retrieves the file on your behalf.

Proxy Authentication

If your proxy requires credentials, pass them with the -U (or --proxy-user) option:

curl -x https://proxy.example.com:8080 -U myuser:mypassword -O https://example.com/file.jpg

Again, storing sensitive data in plain text can be dangerous, so environment variables or configuration files offer more secure solutions.

Monitoring Download Progress

Tracking download progress is crucial for large files or slower links.

Default Progress Meter

By default, cURL shows a progress meter, including total size, transfer speed, and estimated finish time. For example:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1256  100  1256    0     0   2243      0 --:--:-- --:--:-- --:--:--  2246

This readout helps you gauge how much remains and if the transfer rate is acceptable.
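When a script rather than a person needs these figures, one option is curl's --write-out (-w) option, which prints selected transfer variables once the transfer has finished. The sketch below is a minimal illustration, not a recommended standard: the function name transfer_stats is hypothetical, and it assumes a curl build that provides the size_download, speed_download, and time_total variables.

```shell
# Hypothetical helper: transfer_stats URL [OUTPUT_FILE]
# Downloads URL without the progress meter and prints one summary line.
transfer_stats() {
    url=$1
    out=${2:-/dev/null}   # discard the body unless an output file is given

    # -s hides the progress meter, -S still reports errors on stderr,
    # -L follows redirects, -w prints the listed variables after the transfer.
    curl -sS -L -o "$out" \
         -w 'bytes=%{size_download} speed=%{speed_download} seconds=%{time_total}\n' \
         "$url"
}
```

A call such as `transfer_stats https://example.com/file.txt file.txt` saves the file and prints a single line with the byte count, the average speed in bytes per second, and the total time, which is easier to log or compare against a threshold than the interactive meter.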
Compact Progress Bar

If you want fewer details, add -#:

curl -# -O https://example.com/largefile.iso

A simpler bar shows the overall progress as a percentage. It’s easier on the eyes but lacks deeper stats like current speed.

Capturing Progress in Scripts

When using cURL within scripts, you might want to record progress data. cURL typically sends progress info to stderr, so you can redirect it:

curl -# -O https://example.com/largefile.iso 2>progress.log

Here, progress.log contains the status updates, which you can parse or store for later review.

Conclusion

cURL shines as a flexible command-line tool for downloading files in multiple protocols and environments. Whether you need to handle complex redirects, rename files on the fly, or throttle bandwidth, cURL has you covered. By mastering its core flags and modes, you’ll be able to integrate cURL seamlessly into your daily workflow for scripting, automation, and more efficient file transfers.
29 January 2025 · 7 min to read
