How to Automate Data Export Using n8n
Hostman Team
Technical writer
Linux
30.10.2025
Reading time: 17 min

If you’ve ever exported data from websites manually, you know how tedious it can be: you have to open the site and many links, then go through each one, copy the data, and paste it into a spreadsheet. And if there’s a lot of data, the process turns into endless routine work.

The good news is that this can be automated, and you don’t need programming skills to do it. Once you set up the scenario, everything will run automatically: the n8n platform will collect the data, save it to a database, and send it further if necessary.

In this article, we’ll look at how to set up such a process with minimal effort. We’ll create a chain that:

  • retrieves a list of articles,
  • saves the data to PostgreSQL,
  • collects the full text of each publication,
  • stores everything in the database.

All this doesn’t require any special skills, just a basic understanding of how the terminal and web panel work. You can figure it out even if you’ve never heard of n8n before.

Next, we’ll break down the process step by step, from starting the server to building the working process. By the end, you’ll have a workflow that saves you hours and handles routine tasks automatically.

Overview

Let’s say you need to collect the texts of all articles in the “Tutorials” section. To complete the task, we’ll break it down into a sequence of steps, also known as a pipeline.

What needs to be done?

  • Collect the titles of all articles in the catalog along with their links. The site provides the data page by page; you can’t get all the links at once, so you need to collect them in a loop.
  • Within the loop, save the collected links to the database. If there are many links, it’s most reliable to store intermediate data in a database.
  • After the loop, extract the links from the database and start a new loop. By this stage, we’ll have a table with links to articles. Now we need to process each link and extract the text.
  • Save the article texts. In the new loop, we’ll store the data in a new table in the database.

What will we use?

To implement the project, we'll use ready-made cloud services. With Hostman, you can quickly deploy both components we need: a cloud server to host n8n and a managed PostgreSQL database. The next two steps walk through each.

Step 1. Create a Server and Install n8n

Go to the control panel and open the Cloud servers section in the left panel. Click Create server. Choose the appropriate location and configuration.

When selecting a configuration, keep in mind that n8n itself is very lightweight. The main load falls on memory (RAM). It’s used to handle multiple simultaneous tasks and store large logs/history. Additional CPU cores help with complex chains with many transformations or a large number of concurrent executions.

Below is a comparison to help you choose the right configuration:

  • 1 × 3.3 GHz CPU, 2 GB RAM, 40 GB disk (low-end): test scenarios, 1–2 simple workflows without large loops or attachment handling.
  • 2 × 3.3 GHz CPU, 2 GB RAM, 60 GB disk (optimal for most tasks): small automations such as data exports, API operations, database saves, and periodic jobs. A good starting tier.
  • 2 × 3.3 GHz CPU, 4 GB RAM, 80 GB disk (universal option): moderate load: dozens of active workflows, loops over hundreds of items, JSON handling and parsing. Good memory margin.
  • 4 × 3.3 GHz CPU, 8 GB RAM, 160 GB disk (production and large scenarios): high load: constant cron triggers, processing large data sets, integrations with multiple services.
  • 8 × 3.3 GHz CPU, 16 GB RAM, 320 GB disk (overkill for n8n): suitable if you plan to run additional containers (e.g., a message queue or a custom API); usually excessive for n8n alone.

In the Network section, keep the public IPv4 address enabled so the server is accessible from any network. Add a private network for connecting to the database; the default settings are fine. Adjust other parameters as needed, then click Order.

Server creation and setup take about 10 minutes.

After that, install n8n on it following the official documentation.

Step 2. Create a PostgreSQL Database

Once the n8n server is up and running, you need to prepare a place to store your data. For this, we’ll use a cloud PostgreSQL database (DBaaS).

This is more convenient and practical than deploying it yourself: you don’t have to install and maintain hardware, configure software, or manage complex storage systems. 

Go to the control panel, open the Databases tab in the left panel, then click Create Database. In the Database Type section, choose PostgreSQL.

In section 4 (Network), you can disable the public IPv4 address: the connection to the database will go through the private network. This is not only safer but also more cost-effective.

Click Order. The database will be ready in about 5 minutes.

Step 3. Learn the Basics of n8n

It’s easy to get familiar with n8n, and you’ll quickly see that for yourself. In this step, we’ll look at n8n’s main elements, what they do, and when to use them.

What Nodes Are and Why They’re Needed

In n8n, every automation is built from nodes—blocks that perform one specific task.

  • Trigger: starts a workflow on an event: by time (Schedule), via webhook, or on a change in a connected service.
  • Action: sends a request or performs an operation: HTTP Request, email sending, database writes.
  • Logic: controls the flow: If, Switch, Merge, Split In Batches.
  • Function / Code: lets you insert JavaScript (Function, Code nodes) or quick expressions.

Any scenario can be built using these node types.
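
For instance, a minimal Code node body (in "Run Once for All Items" mode) might stamp every incoming item with an export timestamp; the exported_at field name is just an illustration:

// Adds a timestamp to each item passing through the node
const items = $input.all();      // all items from the previous node
for (const item of items) {
  item.json.exported_at = new Date().toISOString();
}
return items;                    // hand the modified items downstream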

How to Create Nodes

  1. Click “+” in the top-right corner of the workspace or on the output arrow of another node.

  2. Type the node name in the search, for example: http or postgresql.

  3. Click it. The node will appear and open its settings panel.

  4. Fill in the required fields: URL, method, and credentials. Fields with a red border are mandatory.

  5. Click Execute Node. You’ll see a green checkmark and an OUTPUT section with data. This is a quick way to verify the node works correctly.

Other Useful Features in n8n

  • Credentials (main page, Overview → Credentials tab): stores logins and tokens; set them once and use them in any node.
  • Variables (any input field supports expressions {{ ... }}): use them for dynamic dates, counters, or referencing data from previous nodes.
  • Executions (main page, Overview → Executions tab): logs of all runs: input/output data, errors, execution time.
  • Workflow History (enabled via advanced features; button in the top panel of the Workflow page): works like Git: revert to any previous version of the scenario.
  • Folders (main screen; the folder-with-plus icon near sorting and search): keeps workflows organized if you have many.
  • Templates (Templates tab on the left of the Workflow screen, or via a link): ready-made recipes: connect Airtable, build a Slack bot, parse RSS, etc.
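
A few typical expression forms (the node name and field names here are illustrative):

  • {{ $json.title }}: a field of the current item
  • {{ $now.toISO() }}: the current timestamp (n8n exposes Luxon dates)
  • {{ $node["HTTP Request"].json.data }}: the output of a specific earlier node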

Step 4. Build a Workflow in n8n

Now we have everything we need: a server with n8n and a PostgreSQL database. We can start building the pipeline.

On the main screen, click Create workflow. This will open the workspace.

To start the pipeline, you need a trigger. For testing, use Trigger manually: it allows you to launch the process with a single button click. After testing, you can switch to another trigger, such as scheduling data export once a day.

n8n window after creating a workflow: choosing a trigger for manual or scheduled start. Screenshot by the author / n8n.io

We’ll create a universal pipeline. It will go through websites, extract links page by page, then go through all of them and extract data. However, since every website is structured differently and uses different technologies, there’s no guarantee that this setup will work everywhere without adjustments.

Get the Request from the Browser

Click “+” next to the trigger. The action selection panel will open. In the search field, type http and select HTTP Request.

Selecting the next step in n8n: adding the "HTTP Request" node for sending requests to a website. Screenshot by the author / n8n.io

A panel will open to configure the parameters. But you can simply import the required data from your browser; that way, you don’t have to dive into the details of HTTP requests.

Now you need to understand how exactly the browser gets the data that it displays on the page. Usually, this happens in one of two ways:

  1. The server responds with a ready-made HTML page containing the data.
  2. The server responds with a JSON dictionary.

Open the page you want to get data from in your browser. For example, we'll use the Tutorials page. Then open Developer Tools (DevTools) by pressing F12 and go to the Network tab.

On our example site, there’s a See more button. When clicked, the browser sends a request to the server and receives a response.

When a user clicks a button to view details, usually a single request is sent, which immediately returns the necessary information. Let’s study the response. Click the newly appeared request and go to the Response tab. Indeed, there you’ll find all the article information, including the link.
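
On this site, the response is JSON of roughly the following shape (heavily trimmed; the real payload carries many more fields per article, matching the fields[] parameters you'll see in the cURL command below):

{
  "data": [
    { "title": "...", "path": "/tutorials/...", "author": { "name": "..." } }
  ],
  "meta": { "filter_count": 123 }
}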

If you’re following this example, look for a GET request starting with:

https://content.hostman.com/items/tutorials?...

That’s the one returning the list of publications. Yours might differ if you’re analyzing another site.

On the Headers tab, you can study the structure of the response to understand how it’s built. You’ll see that parameters are passed to the server: limit and offset.

  • limit restricts the number of articles returned per request (6 in our case).
  • offset shifts the starting point. offset = 6 makes sense because the first 6 articles are already displayed initially, so the browser doesn’t need to fetch them again.

To fetch articles from other pages, we’ll shift the offset parameter with each request and accumulate the data.
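
In plain JavaScript, that idea looks roughly like this. This is a sketch only (simplified URL, no headers or error handling); the n8n workflow we build later implements the same loop with nodes:

// Sketch of the pagination loop the workflow implements
const limit = 100;
let offset = 0;
const articles = [];

while (true) {
  const url = `https://content.hostman.com/items/tutorials?limit=${limit}&offset=${offset}`;
  const { data } = await (await fetch(url)).json();
  if (data.length === 0) break; // the If node will play this role
  articles.push(...data);       // Split Out + database write in n8n
  offset += limit;              // the final Edit Fields node does this
}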

Copy the command in cURL format: it contains all the request details. Right-click the request in the web inspector → Copy value → Copy as cURL. An example command might look like this:

curl 'https://content.hostman.com/items/tutorials?limit=6&offset=6&fields[]=path&fields[]=title&fields[]=image&fields[]=date_created&fields[]=topics&fields[]=text&fields[]=locale&fields[]=author.name&fields[]=author.path&fields[]=author.avatar&fields[]=author.details&fields[]=author.bio&fields[]=author.email&fields[]=author.link_twitch&fields[]=author.link_facebook&fields[]=author.link_linkedin&fields[]=author.link_github&fields[]=author.link_twitter&fields[]=author.link_youtube&fields[]=author.link_reddit&fields[]=author.tags&fields[]=topics.tutorials_topics_id.name&fields[]=topics.tutorials_topics_id.path&meta=filter_count&filter=%7B%22_and%22%3A%5B%7B%22status%22%3A%7B%22_eq%22%3A%22published%22%7D%7D%2C%7B%22_or%22%3A%5B%7B%22publish_after%22%3A%7B%22_null%22%3A%22true%22%7D%7D%2C%7B%22publish_after%22%3A%7B%22_lte%22%3A%22$NOW(%2B3+hours)%22%7D%7D%5D%7D%2C%7B%22locale%22%3A%7B%22_eq%22%3A%22en%22%7D%7D%5D%7D&sort=-date_created' \
  -H 'sec-ch-ua-platform: "Windows"' \
  -H 'Referer: https://hostman.com/' \
  -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.0.0 Safari/537.36' \
  -H 'Accept: application/json, text/plain, */*' \
  -H 'sec-ch-ua: "Google Chrome";v="141", "Not?A_Brand";v="8", "Chromium";v="141"' \
  -H 'sec-ch-ua-mobile: ?0'

Now go back to n8n. Click Import cURL and paste the copied value.

Important: if you copy the command from Firefox, the URL might contain extra ^ symbols that can break the request.

To remove them:

Method 1. In n8n:

  • After import, click the gear icon next to the URL field.

  • Choose Add Expression. The URL becomes editable.

  • Press Ctrl + F (Cmd + F on macOS), enable Replace mode, type ^ in the search field, leave the replacement field empty, and click Replace All.

Method 2. In VSCode:

  • Paste the cURL command into a new .txt or .sh file.

  • Press Ctrl + H (Cmd + H on macOS).

  • In Find, enter ^, leave Replace with empty, and click Replace All.

  • Copy the cleaned command back into n8n.
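
Alternatively, if you're comfortable with a browser's JavaScript console, the same cleanup is a one-liner (rawCurl here is a hypothetical variable holding the pasted command as a string):

const cleaned = rawCurl.replaceAll('^', '');  // strip all caret symbols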

Click Import, then Execute step. After a short delay, you should see the data fetched from the site in the right-hand window.

Now you know how to retrieve data from a website via n8n.

Add a Cyclical Algorithm

Let’s recall the goal: we need to loop through all pages and store the data in a database. To do that, we’ll build the following pipeline:

  1. Add a manual trigger: Trigger manually. It starts the workflow when you click the start button. Connect all nodes sequentially to it.

  2. In the first node, set values for limit and offset.

    • If they exist in the input, leave them as is.

    • Otherwise, default limit = 100 and offset = 0 (for pagination).
      Add an Edit Fields node → click Add Field.

    • In the “name” field: limit

    • In the “value” field:
      {{ $json.limit !== undefined ? $json.limit : 100 }}

  3. Add another field:

    • “name”: offset

    • “value”:
      {{ $json.offset !== undefined ? $json.offset : 0 }}

  4. Both expressions assign values dynamically: on the first run of the loop they fall back to the defaults; on later iterations they receive the updated values.
    Set both fields to the Number type and enable Include Other Input Fields so the loop can pass values forward.

  5. In the HTTP Request node, the API call uses the limit and offset values. The server returns an array under the key data. Set the URL field to Expression, inserting the previous node’s variables: {{ $json.limit }} and {{ $json.offset }}.

  6. Next, an If node checks if the returned data array is empty.

    • If empty → stop the loop.

    • If not → continue.
      Condition: {{ $json.data }} (1); Array (2) → is empty (3).

  7. Under the false branch, add a Split Out node. It splits the data array into separate items for individual database writes.

  8. Add an Insert or update rows in a table (PostgreSQL) node. Create credentials by clicking + Create new credential.
    Use Hostman’s database details:

    • Host: “Private IP” field

    • Database: default_db

    • User / Password: “User login” and “Password” fields

Example SQL for creating the table (run once via n8n’s “Execute a SQL query” node):

CREATE TABLE tutorials (
    id SERIAL PRIMARY KEY,
    author_name TEXT,
    topic_name TEXT UNIQUE,
    topic_path TEXT,
    text TEXT
);
  9. This prepares the table to store article data. Each item is written to tutorials with the fields topic_name, author_name, and topic_path.

  10. The Merge node combines:

    • the database write results,

    • the old limit and offset values.

  11. Since the PostgreSQL node doesn't return output here, include it in Merge just for synchronization: the next node starts only after the write completes.

  12. The next Edit Fields node increases offset by limit (offset = offset + limit). This prepares the next API call, fetching the next page; see the expression sketch right after this list.

  13. Connect this last Edit Fields node back to the initial Edit Fields node, forming a loop. The workflow repeats until the server returns an empty data array, which the If node detects to stop the cycle.
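
For reference, the two key loop expressions written out (illustrative equivalents of steps 6 and 12; in step 6 you can also use the visual Array → is empty operator instead):

  • If node condition, true when the API has no more items:
    {{ $json.data.length === 0 }}

  • Edit Fields value for offset, advancing to the next page:
    {{ $json.offset + $json.limit }}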

Add a Second Loop to Extract Article Texts

In our setup, when the If node’s true branch triggers (data is fully collected), we need to fetch all article links from the database and process each one.

Second loop in n8n: fetching links from the DB and saving article text to a table. Screenshot by the author / n8n.io

Here, each iteration requests one article and saves its text to the database.

  • Add Select rows from a table (PostgreSQL): it retrieves the rows added earlier. Since n8n doesn’t have intermediate data storage, the database serves this role. Use SELECT operation and enable Return All to fetch all rows without limits.

  • This node returns all articles at once, but we need to handle each separately. Add a Loop over items node. It has two outputs:

    • loop: connects nodes that should repeat per item,

    • done: connects what should run after the loop ends.

  • Inside the loop, add a request node to fetch each article’s content. Use DevTools again to find the correct JSON or HTML request. In this case, the needed request corresponds to the article’s page URL.
    Note: this request appears only when you navigate to an article from the Tutorials section. Refreshing inside the article gives HTML instead.
    To learn how to extract data from HTML, check n8n’s documentation.

  • In the request node, insert the article path from the database (convert the URL field to Expression); see the example right after this list.

  • Finally, add an Update rows in a table node to store the article text from the previous node’s output.
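
For example, if article content is fetched by appending the stored path to the API base, the URL expression might look like this. Treat it as a hypothetical template: the exact URL shape depends on the request you captured in DevTools, and topic_path is the column created earlier.

https://content.hostman.com/items/tutorials{{ $json.topic_path }}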

At this point, the loop is complete. You can test your setup.

Step 5. Schedule Workflow Execution

To avoid running the workflow manually every time, you can set up automatic execution on a schedule. This is useful when you need to refresh your database regularly, for example, once a day or once an hour.

n8n handles this through a special node called Schedule Trigger. Add it to your pipeline instead of Trigger manually.

In its settings, you can specify the time interval for triggering, starting from one second.

Configuring the Schedule Trigger node in n8n for automatic workflow execution. Screenshot by the author / n8n.io

That’s it. The entire pipeline is now complete. To make the Schedule Trigger work, activate your workflow: toggle the Inactive switch at the top-right of the screen.


With the collected data, you can, for example, automate customer support so a bot can automatically search for answers in your knowledge base.

Common Errors Overview

Below are common issues, their symptoms, causes, and working solutions.

  • Switching the webhook from "Test" to "Prod" fails with "The workflow has issues and cannot be executed." Cause: validation failed in one of the nodes (a required field is empty, outdated credentials, etc.). Fix: open the workflow, repair the nodes marked with a red triangle (fill in missing fields, update credentials), then reactivate.

  • The PostgreSQL node returns "Connection refused." Cause: the database service is unreachable: a closed firewall, wrong port or host, or missing Docker network permission. Fix: if the DB runs in Docker, check that it listens on port 5432, its IP is whitelisted, and n8n runs in the same network; add network_mode: bridge or a private network. If using Hostman DBaaS, check that the database and the n8n host share the same private network and that the DB is active.

  • A node fails with "Cannot read properties of undefined." Cause: a script or node tries to access a field that doesn't exist in the incoming JSON. Fix: guard the access with an If node or {{ $json?.field ?? '' }}, and make sure the previous node actually outputs the expected field.

  • Execution stops with the log message "n8n may have run out of memory." Cause: the workflow processes too many elements at once; Split In Batches keeps a large array in RAM. Fix: reduce the batch size, add a Wait node, split the workflow, or upgrade your plan for more RAM.

  • Split In Batches crashes or hangs on the last iteration (OOM). Cause: memory accumulates across repeated loop cycles. Fix: set the smallest reasonable batch size, add a 200–500 ms Wait, or switch to Queue Mode for large data volumes.

  • Database connection error "pq: SSL is not enabled on the server." Cause: the client attempts SSL while the server doesn't support it. Fix: add sslmode=disable to the connection string.

Conclusion

Automating data export through n8n isn’t about complex code or endless scripting; it’s about setting up a workflow once and letting it collect and store data automatically.

We’ve gone through the full process:

  • Created a server with n8n without manual terminal setup,
  • Deployed a cloud PostgreSQL database,
  • Built a loop that collects links and article texts,
  • Set up scheduled execution so everything runs automatically.

All of this runs on ready-made cloud infrastructure. You can scale by upgrading your plan as the workload grows, connect new services, and extend your workflow.

This example demonstrates one of the most common n8n patterns:

  • Iterate through a website’s pages and gather all links,
  • Fetch data for each link,
  • Write everything to a database.

This same approach works perfectly for:

  • Collecting price lists and monitoring competitors,
  • Content archiving,
  • CRM integrations.

It’s all up to your imagination. The beauty of n8n is that you can adapt it to any task without writing complex code.

We've also prepared VPS plans with NVMe storage so you can take your projects further.

