How to Set Up Backup with Bacula
Hostman Team
Technical writer
Linux
18.07.2025
Reading time: 14 min

Bacula is a cross-platform, open-source, client-server backup solution that enables you to back up files, directories, databases, mail server data (Postfix, Exim, Sendmail, Dovecot), system images, and entire operating systems.

In this guide, we’ll walk you through the process of installing and configuring Bacula on Linux, as well as creating backups and restoring user data.

To get started with Bacula, you’ll need a server or virtual machine running any Linux distribution. In this tutorial, we’ll be using a cloud server from Hostman with Debian 12.

Bacula Architecture

Bacula’s architecture consists of the following components:

Director (Bacula Director)

The core component responsible for managing all backup, restore, and verification operations. The Director schedules jobs, sends commands to other components, and writes information to the database.

Storage Daemon (Bacula Storage)

Handles communication with storage devices such as disks, cloud storage, etc. The Storage Daemon receives data from the File Daemon and writes it to the configured storage medium.

File Daemon (Bacula File)

The agent installed on client machines to perform the actual backup operations.

Catalog

A database (MySQL, PostgreSQL, or SQLite) used by Bacula to store information about completed jobs, such as backup metadata, file lists, and restore history.

Console (Bacula Console, bconsole)

A command-line utility for interacting with Bacula. The Console allows administrators to control the Director via a CLI. GUI tools such as Bacula Web and Baculum are also available.

Monitor (Optional)

A component for monitoring the Bacula system status. It tracks job statuses, daemon states, and storage device conditions.

Creating Test Data for Backup

Let’s create some test files to use in our backup.

Create a test directory and navigate into it:

mkdir /root/test_backups && cd /root/test_backups

Now create six sequential files:

touch file{1..6}.txt
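
The files are empty, so there would be nothing meaningful to inspect after a restore. Optionally, write a line of sample text into each file (a small sketch, assuming you are still in /root/test_backups):

for f in file{1..6}.txt; do echo "test data for $f" > "$f"; done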

Also, create a directory in advance for storing restored files:

mkdir /root/restored-files

Installing Bacula

In this tutorial, we will install all Bacula components on a single server. However, Bacula also supports a distributed setup where components such as the Director, Storage Daemon, Client, and database can be installed on separate servers. This decentralized setup is suitable for backing up multiple systems without overloading a single server.
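
In such a distributed setup, each client machine only needs the File Daemon. On a Debian-based client, that would look roughly like this (shown for reference; our single-server setup doesn't require it):

apt update && apt -y install bacula-fd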

We'll be using Debian 12 and installing PostgreSQL (version 15) as the backend database.

Update the package index and install Bacula (server and client components):

apt update && apt -y install bacula-server bacula-client

PostgreSQL 15 will also be installed during this process.

During installation:

  • When prompted with: “Configure database for bacula-director-pgsql with dbconfig-common?”, press ENTER.

  • When asked to choose the database host, select localhost, since we are installing everything on one server.

  • When prompted with: “PostgreSQL application password for bacula-director-pgsql”, set a password for the Bacula database. 

Do not leave this field empty, or a random password will be generated.

  • Re-enter the password when asked to confirm.

The installation will then continue normally.

After the installation is complete, verify the status of Bacula components and PostgreSQL.

Check the status of the Bacula Director:

systemctl status bacula-director

Check the Storage Daemon:

systemctl status bacula-sd

Check the File Daemon:

systemctl status bacula-fd

Check PostgreSQL:

systemctl status postgresql

If all components display a status of active, then Bacula has been successfully installed and is running.
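
You can also query all four services with a single command:

systemctl status bacula-director bacula-sd bacula-fd postgresql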

Bacula Configuration

Bacula is configured by editing the configuration files of the program components. By default, all Bacula configuration files are located in the /etc/bacula directory.

Next, we will configure each Bacula component individually.

Configuring Bacula Director

Using any text editor, open the bacula-dir.conf configuration file for editing:

nano /etc/bacula/bacula-dir.conf

Let’s start with the Director block, which sets the main configuration parameters for the Director component:

Director {
  Name = 4142939-bi08079-dir
  DIRport = 9101
  QueryFile = "/etc/bacula/scripts/query.sql"
  WorkingDirectory = "/var/lib/bacula"
  PidDirectory = "/run/bacula"
  Maximum Concurrent Jobs = 20
  Password = "ohzb29XNWSFISd6qN6fG2urERzxOl9w68"
  Messages = Daemon
  DirAddress = 127.0.0.1
}

Explanation of parameters:

  • Name: The name of the Director component. This is a unique identifier used to connect with other components like the File Daemon and Storage Daemon. By default, it includes the server's hostname and the -dir suffix. Example: 4142939-bi08079-dir.

  • DIRport: The port the Bacula Director listens on for incoming connections from the management console (bconsole). Default is 9101.

  • QueryFile: Path to the SQL script file used to run queries on the database. It contains predefined SQL queries for job management, verification, data restoration, etc. Default: /etc/bacula/scripts/query.sql.

  • WorkingDirectory: The working directory where Bacula Director temporarily saves files during job execution.

  • PidDirectory: The directory where the Director saves its PID file (process identifier). This is used to track if the process is running.

  • Maximum Concurrent Jobs: The maximum number of jobs that can run simultaneously. The default is 20.

  • Password: Password used for authenticating the management console (bconsole) with the Director. Must match the one specified in the console’s configuration.

  • Messages: Specifies the name of the message resource that determines how messages (errors, warnings, events) are handled. Common values: Daemon, Standard, Custom.

  • DirAddress: The IP address the Director listens on. This can be 127.0.0.1 for local connections or an external IP.
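
Note that the console authenticates to the Director using this same password. On Debian, the installer writes it to /etc/bacula/bconsole.conf automatically, so if you ever change it in bacula-dir.conf, update the console configuration as well. The corresponding resource looks roughly like this (values mirror our Director block):

Director {
  Name = 4142939-bi08079-dir
  DIRport = 9101
  address = localhost
  Password = "ohzb29XNWSFISd6qN6fG2urERzxOl9w68"
}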

Catalog Configuration

In our single-server setup, dbconfig-common already created the bacula database on the local PostgreSQL instance during installation, so the database connection settings don't need changes. But if you're deploying the database separately (recommended for production), the address, username, and password must be specified in the Catalog block (a sketch for preparing such a database follows the parameter list):

Catalog {
  Name = MyCatalog
  dbname = "bacula"; DB Address = "localhost"; dbuser = "bacula"; dbpassword = "StrongPassword4747563"
}

Explanation of parameters:

  • dbname: The name of the database used by Bacula (default is bacula). The database must already exist (when deployed separately).

  • DB Address: Host address where the DBMS is deployed. Use IP or a domain name. For local setup: localhost or 127.0.0.1.

  • dbuser: The user Bacula will use to connect to the database.

  • dbpassword: Password for the specified database user. Must be preconfigured.
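
If you do host the Catalog on a separate PostgreSQL server, the database and user must exist before the Director can connect. A minimal sketch, run on that database host with the credentials from the example above (adjust names and the password to your environment):

sudo -u postgres createuser bacula
sudo -u postgres createdb -O bacula bacula
sudo -u postgres psql -c "ALTER USER bacula WITH PASSWORD 'StrongPassword4747563';"

You would also need to allow remote connections from the Director's IP in postgresql.conf and pg_hba.conf.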

Restore Job Configuration

Locate the Job block named RestoreFiles, responsible for file restoration. Set the Where parameter to specify the directory where restored files will be saved. Earlier, we created /root/restored-files, which we’ll use here:

Job {
  Name = "RestoreFiles"
  Type = Restore
  Client=4244027-bi08079-fd
  Storage = File1
  # The FileSet and Pool directives are not used by Restore Jobs
  # but must not be removed
  FileSet="Full Set"
  Pool = File
  Messages = Standard
  Where = /root/restored-files
}

Backup Schedule Configuration

Next, we set up the Schedule block that defines when backups are created.

We create:

  • A full backup on the first Monday of each month at 00:01.
  • A differential backup on the 2nd through 5th Sundays of each month at 23:05.
  • An incremental backup daily at 23:00:
Schedule {
  Name = "WeeklyCycle"
  Run = Full 1st mon at 00:01
  Run = Differential 2nd-5th sun at 23:05
  Run = Incremental mon-sun at 23:00
}

FileSet Configuration

Now, we specify which files and directories will be backed up. This is defined in the FileSet block. Earlier we created /root/test_backups with six files. We’ll specify that path:

FileSet {
  Name = "Full Set"
  Include {
    Options {
      signature = MD5
    }
    File = /root/test_backups
  }
}

Explanation of parameters:

  • Name: The name of the FileSet block, used for identification in configuration.
  • Options: Settings that apply to all files listed under Include.
  • signature = MD5: Specifies the checksum algorithm used to verify file integrity. MD5 generates a 128-bit hash to track file changes.

Exclude Configuration (Optional)

The Exclude block is used to specify files or directories that should not be backed up. This block is placed inside the FileSet definition and acts on files included via Include.

Exclude {
    File = /var/lib/bacula
    ...
}
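
For example, a complete FileSet combining Include and Exclude might look like this (a sketch based on our configuration; adjust the paths to your needs):

FileSet {
  Name = "Full Set"
  Include {
    Options {
      signature = MD5
    }
    File = /root/test_backups
  }
  Exclude {
    File = /var/lib/bacula
  }
}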

Pool Configuration

The Pool block defines a group of volumes (storage units) used for backup. Pools help manage how data is stored, rotated, and deleted.

Pool {
  Name = Default
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Volume Retention = 7 days
  Maximum Volume Bytes = 10G
  Maximum Volumes = 2
}

Explanation of parameters:

  • Name: The pool's name, here it's Default.
  • Pool Type: Defines the pool's function:
    • Backup: Regular backups.
    • Archive: Long-term storage.
    • Cloning: Data duplication.
  • Recycle: Indicates whether volumes can be reused once they're no longer needed (yes or no).
  • AutoPrune: Enables automatic cleanup of expired volumes.
  • Volume Retention: How long (in days) to retain data on a volume. After 7 days, the volume becomes eligible for reuse.
  • Maximum Volume Bytes: The max size for a volume. If it exceeds 10 GB, a new volume is created (if allowed).
  • Maximum Volumes: Limits the number of volumes in the pool. Here, it's 2. Older volumes are recycled when the limit is hit (if Recycle = yes).
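
Later, once a few jobs have run, you can watch how volumes are created and recycled under these rules by entering the following command in bconsole (the console is covered below):

list volumes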

Validating Configuration and Restarting Bacula

After making all changes, check the bacula-dir.conf file for syntax errors:

/usr/sbin/bacula-dir -t -c /etc/bacula/bacula-dir.conf

If the command output is empty, there are no syntax errors. If there are errors, the output will specify the line number and error description.

Restart the Bacula Director service:

systemctl restart bacula-director

Configuring Bacula Storage

The next step is configuring Bacula Storage, where the backup files will be stored.

Using any text editor, open the configuration file bacula-sd.conf for editing:

nano /etc/bacula/bacula-sd.conf

We'll start with the Storage block, which defines the storage daemon responsible for physically saving backup files:

Storage {
  Name = 4149195-bi08079-sd
  SDPort = 9103
  WorkingDirectory = "/var/lib/bacula"
  Pid Directory = "/run/bacula"
  Plugin Directory = "/usr/lib/bacula"
  Maximum Concurrent Jobs = 20
  SDAddress = 127.0.0.1
}

Here’s what each parameter means:

  • Name: Name of the storage daemon instance, used to identify it uniquely.
  • SDPort: Port number the Storage Daemon listens on. The default is 9103.
  • WorkingDirectory: Working directory for temporary files. Default: /var/lib/bacula.
  • Pid Directory: Directory to store the PID file (process ID) for the storage daemon. Default: /run/bacula.
  • Plugin Directory: Path where Bacula’s plugins for the storage daemon are located. These plugins can provide extra features such as encryption or cloud integration.
  • Maximum Concurrent Jobs: Maximum number of jobs the storage daemon can handle simultaneously.
  • SDAddress: IP address the Storage Daemon is available at. This can be an IP address or a domain name. Since in our case the Storage Daemon runs on the same server as the Director, we use the loopback address 127.0.0.1.

The next block to configure is Device, which defines the storage device where backups will be written.

The device can be physical (e.g., a tape drive) or logical (e.g., a directory on disk). For testing, one Device block will suffice. By default, bacula-sd.conf may contain more than one Device block, including a Virtual Autochanger — a mechanism that emulates a physical autochanger (used for managing tapes or other media). It lets you manage multiple virtual volumes (typically as disk files) just like real tapes in a tape library.

Locate the Autochanger block and remove the FileChgr1-Dev2 value from the Device parameter:

Autochanger {
  Name = FileChgr1
  Device = FileChgr1-Dev1
  Changer Command = ""
  Changer Device = /dev/null
}

Next, in the Device block below, set the Archive Device parameter to the full path of the directory where backup files will be stored (/srv/backup, which we'll create shortly):

Device {
  Name = FileChgr1-Dev1
  Media Type = File1
  Archive Device = /srv/backup
  LabelMedia = yes;                   
  Random Access = Yes;
  AutomaticMount = yes;               
  RemovableMedia = no;
  AlwaysOpen = no;
  Maximum Concurrent Jobs = 5
}

Any blocks referencing FileChgr2 and FileChgr1-Dev2 should be deleted.

Explanation of the parameters:

  • Autochanger Block:
    • Name: Identifier for the autochanger (you can have multiple).
    • Device: Name of the device linked to this autochanger—must match the Device block name.
    • Changer Command: Script or command used to manage the changer. An empty value ("") means none is used—suitable for virtual changers or simple setups.
    • Changer Device: Refers to the device tied to the autochanger, typically for physical devices.
  • Device Block:
    • Name: Identifier for the device.
    • Media Type: Media type associated with the device. Must match the Pool block media type.
    • Archive Device: Full path to the device or directory for storing backups; /srv/backup in this case.
    • LabelMedia: Whether Bacula should auto-label new media.
    • Random Access: Whether random access is supported.
    • AutomaticMount: Whether to auto-mount the device when used.
    • RemovableMedia: Specifies if the media is removable.
    • AlwaysOpen: Whether the device should always stay open.
    • Maximum Concurrent Jobs: Maximum number of simultaneous jobs using this device.

Now create the /srv/backup directory we specified in the configuration:

mkdir -p /srv/backup

Set the ownership to the bacula user:

chown bacula:bacula /srv/backup
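
Verify the result:

ls -ld /srv/backup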

Next, check the config file for syntax errors:

/usr/sbin/bacula-sd -t -c /etc/bacula/bacula-sd.conf

If there are no syntax errors, the output will be empty. Otherwise, it will indicate the line number and description of any error.

Restart the storage daemon:

systemctl restart bacula-sd
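
The File Daemon configuration (/etc/bacula/bacula-fd.conf) usually needs no changes in a single-server setup, but if you do edit it, validate and restart it the same way:

/usr/sbin/bacula-fd -t -c /etc/bacula/bacula-fd.conf
systemctl restart bacula-fd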

Creating a Backup

Backups in Bacula are created using the bconsole command-line tool. Launch the utility:

bconsole

If it connects to the Director component successfully, it will display 1000 OK.

Before running a backup, you can check the status of all components by entering the command:

status

This will display a numbered list of Bacula components whose status can be queried individually. To check them all at once, enter 6.

To initiate a backup, enter the command:

run

From the list, choose the BackupClient1 option by typing 1 (your job name might differ depending on your configuration).

After selecting the option, you’ll see detailed info about the backup operation.

You’ll then be prompted with three choices:

  • yes — start the backup process;
  • mod — modify parameters before starting;
  • no — cancel the backup.

If you enter mod, you’ll be able to edit up to 9 parameters.

To proceed with the backup, type yes.
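
Alternatively, you can skip the interactive prompts by passing parameters to run directly, for example (assuming the job name from our configuration):

run job=BackupClient1 level=Full yes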

To view all backup and restore jobs and their statuses:

list jobs

In our case, a backup with Job ID 1 was created:

list jobid=1

If the status is T, the backup was successful.

Possible statuses in the "Terminated Jobs" column:

  • T (Success) — Job completed successfully.
  • E (Error) — Job ended with an error.
  • A (Canceled) — Job was canceled by the user.
  • F (Fatal) — Job ended due to a critical error.
  • R (Running) — Job is still in progress; once finished, it moves to the terminated list with one of the statuses above.

You can also monitor backup activity and errors via the log file:

cat /var/log/bacula/bacula.log

Once the backup finishes, the volume file will appear in the directory we configured (/srv/backup). You can inspect it:

file /srv/backup/Vol-0001

Restoring Files from Backup

Earlier, we backed up the /root/test_backups directory, which contained six .txt files. Suppose these files were lost or deleted. Let’s restore them:

Launch the Bacula console:

bconsole

Start the restore process:

restore

You’ll see 12 available restore options.

We’ll use option 3. Type 3.

Earlier we used Job ID 1 for our backup. Enter 1. 

You’ll enter a file selection mode. Since our files were in the /root/test_backups directory, navigate there.

All previously saved files should be visible.

To restore the whole directory, go up one level:

cd ..

Then mark the whole test_backups folder:

mark test_backups/

Finish selection:

done

The system will display a final summary showing which data will be restored and the target directory (in our case: /root/restored-files).

To start the restore, enter yes.

Finally, verify that the files have been successfully restored.
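
Keep in mind that Bacula re-creates the original directory structure beneath the Where path, so the restored data will typically appear under /root/restored-files/root/test_backups. A quick check:

ls -R /root/restored-files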

Conclusion

We’ve now reviewed the installation and configuration of Bacula, a client-server backup solution. Bacula isn’t limited to backing up regular files—thanks to its plugin support, it can also handle backups of virtual machines, OS images, and more.
