If you’ve ever exported data from websites manually, you know how tedious it can be: you open the site, follow link after link, copy the data, and paste it into a spreadsheet. And when there’s a lot of data, the process turns into endless routine work.
The good news is that this can be automated, and you don’t need programming skills to do it. Once you set up the scenario, everything will run automatically: the n8n platform will collect the data, save it to a database, and send it further if necessary.
In this article, we’ll look at how to set up such a process with minimal effort. We’ll create a chain that:

- fetches data from a website page by page;
- saves it to a PostgreSQL database;
- runs automatically on a schedule.
All this doesn’t require any special skills, just a basic understanding of how the terminal and web panel work. You can figure it out even if you’ve never heard of n8n before.
Next, we’ll break down the process step by step, from starting the server to building the working process. By the end, you’ll have a workflow that saves you hours and handles routine tasks automatically.
Let’s say you need to collect the texts of all articles in the “Tutorials” section. To complete the task, we’ll break it down into a sequence of steps, also known as a pipeline.
To implement the project, we’ll use ready-made cloud services. With Hostman, you can quickly deploy:

- a cloud server to run n8n;
- a managed PostgreSQL database (DBaaS) to store the collected data.
Go to the control panel and open the Cloud servers section in the left panel. Click Create server. Choose the appropriate location and configuration.
When selecting a configuration, keep in mind that n8n itself is very lightweight. The main load falls on memory (RAM). It’s used to handle multiple simultaneous tasks and store large logs/history. Additional CPU cores help with complex chains with many transformations or a large number of concurrent executions.
Below is a comparative table to help you choose the right configuration:
| Configuration (CPU, RAM, Disk) | Level | Best For |
|---|---|---|
| 1 × 3.3 GHz, 2 GB, 40 GB | Low | Test scenarios, 1–2 simple workflows without large loops or attachment handling. |
| 2 × 3.3 GHz, 2 GB, 60 GB | Optimal for most tasks | Small automations: data exports, API operations, database saves, periodic jobs. Good starting tier. |
| 2 × 3.3 GHz, 4 GB, 80 GB | Universal option | Moderate load: dozens of active workflows, loops over hundreds of items, JSON handling and parsing. Good memory margin. |
| 4 × 3.3 GHz, 8 GB, 160 GB | For production and large scenarios | High load: constant cron triggers, processing large data sets, integrations with multiple services. |
| 8 × 3.3 GHz, 16 GB, 320 GB | Overkill for n8n | Suitable if you plan to run additional containers (e.g., message queue, custom API). Usually excessive for n8n alone. |
In the Network section, keep the public IPv4 address enabled; this ensures the server is accessible from any network. Add a private network for connecting to the database; the default settings are fine. Adjust other parameters as needed and click Order.
Server creation and setup take about 10 minutes.
After that, install n8n on it following the official documentation.
Once the n8n server is up and running, you need to prepare a place to store your data. For this, we’ll use a cloud PostgreSQL database (DBaaS).
This is more convenient and practical than deploying it yourself: you don’t have to install and maintain hardware, configure software, or manage complex storage systems.
Go to the control panel, click the Databases tab in the left panel, then click Create Database. In the Database Type section, choose PostgreSQL.
In the Network section, you can disable the public IPv4 address; the connection to the database will go through the private network. This is not only safer but also more cost-effective.
Click Order. The database will be ready in about 5 minutes.
It’s easy to get familiar with n8n, and you’ll quickly see that for yourself. In this step, we’ll look at n8n’s main elements, what they do, and when to use them.
In n8n, every automation is built from nodes—blocks that perform one specific task.
| Node Type | Function |
|---|---|
| Trigger | Starts a workflow based on an event: by time (Schedule), webhook, or service change. |
| Action | Sends a request or performs an operation: HTTP Request, email sending, database write. |
| Logic | Controls the flow: If, Switch, Merge, Split In Batches. |
| Function / Code | Lets you insert JavaScript (Function, Code) or quick expressions. |
Any scenario can be built using these node types.
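For example, here is what a minimal Code node might look like (a sketch; the status, title, and path field names are illustrative, not tied to any specific API):

// Code node, "Run Once for All Items" mode: $input.all() returns the
// incoming items; each item stores its data under the json key.
const items = $input.all();

// Keep only published entries and reshape them (field names are illustrative).
return items
  .filter((item) => item.json.status === 'published')
  .map((item) => ({
    json: {
      title: item.json.title,
      path: item.json.path,
    },
  }));

Adding a node takes a few clicks: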
1. Click “+” in the top-right corner of the workspace or on the output arrow of another node.
2. Type the node name in the search, for example: http or postgresql.
3. Click it. The node will appear and open its settings panel.
4. Fill in the required fields: URL, method, and credentials. Fields with a red border are mandatory.
5. Click Execute Node. You’ll see a green checkmark and an OUTPUT section with data. This is a quick way to verify the node works correctly.
| Feature | Where to Find | Purpose |
|---|---|---|
| Credentials | Main page (Overview) → Credentials tab | Stores logins/tokens; set once, use in any node. |
| Variables | Any input field (switch it to Expression mode; see the examples below this table) | Use for dynamic dates, counters, or referencing data from previous nodes. |
| Executions | Main page (Overview) → Executions tab | Logs of all runs: see input/output data, errors, execution time. |
| Workflow History | Enabled via advanced features; button in the top panel on the Workflow page | Similar to Git: revert to any previous scenario version. |
| Folders | Main screen; click the folder-with-plus icon near sorting and search | Keeps workflows organized if you have many. |
| Templates | Templates tab on the left of the Workflow screen, or via link | Ready-made recipes: connect Airtable, a Slack bot, RSS parsing, etc. |
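For example, a few expressions you could type into a field switched to Expression mode (the field names after $json are illustrative):

{{ $now }}                            // current timestamp
{{ $json.title }}                     // the "title" field of the current item
{{ $('HTTP Request').item.json.id }}  // data from an earlier node, referenced by name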
Now we have everything we need: a server with n8n and a PostgreSQL database. We can start building the pipeline.
On the main screen, click Create workflow. This will open the workspace.
To start the pipeline, you need a trigger. For testing, use Trigger manually: it allows you to launch the process with a single button click. After testing, you can switch to another trigger, such as scheduling data export once a day.

n8n window after creating a workflow: choosing a trigger for manual or scheduled start
We’ll create a universal pipeline. It will walk through the site, extract article links page by page, then visit each link and extract the data. However, since every website is structured differently and uses different technologies, there’s no guarantee that this setup will work everywhere without adjustments.
Click “+” next to the trigger. The action selection panel will open. In the search field, type http and select HTTP Request.

Selecting the next step in n8n: adding the “HTTP Request” node for sending requests to a website
A panel will open to configure the parameters. But you can simply import the required data from your browser; that way, you don’t have to dive into the details of HTTP requests.
Now you need to understand how exactly the browser gets the data that it displays on the page. Usually, this happens in one of two ways:

- the server returns a ready-made HTML page with the data already embedded;
- the page loads first, and the browser then fetches the data with separate background requests (usually returning JSON).
Open in your browser the page you want to get data from. For example, we’ll use the Tutorials page. Then open the Developer Tools (DevTools) by pressing F12 and go to the Network tab.
On our example site, there’s a See more button. When clicked, the browser sends a request to the server and receives a response.
When a user clicks a button to view details, usually a single request is sent, which immediately returns the necessary information. Let’s study the response. Click the newly appeared request and go to the Response tab. Indeed, there you’ll find all the article information, including the link.
If you’re following this example, look for a GET request starting with:

https://content.hostman.com/items/tutorials?...

That’s the one returning the list of publications. Yours might differ if you’re analyzing another site.
On the Headers tab, you can study the structure of the request to understand how it’s built. You’ll see that two parameters are passed to the server: limit and offset.
offset = 6 makes sense because the first 6 articles are already displayed initially, so the browser doesn’t need to fetch them again.

To fetch articles from other pages, we’ll shift the offset parameter with each request and accumulate the data.
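In plain JavaScript, the whole idea looks roughly like this (a sketch for Node 18+ with its global fetch; the URL is shortened, the real one carries many fields[] parameters):

// Pagination sketch: shift offset by limit until the server returns an empty page.
async function fetchAllTutorials() {
  const limit = 100;
  let offset = 0;
  const all = [];

  while (true) {
    const url = `https://content.hostman.com/items/tutorials?limit=${limit}&offset=${offset}`;
    const { data } = await (await fetch(url)).json();
    if (data.length === 0) break; // empty page: we've reached the end
    all.push(...data);
    offset += limit;              // move the window to the next page
  }
  return all;
}

In n8n we’ll build exactly this loop, only out of nodes instead of code.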
Copy the command in cURL format: it contains all the request details. Right-click the request in the web inspector → Copy value → Copy as cURL. An example command might look like this:
curl 'https://content.hostman.com/items/tutorials?limit=6&offset=6&fields[]=path&fields[]=title&fields[]=image&fields[]=date_created&fields[]=topics&fields[]=text&fields[]=locale&fields[]=author.name&fields[]=author.path&fields[]=author.avatar&fields[]=author.details&fields[]=author.bio&fields[]=author.email&fields[]=author.link_twitch&fields[]=author.link_facebook&fields[]=author.link_linkedin&fields[]=author.link_github&fields[]=author.link_twitter&fields[]=author.link_youtube&fields[]=author.link_reddit&fields[]=author.tags&fields[]=topics.tutorials_topics_id.name&fields[]=topics.tutorials_topics_id.path&meta=filter_count&filter=%7B%22_and%22%3A%5B%7B%22status%22%3A%7B%22_eq%22%3A%22published%22%7D%7D%2C%7B%22_or%22%3A%5B%7B%22publish_after%22%3A%7B%22_null%22%3A%22true%22%7D%7D%2C%7B%22publish_after%22%3A%7B%22_lte%22%3A%22$NOW(%2B3+hours)%22%7D%7D%5D%7D%2C%7B%22locale%22%3A%7B%22_eq%22%3A%22en%22%7D%7D%5D%7D&sort=-date_created' \
  -H 'sec-ch-ua-platform: "Windows"' \
  -H 'Referer: https://hostman.com/' \
  -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.0.0 Safari/537.36' \
  -H 'Accept: application/json, text/plain, */*' \
  -H 'sec-ch-ua: "Google Chrome";v="141", "Not?A_Brand";v="8", "Chromium";v="141"' \
  -H 'sec-ch-ua-mobile: ?0'
Now go back to n8n. Click Import cURL and paste the copied value.
Important: if you copy the command from Firefox, the URL might contain extra ^ symbols that can break the request.
To remove them:
Method 1. In n8n:

1. After import, click the gear icon next to the URL field.
2. Choose Add Expression. The URL becomes editable.
3. Press Ctrl + F (Cmd + F on macOS), enable Replace mode, type ^ in the search field, leave the replacement field empty, and click Replace All.
Method 2. In VS Code:

1. Paste the cURL command into a new .txt or .sh file.
2. Press Ctrl + H (Cmd + H on macOS).
3. In Find, enter ^, leave Replace with empty, and click Replace All.
4. Copy the cleaned command back into n8n.
Click Import, then Execute step. After a short delay, you should see the data fetched from the site in the right-hand window.
Now you know how to retrieve data from a website via n8n.
Let’s recall the goal: we need to loop through all pages and store the data in a database. To do that, we’ll build the following pipeline:
Add a manual trigger (Trigger manually): it starts the workflow when you click the start button. The nodes below connect to it in sequence.
In the first node, set values for limit and offset:

- if they exist in the input, leave them as is;
- otherwise, default to limit = 100 and offset = 0 (for pagination).
Add an Edit Fields node and click Add Field.
In the name field: limit
In the value field: {{ $json.limit !== undefined ? $json.limit : 100 }}
Add another field:
name: offset
value: {{ $json.offset !== undefined ? $json.offset : 0 }}
Both expressions assign values dynamically: on the first run of the loop they fall back to the defaults; on later iterations they pick up the updated values.
Set both to Number type and enable Include Other Input Fields so the loop can pass values forward.
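In recent n8n versions, the same defaulting can be written more compactly with the nullish-coalescing operator (note the slight difference: ?? also replaces null, not just undefined):

{{ $json.limit ?? 100 }}
{{ $json.offset ?? 0 }}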
In the HTTP Request node, the API call uses the limit and offset values. The server returns an array under the key data. Set the URL field to Expression, inserting the previous node’s variables: {{ $json.limit }} and {{ $json.offset }}.
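With that in place, the beginning of the URL expression might look like this (shortened from the cURL command we imported earlier):

https://content.hostman.com/items/tutorials?limit={{ $json.limit }}&offset={{ $json.offset }}&fields[]=path&fields[]=title&...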
Next, an If node checks whether the returned data array is empty:

- if it’s empty → stop the loop;
- if not → continue.

Condition: set the value to {{ $json.data }}, the data type to Array, and the operation to is empty.
Under the false branch, add a Split Out node. It splits the data array into separate items for individual database writes.
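Roughly speaking, Split Out turns one item that holds an array into many items, one per element (the structure follows our API response):

// One incoming item:
{ "data": [ { "title": "A", "path": "a" }, { "title": "B", "path": "b" } ] }

// Two outgoing items after Split Out (field to split out: data):
{ "title": "A", "path": "a" }
{ "title": "B", "path": "b" }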
Add an Insert or update rows in a table (PostgreSQL) node. Create credentials by clicking + Create new credential.
Use Hostman’s database details:
Host: “Private IP” field
Database: default_db
User / Password: “User login” and “Password” fields
Example SQL for creating the table (run it once via n8n’s “Execute a SQL query” node):

CREATE TABLE tutorials (
    id SERIAL PRIMARY KEY,   -- auto-incrementing row ID
    author_name TEXT,        -- article author
    topic_name TEXT UNIQUE,  -- article title; UNIQUE lets upserts match existing rows
    topic_path TEXT,         -- relative path of the article page
    text TEXT                -- full article text, filled in by the second loop
);
This prepares the table to store article data: each item is written to tutorials with the fields topic_name, author_name, and topic_path.
The Merge node combines:

- the database write results;
- the old limit and offset values.

Since the PostgreSQL node doesn’t return output, include it in the Merge just for synchronization: the next node starts only after the write completes.
The next Edit Fields node increases offset by limit (offset = offset + limit).
This prepares for the next API call—fetching the next page.
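A minimal version of that node’s fields (assuming the old values survive the Merge under the same names):

limit: {{ $json.limit }}
offset: {{ $json.offset + $json.limit }}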
Connect this last Edit Fields node back to the initial Edit Fields node, forming a loop. The workflow repeats until the server returns an empty data array, which the If node detects to stop the cycle.
In our setup, when the If node’s true branch triggers (data is fully collected), we need to fetch all article links from the database and process each one.

Second loop in n8n: fetching links from DB and saving article text to a table
Here, each iteration requests one article and saves its text to the database.
Add a Select rows from a table (PostgreSQL) node: it retrieves the rows added earlier. Since n8n doesn’t have intermediate data storage, the database serves this role. Use the Select operation and enable Return All to fetch all rows without limits.
This node returns all articles at once, but we need to handle each one separately. Add a Loop Over Items node. It has two outputs:

- loop: connect the nodes that should repeat for each item;
- done: connect whatever should run after the loop ends.
Inside the loop, add a request node to fetch each article’s content. Use DevTools again to find the correct JSON or HTML request. In this case, the needed request corresponds to the article’s page URL.
Note: this request appears only when you navigate to an article from the Tutorials section. If you refresh the page while viewing the article, the server returns HTML instead.
To learn how to extract data from HTML, check n8n’s documentation.
In the request node, insert the article path from the database (switch the URL field to Expression).
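For instance, if the article endpoint mirrors the listing endpoint, the URL might look like this; treat it as a hypothetical pattern and substitute the request you actually found in DevTools (topic_path is the column we saved earlier):

https://content.hostman.com/items/tutorials/{{ $json.topic_path }}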
Finally, add an Update rows in a table node to store the article text from the previous node’s output.
At this point, the loop is complete. You can test your setup.
To avoid running the workflow manually every time, you can set up automatic execution on a schedule. This is useful when you need to refresh your database regularly, for example, once a day or once an hour.
n8n handles this through a special node called Schedule Trigger. Add it to your pipeline instead of Trigger manually.
In its settings, you can specify the time interval for triggering, starting from one second.
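For example, to refresh the data once a day, set the interval to Days and choose the hour. Recent versions of the Schedule Trigger also accept a custom Cron expression, so a daily 06:00 run can be written as:

0 6 * * *

This fires every day at 06:00 server time.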

Configuring the Schedule Trigger node in n8n for automatic workflow execution
That’s it. The entire pipeline is now complete. To make the Schedule Trigger work, activate your workflow: toggle the Inactive switch at the top-right of the screen.

With the collected data, you can, for example, automate customer support so a bot can automatically search for answers in your knowledge base.
The table below lists common issues, their symptoms, and solutions.
| Symptom | Cause | Working Solution |
|---|---|---|
| When switching the webhook from “Test” to “Prod,” the workflow fails with “The workflow has issues and cannot be executed.” | Validation failed in one of the nodes (a required field is empty, outdated credentials, etc.). | Open the workflow, fix the nodes marked with a red triangle (fill in missing fields, update credentials), then reactivate. |
| The PostgreSQL node returns “Connection refused.” | The database service is unreachable: firewall closed, wrong port/host, or no Docker network permission. | Check the host, port (5432 by default), and firewall rules; if the DB runs in Docker, make sure the container is reachable from n8n. |
| A node fails with “Cannot read properties of undefined.” | A script/node tries to access a field that doesn’t exist in the incoming JSON. | Before accessing the field, check for it with an If node or provide a fallback directly in the expression (e.g., via the ?? operator). |
| Execution stops with the log message “n8n may have run out of memory.” | The workflow processes too many elements at once; Split In Batches keeps a large array in RAM. | Reduce the batch size, add a Wait node, split the workflow, or upgrade your plan for more RAM. |
| Split In Batches crashes or hangs on the last iteration (OOM). | Memory grows over repeated loop cycles. | Set the smallest reasonable batch size, add a 200–500 ms Wait, or switch to Queue Mode for large data volumes. |
| Database connection error: pq: SSL is not enabled on the server. | The client attempts SSL while the server doesn’t support it. | Add sslmode=disable to the connection string (see the example below). |
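For reference, a PostgreSQL connection URI with SSL disabled looks like this (host, user, and password are placeholders):

postgresql://user:password@10.0.0.5:5432/default_db?sslmode=disable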
Automating data export through n8n isn’t about complex code or endless scripting; it’s about setting up a workflow once and letting it collect and store data automatically.
We’ve gone through the full process:

- deployed a cloud server and installed n8n;
- created a managed PostgreSQL database;
- built a looped workflow that exports data page by page and writes it to the database;
- set it to run automatically on a schedule.
All of this runs on ready-made cloud infrastructure. You can easily scale by upgrading your plan as the workload grows, connect new services, and enhance your workflow.
This example demonstrates one of the most common n8n patterns: fetch data from an API in a loop, transform it, and store it in a database. The same approach works for any paginated source you might want to export. It’s all up to your imagination: the beauty of n8n is that you can adapt it to any task without writing complex code.
