What is a CDN: Principles of Content Delivery Networks
Hostman Team
Technical writer
Infrastructure

Latency, latency, latency! It has always been a problem of the Internet. It was, it is, and it probably will be. Delivering data from one geographic point to another takes time.

However, latency can be reduced. This can be achieved in several ways:

  • Reduce the number of intermediate nodes on the data path from the remote server to the user. The fewer hops the data passes through, the faster it reaches its destination. But this is hardly feasible: the global Internet keeps growing and becoming more complex, and the number of nodes only increases. That's the global trend. Evolution!

  • Instead of regularly sending data over long distances, we can create copies of it on nodes closer to the user. Fortunately, the number of network nodes keeps growing, and the topology spreads ever wider. Eureka!

The second option looks much more promising. With a large number of geographically distributed nodes, it is possible to build a kind of content delivery network. In addition to the main function—speeding up loading—such a network brings several other benefits: traffic optimization, load balancing, and increased fault tolerance.

Wait a second! That's exactly what a CDN is—a Content Delivery Network. So let's look at what a CDN is, how it works, and what problems it solves.

What is a CDN?

A CDN (Content Delivery Network) is a distributed network of servers designed to speed up the delivery of content (images, videos, HTML pages, JavaScript files, CSS styles) by serving it to users from servers located close to them.

Like a vast web, the CDN infrastructure sits between the server and the user, acting as an intermediary. Thus, content is not delivered directly from the server to the user but through the powerful "tentacles" of the CDN.

What Types of Content Exist?

Since the early days of the Internet, content has been divided into two types:

  • Static (requires memory, large in size). Stored on a server and delivered to users upon request. Requires sufficient HDD or SSD storage.

  • Dynamic (requires processing power, small in size). Generated on the server with each user request. Requires enough RAM and CPU power.

The volume of static content on the Internet far exceeds that of dynamic content. For instance, a website's layout weighs much less than the total size of the images embedded in it.

Storing static and dynamic content separately (on different servers) is considered good practice. While heavy multimedia requests are handled by one server, the core logic of the site runs on another.

CDN technology takes this practice to the next level. It stores copies of static content taken from the origin server on many other remote servers. Each of these servers serves data only to nearby users, reducing load times to a minimum.

What Does a CDN Consist Of?

CDN infrastructure consists of many geographically distributed computing machines, each with a specific role in the global data exchange:

  • User. The device from which the user sends requests to remote servers.
  • Origin Server. The main server of a website that processes user requests for dynamic content and stores the original static files used by the CDN as source copies.
  • Edge Node. A server node in the CDN infrastructure that delivers static content (copied from the origin server) to nearby users. Also called a Point of Presence (PoP).

A single CDN infrastructure simultaneously includes many active users, origin servers, and edge nodes.

What Happens Inside a CDN?

First, CDN nodes perform specific operations to manage the rotation of static content:

  • Caching. The process of loading copies of content from the origin server to a CDN server, followed by optimization and storage.
  • Purge (Cache Clearing). Cached content is cleared after a certain period or on demand to maintain freshness on edge nodes. For example, if a file is updated on the origin server, the update will take some time to propagate to the caching nodes.

Second, CDN nodes have several configurable parameters that ensure the stable operation of the entire infrastructure:

  • Time to Live (TTL). A timeout after which cached content is deleted from an edge node. For images and videos, TTL can range from 1 day to 1 year; for API responses (JSON or XML), from 30 seconds to 1 hour; HTML pages may not be cached at all. CDN nodes usually respect the HTTP Cache-Control header.
  • Caching Rule. A set of rules that determines how an edge node caches content. The primary parameter is how long the file remains in the cache (TTL).
  • Restriction. A set of rules on the edge node that moderates access to cached content for security purposes. For example, an edge node may serve requests only from nearby IP addresses or specific domains.

Thus, static content flows from the origin server through edge nodes to users, cached based on specific caching rules, and cleared once the TTL expires. Meanwhile, access restrictions are enforced on every edge node for security.
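
To make these parameters more concrete, here is a minimal sketch of how an edge node might manage its cache: each entry gets a TTL (taken from the origin's Cache-Control header when present), expired entries are purged lazily, and a simple restriction rule limits which domains may be served. The class, defaults, and field names are illustrative assumptions, not the API of any real CDN.

```python
import re
import time

DEFAULT_TTL = 3600  # assumed fallback TTL in seconds (1 hour)

def ttl_from_cache_control(header: str | None) -> int:
    """Read max-age from a Cache-Control header, falling back to a default TTL."""
    if header:
        match = re.search(r"max-age=(\d+)", header)
        if match:
            return int(match.group(1))
    return DEFAULT_TTL

class EdgeNodeCache:
    """Toy in-memory cache for a single edge node (point of presence)."""

    def __init__(self, allowed_domains: set[str]):
        self.allowed_domains = allowed_domains              # restriction rule
        self.entries: dict[str, tuple[bytes, float]] = {}   # url -> (content, expires_at)

    def store(self, url: str, content: bytes, cache_control: str | None = None) -> None:
        ttl = ttl_from_cache_control(cache_control)
        self.entries[url] = (content, time.time() + ttl)

    def get(self, url: str, requesting_domain: str) -> bytes | None:
        if requesting_domain not in self.allowed_domains:
            return None                          # access restriction: refuse to serve
        entry = self.entries.get(url)
        if entry is None:
            return None                          # cache miss
        content, expires_at = entry
        if time.time() > expires_at:
            del self.entries[url]                # TTL expired: purge on access
            return None
        return content

    def purge(self, url: str | None = None) -> None:
        """On-demand purge of a single URL or of the whole cache."""
        if url is None:
            self.entries.clear()
        else:
            self.entries.pop(url, None)
```

A real node would also enforce size limits and eviction policies, but the lifecycle is the same: cache, serve while fresh, purge when the TTL runs out or on demand.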

How Does a CDN Work?

Let's see how a CDN works from the user's perspective. We can divide the process into several stages (a short code sketch of this flow follows the list):

  1. User Request Execution. When a user opens a website, the browser sends requests to CDN servers specified in HTML tags or within JavaScript code (such as Ajax requests). Without a CDN, requests would go directly to the origin server.
  2. Finding the Nearest Server. Upon receiving the request, the CDN system locates the server closest to the user.
  3. Content Caching. If the requested content is in the cache of the found CDN server, it is immediately delivered to the user. If not, the CDN server sends a request to the origin server and caches the content.
  4. Data Optimization. Content copies on CDN servers are optimized in various ways. For example, files can be compressed using Gzip or Brotli to reduce size.
  5. Content Delivery. The optimized and cached content is delivered to the user and displayed in their browser.
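
Putting the stages above together, the sketch below picks the geographically nearest edge node for a user, serves the file from that node's cache on a hit, and on a miss fetches it from the origin, compresses it with gzip, and caches the compressed copy. The node coordinates and the fetch_from_origin stub are hypothetical placeholders, not real endpoints.

```python
import gzip
import math

def distance_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle (haversine) distance between two (latitude, longitude) points, in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

# Hypothetical edge nodes: name -> (latitude, longitude)
EDGE_NODES = {"berlin": (52.52, 13.40), "paris": (48.86, 2.35)}
edge_caches: dict[str, dict[str, bytes]] = {name: {} for name in EDGE_NODES}

def fetch_from_origin(url: str) -> bytes:
    """Stand-in for a real HTTP request to the origin server."""
    return b"<static content for " + url.encode() + b">"

def handle_request(url: str, user_location: tuple[float, float]) -> bytes:
    # Stage 2: find the edge node closest to the user.
    nearest = min(EDGE_NODES, key=lambda name: distance_km(user_location, EDGE_NODES[name]))
    cache = edge_caches[nearest]
    # Stage 3: serve from cache on a hit, otherwise go to the origin.
    if url not in cache:
        body = fetch_from_origin(url)
        # Stage 4: optimize (compress) the copy before caching and delivery.
        cache[url] = gzip.compress(body)
    # Stage 5: deliver the cached, compressed content to the user.
    return cache[url]

warsaw = (52.23, 21.01)
payload = handle_request("https://example.com/logo.png", warsaw)  # served via the Berlin node
```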

For instance, if a website’s origin server is in Lisbon and the user is in Warsaw, the CDN will automatically find the nearest server with cached static content—say, in Berlin.

If there is no nearby CDN server with cached content, the CDN will request the origin server. Subsequent requests will then be served through the CDN.

The straight-line distance from Warsaw to Lisbon is about 2800 km, while the distance from Warsaw to Berlin is only about 570 km.

Someone unfamiliar with networking might wonder: “How can a CDN speed up content delivery if data travels through cables at the speed of light—300,000 km/s?”

In reality, delays in data transmission are due to technical, not physical, limitations:

  • Routing. Data passes through many routers and nodes, each adding small delays from processing and forwarding packets.
  • Network Congestion. High traffic in some network segments can lead to delays and packet loss, requiring retransmissions.
  • Data Transmission Protocols. Protocols like TCP include features such as connection establishment, error checking, and flow control, all of which introduce delays.

Thus, the difference between 2800 km and 570 km is negligible in terms of signal propagation. But from a network infrastructure perspective, it makes a big difference.
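
A rough calculation shows just how small the purely physical part of the delay is. Light in optical fiber travels at roughly 200,000 km/s (about two thirds of its speed in a vacuum), so propagation alone costs only a few milliseconds on either route; the tens or hundreds of milliseconds users actually see come from routing, congestion, and protocol overhead. The figures below are illustrative estimates.

```python
FIBER_SPEED_KM_S = 200_000  # approximate speed of light in optical fiber

for route, km in (("Warsaw -> Lisbon", 2800), ("Warsaw -> Berlin", 570)):
    one_way_ms = km / FIBER_SPEED_KM_S * 1000
    print(f"{route}: ~{one_way_ms:.1f} ms one-way propagation")

# Roughly 14 ms versus 3 ms one way; real page loads differ far more than that,
# because every extra router, queue, and TCP round trip adds up over the longer path.
```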

Moreover, a CDN server in Berlin, finding no cached content, might request it not from the origin server but from a neighboring CDN node in Prague, if that node has the content cached.

Therefore, CDN infrastructure nodes can also exchange cached content among themselves.
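
This kind of peer lookup is sometimes called tiered or collaborative caching: on a miss, a node first asks a nearby node and only then falls back to the origin. Below is a minimal sketch that reuses the fetch_from_origin stub from the earlier example; the function and its arguments are assumptions for illustration.

```python
def lookup_with_peers(url: str, local_cache: dict, peer_caches: list[dict]) -> bytes:
    """Serve from the local cache, then from a neighboring node, then from the origin."""
    if url in local_cache:
        return local_cache[url]           # local hit
    for peer in peer_caches:              # e.g., the Berlin node asking Prague
        if url in peer:
            local_cache[url] = peer[url]  # copy the object closer to the user
            return local_cache[url]
    body = fetch_from_origin(url)         # last resort: go back to the origin server
    local_cache[url] = body
    return body
```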

What Types of CDN Exist?

There are several ways to classify CDNs. The most obvious is based on the ownership of the infrastructure:

  • Public. The CDN infrastructure is rented from a third-party provider. Suitable for small and medium-sized companies.
  • Private. The CDN infrastructure is deployed internally by the company itself. Suitable for large companies and IT giants.

Each type has its own pros and cons:

|                              | Public | Private |
|------------------------------|--------|---------|
| Connection speed             | High   | Low     |
| Initial costs                | Low    | High    |
| Maintenance complexity       | Low    | High    |
| Cost of large-scale traffic  | High   | Low     |
| Control capabilities         | Low    | High    |
| Dependence on third parties  | High   | Low     |

Many CDN providers offer free access to their infrastructure resources to attract users. However, in such cases, there are limitations on:

  • Server capacity
  • Traffic volumes
  • Geographical coverage
  • Advanced configuration options

Paid CDN providers use various pricing models (a rough cost comparison follows the list):

  • Pay-as-you-go. Costs depend on the volume of data transferred, measured in gigabytes or terabytes.
  • Flat-rate pricing. Costs depend on the chosen plan with a fixed amount of available traffic.
  • Request-based pricing. Costs depend on the number of user requests made.
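
To see how these models differ in practice, here is a rough comparison for a hypothetical site. Every rate and traffic figure below is an invented example, not any provider's actual price list.

```python
# Hypothetical monthly usage
traffic_gb = 12_000          # data transferred, in GB
requests_millions = 450      # user requests, in millions

# Invented example rates
pay_as_you_go = traffic_gb * 0.04                  # $0.04 per GB transferred
flat_rate = 400 if traffic_gb <= 15_000 else 900   # fixed plan with a 15 TB allowance
request_based = requests_millions * 1.10           # $1.10 per million requests

for model, cost in (("Pay-as-you-go", pay_as_you_go),
                    ("Flat-rate", flat_rate),
                    ("Request-based", request_based)):
    print(f"{model:15s} ${cost:,.2f}/month")
```

Which model wins depends entirely on the traffic profile: a download-heavy site with relatively few requests tends to favor flat-rate plans, while an API serving many small responses may do better with request-based billing.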

Deploying your own CDN infrastructure is a serious step, usually justified by strong reasons:

  • Public CDN costs exceed the cost of running your own infrastructure. For example, high expenses due to massive multimedia traffic volumes.
  • The product hits technical limitations of public CDNs. For example, heavy network loads or a specific user geography.
  • The project demands higher reliability, security, and data privacy that public CDNs cannot provide. For example, a government institution or bank.

Here are a few examples of private CDN networks used by major tech companies:

  • Netflix Open Connect. Delivers Netflix’s streaming video to users worldwide.
  • Google Global Cache (GGC). Speeds up access to Google services.
  • Apple Private CDN. Delivers operating system updates and Apple services to its users.

What Problems Does a CDN Solve?

CDN technology has evolved to address several key tasks:

  • Faster load times. Files load more quickly (with less latency) because CDN servers with cached static content are located near the user.
  • Reduced server load. Numerous requests for static content go directly to the CDN infrastructure, bypassing the origin server.
  • Global availability. Users in remote regions can access content more quickly, regardless of the main server’s location.
  • Protection against attacks. Properly configured CDN servers can block malicious IP addresses or limit their requests, preventing large-scale attacks.
  • Traffic optimization. Static content is compressed before caching and delivery to reduce size, decreasing transmitted data volumes and easing network load.
  • Increased fault tolerance. If one CDN server fails or is overloaded, requests can be automatically rerouted to other servers.

The CDN, being a global infrastructure, takes over nearly all core responsibilities for handling user requests for static content.
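
As an illustration of the protection point above, edge nodes commonly apply per-IP rate limits before touching the cache or the origin. Below is a minimal fixed-window limiter sketch; the window length and request ceiling are arbitrary example values, and production CDNs use far more sophisticated filtering.

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 60      # example: count requests per minute
MAX_REQUESTS = 100       # example ceiling per client IP per window

_request_log: dict[str, list[float]] = defaultdict(list)

def allow_request(client_ip: str) -> bool:
    """Return False once an IP exceeds the per-window request ceiling."""
    now = time.time()
    window = _request_log[client_ip]
    window[:] = [t for t in window if now - t < WINDOW_SECONDS]  # drop old timestamps
    if len(window) >= MAX_REQUESTS:
        return False             # block or throttle this client
    window.append(now)
    return True
```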

What Are the Drawbacks of Using a CDN?

Despite solving many network issues, CDNs do have certain drawbacks:

  • Costs. In addition to paying for the origin server, you also need to pay for CDN services.
  • Privacy. CDN nodes gain access to static data from the origin server for caching purposes. Some of this data may not be public.
  • Availability. A site’s key traffic may come from regions where the chosen CDN provider has little or no presence. Worse, the provider might even be blocked by local laws.
  • Configuration. Caching requires proper setup. Otherwise, users may receive outdated data. Proper setup requires some technical knowledge.

Of course, we can minimize these drawbacks by carefully selecting the CDN provider and properly configuring the infrastructure they offer.

What Kind of Websites Use CDNs?

In today’s cloud-based reality, websites with multimedia content, high traffic, and a global audience are practically required to use CDN technology. Otherwise, they won’t be able to handle the load effectively.

Yes, websites can function without a CDN, but they will load more slowly than they would with one.

Almost all major websites, online platforms, and services use CDNs for faster loading and increased resilience. These include:

  • Google
  • Amazon
  • Microsoft
  • Apple
  • Netflix
  • Twitch
  • Steam
  • Aliexpress

However, CDNs aren’t just for the big players — smaller websites can benefit too. Several criteria suggest that a website needs distributed caching:

  • International traffic. If users from different countries or continents visit the site. For example, a European media site with Chinese readers.
  • Lots of static content. If the site contains many multimedia files. For example, a designer’s portfolio full of photos and videos.
  • Traffic spikes. If the site experiences sharp increases in traffic. For example, an online store running frequent promotions or ads.

That said, there are cases where using a CDN makes little sense and only complicates the web project architecture:

  • Local reach. If the site is targeted only at users from a single city or region. For example, a website for a local organization.
  • Low traffic. If only a few dozen or hundreds of users visit the site per day.
  • Simple structure. If the site is a small blog or a minimalist business card site.

Still, the main indicator for needing a CDN is a large volume of multimedia content.

Where Are CDN Servers Located?

While each CDN’s infrastructure is globally distributed, there are priority locations where CDN servers are most concentrated:

  • Capitals and major cities. These areas have better-developed network infrastructure and are more evenly spaced worldwide.
  • Internet exchange points (IXPs). These are locations where internet providers exchange traffic directly. Examples include DE-CIX (Frankfurt), AMS-IX (Amsterdam), LINX (London).
  • Data centers of major providers. These are hubs of major internet backbones that enable fast and affordable data transmission across long distances.

The smallest CDN networks comprise 10 to 150 servers, while the largest can include 300 to 1,500 nodes.

Popular CDN Providers

Here are some of the most popular, large, and technologically advanced CDN providers. Many offer CDN infrastructure as an add-on to their cloud services:

  • Akamai
  • Cloudflare
  • Amazon CloudFront (AWS CDN)
  • Fastly
  • Google Cloud CDN
  • Microsoft Azure CDN

There are also more affordable options:

  • BunnyCDN
  • KeyCDN
  • StackPath

Some providers specialize in CDN infrastructure for specific content types, such as video, streams, music, or games:

  • CDN77
  • Medianova

Choosing the right CDN depends on the business goals, content type, and budget. To find the optimal option, you should consider a few key factors:

  • Goals and purpose. What type of project needs the CDN: blog, online store, streaming service, media outlet?
  • Geography. The provider's network should cover regions where your target audience is concentrated.
  • Content. The provider should support caching and storage for the type of content used in your project.
  • Pricing. Which billing model offers the best value for performance?

In practice, it’s best to test several suitable CDN providers to find the right one for long-term use.

In a way, choosing a CDN provider is like choosing a cloud provider. They all offer similar services, but the implementation always differs.

Conclusion

It’s important to understand that a CDN is not the primary storage for static data; it only distributes copies of that data across its nodes to shorten the distance between the origin server and the user.

Therefore, the main role of a CDN is to speed up loading and optimize traffic. This is made possible through the caching mechanism for static data, which is distributed according to defined rules between the origin server and CDN nodes.
