Network Protocols: What They Are and How They Work

Hostman Team
Technical writer

A network protocol is a set of rules and agreements used to facilitate communication between devices at a specific network layer. Protocols define and regulate how information is exchanged between participants in computer networks. Many protocols are involved in network operation. For example, loading a webpage in a browser is the result of a process governed by several protocols:

  • HTTP: The browser forms a request to the server.
  • DNS: The browser resolves the domain name to an IP address.
  • TCP: A connection is established, and data integrity is ensured.
  • IP: Network addressing is performed.
  • Ethernet: Physical data transmission occurs between devices on the network.
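
To see how these layers cooperate, here is a minimal Python sketch that performs the first three steps from the list above by hand: DNS resolution, a TCP connection, and an HTTP request (IP addressing and Ethernet transmission are handled by the operating system and hardware underneath). The host example.com is only a placeholder.

import socket

host = "example.com"  # placeholder domain
ip = socket.gethostbyname(host)                    # DNS: domain name -> IP address
with socket.create_connection((ip, 80)) as conn:   # TCP: connection established
    # HTTP: the browser-style request to the server
    conn.sendall(f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode())
    reply = conn.recv(4096)
print(reply.decode(errors="replace").splitlines()[0])  # e.g. "HTTP/1.1 200 OK"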

These numerous protocols can be categorized according to the network layers they operate on. The most common network models are the OSI and TCP/IP models. In this article, we will explain these models and describe the most widely used protocols.

Key Terminology

This section introduces essential network-related terms needed for understanding the rest of the article.

Network. A network is a collection of digital devices and systems that are connected to each other (physically or logically) and exchange data. Network elements may include servers, computers, phones, routers, even a smart Wi-Fi-enabled lightbulb—and the list goes on. The size of a network can vary significantly—even two devices connected by a cable form a network. Data transmitted over a network is packaged into packets, which are special blocks of data. Protocols define the rules for creating and handling these packets.

Some communication systems, such as point-to-point telecommunications, do not support packet-based transmission and instead transmit data as a continuous bit stream. Packet-based transmission enables more efficient traffic distribution among network participants.

Network Node. A node is any device that is part of a computer network. Nodes are typically divided into two types:

  • End Nodes. These are devices that send and/or receive data. Simply put, these are sources or destinations of information.
  • Intermediate Nodes. These nodes connect end nodes together.

For example, a smartphone sends a request to a server via Wi-Fi. The smartphone and server are end nodes, while the Wi-Fi router is an intermediate node. Depending on node placement and quantity, a network may be classified as:

  • Global Network. A network that spans the entire globe. The most well-known example is the Internet.
  • Local Network (LAN). A network covering a limited area. For example, your home Wi-Fi connects your phone, computer, and laptop into a local network, while the router (an intermediate node) acts as a bridge to the global network.
  • Distributed Network. A network with geographically distributed nodes.

One exception to this geographic classification is networks of space-based systems, such as satellites or orbital stations.

Network Medium. This refers to the environment in which data transmission occurs: cables, wires, air, or optical fiber. Over copper wire, data is transmitted as electrical signals; over optical fiber, as light pulses; wireless links carry data as radio waves.

OSI Model

In the early days of computer networks, no universal model existed to standardize network operation and design. Each company implemented its own approach, often incompatible with others.

This fragmented landscape became problematic—networks, which were supposed to connect computers, instead created barriers due to incompatible architectures. In 1977, the International Organization for Standardization (ISO) took on the task of solving this issue. After seven years of research, the OSI model was introduced in 1984.

OSI stands for Open Systems Interconnection, meaning systems that use publicly available specifications to allow interoperability, regardless of their architecture. (This "openness" should not be confused with Open Source.)

The model consists of seven network layers, each responsible for specific tasks. Let’s look at each:

1. Physical Layer

This layer deals with the physical aspects of data transmission, including transmission methods, medium characteristics, and signal modulation.

2. Data Link Layer

The data link layer operates within a local network. It frames the raw bit stream from the physical layer into recognizable data units (frames), determines start and end points, handles addressing within a local network, detects errors, and ensures data integrity. Standard protocols are Ethernet and PPP.

3. Network Layer

This layer handles communication between different networks. It builds larger networks from smaller subnets and provides global addressing and routing, selecting the optimal path. For example, the IP protocol, which gives each device a unique address, operates at this layer. Key protocols are IP and ICMP.

4. Transport Layer

The transport layer ensures end-to-end communication between processes on different computers. It directs data to the appropriate application using ports. The two key protocols here are:

  • UDP — Unreliable transmission of datagrams.
  • TCP — Reliable byte-stream transmission.

5. Session Layer

This layer manages communication sessions: establishing, maintaining, and terminating connections, as well as synchronizing data.

6. Presentation Layer

Responsible for translating data formats into forms understandable to both sender and receiver. Examples: text encoding (ASCII, UTF-8), file formats (JPEG, PNG, GIF), encryption and decryption.

7. Application Layer

The user-facing layer where applications operate. Examples include web browsers using HTTP, email clients, and video/audio communication apps.

Some OSI protocols span more than one layer. For instance, Ethernet covers both the physical and data link layers.

When data is sent from one node to another, it passes through each OSI layer from top to bottom. Each layer processes and encapsulates the data before passing it to the next lower layer. This process is called encapsulation.

On the receiving end, the process is reversed: each layer decapsulates and processes the data, from bottom to top, until it reaches the application. This is called decapsulation.
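
As a toy illustration (the bracketed header names below are stand-ins, not real header formats), encapsulation can be pictured as nested wrapping:

payload = b"GET / HTTP/1.1\r\n\r\n"     # application layer data
segment = b"[TCP]" + payload            # transport layer adds its header
packet = b"[IP]" + segment              # network layer adds addressing
frame = b"[ETH]" + packet + b"[FCS]"    # link layer adds header and trailer
# Decapsulation: the receiver strips one wrapper per layer, bottom to top
assert frame[len(b"[ETH]"):-len(b"[FCS]")] == packet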

While the OSI model is not used in practical network implementations today, it remains highly valuable for educational purposes, as many network architectures share similar principles.

TCP/IP

While the OSI model was being developed and debated, others were implementing practical solutions. The most widely adopted was the TCP/IP stack, also known as the DoD model.

According to RFC 1122, the TCP/IP model has four layers:

  1. Application Layer
  2. Transport Layer
  3. Internet Layer (sometimes just called "Network")
  4. Link Layer (also called Network Access or Interface Layer)

Though different in structure, TCP/IP follows the same fundamental principles as OSI. For example:

  • The OSI session, presentation, and application layers are merged into a single application layer in TCP/IP.
  • The OSI physical and data link layers are merged into the link layer in TCP/IP.

Since terminology may vary across sources, we will clarify which model we are referring to throughout this article.

Let’s take a closer look at each layer and the protocols involved, starting from the bottom.

Link Layer in TCP/IP

As mentioned earlier, the link layer in the TCP/IP model combines two layers from the OSI model: the data link and physical layers. The most widely used protocol at this layer is Ethernet, so we’ll focus on that.

Ethernet

Let’s forget about IP addresses and network models for a moment. Imagine a local network consisting of 4 computers and a switch. We'll ignore the switch itself; in our example, it's simply a device that connects the computers into a single local network.

[Figure: a local network of four computers connected through a switch, each labeled with a simplified MAC address]

Each computer has its own MAC address. In our simplified example, a MAC address is just a three-digit number; real MAC addresses are longer, as described below.

MAC Address

In reality, a MAC address is 48 bits long. It’s a unique identifier assigned to a network device. If two devices have the same MAC address, it can cause network issues.

The first 24 bits of a MAC address are assigned by the IEEE — an organization responsible for developing electronics and telecommunications standards. The device manufacturer assigns the remaining 24 bits.

Now, back to our local network. If one computer wants to send data to another, it needs the recipient's MAC address.

Data in Ethernet networks is transmitted in the form of Ethernet frames. Ethernet is a relatively old protocol, developed in 1973, and has gone through several upgrades and format changes over time.

Here are the components of an Ethernet frame:

  • Preamble indicates the beginning of a frame.
  • Destination MAC address is the recipient’s address.
  • Source MAC address is the sender’s address.
  • Type/Length indicates the network protocol being used, such as IPv4 or IPv6.
  • SNAP/LLC and Data are the payload. Ethernet frames have a minimum size (64 bytes) so that collisions can be reliably detected.
  • FCS (Frame Check Sequence) is a checksum used to detect transmission errors.
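
Here is a sketch of how the addressing fields above can be read from raw bytes. The preamble and FCS are normally added and checked by the network card, so only the 14-byte Ethernet II header is parsed; the frame bytes are made up:

import struct

def parse_ethernet_header(frame):
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    as_mac = lambda raw: ":".join(f"{b:02x}" for b in raw)
    return as_mac(dst), as_mac(src), hex(ethertype)  # 0x0800 = IPv4, 0x86dd = IPv6

frame = bytes.fromhex("ffffffffffff" "3c22fb9a127f" "0800") + b"...payload..."
print(parse_ethernet_header(frame))  # ('ff:ff:ff:ff:ff:ff', '3c:22:fb:9a:12:7f', '0x800')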

ARP

So far, we’ve talked about a simple local network where all nodes share the same data link environment. That’s why this is called the data link layer. However, MAC addressing alone is not enough for modern TCP/IP networks. It works closely with IP addressing, which belongs to the network layer.

We’ll go into more detail on IP in the network layer section. For now, let’s look at how IP addresses interact with MAC addresses. Let’s assign an IP address to each computer:

[Figure: the same local network, with an IP address assigned to each computer alongside its MAC address]

In everyday life, we rarely interact with MAC addresses directly — computers do that. Instead, we use IP addresses or domain names. The ARP (Address Resolution Protocol) helps map an IP address to its corresponding MAC address.

When a computer wants to send data but doesn’t know the recipient’s MAC address, it broadcasts a message like: "Computer with IP 1.1.1.2, please send your MAC address to the computer with MAC 333."

If a computer with that IP exists on the network, it replies: "1.1.1.2 — that’s me, my MAC is 111."
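
This exchange can be modeled with a small Python sketch; the addresses and the dictionary standing in for the shared wire are, of course, invented:

hosts = {"1.1.1.1": "111", "1.1.1.2": "222", "1.1.1.3": "333"}  # the "wire"
arp_cache = {}                        # each host keeps an IP -> MAC cache

def arp_resolve(target_ip):
    if target_ip in arp_cache:        # cached answer: no broadcast needed
        return arp_cache[target_ip]
    mac = hosts.get(target_ip)        # broadcast: only the owner replies
    if mac is not None:
        arp_cache[target_ip] = mac    # remember the reply for next time
    return mac

print(arp_resolve("1.1.1.2"))         # "broadcasts", then returns "222"
print(arp_resolve("1.1.1.2"))         # answered from the cache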

So far, we've worked within a single network. Now, let’s expand to multiple subnets.

Network Layer Protocols in TCP/IP

Now we add a router to our local network and connect it to another subnet.

[Figure: two subnets connected by a router]

Two networks are connected via the router. This device acts as an intermediate node, allowing communication between different data link environments. In simple terms, it allows a computer from one subnet to send data to a computer in another subnet.

How does a device know it’s sending data outside its own subnet?

Every network has a parameter called a subnet mask. By applying this mask to a node’s IP address, the device can determine the subnet address. This is done using a bitwise AND operation.

You can check the subnet mask in Windows using the ipconfig command: 

[Figure: ipconfig output showing a subnet mask of 255.255.255.0]

In this example, the mask is 255.255.255.0.

This is a common subnet mask. It means that if the first three octets of two IP addresses match, they are in the same subnet.

For example:

  • IP 1.1.1.2 and 1.1.1.3 are in the same subnet.
  • IP 1.1.2.2 is in a different subnet.
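
Here is a minimal sketch of that check, using the bitwise AND described above and Python's standard ipaddress module:

import ipaddress

def same_subnet(ip_a, ip_b, mask):
    m = int(ipaddress.IPv4Address(mask))
    # ANDing an address with the mask yields the subnet address
    return int(ipaddress.IPv4Address(ip_a)) & m == int(ipaddress.IPv4Address(ip_b)) & m

print(same_subnet("1.1.1.2", "1.1.1.3", "255.255.255.0"))  # True: same subnet
print(same_subnet("1.1.1.2", "1.1.2.2", "255.255.255.0"))  # False: different subnet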

When a device detects that the recipient is in another subnet, it sends data to the default gateway, which is the router’s IP address.

Let’s simulate a situation:

A device with MAC 111 wants to send data to the IP 1.1.2.3. The sender realizes this is a different subnet and sends the data to the default gateway. First, it uses ARP to get the MAC address of the gateway, then sends the packet.

The router receives the packet, sees that the destination IP is different, and forwards the data. In the second subnet, it again uses ARP to find the MAC address of the target device and finally delivers the data.

IP Protocol

The IP (Internet Protocol) was introduced in the 1980s to connect computer networks. Today, there are two versions:

  • IPv4 – uses 32-bit addressing, which allows roughly 4.3 billion unique addresses.
  • IPv6 – uses 128-bit addressing and was introduced to solve IPv4 address exhaustion. IPv6 does not use ARP; its role is performed by the Neighbor Discovery Protocol (NDP).

Both protocols serve the same function. IPv6 was meant to replace IPv4, but because of technologies like NAT, IPv4 is still widely used. In this guide, we’ll focus on IPv4.
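
The difference in address space is easy to see with Python's ipaddress module; the addresses below come from the documentation-reserved ranges:

import ipaddress

v4 = ipaddress.ip_address("192.0.2.1")    # reserved documentation address
v6 = ipaddress.ip_address("2001:db8::1")
print(v4.version, v6.version)             # 4 6
print(2 ** 32)                            # ~4.3 billion possible IPv4 addresses
print(2 ** 128)                           # the vastly larger IPv6 space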

An IP packet consists of the following fields:

  • Version – IPv4 or IPv6.
  • IHL (Internet Header Length) – indicates the size of the header.
  • Type of Service – used for QoS (Quality of Service).
  • Total Length – includes header and data.
  • Identification – groups fragmented packets together.
  • Flags – indicate if a packet is fragmented.
  • Fragment Offset – position of the fragment.
  • Time to Live (TTL) – limits the number of hops.
  • Protocol – defines the transport protocol (e.g., TCP, UDP).
  • Header Checksum – verifies the header’s integrity.
  • Source IP Address
  • Destination IP Address
  • Options – additional parameters for special use.
  • Data – the actual payload.
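
Since the fields above sit at fixed offsets, the 20-byte fixed part of the header can be unpacked directly. A sketch (options and checksum verification are omitted, and the sample packet is fabricated):

import struct

def parse_ipv4_header(raw):
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,          # high nibble of the first byte
        "ihl": ver_ihl & 0x0F,            # header length in 32-bit words
        "total_length": total_len,
        "ttl": ttl,
        "protocol": proto,                # 6 = TCP, 17 = UDP
        "src": ".".join(map(str, src)),
        "dst": ".".join(map(str, dst)),
    }

sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
                     bytes([1, 1, 1, 2]), bytes([1, 1, 2, 3]))
print(parse_ipv4_header(sample))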

Transport Layer Protocols

The most common transport layer protocols in TCP/IP are UDP and TCP. They deliver data to specific applications identified by port numbers. Let’s start with UDP — it’s simpler than TCP.

UDP

A UDP datagram contains:

  • Source port
  • Destination port
  • Length
  • Checksum
  • Payload (from the higher layer)

UDP’s role is to deliver datagrams to the correct application port and to detect corrupted data using its checksum. It does not guarantee delivery: if data is lost or corrupted, UDP will not request a retransmission, unlike TCP.
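
A minimal sketch of UDP's fire-and-forget behavior with Python sockets; port 5005 is assumed to be free, and the loopback interface makes delivery effectively reliable here:

import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 5005))            # assumed-free port

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", ("127.0.0.1", 5005))  # no connection setup, just send

data, addr = receiver.recvfrom(1024)          # blocks until a datagram arrives
print(data, addr)                             # b'hello' ('127.0.0.1', <sender port>)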

TCP

TCP packets are called segments. A TCP segment includes:

  • Source and destination ports
  • Sequence number
  • Acknowledgment number (used for confirming receipt)
  • Header length
  • Reserved bits
  • Control flags (for establishing or ending connections)
  • Window size (how much data can be sent before an acknowledgment is required)
  • Checksum
  • Urgent pointer
  • Options
  • Data (from the higher layer)

TCP guarantees reliable data transmission. A connection is established between endpoints before sending data. If delivery cannot be guaranteed, the connection is terminated. TCP handles packet loss, ensures order, and reassembles fragmented data.
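
In code, the difference from UDP shows up as an explicit connection step. A minimal sketch (port 5006 assumed free):

import socket
import threading

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 5006))              # assumed-free port
server.listen(1)

def serve_once():
    conn, _ = server.accept()                 # returns once the handshake completes
    conn.sendall(b"hello over TCP")
    conn.close()

threading.Thread(target=serve_once, daemon=True).start()
client = socket.create_connection(("127.0.0.1", 5006))  # SYN, SYN-ACK, ACK
print(client.recv(1024))                      # b'hello over TCP'
client.close()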

Application Layer Protocols

In both the TCP/IP model and the OSI model, the top layer is the application layer.

Here are some widely used application protocols:

  • DNS (Domain Name System) – resolves domain names to IP addresses.
  • HTTP – transfers hypertext over the web, allowing communication between browsers and web servers.
  • HTTPS – does the same as HTTP, but with encryption for secure communication.
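
To illustrate the difference between HTTP and HTTPS, the sketch below wraps the same kind of TCP connection used in the introduction in TLS before speaking HTTP; example.com is again a placeholder:

import socket
import ssl

ctx = ssl.create_default_context()
with socket.create_connection(("example.com", 443)) as tcp:
    with ctx.wrap_socket(tcp, server_hostname="example.com") as tls:  # TLS handshake
        tls.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
        print(tls.recv(4096).decode(errors="replace").splitlines()[0])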

DNS queries typically use UDP, which is faster but less reliable. In contrast, protocols like FTP and HTTP rely on TCP, which provides reliable delivery.

Other popular application protocols include:

  • FTP (File Transfer Protocol) – for managing file transfers.
  • POP3 (Post Office Protocol version 3) – used by email clients to retrieve messages.
  • IMAP (Internet Message Access Protocol) – manages email directly on the mail server, keeping messages synchronized across multiple clients.

Conclusion

This guide covered the most commonly used protocols in computer networks; they form the backbone of most real-world network communications. By some estimates, there are around 7,000 protocols in total, many of them serving more specialized tasks.
