NoSQL Databases Explained: Types, Use Cases & Core Characteristics

Hostman Team
Technical writer

NoSQL (short for "Not Only SQL") refers to a class of data management systems that deviate from the traditional relational approach to information storage. Unlike conventional DBMSs such as MySQL or PostgreSQL, which store data in tables with fixed structures and strict relationships, NoSQL offers more flexible ways to organize and store information. The technology doesn't reject SQL; rather, it expands the ways to handle data.

The origin of the term NoSQL has an interesting backstory that began not with technology but with the name of a tech conference. In 2009, organizers of a database event in San Francisco adopted the term, and it unexpectedly caught on in the industry. Interestingly, a decade earlier, in 1998, developer Carlo Strozzi had already used the term "NoSQL" for his own project, which had no connection to modern non-relational systems.

Modern NoSQL databases fall into several key categories of data storage systems. These include:

  • Document-oriented databases (led by MongoDB)
  • Key-value stores (e.g., Redis)
  • Graph databases (Neo4j is a prominent example)
  • Column-family stores (such as ClickHouse)

The unifying feature among these systems is that they set aside the classic SQL language in favor of their own data models and query methods.

Unlike relational DBMSs, where SQL serves as a standardized language for querying and joining data through operations like JOIN and UNION, NoSQL databases have developed their own query languages. Each NoSQL database offers a unique syntax for manipulating data. Here are some examples:

// MongoDB (uses a JavaScript-like syntax):
db.users.find({ age: { $gt: 21 } })

// Redis (uses command-based syntax):
HGET user:1000 email
SET session:token "abc123"

NoSQL databases are particularly efficient in handling large volumes of unstructured data. A prime example is the architecture of modern social media platforms, where MongoDB enables storage of a user's profile, posts, responses, and activity in a single document, thereby optimizing data retrieval performance.

NoSQL vs SQL: Relational and Non-Relational Databases

The evolution of NoSQL databases has paralleled the growing complexity of technological and business needs. The modern digital world, which generates terabytes of data every second, necessitated new data processing approaches. As a result, two fundamentally different data management philosophies have emerged:

  1. Relational approach, focused on data integrity and reliability
  2. NoSQL approach, prioritizing adaptability and scalability

Each concept is grounded in its own core principles, which define its practical applications.

Relational systems adhere to ACID principles:

  • Atomicity ensures that transactions are all-or-nothing.
  • Consistency guarantees that data remains valid throughout.
  • Isolation keeps concurrent transactions from interfering.
  • Durability ensures that once a transaction is committed, it remains so.
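
To make these guarantees concrete, here is a minimal Node.js sketch of an atomic funds transfer using the node-postgres (pg) library; the accounts table, column names, and connection settings are assumptions for illustration:

// Atomic transfer: both UPDATEs commit together or not at all
const { Pool } = require('pg');
const pool = new Pool();

async function transfer(fromId, toId, amount) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    await client.query(
      'UPDATE accounts SET balance = balance - $1 WHERE id = $2',
      [amount, fromId]
    );
    await client.query(
      'UPDATE accounts SET balance = balance + $1 WHERE id = $2',
      [amount, toId]
    );
    await client.query('COMMIT');   // durability: the change survives restarts
  } catch (err) {
    await client.query('ROLLBACK'); // atomicity: neither update is applied
    throw err;
  } finally {
    client.release();
  }
}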

NoSQL systems follow the BASE principles:

  • Basically Available – the system prioritizes continuous availability.
  • Soft state – the system state may change over time.
  • Eventually consistent – consistency is achieved eventually, not instantly.
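
In practice, eventual consistency means a read can briefly return stale data. A hypothetical sketch, assuming two already connected clients, one pointed at the primary node and one at a read replica:

// Write goes to the primary; the replica may lag for a moment
await primary.set('profile:42:city', 'Berlin');

// An immediate read from the replica can still return the old value
const city = await replica.get('profile:42:city'); // possibly stale

// Re-reading after replication catches up returns 'Berlin':
// the system converges toward consistency over time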

Key Differences:

  • Data Organization. Relational: structured in predefined tables and schemas. NoSQL: flexible format that supports semi-structured and unstructured data.
  • Scalability. Relational: vertical (via stronger servers). NoSQL: horizontal (adding more nodes to the cluster).
  • Data Integrity. Relational: maintained at the DBMS core level. NoSQL: managed at the application level.
  • Performance. Relational: efficient for complex transactions. NoSQL: high performance in basic I/O operations.
  • Data Storage. Relational: distributed across multiple interrelated tables. NoSQL: groups related data into unified blocks or documents.

These fundamental differences define their optimal use cases:

  • Relational systems are irreplaceable where data precision is critical (e.g., financial systems).
  • NoSQL solutions excel in processing high-volume data flows (e.g., social media, analytics platforms).

Key Features and Advantages of NoSQL

Most NoSQL systems are open source, allowing developers to explore and modify the core system without relying on expensive proprietary software.

Schema Flexibility

One of the main advantages of NoSQL is its schema-free approach. Unlike relational databases, where altering the schema often requires modifying existing records, NoSQL allows the dynamic addition of attributes without reorganizing the entire database.

// MongoDB: Flexible schema supports different structures in the same collection
db.users.insertMany([
  { name: "Emily", email: "emily@email.com" },
  { name: "Maria", email: "maria@email.com", phone: "+35798765432" },
  { name: "Peter", social: { twitter: "@peter", facebook: "peter.fb" } }
])

Horizontal Scalability

NoSQL databases employ a fundamentally different strategy for boosting performance. While traditional relational databases rely on upgrading a single server, NoSQL architectures use distributed clusters. Performance is improved by adding nodes, with workload automatically balanced across the system.

Sharding and Replication

NoSQL databases support sharding—a method of distributing data across multiple servers. Conceptually similar to RAID 0 (striping), sharding enables:

  • Enhanced system performance
  • Improved fault tolerance
  • Efficient load distribution
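
As an illustration of how this looks in practice, MongoDB's mongosh shell exposes sharding commands directly; the database name, collection, and shard key below are assumptions:

// Distribute the users collection across the cluster by a hashed key
sh.enableSharding("appdb")
sh.shardCollection("appdb.users", { userId: "hashed" })

// Inspect how data chunks are balanced across the shards
sh.status()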

High Performance

NoSQL systems offer exceptional performance due to optimized storage mechanisms and avoidance of resource-heavy operations like joins. They perform best in scenarios such as:

  • Basic read/write operations
  • Large-scale data management
  • Concurrent user request handling
  • Unstructured data processing

Handling Unstructured Data

NoSQL excels in working with:

  • Large volumes of unstructured data
  • Heterogeneous data types
  • Rapidly evolving data structures

Support for Modern Technologies

NoSQL databases integrate well with:

  • Cloud platforms
  • Microservice architectures
  • Big Data processing systems
  • Modern development frameworks

Cost Efficiency

NoSQL solutions can be cost-effective due to:

  • Open-source licensing
  • Efficient use of commodity hardware
  • Scalability using standard servers
  • Reduced administrative overhead

Main Types of NoSQL Databases

In modern distributed system development, several core types of NoSQL solutions are distinguished, each with a mature ecosystem and strong community support.

Document-Oriented Databases

Document-based systems are the most mature and widely adopted type of NoSQL databases. MongoDB, the leading technology in this segment, is the benchmark example of document-oriented data storage architecture.

Data Storage Principle

In document-oriented databases, information is stored as documents grouped into collections. Unlike relational databases, where data is distributed across multiple tables, here all related information about an object is contained within a single document.

Example of a user document with orders:

{
  "_id": ObjectId("507f1f77bcf86cd799439011"),
  "user": {
    "username": "stephanie",
    "email": "steph@example.com",
    "registered": "2024-02-01"
  },
  "orders": [
    {
      "orderId": "ORD-001",
      "date": "2024-02-02",
      "items": [
        {
          "name": "Phone",
          "price": 799.99,
          "quantity": 1
        }
      ],
      "status": "delivered"
    }
  ],
  "preferences": {
    "notifications": true,
    "language": "en"
  }
}

Basic Operations with MongoDB

// Insert a document
db.users.insertOne({
  username: "stephanie",
  email: "steph@example.com"
})

// Find documents
db.users.find({ "preferences.language": "en" })

// Update data
db.users.updateOne(
  { username: "stephanie" },
  { $set: { "preferences.notifications": false }}
)

// Delete a document
db.users.deleteOne({ username: "stephanie" })

Advantages of the Document-Oriented Approach

Flexible Data Schema

  • Each document can have its own structure
  • Easy to add new fields
  • No need to modify the overall database schema

Natural Data Representation

  • Documents resemble programming objects
  • Intuitive structure
  • Developer-friendly

Performance

  • Fast retrieval of complete object data
  • Efficient handling of nested structures
  • Horizontal scalability

Working with Hierarchical Data

  • Naturally stores tree-like structures
  • Convenient nested object representation
  • Effective processing of complex structures
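
For example, the nested fields of the order document shown earlier can be queried directly with dot notation:

// Find users who ordered a phone, reaching into the nested orders array
db.users.find({ "orders.items.name": "Phone" })

// Return only the email of matching users
db.users.find(
  { "orders.items.name": "Phone" },
  { "user.email": 1 }
)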

Use Cases

The architecture is particularly effective in:

  • Developing systems with dynamically evolving data structures
  • Processing large volumes of unstandardized data
  • Building high-load distributed platforms

Typical Use Scenarios

  • Digital content management platforms
  • Distributed social media platforms
  • Enterprise content organization systems
  • Event aggregation and analytics services
  • Complex analytical platforms

Key-Value Stores

Among key-value stores, Redis (short for Remote Dictionary Server) holds a leading position in the NoSQL market. A core architectural feature of this technology is that the entire data set is stored in memory, ensuring exceptional performance.

Working Principle

The architecture of key-value stores is based on three fundamental components for each data record:

  • Unique key (record identifier)
  • Associated data (value)
  • Optional TTL (Time To Live) parameter
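
For instance, a TTL can be attached at write time, after which Redis evicts the key automatically (the key name and value here are illustrative):

# Store a session token that disappears after one hour
SET session:abc123 "user:1000" EX 3600

# Check how many seconds remain before eviction
TTL session:abc123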

Data Types in Redis

# Strings
SET user:name "Stephanie"
GET user:name

# Lists
LPUSH notifications "New message"
RPUSH notifications "Payment received"

# Sets
SADD user:roles "admin" "editor"
SMEMBERS user:roles

# Hashes
HSET user:1000 name "Steph" email "steph@example.com"
HGET user:1000 email

# Sorted Sets
ZADD leaderboard 100 "player1" 85 "player2"
ZRANGE leaderboard 0 -1

Key Advantages

High Performance

  • In-memory operations
  • Simple data structure
  • Minimal overhead

Storage Flexibility

  • Support for multiple data types
  • Ability to set data expiration
  • Atomic operations

Reliability

  • Data persistence options
  • Master-slave replication
  • Clustering support
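
Persistence, for example, is enabled with a couple of redis.conf directives; the values below are illustrative, not recommendations:

# redis.conf
save 900 1        # RDB: snapshot to disk if at least 1 key changed in 900 s
appendonly yes    # AOF: additionally log every write for replayable recovery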

Typical Use Scenarios

Caching

# Cache query results
SET "query:users:active" "{json_result}"
EXPIRE "query:users:active" 3600  # Expires in one hour

Counters and Rankings

# Increase view counter
INCR "views:article:1234"

# Update ranking
ZADD "top_articles" 156 "article:1234"

Message Queues

# Add task to queue
LPUSH "task_queue" "process_order:1234"

# Get task from queue
RPOP "task_queue"

Redis achieves peak efficiency when deployed in systems with intensive operational throughput, where rapid data access and instant processing are critical. A common architectural solution is to integrate Redis as a high-performance caching layer alongside the primary data store, significantly boosting the overall application performance.

Graph Databases

Graph databases stand out among NoSQL solutions due to their specialization in managing relationships between data entities. In this segment, Neo4j has established a leading position thanks to its efficiency in handling complex network data structures where relationships between objects are of fundamental importance.

Core Components

Nodes

  • Represent entities
  • Contain properties
  • Have labels

Relationships

  • Connect nodes
  • Are directional
  • Can contain properties
  • Define the type of connection

Example of a Graph Model in Neo4j

// Create nodes
CREATE (anna:Person { name: 'Anna', age: 30 })
CREATE (mary:Person { name: 'Mary', age: 28 })
CREATE (post:Post { title: 'Graph Databases', date: '2024-02-04' })

// Create relationships
CREATE (anna)-[:FRIENDS_WITH]->(mary)
CREATE (anna)-[:AUTHORED]->(post)
CREATE (mary)-[:LIKED]->(post)

Typical Queries

// Find friends of friends
MATCH (person:Person {name: 'Anna'})-[:FRIENDS_WITH]->(friend)-[:FRIENDS_WITH]->(friendOfFriend)
RETURN friendOfFriend.name

// Find most popular posts
MATCH (post:Post)<-[:LIKED]-(person:Person)
RETURN post.title, count(person) as likes
ORDER BY likes DESC
LIMIT 5

Key Advantages

Natural Representation of Relationships

  • Intuitive data model
  • Efficient relationship storage
  • Easy to understand and work with

Graph Traversal Performance

  • Fast retrieval of connected data
  • Efficient handling of complex queries
  • Optimized for recursive queries
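
Variable-length patterns show why traversal is a first-class operation in Cypher; the depth bound here is an arbitrary choice:

// Find everyone reachable through 1 to 3 friendship hops
MATCH (a:Person { name: 'Anna' })-[:FRIENDS_WITH*1..3]->(p:Person)
RETURN DISTINCT p.name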

Practical Applications

Social Networks

// Friend recommendations
MATCH (user:Person)-[:FRIENDS_WITH]->(friend)-[:FRIENDS_WITH]->(potentialFriend)
WHERE user.name = 'Anna' AND NOT (user)-[:FRIENDS_WITH]->(potentialFriend)
RETURN potentialFriend.name

Recommendation Systems

// Recommendations based on interests
MATCH (user:Person)-[:LIKES]->(product:Product)<-[:LIKES]-(otherUser)-[:LIKES]->(recommendation:Product)
WHERE user.name = 'Anna' AND NOT (user)-[:LIKES]->(recommendation)
RETURN recommendation.name, count(otherUser) as frequency

Routing

// Find shortest path
MATCH path = shortestPath(
  (start:Location {name: 'A'})-[:CONNECTS_TO*]->(end:Location {name: 'B'})
)
RETURN path

Usage Highlights

  • Essential when working with complex, interrelated data structures
  • Maximum performance in processing cyclic and nested queries
  • Enables flexible design and management of multi-level relationships

Neo4j and similar platforms for graph database management show exceptional efficiency in systems where relationship processing and deep link analysis are critical. These tools offer advanced capabilities for managing complex network architectures and detecting patterns in structured sets of connected data.

Columnar Databases

The architecture of these systems is based on column-oriented storage of data, as opposed to the traditional row-based approach. This enables significant performance gains for specialized queries. Leading solutions in this area include ClickHouse and HBase, both recognized as reliable enterprise-grade technologies.

How It Works

Traditional (row-based) storage:

Row1: [id1, name1, email1, age1]  
Row2: [id2, name2, email2, age2]

Column-based storage:

Column1: [id1, id2]  
Column2: [name1, name2]  
Column3: [email1, email2]  
Column4: [age1, age2]

Key Characteristics

Storage Structure

  • Data is grouped by columns
  • Efficient compression of homogeneous data
  • Fast reading of specific fields

Scalability

  • Horizontal scalability
  • Distributed storage
  • High availability

Example Usage with ClickHouse

-- Create table
CREATE TABLE users (
    user_id UUID,
    name String,
    email String,
    registration_date DateTime
) ENGINE = MergeTree()
ORDER BY (registration_date, user_id);

-- Insert data
INSERT INTO users (user_id, name, email, registration_date)
VALUES (generateUUIDv4(), 'Anna Smith', 'anna@example.com', now());

-- Analytical query
SELECT 
    toDate(registration_date) as date,
    count(*) as users_count
FROM users 
GROUP BY date
ORDER BY date;

Key Advantages

Analytical Efficiency

  • Fast reading of selected columns
  • Optimized aggregation queries
  • Effective with large datasets

Data Compression

  • Superior compression of uniform data
  • Reduced disk space usage
  • I/O optimization
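
ClickHouse also lets you choose a compression codec per column; the schema below is a hypothetical example:

-- Per-column codecs matched to the shape of the data
CREATE TABLE sensor_metrics (
    ts DateTime CODEC(Delta, ZSTD),  -- delta-encode near-monotonic timestamps
    value Float64 CODEC(Gorilla)     -- Gorilla suits slowly changing numeric series
) ENGINE = MergeTree()
ORDER BY ts;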

Typical Use Cases

Big Data

-- Log analysis with efficient aggregation
SELECT 
    event_type,
    count() as events_count,
    uniqExact(user_id) as unique_users
FROM system_logs 
WHERE toDate(timestamp) >= '2024-01-01'
GROUP BY event_type
ORDER BY events_count DESC;

Time Series

-- Aggregating metrics by time intervals
SELECT 
    toStartOfInterval(timestamp, INTERVAL 5 MINUTE) as time_bucket,
    avg(cpu_usage) as avg_cpu,
    max(cpu_usage) as max_cpu,
    quantile(0.95)(cpu_usage) as cpu_95th
FROM server_metrics
WHERE server_id = 'srv-001'
    AND timestamp >= now() - INTERVAL 1 DAY
GROUP BY time_bucket
ORDER BY time_bucket;

Analytics Systems

-- Advanced user statistics
SELECT 
    country,
    count() as users_count,
    round(avg(age), 1) as avg_age,
    uniqExact(city) as unique_cities,
    sumIf(purchase_amount, purchase_amount > 0) as total_revenue,
    round(avg(purchase_amount), 2) as avg_purchase
FROM user_statistics
GROUP BY country
HAVING users_count >= 100
ORDER BY total_revenue DESC
LIMIT 10;

Usage Highlights

  • Maximum performance in systems with read-heavy workloads
  • Proven scalability for large-scale data processing
  • Excellent integration in distributed computing environments

Columnar database management systems show exceptional efficiency in projects requiring deep analytical processing of large datasets. This is particularly evident in areas such as enterprise analytics, real-time performance monitoring systems, and platforms for processing timestamped streaming data.

Full-Text Databases (OpenSearch)

The OpenSearch platform, an open-source fork of Elasticsearch, is a comprehensive ecosystem for high-performance full-text search and multidimensional data analysis. Built on distributed-systems principles, it stands out for its capabilities in data processing, intelligent search, and interactive visualization of large-scale datasets.

Key Features

Full-Text Search

// Full-text search across several fields
GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "wireless headphones",
      "fields": ["title", "description"],
      "type": "most_fields"
    }
  }
}

Data Analytics

// Aggregation by categories
GET /products/_search
{
  "size": 0,
  "aggs": {
    "popular_categories": {
      "terms": {
        "field": "category",
        "size": 10
      }
    }
  }
}

Key Advantages

Efficient Search

  • Fuzzy search support
  • Result ranking
  • Match highlighting
  • Autocomplete functionality
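
Autocomplete, for instance, can be approximated with a prefix query on an ordinary text field; a dedicated completion suggester would need a special field mapping:

// Prefix-style autocomplete on the title field
GET /products/_search
{
  "query": {
    "match_phrase_prefix": {
      "title": "wirel"
    }
  }
}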

Analytical Capabilities

  • Complex aggregations
  • Statistical analysis
  • Data visualization
  • Real-time monitoring

Common Use Cases

E-commerce Search

  • Product search
  • Faceted navigation
  • Product recommendations
  • User behavior analysis

Monitoring and Logging

  • Metrics collection
  • Performance analysis
  • Anomaly detection
  • Error tracking

Analytical Dashboards

  • Data visualization
  • Business metrics
  • Reporting
  • Real-time analytics

OpenSearch is particularly effective in projects that require advanced search and data analytics. At Hostman, OpenSearch is available as a managed service, simplifying integration and maintenance.

When to Choose NoSQL?

The architecture of various database management systems has been developed with specific use cases in mind, so choosing the right tech stack should be based on a detailed analysis of your application's requirements. In modern software development, a hybrid approach is becoming increasingly common: multiple types of data storage are integrated into a single project to achieve maximum efficiency and extended functionality.

NoSQL systems do not provide a one-size-fits-all solution. When designing your data storage architecture, consider the specific nature of the project and its long-term development strategy.

Choose NoSQL databases when the following matter:

Large-scale Data Streams

  • Efficient handling of petabyte-scale storage
  • High-throughput read and write operations
  • Need for horizontal scalability

Dynamic Data Structures

  • Evolving data requirements
  • Flexibility under uncertainty

Performance Prioritization

  • High-load systems
  • Real-time applications
  • Services requiring high availability

Unconventional Data Formats

  • Networked relationship structures
  • Time-stamped sequences
  • Spatial positioning

Stick with Relational Databases when you need:

Guaranteed Integrity

  • Banking transactions
  • Electronic health records
  • Mission-critical systems

Complex Relationships

  • Multi-level data joins
  • Complex transactional operations
  • Strict ACID compliance

Immutable Structure

  • Fixed requirement specifications
  • Standardized business processes
  • Formalized reporting systems

Practical Recommendations

Hybrid Approach

// Using Redis as a cache in front of PostgreSQL, the primary store
// (assumes a connected node-postgres pool `pg` and a Redis client `redis`)
const cached = await redis.get(`user:${id}`);
if (cached) {
    return JSON.parse(cached);
}
const { rows } = await pg.query('SELECT * FROM users WHERE id = $1', [id]);
await redis.set(`user:${id}`, JSON.stringify(rows[0]), 'EX', 3600); // cache for 1 hour
return rows[0];

Gradual Transition

  • Start with a pilot project
  • Test performance
  • Evaluate support costs

Decision-Making Factors

Technical Aspects

  • Data volume
  • Query types
  • Scalability requirements
  • Consistency model

Business Requirements

  • Project budget
  • Development timeline
  • Reliability expectations
  • Growth plans

Development Team

  • Technology expertise
  • Availability of specialists
  • Maintenance complexity