A DPU, short for Data Processing Unit, is a specialized processor designed for data-centric tasks. Technologically, it is a kind of smart network interface card (SmartNIC).
Its main purpose is to offload the central processing unit (CPU) by taking over part of its workload.
To understand why DPUs are important and what potential this technology holds, we need to go back several decades.
In the 1990s, the Intel x86 processor, combined with software, provided companies with unprecedented computing power. Client-server computing began to develop, followed by multi-tier architectures and then distributed computing. Organizations deployed application servers, databases, and specialized software, all running on numerous x86 servers.
In the early 2000s, hypervisors became widespread. Now, multiple virtual machines could be launched on a single powerful server. Hardware resources were no longer wasted and began to be used efficiently.
Thanks to hypervisors, hardware became programmable. Administrators could now write code to automatically provision and launch virtual machines, forming the foundation of today’s cloud computing paradigm.
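As a small illustration of that programmability, here is a minimal sketch using the libvirt Python bindings, assuming a local QEMU/KVM host with libvirtd running; it enumerates the defined virtual machines and starts any that are powered off.

```python
import libvirt  # Python bindings for the libvirt virtualization API

# Connect to the local QEMU/KVM hypervisor (assumes libvirtd is running).
conn = libvirt.open("qemu:///system")

# Enumerate every defined VM and start the ones that are powered off.
for dom in conn.listAllDomains():
    state, _reason = dom.state()
    if state == libvirt.VIR_DOMAIN_SHUTOFF:
        print(f"Starting {dom.name()}...")
        dom.create()  # boots the domain
    else:
        print(f"{dom.name()} is already running")

conn.close()
```

A few lines like these, dropped into a scheduler or an orchestration tool, are exactly the kind of automation that turned virtualized hardware into cloud infrastructure.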
The next step was network and storage virtualization. As a result, a powerful CPU became the foundation for emulating virtually everything: virtual processors, network cards, and storage interfaces.
The downside of this evolution was that pressure on the CPU increased significantly. It became responsible for everything, from running the operating system and applications to managing network traffic, storage I/O operations, security, and more. All system components began competing for CPU resources.
The CPU’s functions went far beyond its original purpose. At this point, two major trends emerged:
AI workloads demand massive parallelism, far beyond what a general-purpose CPU can deliver. Thus, graphics processing units (GPUs) became the driving force behind AI development.
Originally designed to accelerate graphics rendering, GPUs evolved into coprocessors for executing complex mathematical operations in parallel. NVIDIA quickly seized this opportunity and released GPUs specifically designed for AI training and inference workloads.
GPUs were the first step toward offloading the CPU. They took over mathematical computations. After that, the market saw the emergence of other programmable chips.
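To see the kind of parallel arithmetic GPUs excel at, here is a minimal sketch, assuming a CUDA-capable GPU with the CuPy library installed; it times the same matrix product on the CPU (NumPy) and on the GPU (CuPy).

```python
import time
import numpy as np

try:
    import cupy as cp  # GPU array library; requires a CUDA-capable GPU
except ImportError:
    cp = None

N = 4096

# CPU: NumPy runs the matrix product on general-purpose cores.
a_cpu = np.random.rand(N, N).astype(np.float32)
b_cpu = np.random.rand(N, N).astype(np.float32)
t0 = time.perf_counter()
np.matmul(a_cpu, b_cpu)
print(f"CPU matmul: {time.perf_counter() - t0:.3f} s")

if cp is not None:
    # GPU: the same operation is spread across thousands of CUDA cores.
    a_gpu = cp.asarray(a_cpu)
    b_gpu = cp.asarray(b_cpu)
    cp.cuda.Stream.null.synchronize()
    t0 = time.perf_counter()
    cp.matmul(a_gpu, b_gpu)
    cp.cuda.Stream.null.synchronize()  # wait for the async GPU kernel to finish
    print(f"GPU matmul: {time.perf_counter() - t0:.3f} s")
```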
These chips fall into two families: application-specific integrated circuits (ASICs), which are hardwired for a particular task, and field-programmable gate arrays (FPGAs), which can be reconfigured after manufacturing for specific tasks, such as optimizing network traffic or accelerating storage I/O operations. Companies like Broadcom, Intel, and NVIDIA began producing such processors for network cards and other devices.
Thanks to GPUs and programmable controllers, the excessive load on the CPU started to decrease. Network functions, storage, and data processing were delegated to specialized hardware. That’s the simplest explanation of what a coprocessor is: a device that shares the CPU’s workload, allowing hardware resources to be used to their full potential. The secret to success is simple: each component does what it does best.
Before discussing DPUs, we should first understand what an ASIC processor is and how it relates to network interface cards.
A network card is a device that allows a computer to communicate with other devices on a network. It is also referred to by the abbreviation NIC (Network Interface Controller).
At the core of every NIC is an ASIC designed to perform Ethernet controller functions. However, these microchips can also be assigned other roles. The key point is that a standard NIC’s functionality cannot be changed after manufacturing; it performs only the tasks it was designed for.
In contrast, SmartNICs have no such limitations. They allow users to upload additional software, making it possible to expand or modify the functionality of the ASIC, without even needing to know how the processor itself is structured.
To enable such flexibility, SmartNICs include enhanced computing power and extra memory. These resources can be added in different ways: by integrating multi-core ARM processors, specialized network processors, or FPGAs.
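To illustrate what “uploading additional software” to a programmable NIC can look like in practice, here is a sketch using the bcc Python bindings to attach a small XDP packet filter. The interface name eth0 is a placeholder, and the hardware-offload flag only takes effect on SmartNICs whose drivers support XDP offload; on an ordinary NIC the same program would instead run in the kernel.

```python
from bcc import BPF

# A tiny packet filter: drop ICMP, pass everything else.
prog = r"""
#define KBUILD_MODNAME "smartnic_demo"
#include <uapi/linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>

int drop_icmp(struct xdp_md *ctx) {
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;
    if (eth->h_proto != htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;
    if (ip->protocol == IPPROTO_ICMP)
        return XDP_DROP;  /* filtered on the NIC, never reaches the CPU */
    return XDP_PASS;
}
"""

b = BPF(text=prog)
fn = b.load_func("drop_icmp", BPF.XDP)
# XDP_FLAGS_HW_MODE (1 << 3) asks the driver to offload the program to
# the NIC hardware itself; drop the flag to run it in the kernel instead.
b.attach_xdp("eth0", fn, flags=(1 << 3))
```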
Data Processing Units are an extension of the SmartNIC: the network card is further enhanced with support for NVMe or NVMe over Fabrics (NVMe-oF).
A device equipped with ARM cores and an NVMe-capable controller can handle input/output operations on its own, offloading the central processor. It’s a simple yet elegant solution that frees up valuable CPU resources.
A DPU includes programmable interfaces for both networking and storage. Thanks to this, applications and workloads can access more of the CPU’s performance, which is no longer burdened with routine network and data management tasks.
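As a concrete example of the storage side, here is a hypothetical automation sketch that discovers and connects an NVMe-oF subsystem exposed by a DPU, driving the standard nvme-cli tool from Python. The transport, address, port, and NQN values are placeholders for illustration.

```python
import subprocess

# Placeholder values: a DPU exposing remote storage over NVMe/TCP.
TRANSPORT = "tcp"
TARGET_ADDR = "192.168.0.10"   # address of the DPU's storage endpoint
TARGET_PORT = "4420"           # conventional NVMe-oF service port
SUBSYSTEM_NQN = "nqn.2014-08.org.nvmexpress:example-subsystem"

def run(cmd):
    """Run a command and return its stdout, raising on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

# Ask the target which subsystems it exposes.
print(run(["nvme", "discover", "-t", TRANSPORT,
           "-a", TARGET_ADDR, "-s", TARGET_PORT]))

# Connect; the namespace then appears to the host as a local /dev/nvmeXnY
# block device, while the DPU handles the fabric I/O path.
run(["nvme", "connect", "-t", TRANSPORT,
     "-a", TARGET_ADDR, "-s", TARGET_PORT, "-n", SUBSYSTEM_NQN])
```

From the host’s point of view, the remote namespace behaves like a local disk; the fabric protocol work happens on the card.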
One of the best-known solutions is NVIDIA® BlueField, a DPU line first introduced in 2019, with the third generation announced in 2021.
NVIDIA BlueField DPU is designed to create secure, high-speed infrastructure capable of supporting workloads in any environment, taking networking, storage, and security processing off the host CPU.
Another company in this space is Pensando, which develops the Distributed Services Card, a data-processing card featuring a DPU. It includes additional ARM cores and hardware accelerators for specific tasks such as encryption and disk I/O processing.
Google and Amazon are also developing their own ASIC-based projects:
- Google builds the Tensor Processing Unit (TPU), a custom ASIC for machine-learning workloads;
- Amazon has moved virtualization, networking, and storage functions onto the dedicated hardware cards of its AWS Nitro System.
It is quite possible that the DPU will become the third essential component of future data center servers, alongside the CPU (central processing unit) and GPU (graphics processing unit). This is due to its ability to handle networking and storage tasks.
The architecture may look like this:
- the CPU handles general-purpose computing and runs applications;
- the GPU accelerates parallel computation and AI workloads;
- the DPU processes data in motion, taking care of networking, storage access, and security.
It appears that DPUs have a promising future, largely driven by the ever-growing volume of data. Coprocessors can breathe new life into existing servers by reducing CPU load and taking over routine operations. This eliminates the need to look for other optimization methods (such as tweaking NVIDIA RAID functions) to boost performance.
Estimates suggest that around 30% of a server CPU’s workload is currently consumed by networking functions. Offloading these tasks to a DPU returns that computing power to applications. It can also extend the lifespan of servers by several months or even years, depending on how much CPU capacity was previously dedicated to networking.
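A back-of-envelope calculation makes the effect concrete (treating the 30% figure as a given rather than a measurement):

```python
# If networking consumes 30% of CPU cycles, offloading it to a DPU
# raises the share of the socket available to applications
# from 70% to 100%.
networking_share = 0.30
app_share_before = 1.0 - networking_share       # 0.70
effective_gain = 1.0 / app_share_before - 1.0   # ≈ 0.43

print(f"Extra application headroom: {effective_gain:.0%}")  # ~43%
```

In other words, freeing a 30% networking tax buys roughly 43% more headroom for application workloads on the same hardware.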
By adding a DPU to servers, clients can ensure that CPUs are fully utilized for application workloads, rather than being bogged down by routine network and storage access operations. And this looks like a logical continuation of the process that began over 30 years ago, when organizations started building high-performance systems based on a single central processor.