How to Create and Optimize Docker Images

Technical writer

Docker

22.01.2025

Reading time: 12 min

Many companies use the Docker containerization system in their projects, especially when working with microservice applications. Docker lets you quickly deploy applications, whether monolithic or cloud-native. Although Docker is simple to work with, creating your own images involves some nuances that are important to understand. In this article, we will explore how to work with Docker images and how to optimize them, using a few small applications as examples.

Prerequisites

To work with the Docker containerization system, we will need:

A cloud server or a virtual machine with any pre-installed Linux distribution. We will be using Ubuntu 22.04.
Docker installed. See our installation guide.

You can also use a pre-configured image with Docker. To do this, go to the Cloud servers section in your Hostman control panel, click Create server, and select Docker in the Marketplace tab.

Working with Docker Images

Docker images are stored in registries, special repositories for images, where users publish the images they have built. Registries can be public or private. Public registries are available to all users without requiring authentication. Private registries, however, can only be accessed by users with appropriate login credentials. Companies widely use private registries to store their own images during software development.

By default, Docker uses the public registry Docker Hub, which any user can use to publish their own images or download images created by others. When a user runs a command such as docker run, the Docker daemon will, by default, contact its standard registry. If necessary, you can change the registry to another one.
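To make the default-registry behaviour more concrete, here is a minimal Python sketch of how a short image name is commonly expanded into a fully qualified reference. The function name normalize_image_reference is hypothetical, and the rules are a simplification (digests such as name@sha256:... are not handled). It assumes docker.io as the default registry, library as the namespace of official images, and latest as the default tag.

```python
DEFAULT_REGISTRY = "docker.io"
DEFAULT_NAMESPACE = "library"
DEFAULT_TAG = "latest"


def normalize_image_reference(reference: str) -> str:
    """Expand a short image name into registry/repository:tag (simplified)."""
    first, sep, rest = reference.partition("/")
    # The first path component counts as a registry host only if it looks
    # like one: it contains a dot or a port, or it is "localhost".
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = DEFAULT_REGISTRY, reference

    # A tag is whatever follows a colon in the last path component.
    path, _, last = remainder.rpartition("/")
    name, colon, tag = last.partition(":")
    if not colon:
        tag = DEFAULT_TAG
    repository = f"{path}/{name}" if path else name

    # Official images on Docker Hub live under the "library" namespace.
    if registry == DEFAULT_REGISTRY and "/" not in repository:
        repository = f"{DEFAULT_NAMESPACE}/{repository}"
    return f"{registry}/{repository}:{tag}"


if __name__ == "__main__":
    for ref in ("python:3.10-alpine", "debian", "gcr.io/distroless/base-debian10"):
        print(ref, "->", normalize_image_reference(ref))
```

An image name that starts with a registry host, such as gcr.io/distroless/base-debian10, is left pointing at that registry, while a bare name like debian resolves to Docker Hub.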

To create custom Docker images, a Dockerfile is used—a text file containing instructions for building an image. These instructions use 18 specially reserved keywords. The most common types of instructions include the following:

FROM specifies the base image. Every image starts with a base image. A base image refers to a Linux distribution, such as Ubuntu, Debian, Oracle Linux, Alpine, etc. There are also many images with various pre-installed software, such as Nginx, Grafana, Prometheus, MySQL, and others. However, even when using an image with pre-installed software, some Linux OS distribution will always be specified inside.
WORKDIR sets the working directory for the instructions that follow it and creates the directory if it does not exist. Its effect is similar to running mkdir -p followed by cd in a Linux shell. It can be used multiple times in one image.
COPY copies files and directories from the host system into the image. It is used to copy configuration files and application source code files.
ADD is similar to the COPY instruction, but in addition to copying files, ADD allows downloading files from remote sources and extracting .tar archives.
RUN executes commands while the image is being built. With RUN, you can perform any actions that a user can perform in a shell, such as creating files or installing packages. Services started by RUN do not keep running in the container, because the command is only executed at build time.
CMD specifies the default command that will be executed when the container is started.

Example: Creating an Image

As an example, we will create an image with a simple Python program.

Create a project directory and move into it:

mkdir python-calculator && cd python-calculator

Create a file console_calculator.py with the following content:

print("*" * 10, "Calculator", "*" * 10)
print("To exit from program type q")

try:
    while True:
        arithmetic_operators = input("Choose arithmetic operation (+ - * /):\n")
        if arithmetic_operators == "q":
            break
        if arithmetic_operators in ("+", "-", "*", "/"):
            first_number = float(input("First number is:\n"))
            second_number = float(input("Second number is:\n"))
            print("The result is:")
            if arithmetic_operators == "+":
                print("%.2f" % (first_number + second_number))
            elif arithmetic_operators == "-":
                print("%.2f" % (first_number - second_number))
            elif arithmetic_operators == "*":
                print("%.2f" % (first_number * second_number))
            elif arithmetic_operators == "/":
                if second_number != 0:
                    print("%.2f" % (first_number / second_number))
                else:
                    print("You can't divide by zero!")
        else:
            print("Invalid symbol!")

except (KeyboardInterrupt, EOFError) as e:
    print(e)

Create a new Dockerfile with the following content:

FROM python:3.10-alpine

WORKDIR /app

COPY console_calculator.py .

CMD ["python3","console_calculator.py"]

For the base image, we use python:3.10-alpine, the official Python 3.10 image built on a lightweight Linux distribution called Alpine. We will discuss the use of Alpine in more detail in the section on selecting a base image.

Inside the image, the /app directory is created and set as the working directory, and the project file is copied into it.

The container will be launched using the command python3 console_calculator.py.
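The CMD instruction in the Dockerfile above is written as a JSON array, the so-called exec form. As a rough illustration of how it differs from the plain-string shell form, the following Python sketch maps a CMD value to the argument vector that ends up being executed. The function name cmd_to_argv is hypothetical, and the sketch assumes a Linux image where the shell form is run through /bin/sh -c.

```python
import json


def cmd_to_argv(cmd_argument: str) -> list[str]:
    """Return the argument vector a CMD value translates to (simplified)."""
    text = cmd_argument.strip()
    if text.startswith("["):
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            parsed = None
        # Exec form: a JSON array of strings, run directly without a shell.
        if isinstance(parsed, list) and all(isinstance(p, str) for p in parsed):
            return parsed
    # Shell form: the whole string is handed to the default Linux shell.
    return ["/bin/sh", "-c", text]


if __name__ == "__main__":
    print(cmd_to_argv('["python3","console_calculator.py"]'))
    print(cmd_to_argv("python3 console_calculator.py"))
```

With the exec form, python3 is started directly as the container's main process, whereas the shell form wraps it in a shell.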

To build the image, use the docker build command. The -t flag assigns the image a name and, after the colon, a tag that identifies a particular version of it. If you omit the tag, Docker uses latest:

docker build -t python-console-calculator:01 .

The period at the end of the command sets the build context to the current directory, which is also where Docker looks for the Dockerfile by default.

You can display the list of created images using:

docker images

To launch the container, use:

docker run --rm -it python-console-calculator:01

Let's test the program by performing a few simple arithmetic operations.

To exit the program, type q and press Enter.

Since we specified the --rm flag when starting the container, the container is removed automatically once it exits.

You can also run the container in daemon mode, i.e., in the background. To do this, include the -d flag when starting the container:

docker run -dit python-console-calculator:01

After that, the container will appear in the list of running containers, which you can display with docker ps.

To access the script in a container that is running in the background, use docker exec, which executes a command inside the container. First start a shell (bash or sh), then run the script manually inside the container.

To do this, use the docker exec command, passing the sh command as an argument to open the shell inside the container (where 4f1b8b26c607 is the unique container ID displayed in the CONTAINER ID column of the docker ps output):

docker exec -it 4f1b8b26c607 sh

Then, run the script manually:

python console_calculator.py

To remove a running container, you need to use the docker rm command and pass the container's ID or name. You also need to use the -f flag, which will force the removal of a running container:

docker rm -f 186e8f43ca60

Optimizing Docker Images

When creating Docker images, one rule stands out: finished images should be compact and take up as little space as possible. Smaller images are also faster to build, push, and pull. This can play a key role in CI/CD pipelines or when a short time to market matters.

Proper Selection of the Base Image

As the first recommendation, it's important to choose the base image wisely. For example, instead of using various Linux distribution images like Ubuntu, Oracle Linux, Rocky Linux, and many others, you can directly choose an image that already comes with the required programming language, framework, or other necessary technology. Examples of such images include:

node for working with the Node.js platform
nginx for working with the Nginx web server
ibmjava for working with the Java programming language
postgres for working with PostgreSQL databases
redis for working with the Redis NoSQL data store

Using a specific image instead of an operating system image has the following advantages:

There is no need to install the main tool (programming language, framework, etc.), so the image won't be "cluttered" with unnecessary packages, preventing an increase in size.
Images that come with pre-installed software (like Nginx, Redis, PostgreSQL, Grafana, etc.) are often maintained by the developers of the software themselves or published as official images on Docker Hub. This means that users usually do not need to configure the program to run it (except in cases where it needs to be integrated with their service).

Let's consider this recommendation with a practical example. We will use a simple Python program that prints "Hello from Python!". First, we will build an image using debian as the base image.

Create and navigate to the directory where the project files will be stored:

mkdir dockerfile-python && cd dockerfile-python

Create the test.py file with the following content:

print("Hello from Python!")

Next, create a Dockerfile with the following content:

FROM debian:latest

COPY test.py .

RUN apt update 
RUN apt -y install python3

CMD ["python3", "test.py"]

To run Python programs, you also need to install the Python interpreter.

Then, build the image:

docker build -t python-debian:01 .

Let’s check the Docker image size:

docker images

The image takes up 185 MB, which is quite a lot for an application that just prints a single line to the terminal.

Now, let's choose the correct base image, which is based on the Alpine distribution.

Another feature of base images is that many of them come in special slim and alpine variants, which are even smaller. Let's look at the example of the official Python 3.10 image. The python:3.10 image takes up about 1 GB, whereas the slim version is much smaller at 127 MB, and the alpine image is only 50 MB.

Slim images are images that contain the minimum set of packages necessary to run a finished application. These images lack most packages and libraries. Slim images are created from both regular Linux distributions (such as Ubuntu or Debian) and Alpine-based distributions.

Alpine images are images that use the Alpine distribution as the operating system: a lightweight Linux distribution that takes up about 5 MB of disk space (without the kernel). It differs from other Linux distributions in that it uses the apk package manager, the musl C library instead of glibc, and OpenRC rather than systemd as its init system, and it has fewer pre-installed programs.

When using both slim and Alpine images, it is essential to thoroughly test your application, as the required packages or libraries might be missing in such distributions.
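One lightweight way to cover part of this testing is a smoke test that simply tries to import everything the application needs inside the target image. The sketch below is only an illustration: the function name find_missing_modules and the list of modules are hypothetical and should be replaced with what your application actually imports.

```python
import importlib


def find_missing_modules(module_names):
    """Return the names from module_names that cannot be imported here."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical list: replace with the modules your application imports.
    required = ["json", "sqlite3", "ssl"]
    missing = find_missing_modules(required)
    if missing:
        raise SystemExit(f"Missing modules: {', '.join(missing)}")
    print("All required modules are importable")
```

If such a script is copied into the image, running it in a throwaway container gives a quick signal that a slim or Alpine variant still provides the libraries the application depends on. It does not replace testing the application itself.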

Now, let's test our application using the Python image with Alpine.

Return to the previously used Dockerfile and change the base image from debian to python:alpine3.19. You should also remove the two RUN instructions, as there will be no need to install the Python interpreter:

FROM python:alpine3.19

COPY test.py .

CMD ["python3", "test.py"]

Build the image with docker build as before, using a new tag.

Then list the Docker images with docker images and compare the size with the previous build.

Since we chose the correct base image with Python already preinstalled, the image size was reduced from 185 MB to 43.8 MB.

Reducing the Number of Layers

Docker images are based on the concept of layers. A layer represents a change made to the image's file system. These changes include copying/creating directories and files or installing packages. It is recommended to use as few layers as possible in the image. Among all Dockerfile instructions, only the FROM, COPY, ADD, and RUN instructions create layers that increase the final image size. All other instructions create temporary intermediate images and do not directly increase the image size.
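To get a feel for how many layers a Dockerfile adds, the rule above can be turned into a small counting script. This is a simplified sketch with hypothetical function names: it only looks at instruction keywords, treats a trailing backslash as a line continuation, and counts RUN, COPY, and ADD, since FROM contributes the layers of the base image rather than a new one.

```python
LAYER_INSTRUCTIONS = {"RUN", "COPY", "ADD"}


def list_instructions(dockerfile_text):
    """Return the instruction keywords of a Dockerfile, in order (simplified)."""
    keywords = []
    continuing = False
    for raw_line in dockerfile_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if not continuing:
            keywords.append(line.split(maxsplit=1)[0].upper())
        # A trailing backslash means the instruction continues on the next line.
        continuing = line.endswith("\\")
    return keywords


def count_layer_instructions(dockerfile_text):
    """Count the instructions that add a layer on top of the base image."""
    return sum(1 for kw in list_instructions(dockerfile_text) if kw in LAYER_INSTRUCTIONS)


DOCKERFILE = """\
FROM debian:latest
COPY test.py .
RUN apt update
RUN apt -y install python3
CMD ["python3", "test.py"]
"""

if __name__ == "__main__":
    print(list_instructions(DOCKERFILE))
    print(count_layer_instructions(DOCKERFILE))
```

For the Debian-based Dockerfile from the previous section, the count is three: one COPY and two RUN instructions.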

Let's take the previously used Dockerfile and modify it according to new requirements. Suppose we need to install additional packages using the apt package manager:

FROM debian:latest

COPY test.py .

RUN apt update 
RUN apt -y install python3 htop net-tools mc gcc

CMD ["python3", "test.py"]

Build the image:

docker build -t python-non-optimize:01 .

Check the size of the created Docker image:

docker images

The image size was 570 MB. Previously, our Dockerfile contained two RUN instructions, which created two layers. We can merge them into one layer by combining the apt update and apt install commands with the && operator, which in the shell means that the next command runs only if the previous one completes successfully.

Another important point is to remove the cache files that the apt package manager leaves in the image after package installation (this also applies to other package managers such as yum/dnf and apk). For distributions using apt, the package lists downloaded by apt update are stored in the /var/lib/apt/lists directory. The cleanup has to happen in the same RUN instruction as the installation, because files deleted in a later layer still occupy space in the earlier one. Therefore, we will add a command that deletes all files in that directory to the same RUN instruction, without creating a new layer:

FROM debian:latest

COPY test.py .

RUN apt update && apt -y install python3 htop net-tools mc gcc && rm -rf /var/lib/apt/lists/*

CMD ["python3", "test.py"]

Build the image:

docker build -t python-optimize:03 .

And check the size with docker images.

The image size was reduced from the initial 570 MB to the current 551 MB.
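The manual rewrite performed in this section can also be expressed as a small text transformation. The Python sketch below is an illustration only: merge_consecutive_runs is a hypothetical helper that handles single-line, shell-form RUN instructions, joins adjacent ones with &&, and appends an optional cleanup command to each merged group.

```python
def merge_consecutive_runs(dockerfile_text, cleanup=None):
    """Join adjacent single-line RUN instructions with && (simplified sketch)."""
    output = []
    pending = []  # shell commands from the current streak of RUN lines

    def flush():
        # Emit the collected commands as one RUN instruction, if any.
        if pending:
            commands = pending + ([cleanup] if cleanup else [])
            output.append("RUN " + " && ".join(commands))
            pending.clear()

    for raw_line in dockerfile_text.splitlines():
        line = raw_line.strip()
        if line.upper().startswith("RUN "):
            pending.append(line[4:].strip())
        else:
            flush()
            output.append(line)
    flush()
    return "\n".join(output)


DOCKERFILE = """\
FROM debian:latest
COPY test.py .
RUN apt update
RUN apt -y install python3 htop net-tools mc gcc
CMD ["python3", "test.py"]
"""

if __name__ == "__main__":
    print(merge_consecutive_runs(DOCKERFILE, cleanup="rm -rf /var/lib/apt/lists/*"))
```

Applied to the unoptimized Dockerfile, this produces the same single RUN line that was written by hand above. Real Dockerfiles with multi-line or JSON-form RUN instructions would need a proper parser.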

Using Multi-Stage Builds

Another significant way to reduce the size of the created image is by using multi-stage builds. These builds, which use two or more FROM instructions in one Dockerfile, allow us to separate the build environment from the runtime environment, effectively removing unnecessary files and dependencies from the final image. These unnecessary files might include libraries or development dependencies that are only needed during the build process.

Let’s explore this approach with a practical example using the Node.js platform. Node.js should be installed beforehand, following our guide.

We will first build the application image without multi-stage builds to evaluate the difference in size.

Create a directory for the project:

mkdir node-app && cd node-app

Initialize a new Node.js application:

npm init -y

Install the express library:

npm install express

Create an index.js file with the content:

const express = require('express');
const app = express();
const PORT = process.env.PORT || 3000;

app.get('/', (req, res) => {
  res.send('Hello, World!');
});

app.listen(PORT, () => {
  console.log(`Server is running on port ${PORT}`);
});

Create a Dockerfile with this content:

FROM node:14-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY index.js .
EXPOSE 3000
CMD ["node", "index.js"]

Build the image:

docker build -t node-app:01 .

Check the size:

docker images

The image size was 124 MB. Now let's rewrite the Dockerfile to use two images, transforming it into the following form:

FROM node:14 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY index.js .

FROM gcr.io/distroless/base-debian10 AS production
WORKDIR /app
COPY --from=builder /app .
EXPOSE 3000
CMD ["npm", "start"]
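To see how the stages of such a Dockerfile relate to each other, the following Python sketch extracts each stage's base image, its name, and the stages it copies files from. It is a simplified illustration with a hypothetical function name describe_stages; flags on FROM such as --platform are not handled, and the embedded text repeats only the structural lines of the Dockerfile above.

```python
import re

FROM_RE = re.compile(r"^FROM\s+(\S+)(?:\s+AS\s+(\S+))?\s*$", re.IGNORECASE)
COPY_FROM_RE = re.compile(r"^COPY\s+.*--from=(\S+)", re.IGNORECASE)


def describe_stages(dockerfile_text):
    """Return one dict per stage: base image, stage name, COPY --from sources."""
    stages = []
    for raw_line in dockerfile_text.splitlines():
        line = raw_line.strip()
        from_match = FROM_RE.match(line)
        if from_match:
            # Every FROM instruction starts a new stage.
            stages.append({"base": from_match.group(1),
                           "name": from_match.group(2),
                           "copies_from": []})
            continue
        copy_match = COPY_FROM_RE.match(line)
        if copy_match and stages:
            stages[-1]["copies_from"].append(copy_match.group(1))
    return stages


DOCKERFILE = """\
FROM node:14 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY index.js .

FROM gcr.io/distroless/base-debian10 AS production
WORKDIR /app
COPY --from=builder /app .
"""

if __name__ == "__main__":
    for stage in describe_stages(DOCKERFILE):
        print(stage)
```

Only the last stage ends up in the final image by default. Everything in the builder stage that is not explicitly copied with COPY --from is left behind.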

Build the image:

docker build -t node-app:02 .

List the Docker images and check the size:

docker images

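As a result, the image size was drastically reduced, from 124 MB to 21.5 MB. Keep in mind that the distroless base image contains neither the Node.js runtime nor npm, so a container started from this image cannot run npm start. To actually serve the application, the final stage needs a base image that ships Node.js (for example, one of the distroless Node.js images) and a CMD that invokes node directly, which makes the final image larger than the figure measured here.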
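To put the three optimizations from this article side by side, the relative savings can be computed from the sizes reported above. The sketch below uses only those measured values; the function name size_reduction_percent and the labels are hypothetical.

```python
def size_reduction_percent(before_mb, after_mb):
    """Return how much smaller after_mb is than before_mb, in percent."""
    return (before_mb - after_mb) / before_mb * 100


# (size before, size after) in MB, as reported in the sections above.
MEASUREMENTS = {
    "base image (debian -> python:alpine3.19)": (185, 43.8),
    "merged RUN + apt cache cleanup": (570, 551),
    "multi-stage build": (124, 21.5),
}

if __name__ == "__main__":
    for label, (before, after) in MEASUREMENTS.items():
        print(f"{label}: {size_reduction_percent(before, after):.1f}% smaller")
```

This works out to roughly 76%, 3%, and 83% respectively, which suggests that the choice of base image and the use of multi-stage builds matter far more than merging layers.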

Conclusion

In this article, we created our own Docker image and explored various ways to run it. We also paid significant attention to optimizing Docker images. Through optimization, we can greatly reduce the image size, which speeds up building and distributing images.
