Most companies now use Docker containers in their projects, especially when working with microservice applications. Docker lets you quickly deploy any application, whether monolithic or cloud-native. Although Docker is simple to work with, it's important to understand some nuances of building your own images. In this article, we will explore how to work with Docker images and how to optimize them, using small Python and Node.js applications as examples.
To work with the Docker containerization system, we will need:
A cloud server or a virtual machine with any pre-installed Linux distribution. We will be using Ubuntu 22.04.
Docker installed. See our installation guide.
You can also use a pre-configured image with Docker. To do this, go to the Cloud servers section in your Hostman control panel, click Create server, and select Docker in the Marketplace tab.
Docker images are created by users and stored in registries, which are special repositories for images. Registries can be public or private. Public registries are available to all users without authentication. Private registries, however, can only be accessed by users with the appropriate login credentials. Companies widely use private registries to store their own images during software development.
By default, Docker uses the public registry Docker Hub, where any user can publish their own images or download images created by others. When a user runs a command such as docker run and the image is not available locally, the Docker daemon contacts this default registry to pull it. If necessary, you can use another registry instead.
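To make the default registry more concrete, here is a minimal Python sketch of how a short image name is expanded into a full reference. It is a simplified approximation of Docker's behavior, not its actual implementation: the function name normalize_image_reference is hypothetical, and edge cases such as uppercase names are ignored.

```python
def normalize_image_reference(reference: str) -> str:
    """Expand a short image name using Docker Hub defaults (simplified sketch)."""
    default_registry = "docker.io"
    default_namespace = "library"
    default_tag = "latest"

    # The first path component names a registry only if it looks like a host:
    # it contains a dot or a port, or it is exactly "localhost".
    first, slash, rest = reference.partition("/")
    looks_like_host = "." in first or ":" in first or first == "localhost"
    if slash and looks_like_host:
        registry, remainder = first, rest
    else:
        registry, remainder = default_registry, reference

    # Official Docker Hub images live under the "library" namespace.
    if registry == default_registry and "/" not in remainder:
        remainder = f"{default_namespace}/{remainder}"

    # Add the default tag when neither a tag nor a digest is present.
    last_component = remainder.rsplit("/", 1)[-1]
    if ":" not in last_component and "@" not in last_component:
        remainder = f"{remainder}:{default_tag}"

    return f"{registry}/{remainder}"


print(normalize_image_reference("python:3.10-alpine"))
print(normalize_image_reference("gcr.io/distroless/base-debian10"))
```

Under these assumptions, a name without a registry host resolves to Docker Hub, while a name that starts with a host such as gcr.io is pulled from that registry.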
To create custom Docker images, a Dockerfile is used. It is a text file containing instructions for building an image. These instructions use 18 specially reserved keywords. The most common types of instructions include the following:
FROM specifies the base image. Every image starts with a base image. A base image is usually a Linux distribution, such as Ubuntu, Debian, Oracle Linux, or Alpine. There are also many images with pre-installed software, such as Nginx, Grafana, Prometheus, MySQL, and others. However, even an image with pre-installed software is almost always built on top of some Linux distribution.
WORKDIR sets the working directory for the instructions that follow it and creates that directory if it does not exist. In that respect it combines the mkdir and cd utilities used in Linux distributions. It can be used multiple times in one image.
COPY copies files and directories from the host system into the image. It is used to copy configuration files and application source code files.
ADD is similar to the COPY instruction, but in addition to copying files, ADD can download files from remote sources and extract local .tar archives.
RUN executes commands inside the image at build time. With RUN, you can perform the actions that a user can perform in a shell, such as creating files and installing packages.
CMD specifies the default command that is executed when a container is started from the image.
As an example, we will create an image with a simple Python program.
Create a project directory and move into it:
mkdir python-calculator && cd python-calculator
Create a file console_calculator.py with the following content:
print("*" * 10, "Calculator", "*" * 10)
print("To exit from program type q")
try:
    while True:
        arithmetic_operators = input("Choose arithmetic operation (+ - * /):\n")
        if arithmetic_operators == "q":
            break
        if arithmetic_operators in ("+", "-", "*", "/"):
            first_number = float(input("First number is:\n"))
            second_number = float(input("Second number is:\n"))
            print("The result is:")
            if arithmetic_operators == "+":
                print("%.2f" % (first_number + second_number))
            elif arithmetic_operators == "-":
                print("%.2f" % (first_number - second_number))
            elif arithmetic_operators == "*":
                print("%.2f" % (first_number * second_number))
            elif arithmetic_operators == "/":
                if second_number != 0:
                    print("%.2f" % (first_number / second_number))
                else:
                    print("You can't divide by zero!")
        else:
            print("Invalid symbol!")
except (KeyboardInterrupt, EOFError) as e:
    print(e)
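The image does not need this, but if you want to check the arithmetic without typing into a prompt, the same logic can be expressed as a pure function. This is an optional sketch rather than part of console_calculator.py; the names OPERATIONS and calculate are hypothetical.

```python
import operator

# Maps each supported symbol to the function that implements it.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def calculate(symbol: str, first_number: float, second_number: float) -> str:
    """Return the text the calculator would print for one operation."""
    if symbol not in OPERATIONS:
        raise ValueError("Invalid symbol!")
    if symbol == "/" and second_number == 0:
        return "You can't divide by zero!"
    return "%.2f" % OPERATIONS[symbol](first_number, second_number)


print(calculate("+", 2, 3))  # 5.00
print(calculate("/", 7, 2))  # 3.50
```

Keeping the arithmetic separate from input() makes it possible to test the program before building an image around it.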
Next, create a Dockerfile with the following content:
FROM python:3.10-alpine
WORKDIR /app
COPY console_calculator.py .
CMD ["python3","console_calculator.py"]
For the base image, we use python:3.10-alpine, the official Python 3.10 image based on a lightweight Linux distribution called Alpine. We will discuss the use of Alpine in more detail later in this article.
Inside the image, the WORKDIR instruction creates the /app directory and makes it the working directory, so the project file is copied there.
The container is launched with the command python3 console_calculator.py.
To build the image, use the docker build command. The -t flag assigns the image a name and a tag, which together identify a particular version of the image. If you omit the tag, Docker uses latest:
docker build -t python-console-calculator:01 .
The period at the end of the command sets the build context to the current directory, which is also where Docker looks for the Dockerfile by default.
You can display the list of created images using:
docker images
To launch the container, use:
docker run --rm -it python-console-calculator:01
Let's test the functionality of the program by performing a few simple arithmetic operations:
To exit the program, type q and press Enter.
Since we specified the --rm flag when starting the container, the container is removed automatically once it exits.
You can also run the container in detached mode, i.e., in the background. To do this, include the -d flag when starting the container:
docker run -dit python-console-calculator:01
After that, the container will appear in the list of running containers, which you can display with docker ps.
When the container runs in the background, you reach the script through docker exec, which executes a command inside a running container. First start a shell, then run the script manually inside the container.
To do this, pass sh to docker exec as the command to run (Alpine-based images ship sh but not bash by default). Here 4f1b8b26c607 is the unique container ID displayed in the CONTAINER ID column of the docker ps output:
docker exec -it 4f1b8b26c607 sh
Then, run the script manually:
python console_calculator.py
To remove a running container, use the docker rm command and pass the container's ID or name. You also need the -f flag, which forces the removal of a running container:
docker rm -f 186e8f43ca60
When creating Docker images, there is one main rule: finished images should be compact and occupy as little space as possible. Smaller images are also faster to build, push, and pull, which can play a key role in CI/CD pipelines or when time to market matters.
As the first recommendation, it's important to choose the base image wisely. For example, instead of using various Linux distribution images like Ubuntu, Oracle Linux, Rocky Linux, and many others, you can directly choose an image that already comes with the required programming language, framework, or other necessary technology. Examples of such images include:
node for working with the Node.js platform
nginx for running the Nginx web server
ibmjava for working with the Java programming language
postgres for working with PostgreSQL databases
redis for working with the Redis NoSQL data store
Using a specific image instead of an operating system image has the following advantages:
There is no need to install the main tool (programming language, framework, etc.), so the image won't be "cluttered" with unnecessary packages, preventing an increase in size.
Official images that come with pre-installed software (like Nginx, Redis, PostgreSQL, Grafana, etc.) are typically maintained by the developers of the software or by the Docker community. This means that users usually do not need to configure the program to run it (except in cases where it needs to be integrated with their service).
Let's consider this recommendation with a practical example. We will use a simple Python program that prints "Hello from Python!". First, we will build an image using debian as the base image.
Create and navigate to the directory where the project files will be stored:
mkdir dockerfile-python && cd dockerfile-python
Create the test.py file with the following content:
print("Hello from Python!")
Next, create a Dockerfile with the following content:
FROM debian:latest
COPY test.py .
RUN apt update
RUN apt -y install python3
CMD ["python3", "test.py"]
The debian base image does not include Python, so the Dockerfile installs the interpreter with apt before the program can run.
Then, build the image:
docker build -t python-debian:01 .
Let’s check the Docker image size:
docker images
The image takes up 185 MB, which is quite a lot for an application that just prints a single line to the terminal.
Now, let's choose the correct base image, which is based on the Alpine distribution.
Another feature of base images is that many of them come in special slim and alpine variants, which are even smaller. Let's look at the example of the official Python 3.10 image. The python:3.10 image takes up a whole 1 GB, whereas the slim version is much smaller at 127 MB, and the alpine image is only 50 MB.
Slim images are images that contain the minimum set of packages necessary to run a finished application. These images lack most packages and libraries. Slim images are created from both regular Linux distributions (such as Ubuntu or Debian) and Alpine-based distributions.
Alpine images use Alpine as the operating system. Alpine is a lightweight Linux distribution that takes up about 5 MB of disk space (without the kernel). It differs from most other Linux distributions in that it uses a package manager called apk, is built on musl libc instead of glibc, does not use systemd, and has fewer pre-installed programs.
When using both slim and Alpine images, it is essential to thoroughly test your application, as the required packages or libraries might be missing in such distributions.
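One lightweight way to catch missing dependencies early is a smoke test that tries to import every module the application relies on. The sketch below is only an illustration: the function name missing_modules is hypothetical, and the module list is a placeholder that you would replace with your application's real dependencies.

```python
import importlib


def missing_modules(required):
    """Return the module names from `required` that fail to import here."""
    missing = []
    for name in required:
        try:
            # Importing (rather than only locating) the module also exposes
            # missing shared libraries behind compiled extensions.
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Placeholder list: replace with the modules your application needs.
print(missing_modules(["json", "math"]))
```

Running a script like this inside a container started from the slim or Alpine-based image gives a quick signal before the full test suite runs. An empty list means every listed module was imported successfully.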
Now, let's test our application using the Python image with Alpine.
Return to the previously used Dockerfile and change the base image from debian to python:alpine3.19. You should also remove the two RUN instructions, as there is no longer any need to install the Python interpreter:
FROM python:alpine3.19
COPY test.py .
CMD ["python3", "test.py"]
Since we chose the correct base image with Python already preinstalled, the image size was reduced from 185 MB to 43.8 MB.
Docker images are based on the concept of layers. A layer represents a change made to the image's file system. These changes include copying or creating directories and files, or installing packages. It is recommended to use as few layers as possible in the image. Among all Dockerfile instructions, only the FROM, COPY, ADD, and RUN instructions create layers that increase the final image size. All other instructions create temporary intermediate images and do not directly increase the image size.
Let's take the previously used Dockerfile and modify it according to new requirements. Suppose we need to install additional packages using the apt package manager:
FROM debian:latest
COPY test.py .
RUN apt update
RUN apt -y install python3 htop net-tools mc gcc
CMD ["python3", "test.py"]
Build the image:
docker build -t python-non-optimize:01 .
Check the size of the created Docker image:
docker images
The image size was 570 MB. However, we can reduce the size by using fewer layers. Previously, our Dockerfile contained two RUN instructions, which created two layers. We can merge them into one by chaining the apt update and apt install commands with the && operator, which in the shell means that the next command runs only if the previous one completes successfully.
Another important point is to remove the files that the package manager leaves in the image after package installation (this applies to apt as well as to other package managers such as yum/dnf and apk). For distributions using apt, the package lists downloaded by apt update are stored in the /var/lib/apt/lists directory. The cleanup has to happen in the same RUN instruction as the installation, because files deleted in a later layer still occupy space in the layer that created them. Therefore, we will add a command that deletes all files in that directory to the existing RUN instruction, without creating a new layer:
FROM debian:latest
COPY test.py .
RUN apt update && apt -y install python3 htop net-tools mc gcc && rm -rf /var/lib/apt/lists/*
CMD ["python3", "test.py"]
Build the image:
docker build -t python-optimize:03 .
Then check the size with docker images.
The image size was reduced from the initial 570 MB to the current 551 MB.
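Why the cleanup has to share a RUN instruction with the installation can be shown with a toy model of layers. The Python sketch below is a deliberate simplification, not how Docker stores images: each layer is a list of file operations, the sizes are made-up numbers in MB, and the names layer_size and image_size are hypothetical.

```python
def layer_size(commands):
    """Size of one layer: the files still present when its instruction ends."""
    files = {}
    for action, path, size_mb in commands:
        if action == "write":
            files[path] = size_mb
        elif action == "delete":
            # Removing a file written in this same layer leaves nothing behind.
            # Removing a file from an earlier layer cannot shrink that layer.
            files.pop(path, None)
    return sum(files.values())


def image_size(layers):
    """Total image size: the sum of all layer sizes."""
    return sum(layer_size(commands) for commands in layers)


# Three separate RUN instructions: update, install, clean up.
separate = [
    [("write", "/var/lib/apt/lists", 20)],
    [("write", "/usr/bin/python3", 100)],
    [("delete", "/var/lib/apt/lists", 0)],
]

# One RUN instruction that chains the same steps with &&.
combined = [
    [
        ("write", "/var/lib/apt/lists", 20),
        ("write", "/usr/bin/python3", 100),
        ("delete", "/var/lib/apt/lists", 0),
    ],
]

print(image_size(separate))  # 120
print(image_size(combined))  # 100
```

In this model, the saving comes from deleting the package lists before the layer is committed, not from the smaller number of layers as such.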
Another significant way to reduce the size of the created image is by using multi-stage builds. These builds, which involve two or more base images, allow us to separate the build environment from the runtime environment, effectively removing unnecessary files and dependencies from the final image. These unnecessary files might include libraries or development dependencies that are only needed during the build process.
Let’s explore this approach with a practical example using the Node.js platform. Node.js should be installed beforehand, following our guide.
We will first build the application image without multi-stage builds to evaluate the difference in size.
Create a directory for the project:
mkdir node-app && cd node-app
Initialize a new Node.js application:
npm init -y
Install the express library:
npm install express
Create the index.js file with the content:
const express = require('express');
const app = express();
const PORT = process.env.PORT || 3000;

app.get('/', (req, res) => {
  res.send('Hello, World!');
});

app.listen(PORT, () => {
  console.log(`Server is running on port ${PORT}`);
});
Create a Dockerfile with the following content:
FROM node:14-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY index.js .
EXPOSE 3000
CMD ["node", "index.js"]
Build the image:
docker build -t node-app:01 .
Check the image size:
docker images
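Now rewrite the Dockerfile to use a multi-stage build:
FROM node:14 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY index.js .
FROM gcr.io/distroless/base-debian10 AS production
WORKDIR /app
COPY --from=builder /app .
EXPOSE 3000
CMD ["node", "index.js"]
Note that gcr.io/distroless/base-debian10 contains neither Node.js nor npm, so this Dockerfile demonstrates the size reduction, but a container started from the resulting image cannot run the application. For a runnable image, the final stage needs a base image that includes the Node.js runtime, and the result will be larger.
Build the image:
docker build -t node-app:02 .
Check the image size:
docker images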
As a result, the image size was drastically reduced, from 124 MB to 21.5 MB.
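To put the three comparisons from this article side by side, the relative savings can be computed with a few lines of Python. This is plain arithmetic on the sizes reported above; the function name reduction_percent and the labels are hypothetical.

```python
def reduction_percent(before_mb: float, after_mb: float) -> float:
    """Relative size reduction, as a percentage of the original size."""
    return (before_mb - after_mb) / before_mb * 100


# Sizes in MB, as reported earlier in this article.
measurements = {
    "debian -> python:alpine3.19": (185, 43.8),
    "separate RUN -> combined RUN with cleanup": (570, 551),
    "single-stage -> multi-stage": (124, 21.5),
}

for label, (before, after) in measurements.items():
    saved = reduction_percent(before, after)
    print(f"{label}: {before} MB -> {after} MB ({saved:.1f}% smaller)")
```

The output shows reductions of 76.3%, 3.3%, and 82.7% respectively. In these examples, the choice of base image and the use of multi-stage builds account for most of the savings.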
In this article, we created our own Docker image and explored various ways to run it. We also paid significant attention to optimizing Docker images. Through optimization, we can greatly reduce the image size, which allows for faster image builds.