Logging is the process of collecting, recording, and storing data about various events, actions, and the state of a system or application in a specific format. This process is a key aspect of distributed systems, contributing to what is known as observability.
Image source: thealgorists.com
The data collected during the logging process can include:
General information about the system's operation.
Warnings about potential issues.
Records of errors in the system that require attention and resolution.
Debugging information.
Records of user or system actions for auditing, security, and identifying unauthorized activities.
In Kubernetes, logging is the process of collecting, managing, and analyzing logs generated by application containers and Kubernetes components to monitor, debug, and detect issues within the cluster environment.
In this guide, we'll explore the logging process in a Kubernetes environment, starting with its architecture and ending with tools for processing logs.
In this section of the guide, we'll review the architecture of log collection in Kubernetes. The following are the main levels of log collection:
Each pod records logs from its containers generated by the application. You can examine these logs immediately after creating and configuring the pod using the command line.
Image source: kubernetes.io
The image above depicts the logging process at the pod container level. Here, a container (app-container) within the my-pod pod sends logs to the standard output streams stdout and stderr of the containerized application, where stdout is the output stream and stderr is the error stream. The kubelet agent, connected to the container runtime via CRI, is responsible for handling and controlling the logs collected by the container.
To view Kubernetes pod logs, use the following command:
kubectl logs <pod_name>
However, you might have multiple containers deployed within your pod but only need logs from one of them. To achieve this, you can add a specific flag and the container name to the previous command:
kubectl logs <pod_name> -c <container_name>
The results of these commands will display logs generated by the container or the entire pod of the application. These logs may contain information about the application's status, errors, or successful and failed operations.
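Beyond the basic form, kubectl logs supports several useful standard flags. For example, assuming a hypothetical pod named my-pod, the following commands stream logs in real time, limit the output to the most recent lines, or show logs from the previous container instance after a restart:
kubectl logs -f my-pod              # follow the log stream in real time
kubectl logs --tail=100 my-pod      # show only the last 100 lines
kubectl logs --previous my-pod      # logs from the previous container instance after a restart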
If your worker nodes have systemd installed, kubelet writes its logs to journald. To read these logs, use the command:
journalctl -u kubelet
If systemd is not present on the node, kubelet writes its logs to files in the /var/log directory.
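As a rough sketch (the exact paths depend on the distribution and container runtime), these files can be inspected directly on the node:
tail -n 50 /var/log/kubelet.log     # kubelet log file on nodes without systemd
ls /var/log/pods/                   # per-pod directories with container log files
ls /var/log/containers/             # symlinks to the container log files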
Kubernetes does not have a built-in capability for cluster-level logging. To implement it, users employ various approaches. Below are two of the most common methods for processing Kubernetes logs:
Using a Logging Agent at the Node Level
Image source: kubernetes.io
One way to collect and process logs in Kubernetes at the cluster level is to run a logging agent at the node level. This agent (e.g., Fluentd or Logstash) is a component that runs on each node in the cluster; it collects, processes, and sends logs to an aggregator. The agent is deployed through a DaemonSet object, which places a copy of the agent on each worker node in the cluster.
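A minimal sketch of such a DaemonSet running Fluent Bit on every node is shown below; the namespace, image tag, and mounted paths are assumptions, and a real deployment would also need a ConfigMap with the agent's configuration and an output destination:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging                  # assumed namespace
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:2.2  # assumed image tag
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log              # host directory containing container log files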
Using a Sidecar Container at the Pod Level
A sidecar is an additional container that runs alongside the main container within the same pod. A properly configured sidecar container collects logs from a file, socket, or journald and then forwards them to its own stdout or stderr streams.
Image source: kubernetes.io
To collect logs in different formats, it's recommended to configure multiple sidecar containers. In this case, each container redirects logs from a shared volume to its own stdout stream.
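A minimal sketch of the streaming-sidecar pattern might look like the following; the pod name, images, and log path are illustrative. The application container writes to a file on a shared emptyDir volume, and the sidecar tails that file to its own stdout:
apiVersion: v1
kind: Pod
metadata:
  name: my-pod                        # illustrative name
spec:
  containers:
  - name: app-container
    image: busybox                    # stand-in for a real application image
    command: ["sh", "-c", "while true; do date >> /var/log/app/app.log; sleep 5; done"]
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  - name: log-sidecar
    image: busybox
    command: ["sh", "-c", "tail -n+1 -F /var/log/app/app.log"]   # stream the shared file to stdout
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  volumes:
  - name: app-logs
    emptyDir: {}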
To view the collected logs, you can use the familiar command:
kubectl logs <pod_name> -c <sidecar_container_name>
In this section, we'll examine some popular open-source tools for collecting logs in Kubernetes.
Fluentd and Fluent Bit are two logging agents designed for collecting, filtering, aggregating, and forwarding logs in various environments, including Kubernetes. Fluentd is better suited for processing and routing collected logs thanks to its wide range of plugins, while the lightweight Fluent Bit is ideal for collecting logs and shipping them to their final destinations.
Fluent Operator is a tool developed for managing and automatically deploying Fluentd and Fluent Bit agents in a Kubernetes environment. It simplifies the process of installing, configuring, and scaling these agents in a cluster using Custom Resource Definitions (CRDs). With Fluent Operator, users can deploy each agent individually or use Fluent Bit in combination with Fluentd.
The workflow with Fluent Operator is depicted in the following image:
Image source: github.com
Below are the custom resources of Fluent Operator:
FluentBit – Used to create Fluent Bit daemons and their configuration.
FluentBitConfig – Used to define a set of plugins (input, output, filtering) managed by FluentBit and to generate the final configuration as a secret.
Input – Configuration module for log collection.
Parser – Configuration module for log parsing.
Filter – Configuration module for log filtering.
Output – Configuration module for sending logs to the specified destination.
The interaction and operation of these resources are shown in the following image:
Image source: github.com
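As an illustration only, a cluster-scoped output resource (the ClusterOutput variant of the Output module described above) might look roughly like the sketch below; the API version, selector label, and field names are assumptions based on recent Fluent Operator releases and should be verified against the version you deploy:
apiVersion: fluentbit.fluent.io/v1alpha2    # assumed API version
kind: ClusterOutput
metadata:
  name: es-output
  labels:
    fluentbit.fluent.io/enabled: "true"     # assumed label used to select the resource
spec:
  match: "*"                                 # forward all log tags
  es:
    host: elasticsearch-master               # illustrative Elasticsearch service name
    port: 9200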
The ELK Stack combines three popular tools for collecting, aggregating, processing, and visualizing log data: Elasticsearch, Logstash, and Kibana. In 2015, Beats was added to improve performance, and the whole stack was renamed the Elastic Stack.
Elasticsearch is a search and analytics engine used for storing and indexing structured and unstructured data, including logs.
Logstash is an agent for collecting, processing, and delivering logs. Sometimes, users replace it with Fluentd, which is considered a more standard solution for the Kubernetes environment. In this case, the stack becomes EFK instead of ELK.
Beats are lightweight agents installed on edge hosts to collect various data types and then forward them to the stack. Examples include Filebeat, Metricbeat, Packetbeat, Winlogbeat, and others.
Kibana is a tool for visualizing and analyzing data integrated with Elasticsearch.
You can see the workflow of the ELK Stack in the image below:
Image source: medium.com
Here, Beats and Logstash collect and process logs, Elasticsearch stores them, and Kibana creates visualizations.
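For a quick sanity check of such a deployment from inside the cluster, you can port-forward the Elasticsearch service and list the indices that Logstash or Beats have created; the service name elasticsearch and port 9200 are assumptions that depend on how the stack was installed:
kubectl port-forward svc/elasticsearch 9200:9200
curl "http://localhost:9200/_cat/indices?v"   # list the indices holding the ingested logs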
In a Kubernetes environment, logs come from many sources and must be collected and processed to keep the cluster observable and easy to troubleshoot. In this guide, we reviewed the architecture of Kubernetes logging and the logging process at different levels. We also demonstrated tools for collecting and aggregating logs in Kubernetes, such as Fluent Operator (Fluentd and Fluent Bit) and the Elastic Stack.