Logging in Kubernetes: Collection, Storage, Processing, and Parsing Logs
Hostman Team
Technical writer
Kubernetes
16.08.2024
Reading time: 6 min

Logging is the process of collecting, recording, and storing data about various events, actions, and the state of a system or application in a specific format. This process is a key aspect of distributed systems, contributing to what is known as observability.

[Image; source: thealgorists.com]

The data collected during the logging process can include:

  • General information about the system's operation.

  • Warnings about potential issues.

  • Records of errors in the system that require attention and resolution.

  • Debugging information.

  • Records of user or system actions for auditing, security, and identifying unauthorized activities.

In Kubernetes, logging is the process of collecting, managing, and analyzing logs generated by application containers and Kubernetes components to monitor, debug, and detect issues within the cluster environment.

In this guide, we'll explore the logging process in a Kubernetes environment, starting with its architecture and ending with tools for processing logs.

Kubernetes Logging Architecture

In this section, we'll review the architecture of log collection in Kubernetes. Logs are collected at two main levels: the pod level and the cluster level.

Log Collection at the Pod Level

The containers in each pod produce logs generated by the application running in them. You can read these logs from the command line as soon as the pod has been created and started.

[Image: pod-level logging diagram; source: kubernetes.io]

The image above depicts logging at the container level. A container (app-container) in the my-pod pod writes its logs to the standard streams of the containerized application: stdout, the standard output stream, and stderr, the standard error stream. The kubelet, which communicates with the container runtime through the Container Runtime Interface (CRI), is responsible for managing the logs that the runtime captures from these streams.
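To make the stdout/stderr split more concrete: CRI-compatible runtimes typically store each captured line on the node in the form `<timestamp> <stream> <tag> <message>`, where the tag is F for a full line or P for a partial one. The sketch below only illustrates that layout. The sample entries and the file name sample-cri.log are made up, and the exact location and format on your nodes depend on the container runtime, so treat both as assumptions to verify.

```shell
# Hypothetical sample of a CRI-style container log file (not real cluster output).
cat > sample-cri.log <<'EOF'
2024-08-16T10:00:00.000000001Z stdout F Server started on port 8080
2024-08-16T10:00:01.000000002Z stderr F Failed to connect to database
2024-08-16T10:00:02.000000003Z stdout F Handled request GET /health
EOF

# Keep only entries written to stderr and strip the three metadata fields
# (timestamp, stream, tag) so that just the message remains.
stderr_lines=$(awk '$2 == "stderr" { sub(/^[^ ]+ [^ ]+ [^ ]+ /, ""); print }' sample-cri.log)
echo "$stderr_lines"
```

The same idea, filtering on the stream field, is what lets a log processor treat error output differently from regular output.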

To view Kubernetes pod logs, use the following command:

kubectl logs <pod_name>

If the pod runs several containers and you only need logs from one of them, add the -c flag followed by the container name:

kubectl logs <pod_name> -c <container_name>

These commands print the logs generated by the pod or by the specified container. The logs may contain information about the application's status, errors, and successful or failed operations.

On worker nodes that use systemd, the kubelet writes its own logs to journald. To read these logs, use the command:

journalctl -u kubelet

If systemd is not present on the node, the kubelet writes its logs to files in the /var/log directory instead.

Cluster-Level Log Collection

Kubernetes does not have a built-in capability for cluster-level logging. To implement it, users employ various approaches. Below are two of the most common methods for processing Kubernetes logs:

  • Using a Logging Agent at the Node Level

[Image: node-level logging agent; source: kubernetes.io]

One way to collect and process logs at the cluster level is to run a logging agent on every node. The agent (e.g., Fluentd or Logstash) collects the logs written on its node, processes them, and sends them to an aggregator. It is usually deployed as a DaemonSet, which schedules a copy of the agent on each worker node in the cluster.
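As a rough illustration of the DaemonSet approach, the following sketch writes a minimal manifest to a file. The resource name logging-agent, the kube-system namespace, and the `<logging-agent-image>` placeholder are assumptions made for this example. A real agent also needs its own configuration and an output destination, which are omitted here.

```shell
# Write a minimal DaemonSet manifest. All names are illustrative, and
# <logging-agent-image> must be replaced with a real agent image.
cat > logging-agent-daemonset.yaml <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: logging-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: logging-agent
  template:
    metadata:
      labels:
        app: logging-agent
    spec:
      containers:
        - name: agent
          image: <logging-agent-image>
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
EOF

# Apply it to a cluster you control (commented out so the sketch has no side effects):
# kubectl apply -f logging-agent-daemonset.yaml
# kubectl get daemonset logging-agent -n kube-system
```

Mounting the node's /var/log directory read-only gives the agent access to the log files on that node without letting it modify them.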

  • Using a Sidecar Container at the Pod Level

A sidecar is an additional container that runs alongside the main container within the same pod. A properly configured sidecar container reads logs from a file, socket, or journald and then writes them to its own stdout or stderr stream.

[Image: sidecar container logging; source: kubernetes.io]

If the application writes logs in different formats to separate files, it's recommended to configure a separate sidecar container for each format. Each sidecar then reads its file from a volume shared with the application and redirects the contents to its own stdout stream.

To view the collected logs, you can use the familiar command:

kubectl logs <pod_name> -c <sidecar_container_name>
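The sidecar pattern can be sketched with a pod that shares an emptyDir volume between two containers. This is a minimal example modeled on the pattern in the Kubernetes documentation, not a production setup. The names my-pod and app-container follow the earlier example, while log-sidecar, the busybox image, and the /var/log/app.log path are assumptions chosen for illustration.

```shell
# Write a pod manifest in which the application appends to a file on a shared
# volume and a sidecar streams that file to its own stdout.
cat > sidecar-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
    - name: app-container
      image: busybox:1.28
      args:
        - /bin/sh
        - -c
        - |
          while true; do echo "$(date) app event" >> /var/log/app.log; sleep 1; done
      volumeMounts:
        - name: varlog
          mountPath: /var/log
    - name: log-sidecar
      image: busybox:1.28
      args: [/bin/sh, -c, 'tail -n+1 -F /var/log/app.log']
      volumeMounts:
        - name: varlog
          mountPath: /var/log
  volumes:
    - name: varlog
      emptyDir: {}
EOF

# On a cluster you control (commented out so the sketch has no side effects):
# kubectl apply -f sidecar-pod.yaml
# kubectl logs my-pod -c log-sidecar
```

Because the sidecar writes to its own stdout, its output becomes available through the regular kubectl logs path without any change to the application.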

Kubernetes Logging Tools

In this section, we'll examine some popular open-source tools for collecting logs in Kubernetes.

Fluentd, Fluent Bit, and Fluent Operator

Fluentd and Fluent Bit are two logging agents designed for collecting, filtering, aggregating, and forwarding logs in various environments, including Kubernetes. Fluentd has a large plugin ecosystem, which makes it better suited to processing collected logs. Fluent Bit is more lightweight and is well suited to collecting logs and sending them to their final destinations.

Fluent Operator is a tool for deploying and managing Fluentd and Fluent Bit agents in a Kubernetes environment. It simplifies installing, configuring, and scaling these agents in a cluster by using Custom Resource Definitions (CRDs). With Fluent Operator, users can deploy each agent individually or use Fluent Bit in combination with Fluentd.

The workflow with Fluent Operator is depicted in the following image:

[Image: Fluent Operator workflow; source: github.com]

Below are the custom resources of Fluent Operator:

  • FluentBit – Used to create Fluent Bit daemons and their configuration.

  • FluentBitConfig – Used to select the set of plugins (input, output, filtering) managed by FluentBit and to generate the final configuration, which is stored as a Secret.

  • Input – Configuration module for log collection.

  • Parser – Configuration module for log parsing.

  • Filter – Configuration module for log filtering.

  • Output – Configuration module for sending logs to the specified destination.

The interaction and operation of these resources are shown in the following image:

[Image: interaction of Fluent Operator custom resources; source: github.com]

Elasticsearch, Logstash, Beats, and Kibana (ELK Stack/Elastic Stack)

The ELK Stack combines three popular tools for collecting, aggregating, processing, and visualizing log data: Elasticsearch, Logstash, and Kibana. Beats joined the stack in 2015 as a family of lightweight data shippers, and the whole set was subsequently renamed the Elastic Stack.

  • Elasticsearch is a search and analytics engine used for storing and indexing structured and unstructured data, including logs.

  • Logstash is a data processing pipeline for collecting, transforming, and delivering logs. Users sometimes replace it with Fluentd, which is more commonly used in Kubernetes environments. In this case, the stack is called EFK instead of ELK.

  • Beats are lightweight agents installed on edge hosts to collect various data types and then forward them to the stack. Examples include Filebeat, Metricbeat, Packetbeat, Winlogbeat, and others.

  • Kibana is a tool, integrated with Elasticsearch, for visualizing and analyzing the stored data.

You can see the workflow of the ELK Stack in the image below:

[Image: ELK Stack workflow; source: medium.com]

Here, Beats and Logstash collect and process logs, Elasticsearch stores them, and Kibana creates visualizations.

Conclusion

A Kubernetes environment has many log sources, and collecting and processing them is essential for monitoring and debugging the system. In this guide, we reviewed the architecture of Kubernetes logging and how logging works at the pod and cluster levels. We also looked at tools for collecting and aggregating logs in Kubernetes, such as Fluent Operator (Fluentd and Fluent Bit) and the Elastic Stack.
