Logging in Kubernetes: Collection, Storage, Processing, and Parsing Logs

Hostman Team
Technical writer
Kubernetes
16.08.2024
Reading time: 6 min

Logging is the process of collecting, recording, and storing data about events, actions, and the state of a system or application in a specific format. In distributed systems, it is one of the key building blocks of observability.

The data collected during the logging process can include:

  • General information about the system's operation.

  • Warnings about potential issues.

  • Records of errors in the system that require attention and resolution.

  • Debugging information.

  • Records of user or system actions for auditing, security, and identifying unauthorized activities.

In Kubernetes, logging is the process of collecting, managing, and analyzing logs generated by application containers and Kubernetes components to monitor, debug, and detect issues within the cluster environment.

In this guide, we'll explore the logging process in a Kubernetes environment, starting with its architecture and ending with tools for processing logs.

Kubernetes Logging Architecture

In this section of the guide, we'll review the architecture of log collection in Kubernetes. The following are the main levels of log collection:

Log Collection at the Pod Level

At the pod level, Kubernetes keeps the logs that the application in each container generates. You can examine these logs from the command line as soon as the pod has been created and is running.

Image1

Image source: kubernetes.io

The image above depicts logging at the container level. A container (app-container) in the my-pod pod writes its logs to the application's standard streams: stdout for regular output and stderr for errors. The container runtime captures both streams, and the kubelet, which talks to the runtime through the Container Runtime Interface (CRI), manages the resulting log files and serves them on request.
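As a minimal sketch of the application side of this flow, the Python snippet below sends informational messages to stdout and warnings and errors to stderr, so that the runtime can capture both streams. The logger name app-container, the MaxLevelFilter class, and the build_logger function are illustrative assumptions rather than part of any Kubernetes API.

```python
import logging
import sys


class MaxLevelFilter(logging.Filter):
    """Pass only records whose level is below max_level."""

    def __init__(self, max_level: int) -> None:
        super().__init__()
        self.max_level = max_level

    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno < self.max_level


def build_logger(name: str = "app-container") -> logging.Logger:
    """Create a logger that splits output between stdout and stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()
    logger.propagate = False

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    # DEBUG and INFO go to stdout.
    out = logging.StreamHandler(sys.stdout)
    out.addFilter(MaxLevelFilter(logging.WARNING))
    out.setFormatter(fmt)

    # WARNING and above go to stderr.
    err = logging.StreamHandler(sys.stderr)
    err.setLevel(logging.WARNING)
    err.setFormatter(fmt)

    logger.addHandler(out)
    logger.addHandler(err)
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info("service started")
    log.error("database connection failed")
```

Writing to the standard streams instead of a file inside the container is what lets the pod-level commands in this section work without extra configuration.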

To view Kubernetes pod logs, use the following command:

kubectl logs <pod_name>

However, you might have multiple containers deployed within your pod but only need logs from one of them. To achieve this, you can add a specific flag and the container name to the previous command:

kubectl logs <pod_name> -c <container_name>

These commands print the logs produced by the selected container. For a pod with a single container, the first form is enough; to print logs from every container in a multi-container pod, add the --all-containers flag. The output may include information about the application's status, errors, and successful or failed operations.
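If you need to fetch pod logs from a script rather than by hand, one option is to wrap kubectl in a small helper. The sketch below is one possible approach, assuming kubectl is installed and configured for your cluster; the function names kubectl_logs_command and fetch_logs are hypothetical. It relies only on standard kubectl logs flags (-c, --all-containers, --tail, --previous).

```python
import subprocess
from typing import List, Optional


def kubectl_logs_command(
    pod: str,
    container: Optional[str] = None,
    all_containers: bool = False,
    tail: Optional[int] = None,
    previous: bool = False,
) -> List[str]:
    """Build the argument list for a `kubectl logs` call."""
    cmd = ["kubectl", "logs", pod]
    if container:
        cmd += ["-c", container]          # one specific container
    elif all_containers:
        cmd.append("--all-containers")    # every container in the pod
    if tail is not None:
        cmd.append(f"--tail={tail}")      # only the last N lines
    if previous:
        cmd.append("--previous")          # previous instance of the container
    return cmd


def fetch_logs(pod: str, **options) -> str:
    """Run kubectl and return its output; requires kubectl and cluster access."""
    result = subprocess.run(
        kubectl_logs_command(pod, **options),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Building the argument list separately from running it keeps the helper easy to check without a cluster: kubectl_logs_command("my-pod", container="app-container", tail=100) returns the same arguments you would type on the command line.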

If your worker nodes have systemd installed, kubelet records system logs in journald. To read these logs, use the command:

journalctl -u kubelet

If systemd is not present on the node, the logs will be recorded in log files in the /var/log directory.
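Container logs are stored as files on the node as well. With CRI-compatible runtimes such as containerd, they are typically written under /var/log/pods, and each line consists of a timestamp, the stream name, a tag marking a full (F) or partial (P) line, and the message. The Python sketch below parses that layout; the CriLogLine and parse_cri_line names and the sample line are illustrative assumptions, and your runtime's format should be checked before relying on it.

```python
from dataclasses import dataclass


@dataclass
class CriLogLine:
    timestamp: str   # RFC 3339 timestamp, kept as text
    stream: str      # "stdout" or "stderr"
    partial: bool    # True when the runtime split a long line ("P" tag)
    message: str


def parse_cri_line(line: str) -> CriLogLine:
    """Split one CRI-style log line into its four fields."""
    parts = line.rstrip("\n").split(" ", 3)
    if len(parts) < 3:
        raise ValueError(f"not a CRI log line: {line!r}")
    timestamp, stream, tag = parts[:3]
    # The message may be empty, in which case the fourth field is missing.
    message = parts[3] if len(parts) == 4 else ""
    return CriLogLine(timestamp, stream, tag == "P", message)


if __name__ == "__main__":
    sample = "2024-08-16T10:15:30.123456789Z stderr F connection refused: db:5432"
    print(parse_cri_line(sample))
```

Log agents perform this kind of parsing before forwarding records, which is why their configurations usually include a parser setting for the container runtime in use.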

Cluster-Level Log Collection

Kubernetes does not have a built-in capability for cluster-level logging. To implement it, users employ various approaches. Below are two of the most common methods for processing Kubernetes logs:

  • Using a Logging Agent at the Node Level

Image3 (1)

Image source: kubernetes.io

One way to collect and process logs at the cluster level is to run a logging agent on every node. The agent (for example, Fluentd or Logstash) collects the logs written on its node, processes them, and sends them to an aggregator. It is usually deployed as a DaemonSet, which schedules one copy of the agent on each node in the cluster.
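A node-level agent has to work out which pod each log file belongs to. On many clusters the files under /var/log/containers follow the naming pattern pod_namespace_container-containerID.log, and agents derive metadata from it. The Python sketch below shows the idea under that assumption; the metadata_from_filename function and the example file name are hypothetical, and real agents usually enrich records through the Kubernetes API as well.

```python
import re
from typing import Dict, Optional

# Assumed layout: <pod>_<namespace>_<container>-<64 hex chars>.log
_LOG_NAME = re.compile(
    r"^(?P<pod>[^_]+)_(?P<namespace>[^_]+)_"
    r"(?P<container>.+)-(?P<container_id>[0-9a-f]{64})\.log$"
)


def metadata_from_filename(filename: str) -> Optional[Dict[str, str]]:
    """Extract pod, namespace, and container names from a log file name."""
    match = _LOG_NAME.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    example = "my-pod_default_app-container-" + "a" * 64 + ".log"
    print(metadata_from_filename(example))
```

Because the agent runs once per node and reads files from the host, applications do not need to be changed or redeployed to have their logs collected.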

  • Using a Sidecar Container at the Pod Level

A sidecar is an additional container that runs alongside the main container within the same pod. A properly configured sidecar container reads logs from a file, socket, or journald and writes them to its own stdout or stderr stream, so they become available through the regular container logging path.

Image5

Image source: kubernetes.io

To collect logs in different formats, it's recommended to configure multiple sidecar containers. In this case, each container will redirect logs from a shared volume to its own stdout stream.

To view the collected logs, pass the sidecar container's name to the familiar command:

kubectl logs <pod_name> -c <sidecar_container_name>
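To illustrate what a streaming sidecar does, the Python sketch below copies complete lines from a log file on a shared volume to stdout, similar in spirit to running tail -F in the sidecar. The function names and the polling interval are assumptions made for this example; production setups more often use a ready-made image or a log agent.

```python
import sys
import time
from typing import TextIO


def forward_available(source: TextIO, sink: TextIO) -> int:
    """Copy every complete line currently available in source to sink."""
    count = 0
    while True:
        position = source.tell()
        line = source.readline()
        if not line:
            break                      # nothing new yet
        if not line.endswith("\n"):
            source.seek(position)      # partial line: wait for the writer
            break
        sink.write(line)
        count += 1
    sink.flush()
    return count


def follow(path: str, poll_seconds: float = 1.0) -> None:
    """Keep forwarding new lines from path to stdout until interrupted."""
    with open(path, "r", encoding="utf-8") as source:
        while True:
            if forward_available(source, sys.stdout) == 0:
                time.sleep(poll_seconds)
```

In a sidecar, follow would be called with the path of the application's log file on the volume that both containers mount. This sketch does not handle log rotation, which a real sidecar would need to account for.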

Kubernetes Logging Tools

In this section, we'll examine some popular open-source tools for collecting logs in Kubernetes.

Fluentd, Fluent Bit, and Fluent Operator

Fluentd and Fluent Bit are two logging agents designed for collecting, filtering, aggregating, and forwarding logs in various environments, including Kubernetes. Fluentd is better suited to processing collected logs thanks to its large selection of plugins. Fluent Bit is the more lightweight of the two and is well suited to collecting logs on each node and sending them to their final destinations.

Fluent Operator is a tool for deploying and managing Fluentd and Fluent Bit agents automatically in a Kubernetes environment. It simplifies installing, configuring, and scaling these agents in a cluster by describing them through Custom Resource Definitions (CRDs). With Fluent Operator, users can deploy each agent on its own or use Fluent Bit in combination with Fluentd.

The workflow with Fluent Operator is depicted in the following image:

Image4

Image source: github.com

Below are the custom resources of Fluent Operator:

  • FluentBit – Used to create Fluent Bit daemons and their configuration.

  • FluentBitConfig – Used to define a set of plugins (input, output, filtering) managed by FluentBit and to generate the final configuration as a secret.

  • Input – Configuration module for log collection.

  • Parser – Configuration module for log parsing.

  • Filter – Configuration module for log filtering.

  • Output – Configuration module for sending logs to the specified destination.
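The Input, Parser, Filter, and Output resources describe the stages of a log pipeline. The Python sketch below is only a conceptual model of how records move through such stages; it does not use Fluent Bit or the Fluent Operator API, and all function names and the sample records are hypothetical.

```python
import json
from typing import Callable, Dict, Iterable, Optional

Record = Dict[str, object]


def parse_json(raw: str) -> Record:
    """Parser stage: turn a raw line into a structured record."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return {"message": raw}
    return parsed if isinstance(parsed, dict) else {"message": raw}


def drop_debug(record: Record) -> Optional[Record]:
    """Filter stage: discard debug-level records and keep the rest."""
    return None if record.get("level") == "debug" else record


def run_pipeline(lines: Iterable[str], output: Callable[[Record], None]) -> int:
    """Feed input lines through the parser and filter, then to the output."""
    sent = 0
    for raw in lines:                                  # input stage
        record = drop_debug(parse_json(raw.strip()))
        if record is not None:
            output(record)                             # output stage
            sent += 1
    return sent


if __name__ == "__main__":
    sample = [
        '{"level": "debug", "message": "cache miss"}',
        '{"level": "error", "message": "request failed"}',
    ]
    run_pipeline(sample, print)
```

In Fluent Operator, each of these stages is declared as a separate custom resource, and the operator assembles them into the agent's final configuration.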

The interaction and operation of these resources are shown in the following image:

Image (Fluent Operator custom resources)

Image source: github.com

Elasticsearch, Logstash, Beats, and Kibana (ELK Stack/Elastic Stack)

The ELK Stack combines three popular tools for collecting, aggregating, processing, and visualizing log data: Elasticsearch, Logstash, and Kibana. Beats was added in 2015 as a set of lightweight data shippers, and the expanded stack is now known as the Elastic Stack.

  • Elasticsearch is a search and analytics engine used for storing and indexing structured and unstructured data, including logs.

  • Logstash is an agent for collecting, processing, and delivering logs. Sometimes, users replace it with Fluentd, which is considered a more standard solution for the Kubernetes environment. In this case, the stack becomes EFK instead of ELK.

  • Beats are lightweight agents installed on edge hosts to collect various data types and then forward them to the stack. Examples include Filebeat, Metricbeat, Packetbeat, Winlogbeat, and others.

  • Kibana is a tool for visualizing and analyzing data integrated with Elasticsearch.

You can see the workflow of the ELK Stack in the image below:

Image2 (1)

Image source: medium.com

Here, Beats and Logstash collect and process logs, Elasticsearch stores them, and Kibana creates visualizations.
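To make the storage step more concrete, the Python sketch below formats log records as a newline-delimited JSON body of the kind the Elasticsearch bulk endpoint accepts, where each document is preceded by an action line. The index name k8s-logs, the to_bulk_payload function, and the sample record are assumptions for illustration; in practice Beats, Logstash, or Fluentd build and send these requests for you.

```python
import json
from typing import Dict, Iterable


def to_bulk_payload(index: str, records: Iterable[Dict[str, object]]) -> str:
    """Format records as newline-delimited JSON for a bulk index request."""
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(record))                        # document line
    # A bulk body must end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    logs = [{"pod": "my-pod", "stream": "stderr", "message": "connection refused"}]
    print(to_bulk_payload("k8s-logs", logs), end="")
```

Sending many records in one request is what makes shipping logs to Elasticsearch efficient, and it is the reason agents buffer records before flushing them.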

Conclusion

A Kubernetes cluster produces logs from many sources, and all of them need to be collected and processed to keep the system observable and easy to debug. In this guide, we reviewed the architecture of Kubernetes logging and how logging works at the pod and cluster levels. We also looked at tools for collecting and aggregating logs in Kubernetes, such as Fluent Operator (Fluentd and Fluent Bit) and the Elastic Stack.


