How to Monitor Apache Kafka
Hostman Team
Technical writer
Kafka
08.07.2025
Reading time: 17 min

With the rise of microservice architecture, new tools keep emerging that make working with microservice applications easier and more streamlined. One of these tools is Apache Kafka: a popular platform for stream processing and real-time messaging. Companies around the world use it to build scalable messaging systems, data analytics pipelines, and integrations between microservices.

As a core service in an application's architecture, Kafka requires monitoring. Without proper monitoring, the cluster may experience failures that lead to data loss or leakage. In this article, we will examine in detail how to organize monitoring for Apache Kafka.

Apache Kafka Architecture

Before moving on to the process of organizing monitoring and securing Kafka, let’s break down the program’s architecture.

Kafka is a distributed system consisting of several key components:

  • Brokers — physical or virtual servers (hosts) that receive, store, and process messages. Each broker is responsible for specific topic partitions.

  • Topics — logical categories where messages arrive. Topics are divided into partitions for parallel processing.

  • Producers — data sources or, more simply, clients that send data to topics.

  • Consumers — clients that read data from topics, often combined in groups for load distribution.

  • ZooKeeper — coordinates brokers and stores metadata and configuration. Starting with version 3.3, Kafka can run without ZooKeeper thanks to KRaft, a protocol for storing and managing metadata inside Kafka itself. KRaft's key benefit is eliminating Kafka's dependence on an external ZooKeeper service (ZooKeeper support is removed entirely in Kafka 4.0).

Messages in Kafka are key-value pairs written to partitions as logs. Consumers read these messages by tracking their position in the log. This architecture ensures high throughput but makes the system vulnerable to failures if monitoring and security are not given sufficient attention.
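
As a quick illustration of the key-value model, the console producer that ships with Kafka can send keyed messages. A minimal sketch, assuming a broker on localhost:9092 and a hypothetical topic demo-topic:

bin/kafka-console-producer.sh --topic demo-topic --bootstrap-server localhost:9092 --property parse.key=true --property key.separator=:

Each input line of the form user42:signed_in is then stored as a message with key user42 and value signed_in in one of the topic's partitions.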

Monitoring

Kafka often plays the role of a central component in the infrastructure of large applications, especially in microservice architectures. For example, it can transmit millions of events per second between multiple systems or databases. Any delay, failure, or message loss can have serious consequences, including financial and reputational damage. Therefore, it is necessary to build Kafka monitoring that addresses the following tasks:

  • Performance control. Delivery delays or an overloaded broker slow down the entire data processing chain, so broker throughput and latency must be tracked continuously.

  • Data integrity control. With data integrity monitoring, it is possible to minimize problems associated with message loss, duplication, or data corruption.

  • Scaling planning. Monitoring helps understand when to add brokers (horizontal scaling) or increase server resources (vertical scaling).

Key Metrics for Kafka Monitoring

Effective monitoring requires tracking metrics at all system levels. Let’s look at the main categories and examples.

  1. Broker Metrics

    • Incoming and Outgoing Traffic. Shows how much data the broker receives and sends. If the values approach network or disk limits, this is a signal for scaling.

    • Request Processing Latency. The average time to process requests from clients. Growth in latency may indicate a lack of resources.

    • Number of Active Connections. An abnormally high number of connections may indicate an attack or incorrect client behavior.

    • Resource Utilization. CPU, RAM, and disk space usage.

  2. Topic and Partition Metrics

    • Log Size. The total volume of data in a topic. If it grows uncontrollably, the cleanup policy should be reviewed.

    • Number of Messages. Data arrival rate. Sharp spikes may indicate peak loads.

    • Offset. The position of the last recorded message and the position up to which consumers have read.

  3. Consumer and Producer Metrics

    • Consumer Lag. The lag of consumers behind producers. For example, if the lag exceeds 10,000 messages, consumers may not be keeping up with processing (a quick CLI check is shown after this list).

    • Producer Request Rate. The frequency of producer requests. A drop in this metric may signal failures on the sender side.

    • Fetch Latency. The time required by the consumer to fetch data. High values indicate network or broker problems.
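
A quick way to check consumer lag from the command line is the kafka-consumer-groups tool that ships with Kafka (the group name below is hypothetical):

bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-group

The LAG column in its output shows, per partition, how far the group's committed offset trails the latest offset in the log.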

Kafka Monitoring Setup

Let’s break down how to set up Kafka monitoring in practice.

Prerequisites

We will need one server or virtual machine with any pre-installed Linux distribution. In this article, we will use Ubuntu 24.04 as an example.

The server must meet the following requirements:

  • At least 4 GB of RAM. This amount is suitable only for setting up and test usage of Apache Kafka and is not intended for high-resource tasks. For more serious tasks, at least 8 GB of RAM is required.

  • A processor with at least one core for a basic configuration. For real workloads (for example, processing large data volumes or computation-heavy processing), a 4-core processor is recommended.

  • A public IP address, which can be rented when creating the server in the “Network” section.

The server can be created in the control panel under Cloud Servers. During setup, we recommend choosing a region with minimal ping for fast data transfer. Other parameters can be left unchanged.

The server will launch in a couple of minutes, and you will find its IP address, login, and password in the server’s dashboard.

Installing and Launching Apache Kafka

Let’s start by installing Kafka using these steps:

  1. Update the repository index and install the OpenJDK 11 package needed to run Kafka:

apt update && apt -y install openjdk-11-jdk
  2. Check that Java was successfully installed by displaying its version:

java -version

If a version is returned, Java was successfully installed.

  3. Next, use wget to download the program archive (this guide uses version 3.9.1):

wget https://downloads.apache.org/kafka/3.9.1/kafka_2.13-3.9.1.tgz
  4. Unpack the downloaded archive with the command:

tar -xvzf kafka_2.13-3.9.1.tgz

A directory named kafka_2.13-3.9.1 will appear. Move it to /opt/kafka:

mv kafka_2.13-3.9.1 /opt/kafka
  5. Next, for convenient Kafka management, create systemd units. Let’s start with ZooKeeper. Using any text editor, create a file zookeeper.service:

nano /etc/systemd/system/zookeeper.service

Use the following content:

[Unit]
Description=Apache Zookeeper service
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
ExecStart=/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
ExecStop=/opt/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

Save changes and exit the file.

Also create a systemd file for Kafka:

nano /etc/systemd/system/kafka.service

Use this content:

[Unit]
Description=Apache Kafka Service
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
Environment="JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64"
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh

[Install]
WantedBy=multi-user.target
  6. Reload the daemon configuration files with:

systemctl daemon-reload
  7. Start ZooKeeper:

systemctl start zookeeper

Check its status:

systemctl status zookeeper

It should show active (running) indicating ZooKeeper started successfully.

Next, start Kafka:

systemctl start kafka

And also check its status:

systemctl status kafka

It should show active (running) indicating Kafka started successfully.

  8. Additionally, create a separate user who will be assigned as the owner of all Kafka-related files and directories:

useradd -r -m -s /bin/false kafka
  9. Set the necessary permissions:

chown -R kafka:kafka /opt/kafka
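
Note that the units above run the services as root. If you would rather run them under the new kafka account, add these two directives to the [Service] section of both zookeeper.service and kafka.service, then reload systemd and restart the services:

User=kafka
Group=kafka

systemctl daemon-reload && systemctl restart zookeeper kafka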

Testing the Installation

After both services—ZooKeeper and Kafka—have been started, let’s test Kafka’s operation.

All commands below should be run from the /opt/kafka directory:

cd /opt/kafka
  1. Create a new topic called new-topic1:

bin/kafka-topics.sh --create --topic new-topic1 --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1

If successful, the terminal will display Created topic new-topic1.

  2. Also list all topics in the current Kafka instance:

bin/kafka-topics.sh --list --bootstrap-server localhost:9092

The topic new-topic1 should be listed.

  3. Next, test the producer. Launch it with:

bin/kafka-console-producer.sh --topic new-topic1 --bootstrap-server localhost:9092
  4. Send a test message:

Hello from kafka!
  5. Without closing the current SSH session, open a new one and go to /opt/kafka:

cd /opt/kafka

Start the consumer:

bin/kafka-console-consumer.sh --topic new-topic1 --from-beginning --bootstrap-server localhost:9092

If everything works correctly, you will see the previously sent message.

Installing Prometheus

  1. Create a user named prometheus:

useradd --no-create-home --shell /bin/false prometheus
  2. Create directories for the Prometheus configuration and data:

mkdir /etc/prometheus
mkdir /var/lib/prometheus
  3. Assign ownership of the data directory:

chown prometheus:prometheus /var/lib/prometheus
  4. Move to the /tmp directory:

cd /tmp/

And download the program archive:

wget https://github.com/prometheus/prometheus/releases/download/v2.53.5/prometheus-2.53.5.linux-amd64.tar.gz
  5. Unpack the downloaded archive:

tar xvfz prometheus-2.53.5.linux-amd64.tar.gz
  6. Go into the extracted directory:

cd prometheus-2.53.5.linux-amd64
  7. Move the consoles and console_libraries directories, the prometheus.yml config file, and the Prometheus binary into place, and set ownership:

mv console* /etc/prometheus
mv prometheus.yml /etc/prometheus
mv prometheus /usr/local/bin/
chown -R prometheus:prometheus /etc/prometheus
chown prometheus:prometheus /usr/local/bin/prometheus
  8. Additionally, create a systemd unit for Prometheus:

nano /etc/systemd/system/prometheus.service

Use the following content:

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file /etc/prometheus/prometheus.yml \
  --storage.tsdb.path /var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target
  9. In the default configuration, Prometheus scrapes its own metrics via localhost. Point this self-scrape target at the server's external address by editing the main config:

nano /etc/prometheus/prometheus.yml

At the end of the file, find the targets parameter under static_configs and replace localhost with the external IP address of your server (you will have your own external IP). 

static_configs:
  - targets: ["166.1.227.100:9090"]

Save and exit.

  10. Reload systemd so it picks up the new unit, then start Prometheus, enable it at boot, and check its status:

systemctl daemon-reload && systemctl start prometheus && systemctl enable prometheus && systemctl status prometheus

If the status shows active (running), Prometheus has started successfully.

Now go to your browser using the server’s IP address and port 9090 (default Prometheus port). You should see the program’s web interface.
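
You can also check Prometheus from the shell through its built-in health endpoint:

curl http://localhost:9090/-/healthy

A short "Healthy" response confirms the server is up.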

Installing Grafana

  1. Install the necessary packages:

apt-get install -y apt-transport-https software-properties-common wget
  2. Create a directory to store the repository signing key:

mkdir -p /etc/apt/keyrings/
  3. Import the GPG key:

wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
  4. Add the repository:

echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | tee -a /etc/apt/sources.list.d/grafana.list
  5. Update the package index and install Grafana:

apt update && apt -y install grafana
  6. Start the service with the following commands:

systemctl daemon-reload && systemctl enable grafana-server && systemctl start grafana-server

Check Grafana’s status:

systemctl status grafana-server

If it shows active (running), Grafana has started successfully.

Using the server’s IP address and port 3000 (Grafana’s default port), go to the web interface. The initial login and password for the web interface are admin / admin. On first login, the system will prompt you to set a new password for the admin user.
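
If the admin password is ever lost, it can be reset from the server shell with Grafana's bundled CLI (newpass below is a placeholder):

grafana-cli admin reset-admin-password newpass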

After authentication, the web interface will open.

Installing JMX Exporter

JMX Exporter is a Java agent that collects JMX metrics from Java applications (such as Kafka) and exposes them over HTTP for monitoring systems like Prometheus to scrape. To install JMX Exporter, perform the following steps:

  1. Download the utility from the official repository using wget:

wget https://repo.maven.apache.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.20.0/jmx_prometheus_javaagent-0.20.0.jar
  2. Move the downloaded JAR file to the /opt/kafka/libs directory:

mv jmx_prometheus_javaagent-0.20.0.jar /opt/kafka/libs/
  3. Open the kafka-server-start.sh file for editing:

nano /opt/kafka/bin/kafka-server-start.sh

And add the following line just above the final exec command at the bottom of the file (it must be exported and placed before exec, or it will never take effect). It points the agent at the exporter rules file that we will create in the next section:

export KAFKA_OPTS="-javaagent:/opt/kafka/libs/jmx_prometheus_javaagent-0.20.0.jar=9091:/opt/kafka/config/sample_jmx_exporter.yml"

Save the changes and exit the file.

  4. Restart Kafka using the commands:

systemctl daemon-reload && systemctl restart kafka
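
Once the exporter rules file from the next section exists and Kafka has been restarted again, the agent will serve metrics on port 9091, which you can spot-check from the shell:

curl -s http://localhost:9091/metrics | head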

Configuring JMX Exporter

Let's proceed to configure JMX Exporter.

  1. Go to the /opt/kafka/config directory:

cd /opt/kafka/config
  2. Create the sample_jmx_exporter.yml file:

nano sample_jmx_exporter.yml

And use the following content:

lowercaseOutputName: true

rules:
# Special cases and very specific rules
- pattern : kafka.server<type=(.+), name=(.+), clientId=(.+), topic=(.+), partition=(.*)><>Value
  name: kafka_server_$1_$2
  type: GAUGE
  labels:
    clientId: "$3"
    topic: "$4"
    partition: "$5"
- pattern : kafka.server<type=(.+), name=(.+), clientId=(.+), brokerHost=(.+), brokerPort=(.+)><>Value
  name: kafka_server_$1_$2
  type: GAUGE
  labels:
    clientId: "$3"
    broker: "$4:$5"
- pattern : kafka.coordinator.(\w+)<type=(.+), name=(.+)><>Value
  name: kafka_coordinator_$1_$2_$3
  type: GAUGE

# Generic per-second counters with 0-2 key/value pairs
- pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+), (.+)=(.+)><>Count
  name: kafka_$1_$2_$3_total
  type: COUNTER
  labels:
    "$4": "$5"
    "$6": "$7"
- pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+)><>Count
  name: kafka_$1_$2_$3_total
  type: COUNTER
  labels:
    "$4": "$5"
- pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*><>Count
  name: kafka_$1_$2_$3_total
  type: COUNTER

- pattern: kafka.server<type=(.+), client-id=(.+)><>([a-z-]+)
  name: kafka_server_quota_$3
  type: GAUGE
  labels:
    resource: "$1"
    clientId: "$2"

- pattern: kafka.server<type=(.+), user=(.+), client-id=(.+)><>([a-z-]+)
  name: kafka_server_quota_$4
  type: GAUGE
  labels:
    resource: "$1"
    user: "$2"
    clientId: "$3"

# Generic gauges with 0-2 key/value pairs
- pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Value
  name: kafka_$1_$2_$3
  type: GAUGE
  labels:
    "$4": "$5"
    "$6": "$7"
- pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Value
  name: kafka_$1_$2_$3
  type: GAUGE
  labels:
    "$4": "$5"
- pattern: kafka.(\w+)<type=(.+), name=(.+)><>Value
  name: kafka_$1_$2_$3
  type: GAUGE

# Emulate Prometheus 'Summary' metrics for the exported 'Histogram's.
#
# Note that these are missing the '_sum' metric!
- pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Count
  name: kafka_$1_$2_$3_count
  type: COUNTER
  labels:
    "$4": "$5"
    "$6": "$7"
- pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*), (.+)=(.+)><>(\d+)thPercentile
  name: kafka_$1_$2_$3
  type: GAUGE
  labels:
    "$4": "$5"
    "$6": "$7"
    quantile: "0.$8"
- pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Count
  name: kafka_$1_$2_$3_count
  type: COUNTER
  labels:
    "$4": "$5"
- pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*)><>(\d+)thPercentile
  name: kafka_$1_$2_$3
  type: GAUGE
  labels:
    "$4": "$5"
    quantile: "0.$6"
- pattern: kafka.(\w+)<type=(.+), name=(.+)><>Count
  name: kafka_$1_$2_$3_count
  type: COUNTER
- pattern: kafka.(\w+)<type=(.+), name=(.+)><>(\d+)thPercentile
  name: kafka_$1_$2_$3
  type: GAUGE
  labels:
    quantile: "0.$4"

Save the changes and exit the file.

  3. Next, open the main Prometheus configuration file prometheus.yml for editing:

nano /etc/prometheus/prometheus.yml

We need to add a Kafka scrape job so that Prometheus can collect the exporter's data. At the very bottom of the file, under the existing scrape_configs section and matching the indentation of the job entries already there, add the following block, where 166.1.227.100 is the external IP address of the server (do not forget to change it to your actual external IP address):

  - job_name: 'kafka'
    static_configs:
      - targets: ["166.1.227.100:9091"]

Save the changes and exit the file.

  4. Restart Prometheus and check its status:

systemctl daemon-reload && systemctl restart prometheus && systemctl status prometheus
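
If the restart fails, you can validate the edited config with promtool, which ships in the same Prometheus archive downloaded earlier (this assumes you also copied the promtool binary somewhere on the PATH, such as /usr/local/bin, a step not covered above):

promtool check config /etc/prometheus/prometheus.yml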

Configuring Kafka

Next, we also need to pass the JMX Exporter agent options to Kafka's systemd unit so that the agent is loaded when the service starts.

  1. Open the Kafka systemd file for editing:

nano /etc/systemd/system/kafka.service

And add the following line to the [Service] block, pointing the Java agent at the exporter rules file (systemd keeps only the last Environment= assignment of a variable, so a single line is sufficient):

Environment="KAFKA_OPTS=-javaagent:/opt/kafka/libs/jmx_prometheus_javaagent-0.20.0.jar=9091:/opt/kafka/config/sample_jmx_exporter.yml"

Save the changes and exit the file.

  2. Restart Kafka and check its status:

systemctl daemon-reload && systemctl restart kafka && systemctl status kafka
  3. Go to the Prometheus web interface, open the Status section, and in the dropdown menu select Targets.

A new kafka scrape target will appear in the list; once Prometheus reaches the exporter, its state shows UP.
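
You can also sanity-check the data in the Prometheus web interface on the Graph tab. Under the sample rules above, Kafka's MessagesInPerSec counter is exported as kafka_server_brokertopicmetrics_messagesin_total, so a query like the following should return a message rate once traffic flows:

rate(kafka_server_brokertopicmetrics_messagesin_total[5m])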

Connecting Prometheus Metrics to Grafana

The final step is to add the metrics from Prometheus to Grafana to build visualizations using graphs.

  1. Go to the Grafana web interface. On the left panel, select the Connections menu, then in the dropdown go to the Data sources section.
  2. In the opened section, click the Add data source button.
  3. Then select Prometheus as the data source.
  4. As the name of the source, specify Kafka (you can choose any other unused name), and as the address, specify the URL where Prometheus is reachable, for example http://166.1.227.100:9090 (your server's IP address and port 9090).
  5. Click the Save & test button.

If connected to Prometheus successfully, a corresponding message will be displayed.

Creating a Visualization in Grafana

After we have configured monitoring, it is time to add a dashboard for visualization in Grafana.

  1. On the left panel, go to the Dashboards section.

  2. In the opened window, click the New button on the right and in the dropdown menu select New dashboard.

  3. Next, go to the Import dashboard section.

  4. Use dashboard number 11962 to add it to Grafana and click the Load button.

In the opened section, you can set a name for the dashboard. At the bottom, as the data source, select the previously added Prometheus instance.

Click the Import button.

Creating a Test Load

The added dashboard currently does not show any load. Let’s simulate it ourselves.

  1. On the server, go to the /opt/kafka directory:

cd /opt/kafka
  2. Create a new topic named test-load:

bin/kafka-topics.sh --create --topic test-load --bootstrap-server localhost:9092 --partitions 4 --replication-factor 1
  3. Kafka has a built-in tool, kafka-producer-perf-test.sh, which allows you to simulate message sending by a producer. Let’s launch it to create a test load:

bin/kafka-producer-perf-test.sh --topic test-load --num-records 1000000 --record-size 100 --throughput -1 --producer-props bootstrap.servers=localhost:9092

The command above will generate and send 1,000,000 messages.

  4. Also, create load on the consumer side by reading the same 1,000,000 messages:

bin/kafka-consumer-perf-test.sh --topic test-load --messages 1000000 --bootstrap-server localhost:9092 --group test-group
  5. Go to the Grafana dashboard, where you can now observe the populated graphs.

Conclusion

Monitoring Apache Kafka is a complex and comprehensive process that requires maximum attention to detail. The process starts with metrics collection, which can be organized using modern tools like Prometheus and Grafana. Once the metrics are set up, it is necessary to regularly check the cluster’s state for possible problems. Proper monitoring ensures stability of operation. Apache Kafka is a powerful tool that will fully reveal its potential only with correct setup and operation.
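
Beyond dashboards, Prometheus alerting can notify you when the Kafka metrics endpoint stops responding. A minimal sketch of an alerting rule, assuming the kafka scrape job configured earlier (wiring the rule file into rule_files and an Alertmanager is left out here):

groups:
  - name: kafka-alerts
    rules:
      - alert: KafkaExporterDown
        expr: up{job="kafka"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Kafka JMX exporter target is down"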
