Apache Kafka is a distributed streaming platform designed for building real-time data pipelines and applications. It provides scalable, fault-tolerant infrastructure for handling streams of data across applications. Its high-throughput publish-subscribe messaging makes it a popular choice for developers implementing real-time analytics and event-driven systems.
This step-by-step guide shows how to install Apache Kafka on Ubuntu 22.04.
To follow this guide, you need:
A cloud server with Ubuntu 22.04 installed
A non-root user with sudo privileges
At least 4GB of RAM
The first step is to create a dedicated user to ensure that Kafka's operations do not interfere with the system's other functionalities.
Add a new user called kafka:
sudo adduser kafka
Next, add the kafka user to the sudo group so that it has the privileges needed for the Kafka installation:
sudo adduser kafka sudo
Then, log in to the kafka account:
su -l kafka
The kafka user is now ready to be used.
Apache Kafka is written in Java and Scala, so a Java Runtime Environment (JRE) is required to run it. However, for a complete development setup that may involve custom Kafka clients or plugins, the full Java Development Kit (JDK) is recommended.
Open the terminal and update the package index:
sudo apt update
Install the OpenJDK 11 package:
sudo apt install openjdk-11-jdk
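Before moving on, it can help to confirm which Java version is actually on the PATH. The following is a minimal sketch, not part of the original guide: the helper name java_major_from is hypothetical, and it assumes the usual first-line format of the java -version output (for example, openjdk version "11.0.22").

```shell
# Extract the major Java version from the first line of `java -version` output.
# Handles both the legacy "1.8.0_292" scheme and the modern "11.0.22" scheme.
# (java_major_from is a hypothetical helper name used for illustration.)
java_major_from() {
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;
    *)   printf '%s\n' "$ver" | cut -d. -f1 ;;
  esac
}

# Only runs when java is installed; `java -version` writes to stderr.
if command -v java >/dev/null 2>&1; then
  first_line=$(java -version 2>&1 | head -n 1)
  echo "Detected Java major version: $(java_major_from "$first_line")"
fi
```

With the OpenJDK 11 package installed as above, the detected major version should be 11.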
Now that you’ve installed the JDK, you can start downloading Kafka.
This guide uses Kafka 3.4.0, downloaded from the Apache archive and extracted into a folder.
Start by creating a folder named downloads to store the archive:
mkdir ~/downloads
cd ~/downloads
wget https://archive.apache.org/dist/kafka/3.4.0/kafka_2.12-3.4.0.tgz
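Before extracting the archive, it is worth checking that the download is intact. The sketch below compares the archive's SHA-512 digest with a digest file stored next to it. It rests on two assumptions that are not in the original guide: that you have also downloaded the matching .sha512 file (Apache normally publishes one alongside each release archive), and that the file uses the "name: ABCD 1234 ..." layout produced by gpg --print-md. The function name verify_sha512 is hypothetical.

```shell
# Compare a file's SHA-512 digest with the digest stored in a .sha512 file.
# Assumes the "name: ABCD 1234 ..." layout (possibly wrapped over several
# lines); a plain `sha512sum`-style file would need different parsing.
verify_sha512() {
  file=$1
  sumfile=$2
  # Drop the "name:" prefix, remove whitespace, and lowercase the hex digits.
  expected=$(sed 's/^[^:]*://' "$sumfile" | tr -d ' \t\n' | tr 'A-F' 'a-f')
  actual=$(sha512sum "$file" | cut -d' ' -f1)
  [ "$expected" = "$actual" ]
}

# Runs only if both the archive and its digest file are present.
archive="$HOME/downloads/kafka_2.12-3.4.0.tgz"
if [ -f "$archive" ] && [ -f "$archive.sha512" ]; then
  if verify_sha512 "$archive" "$archive.sha512"; then
    echo "Checksum OK"
  else
    echo "Checksum mismatch, do not use this archive" >&2
  fi
fi
```

If the check reports a mismatch, download the archive again before continuing.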
Then, move to ~ and extract the archive you downloaded:
cd ~
tar -xvzf ~/downloads/kafka_2.12-3.4.0.tgz
Rename the directory kafka_2.12-3.4.0 to kafka:
mv kafka_2.12-3.4.0/ kafka/
Now that you’ve downloaded Kafka, you can start configuring your Kafka server.
First, set the log.dirs property to change the directory where Kafka stores its logs.
To do so, edit the server.properties file:
nano ~/kafka/config/server.properties
Look for log.dirs and set the value to /home/kafka/kafka-logs.
You can also change the value of num.partitions to 3, so that any topic created without an explicit partition count gets 3 partitions by default.
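If you prefer to script these two changes instead of editing the file by hand, something like the following could work. It is a minimal sketch: set_prop is a hypothetical helper (not a Kafka tool), it relies on GNU sed as shipped with Ubuntu, and it assumes the value contains no | or & characters.

```shell
# Set key=value in a Java-style properties file: replace the existing line
# if the key is already present, otherwise append a new line.
# (set_prop is a hypothetical helper; values must not contain '|' or '&'.)
set_prop() {
  file=$1 key=$2 value=$3
  # Escape dots so that "log.dirs" is matched literally.
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -q "^${pattern}=" "$file"; then
    sed -i "s|^${pattern}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Applies the two settings from this guide, keeping a backup of the original.
config="$HOME/kafka/config/server.properties"
if [ -f "$config" ]; then
  cp "$config" "$config.bak"
  set_prop "$config" log.dirs /home/kafka/kafka-logs
  set_prop "$config" num.partitions 3
fi
```

Commented-out entries (lines starting with #) are left untouched, so a key that only appears in a comment is appended as a new line.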
Now that you’ve finished configuring your Kafka server, you can run the server.
To start the Kafka server, you first need to start ZooKeeper and then start Kafka.
Apache ZooKeeper manages coordination and configuration for distributed systems, such as Kafka. Kafka uses ZooKeeper to maintain the state between nodes in the Kafka cluster and to keep track of topics, partitions, and configurations.
In this release of Kafka, ZooKeeper comes bundled with Kafka, so there is no need to install it separately.
To start ZooKeeper and Kafka manually, run these two commands, each in its own terminal:
~/kafka/bin/zookeeper-server-start.sh ~/kafka/config/zookeeper.properties
~/kafka/bin/kafka-server-start.sh ~/kafka/config/server.properties
For easier management, however, create systemd unit files and control both services with systemctl instead.
Unit File for Zookeeper:
sudo nano /etc/systemd/system/zookeeper.service
[Unit]
Description=Apache Zookeeper Service
Requires=network.target
After=network.target
[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Unit File for Kafka:
sudo nano /etc/systemd/system/kafka.service
[Unit]
Description=Apache Kafka Service that requires zookeeper service
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Then, you can start the Kafka server:
sudo systemctl start kafka
Check the status:
sudo systemctl status kafka
You can check if the Kafka server is up with netcat. By default, the Kafka server listens on port 9092:
nc -vz localhost 9092
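The broker can take a few seconds to open its port after systemctl returns, so a single netcat probe may fail even though the start is going fine. A small retry loop around the same check is one way to handle this. The sketch below is an illustration only: wait_for_port is a hypothetical helper, and the one-second pause between attempts is an arbitrary choice.

```shell
# Poll a TCP port with netcat until it accepts connections or the attempts
# run out. Returns 0 once the port is reachable, 1 otherwise.
# (wait_for_port is a hypothetical helper; arguments: host, port, attempts.)
wait_for_port() {
  host=$1 port=$2 attempts=${3:-30}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if nc -z "$host" "$port" >/dev/null 2>&1; then
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, calling wait_for_port localhost 9092 30 right after starting the service waits up to about 30 seconds for the broker, and its exit status can be used to decide whether to go on to create topics.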
You can also check logs:
cat ~/kafka/logs/server.log
If the log ends without errors, the server started correctly.
If your server is running successfully, try to create a topic:
~/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic firstTopic
List the topics to confirm that it was created:
~/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092
You can produce messages to the topic:
~/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic firstTopic
You can then read the messages:
~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic firstTopic --from-beginning
When transitioning from a development setup to a production environment, it's crucial to consider deploying Apache Kafka as a part of a cluster rather than as a single instance. A Kafka cluster ensures better reliability, scalability, and fault tolerance. Running a cluster involves multiple Kafka servers (brokers) and, typically, several ZooKeeper instances to manage the cluster's state.
Here’s an overview of the process for establishing a robust multi-node Kafka environment.
Infrastructure Preparation
Nodes: Prepare multiple servers (physical or virtual) with Ubuntu 22.04 installed. Each server acts as a Kafka broker; use at least three brokers in production environments to ensure fault tolerance.
Networking: Ensure all nodes can communicate with each other.
Consistent Software Installation
Install Java on all brokers.
Install Kafka on each node following the same steps used above, ensuring consistency across all installations.
ZooKeeper Setup
Cluster Configuration: Although a single ZooKeeper instance can manage a small Kafka cluster, a ZooKeeper ensemble (cluster) is recommended for production. Typically, this consists of an odd number of servers (at least three) to avoid split-brain scenarios and to ensure high availability and failover capabilities.
Configure each ZooKeeper node with a unique identifier and set up the ensemble so that each Kafka node knows how to connect to the ZooKeeper cluster.
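As an illustration of what the ensemble configuration might look like, the sketch below prints the lines that would be added to zookeeper.properties on every ZooKeeper node. The hostnames are placeholders, the function name render_zk_ensemble is hypothetical, and the timing values (tickTime, initLimit, syncLimit) are commonly used starting points rather than tuned recommendations. Ports 2888 and 3888 are the conventional ZooKeeper peer and leader-election ports.

```shell
# Print the ensemble section of zookeeper.properties for the given hosts.
# Server IDs are assigned in argument order, starting from 1.
# (render_zk_ensemble is a hypothetical helper used for illustration.)
render_zk_ensemble() {
  echo "tickTime=2000"
  echo "initLimit=10"
  echo "syncLimit=5"
  id=1
  for host in "$@"; do
    # 2888: follower-to-leader connections, 3888: leader election.
    echo "server.${id}=${host}:2888:3888"
    id=$((id + 1))
  done
}

# Placeholder hostnames; replace them with your own ZooKeeper nodes.
render_zk_ensemble zk1.example.internal zk2.example.internal zk3.example.internal
```

In addition to these shared lines, each node needs a file named myid inside its ZooKeeper dataDir that contains only that node's own ID (1, 2, or 3 here), which is how a node recognizes its entry in the server list.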
Kafka Configuration
Unique Broker ID: Each Kafka broker must be assigned a unique ID (change “broker.id” in server.properties).
Network Configuration: Configure server properties to include listeners and advertised listeners for broker communication.
Replication Factor: Set the appropriate replication factor in Kafka settings to ensure that copies of each partition are stored on multiple brokers. This replication is key to Kafka’s fault tolerance.
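The three settings above can be sketched as a per-broker fragment of server.properties. The example below is only an illustration under stated assumptions: the hostnames and the function name render_broker_props are hypothetical, the PLAINTEXT listener on port 9092 mirrors the single-node setup from this guide, and the replication values (3 copies, at least 2 in sync) are a common choice for a three-broker cluster rather than a requirement.

```shell
# Print the cluster-related server.properties entries for one broker.
# Arguments: broker ID, hostname clients should use, ZooKeeper connect string.
# (render_broker_props is a hypothetical helper used for illustration.)
render_broker_props() {
  broker_id=$1 host=$2 zk_connect=$3
  cat <<EOF
broker.id=${broker_id}
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://${host}:9092
zookeeper.connect=${zk_connect}
default.replication.factor=3
min.insync.replicas=2
EOF
}

# Placeholder hostnames; run once per broker with a different ID and host.
render_broker_props 1 kafka1.example.internal \
  "zk1.example.internal:2181,zk2.example.internal:2181,zk3.example.internal:2181"
```

The advertised listener is the address that clients and other brokers are told to connect to, so it has to be a name that resolves from their side of the network, not just from the broker itself.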
Starting the Services
Start the ZooKeeper ensemble first, ensuring all nodes in the ensemble are up and communicating.
Launch the Kafka brokers across all nodes. Check the logs to ensure that each broker has joined the cluster and is functioning correctly.
CMAK (Cluster Manager for Apache Kafka, previously known as Kafka Manager) is a web-based management tool for Apache Kafka clusters. It provides a user-friendly interface for monitoring cluster health and performance, managing topics, and configuring multiple Kafka clusters.
CMAK will simplify complex administrative tasks, making it easier for users to maintain and optimize their Kafka environments.
To install CMAK, you first need to install sbt, a build tool for Scala projects such as CMAK.
echo "deb https://repo.scala-sbt.org/scalasbt/debian all main" | sudo tee /etc/apt/sources.list.d/sbt.list
echo "deb https://repo.scala-sbt.org/scalasbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x2EE0EA64E40A89B84B2DF73499E82A75642AC823" | sudo apt-key add
sudo apt update
sudo apt install sbt
Then clone the latest version of CMAK:
git clone https://github.com/yahoo/CMAK.git
cd CMAK
Use sbt to build CMAK:
sbt clean dist
This command compiles the application and packages it into a zip file under the target/universal/ directory.
Install unzip to be able to extract the file:
sudo apt install unzip
Once the build process is complete, extract the generated ZIP file:
cd target/universal/
unzip cmak-VERSION.zip
mv cmak-VERSION cmak
Replace VERSION with the version you built.
Now, set the ZooKeeper host and port.
Open ~/CMAK/target/universal/cmak/conf/application.conf and change the zkhosts property.
To be able to run cmak, you also need to set the JAVA_OPTS variable:
export JAVA_OPTS="-Dconfig.file=/home/kafka/CMAK/target/universal/cmak/conf/application.conf -Dhttp.port=9000"
Then, move to the ~/CMAK/target/universal/cmak directory and start CMAK:
./bin/cmak
In your browser, go to yourhost:9000, and make sure your firewall rules allow access to that port.
Then, register your cluster: click Add Cluster and enter your ZooKeeper host.
CMAK is now ready: you can manage your brokers, topics, partitions, and much more. To learn more, please refer to the CMAK documentation.