
How to Install Apache Kafka on Ubuntu 22.04: A Step-by-Step Tutorial

22.05.2024
Reading time: 8 min
Hostman Team
Technical writer

Apache Kafka is a distributed streaming platform designed for building real-time data pipelines and applications. It provides a scalable, fault-tolerant infrastructure to handle streams of data across various applications. It excels in handling high-throughput, fault-tolerant, and publish-subscribe messaging, making it a popular choice for developers looking to implement real-time analytics and event-driven systems.

This step-by-step guide explains how to install Apache Kafka on Ubuntu 22.04.

Prerequisites

  • A cloud server with Ubuntu 22.04 installed

  • A non-root user with sudo privileges

  • At least 4 GB of RAM

Step 1: Creating a user for Kafka 

The first step is to create a dedicated user to ensure that Kafka's operations do not interfere with the system's other functionalities.

Add a new user called kafka:

sudo adduser kafka


Next, add the kafka user to the sudo group so it has the privileges needed for the installation:

sudo adduser kafka sudo

Then, log in to the kafka account:

su -l kafka

The kafka user is now ready to use.

Step 2: Installing the Java Development Kit (JDK)

Apache Kafka is written in Java and Scala, which means a Java Runtime Environment (JRE) is required to run it. However, for a complete development setup that may involve custom Kafka clients or plugins, the full Java Development Kit (JDK) is recommended.


Open the terminal and update the package index:

sudo apt update

Install the OpenJDK 11 package:

sudo apt install openjdk-11-jdk

Now that you’ve installed the JDK, you can start downloading Kafka.

Step 3: Downloading Kafka

You can download Kafka 3.4.0 from the Apache archive and extract it into a folder.

Start by creating a folder named downloads to store the archive:

mkdir ~/downloads
cd ~/downloads
wget https://archive.apache.org/dist/kafka/3.4.0/kafka_2.12-3.4.0.tgz

Then, move to ~ and extract the archive you downloaded:

cd ~
tar -xvzf ~/downloads/kafka_2.12-3.4.0.tgz

Let’s rename the directory kafka_2.12-3.4.0 to kafka:

mv kafka_2.12-3.4.0/ kafka/

Now that you’ve downloaded Kafka, you can start configuring your Kafka server.

Step 4: Configuring the Kafka server

First, set the log.dirs property to change the directory where Kafka stores its logs.

To do so, you need to edit the server.properties file:

nano ~/kafka/config/server.properties

Look for log.dirs and set the value to /home/kafka/kafka-logs.

You can also set num.partitions to 3 so that topics created without an explicit partition count get 3 partitions by default.
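If you prefer to script these edits instead of using nano, the same two changes can be made with sed. The snippet below is a sketch that demonstrates them on a throwaway copy (the default values shown are assumptions based on a stock config); point the path at ~/kafka/config/server.properties to apply them for real.

```shell
# Demo copy standing in for ~/kafka/config/server.properties
conf=server.properties.demo
printf 'log.dirs=/tmp/kafka-logs\nnum.partitions=1\n' > "$conf"

# Point Kafka's log directory at the kafka user's home
sed -i 's|^log.dirs=.*|log.dirs=/home/kafka/kafka-logs|' "$conf"
# Make new topics default to 3 partitions
sed -i 's|^num.partitions=.*|num.partitions=3|' "$conf"

cat "$conf"
```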


Now that you’ve finished configuring your Kafka server, you can run the server.

Step 5: Starting the Kafka server

To start the Kafka server, you need to first start Zookeeper and then start Kafka.

What is Zookeeper?

Apache ZooKeeper manages coordination and configuration for distributed systems, such as Kafka. Kafka uses ZooKeeper to maintain the state between nodes in the Kafka cluster and to keep track of topics, partitions, and configurations. 

ZooKeeper ships with this release of Kafka, so there is no need to install it separately.

To start ZooKeeper and Kafka manually, run these two commands:

~/kafka/bin/zookeeper-server-start.sh ~/kafka/config/zookeeper.properties
~/kafka/bin/kafka-server-start.sh ~/kafka/config/server.properties

However, it is more convenient to create systemd unit files and manage both services with systemctl.

  • Unit File for Zookeeper:

sudo nano /etc/systemd/system/zookeeper.service

[Unit]
Description=Apache Zookeeper Service
Requires=network.target
After=network.target

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

  • Unit File for Kafka:

sudo nano /etc/systemd/system/kafka.service

[Unit]
Description=Apache Kafka Service
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

Then, reload systemd so it picks up the new unit files and start Kafka (ZooKeeper starts automatically thanks to the Requires=zookeeper.service dependency):

sudo systemctl daemon-reload
sudo systemctl start kafka

Check the status:

sudo systemctl status kafka
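Since both services are now managed by systemd, you can also enable them so they come back automatically after a reboot:

```
sudo systemctl enable zookeeper
sudo systemctl enable kafka
```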


Step 6: Testing the Kafka server

You can check whether the Kafka server is up with netcat. By default, the Kafka server listens on port 9092:

nc -vz localhost 9092


You can also check logs:

cat ~/kafka/logs/server.log


If the log shows no errors, everything is in order.

If your server is running successfully, try to create a topic:

~/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic firstTopic 

Let’s check the topics’ list:

~/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092


You can produce messages to the topic:

~/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic firstTopic

You can then read the messages:

~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic firstTopic --from-beginning
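The two console clients above are interactive. For a quick non-interactive check, you can pipe a single message through the topic; this sketch assumes the broker from the previous steps is running on localhost:9092 (the consumer's --max-messages flag makes it exit after the first record):

```
echo "hello kafka" | ~/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic firstTopic
~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic firstTopic --from-beginning --max-messages 1
```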


Step 7: Setting up Kafka in production (optional)

When transitioning from a development setup to a production environment, it's crucial to consider deploying Apache Kafka as a part of a cluster rather than as a single instance. A Kafka cluster ensures better reliability, scalability, and fault tolerance. Running a cluster involves multiple Kafka servers (brokers) and, typically, several ZooKeeper instances to manage the cluster's state.

Here’s an overview of the process for establishing a robust multi-node Kafka environment.

Overview of Setting Up a Multi-Node Kafka Cluster

  1. Infrastructure Preparation

    • Nodes: Prepare multiple servers (physical or virtual) running Ubuntu 22.04, with at least three brokers in production environments to ensure fault tolerance. Each server acts as a Kafka broker.

    • Networking: Ensure all nodes can communicate with each other.

  2. Consistent Software Installation

    • Install Java on all brokers.

    • Install Kafka on each node following the same steps used above, ensuring consistency across all installations.

  3. ZooKeeper Setup

    • Cluster Configuration: Although a single ZooKeeper instance can manage a small Kafka cluster, a ZooKeeper ensemble (cluster) is recommended for production. Typically, this consists of an odd number of servers (at least three) to avoid split-brain scenarios and to ensure high availability and failover capabilities.

    • Configure each ZooKeeper node with a unique identifier and set up the ensemble so that each Kafka node knows how to connect to the ZooKeeper cluster.

  4. Kafka Configuration

    • Unique Broker ID: Each Kafka broker must be assigned a unique ID (change “broker.id” in server.properties).

    • Network Configuration: Configure server properties to include listeners and advertised listeners for broker communication. 

    • Replication Factor: Set the appropriate replication factor in Kafka settings to ensure that copies of each partition are stored on multiple brokers. This replication is key to Kafka’s fault tolerance.

  5. Starting the Services

    • Start the ZooKeeper ensemble first, ensuring all nodes in the ensemble are up and communicating.

    • Launch the Kafka brokers across all nodes. Check the logs to ensure that each broker has joined the cluster and is functioning correctly.
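To make step 4 above concrete, here is a sketch of the per-broker overrides that a second broker in a three-node cluster might carry in its server.properties. The hostnames (kafka2.example.com, zk1–zk3.example.com) are placeholders for your own machines, and the snippet writes a local example file rather than a live config:

```shell
# Write an example override file for broker 2 (hostnames are placeholders)
cat > server-broker2.properties <<'EOF'
# Must be unique for each broker in the cluster
broker.id=2
# Where this broker accepts connections
listeners=PLAINTEXT://0.0.0.0:9092
# Address that clients and other brokers use to reach this broker
advertised.listeners=PLAINTEXT://kafka2.example.com:9092
# The ZooKeeper ensemble
zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
# Keep a copy of every partition on three brokers
default.replication.factor=3
min.insync.replicas=2
EOF

cat server-broker2.properties
```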

Step 8: Installing CMAK (optional)

CMAK (Cluster Manager for Apache Kafka, previously known as Kafka Manager) is a web-based management tool for Apache Kafka clusters. It provides a user-friendly interface for monitoring cluster health and performance, managing topics, and configuring multiple Kafka clusters. 

CMAK will simplify complex administrative tasks, making it easier for users to maintain and optimize their Kafka environments.

To install CMAK, you first need sbt, the build tool for Scala projects such as CMAK.

echo "deb https://repo.scala-sbt.org/scalasbt/debian all main" | sudo tee /etc/apt/sources.list.d/sbt.list
echo "deb https://repo.scala-sbt.org/scalasbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x2EE0EA64E40A89B84B2DF73499E82A75642AC823" | sudo apt-key add -
sudo apt update
sudo apt install sbt

Then clone the latest version of CMAK:

git clone https://github.com/yahoo/CMAK.git
cd CMAK

Use sbt to build CMAK:

sbt clean dist

This command compiles the application and packages it into a zip file under the target/universal/ directory.

Install unzip to be able to extract the file:

sudo apt install unzip

Once the build process is complete, extract the generated ZIP file:

cd target/universal/
unzip cmak-VERSION.zip
mv cmak-VERSION cmak

Change VERSION to the one that you have.

Now, we need to point CMAK at ZooKeeper by setting its host and port.

Open ~/CMAK/target/universal/cmak/conf/application.conf and change the zkhosts property.


To be able to run CMAK, we need to set the JAVA_OPTS variable:

export JAVA_OPTS="-Dconfig.file=/home/kafka/CMAK/target/universal/cmak/conf/application.conf -Dhttp.port=9000"

Then, move to ~/CMAK/target/universal/cmak directory and start CMAK:

./bin/cmak


Go to your browser and open yourhost:9000; make sure your firewall rules allow access to this port.


Then, add your cluster by adding your zookeeper host. Click Add Cluster:


Then add your host:


Your CMAK instance is now ready; you can manage brokers, topics, partitions, and much more. To learn more, please refer to the documentation.


