A message broker is software that acts as an intermediary for sending messages between different applications. It functions like a server that receives messages from one application (called the producer) and routes them to one or more other applications (called consumers). The main purpose of a message broker is to ensure robust and reliable communication between different systems without requiring these systems to always be available or directly connected. This allows applications to work asynchronously, providing fault tolerance and the ability to operate in real time. A message broker can accept messages from multiple sources and efficiently route them to the appropriate receiver. Depending on the required business logic, messages can be grouped into topics or queues.
There are many different message brokers, each with its own features and advantages. In this article, we'll focus on Kafka.
Apache Kafka is a fast and scalable message broker capable of handling millions of messages per second. It is particularly valued for its fault tolerance and ability to store data for extended periods. Originally developed at LinkedIn, Kafka is now one of the most popular open-source solutions in the message broker space; it is maintained under the Apache Software Foundation and released under the Apache License 2.0. It is widely used to build real-time data pipelines and streaming applications. Moving and processing data streams between systems or applications is a critical task, and Kafka excels in helping users handle data streams in real time with minimal latency. As a distributed system, Kafka runs across multiple servers, which store and process data streams in parallel. This distribution allows Kafka to provide real-time data processing for many different sources, ensuring reliability and resilience against system failures.
In this article, we will explore how to install and configure Kafka on Windows, Ubuntu, and macOS, so that you can take full advantage of it for your projects.
Apache Kafka is designed to maximize the efficiency of the hardware it runs on. However, there are some general recommendations to keep in mind when setting up a system to work with this broker:
Processor (CPU): Kafka usually doesn't require a lot of processing power, since it relies on sequential disk I/O and zero-copy transfers that move data from the page cache to the network without passing through application memory. However, the number of CPU cores can impact throughput.
Memory (RAM): Having at least 8GB of RAM is recommended, but the final amount will depend heavily on the data load and the number of parallel operations.
Disk Space: Kafka efficiently uses the file system and direct disk writing. It is preferable to use an SSD with high read/write speeds. It's also recommended that a separate disk be used to isolate its operations from other processes.
Network: Kafka actively uses the network for data transmission. A stable connection with high bandwidth is recommended.
Operating System: Apache Kafka generally runs on Unix-like systems such as Linux, but it does not restrict users from choosing other operating systems.
Java: Since Kafka is written in Java, you will need the Java Development Kit (JDK) version 8 or higher.
While Linux gives Kafka a key advantage in performance and scalability, the broker works well on both Windows and macOS. We'll discuss the pros and cons of each solution a bit later, but for now, let's proceed with the installation.
The process of Kafka installation on Windows is straightforward but requires some care. Here's a step-by-step guide:
Download and Install Java Development Kit (JDK): Apache Kafka runs on Java, so the first step is to install the development tools if they are not already installed. You can download the JDK from Oracle's official website. After installation, verify its functionality by entering the following command in the command prompt (cmd):
java -version
Download Apache Kafka: You can download Apache Kafka for Windows from the project's official website (look for Binary downloads). It is recommended that you choose the latest stable version (at the time of writing, this is 3.7.0). However, the installation process does not vary significantly between versions, so you can apply this guide to other versions as well.
Unpacking: After downloading the archive, unpack it and move it to a convenient location. After unpacking the distribution, you will see various folders and files, such as:
bin: This folder contains executable files used to start and manage the distributed messaging system. The /windows subfolder contains special versions of files intended for use on Windows OS.
config: This folder contains Kafka configuration files, including zookeeper.properties and server.properties, which can be edited for more precise setup.
libs: This folder contains all the libraries needed to run Kafka.
logs: This folder (created after the first start) contains work logs, which can be useful for troubleshooting issues and finding dependencies between components.
site-docs: This folder contains documentation for the Kafka version you installed, which can be helpful for beginners.
LICENSE and NOTICE: These files contain the license agreement and legal notices.
Basic Configuration of Data and Logging Directories: By default, log files and the data directory are saved in the /tmp folder, which is not meant for persistent data and can lead to performance, security, and data management issues. It is recommended to change the default paths to custom ones:
Navigate to /config/server.properties and open the file in any text editor (e.g., VSCode).
Find the log.dirs field (you can use the search function by pressing Ctrl+F). Despite its name, this is where Kafka stores topic data, not only diagnostic output.
Change the default path /tmp/kafka-logs to a permanent path, e.g., c:/kafka/kafka-logs. Save the file and close it.
Perform similar actions for the Zookeeper data directory:
Navigate to /config/zookeeper.properties and open the file in any text editor.
In the dataDir parameter, change the default path to a custom one. An example of a permanent path is shown in the screenshot below.
The basic setup is now complete. This is enough to start the Zookeeper and Kafka servers and verify that the system is working.
Starting Zookeeper and Kafka Servers: To start, navigate to the folder with the unpacked archive and open the command prompt. To start Zookeeper, use the following command:
.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties
Note that our Zookeeper is running on port 2181, which is the default port for this service.
If you encounter the error "The input line is too long. The syntax of the command is incorrect", move the Kafka folder closer to the disk's root. During startup, zookeeper-server-start.bat builds a classpath variable that includes the full path of every library, so a long installation path can overflow it: the cmd.exe environment supports command lines of no more than 8191 characters.
Open a new terminal window to start the Kafka server and use the following command:
.\bin\windows\kafka-server-start.bat .\config\server.properties
Verifying Functionality: To verify that everything is working, try creating a topic using the following command:
.\bin\windows\kafka-topics.bat --create --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1 --topic TestTopic
Note that port 9092 is the default port of the Kafka broker, which the --bootstrap-server option connects to; Zookeeper itself listens on port 2181.
To see more than one entry in the list, create another topic called NewTopic in the same way. Now check what topics exist with the following command:
.\bin\windows\kafka-topics.bat --list --bootstrap-server localhost:9092
You can interact with the topic in a new command prompt by creating and reading several messages. To do this, enter the following command in a new window:
.\bin\windows\kafka-console-producer.bat --bootstrap-server localhost:9092 --topic TestTopic
After the command starts, you can send any messages.
To start receiving messages, enter the following command in a new console window:
.\bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic TestTopic --from-beginning
As shown in the screenshot, the consumer received the messages sent to TestTopic.
This is a simple functionality that helps you understand and familiarize yourself with Kafka tools. However, things may go wrong during installation and testing. Here are some common issues that may arise:
JDK Issues: Ensure you have installed the JDK, not just the JRE. Also, make sure the version is 8 or higher.
Environment Variable Check: After installing Java, ensure that the JAVA_HOME variable is set correctly and that the path to the bin directory is included in the system path.
Firewall and Antivirus Issues: Sometimes, third-party antivirus programs or firewalls can block Kafka. If you encounter connection issues, try disabling them temporarily.
Ports: By default, Zookeeper listens on port 2181, and Kafka on 9092. Make sure these ports are free, or reassign the default ports for these services.
Starting Zookeeper Before Kafka: Before starting Kafka, make sure Zookeeper is already running. If not, start Zookeeper.
Improper Kafka Shutdown: If Kafka shuts down improperly, some data may still be left in the temporary folder. If you start encountering difficulties during startup, try clearing the temporary files.
The steps for installing Kafka on Ubuntu are quite similar to those for other Linux distributions. The main differences lie in each operating system's package managers and minor specifics. These steps also resemble the installation process for Windows, so you can refer to that section even if you're using Linux.
As mentioned earlier, Apache Kafka runs on Java, so the first step is to install the JDK. However, before doing so, it's recommended to update your package list and upgrade the package versions with the following commands:
sudo apt update
sudo apt upgrade
In Linux systems, the installation can be easily done via the terminal by entering the following commands:
sudo apt install default-jre
sudo apt install default-jdk
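After installation, it is worth confirming that the installed Java meets the version requirement. The following is a minimal sketch rather than an official check: the java_major function name is our own, and it assumes the usual version "..." format on the first line of the java -version output.

```shell
# java_major TEXT: print the major Java version found in `java -version` output.
# Hypothetical helper; assumes a line such as: openjdk version "17.0.10" ...
java_major() {
  v=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$v" in
    1.*) printf '%s\n' "$v" | cut -d. -f2 ;;  # legacy scheme: 1.8.0_392 -> 8
    *)   printf '%s\n' "$v" | cut -d. -f1 ;;  # modern scheme: 17.0.10 -> 17
  esac
}

# java -version writes to stderr, so redirect it before capturing.
if command -v java >/dev/null 2>&1; then
  major=$(java_major "$(java -version 2>&1)")
  echo "Detected Java major version: $major"
fi
```

If the printed number is below 8, install a newer JDK before continuing.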
You can download Apache Kafka from the official website. Select the latest stable version of the product. Use the wget utility from the console to download it:
wget https://downloads.apache.org/kafka/3.7.0/kafka_2.13-3.7.0.tgz
To extract the downloaded file, use this command:
tar xzf kafka_2.13-3.7.0.tgz
Note that the version of Kafka might be different when you read this, so the command, particularly the numbers in the link, might look different. After the above steps, you should have a folder with the product next to the archive. Navigate to this folder using:
cd kafka_2.13-3.7.0
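Before starting the servers, you may want to move the Kafka and Zookeeper data directories out of /tmp, as described in the Windows section. The sketch below shows one way to script that edit; set_prop is a hypothetical helper of our own, and the /var/lib/kafka paths are placeholders, not values required by Kafka.

```shell
# set_prop FILE KEY VALUE: set KEY=VALUE in a .properties file (hypothetical helper).
# Replaces an existing uncommented KEY line, or appends one if none is found.
# Note: KEY is treated as a regular expression, and VALUE must not contain
# the characters '|', '&', or '\'.
set_prop() {
  file=$1; key=$2; value=$3
  if grep -q "^${key}=" "$file"; then
    # Use | as the sed delimiter so that slashes in paths need no escaping.
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Placeholder paths; pick directories that the user running Kafka can write to.
if [ -f config/server.properties ] && [ -f config/zookeeper.properties ]; then
  set_prop config/server.properties log.dirs /var/lib/kafka/kafka-logs
  set_prop config/zookeeper.properties dataDir /var/lib/kafka/zookeeper-data
fi
```

Editing the two files by hand in a text editor achieves the same result; the script is only convenient when the setup has to be repeated.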
The remaining steps are similar to what was done for Windows, so it's recommended to read the instructions starting from the third point in that section. To start Zookeeper, enter the following command:
bin/zookeeper-server-start.sh config/zookeeper.properties
Then, in a new terminal window, start Kafka:
bin/kafka-server-start.sh config/server.properties
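To check that the broker responds, the Windows verification commands can be repeated with the .sh scripts from the bin folder. The following is a rough sketch: the run wrapper, the smoke_test function, and the DRY_RUN variable are our own conventions and not part of Kafka, while the kafka-topics.sh options mirror those used in the Windows section.

```shell
# run CMD...: print the command when DRY_RUN is set, otherwise execute it.
# Hypothetical helper that lets you preview commands without a running broker.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# smoke_test [TOPIC] [SERVER]: create a topic, then list all topics.
# Must be called from the Kafka folder with Zookeeper and Kafka running.
smoke_test() {
  topic=${1:-TestTopic}
  server=${2:-localhost:9092}
  run bin/kafka-topics.sh --create --bootstrap-server "$server" \
    --partitions 1 --replication-factor 1 --topic "$topic" &&
    run bin/kafka-topics.sh --list --bootstrap-server "$server"
}
```

Setting DRY_RUN=1 before calling smoke_test only prints the two commands; calling smoke_test without it runs them against the broker.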
This covers the basic installation and configuration. You can configure various parameters for production environments such as multiple backups, network configuration, data partitioning, and more, but this process is more labor-intensive and complex.
Here are some common issues that may arise on Linux:
Permission Issues: Sometimes, permission problems arise in Linux when accessing certain files or directories. To work around this, you can use sudo before the commands that cause issues. However, be cautious, as sudo gives full admin access, which might lead to security issues.
Java Memory Errors: If you encounter Java memory problems while working with Kafka, try increasing the maximum memory allocated for the JVM using the -Xmx flag. The heap settings are passed through the KAFKA_HEAP_OPTS variable, which you can export before starting the broker or edit in the bin/kafka-server-start.sh script. However, ensure that you leave enough memory for other processes on the system. Increasing the maximum JVM memory can slow down the system if the JVM starts using all available resources.
Version Management: Version issues can arise when working with Linux. Always check the version of Kafka and all related tools, such as Zookeeper, to ensure compatibility.
Proper Shutdown of Kafka and Zookeeper: To shut down Kafka and Zookeeper on Linux, you can use the following commands, stopping Kafka first:
bin/kafka-server-stop.sh
bin/zookeeper-server-stop.sh
It's recommended that these services are always stopped properly to avoid data loss.
Logging Issues: Kafka generates a large number of logs. Ensure you have enough disk space and that log rotation is enabled.
Port and File Limits: Make sure you have permission to open the necessary number of files or sockets. Linux has system limits that can be adjusted if needed.
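The open-file limit mentioned in the last item can be inspected with the shell's built-in ulimit command. The snippet below is a small sketch, not an official tuning recipe: check_nofile is a hypothetical helper, and the default threshold of 100000 is an assumption you should adjust to your own workload.

```shell
# check_nofile [MIN]: compare this shell's open-file limit with a chosen minimum.
# Hypothetical helper; the default of 100000 is an assumed starting point.
check_nofile() {
  want=${1:-100000}
  have=$(ulimit -n)
  if [ "$have" = "unlimited" ] || [ "$have" -ge "$want" ]; then
    echo "ok: open-file limit is $have"
  else
    echo "low: open-file limit is $have, wanted at least $want"
    return 1
  fi
}

# Report the limit but keep going even when it is low.
check_nofile || echo "Consider raising the limit before running Kafka under load."
```

The limit applies to the shell that starts Kafka, so run the check in the same session or service environment that launches the broker.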
Homebrew is a package manager that simplifies software installation on macOS. Homebrew doesn't require admin rights to install software, making it convenient and reducing security risks. If you don't have Homebrew installed, you can install it by entering the following command in the terminal:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
In this case, you will need Homebrew to install Kafka and its dependencies.
If you already have Homebrew installed, it's a good idea to update it to the latest version with:
brew update
To install JDK, you can use the Homebrew we just installed. Enter the following command in the terminal:
brew install openjdk
Install Kafka with the following command:
brew install kafka
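First, start Zookeeper, then Kafka. The commands below assume Homebrew's /usr/local prefix; replace /usr/local/etc/kafka with the actual path to the configuration files if they are located elsewhere (for example, under /opt/homebrew on Apple Silicon Macs):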
zookeeper-server-start /usr/local/etc/kafka/zookeeper.properties
kafka-server-start /usr/local/etc/kafka/server.properties
For simplicity, we launched Zookeeper and Kafka in standalone mode on a local machine. To create a full-fledged distributed network on multiple machines, adjust the configuration files accordingly. Key parameters to modify include:
Partitions: These allow parallel data processing. The number of partitions determines how many consumers in a group can read from a topic simultaneously.
Replicas: Copies of existing partitions ensure fault tolerance. The number of replicas determines how many copies of each partition will be stored in the cluster.
Broker Information: A complete list of all servers that will participate in the cluster.
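As a rough illustration of these parameters, the sketch below prints per-broker settings that could be merged into server.properties. The property names (broker.id, zookeeper.connect, num.partitions, default.replication.factor) are standard broker settings, but the broker_props function, the host names zk1 to zk3, and the value 3 are placeholders chosen for this example.

```shell
# broker_props ID ZK_CONNECT: print per-broker settings for server.properties.
# Hypothetical helper; the partition and replica counts are example values.
broker_props() {
  cat <<EOF
broker.id=$1
zookeeper.connect=$2
num.partitions=3
default.replication.factor=3
EOF
}

# Example: settings for the second broker of a three-node cluster
# (zk1, zk2, and zk3 are placeholder host names).
broker_props 2 zk1:2181,zk2:2181,zk3:2181
```

Each broker needs its own broker.id, while zookeeper.connect is the same on every node. The partition and replica counts here are only defaults for new topics; they can also be set per topic with the --partitions and --replication-factor options.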
Here are some common issues that may arise on macOS:
JDK Version: Ensure that JDK version 8 or higher is installed. If not, you might encounter an error when trying to launch Kafka.
Environment Variables: Kafka may not work if environment variables are incorrectly set or not set at all. For instance, some setups and scripts expect the KAFKA_HOME environment variable to point to the Kafka installation directory. Other environment variables like JAVA_HOME might also be necessary for proper operation.
File Paths and Permissions: Kafka might not find the necessary files or fail to start if it doesn't have read and write permissions for certain directories. You might need to change permissions or move some files.
Homebrew Issues: Ensure Homebrew is correctly installed and updated to the latest version. Sometimes, installation via Homebrew can lead to version conflicts or dependency issues.
Dependency Issues: The system requires Zookeeper to function. Always start Zookeeper before Kafka.
Ports: Kafka and Zookeeper use specific ports (9092 and 2181, respectively) by default. If other applications use these ports, Kafka won't be able to start.
Configuration: Errors in Kafka configuration files or incorrectly set parameters can cause issues when attempting to start Kafka.
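To find out whether the default ports are already taken, you can filter a socket listing for them. This is a minimal sketch under stated assumptions: port_listed is a hypothetical helper, and it assumes that the fourth column of the listing holds the local address, as it typically does in netstat -an output.

```shell
# port_listed PORT: read a socket listing on stdin and succeed if any local
# address (assumed to be the fourth column) ends in :PORT or .PORT.
port_listed() {
  awk -v re="[.:]$1\$" '$4 ~ re { found = 1 } END { exit !found }'
}

# Check the default Zookeeper (2181) and Kafka (9092) ports.
for port in 2181 9092; do
  if netstat -an 2>/dev/null | port_listed "$port"; then
    echo "port $port appears to be in use"
  else
    echo "port $port looks free"
  fi
done
```

If a port is reported as in use before you have started Zookeeper or Kafka, stop the conflicting application or reassign the port in the corresponding configuration file.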
Docker is a platform for developing, delivering, and running applications in containers. Containers allow you to package an application with all its environment and dependencies into a single package that can be easily distributed and installed on any system. Installing Kafka in Docker is a great way to quickly and easily start working with the system. Here are some simple steps for installation:
Download Docker from the official website in a way that suits your OS.
Use this command to start a Kafka instance:
docker run -p 9092:9092 apache/kafka:3.7.0
Note that your Kafka version may differ from the one in the example.
You can verify that Kafka works with the same topic, producer, and consumer commands used in the Windows section, using the .sh scripts instead of the .bat ones.
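For a containerized broker, the check can be run through docker exec. The sketch below makes several assumptions: the container name broker is our own choice, the scripts are assumed to live under /opt/kafka/bin inside the apache/kafka image, and the run wrapper with its DRY_RUN variable is a hypothetical helper rather than part of Docker or Kafka.

```shell
# run CMD...: print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# start_broker [NAME]: start Kafka in the background under a fixed container name.
start_broker() {
  run docker run -d --name "${1:-broker}" -p 9092:9092 apache/kafka:3.7.0
}

# list_topics [NAME]: list topics from inside the container.
# The broker needs a few seconds after start_broker before it answers.
list_topics() {
  run docker exec "${1:-broker}" /opt/kafka/bin/kafka-topics.sh \
    --list --bootstrap-server localhost:9092
}
```

Calling start_broker and then, a few seconds later, list_topics should return without errors on a fresh container; with DRY_RUN=1 set, both functions only print the commands they would run.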
As we've established, Kafka can be installed on all major operating systems, as well as in Docker. Depending on the specific situation and needs, each option has its own advantages and disadvantages. If you're deciding which OS to use for deploying Kafka, here are the pros and cons of each system:
Pros of Windows:
Ease of Use: Windows remains one of the most popular operating systems with extensive documentation and community support.
Integration: It integrates very well with other Microsoft products and services.
Cons of Windows:
Windows is not always the best choice for deploying server applications; you may encounter compatibility and performance issues.
While PowerShell and WSL (Windows Subsystem for Linux) can simplify operations, these systems may not always be optimal for working with Linux applications.
Kafka and Zookeeper are usually tested and used on Unix-like systems, so bugs and issues are more likely to appear on Windows.
Pros of macOS:
Simple Installation: Installation is straightforward with minimal difficulties.
User-Friendly Tools: Convenient tools for installing and managing software.
Unix-Based System: Makes it easier to work with most tools.
Cons of macOS:
Resource-Intensive: If your Mac lacks sufficient resources, it may slow down operations.
Compatibility Issues: Possible compatibility issues between macOS versions and Kafka could lead to critical errors.
Pros of Linux:
Open Source Support: Since Linux is open-source and supported by a large community, there are almost always ways to solve any problem.
Efficient Resource Use: Linux consumes fewer system resources, making it more efficient for running Kafka.
Preferred for Server Applications: Linux-based operating systems are often the preferred choice for server applications.
Cons of Linux:
Technical Skills Required: More technical skills are needed for setup and management compared to Windows and macOS.
GUI Installation Challenges: There may be difficulties when installing and configuring a GUI.
Pros of Docker:
Portability: Docker containers can run on any operating system, simplifying broker deployment in various environments.
Isolation: Docker provides isolation between applications, meaning Kafka's operation won't affect other applications.
Reproducibility: Docker allows you to create configurations that are easy to replicate, simplifying updates and deployments.
Integration with Other Tools: Docker interacts well with popular solutions, simplifying Kafka container management and scaling.
Cons of Docker:
Complexity: Docker adds an extra layer of complexity to the broker installation.
Data Management: The broker stores all messages on disk, and managing this in a containerized environment can be challenging.
Performance: As with any containerized system, the broker's performance may be limited by the container's resources, requiring fine-tuning of Docker.
Management: Managing and monitoring a broker in a container can be complex, especially in large systems. You may need automation tools like Kubernetes and Prometheus.
Overall, Linux is the most common choice for working with Apache Kafka, especially for servers and workstations. However, the choice of operating system will depend directly on your preferences and requirements.
We've covered the process of installing Kafka on different operating systems, but this process can be time-consuming due to potential errors. If you want to avoid the hassle of installation and configuration, consider our solution.
Hostman offers a flexible and scalable cloud solution for launching a Kafka instance in just a few minutes. You don't need to install or configure any software; just select a region and configuration.
Hostman ensures stability and performance for your Kafka project, thanks to professional support and high-performance infrastructure. This allows you to fully focus on developing and scaling your project without worrying about the technical side of things.
Try Hostman today and discover the benefits of working with reliable and high-performance cloud hosting.
In this guide, we have covered how to install Kafka on Windows, Ubuntu, and macOS, as well as in Docker.
Apache Kafka is a robust, reliable, and scalable message broker that offers high throughput, fault tolerance, and low latency. Here are some reasons why Kafka is a great choice for a messaging environment:
High Throughput: Apache Kafka can handle millions of messages per second, making it an excellent choice for applications that process large volumes of real-time data.
Fault Tolerance: Kafka provides recovery from failures and ensures high data availability through its replication mechanisms.
Scalability: Kafka can easily scale by adding more nodes to the cluster without disrupting the service.
Long-Term Data Storage: Unlike most other message brokers, Kafka supports long-term data storage. You can configure the retention period in Kafka, and the data will be stored until it expires.
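Distributed System: Kafka is inherently a distributed system: topics are split into partitions that are spread across brokers, so messages can be consumed in parallel by multiple consumers. Ordering is guaranteed only within a single partition.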
Integration with Multiple Systems: Kafka can be easily integrated with various systems, such as Hadoop, Spark, Storm, Flink, and many others.
Fast Processing: Apache Kafka provides low latency, making it an excellent choice for applications requiring real-time data processing.
Publish-Subscribe Topology: Kafka allows data sources to send messages to topics, and recipient applications to subscribe to topics of interest.
All these advantages make Kafka one of the most popular and reliable message brokers on the market.