Preface
Before starting, note that installing a Storm cluster and installing a Kafka cluster are not necessarily connected. I cover them together because both are coordinated by ZooKeeper and depend on the JDK environment, and writing them together avoids repeating the shared configuration. If you only need one of them, just read that part. Without further ado, let's walk through the details together.
The dependencies of these two are as follows:
Note: Storm 1.0 and Kafka require at least JDK 1.7 and ZooKeeper 3.0 or above.
Download address:
JDK installation
The JDK must be installed on every machine!
Note: CentOS usually ships with OpenJDK, but here we use Oracle's JDK, so uninstall OpenJDK first and then install the JDK downloaded from Oracle. If you have already uninstalled it, you can skip this step.
First enter java -version
to check whether a JDK is already installed. If one is installed but the version is not suitable, uninstall it.
Enter:
rpm -qa | grep java
to view the installed Java package information.
Then enter:
rpm -e --nodeps "You want to uninstall JDK information"
For example: rpm -e --nodeps java-1.7.0-openjdk-1.7.0.99-2.6.5.1.el6.x86_64
After confirming that it is gone, unzip the downloaded JDK
tar -xvf jdk-8u144-linux-x64.tar.gz
Move it to the /opt/java folder (create the directory if it does not exist), and rename the folder to jdk1.8:
mv jdk1.8.0_144 /opt/java
cd /opt/java
mv jdk1.8.0_144 jdk1.8
Then edit the profile file and add the following configuration
enter:
vim /etc/profile
Add to:
export JAVA_HOME=/opt/java/jdk1.8
export JRE_HOME=/opt/java/jdk1.8/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=.:${JAVA_HOME}/bin:$PATH
After adding these lines successfully, enter:
source /etc/profile
java -version
to check whether the configuration succeeded.
Zookeeper environment installation
1. File preparation
Unzip the downloaded ZooKeeper package
Enter on linux:
tar -xvf zookeeper-3.4.10.tar.gz
Then move it to /opt/zookeeper (create the directory if it doesn't exist), and rename the folder to zookeeper3.4.
enter
mv zookeeper-3.4.10 /opt/zookeeper
cd /opt/zookeeper
mv zookeeper-3.4.10 zookeeper3.4
2. Environment configuration
Edit /etc/profile file
Add:
export ZK_HOME=/opt/zookeeper/zookeeper3.4
export PATH=.:${JAVA_HOME}/bin:${ZK_HOME}/bin:$PATH
Then enter:
source /etc/profile
Make the configuration effective
3. Modify the configuration file
3.1 Create files and directories
Create these directories on the servers in the cluster
mkdir /opt/zookeeper/data
mkdir /opt/zookeeper/dataLog
And create myid file in /opt/zookeeper/data directory
enter:
touch myid
After creating it successfully, edit the myid file.
For convenience, I set the content of the myid file on master, slave1, and slave2 to 1, 2, and 3 respectively.
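For example, a quick way to write these values (run the appropriate line on each machine) would be:
echo 1 > /opt/zookeeper/data/myid    # on master
echo 2 > /opt/zookeeper/data/myid    # on slave1
echo 3 > /opt/zookeeper/data/myid    # on slave2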
3.2 Create zoo.cfg
Switch to /opt/zookeeper/zookeeper3.4/conf directory
If there is no zoo.cfg file, copy the zoo_sample.cfg file and rename it to zoo.cfg.
Modify this newly created zoo.cfg file
dataDir=/opt/zookeeper/data
dataLogDir=/opt/zookeeper/dataLog
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
Description: clientPort, as the name implies, is the TCP port that clients use to connect to the ZooKeeper service. dataLogDir holds the transaction log (WAL), while dataDir holds snapshots of the in-memory data structures, which enables fast recovery. To maximize performance it is generally recommended to put dataDir and dataLogDir on different disks so that sequential disk writes can be fully exploited. Both directories must be created by yourself; you can choose the paths freely as long as they match the configuration. The 1 in server.1 must match the value in the myid file under the dataDir directory on the master machine; likewise the 2 in server.2 matches the myid on slave1, and the 3 in server.3 matches the myid on slave2. You can use whatever values you like, as long as they correspond. The port numbers 2888 and 3888 can also be chosen freely, and it doesn't matter if the same ports are used on different machines.
1. tickTime: the heartbeat interval for client-server communication
The interval at which heartbeats are maintained between ZooKeeper servers, or between clients and servers; a heartbeat is sent every tickTime. tickTime is in milliseconds.
tickTime=2000
2. initLimit: the leader-follower (LF) initial connection time limit
The maximum number of heartbeats (in tickTime units) that can be tolerated during the initial connection between a follower server (F) and the leader server (L) in the cluster.
initLimit=10
3. syncLimit: the leader-follower (LF) synchronization time limit
The maximum number of heartbeats (in tickTime units) that can be tolerated between a request and a response between follower servers and the leader server in the cluster.
syncLimit=5
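Putting these settings together, a minimal zoo.cfg for this cluster might look like the following (clientPort 2181 is ZooKeeper's default and is assumed here):
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/opt/zookeeper/data
dataLogDir=/opt/zookeeper/dataLog
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888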
Next, copy the zookeeper directory to the other machines as well. Remember to change the myid under /opt/zookeeper/data on each machine; the values must not be the same.
enter:
scp -r /opt/zookeeper root@slave1:/opt
scp -r /opt/zookeeper root@slave2:/opt
4. Start zookeeper
Because ZooKeeper uses leader election, its master/slave relationship is not specified explicitly as it is in Hadoop. For details, refer to the official documentation.
After successfully configuring zookeeper, start zookeeper on each machine.
Switch to the zookeeper directory
cd /opt/zookeeper/zookeeper3.4/bin
enter:
zkServer.sh start
After successful startup
To view the status, enter:
zkServer.sh status
You can then see which machine is the ZooKeeper leader and which are the followers.
Storm environment installation
1. File preparation
Decompress the downloaded Storm package
Enter on linux:
tar -xvf apache-storm-1.1.1.tar.gz
Then move it to /opt/storm, create it if it doesn't exist, and then rename the folder to storm1.1
enter
mv apache-storm-1.1.1 /opt/storm
cd /opt/storm
mv apache-storm-1.1.1 storm1.1
2. Environment configuration
Edit the /etc/profile file
Add:
export STORM_HOME=/opt/storm/storm1.1
export PATH=.:${JAVA_HOME}/bin:${ZK_HOME}/bin:${STORM_HOME}/bin:$PATH
Then enter source /etc/profile to make the configuration take effect, and enter storm version to view the version information.
3. Modify the configuration file
Edit storm.yaml in storm/conf.
Make the following edits:
enter:
vim storm.yaml
storm.zookeeper.servers:
  - "master"
  - "slave1"
  - "slave2"
storm.local.dir: "/root/storm"
nimbus.seeds: ["master"]
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703
Explanation:
1. storm.zookeeper.servers specifies the addresses of the ZooKeeper servers.
Because Storm stores its state information in ZooKeeper, the ZooKeeper addresses must be configured. If ZooKeeper is a single standalone machine, you only need to specify one!
2. storm.local.dir is the local storage directory.
The Nimbus and Supervisor daemons need a directory on the local disk to store a small amount of state (jars, confs, and so on). Create it on each machine and grant it the appropriate permissions.
3. nimbus.seeds lists the candidate nimbus hosts.
The workers need to know which machines are candidates for the nimbus host (leadership is decided by election through the ZooKeeper cluster) so that they can download topology jars and confs.
4. supervisor.slots.ports defines the worker ports.
For each supervisor machine, this setting configures how many workers can run on that machine. Each worker uses a separate port to receive messages, so this also defines which ports are open for use. If you define 5 ports here, at most 5 workers can run on this supervisor node; if you define 3 ports, at most 3 workers can run. By default (as configured in defaults.yaml) there are four worker slots, on ports 6700, 6701, 6702, and 6703.
The supervisor does not start these workers immediately on startup; a worker is started only when an assigned task is accepted. How many workers actually start depends on how many workers the topology requests on this supervisor. If a topology is to be executed by only one worker, the supervisor starts one worker, not all of them.
Note: the top-level configuration keys must not have leading spaces, otherwise an error will be reported! Hostnames (mapped in /etc/hosts) are used here; IP addresses also work. Adjust according to your own environment.
You can use the scp command or ftp software to copy storm to other machines.
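For example, analogous to the ZooKeeper step above:
scp -r /opt/storm root@slave1:/opt
scp -r /opt/storm root@slave2:/opt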
After successful configuration, you can start Storm, but make sure that JDK and Zookeeper have been installed correctly and that Zookeeper has been successfully started.
4. Start Storm
Switch to the storm/bin directory
On the master node, start nimbus by entering:
storm nimbus >/dev/null 2>&1 &
To start the web interface (on master), enter:
storm ui
On the slave nodes (slave1, slave2), enter:
storm supervisor >/dev/null 2>&1 &
Then, in the browser, visit the master on port 8080 (for example, http://master:8080).
If the interface opens successfully, the environment is configured correctly:
Kafka environment installation
Kafka is a high-throughput distributed streaming message system for processing active streaming data, such as web page visits (PV) and logs. It can process big data both in real time and offline.
1. File preparation
Decompress the downloaded Kafka package
Enter on linux:
tar -xvf kafka_2.12-1.0.0.tgz
Then move it to /opt/kafka, create it if it doesn't exist, and then rename the folder to kafka2.12
enter
mv kafka_2.12-1.0.0 /opt/kafka
cd /opt/kafka
mv kafka_2.12-1.0.0 kafka2.12
2. Environment configuration
Edit /etc/profile file
Add:
export KAFKA_HOME=/opt/kafka/kafka2.12
export PATH=.:${JAVA_HOME}/bin:${KAFKA_HOME}/bin:${ZK_HOME}/bin:$PATH
Then enter:
source /etc/profile
Make the configuration effective
3. Modify the configuration file
Note: for a standalone (single-machine) setup, Kafka can actually be started directly from the bin directory without modifying the configuration file. But since we are building a cluster here, a few small changes are needed.
Switch to the kafka/config directory
Edit the server.properties file
What needs to be changed is Zookeeper's address:
Find the Zookeeper configuration, specify the address of the Zookeeper cluster, and modify it as follows
zookeeper.connect=master:2181,slave1:2181,slave2:2181
zookeeper.connection.timeout.ms=6000
Other options you may want to change:
1. num.partitions specifies the default number of partitions for new topics; the default is 1.
2. log.dirs is the path where Kafka stores its data logs; change it according to your own needs.
...
Note: There are other configurations, you can view the official documentation. If there are no special requirements, just use the default one.
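Putting the options above together, the relevant fragment of server.properties might look like the following (the log.dirs path here is only an example; choose your own):
# ZooKeeper cluster addresses
zookeeper.connect=master:2181,slave1:2181,slave2:2181
zookeeper.connection.timeout.ms=6000
# optional: default number of partitions for new topics
num.partitions=1
# optional: where Kafka stores its data logs (example path)
log.dirs=/opt/kafka/kafka-logs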
After configuring, remember to use the scp command to copy it to the other machines in the cluster!
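For example (assuming the same hostnames as before):
scp -r /opt/kafka root@slave1:/opt
scp -r /opt/kafka root@slave2:/opt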
4. Start kafka
Note: this must be done on every machine in the cluster!
Switch to the kafka/bin directory
enter:
kafka-server-start.sh ../config/server.properties >/dev/null 2>&1 &
Then enter jps to check whether it started successfully:
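If it started correctly, the jps output should include a Kafka process as well as the ZooKeeper process (QuorumPeerMain); for example:
jps | grep -E "Kafka|QuorumPeerMain"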
After successful startup, you can perform a simple test
Create a topic first
enter:
kafka-topics.sh --zookeeper master:2181 --create --topic t_test --partitions 5 --replication-factor 2
Description: this creates a topic named t_test with 5 partitions, each with a replication factor of 2. If --partitions is not specified, the default number of partitions from the configuration file is used.
Then produce some data
enter:
kafka-console-producer.sh --broker-list master:9092 --topic t_test
You can use Ctrl+D to exit
Then open another terminal (Xshell) window
and consume the data.
enter:
kafka-console-consumer.sh --zookeeper master:2181 --topic t_test --from-beginning
You can use Ctrl+C to exit
You can see that the data has been consumed normally.
5. Some commonly used commands of kafka
1. Start and close kafka
bin/kafka-server-start.sh config/server.properties >>/dev/null 2>&1 &
bin/kafka-server-stop.sh
2. View the topics in the Kafka cluster and inspect a specific topic
View all topics in the cluster
kafka-topics.sh --zookeeper master:2181,slave1:2181,slave2:2181 --list
View the details of a specific topic
kafka-topics.sh --zookeeper master:2181 --describe --topic t_test
3. Create a Topic
kafka-topics.sh --zookeeper master:2181 --create --topic t_test --partitions 5 --replication-factor 2
4. Production data and consumption data
kafka-console-producer.sh --broker-list master:9092 --topic t_test
Ctrl+D Exit
kafka-console-consumer.sh --zookeeper master:2181 --topic t_test --from-beginning
Ctrl+C Exit
5. Kafka's topic delete command
kafka-topics.sh --delete --zookeeper master:2181 --topic t_test
6. Add partition
kafka-topics.sh --alter --topic t_test --zookeeper master:2181 --partitions 10
Other
Storm cluster setup, official documentation: http://storm.apache.org/releases/1.1.1/Setting-up-a-Storm-cluster.html
Kafka setup, official documentation: http://kafka.apache.org/quickstart
Summary
That is all for this article. I hope it provides a useful reference for your study or work. If you have any questions, feel free to leave a message. Thank you for your support of Wulin.com.