Preface
Before starting, note that installing a Storm cluster and installing a Kafka cluster are not necessarily connected. I cover them together because both are coordinated by ZooKeeper and depend on the JDK environment, and writing them together avoids repeating the shared configuration. If you only need one of them, just read that part. Without further ado, let's walk through the details together.
The dependencies of these two are as follows:
Note: Storm 1.0 and Kafka require at least JDK 1.7 and ZooKeeper 3.0 or above.
Download address:
JDK installation
The JDK must be installed on every machine!
Note: CentOS usually ships with OpenJDK, but here we use Oracle's JDK, so uninstall OpenJDK first and then install the JDK downloaded from Oracle. If you have already uninstalled it, you can skip this step.
First enter java -version
to check whether a JDK is already installed. If one is installed but the version is not suitable, uninstall it.
Enter:
rpm -qa | grep java
to view the installed Java package information.
Then enter:
rpm -e --nodeps "You want to uninstall JDK information"
For example: rpm -e --nodeps java-1.7.0-openjdk-1.7.0.99-2.6.5.1.el6.x86_64
After confirming that it is gone, unzip the downloaded JDK
tar -xvf jdk-8u144-linux-x64.tar.gz
Move it to the /opt/java folder (create the directory if it does not exist), and rename the folder to jdk1.8:
mv jdk1.8.0_144 /opt/java
cd /opt/java
mv jdk1.8.0_144 jdk1.8
Then edit the profile file and add the following configuration
enter:
vim /etc/profile
Add to:
export JAVA_HOME=/opt/java/jdk1.8
export JRE_HOME=/opt/java/jdk1.8/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=.:${JAVA_HOME}/bin:$PATH
After adding these lines successfully, enter:
source /etc/profile
java -version
to check whether the configuration succeeded.
Zookeeper environment installation
1. File preparation
Unzip the downloaded ZooKeeper package
Enter on linux:
tar -xvf zookeeper-3.4.10.tar.gz
Then move it to /opt/zookeeper (create the directory if it doesn't exist), and rename the folder to zookeeper3.4.
enter
mv zookeeper-3.4.10 /opt/zookeeper
cd /opt/zookeeper
mv zookeeper-3.4.10 zookeeper3.4
2. Environment configuration
Edit /etc/profile file
Add:
export ZK_HOME=/opt/zookeeper/zookeeper3.4
export PATH=.:${JAVA_HOME}/bin:${ZK_HOME}/bin:$PATH
Then enter:
source /etc/profile
Make the configuration effective
3. Modify the configuration file
3.1 Create files and directories
Create these directories on the servers in the cluster
mkdir /opt/zookeeper/data
mkdir /opt/zookeeper/dataLog
And create myid file in /opt/zookeeper/data directory
enter:
touch myid
After creating it successfully, edit the myid file.
For convenience, I set the content of the myid file on master, slave1, and slave2 to 1, 2, and 3 respectively.
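For example, a quick way to write these values (run the appropriate line on each machine) would be:
echo 1 > /opt/zookeeper/data/myid    # on master
echo 2 > /opt/zookeeper/data/myid    # on slave1
echo 3 > /opt/zookeeper/data/myid    # on slave2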
3.2 Create zoo.cfg
Switch to /opt/zookeeper/zookeeper3.4/conf directory
If there is no zoo.cfg file, copy the zoo_sample.cfg file and rename it to zoo.cfg.
Modify this newly created zoo.cfg file
dataDir=/opt/zookeeper/data
dataLogDir=/opt/zookeeper/dataLog
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
Description: clientPort, as the name implies, is the TCP port that clients use to connect to the ZooKeeper service. dataLogDir holds the transaction log (WAL), while dataDir holds snapshots of the in-memory data structures, which enables fast recovery. To maximize performance it is generally recommended to put dataDir and dataLogDir on different disks so that sequential disk writes can be fully exploited. Both directories must be created by yourself; you can choose the paths freely as long as they match the configuration. The 1 in server.1 must match the value in the myid file under the dataDir directory on the master machine; likewise the 2 in server.2 matches the myid on slave1, and the 3 in server.3 matches the myid on slave2. You can use whatever values you like, as long as they correspond. The port numbers 2888 and 3888 can also be chosen freely, and it doesn't matter if the same ports are used on different machines.
1. tickTime: the heartbeat interval for client-server communication
The interval at which heartbeats are maintained between ZooKeeper servers, or between clients and servers; a heartbeat is sent every tickTime. tickTime is in milliseconds.
tickTime=2000
2. initLimit: the leader-follower (LF) initial connection time limit
The maximum number of heartbeats (in tickTime units) that can be tolerated during the initial connection between a follower server (F) and the leader server (L) in the cluster.
initLimit=10
3. syncLimit: the leader-follower (LF) synchronization time limit
The maximum number of heartbeats (in tickTime units) that can be tolerated between a request and a response between follower servers and the leader server in the cluster.
syncLimit=5
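Putting these settings together, a minimal zoo.cfg for this cluster might look like the following (clientPort 2181 is ZooKeeper's default and is assumed here):
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/opt/zookeeper/data
dataLogDir=/opt/zookeeper/dataLog
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888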
Next, copy the zookeeper directory to the other machines as well. Remember to change the myid under /opt/zookeeper/data on each machine; the values must not be the same.
enter:
scp -r /opt/zookeeper root@slave1:/opt
scp -r /opt/zookeeper root@slave2:/opt
4. Start zookeeper
Because ZooKeeper uses leader election, its master/slave relationship is not specified explicitly as it is in Hadoop. For details, refer to the official documentation.
After successfully configuring zookeeper, start zookeeper on each machine.
Switch to the zookeeper directory
cd /opt/zookeeper/zookeeper3.4/bin
enter:
zkServer.sh start
After successful startup
To view the status, enter:
zkServer.sh status
You can then see which machine is the ZooKeeper leader and which are the followers.
Storm environment installation
1. File preparation
Decompress the downloaded Storm package
Enter on linux:
tar -xvf apache-storm-1.1.1.tar.gz
Then move it to /opt/storm, create it if it doesn't exist, and then rename the folder to storm1.1
enter
mv apache-storm-1.1.1 /opt/storm
cd /opt/storm
mv apache-storm-1.1.1 storm1.1
2. Environment configuration
Edit the /etc/profile file
Add:
export STORM_HOME=/opt/storm/storm1.1
export PATH=.:${JAVA_HOME}/bin:${ZK_HOME}/bin:${STORM_HOME}/bin:$PATH
Then enter source /etc/profile to make the configuration take effect, and enter storm version to view the version information.
3. Modify the configuration file
Edit storm.yaml in storm/conf.
Make the following edits:
enter:
vim storm.yaml
storm.zookeeper.servers:
  - "master"
  - "slave1"
  - "slave2"
storm.local.dir: "/root/storm"
nimbus.seeds: ["master"]
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703
Explanation:
1. storm.zookeeper.servers specifies the addresses of the ZooKeeper servers.
Because Storm stores its state information in ZooKeeper, the ZooKeeper addresses must be configured. If ZooKeeper is a single standalone machine, you only need to specify one!
2. storm.local.dir is the local storage directory.
The Nimbus and Supervisor daemons need a directory on the local disk to store a small amount of state (jars, confs, and so on). Create it on each machine and grant it the appropriate permissions.
3. nimbus.seeds lists the candidate nimbus hosts.
The workers need to know which machines are candidates for the nimbus host (leadership is decided by election through the ZooKeeper cluster) so that they can download topology jars and confs.
4. supervisor.slots.ports defines the worker ports.
For each supervisor machine, this setting configures how many workers can run on that machine. Each worker uses a separate port to receive messages, so this also defines which ports are open for use. If you define 5 ports here, at most 5 workers can run on this supervisor node; if you define 3 ports, at most 3 workers can run. By default (as configured in defaults.yaml) there are four worker slots, on ports 6700, 6701, 6702, and 6703.
The supervisor does not start these workers immediately on startup; a worker is started only when an assigned task is accepted. How many workers actually start depends on how many workers the topology requests on this supervisor. If a topology is to be executed by only one worker, the supervisor starts one worker, not all of them.
Note: the top-level configuration keys must not have leading spaces, otherwise an error will be reported! Hostnames (mapped in /etc/hosts) are used here; IP addresses also work. Adjust according to your own environment.
You can use the scp command or ftp software to copy storm to other machines.
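For example, analogous to the ZooKeeper step above:
scp -r /opt/storm root@slave1:/opt
scp -r /opt/storm root@slave2:/opt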
After successful configuration, you can start Storm, but make sure that JDK and Zookeeper have been installed correctly and that Zookeeper has been successfully started.
4. Start Storm
Switch to the storm/bin directory
On the master node, start nimbus by entering:
storm nimbus >/dev/null 2>&1 &
To start the web interface (on master), enter:
storm ui
On the slave nodes (slave1, slave2), enter:
storm supervisor >/dev/null 2>&1 &
Then, in the browser, visit the master on port 8080 (for example, http://master:8080).
If the interface opens successfully, the environment is configured correctly:
Kafka environment installation
Kafka is a high-throughput distributed streaming message system for processing active streaming data, such as web page visits (PV) and logs. It can process big data both in real time and offline.
1. File preparation
Decompress the downloaded Kafka package
Enter on linux:
tar -xvf kafka_2.12-1.0.0.tgz
Then move it to /opt/kafka, create it if it doesn't exist, and then rename the folder to kafka2.12
enter
mv kafka_2.12-1.0.0 /opt/kafka
cd /opt/kafka
mv kafka_2.12-1.0.0 kafka2.12
2. Environment configuration
Edit /etc/profile file
Add:
export KAFKA_HOME=/opt/kafka/kafka2.12
export PATH=.:${JAVA_HOME}/bin:${KAFKA_HOME}/bin:${ZK_HOME}/bin:$PATH
Then enter:
source /etc/profile
Make the configuration effective
3. Modify the configuration file
Note: for a standalone (single-machine) setup, Kafka can actually be started directly from the bin directory without modifying the configuration file. But since we are building a cluster here, a few small changes are needed.
Switch to the kafka/config directory
Edit the server.properties file
What needs to be changed is Zookeeper's address:
Find the Zookeeper configuration, specify the address of the Zookeeper cluster, and modify it as follows
zookeeper.connect=master:2181,slave1:2181,slave2:2181
zookeeper.connection.timeout.ms=6000
Other options you may want to change:
1. num.partitions specifies the default number of partitions for new topics; the default is 1.
2. log.dirs is the path where Kafka stores its data logs; change it according to your own needs.
...
Note: There are other configurations, you can view the official documentation. If there are no special requirements, just use the default one.
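Putting the options above together, the relevant fragment of server.properties might look like the following (the log.dirs path here is only an example; choose your own):
# ZooKeeper cluster addresses
zookeeper.connect=master:2181,slave1:2181,slave2:2181
zookeeper.connection.timeout.ms=6000
# optional: default number of partitions for new topics
num.partitions=1
# optional: where Kafka stores its data logs (example path)
log.dirs=/opt/kafka/kafka-logs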
After configuring, remember to use the scp command to copy it to the other machines in the cluster!
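For example (assuming the same hostnames as before):
scp -r /opt/kafka root@slave1:/opt
scp -r /opt/kafka root@slave2:/opt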
4. Start kafka
Note: this must be done on every machine in the cluster!
Switch to the kafka/bin directory
enter:
kafka-server-start.sh ../config/server.properties >/dev/null 2>&1 &
Then enter jps to check whether it started successfully:
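If it started correctly, the jps output should include a Kafka process as well as the ZooKeeper process (QuorumPeerMain); for example:
jps | grep -E "Kafka|QuorumPeerMain"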
After successful startup, you can perform a simple test
Create a topic first
enter:
kafka-topics.sh --zookeeper master:2181 --create --topic t_test --partitions 5 --replication-factor 2
Description: this creates a topic named t_test with 5 partitions, each with a replication factor of 2. If --partitions is not specified, the default number of partitions from the configuration file is used.
Then produce some data
enter:
kafka-console-producer.sh --broker-list master:9092 --topic t_test
You can use Ctrl+D to exit
Then open another terminal (Xshell) window
and consume the data.
enter:
kafka-console-consumer.sh --zookeeper master:2181 --topic t_test --from-beginning
You can use Ctrl+C to exit
You can see that the data has been consumed normally.
5. Some commonly used commands of kafka
1. Start and close kafka
bin/kafka-server-start.sh config/server.properties >>/dev/null 2>&1 &
bin/kafka-server-stop.sh
2. View the topics in the Kafka cluster and inspect a specific topic
View all topics in the cluster
kafka-topics.sh --zookeeper master:2181,slave1:2181,slave2:2181 --list
View the details of a specific topic
kafka-topics.sh --zookeeper master:2181 --describe --topic t_test
3. Create a Topic
kafka-topics.sh --zookeeper master:2181 --create --topic t_test --partitions 5 --replication-factor 2
4. Production data and consumption data
kafka-console-producer.sh --broker-list master:9092 --topic t_test
Ctrl+D Exit
kafka-console-consumer.sh --zookeeper master:2181 --topic t_test --from-beginning
Ctrl+C Exit
5. Kafka's topic delete command
kafka-topics.sh --delete --zookeeper master:2181 --topic t_test
6. Add partition
kafka-topics.sh --alter --topic t_test --zookeeper master:2181 --partitions 10
Other
Storm cluster setup, official documentation: http://storm.apache.org/releases/1.1.1/Setting-up-a-Storm-cluster.html
Kafka setup, official documentation: http://kafka.apache.org/quickstart
Summary
That is all for this article. I hope it provides a useful reference for your study or work. If you have any questions, feel free to leave a message. Thank you for your support of Wulin.com.