This article walks through an example that uses Spark as the analysis engine, Cassandra as the data store, and Spring Boot to develop the driver application.
1. Prerequisites
Create a keyspace
CREATE KEYSPACE hfcb WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

Create a table
CREATE TABLE person ( id text PRIMARY KEY, first_name text, last_name text);
Insert test data
insert into person (id,first_name,last_name) values('1','wang','yunfei');
insert into person (id,first_name,last_name) values('2','peng','chao');
insert into person (id,first_name,last_name) values('3','li','jian');
insert into person (id,first_name,last_name) values('4','zhang','jie');
insert into person (id,first_name,last_name) values('5','liang','wei');

2. Spark-cassandra-connector installation
To enable Spark 1.5.1 to use Cassandra as its data store, add the following jar dependencies (in this example the jars are placed in the /opt/spark/managed-lib/ directory; any directory will do):
cassandra-clientutil-3.0.2.jar
cassandra-driver-core-3.1.4.jar
guava-16.0.1.jar
cassandra-thrift-3.0.2.jar
joda-convert-1.2.jar
joda-time-2.9.9.jar
libthrift-0.9.1.jar
spark-cassandra-connector_2.10-1.5.1.jar
In the /opt/spark/conf directory, create a new spark-env.sh file with the following content:
SPARK_CLASSPATH=/opt/spark/managed-lib/*
3. Spring Boot application development
Add the spark-cassandra-connector and Spark dependencies
<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.10</artifactId>
    <version>1.5.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.5.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.5.1</version>
</dependency>
Configure the Spark and Cassandra connection settings in application.yml
spark.master: spark://master:7077
cassandra.host: 192.168.1.140
cassandra.keyspace: hfcb
Note that spark://master:7077 uses the hostname master rather than an IP address; you can map master to the correct IP address in your local hosts file.
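For example, on Linux or macOS the mapping can be added to /etc/hosts (the IP address below is a placeholder; substitute your Spark master node's actual address):

192.168.1.130 master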
Configuring SparkContext and CassandraSQLContext
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.cassandra.CassandraSQLContext;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SparkCassandraConfig {
    @Value("${spark.master}")
    String sparkMasterUrl;
    @Value("${cassandra.host}")
    String cassandraHost;
    @Value("${cassandra.keyspace}")
    String cassandraKeyspace;

    @Bean
    public JavaSparkContext javaSparkContext() {
        SparkConf conf = new SparkConf(true)
                .set("spark.cassandra.connection.host", cassandraHost)
//              .set("spark.cassandra.auth.username", "cassandra")
//              .set("spark.cassandra.auth.password", "cassandra")
                .set("spark.submit.deployMode", "client");
        JavaSparkContext context = new JavaSparkContext(sparkMasterUrl, "SparkDemo", conf);
        return context;
    }

    @Bean
    public CassandraSQLContext sqlContext() {
        CassandraSQLContext cassandraSQLContext = new CassandraSQLContext(javaSparkContext().sc());
        cassandraSQLContext.setKeyspace(cassandraKeyspace);
        return cassandraSQLContext;
    }
}

Simple call
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.cassandra.CassandraSQLContext;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Repository;

@Repository
public class PersonRepository {
    @Autowired
    CassandraSQLContext cassandraSQLContext;

    public Long countPerson() {
        DataFrame people = cassandraSQLContext.sql("select * from person order by id");
        return people.count();
    }
}

Start it and run it like a regular Spring Boot application.
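Besides SQL queries through CassandraSQLContext, the connector also exposes a direct RDD API via CassandraJavaUtil. The following is a minimal sketch of that alternative, not part of the original example; the class name PersonRddRepository is hypothetical, and the code assumes the JavaSparkContext bean defined above and a running Spark/Cassandra cluster, so it is not verified here:

```java
import org.apache.spark.api.java.JavaSparkContext;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Repository;

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

// Hypothetical alternative to PersonRepository: reads the person table
// as an RDD of CassandraRow instead of going through Spark SQL.
@Repository
public class PersonRddRepository {
    @Autowired
    JavaSparkContext javaSparkContext;

    public long countPerson() {
        // javaFunctions(...) wraps the context with the connector's helpers;
        // cassandraTable("hfcb", "person") builds an RDD over that table.
        return javaFunctions(javaSparkContext)
                .cassandraTable("hfcb", "person")
                .count();
    }
}
```

Which style to use is a matter of taste: the SQL route is convenient for ad-hoc queries, while the RDD route gives direct access to the connector's partition-aware scans.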
Source code address: https://github.com/wiselyman/spring-spark-cassandra.git
Summary
The above is an example of integrating Spring Boot with Spark and Cassandra. I hope it is helpful to you; if you have any questions, please leave a message.