Preface
As enterprises run more and more internal business systems and JVM-based services, a production environment may end up with several JDK installations serving different services. As is well known, a service compiled against a newer version of the Java specification will throw a java.lang.UnsupportedClassVersionError when it runs on an older JVM.
Spark 2.2 has dropped support for Java 7. In most deployments, a Spark Application shares the JDK used by the Hadoop stack; if Hadoop depends on JDK 7, an application built against JDK 8 will fail to run on it.
This article explains how to specify the JDK version for a Spark Application in different scenarios.
The required JDK version is already deployed on the cluster
Assume that the JDK is deployed at /usr/java/jdk1.8 on every node of the cluster.
Spark provides the spark.executorEnv.[EnvironmentVariableName] configuration, which adds environment variables to the Executor process. If the Spark Application uses the Standalone cluster manager, you only need to specify the Executor-side JDK path through spark.executorEnv.JAVA_HOME, as follows:
$SPARK_HOME/bin/spark-submit \
  --conf "spark.executorEnv.JAVA_HOME=/usr/java/jdk1.8" \
  ...
In YARN mode, you also need to specify the JAVA_HOME environment variable for the Application Master, as follows:
$SPARK_HOME/bin/spark-submit \
  --conf "spark.executorEnv.JAVA_HOME=/usr/java/jdk1.8" \
  --conf "spark.yarn.appMasterEnv.JAVA_HOME=/usr/java/jdk1.8" \
  ...
When deploying on YARN in cluster mode, spark.yarn.appMasterEnv.JAVA_HOME effectively sets the JDK version for the Driver of the Spark Application;
when deploying in client mode, spark.yarn.appMasterEnv.JAVA_HOME only sets the JDK version for the ExecutorLauncher.
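For illustration, a cluster-mode submission on YARN might look like the following sketch (the main class com.example.MyApp and the jar my-app.jar are hypothetical placeholders):

# Sketch of a cluster-mode submission; replace the class and jar with your own
$SPARK_HOME/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf "spark.yarn.appMasterEnv.JAVA_HOME=/usr/java/jdk1.8" \
  --conf "spark.executorEnv.JAVA_HOME=/usr/java/jdk1.8" \
  --class com.example.MyApp \
  my-app.jar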
In client mode, the Driver-side JDK follows the JAVA_HOME environment variable on the machine where spark-submit is run, and it can be specified directly in spark-env.sh.
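For example, a minimal sketch of spark-env.sh on the submitting machine, assuming the same JDK path as above:

# $SPARK_HOME/conf/spark-env.sh on the machine running spark-submit
export JAVA_HOME=/usr/java/jdk1.8
export PATH=$JAVA_HOME/bin:$PATH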
The required JDK version is not deployed on the cluster and you have no administrative access
In some scenarios we have no administrative permissions on the cluster, can only submit applications through YARN, and the JDK version we need is not deployed on the cluster. In this case, we have to ship the JDK installation package along with the application.
This requires that the JDK installation package be a tar.gz archive and that it be placed in the same directory as the jar built from your code. Suppose the JDK package we downloaded is jdk-8u141-linux-x64.tar.gz.
The key configuration is as follows:
$SPARK_HOME/bin/spark-submit \
  --conf "spark.yarn.dist.archives=jdk-8u141-linux-x64.tar.gz" \
  --conf "spark.executorEnv.JAVA_HOME=./jdk-8u141-linux-x64.tar.gz/jdk1.8.0_141" \
  --conf "spark.yarn.appMasterEnv.JAVA_HOME=./jdk-8u141-linux-x64.tar.gz/jdk1.8.0_141" \
  ...
By specifying the spark.yarn.dist.archives configuration, we can distribute the JDK package to the working directory of every Executor (including the Application Master), where the tar.gz archive is also unpacked automatically. Assuming jdk-8u141-linux-x64.tar.gz unpacks into a jdk1.8.0_141 directory, the path to our JDK is ./jdk-8u141-linux-x64.tar.gz/jdk1.8.0_141; other JDK versions follow the same pattern.
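Because the top-level directory name inside the archive becomes part of the JAVA_HOME path, it is worth confirming it before submitting. A quick check, assuming the archive sits in the current directory:

# The first entry reveals the top-level directory (jdk1.8.0_141 for this package),
# which is what gets appended to ./jdk-8u141-linux-x64.tar.gz/ in the settings above
tar -tzf jdk-8u141-linux-x64.tar.gz | head -n 3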
Note: since Spark Standalone provides no mechanism for distributing and automatically unpacking a JDK archive, this approach can only be used on YARN.
Verification
Using ps -ef | grep java, you can see that the process was launched with the java binary from the JDK directory we specified, which confirms the configuration took effect.
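For example, on a NodeManager host you could filter for the unpacked JDK path (a sketch based on the archive used above):

# Show processes started from the distributed JDK
ps -ef | grep jdk1.8.0_141 | grep -v grep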
The following is the startup information of an Executor process for which I specified the JDK version in YARN mode:
stan 590751 590745 0 20:45 ? 00:00:14 ./jdk-8u141-linux-x64.tar.gz/jdk1.8.0_141/bin/java -server -Xmx512m -XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35 -XX:PermSize=256M -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:./gc.log -verbose:gc -Djava.io.tmpdir=/home/stan/tmp/hadoop-stan/nm-local-dir/usercache/stan/appcache/application_1508397483453_0095/container_1508397483453_0095_01_000004/tmp -Dspark.driver.port=52986 -Dspark.yarn.app.container.log.dir=/home/stan//hadoop-2.6.4/logs/userlogs/application_1508397483453_0095/container_1508397483453_0095_01_000004 -XX:OnOutOfMemoryError=kill %p org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://[email protected]:52986 --executor-id 3 --hostname stan --cores 1 --app-id application_1508397483453_0095 --user-class-path file:/home/stan/tmp/hadoop-stan/nm-local-dir/usercache/stan/appcache/application_1508397483453_0095/container_1508397483453_0095_01_000004/__app__.jar
Appendix: resolving a Spark application runtime version incompatibility error
17/06/27 14:34:41 INFO deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/06/27 14:34:41 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 788.8 KB, free 1246.5 MB)
17/06/27 14:34:41 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 54.0 KB, free 1246.4 MB)
17/06/27 14:34:41 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.50.70.121:37335 (size: 54.0 KB, free: 1247.2 MB)
17/06/27 14:34:41 INFO SparkContext: Created broadcast 0 from rdd at TradeInfoOutlier.scala:30
Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaUniverse$JavaMirror;
    at com.fangdd.data.profile.outlier.TradeInfoOutlier$.main(TradeInfoOutlier.scala:30)
    at com.fangdd.data.profile.outlier.TradeInfoOutlier.main(TradeInfoOutlier.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:745)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/06/27 14:34:42 INFO SparkContext: Invoking stop() from shutdown hook
This error occurs because the production environment runs Scala 2.10 + Spark 1.6.3, while the application jar was built locally against Scala 2.11 + Spark 1.6.3, which is why the error above was reported on the production cluster. After switching to the matching Scala version and rebuilding the jar, the error no longer occurred.
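Before rebuilding, you can check which Scala version the cluster's Spark distribution was built against; the spark-submit version banner should report it (the exact wording may vary between Spark releases):

# Should print the Spark version together with the Scala version it was built with
$SPARK_HOME/bin/spark-submit --version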
Summary
That is all for this article. I hope it provides some reference value for your study or work. If you have any questions, feel free to leave a comment. Thank you for your support of Wulin.com.