This section describes how to use Spark2x to submit Spark applications that use Spark Core and Spark SQL. Spark Core is the kernel module of Spark: it executes tasks and provides the APIs used to write Spark applications. Spark SQL is the module that executes SQL statements.
Develop a Spark application that processes logs recording netizens' dwell time for online shopping over a weekend. Each log record contains three comma-separated fields: name, gender, and dwell time.
log1.txt: logs collected on Saturday
LiuYang,female,20
YuanJing,male,10
GuoYijun,male,5
CaiXuyu,female,50
Liyuan,male,20
FangBo,female,50
LiuYang,female,20
YuanJing,male,10
GuoYijun,male,50
CaiXuyu,female,50
FangBo,female,60
log2.txt: logs collected on Sunday
LiuYang,female,20
YuanJing,male,10
CaiXuyu,female,50
FangBo,female,50
GuoYijun,male,5
CaiXuyu,female,50
Liyuan,male,20
CaiXuyu,female,50
FangBo,female,50
LiuYang,female,20
YuanJing,male,10
FangBo,female,50
GuoYijun,male,50
CaiXuyu,female,50
FangBo,female,60
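The processing such an application might apply to these logs can be sketched in plain Python, without Spark. This is a minimal illustration under stated assumptions, not the sample application itself: it assumes the goal is to total the dwell time of each female netizen across both days (suggested by the class name FemaleInfoCollection) and to keep only totals above a threshold. The function name female_dwell_totals and the min_total value of 120 are hypothetical, and the unit of the third field is not stated in this section.

```python
from collections import defaultdict

# Sample records copied from log1.txt and log2.txt above.
LOG1 = """LiuYang,female,20 YuanJing,male,10 GuoYijun,male,5 CaiXuyu,female,50
Liyuan,male,20 FangBo,female,50 LiuYang,female,20 YuanJing,male,10
GuoYijun,male,50 CaiXuyu,female,50 FangBo,female,60"""

LOG2 = """LiuYang,female,20 YuanJing,male,10 CaiXuyu,female,50 FangBo,female,50
GuoYijun,male,5 CaiXuyu,female,50 Liyuan,male,20 CaiXuyu,female,50
FangBo,female,50 LiuYang,female,20 YuanJing,male,10 FangBo,female,50
GuoYijun,male,50 CaiXuyu,female,50 FangBo,female,60"""


def female_dwell_totals(records, min_total=120):
    """Sum dwell time per female netizen and keep totals above min_total.

    min_total is a hypothetical threshold chosen for illustration only.
    """
    totals = defaultdict(int)
    for record in records:
        name, gender, dwell = record.split(",")
        if gender == "female":
            totals[name] += int(dwell)
    return {name: total for name, total in totals.items() if total > min_total}


# Combine both days, then aggregate.
result = female_dwell_totals(LOG1.split() + LOG2.split())
for name, total in sorted(result.items()):
    print(f"{name},{total}")
```

In a Spark application the same steps would typically map onto a filter on the gender field followed by a per-name aggregation; the plain-Python form is used here only so the logic can be checked locally against the sample data.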
source bigdata_env
source Spark2x/component_env
kinit <service user for authentication>
spark-submit --class com.xxxx.bigdata.spark.examples.FemaleInfoCollection --master yarn-client /opt/female/FemaleInfoCollection.jar <inputPath>
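If the submission needs to be scripted, the command above could be assembled as an argument list before being handed to a process launcher. The following is a minimal sketch only: the helper name build_submit_command and the input path /tmp/input are hypothetical, and the environment setup and kinit authentication shown above are assumed to have been done already in the same shell session.

```python
import shlex


def build_submit_command(jar_path, main_class, input_path, master="yarn-client"):
    """Return the spark-submit invocation as a list of arguments.

    Only the options that appear in the command above are used.
    """
    return [
        "spark-submit",
        "--class", main_class,
        "--master", master,
        jar_path,
        input_path,
    ]


cmd = build_submit_command(
    "/opt/female/FemaleInfoCollection.jar",
    "com.xxxx.bigdata.spark.examples.FemaleInfoCollection",
    "/tmp/input",  # hypothetical value for <inputPath>
)

# Print the command in shell-quoted form for review before running it.
print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string avoids quoting problems if a path contains spaces; the list can be passed as-is to subprocess.run.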
For example, create a table, insert a row, and then query the table.
spark-sql> CREATE TABLE TEST(NAME STRING, AGE INT);
Time taken: 0.348 seconds
spark-sql> INSERT INTO TEST VALUES('Jack', 20);
Time taken: 1.13 seconds
spark-sql> SELECT * FROM TEST;
Jack 20
Time taken: 0.18 seconds, Fetched 1 row(s)
The storage path and format of the result data are specified by the Spark application.
History Server UI: used to display the status of completed and incomplete Spark applications.
Spark UI: used to display the status of running applications.
View the Spark2x logs to check the running status of applications, and adjust the applications based on the log information.