Spark provides Spark SQL, a SQL-like language for performing operations on structured data. This section describes how to use Spark SQL from scratch: create a table named src_data, write one data record into each row of the table, store the data in the mrs_20160907 cluster, query the data in the table with SQL statements, and finally delete the table.
The sample text file is as follows:
abcd3ghji
efgh658ko
1234jjyu9
7h8kodfg1
kk99icxz3
The file system name sparksql is only an example. The file system name must be globally unique; otherwise, the parallel file system cannot be created.
OBS Path: obs://sparksql/input/sparksql-test.txt
HDFS Path: /user/userinput
A job can be submitted only when the mrs_20160907 cluster is in the Running state.
When entering Spark SQL statements, ensure that each statement contains no more than 10,000 characters.
Syntax:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name [(col_name data_type [COMMENT col_comment], ...)] [COMMENT table_comment] [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] [ROW FORMAT row_format] [STORED AS file_format] [LOCATION hdfs_path];
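For illustration only, the following hypothetical statement exercises several of the optional clauses above; the table name, columns, comment, and location are assumptions and are not used elsewhere in this example:
create external table if not exists demo_table (id int, name string) comment 'hypothetical table' partitioned by (dt string) stored as textfile location '/user/hypothetical/demo_table';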
You can use either of the following two methods to create the table:
For details about how to obtain the AK/SK, see Prerequisites.
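For the first method, the table can be created directly over the data stored in OBS by pointing the LOCATION clause at the sample file's directory. The following is a minimal sketch; using the input directory (rather than the file itself) as the location and the ',' delimiter are assumptions:
create table src_data (line string) row format delimited fields terminated by ',' stored as textfile location 'obs://sparksql/input/';
The second method, shown below, creates the table first and then loads the sample file from its HDFS path: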
create table src_data1 (line string) row format delimited fields terminated by ',' ;
load data inpath '/user/userinput/sparksql-test.txt' into table src_data1;
When method 2 is used, data stored in OBS cannot be loaded into the created table directly; the data file must first be placed at an HDFS path, such as the one shown above.
Syntax:
SELECT col_name FROM table_name;
Example of querying all data in the src_data table:
select * from src_data;
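As an optional check that is not part of the original steps, you can count the loaded rows; with the sample file above, the result should be 5, one row per record:
select count(*) from src_data;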
Syntax:
DROP TABLE [IF EXISTS] table_name;
Example of deleting the src_data table:
drop table src_data;
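As an optional check that is not part of the original steps, you can list the remaining tables to confirm that src_data has been deleted:
show tables;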
After the Spark SQL statements are submitted, their execution results are displayed in the result column.