Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
Configuring the Compression Format of a Parquet Table
Scenarios
The compression format of a Parquet table can be configured as follows:
- If the Parquet table is partitioned, set the parquet.compression table property to specify the compression format, for example, by adding tblproperties ("parquet.compression"="snappy") to the table creation statement.
- If the Parquet table is non-partitioned, set the spark.sql.parquet.compression.codec parameter to specify the compression format. In this case, any setting of the parquet.compression table property is ignored, because parquet.compression takes its value from spark.sql.parquet.compression.codec. If spark.sql.parquet.compression.codec is not configured, its default value snappy is used.
Therefore, the spark.sql.parquet.compression.codec parameter takes effect only for non-partitioned Parquet tables.
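The two cases above can be sketched as follows. The table and column names are hypothetical, and gzip is used only to show a non-default codec; the general shape of the statements follows standard Spark SQL.

```sql
-- Partitioned Parquet table: the compression format is set per table
-- via the parquet.compression table property.
CREATE TABLE sales_parquet (id INT, amount DOUBLE)
PARTITIONED BY (dt STRING)
STORED AS PARQUET
TBLPROPERTIES ("parquet.compression"="snappy");

-- Non-partitioned Parquet table: parquet.compression is ignored;
-- set the session-level codec parameter instead.
SET spark.sql.parquet.compression.codec=gzip;
CREATE TABLE events_parquet (id INT, payload STRING)
STORED AS PARQUET;
```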
Configuration Parameters
Navigation path for setting parameters:
On Manager, choose Cluster > Name of the desired cluster > Service > Spark2x > Configuration. Click All Configurations and enter a parameter name in the search box.
| Parameter | Description | Default Value |
|---|---|---|
| spark.sql.parquet.compression.codec | Sets the compression format of a non-partitioned Parquet table. | snappy |
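Besides setting this parameter on Manager, it can also be supplied for a single job at submit time through Spark's standard --conf mechanism. This is a minimal sketch; the codec value gzip is only an example:

```shell
# Override the Parquet codec for non-partitioned tables in this session only
spark-sql --conf spark.sql.parquet.compression.codec=gzip
```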
Parent topic: Scenario-Specific Configuration