Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="mrs_01_1997"></a><a name="mrs_01_1997"></a>
<h1 class="topictitle1">Optimizing Datasource Tables</h1>
<div id="body1595920218808"><div class="section" id="mrs_01_1997__sb041ae7f6de74030aea62708c73c0c0a"><h4 class="sectiontitle">Scenario</h4><p id="mrs_01_1997__a5e8a593b7e70436c864cad9a7e46c82a">Spark can save the partition information of datasource tables to the Metastore and process that partition information in the Metastore. This provides the following optimizations:</p>
<ul id="mrs_01_1997__u5ddd127a89944f92a238daf43990ddeb"><li id="mrs_01_1997__la9fe3f75573c42aebf865e67c6264b2d">Datasource tables support partition-based syntax for adding, deleting, and modifying partitions, which improves compatibility with Hive.</li><li id="mrs_01_1997__la077ac602115487e971078f431980bde">Partition pruning is supported in query statements. Partition predicates are pushed down to the Metastore to filter out partitions that do not match.<div class="p" id="mrs_01_1997__a9cc1e9f111ba47fcb38974c74821fa69"><a name="mrs_01_1997__la077ac602115487e971078f431980bde"></a><a name="la077ac602115487e971078f431980bde"></a>Example:<pre class="screen" id="mrs_01_1997__s898de8f94ae640a289b79c6c1f84442a">select count(*) from table where partCol=1; -- partCol is the partition column</pre>
</div>
<p id="mrs_01_1997__a8a7e0061e9c243278aa6f3348e29ac2f">In the physical plan, the TableScan operation needs to process only the data in the partition where partCol=1.</p>
</li></ul>
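<p>The following statements are a minimal sketch of the two points above, assuming a Parquet datasource table. The table name <strong>sales_ds</strong> and its columns are hypothetical and are used for illustration only. The exact plan text printed by EXPLAIN depends on the Spark version; the partition predicate is typically listed in the partition filters of the scan node.</p>
<pre class="screen">-- Hypothetical table; the names are assumptions, not part of the product.
CREATE TABLE sales_ds (id INT, amount DOUBLE, partCol INT) USING parquet PARTITIONED BY (partCol);

-- Partition-based adding, modification, and deletion, as with Hive tables.
ALTER TABLE sales_ds ADD PARTITION (partCol=1);
ALTER TABLE sales_ds PARTITION (partCol=1) RENAME TO PARTITION (partCol=2);
ALTER TABLE sales_ds DROP PARTITION (partCol=2);
SHOW PARTITIONS sales_ds;

-- Inspect the physical plan to check that only matching partitions are scanned.
EXPLAIN SELECT count(*) FROM sales_ds WHERE partCol=1;</pre>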
</div>
<div class="section" id="mrs_01_1997__scdc4ea91e25044338a8a1fb07a4d601e"><h4 class="sectiontitle">Procedure</h4><div class="p" id="mrs_01_1997__a4d2dc0bb9f3448a68c7fdc8b45954c56">To enable datasource table optimization, configure the parameters in Table 1 in the <strong id="mrs_01_1997__b118314185818">spark-defaults.conf</strong> file on the Spark client.
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_1997__t7f273d3997f34efaa45e88b285717334" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameter description</caption><thead align="left"><tr id="mrs_01_1997__ra0c516a4bef74577bb55c761d96cc2d8"><th align="left" class="cellrowborder" valign="top" width="36.4%" id="mcps1.3.2.2.2.2.4.1.1"><p id="mrs_01_1997__a3956f84c0f814d1f8db221a829215a4f">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50.88%" id="mcps1.3.2.2.2.2.4.1.2"><p id="mrs_01_1997__a979a91c0d92a41b1a86ddf33f10102a2">Description</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="12.72%" id="mcps1.3.2.2.2.2.4.1.3"><p id="mrs_01_1997__af364d58698a9421394091cb4d6ba2485">Default Value</p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_1997__r28c99445e8a7451a870e125565b3b81b"><td class="cellrowborder" valign="top" width="36.4%" headers="mcps1.3.2.2.2.2.4.1.1 "><p id="mrs_01_1997__a7d9b6b8f80764987866be7226105cf1e">spark.sql.hive.manageFilesourcePartitions</p>
</td>
<td class="cellrowborder" valign="top" width="50.88%" headers="mcps1.3.2.2.2.2.4.1.2 "><p id="mrs_01_1997__a4f160f6c3f5e4972882e432547718964">Specifies whether to enable Metastore partition management for datasource tables and converted Hive tables.</p>
<ul id="mrs_01_1997__u9ff68da24ff44b7998438cd30a85befd"><li id="mrs_01_1997__l3da75ece7d794aa0a13a4bcf1eea8e3d"><strong id="mrs_01_1997__b1197532820588">true</strong> indicates enabling Metastore partition management. In this case, datasource tables store their partition information in the Hive Metastore, and the Metastore is used to prune partitions in query statements.</li></ul>
<ul id="mrs_01_1997__uef68f174a233437a928af4d4ff2bbe88"><li id="mrs_01_1997__l0db472b8d16146afa74489b1c664d463"><strong id="mrs_01_1997__b54501344115819">false</strong> indicates disabling Metastore partition management.</li></ul>
</td>
<td class="cellrowborder" valign="top" width="12.72%" headers="mcps1.3.2.2.2.2.4.1.3 "><p id="mrs_01_1997__a5a26a23c61224ec1bb64b1e7aeeac725">true</p>
</td>
</tr>
<tr id="mrs_01_1997__rbaccc82ae0b54876a9722429799ae6c4"><td class="cellrowborder" valign="top" width="36.4%" headers="mcps1.3.2.2.2.2.4.1.1 "><p id="mrs_01_1997__ac8995a63963d4b06b62d79fed6d5e934">spark.sql.hive.metastorePartitionPruning</p>
</td>
<td class="cellrowborder" valign="top" width="50.88%" headers="mcps1.3.2.2.2.2.4.1.2 "><p id="mrs_01_1997__a3234c7eca65b42fb98d6205b5c2d0c72">Specifies whether predicates are pushed down to the Hive Metastore.</p>
<ul id="mrs_01_1997__u4a9fd2471bb142129e5f55476f900858"><li id="mrs_01_1997__ld626b09c6d324c2399f29cd6637aca33"><strong id="mrs_01_1997__b13101195475814">true</strong> indicates that predicates are pushed down to the Hive Metastore. Only predicates on Hive tables are supported.</li></ul>
<ul id="mrs_01_1997__u02f9f7f2d73e4284b8ed0b14efbe5b68"><li id="mrs_01_1997__la2b6376356314f40a3f8049c16891243"><strong id="mrs_01_1997__b2794165765813">false</strong> indicates that predicates are not pushed down to the Hive Metastore.</li></ul>
</td>
<td class="cellrowborder" valign="top" width="12.72%" headers="mcps1.3.2.2.2.2.4.1.3 "><p id="mrs_01_1997__aac7b94eba912417b9fb24f3ddc74b61b">true</p>
</td>
</tr>
<tr id="mrs_01_1997__r10e94347b9b74b888812e9b48a164dfa"><td class="cellrowborder" valign="top" width="36.4%" headers="mcps1.3.2.2.2.2.4.1.1 "><p id="mrs_01_1997__add88ca072df6454aa46cb2811509b69a">spark.sql.hive.filesourcePartitionFileCacheSize</p>
</td>
<td class="cellrowborder" valign="top" width="50.88%" headers="mcps1.3.2.2.2.2.4.1.2 "><p id="mrs_01_1997__a13cc4f07eabd45b2b6845da8df3918fe">Size, in bytes, of the in-memory cache for partition file metadata.</p>
<p id="mrs_01_1997__a3b742840be3f4507bd0ddb56b4cf9839">All tables share one cache, which can use up to the specified number of bytes for file metadata.</p>
<p id="mrs_01_1997__af3ae91b49f384ab39361f8836899655a">This parameter is valid only when <span class="parmname" id="mrs_01_1997__parmname9973814707"><b>spark.sql.hive.manageFilesourcePartitions</b></span> is set to <strong id="mrs_01_1997__b1978414301">true</strong>.</p>
</td>
<td class="cellrowborder" valign="top" width="12.72%" headers="mcps1.3.2.2.2.2.4.1.3 "><p id="mrs_01_1997__aed744bbef39a41e7a55cc5b26227924c">250 * 1024 * 1024 (bytes)</p>
</td>
</tr>
<tr id="mrs_01_1997__r3c50338706c14886b332cc045bd44019"><td class="cellrowborder" valign="top" width="36.4%" headers="mcps1.3.2.2.2.2.4.1.1 "><p id="mrs_01_1997__a601027a036af418db52f6918b0711b31">spark.sql.hive.convertMetastoreOrc</p>
</td>
<td class="cellrowborder" valign="top" width="50.88%" headers="mcps1.3.2.2.2.2.4.1.2 "><p id="mrs_01_1997__a920d639815a247a88056c8818a18c2c7"><span id="mrs_01_1997__ph19661422606">Specifies how ORC tables are processed.</span></p>
<ul id="mrs_01_1997__ucd35c9987aa8487ab2f4a6b41f7e90a4"><li id="mrs_01_1997__l35de4330223246b1acd1622eee986543"><strong id="mrs_01_1997__b15722102812017">false</strong>: Spark SQL uses Hive SerDe to process ORC tables.</li><li id="mrs_01_1997__l73d621fc7f26426e8fedfaa3b011f2bb"><strong id="mrs_01_1997__b1214316328013">true</strong>: Spark SQL uses the Spark built-in mechanism to process ORC tables.</li></ul>
</td>
<td class="cellrowborder" valign="top" width="12.72%" headers="mcps1.3.2.2.2.2.4.1.3 "><p id="mrs_01_1997__ae15446505f664966b4a4f52235ae4fd1">true</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
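<p>As a sketch, the parameters in Table 1 could appear in <strong>spark-defaults.conf</strong> as follows. The values shown are the defaults listed in the table, with the cache size written out in bytes (250 * 1024 * 1024 = 262144000). Adjust the values to your workload.</p>
<pre class="screen"># Illustrative fragment only; the values are the defaults from Table 1.
spark.sql.hive.manageFilesourcePartitions        true
spark.sql.hive.metastorePartitionPruning         true
# Takes effect only when spark.sql.hive.manageFilesourcePartitions is true.
spark.sql.hive.filesourcePartitionFileCacheSize  262144000
spark.sql.hive.convertMetastoreOrc               true</pre>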
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1985.html">Spark SQL and DataFrame Tuning</a></div>
</div>
</div>