forked from docs/doc-exports
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
17 lines
1.9 KiB
HTML
<a name="mrs_01_2035"></a><a name="mrs_01_2035"></a>
<h1 class="topictitle1">Why Is a Job Run Before "Missing Privileges" Is Displayed When I Access a Parquet Table on Which I Do Not Have Permission?</h1>
<div id="body1595920221842"><div class="section" id="mrs_01_2035__s9996033cf96548dc91a4723962d31e98"><h4 class="sectiontitle">Question</h4><p id="mrs_01_2035__ae8702a27a6874496861bf7cb028a67a6">When I access a Parquet table on which I do not have permission, why is a job run before "Missing Privileges" is displayed?</p>
</div>
<div class="section" id="mrs_01_2035__s3d826fb8a5984f94934088e4cff7591a"><h4 class="sectiontitle">Answer</h4><p id="mrs_01_2035__a71a047d3f1b04f4db3eca481815f96cc">When Spark SQL executes a statement, it first parses the table referenced in the statement, then obtains the metadata of the table, and finally checks the permission.</p>
<p id="mrs_01_2035__ace46ec745fdc464e8d2137741db97e26">The metadata of a Parquet table includes the split information of its files, which is read through the HDFS API. If the table contains many files, reading this information serially through the HDFS API degrades performance. Therefore, if the number of files in the table exceeds the threshold <i><span class="varname" id="mrs_01_2035__v4fed18da42d841408ded5042af03bf5d">spark.sql.sources.parallelSplitDiscovery.threshold</span></i>, a job is generated so that Executors read the information in parallel.</p>
<p id="mrs_01_2035__a2a4f0464896e4c0584294a2e66ba6950">The permission check is performed only after the metadata is obtained. Therefore, when the number of files in the table exceeds the threshold, a job is run before the permission error message <span class="uicontrol" id="mrs_01_2035__u627d81601d8f4767a5b3dd6a506eef6b"><b>Missing Privileges</b></span> is displayed.</p>
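<p>The order of steps described in this answer can be illustrated with a minimal Python sketch. It is a toy model only and is not Spark code: the function name <i>simulate_query</i>, the step labels, and the threshold value 10 are hypothetical and chosen purely for illustration.</p>

```python
def simulate_query(num_files, threshold, has_permission):
    """Toy model of the step order described in this topic; not Spark code."""
    steps = ["parse table"]
    # The split information (metadata) is read before any permission check.
    if num_files > threshold:
        # Too many files to read serially: a job reads them in parallel.
        steps.append("run job to read split information in parallel")
    else:
        steps.append("read split information serially")
    # The permission check happens only after the metadata is obtained.
    steps.append("check permission")
    if not has_permission:
        steps.append("Missing Privileges")
    return steps


if __name__ == "__main__":
    # Hypothetical values: 100 files, threshold of 10, user lacks permission.
    for step in simulate_query(num_files=100, threshold=10, has_permission=False):
        print(step)
```

<p>In this sketch, a table whose file count exceeds the threshold produces the job step before <b>Missing Privileges</b>, whereas a table below the threshold reaches the permission check without any job being run.</p>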
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_2022.html">Spark SQL and DataFrame</a></div>
</div>
</div>