Reviewed-by: Kacur, Michal <michal.kacur@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="mrs_01_24507"></a><a name="mrs_01_24507"></a>
<h1 class="topictitle1">Using the ZSTD_JNI Compression Algorithm to Compress Hive ORC Tables</h1>
<div id="body0000001533533490"><div class="section" id="mrs_01_24507__section10795559152719"><h4 class="sectiontitle">Scenario</h4><p id="mrs_01_24507__p33657411813">ZSTD_JNI is a native implementation of the ZSTD compression algorithm. Compared with ZSTD, ZSTD_JNI offers higher read/write efficiency and a higher compression ratio. It also allows you to specify the compression level and the compression mode for data columns in specific formats.</p>
<p id="mrs_01_24507__p3920114919205">Currently, ZSTD_JNI can compress only ORC tables, whereas ZSTD can compress tables in all storage formats. You are therefore advised to use this feature only when you have high requirements on data compression.</p>
<div class="note" id="mrs_01_24507__note111992013152013"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_24507__p6199181314201">This section applies only to MRS 3.<span id="mrs_01_24507__ph1697119202432">2.0</span> or later.</p>
</div></div>
</div>
<div class="section" id="mrs_01_24507__section711185917428"><h4 class="sectiontitle">Example</h4><ol id="mrs_01_24507__ol1093394115818"><li id="mrs_01_24507__li1793444135814"><span>Log in to the node where the client is installed as the Hive client installation user.</span></li><li id="mrs_01_24507__li44323875912"><span>Run the following command to switch to the client installation directory, for example, <strong id="mrs_01_24507__b298941015176"><span id="mrs_01_24507__ph381512063917">/opt/client</span></strong>:</span><p><p id="mrs_01_24507__p545358125919"><strong id="mrs_01_24507__b24534815591">cd <span id="mrs_01_24507__ph135515843918">/opt/client</span></strong></p>
</p></li><li id="mrs_01_24507__li545317816598"><span>Run the following command to configure environment variables:</span><p><p id="mrs_01_24507__p194531985598"><strong id="mrs_01_24507__b1945312815912">source bigdata_env</strong></p>
</p></li><li id="mrs_01_24507__li610182915598"><span>Check whether the cluster is running in security mode.</span><p><ul id="mrs_01_24507__ul1637872275912"><li id="mrs_01_24507__li63783223598">If yes, run the following command to perform user authentication and then go to <a href="#mrs_01_24507__li333142945916">5</a>.<p id="mrs_01_24507__p10378152213593"><strong id="mrs_01_24507__b16378132211593">kinit</strong> <i><span class="varname" id="mrs_01_24507__varname237852295913">Hive service user</span></i></p>
</li><li id="mrs_01_24507__li13781322135917">If no, go to <a href="#mrs_01_24507__li333142945916">5</a>.</li></ul>
</p></li><li id="mrs_01_24507__li333142945916"><a name="mrs_01_24507__li333142945916"></a><a name="li333142945916"></a><span>Run the following command to log in to the Hive client:</span><p><p id="mrs_01_24507__p173315293591"><strong id="mrs_01_24507__b1133202915591">beeline</strong></p>
</p></li><li id="mrs_01_24507__li642203917591"><span>Create a table in ZSTD_JNI compression format as follows:</span><p><ul id="mrs_01_24507__ul18738103319596"><li id="mrs_01_24507__li673893311594">Run the following example command to set the <strong id="mrs_01_24507__b73056918469">orc.compress</strong> parameter to <strong id="mrs_01_24507__b3947171616462">ZSTD_JNI</strong> when using this compression algorithm to create an ORC table:<p id="mrs_01_24507__p673833395919"><strong id="mrs_01_24507__b373817331591">create table tab_1(...) stored as orc TBLPROPERTIES("orc.compress"="ZSTD_JNI");</strong></p>
</li></ul>
<ul id="mrs_01_24507__ul18739633115914"><li id="mrs_01_24507__li0739193375918">The compression level of ZSTD_JNI ranges from 1 to 19. A larger value indicates a higher compression ratio but a slower read/write speed. A smaller value indicates a lower compression ratio but a faster read/write speed. The default value is <strong id="mrs_01_24507__b37861044155010">6</strong>. You can set the compression level using the <strong id="mrs_01_24507__b1360194525114">orc.global.compress.level</strong> parameter, as shown in the following example:<p id="mrs_01_24507__p207391233165912"><strong id="mrs_01_24507__b1073963310597">create table tab_1(...) stored as orc TBLPROPERTIES("orc.compress"="ZSTD_JNI", 'orc.global.compress.level'='3');</strong></p>
</li></ul>
<ul id="mrs_01_24507__ul2739633185919"><li id="mrs_01_24507__li1073933305912">This compression algorithm also allows you to compress service data columns in specific data formats. Currently, the following are supported: JSON data columns, Base64 data columns, timestamp data columns, and UUID data columns. To use this function, set the <strong id="mrs_01_24507__b12718415612">orc.column.compress</strong> parameter when you create the table.<p id="mrs_01_24507__p1073923315911">The following example shows how to use ZSTD_JNI to compress data in the JSON, Base64, timestamp, and UUID formats.</p>
<p id="mrs_01_24507__p1573919331590"><strong id="mrs_01_24507__b673913317599">create table test_orc_zstd_jni(f1 int, f2 string, f3 string, f4 string, f5 string) stored as orc</strong></p>
<p id="mrs_01_24507__p16739193365912"><strong id="mrs_01_24507__b1573983316591">TBLPROPERTIES('orc.compress'='ZSTD_JNI', 'orc.column.compress'='[{"type":"cjson","columns":"f2"},{"type":"base64","columns":"f3"},{"type":"gorilla","columns":{"format": "yyyy-MM-dd HH:mm:ss.SSS", "columns": "f4"}},{"type":"uuid","columns":"f5"}]');</strong></p>
<p id="mrs_01_24507__p1873963365911">You can then insert data in the corresponding formats, based on site requirements, to further compress the data.</p>
</li></ul>
</p></li></ol>
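<p>Steps 2 to 5 above can also be run non-interactively. The following is a minimal sketch, not part of the MRS client: the function name <strong>run_hive_statement</strong> and the variables <strong>CLIENT_DIR</strong> and <strong>HIVE_USER</strong> are assumptions made for illustration. It assumes that <strong>beeline</strong> accepts a statement through its <strong>-e</strong> option and that <strong>HIVE_USER</strong> is set only for clusters in security mode.</p>

```shell
# Hypothetical wrapper (not shipped with the MRS client) around steps 2 to 5.
# CLIENT_DIR and HIVE_USER are assumed variable names for this sketch.
# The body runs in a subshell so the caller's working directory is unchanged.
run_hive_statement() (
    client_dir=${CLIENT_DIR:-/opt/client}

    # Step 2: switch to the client installation directory.
    cd "$client_dir" || exit 1

    # Step 3: load the client environment variables.
    [ -f ./bigdata_env ] || exit 1
    . ./bigdata_env

    # Step 4 (security mode only): authenticate as the Hive service user.
    if [ -n "${HIVE_USER:-}" ]; then
        kinit "$HIVE_USER" || exit 1
    fi

    # Step 5: pass the statement to the Hive client.
    beeline -e "$1"
)
```

<p>For example, <strong>run_hive_statement "show tables;"</strong> runs a single statement and returns. In security mode, <strong>kinit</strong> still prompts for the password of the Hive service user.</p>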
</div>
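<p>Because the compression level must be between 1 and 19, it can help to validate the value before the table is created. The following sketch uses a hypothetical helper, <strong>zstd_jni_ddl</strong>, that is not part of Hive or MRS. It only prints a <strong>create table</strong> statement in the form shown in the example above and falls back to the default level of 6 when no level is given.</p>

```shell
# Hypothetical helper (not part of Hive or MRS): prints a CREATE TABLE
# statement for an ORC table compressed with ZSTD_JNI.
# Usage: zstd_jni_ddl TABLE COLUMNS [LEVEL]
zstd_jni_ddl() {
    table=$1
    columns=$2
    level=${3:-6}   # 6 is the default level stated in this section

    # Reject anything that is not a plain decimal number.
    case $level in
        ''|*[!0-9]*)
            echo "compression level must be an integer" >&2
            return 1
            ;;
    esac

    # ZSTD_JNI compression levels range from 1 to 19.
    if [ "$level" -lt 1 ] || [ "$level" -gt 19 ]; then
        echo "compression level must be between 1 and 19" >&2
        return 1
    fi

    printf "create table %s(%s) stored as orc TBLPROPERTIES('orc.compress'='ZSTD_JNI', 'orc.global.compress.level'='%s');\n" \
        "$table" "$columns" "$level"
}
```

<p>The printed statement can be pasted into the Hive client, or passed to it directly, for example with <strong>beeline -e "$(zstd_jni_ddl tab_1 'id int, name string' 3)"</strong>, assuming the client environment has already been loaded.</p>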
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_0581.html">Using Hive</a></div>
</div>
</div>