Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="mrs_01_1419"></a><a name="mrs_01_1419"></a>
<h1 class="topictitle1">Suggestions for Creating CarbonData Tables</h1>
<div id="body1595920210172"><div class="section" id="mrs_01_1419__s9555ec02d87e475d98d18505caab4aab"><h4 class="sectiontitle">Scenario</h4><p id="mrs_01_1419__ac8e7fd34058a49c290bc842c5063be97">This section provides suggestions based on more than 50 test cases to help you create CarbonData tables with higher query performance.</p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_1419__ta61755b42ee449f5890c25220972fa25" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Columns in the CarbonData table</caption><thead align="left"><tr id="mrs_01_1419__rbcb08ed62694484b93f8dbcb09e14b75"><th align="left" class="cellrowborder" valign="top" width="25.169999999999998%" id="mcps1.3.1.3.2.5.1.1"><p id="mrs_01_1419__a1f5e66617533477292b541667c8bcb5e"><strong id="mrs_01_1419__a2a0f56b2a8c441b3895d3b1fbd063987">Column name</strong></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="24.83%" id="mcps1.3.1.3.2.5.1.2"><p id="mrs_01_1419__a7f5a8a650eeb4e7e810ebc5690f09884"><strong id="mrs_01_1419__a9f115180de394119ba25b03053a54589">Data type</strong></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="25%" id="mcps1.3.1.3.2.5.1.3"><p id="mrs_01_1419__a958b5f7b32874454b293d9a3f6f685ff"><strong id="mrs_01_1419__a175b712c5d084358963df2a9a73e09c4">Cardinality</strong></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="25%" id="mcps1.3.1.3.2.5.1.4"><p id="mrs_01_1419__a45f32c1be91d4cea83f7959a99c263cb"><strong id="mrs_01_1419__ab6e5ec9229de4e15b49044b957d12126">Attribute</strong></p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_1419__r50f8f535738045f8b5d39c3288a9ad1a"><td class="cellrowborder" valign="top" width="25.169999999999998%" headers="mcps1.3.1.3.2.5.1.1 "><p id="mrs_01_1419__af09eded9713e4d2aae76a376fb70041f">msisdn</p>
</td>
<td class="cellrowborder" valign="top" width="24.83%" headers="mcps1.3.1.3.2.5.1.2 "><p id="mrs_01_1419__ac219a6b99ea34187a9e4179b3d4eee9b">String</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.1.3.2.5.1.3 "><p id="mrs_01_1419__ab46bdc84115346c495e269683d61e844">30 million</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.1.3.2.5.1.4 "><p id="mrs_01_1419__a9280631d30c44fbdb462f35f7864f030">dimension</p>
</td>
</tr>
<tr id="mrs_01_1419__ra7ff6e5e43d049d3be65332fec7d2484"><td class="cellrowborder" valign="top" width="25.169999999999998%" headers="mcps1.3.1.3.2.5.1.1 "><p id="mrs_01_1419__aaad6e719df5c4d1a9ccee15d8878b5c4">BEGIN_TIME</p>
</td>
<td class="cellrowborder" valign="top" width="24.83%" headers="mcps1.3.1.3.2.5.1.2 "><p id="mrs_01_1419__a23438751eb14428daece149251bd0593">bigint</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.1.3.2.5.1.3 "><p id="mrs_01_1419__a272cb36059384931873f3933cf31028e">10,000</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.1.3.2.5.1.4 "><p id="mrs_01_1419__a9601fa6d74444079bc6eaff8e36a7854">dimension</p>
</td>
</tr>
<tr id="mrs_01_1419__rc9e8502adfdc4a639044a5b1ec998392"><td class="cellrowborder" valign="top" width="25.169999999999998%" headers="mcps1.3.1.3.2.5.1.1 "><p id="mrs_01_1419__a24510870768d4db28bc475bb62ab75c5">host</p>
</td>
<td class="cellrowborder" valign="top" width="24.83%" headers="mcps1.3.1.3.2.5.1.2 "><p id="mrs_01_1419__aecca792ab25f41e19af773e50d2b9937">String</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.1.3.2.5.1.3 "><p id="mrs_01_1419__a302bce8956374c2f8f2415f2de68c1d9">1 million</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.1.3.2.5.1.4 "><p id="mrs_01_1419__abd89877856144818b76dec4cad4b0d23">dimension</p>
</td>
</tr>
<tr id="mrs_01_1419__r09b1bd1318c8417199d98aff5430d40b"><td class="cellrowborder" valign="top" width="25.169999999999998%" headers="mcps1.3.1.3.2.5.1.1 "><p id="mrs_01_1419__a73f6bd72d9cc4fb0b383d2edde06f438">dime_1</p>
</td>
<td class="cellrowborder" valign="top" width="24.83%" headers="mcps1.3.1.3.2.5.1.2 "><p id="mrs_01_1419__a1bc88e5711b84774ad5b69cd7faae130">String</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.1.3.2.5.1.3 "><p id="mrs_01_1419__a7bd8ddd081e149a39812833bae423d74">1,000</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.1.3.2.5.1.4 "><p id="mrs_01_1419__ad7ed792665994ba0a1725b51dcaab549">dimension</p>
</td>
</tr>
<tr id="mrs_01_1419__r72ca47585980423ab0c49248c24d7bff"><td class="cellrowborder" valign="top" width="25.169999999999998%" headers="mcps1.3.1.3.2.5.1.1 "><p id="mrs_01_1419__a288e6db185d64ef481ae8adcf221fa29">dime_2</p>
</td>
<td class="cellrowborder" valign="top" width="24.83%" headers="mcps1.3.1.3.2.5.1.2 "><p id="mrs_01_1419__a2a9ebf815a344a04aaf3d1cb4671bf9a">String</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.1.3.2.5.1.3 "><p id="mrs_01_1419__ad9c15a00c2b9446bbe3be2a50c9a9383">500</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.1.3.2.5.1.4 "><p id="mrs_01_1419__a7f2b7a15b70547c380d2c6de34c3ea69">dimension</p>
</td>
</tr>
<tr id="mrs_01_1419__rceaae94842e343159852202a8d1a7308"><td class="cellrowborder" valign="top" width="25.169999999999998%" headers="mcps1.3.1.3.2.5.1.1 "><p id="mrs_01_1419__a9eac6e3167814b9cb0ae3fa65b87a2e2">dime_3</p>
</td>
<td class="cellrowborder" valign="top" width="24.83%" headers="mcps1.3.1.3.2.5.1.2 "><p id="mrs_01_1419__ad672eae553d0495a9498e94ab9d2de56">String</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.1.3.2.5.1.3 "><p id="mrs_01_1419__a1647d379c9114accabdde0b0607cee0d">800</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.1.3.2.5.1.4 "><p id="mrs_01_1419__a20db0f448d404404b260a955923d11d3">dimension</p>
</td>
</tr>
<tr id="mrs_01_1419__re78c0e6afbee4c02a6bfb4be4ed8a2d3"><td class="cellrowborder" valign="top" width="25.169999999999998%" headers="mcps1.3.1.3.2.5.1.1 "><p id="mrs_01_1419__a348d0a4e6bd04e7a8780d2e153d6934d">counter_1</p>
</td>
<td class="cellrowborder" valign="top" width="24.83%" headers="mcps1.3.1.3.2.5.1.2 "><p id="mrs_01_1419__a1e6cb30a4b554c209fb8e8115043d71a">numeric(20,0)</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.1.3.2.5.1.3 "><p id="mrs_01_1419__ac0ef7010312349ef8f7baba4376bf50b">NA</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.1.3.2.5.1.4 "><p id="mrs_01_1419__ab2c0953a1ea94d44a60bb327dbc8caad">measure</p>
</td>
</tr>
<tr id="mrs_01_1419__rd22f4475e37e4159875be7e5ac6164be"><td class="cellrowborder" valign="top" width="25.169999999999998%" headers="mcps1.3.1.3.2.5.1.1 "><p id="mrs_01_1419__a833835c0a4374c779148a77202086de9">...</p>
</td>
<td class="cellrowborder" valign="top" width="24.83%" headers="mcps1.3.1.3.2.5.1.2 "><p id="mrs_01_1419__a6af56abacdc24edc8cd9f10522285358">...</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.1.3.2.5.1.3 "><p id="mrs_01_1419__a1c218e46d6bd42b7b5537192a6ebd58f">NA</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.1.3.2.5.1.4 "><p id="mrs_01_1419__aeae0038109dc49e8908a7d1ac5f1e525">measure</p>
</td>
</tr>
<tr id="mrs_01_1419__rfc6623a91d574eb1b287c9bfbc79aa75"><td class="cellrowborder" valign="top" width="25.169999999999998%" headers="mcps1.3.1.3.2.5.1.1 "><p id="mrs_01_1419__a8097095bf3404f3e9dd6cd4c1287bebe">counter_100</p>
</td>
<td class="cellrowborder" valign="top" width="24.83%" headers="mcps1.3.1.3.2.5.1.2 "><p id="mrs_01_1419__ab9ad748162dd45d98c9ce070e1a93dfc">numeric(20,0)</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.1.3.2.5.1.3 "><p id="mrs_01_1419__af01e4b9d6ac24736a921c5bfc4359fb4">NA</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.1.3.2.5.1.4 "><p id="mrs_01_1419__aa8a19f7d06f94bee990f272f71469a57">measure</p>
</td>
</tr>
</tbody>
</table>
</div>
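<p>The cardinalities in Table 1 are taken as given. As a minimal sketch of how such figures could be estimated before choosing a column order, the following query uses the Spark SQL <strong>approx_count_distinct</strong> function. The table name <strong>source_table</strong> is hypothetical and stands for whichever table holds the raw data; it is not part of the original example.</p>
<pre class="screen">-- source_table is a hypothetical staging table that holds the raw data
select
  approx_count_distinct(msisdn) as msisdn_cardinality,
  approx_count_distinct(host)   as host_cardinality,
  approx_count_distinct(dime_1) as dime_1_cardinality
from source_table;</pre>
<p>An approximate count is usually sufficient here, because only the relative order of the cardinalities matters for the suggestions that follow.</p>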
</div>
<div class="section" id="mrs_01_1419__sf72ad57b86234eb6ac4f5bef5f48edd5"><h4 class="sectiontitle">Procedure</h4><ul id="mrs_01_1419__u841234154f00460ba7bdea40baf9d101"><li id="mrs_01_1419__l30d1a1b0c43f4b39bf1164bbc4deb932">If the table to be created contains a column that is frequently used for filtering (for example, a column that appears in more than 80% of filtering scenarios), <p id="mrs_01_1419__a6089d73174e64da8a51d13da1c47fa76"><a name="mrs_01_1419__l30d1a1b0c43f4b39bf1164bbc4deb932"></a><a name="l30d1a1b0c43f4b39bf1164bbc4deb932"></a>optimize as follows: </p>
<p id="mrs_01_1419__a3d569e79f273442a823f634a38a58eb7">Place this column first in <strong id="mrs_01_1419__b173122315118">sort_columns</strong>. </p>
<p id="mrs_01_1419__ac9d3093bc2ce4eae9a0b69089bac4c51">For example, if <strong id="mrs_01_1419__b68131527151112">msisdn</strong> is the most frequently used filter criterion in queries, place it first. Run the following command to create the table. Queries that use <strong id="mrs_01_1419__b27722333782236">msisdn</strong> as the filter condition then perform well.</p>
<pre class="screen" id="mrs_01_1419__s006ec0e83ebb432f9bf81c9674021128">create table carbondata_table(
msisdn String,
...
)STORED AS carbondata TBLPROPERTIES ('SORT_COLUMNS'='msisdn');</pre>
</li><li id="mrs_01_1419__l8034a928bd924e0cba8fdbb15d6c701d">If the table to be created has multiple columns that are frequently used for filtering, <p id="mrs_01_1419__a9a111bf0270847f591539d3a8d4aa973"><a name="mrs_01_1419__l8034a928bd924e0cba8fdbb15d6c701d"></a><a name="l8034a928bd924e0cba8fdbb15d6c701d"></a>optimize as follows: </p>
<p id="mrs_01_1419__a4b1eababc9e94ae688757b5aeba02198">Create an index for the columns. </p>
<p id="mrs_01_1419__aaf5516ceba9b4daab6a67eb556c09804">For example, if <strong id="mrs_01_1419__b18427542171115">msisdn</strong>, <strong id="mrs_01_1419__b219911453114">host</strong>, and <strong id="mrs_01_1419__b1842804710117">dime_1</strong> are frequently used columns, the <strong id="mrs_01_1419__b10800105014113">sort_columns</strong> column sequence is "dime_1 -> host -> msisdn...", that is, in ascending order of cardinality. Run the following command to create the table. This command improves the filtering performance of <strong id="mrs_01_1419__b207917258127">dime_1</strong>, <strong id="mrs_01_1419__b4807152751210">host</strong>, and <strong id="mrs_01_1419__b17641629181220">msisdn</strong>.</p>
<pre class="screen" id="mrs_01_1419__s92b1f178ca7d4ee08b8d96b6958de2f2">create table carbondata_table(
dime_1 String,
host String,
msisdn String,
dime_2 String,
dime_3 String,
...
)STORED AS carbondata
TBLPROPERTIES ('SORT_COLUMNS'='dime_1,host,msisdn');</pre>
</li><li id="mrs_01_1419__l5adb5aed0720414ab300cee7a88c70f0">If all columns are used for filtering with similar frequency, <p id="mrs_01_1419__a877a9da27175421da0f783d778c3d823"><a name="mrs_01_1419__l5adb5aed0720414ab300cee7a88c70f0"></a><a name="l5adb5aed0720414ab300cee7a88c70f0"></a>optimize as follows: </p>
<p id="mrs_01_1419__a0187ca8024ef4b82924507cba5cc9d64">List the columns in <strong id="mrs_01_1419__b1055710420124">sort_columns</strong> in ascending order of cardinality.</p>
<p id="mrs_01_1419__a26c717e81f3e4fc08e76a118688c8a6d">Run the following command to create the table:</p>
<pre class="screen" id="mrs_01_1419__s70ea3e0c4cf343faa72aa0a4a15c2ba9">create table carbondata_table(
Dime_1 String,
BEGIN_TIME bigint,
HOST String,
MSISDN String,
...
)STORED AS carbondata
TBLPROPERTIES ('SORT_COLUMNS'='dime_2,dime_3,dime_1,BEGIN_TIME,host,msisdn');</pre>
</li><li id="mrs_01_1419__lac3652faf1c24c90a89c6ba20cafd898">Create the table with the columns in ascending order of cardinality. Then create secondary indexes on the columns with higher cardinality. The statements for creating the indexes are as follows:<pre class="screen" id="mrs_01_1419__s59de055404f74be79f0db7b34a335e07">create index carbondata_table_index_msidn on table carbondata_table (
MSISDN) as 'carbondata' PROPERTIES <em id="mrs_01_1419__a46d90959a11c4b06b28b1e6b9ccdd977">('table_blocksize'='128')</em>;
create index carbondata_table_index_host on table carbondata_table (
host) as 'carbondata' PROPERTIES <em id="mrs_01_1419__aec90d5b2aa014ccf987d75a5653f7114">('table_blocksize'='128')</em>;</pre>
</li><li id="mrs_01_1419__laf10d572f8574372841868e9878c711e">Measure columns that do not require high precision do not need the numeric(20,0) data type. Use the double data type instead of numeric(20,0) to improve query performance.<p id="mrs_01_1419__ad68d6d5bba274f98879c003872ec84ef"><a name="mrs_01_1419__laf10d572f8574372841868e9878c711e"></a><a name="laf10d572f8574372841868e9878c711e"></a>In a test case, this change reduced the query execution time from 15 seconds to 3 seconds, improving performance by nearly 5 times. The command for creating the table is as follows:</p>
<pre class="screen" id="mrs_01_1419__s259999755d4a4939ad08e752b220ae97">create table carbondata_table(
Dime_1 String,
BEGIN_TIME bigint,
HOST String,
MSISDN String,
counter_1 double,
counter_2 double,
...
counter_100 double
)STORED AS carbondata;</pre>
</li><li id="mrs_01_1419__ld83eac8118334e46b574f5268691540e">If the values of a column (<strong id="mrs_01_1419__b178446434382236">start_time</strong>, for example) are incremental:<p id="mrs_01_1419__a40bf0498198541188a4c979eee290513">For example, if data is loaded to CarbonData every day, <strong id="mrs_01_1419__b196021389882236">start_time</strong> increases with each load. In this case, put the <strong id="mrs_01_1419__b102851575082236">start_time</strong> column at the end of <strong id="mrs_01_1419__b750983582236">sort_columns</strong>, because the min/max index works efficiently on incremental values. The command for creating the table is as follows, with BEGIN_TIME as the incremental column:</p>
<pre class="screen" id="mrs_01_1419__sb468d2dba7ca433e89ac0ebb63d671d3">create table carbondata_table(
Dime_1 String,
HOST String,
MSISDN String,
counter_1 double,
counter_2 double,
BEGIN_TIME bigint,
...
counter_100 double
)STORED AS carbondata
TBLPROPERTIES ('SORT_COLUMNS'='dime_2,dime_3,dime_1..BEGIN_TIME');</pre>
</li></ul>
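<p>To check the effect of these suggestions, one option is to inspect the created table and then run a representative filter query. The following is a sketch under assumptions: the <strong>msisdn</strong> literal is a made-up value, and the exact rows printed by <strong>describe formatted</strong> depend on the CarbonData and Spark versions in use.</p>
<pre class="screen">-- Show table details, including the table properties set at creation time
-- (the output layout varies by version)
describe formatted carbondata_table;

-- Filter on a column listed in sort_columns; the literal is a hypothetical value
select msisdn, host, dime_1
from carbondata_table
where msisdn = '13800000000';</pre>
<p>Comparing the run time of such a query across the table definitions above, on the same data, is a simple way to confirm which column order suits a given workload.</p>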
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1417.html">CarbonData Performance Tuning</a></div>
</div>
</div>