Yang, Tong 48706b7552 MRS COMP-LTS 320-lts.1 version
Reviewed-by: Kacur, Michal <michal.kacur@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2024-04-12 12:51:10 +00:00

<a name="mrs_01_1403"></a>
<h1 class="topictitle1">Main Specifications of CarbonData</h1>
<div id="body8662426"><div class="section" id="mrs_01_1403__en-us_topic_0000001173630772_sb12ac8bd4a094e878a936b41967fe38b"><h4 class="sectiontitle">Main Specifications of CarbonData</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_1403__en-us_topic_0000001173630772_tb53bc1bc349b49e3a726250c59694f09" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Main Specifications of CarbonData</caption><thead align="left"><tr id="mrs_01_1403__en-us_topic_0000001173630772_reb06b2ef11674632a2831d62319db641"><th align="left" class="cellrowborder" valign="top" width="33.33%" id="mcps1.3.1.2.2.4.1.1"><p id="mrs_01_1403__en-us_topic_0000001173630772_a110eaef12c924f138d1dd02b901d46b7">Entity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="14.91%" id="mcps1.3.1.2.2.4.1.2"><p id="mrs_01_1403__en-us_topic_0000001173630772_ab2332187a5194515829b519f57d53f39">Tested Value</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="51.76%" id="mcps1.3.1.2.2.4.1.3"><p id="mrs_01_1403__en-us_topic_0000001173630772_a3ac24084c2e94e3bb73c06224309fe26">Test Environment</p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_1403__en-us_topic_0000001173630772_rfee27f5af15647598d3b3dff26b380b1"><td class="cellrowborder" valign="top" width="33.33%" headers="mcps1.3.1.2.2.4.1.1 "><p id="mrs_01_1403__en-us_topic_0000001173630772_a351bedf14f0d4bb29a50c2229c040da0">Number of tables</p>
</td>
<td class="cellrowborder" valign="top" width="14.91%" headers="mcps1.3.1.2.2.4.1.2 "><p id="mrs_01_1403__en-us_topic_0000001173630772_acf4a6be8278c4f5db162ab6bd0996ad5">10000</p>
</td>
<td class="cellrowborder" valign="top" width="51.76%" headers="mcps1.3.1.2.2.4.1.3 "><p id="mrs_01_1403__en-us_topic_0000001173630772_p1746322016217">3 nodes. 4 vCPUs and 20 GB memory for each executor. Driver memory: 5 GB, 3 executors.</p>
<p id="mrs_01_1403__en-us_topic_0000001173630772_p174641320624">Total columns: 107</p>
<p id="mrs_01_1403__en-us_topic_0000001173630772_p1646482011214">String: 75</p>
<p id="mrs_01_1403__en-us_topic_0000001173630772_p1246472018213">Int: 13</p>
<p id="mrs_01_1403__en-us_topic_0000001173630772_p1546418209214">BigInt: 7</p>
<p id="mrs_01_1403__en-us_topic_0000001173630772_p34642201326">Timestamp: 6</p>
<p id="mrs_01_1403__en-us_topic_0000001173630772_p184643201425">Double: 6</p>
</td>
</tr>
<tr id="mrs_01_1403__en-us_topic_0000001173630772_rf38ac11c039b402e9e8a7d175f9745ce"><td class="cellrowborder" valign="top" width="33.33%" headers="mcps1.3.1.2.2.4.1.1 "><p id="mrs_01_1403__en-us_topic_0000001173630772_a45742fb327ec4fe6981ec9a2e680e43e">Number of table columns</p>
</td>
<td class="cellrowborder" valign="top" width="14.91%" headers="mcps1.3.1.2.2.4.1.2 "><p id="mrs_01_1403__en-us_topic_0000001173630772_a6d5734075a3647f5b287d944a418a302">2000</p>
</td>
<td class="cellrowborder" valign="top" width="51.76%" headers="mcps1.3.1.2.2.4.1.3 "><p id="mrs_01_1403__en-us_topic_0000001173630772_a357be32b47b94829b3ee76f418a7f415">3 nodes. 4 vCPUs and 20 GB memory for each executor. Driver memory: 5 GB, 3 executors.</p>
</td>
</tr>
<tr id="mrs_01_1403__en-us_topic_0000001173630772_r3bfa980bf885421f92fba66affaa98f2"><td class="cellrowborder" valign="top" width="33.33%" headers="mcps1.3.1.2.2.4.1.1 "><p id="mrs_01_1403__en-us_topic_0000001173630772_ab5d548f272124282980a0b2ee0121537">Maximum size of a raw CSV file</p>
</td>
<td class="cellrowborder" valign="top" width="14.91%" headers="mcps1.3.1.2.2.4.1.2 "><p id="mrs_01_1403__en-us_topic_0000001173630772_a38611ce906a24ace878d838c8286878e">200 GB</p>
</td>
<td class="cellrowborder" valign="top" width="51.76%" headers="mcps1.3.1.2.2.4.1.3 "><p id="mrs_01_1403__en-us_topic_0000001173630772_a177bf50adcb0481b9e3937b5dc4ca423">17 cluster nodes. 150 GB memory and 25 vCPUs for each executor. Driver memory: 10 GB, 17 executors.</p>
</td>
</tr>
<tr id="mrs_01_1403__en-us_topic_0000001173630772_r2981e820bc164d248eb1ee85c8006fc5"><td class="cellrowborder" valign="top" width="33.33%" headers="mcps1.3.1.2.2.4.1.1 "><p id="mrs_01_1403__en-us_topic_0000001173630772_ae3c676aa3b75465ebd27a8d13975cddc">Number of CSV files in each folder</p>
</td>
<td class="cellrowborder" valign="top" width="14.91%" headers="mcps1.3.1.2.2.4.1.2 "><p id="mrs_01_1403__en-us_topic_0000001173630772_p16304259734">100 folders. Each folder has 10 files. The size of each file is 50 MB.</p>
</td>
<td class="cellrowborder" valign="top" width="51.76%" headers="mcps1.3.1.2.2.4.1.3 "><p id="mrs_01_1403__en-us_topic_0000001173630772_a9471bd61e6624496afb954899cce269e">3 nodes. 4 vCPUs and 20 GB memory for each executor. Driver memory: 5 GB, 3 executors.</p>
</td>
</tr>
<tr id="mrs_01_1403__en-us_topic_0000001173630772_rb75709a11c04438490567be8c0120a64"><td class="cellrowborder" valign="top" width="33.33%" headers="mcps1.3.1.2.2.4.1.1 "><p id="mrs_01_1403__en-us_topic_0000001173630772_a1de0ae6807f64b1bbc19b7d6447941d1">Number of load folders</p>
</td>
<td class="cellrowborder" valign="top" width="14.91%" headers="mcps1.3.1.2.2.4.1.2 "><p id="mrs_01_1403__en-us_topic_0000001173630772_abc246697a0214e05a2c3e54dd8c0a497">10000</p>
</td>
<td class="cellrowborder" valign="top" width="51.76%" headers="mcps1.3.1.2.2.4.1.3 "><p id="mrs_01_1403__en-us_topic_0000001173630772_ae22a4946563f4e798e06adae7cd630bf">3 nodes. 4 vCPUs and 20 GB memory for each executor. Driver memory: 5 GB, 3 executors.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<p id="mrs_01_1403__en-us_topic_0000001173630772_a079415aa624d47ac808442f0b99711e2">The memory required for data loading depends on the following factors:</p>
<ul id="mrs_01_1403__en-us_topic_0000001173630772_u701965767b65456d80487030699fcb38"><li id="mrs_01_1403__en-us_topic_0000001173630772_l2af0f49f74d84923918bda3049fffca5">Number of columns</li><li id="mrs_01_1403__en-us_topic_0000001173630772_l0b2ad68beb264c75bcae10118a2fc404">Column values</li><li id="mrs_01_1403__en-us_topic_0000001173630772_laf054cede7604ac0b4cbe9c0efa3727b">Concurrency (configured using <strong id="mrs_01_1403__en-us_topic_0000001173630772_b10707073428643">carbon.number.of.cores.while.loading</strong>)</li><li id="mrs_01_1403__en-us_topic_0000001173630772_lc2a18b3175694f13b281c59891adaf9c">Sort size in memory (configured using <strong id="mrs_01_1403__en-us_topic_0000001173630772_b15206796708643">carbon.sort.size</strong>)</li><li id="mrs_01_1403__en-us_topic_0000001173630772_lca40006db1bc48ceae6c6cb97b74644c">Intermediate cache (configured using <strong id="mrs_01_1403__en-us_topic_0000001173630772_b12249244176">carbon.graph.rowset.size</strong>)</li></ul>
<p id="mrs_01_1403__en-us_topic_0000001173630772_p9582339121819">For example, loading an 8 GB CSV file that contains 10 million records and 300 columns, with each row being about 0.8 KB, requires about 10 GB of executor memory when <strong id="mrs_01_1403__en-us_topic_0000001173630772_b11997260528643">carbon.sort.size</strong> is set to <strong id="mrs_01_1403__en-us_topic_0000001173630772_b5516141648643">100000</strong> and the other parameters retain their default values.</p>
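<p>The figures in this example can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of CarbonData: the helper name <strong>csv_size_gb</strong> is hypothetical, and decimal units (1 GB = 1,000,000 KB) are assumed for simplicity.</p>

```python
# Hypothetical helper for illustration only; it is not a CarbonData API.
# Assumption: decimal units, so 1 GB = 1,000,000 KB.

def csv_size_gb(rows: int, row_kb: float) -> float:
    """Approximate the raw CSV size in GB from the row count and average row size in KB."""
    return rows * row_kb / 1_000_000


# Values taken from the example: 10 million records, about 0.8 KB per row.
size_gb = csv_size_gb(10_000_000, 0.8)
print(f"approximate CSV size: {size_gb:.1f} GB")

# The setting named in the example, written as a properties-style line.
sort_size_setting = "carbon.sort.size=100000"
print(sort_size_setting)
```

<p>The result matches the 8 GB file size stated above. The 10 GB executor memory figure is a tested value and cannot be derived from this calculation alone.</p>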
<div class="section" id="mrs_01_1403__en-us_topic_0000001173630772_sb2e16a0c638f471e9f80554fd529674b"><h4 class="sectiontitle">Table Specifications</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_1403__en-us_topic_0000001173630772_t0777d29e5b424bbc89d8fde9b9d849cd" frame="border" border="1" rules="all"><caption><b>Table 2 </b>Table specifications</caption><thead align="left"><tr id="mrs_01_1403__en-us_topic_0000001173630772_r6bf74437b2ac4e8e8aa7906162a867e6"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.5.2.2.3.1.1"><p id="mrs_01_1403__en-us_topic_0000001173630772_af9f487ac78f941838c7c8685c58740f5">Entity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.5.2.2.3.1.2"><p id="mrs_01_1403__en-us_topic_0000001173630772_a20202c3f57e840078348032b24224350">Tested Value</p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_1403__en-us_topic_0000001173630772_r4717992167244de48aef28c5e77dc5d7"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.2.2.3.1.1 "><p id="mrs_01_1403__en-us_topic_0000001173630772_aacede83e3201423e9f4fd8094a371a92">Number of secondary index tables</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.2.2.3.1.2 "><p id="mrs_01_1403__en-us_topic_0000001173630772_a8129eca2c3854b95acbeb5a3cd6948da">10</p>
</td>
</tr>
<tr id="mrs_01_1403__en-us_topic_0000001173630772_r588d6001fe454537994155c6c90a327d"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.2.2.3.1.1 "><p id="mrs_01_1403__en-us_topic_0000001173630772_a7b49f2dc5c27429fb20fa7f7ddd49622">Number of composite columns in a secondary index table</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.2.2.3.1.2 "><p id="mrs_01_1403__en-us_topic_0000001173630772_a3ff1d1c6736c4bf8929ef554df714188">5</p>
</td>
</tr>
<tr id="mrs_01_1403__en-us_topic_0000001173630772_r40ad0e7bc03c46baae4de9d886d5c384"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.2.2.3.1.1 "><p id="mrs_01_1403__en-us_topic_0000001173630772_ad4b3c8e991e0487896157e303a4b7754">Length of a column name in a secondary index table (unit: character)</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.2.2.3.1.2 "><p id="mrs_01_1403__en-us_topic_0000001173630772_a1dc5bc7092bf4668860aaf614366eabf">120</p>
</td>
</tr>
<tr id="mrs_01_1403__en-us_topic_0000001173630772_r546bf6ba11154544991323ad50552710"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.2.2.3.1.1 "><p id="mrs_01_1403__en-us_topic_0000001173630772_a86bccfdf496b43979a4e8acd9fdb9dd0">Length of a secondary index table name (unit: character)</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.2.2.3.1.2 "><p id="mrs_01_1403__en-us_topic_0000001173630772_a0d4463ef317649f2ada13dc69f734a9e">120</p>
</td>
</tr>
<tr id="mrs_01_1403__en-us_topic_0000001173630772_r218e1fa491f24d0db5682ed010f92f5e"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.2.2.3.1.1 "><p id="mrs_01_1403__en-us_topic_0000001173630772_a6e9d73f13d0f44c9ad77489981b0a5ae">Cumulative length of all secondary index table names + column names in an index table* (unit: character)</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.2.2.3.1.2 "><p id="mrs_01_1403__en-us_topic_0000001173630772_a02abcad2dc6346c2849554a68a967f10">3800**</p>
</td>
</tr>
</tbody>
</table>
</div>
<div class="note" id="mrs_01_1403__en-us_topic_0000001173630772_n499441d66f9441e49a4602fe8fb237cb"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="mrs_01_1403__en-us_topic_0000001173630772_ua77b2cb32d9b46c6b9bb14fa7ba69b80"><li id="mrs_01_1403__en-us_topic_0000001173630772_lc8bfc589d4c84e1fb407404bedc9684f">* The number of characters in the column names of an index table is subject to the upper limit allowed by Hive or the upper limit of available resources.</li><li id="mrs_01_1403__en-us_topic_0000001173630772_l429624ca8db6447cab0d94c5d206a24c">** Secondary index tables are registered using Hive and stored in Hive SERDEPROPERTIES in JSON format. The value of <strong id="mrs_01_1403__en-us_topic_0000001173630772_b21428846748643">SERDEPROPERTIES</strong> supported by Hive can contain a maximum of 4,000 characters and cannot be changed.</li></ul>
</div></div>
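<p>The tested values in Table 2 can be checked before secondary indexes are created. The following is a minimal sketch under stated assumptions: the function <strong>check_secondary_indexes</strong> and its input format are hypothetical and not part of CarbonData, the tested values are treated as limits purely for illustration, and the cumulative length is assumed to be the sum of all index table names and their column names.</p>

```python
# Tested values from Table 2, treated here as limits for illustration only.
MAX_INDEX_TABLES = 10
MAX_COMPOSITE_COLUMNS = 5
MAX_NAME_LENGTH = 120
MAX_CUMULATIVE_LENGTH = 3800


def check_secondary_indexes(indexes: dict[str, list[str]]) -> list[str]:
    """Return one message per tested value that the planned indexes exceed.

    `indexes` maps each secondary index table name to its list of column names.
    This is a hypothetical helper, not a CarbonData API.
    """
    problems = []
    if len(indexes) > MAX_INDEX_TABLES:
        problems.append(f"{len(indexes)} index tables exceed {MAX_INDEX_TABLES}")

    cumulative = 0
    for table, columns in indexes.items():
        if len(table) > MAX_NAME_LENGTH:
            problems.append(f"index table name '{table[:20]}...' is too long")
        if len(columns) > MAX_COMPOSITE_COLUMNS:
            problems.append(f"index table '{table[:20]}' has {len(columns)} columns")
        for column in columns:
            if len(column) > MAX_NAME_LENGTH:
                problems.append(f"column name '{column[:20]}...' is too long")
        # Assumption: table names and column names both count toward the total.
        cumulative += len(table) + sum(len(column) for column in columns)

    if cumulative > MAX_CUMULATIVE_LENGTH:
        problems.append(f"cumulative name length {cumulative} exceeds {MAX_CUMULATIVE_LENGTH}")
    return problems


# Example: one index table with two columns stays within every tested value.
print(check_secondary_indexes({"idx_city": ["city", "country"]}))
```

<p>An empty list means that no tested value is exceeded. The cumulative value of 3800 leaves headroom below the 4,000-character SERDEPROPERTIES maximum described in the note.</p>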
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1401.html">Spark CarbonData Overview</a></div>
</div>
</div>