
<a name="mrs_01_1465"></a>

<h1 class="topictitle1">Why Does CarbonData Require Additional Executors Even Though the Parallelism Is Greater Than the Number of Blocks to Be Processed?</h1>
<div id="body1595920215969"><div class="section" id="mrs_01_1465__s1ecbf0ee302649289b27c11bbd623865"><h4 class="sectiontitle">Question</h4><p id="mrs_01_1465__a8863d1ff72b643d6bf72a02ebbf2f5c5">Why does CarbonData require additional executors even though the parallelism is greater than the number of blocks to be processed?</p>
</div>
<div class="section" id="mrs_01_1465__sbeb71b380603446d80adc26913320477"><h4 class="sectiontitle">Answer</h4><p id="mrs_01_1465__aa87402294c1c4419994e7d06e871e7ff">CarbonData block distribution optimizes data processing as follows:</p>
<ol id="mrs_01_1465__offe7b031e82a4577a436ab9aa5559c9c"><li id="mrs_01_1465__lb52aa5863ede45b3bff86099d8e0d86b">Optimize data processing parallelism.</li><li id="mrs_01_1465__l848e1b99cf744f148181cbfba67ad568">Optimize parallel reading of block data.</li></ol>
</div>
<p id="mrs_01_1465__ad2573cd92e6f4ab0b33ef25eae2ea7a9">To optimize parallel processing and parallel reads, CarbonData requests executors based on block locality so that it can obtain executors on all nodes that store the blocks. As a result, it may request more executors than the number of blocks alone would suggest.</p>
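<p>The locality-driven request can be illustrated with a toy model. This is a hypothetical sketch, not CarbonData's scheduler: it assumes that one executor is requested on every node that stores a replica of at least one block. The helper names <code>locality_hosts</code> and <code>executor_request</code> and the sample block-to-node map are invented for illustration.</p>

```python
# Toy model of locality-aware executor requests (hypothetical; not CarbonData's
# actual scheduler). Assumption: one executor is requested on every node that
# stores a replica of at least one block.


def locality_hosts(block_locations: dict[str, list[str]]) -> list[str]:
    """Return the sorted list of nodes that hold at least one block replica."""
    hosts: set[str] = set()
    for replicas in block_locations.values():
        hosts.update(replicas)
    return sorted(hosts)


def executor_request(block_locations: dict[str, list[str]],
                     executors_per_host: int = 1) -> int:
    """Number of executors requested so that every data node gets executors."""
    return len(locality_hosts(block_locations)) * executors_per_host


# Hypothetical layout: 4 blocks, 3 replicas each, spread over 6 nodes.
blocks = {
    "block-0": ["node1", "node2", "node3"],
    "block-1": ["node2", "node4", "node5"],
    "block-2": ["node3", "node5", "node6"],
    "block-3": ["node1", "node4", "node6"],
}

print(len(blocks))               # 4 blocks to process
print(executor_request(blocks))  # 6 executors, one per node holding data
```

<p>Under this model, four blocks whose replicas are spread over six nodes lead to a request for six executors, so the executor count exceeds the number of blocks.</p>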
<p id="mrs_01_1465__a18f29898bc5f445287f370583bfe86b5">If dynamic allocation is enabled, configure the following properties:</p>
<ol id="mrs_01_1465__o4dc6c4f55b1e4aea87f176d7608647c4"><li id="mrs_01_1465__ld94ca10f09d746b0b94e4e6717f86517">Set <span class="parmname" id="mrs_01_1465__p6aaf453793884c468bcd9484380c9553"><b>spark.dynamicAllocation.executorIdleTimeout</b></span> to 15 minutes (or to the average query time).</li><li id="mrs_01_1465__lf281e981503941ceb89ae8f759a5bfff">Set <strong id="mrs_01_1465__b4802957171814">spark.dynamicAllocation.maxExecutors</strong> to a value that matches the cluster resources. Do not keep the default value <strong id="mrs_01_1465__b198032579185">2048</strong>; otherwise, CarbonData will request up to that maximum number of executors.</li><li id="mrs_01_1465__l2fa9aa586d7f4eff854d1a0263dac92c">For a larger cluster, set <strong id="mrs_01_1465__b107891810111912">carbon.dynamicAllocation.schedulerTimeout</strong>, the maximum time the scheduler waits for executors to become active, to a value from 10 to 15 seconds. The default value is 5 seconds.</li><li id="mrs_01_1465__l9749b19fd0c34d3a8ce13737c0db4159">Set <strong id="mrs_01_1465__b4746182731910">carbon.scheduler.minRegisteredResourcesRatio</strong> to a value from 0.1 to 1.0. The default value is <strong id="mrs_01_1465__b674762761912">0.8</strong>. Block distribution starts once the ratio of registered executors to requested executors reaches the value of <strong id="mrs_01_1465__b13983114141911">carbon.scheduler.minRegisteredResourcesRatio</strong>.</li></ol>
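<p>The interaction between the two CarbonData properties above can be sketched as follows. This is a simplified, hypothetical model rather than CarbonData code: it assumes that block distribution starts as soon as the ratio of registered to requested executors reaches <strong>carbon.scheduler.minRegisteredResourcesRatio</strong> or the wait reaches <strong>carbon.dynamicAllocation.schedulerTimeout</strong>, whichever comes first. The function names are illustrative only.</p>

```python
# Simplified, hypothetical model of when block distribution may start.
# Assumptions (not taken from CarbonData source code):
#   * distribution starts once registered / requested executors reaches
#     carbon.scheduler.minRegisteredResourcesRatio, or
#   * the wait reaches carbon.dynamicAllocation.schedulerTimeout,
#   whichever happens first.

DEFAULT_RATIO = 0.8    # default of carbon.scheduler.minRegisteredResourcesRatio
DEFAULT_TIMEOUT_S = 5  # default of carbon.dynamicAllocation.schedulerTimeout (seconds)


def validate_ratio(ratio: float) -> float:
    """Reject ratios outside the documented range of 0.1 to 1.0."""
    if not 0.1 <= ratio <= 1.0:
        raise ValueError(f"ratio must be between 0.1 and 1.0, got {ratio}")
    return ratio


def can_start_distribution(registered: int, requested: int, waited_s: float,
                           ratio: float = DEFAULT_RATIO,
                           timeout_s: float = DEFAULT_TIMEOUT_S) -> bool:
    """Return True when the assumed start condition is met."""
    validate_ratio(ratio)
    if requested <= 0:
        return True  # nothing requested, nothing to wait for
    enough_executors = registered / requested >= ratio
    timed_out = waited_s >= timeout_s
    return enough_executors or timed_out


# 7 of 10 executors after 3 s: below 0.8 and before the 5 s default timeout.
print(can_start_distribution(7, 10, waited_s=3))                # False
# 8 of 10 executors: the 0.8 ratio is reached.
print(can_start_distribution(8, 10, waited_s=3))                # True
# 7 of 10 after 6 s: the 5 s default timeout has already expired.
print(can_start_distribution(7, 10, waited_s=6))                # True
# Same case with a 12 s timeout (larger cluster): still waiting.
print(can_start_distribution(7, 10, waited_s=6, timeout_s=12))  # False
```

<p>Under this model, a large cluster whose executors register slowly can hit the 5-second default timeout before enough executors are available; raising the timeout to 10 to 15 seconds gives more executors time to register before blocks are distributed.</p>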
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1457.html">CarbonData FAQ</a></div>
</div>
</div>