Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="mrs_01_0865"></a><a name="mrs_01_0865"></a>
<h1 class="topictitle1">Configuring Yarn Restart</h1>
<div id="body1590130747745"><div class="section" id="mrs_01_0865__s1113b633d370497992b2e7d82c26dc4c"><h4 class="sectiontitle">Scenario</h4><p id="mrs_01_0865__af13c587bb11f4a64853ba85eb5444b8c">The Yarn Restart feature includes ResourceManager Restart and NodeManager Restart.</p>
<ul id="mrs_01_0865__u1f7365b2ca404e5ea96c2422d42f87c0"><li id="mrs_01_0865__l674887dae81347dfaf9012b9f0366892">When ResourceManager Restart is enabled, the new active ResourceManager node loads the state information saved by the previous active ResourceManager node and takes over container status information from all NodeManager nodes so that services continue running. Because status information is saved through periodic checkpoint operations, data loss is avoided.</li><li id="mrs_01_0865__l6bf26cf5d532487ea6639b3154690ffc">When NodeManager Restart is enabled, NodeManager locally saves information about containers running on the node. After NodeManager restarts, it restores the saved status information, so the running progress of containers on the node is not lost.</li></ul>
</div>
<div class="section" id="mrs_01_0865__s0dc8a28672d64129bc7824f3962b9525"><h4 class="sectiontitle">Configuration Description</h4><p id="mrs_01_0865__a42d6f74f183b4420b4611455517a5477">Go to the <strong id="mrs_01_0865__b1210435913364">All Configurations</strong> page of Yarn by referring to <a href="mrs_01_2125.html">Modifying Cluster Service Configuration Parameters</a>, and enter a parameter name in the search box.</p>
<p id="mrs_01_0865__a0aa112a1951a40469081dbe635cc7974">Configure ResourceManager Restart as follows:</p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_0865__t63dc48a47e6b4ea1bf25a4d62d445592" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameter description of ResourceManager Restart</caption><thead align="left"><tr id="mrs_01_0865__r1e06b0d93be248219a0561f0640ac02d"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.4.2.4.1.1"><p id="mrs_01_0865__a1a2fa123b77240cba9899d94ff5f0da1">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="42.634263426342635%" id="mcps1.3.2.4.2.4.1.2"><p id="mrs_01_0865__ad3169d58d8724ad7aadf556c858f7eeb">Description</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="24.032403240324033%" id="mcps1.3.2.4.2.4.1.3"><p id="mrs_01_0865__a1de149d012d64676baf5d9f95e6c160e">Default Value</p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_0865__r6e3e39824fee45fe825724b411b67b3b"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.4.2.4.1.1 "><p id="mrs_01_0865__a1fb7a7c1375a431ba9a96d56985b53ee">yarn.resourcemanager.recovery.enabled</p>
</td>
<td class="cellrowborder" valign="top" width="42.634263426342635%" headers="mcps1.3.2.4.2.4.1.2 "><p id="mrs_01_0865__a066a7bdf09b548e3a3d959af5e5d8714">Whether to enable ResourceManager to restore the status after startup. If this parameter is set to <strong id="mrs_01_0865__b121845325244">true</strong>, <strong id="mrs_01_0865__b43861735112417">yarn.resourcemanager.store.class</strong> must also be set.</p>
</td>
<td class="cellrowborder" valign="top" width="24.032403240324033%" headers="mcps1.3.2.4.2.4.1.3 "><p id="mrs_01_0865__a1fea492615c741a9b6b9f6be5b47f0d1">true</p>
</td>
</tr>
<tr id="mrs_01_0865__r6e6d66d356194a0284a6c53c4ee873e2"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.4.2.4.1.1 "><p id="mrs_01_0865__a3c542e5a40224d53bd5d41c61ca98458">yarn.resourcemanager.store.class</p>
</td>
<td class="cellrowborder" valign="top" width="42.634263426342635%" headers="mcps1.3.2.4.2.4.1.2 "><p id="mrs_01_0865__afcd43b293cd54800bd688a18ba267695">State-store class used to store application and task statuses and credentials.</p>
</td>
<td class="cellrowborder" valign="top" width="24.032403240324033%" headers="mcps1.3.2.4.2.4.1.3 "><p id="mrs_01_0865__p94787450916">For clusters of versions earlier than MRS 3.x: <strong id="mrs_01_0865__b1498523774710">org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</strong></p>
<p id="mrs_01_0865__p1834617579210">For clusters of MRS 3.<em id="mrs_01_0865__i143191154124119">x</em> or later:</p>
<p id="mrs_01_0865__p101612171431">org.apache.hadoop.yarn.server.resourcemanager.recovery.AsyncZKRMStateStore</p>
</td>
</tr>
<tr id="mrs_01_0865__r52d6d08b63584fa1b56014755d02ff3f"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.4.2.4.1.1 "><p id="mrs_01_0865__ad165de1d97a9406888e78787576c1641"><span id="mrs_01_0865__p479e6c8b5dd544538caa8373be59ec64">yarn.resourcemanager.zk-state-store.parent-path</span></p>
</td>
<td class="cellrowborder" valign="top" width="42.634263426342635%" headers="mcps1.3.2.4.2.4.1.2 "><p id="mrs_01_0865__a18cb8b365e914b3ba1548b02bac5311a"><span id="mrs_01_0865__pa807c57354074146ab12dcd4611e0f1f">Directory in ZooKeeper where ZKRMStateStore data is stored.</span></p>
</td>
<td class="cellrowborder" valign="top" width="24.032403240324033%" headers="mcps1.3.2.4.2.4.1.3 "><p id="mrs_01_0865__ab0b1e9b568aa49109a26193a7f98f291"><span id="mrs_01_0865__p9730688dd6584dbcb9cae002043d275d">/rmstore</span></p>
</td>
</tr>
<tr id="mrs_01_0865__r97c2e00d20894cd88d7b093654f4b145"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.4.2.4.1.1 "><p id="mrs_01_0865__af5caa2b2013346d2bddcab61a7ae8069">yarn.resourcemanager.work-preserving-recovery.enabled</p>
</td>
<td class="cellrowborder" valign="top" width="42.634263426342635%" headers="mcps1.3.2.4.2.4.1.2 "><p id="mrs_01_0865__a31dd8d6a22a74591a73e7e07bd622861">Whether to enable work-preserving recovery for ResourceManager. This configuration is used only for Yarn feature verification.</p>
</td>
<td class="cellrowborder" valign="top" width="24.032403240324033%" headers="mcps1.3.2.4.2.4.1.3 "><p id="mrs_01_0865__a886600f0efef42b091d48c2fd56ff205">true</p>
</td>
</tr>
<tr id="mrs_01_0865__r9c26d269d75244a88b44b1f16b2ad6f3"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.4.2.4.1.1 "><p id="mrs_01_0865__a6e8934a8e7e84c1c9d46e8f43405e44f">yarn.resourcemanager.state-store.async.load</p>
</td>
<td class="cellrowborder" valign="top" width="42.634263426342635%" headers="mcps1.3.2.4.2.4.1.2 "><p id="mrs_01_0865__a5b764bb1c3ee4007b5960a03ac75e2a3">Whether to apply asynchronous restoration to completed applications.</p>
</td>
<td class="cellrowborder" valign="top" width="24.032403240324033%" headers="mcps1.3.2.4.2.4.1.3 "><p id="mrs_01_0865__a501a2b2f46a74b42821767dca5bb20b2">For clusters of versions earlier than MRS 3.x: <strong id="mrs_01_0865__b199874713494">false</strong></p>
<p id="mrs_01_0865__p260413321615">For MRS 3.x or later: <strong id="mrs_01_0865__b951518144427">true</strong></p>
</td>
</tr>
<tr id="mrs_01_0865__rbd93b697e9c4402581c48785738b628c"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.4.2.4.1.1 "><p id="mrs_01_0865__en-us_topic_0039590205_p8637661219">yarn.resourcemanager.zk-state-store.num-fetch-threads</p>
</td>
<td class="cellrowborder" valign="top" width="42.634263426342635%" headers="mcps1.3.2.4.2.4.1.2 "><p id="mrs_01_0865__af764dbf8c82f46838fc30fe679f4514d">Number of worker threads used to restore task information stored in ZooKeeper when asynchronous restoration is enabled. Increasing the value can speed up restoration. The value must be greater than 0.</p>
</td>
<td class="cellrowborder" valign="top" width="24.032403240324033%" headers="mcps1.3.2.4.2.4.1.3 "><p id="mrs_01_0865__en-us_topic_0039590205_p300250851219">For clusters of versions earlier than MRS 3.x: <strong id="mrs_01_0865__b1545710334919">1</strong></p>
<p id="mrs_01_0865__p6717151979">For MRS 3.x or later: <strong id="mrs_01_0865__b11378161494316">20</strong></p>
</td>
</tr>
</tbody>
</table>
</div>
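<p>As a minimal sketch, the parameters in Table 1 could be written as follows in the standard Hadoop property format, using the default values listed for MRS 3.x or later. The <strong>yarn-site.xml</strong> layout is an assumption made for illustration only. On an MRS cluster, set these values on the <strong>All Configurations</strong> page rather than editing the file directly.</p>
<pre>&lt;!-- Illustrative fragment only: values are the MRS 3.x defaults from Table 1 --&gt;
&lt;property&gt;
  &lt;name&gt;yarn.resourcemanager.recovery.enabled&lt;/name&gt;
  &lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
&lt;!-- Must be set whenever recovery is enabled --&gt;
&lt;property&gt;
  &lt;name&gt;yarn.resourcemanager.store.class&lt;/name&gt;
  &lt;value&gt;org.apache.hadoop.yarn.server.resourcemanager.recovery.AsyncZKRMStateStore&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
  &lt;name&gt;yarn.resourcemanager.zk-state-store.parent-path&lt;/name&gt;
  &lt;value&gt;/rmstore&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
  &lt;name&gt;yarn.resourcemanager.work-preserving-recovery.enabled&lt;/name&gt;
  &lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
  &lt;name&gt;yarn.resourcemanager.state-store.async.load&lt;/name&gt;
  &lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
&lt;!-- Must be greater than 0; only relevant when asynchronous restoration is enabled --&gt;
&lt;property&gt;
  &lt;name&gt;yarn.resourcemanager.zk-state-store.num-fetch-threads&lt;/name&gt;
  &lt;value&gt;20&lt;/value&gt;
&lt;/property&gt;</pre>
<p>For clusters of versions earlier than MRS 3.x, the table lists different defaults for the store class (<strong>ZKRMStateStore</strong>), asynchronous loading (<strong>false</strong>), and the number of fetch threads (<strong>1</strong>).</p>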
<p id="mrs_01_0865__a27ea0bd2f0574e9dbcd3b33725cc330b">Configure NodeManager Restart as follows:</p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_0865__te8c202111b434ea5bf63358453308ff1" frame="border" border="1" rules="all"><caption><b>Table 2 </b>Parameter description of NodeManager Restart</caption><thead align="left"><tr id="mrs_01_0865__rfdc2ff20248d46f49a6b595f597d42f5"><th align="left" class="cellrowborder" valign="top" width="33.330000000000005%" id="mcps1.3.2.6.2.4.1.1"><p id="mrs_01_0865__adf6abf8b951847e98701eef7fa8c8b98">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="42.36000000000001%" id="mcps1.3.2.6.2.4.1.2"><p id="mrs_01_0865__a79318ea543014d5c81679e9f1a895a4f">Description</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="24.310000000000002%" id="mcps1.3.2.6.2.4.1.3"><p id="mrs_01_0865__ad0d375e89f354d9a8178eea5a974f47c">Default Value</p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_0865__r14204378967642a09967f85a2633bd8e"><td class="cellrowborder" valign="top" width="33.330000000000005%" headers="mcps1.3.2.6.2.4.1.1 "><p id="mrs_01_0865__a9c3a328038ce4443a06dbccb677a42c1">yarn.nodemanager.recovery.enabled</p>
</td>
<td class="cellrowborder" valign="top" width="42.36000000000001%" headers="mcps1.3.2.6.2.4.1.2 "><p id="mrs_01_0865__a975a4bf0fa54477f8dac6241692ec1c3">Whether, after NodeManager is restarted, to collect logs again for log collection that previously failed and to restore unfinished applications.</p>
</td>
<td class="cellrowborder" valign="top" width="24.310000000000002%" headers="mcps1.3.2.6.2.4.1.3 "><p id="mrs_01_0865__a36bddeb0647f4333b470ab22cbd3d646">true</p>
</td>
</tr>
<tr id="mrs_01_0865__r3912f9a71be241cc8dc12e3493ea7b2c"><td class="cellrowborder" valign="top" width="33.330000000000005%" headers="mcps1.3.2.6.2.4.1.1 "><p id="mrs_01_0865__a3e5f206484b14b0fac157b61dac0dbfc">yarn.nodemanager.recovery.dir</p>
</td>
<td class="cellrowborder" valign="top" width="42.36000000000001%" headers="mcps1.3.2.6.2.4.1.2 "><p id="mrs_01_0865__a09f7cd58656f4443942da84fdffb3eae">Local directory used by NodeManager to store container status. It applies to clusters of MRS 3.<em id="mrs_01_0865__i2945163284411">x</em> or later.</p>
</td>
<td class="cellrowborder" valign="top" width="24.310000000000002%" headers="mcps1.3.2.6.2.4.1.3 "><p id="mrs_01_0865__p11581619153113">${SRV_HOME}/tmp/yarn-nm-recovery</p>
</td>
</tr>
<tr id="mrs_01_0865__r06d98bd1a2484ed490a2098939a8be83"><td class="cellrowborder" valign="top" width="33.330000000000005%" headers="mcps1.3.2.6.2.4.1.1 "><p id="mrs_01_0865__a7066f2c31e984e048c877c74397c54b2">yarn.nodemanager.recovery.supervised</p>
</td>
<td class="cellrowborder" valign="top" width="42.36000000000001%" headers="mcps1.3.2.6.2.4.1.2 "><p id="mrs_01_0865__a855106b770034b6bb1d0c29dac9c4cda">Whether NodeManager is monitored. When this parameter is enabled, NodeManager does not clear containers when it exits, because it assumes that it will be restarted immediately and will then restore the containers.</p>
</td>
<td class="cellrowborder" valign="top" width="24.310000000000002%" headers="mcps1.3.2.6.2.4.1.3 "><p id="mrs_01_0865__aa1b4aba77cc54720a4b2882f7b5a28c3">true</p>
</td>
</tr>
</tbody>
</table>
</div>
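<p>Similarly, the NodeManager Restart parameters in Table 2 could be sketched in the standard Hadoop property format as shown here, using the default values from the table. The <strong>yarn-site.xml</strong> layout is an assumption for illustration, and the <strong>${SRV_HOME}</strong> placeholder is reproduced as listed in the table on the assumption that the cluster resolves it. On an MRS cluster, set these values on the <strong>All Configurations</strong> page.</p>
<pre>&lt;!-- Illustrative fragment only: values are the defaults from Table 2 --&gt;
&lt;property&gt;
  &lt;name&gt;yarn.nodemanager.recovery.enabled&lt;/name&gt;
  &lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
&lt;!-- Local directory for saved container status (MRS 3.x or later) --&gt;
&lt;property&gt;
  &lt;name&gt;yarn.nodemanager.recovery.dir&lt;/name&gt;
  &lt;value&gt;${SRV_HOME}/tmp/yarn-nm-recovery&lt;/value&gt;
&lt;/property&gt;
&lt;!-- When true, NodeManager leaves containers in place on exit and restores them after restart --&gt;
&lt;property&gt;
  &lt;name&gt;yarn.nodemanager.recovery.supervised&lt;/name&gt;
  &lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;</pre>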
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_0851.html">Using Yarn</a></div>
</div>
</div>