Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="mrs_01_2009"></a><a name="mrs_01_2009"></a>
<h1 class="topictitle1">What Can I Do If the getApplicationReport Exception Is Recorded in Logs During Spark Application Execution and the Application Does Not Exit for a Long Time?</h1>
<div id="body1595920219448"><div class="section" id="mrs_01_2009__s40c3a8c4bb3747e7a93ab07570f89f96"><h4 class="sectiontitle">Question</h4><p id="mrs_01_2009__ae38437c778cd41fe94b7bf5631eb8009">During Spark application execution, if the driver fails to connect to the ResourceManager, the following error is logged repeatedly and the application does not exit for a long time. What can I do?</p>
<pre class="screen" id="mrs_01_2009__sf43ccab3f0ce48b2aa0bca8adcaa3915">16/04/23 15:31:44 INFO RetryInvocationHandler: Exception while invoking getApplicationReport of class ApplicationClientProtocolPBClientImpl over 37 after 1 fail over attempts. Trying to fail over after sleeping for 44160ms.
java.net.ConnectException: Call From vm1/192.168.39.30 to vm1:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused</pre>
</div>
<div class="section" id="mrs_01_2009__s7051ec0a8ba24803b60f25b8a064f102"><h4 class="sectiontitle">Answer</h4><p id="mrs_01_2009__a8105aa8d838445d5be76b774588057a7">Spark runs a scheduled thread that monitors the status of the ApplicationMaster by connecting to the ResourceManager. If the connection to the ResourceManager fails, the preceding error is logged and the client keeps trying to reconnect. The number of retries is limited by the ResourceManager connection settings. By default, the client retries 30 times at an interval of about 30 seconds, which adds up to about 15 minutes. The preceding error is logged on each retry, and the driver exits only after the retry limit is exceeded.</p>
<p id="mrs_01_2009__a40d91f470adb48c7adb01e2b76375e6c"><a href="#mrs_01_2009__t76cbd5573d354cb7846083dd9e85be25">Table 1</a> describes the retry-related configuration items for connecting to the ResourceManager.</p>
<div class="tablenoborder"><a name="mrs_01_2009__t76cbd5573d354cb7846083dd9e85be25"></a><a name="t76cbd5573d354cb7846083dd9e85be25"></a><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_2009__t76cbd5573d354cb7846083dd9e85be25" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameter description</caption><thead align="left"><tr id="mrs_01_2009__rbcf797f7d39948f888bd1e22d134b749"><th align="left" class="cellrowborder" valign="top" width="45.93000000000001%" id="mcps1.3.2.4.2.4.1.1"><p id="mrs_01_2009__aafaf1269e71242af8c4c9358abb2af77">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="35.400000000000006%" id="mcps1.3.2.4.2.4.1.2"><p id="mrs_01_2009__a64224a0409744863b43ab81b806b9166">Description</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="18.67%" id="mcps1.3.2.4.2.4.1.3"><p id="mrs_01_2009__a4c65e4b07cbc4388983bf69505c78248">Default Value</p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_2009__r6abdd435dd04493ab8a0a7232c34cfb3"><td class="cellrowborder" valign="top" width="45.93000000000001%" headers="mcps1.3.2.4.2.4.1.1 "><p id="mrs_01_2009__a799510384f4d43289328ff85f8631f6d">yarn.resourcemanager.connect.max-wait.ms</p>
</td>
<td class="cellrowborder" valign="top" width="35.400000000000006%" headers="mcps1.3.2.4.2.4.1.2 "><p id="mrs_01_2009__a0485113be6f44bc2967adbe723088c4d">Maximum waiting time for connecting to the ResourceManager, in milliseconds.</p>
</td>
<td class="cellrowborder" valign="top" width="18.67%" headers="mcps1.3.2.4.2.4.1.3 "><p id="mrs_01_2009__ad7ec53d630fb44ea8790c688b52fe1dd">900000</p>
</td>
</tr>
<tr id="mrs_01_2009__rcf2bfd18f6c54f749f69f64e8ff218fb"><td class="cellrowborder" valign="top" width="45.93000000000001%" headers="mcps1.3.2.4.2.4.1.1 "><p id="mrs_01_2009__a65aeb2a8947e46b7b16be8a811cc6d10">yarn.resourcemanager.connect.retry-interval.ms</p>
</td>
<td class="cellrowborder" valign="top" width="35.400000000000006%" headers="mcps1.3.2.4.2.4.1.2 "><p id="mrs_01_2009__ad8170cc21abf40c4bde38c4d3dbe8302">Interval for reconnecting to the ResourceManager, in milliseconds.</p>
</td>
<td class="cellrowborder" valign="top" width="18.67%" headers="mcps1.3.2.4.2.4.1.3 "><p id="mrs_01_2009__a97652d6270e64feca770cca75e15109f">30000</p>
</td>
</tr>
</tbody>
</table>
</div>
<p id="mrs_01_2009__a8c3b79d66a6b46b79aba7cdbaa3592d8">Number of retries = Maximum waiting time for connecting to the ResourceManager / Interval for reconnecting to the ResourceManager (<strong id="mrs_01_2009__b15169828769360">yarn.resourcemanager.connect.max-wait.ms / yarn.resourcemanager.connect.retry-interval.ms</strong>). With the default values, this is 900000 / 30000 = 30 retries.</p>
<p id="mrs_01_2009__ad64996f732a949ebafd0f6bf395f30d2">On the Spark client, add <span class="parmname" id="mrs_01_2009__parmname15492182089360"><b>yarn.resourcemanager.connect.max-wait.ms</b></span> and <span class="parmname" id="mrs_01_2009__parmname12496462759360"><b>yarn.resourcemanager.connect.retry-interval.ms</b></span> to the <span class="filepath" id="mrs_01_2009__filepath16416203409360"><b>conf/yarn-site.xml</b></span> file and set them to suitable values. Reducing the number of retries in this way allows the Spark application to exit sooner when the ResourceManager cannot be reached.</p>
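<p>As a minimal sketch, the two parameters could be set in <b>conf/yarn-site.xml</b> as shown below. The values are illustrative assumptions, not recommendations: a maximum waiting time of 300000 ms with the default interval of 30000 ms gives 300000 / 30000 = 10 retries, so the driver would stop retrying after roughly 5 minutes. If the file already contains a <b>configuration</b> element, add only the two <b>property</b> elements inside it. Choose a waiting time long enough to cover periods when the ResourceManager may be legitimately unavailable.</p>
<pre class="screen">&lt;configuration&gt;
  &lt;!-- Illustrative values only (assumption): 300000 / 30000 = 10 retries --&gt;
  &lt;property&gt;
    &lt;name&gt;yarn.resourcemanager.connect.max-wait.ms&lt;/name&gt;
    &lt;value&gt;300000&lt;/value&gt;
  &lt;/property&gt;
  &lt;property&gt;
    &lt;name&gt;yarn.resourcemanager.connect.retry-interval.ms&lt;/name&gt;
    &lt;value&gt;30000&lt;/value&gt;
  &lt;/property&gt;
&lt;/configuration&gt;</pre>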
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_2003.html">Spark Core</a></div>
</div>
</div>