Yang, Tong 48706b7552 MRS COMP-LTS 320-lts.1 version
Reviewed-by: Kacur, Michal <michal.kacur@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2024-04-12 12:51:10 +00:00

<a name="mrs_01_24779"></a>
<h1 class="topictitle1">Flink Restart Policy</h1>
<div id="body0000001583371905"><div class="section" id="mrs_01_24779__section18347182024813"><h4 class="sectiontitle">Overview</h4><p id="mrs_01_24779__p1731751632613">Flink supports different restart policies to control whether and how to restart a job when a fault occurs. If no restart policy is specified, the cluster uses the default restart policy. You can also specify a restart policy when submitting a job. For details about how to configure such a policy on the job development page of MRS 3.1.0 or later, see <a href="mrs_01_24024.html">Managing Jobs on the Flink Web UI</a>.</p>
<p id="mrs_01_24779__p1781191692811">You can specify the restart policy by setting the <strong id="mrs_01_24779__b81096575193">restart-strategy</strong> parameter in the Flink configuration file <em id="mrs_01_24779__i167281730429">Client installation directory</em><strong id="mrs_01_24779__b454018361028">/Flink/flink/conf/flink-conf.yaml</strong>, or you can set it dynamically in the application code. A policy set in the configuration file takes effect globally. The available restart policies are <strong id="mrs_01_24779__b1760283710282">failure-rate</strong> and the following two default policies:</p>
<ul id="mrs_01_24779__ul104721020202614"><li id="mrs_01_24779__li157084411239"><strong id="mrs_01_24779__b8198848145314">No restart</strong>: If checkpointing is not enabled, this policy is used by default.</li><li id="mrs_01_24779__li4708174132313"><strong id="mrs_01_24779__b3306620547">Fixed-delay</strong>: If checkpointing is enabled but no restart policy is configured, this policy is used by default.</li></ul>
</div>
<div class="section" id="mrs_01_24779__section42423815314"><h4 class="sectiontitle">No restart Policy</h4><p id="mrs_01_24779__p59141828978">When a fault occurs, the job fails and does not attempt to restart.</p>
<p id="mrs_01_24779__p14661756315">Configure the parameter as follows:</p>
<pre class="screen" id="mrs_01_24779__screen1919121183112">restart-strategy: none</pre>
</div>
<div class="section" id="mrs_01_24779__section75816543540"><h4 class="sectiontitle">fixed-delay Policy</h4><p id="mrs_01_24779__p1117016574158">When a fault occurs, the job attempts to restart a fixed number of times, waiting a fixed period of time between two consecutive restart attempts. If the number of attempts exceeds the configured maximum, the job fails.</p>
<p id="mrs_01_24779__p1866314312318">In the following example, the job attempts to restart up to three times, waiting 10 seconds before each attempt, and fails if it still cannot run after the third attempt. Configure the parameters as follows:</p>
<pre class="screen" id="mrs_01_24779__screen1666374363118">restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s</pre>
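<p>As noted in the overview, the policy can also be set in the application code. The following is a minimal sketch of the same fixed-delay settings, assuming the Flink 1.x DataStream API and its <strong>RestartStrategies</strong> helper class (deprecated in recent Flink releases). The class name, job name, and placeholder pipeline are illustrative only.</p>
<pre class="screen">import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical example class; not part of the MRS sample code.
public class FixedDelayRestartExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Same intent as "restart-strategy: fixed-delay" with
        // attempts = 3 and delay = 10 s, applied to this job.
        env.setRestartStrategy(
                RestartStrategies.fixedDelayRestart(3, Time.of(10, TimeUnit.SECONDS)));

        // To disable restarts for this job instead, use:
        // env.setRestartStrategy(RestartStrategies.noRestart());

        // Placeholder pipeline so that the sketch is a complete job.
        env.fromElements(1, 2, 3).print();
        env.execute("fixed-delay-restart-example");
    }
}</pre>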
</div>
<div class="section" id="mrs_01_24779__section142111230153810"><h4 class="sectiontitle">failure-rate Policy</h4><p id="mrs_01_24779__p118191193306">When a fault occurs, the job is restarted. If the failure rate (the number of failures within a time interval) exceeds the value you configured, the job is considered failed. The restart policy waits for a fixed period of time between two consecutive restart attempts.</p>
<p id="mrs_01_24779__p5906164720335">In the following example, the job waits 10 seconds before each restart attempt and is considered failed if the failure rate exceeds three failures per 10-minute interval. Configure the parameters as follows:</p>
<pre class="screen" id="mrs_01_24779__screen886214259334">restart-strategy: failure-rate
restart-strategy.failure-rate.max-failures-per-interval: 3
restart-strategy.failure-rate.failure-rate-interval: 10 min
restart-strategy.failure-rate.delay: 10 s</pre>
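<p>The same failure-rate settings can be expressed in application code. This is a sketch only, assuming the Flink 1.x DataStream API and its <strong>RestartStrategies</strong> helper class; the class name, job name, and placeholder pipeline are illustrative.</p>
<pre class="screen">import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical example class; not part of the MRS sample code.
public class FailureRateRestartExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.setRestartStrategy(RestartStrategies.failureRateRestart(
                3,                                // max-failures-per-interval
                Time.of(10, TimeUnit.MINUTES),    // failure-rate-interval
                Time.of(10, TimeUnit.SECONDS)));  // delay between restart attempts

        // Placeholder pipeline so that the sketch is a complete job.
        env.fromElements("a", "b", "c").print();
        env.execute("failure-rate-restart-example");
    }
}</pre>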
</div>
<div class="section" id="mrs_01_24779__section146831139193319"><h4 class="sectiontitle">Choosing a Restart Policy</h4><ul id="mrs_01_24779__ul176311182427"><li id="mrs_01_24779__li87638185423">If you do not want to retry a failed job, select the <strong id="mrs_01_24779__b9177848201215">No restart</strong> policy.</li><li id="mrs_01_24779__li1623182114217">To retry a failed job, select the <strong id="mrs_01_24779__b149951671311">failure-rate</strong> policy. With the fixed-delay policy, hardware faults such as network or memory faults can cause the number of job failures to reach the maximum number of retries, after which the job fails.<p id="mrs_01_24779__p6768120203416">To prevent repeated restarts when the failure-rate policy is used, configure the parameters as follows:</p>
<pre class="screen" id="mrs_01_24779__screen7354175410337">restart-strategy: failure-rate
restart-strategy.failure-rate.max-failures-per-interval: 3
restart-strategy.failure-rate.failure-rate-interval: 10 min
restart-strategy.failure-rate.delay: 10 s</pre>
</li></ul>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_0591.html">Using Flink</a></div>
</div>
</div>