Yang, Tong 6182f91ba8 MRS component operation guide_normal 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-09 14:55:21 +00:00

<a name="mrs_01_1589"></a><a name="mrs_01_1589"></a>
<h1 class="topictitle1">Configuring DOP</h1>
<div id="body1596442815780"><div class="section" id="mrs_01_1589__s5e40e617f98542df9a4ab0d1d3b9f990"><h4 class="sectiontitle">Scenario</h4><p id="mrs_01_1589__a648869e4dcf9472199d9473583962161">The degree of parallelism (DOP) is the number of tasks that run concurrently. It also determines the number of data blocks produced by an operation. Configure the DOP to optimize the number of tasks, the data volume of each task, and the use of host processing capacity.</p>
<p id="mrs_01_1589__a7743c516e6234ef4b76f597ada25a426">Check the CPU and memory usage of each node. If data and tasks are not evenly distributed among the nodes, increase the DOP to spread the load more evenly.</p>
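<p>As a rough illustration of how the DOP affects the data volume of each task, the following sketch counts the records that each parallel task would receive. It is not Flink code: the class <strong>DopLoadSketch</strong> and the method <strong>recordsPerTask()</strong> are hypothetical, and keys are assigned to tasks by a simple hash modulo, which is a simplification of how Flink distributes keyed data.</p>

```java
// Hypothetical sketch, not Flink API: counts how many records each parallel
// task receives when keys are assigned by hash modulo the DOP.
final class DopLoadSketch {
    private DopLoadSketch() {}

    static int[] recordsPerTask(int[] keys, int dop) {
        if (dop < 1) {
            throw new IllegalArgumentException("dop must be at least 1");
        }
        int[] counts = new int[dop];
        for (int key : keys) {
            // floorMod keeps the task index non-negative for negative hashes.
            counts[Math.floorMod(Integer.hashCode(key), dop)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] keys = new int[12];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = i;
        }
        // Raising the DOP lowers the number of records each task handles.
        System.out.println(java.util.Arrays.toString(recordsPerTask(keys, 3)));
        System.out.println(java.util.Arrays.toString(recordsPerTask(keys, 4)));
    }
}
```

<p>With evenly spread keys, a higher DOP gives each task fewer records. With skewed keys, some tasks may still receive most of the data, so check the actual distribution before and after changing the DOP.</p>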
</div>
<div class="section" id="mrs_01_1589__s29757af776f541e99d6c97df085ae40e"><h4 class="sectiontitle">Procedure</h4><p id="mrs_01_1589__ac69554dd29244fb7950552e94bcdfe7e">Configure the DOP at one of the following layers, listed in descending order of priority, based on the available memory and CPU, the data volume, and the application logic:</p>
<ul id="mrs_01_1589__uf9cabb09a24c40be9df9ee02dc57da64"><li id="mrs_01_1589__l6a9599cdcda8477caaa601bad4655d05">Operator<div class="p" id="mrs_01_1589__ac493f9c75bdb49a7ae0af7aed36df9b2"><a name="mrs_01_1589__l6a9599cdcda8477caaa601bad4655d05"></a><a name="l6a9599cdcda8477caaa601bad4655d05"></a>Call the <strong id="mrs_01_1589__b111164662210346">setParallelism()</strong> method to specify the DOP of an individual operator, data source, or sink. For example:<pre class="screen" id="mrs_01_1589__s120bc9b314cb4573913fe11f1c044281">final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream&lt;String&gt; text = [...]
DataStream&lt;Tuple2&lt;String, Integer&gt;&gt; wordCounts = text
    .flatMap(new LineSplitter())
    .keyBy(0)
    .timeWindow(Time.seconds(5))
    // setParallelism(5) applies only to the sum operator it is called on.
    .sum(1).setParallelism(5);
wordCounts.print();
env.execute("Word Count Example");</pre>
</div>
</li><li id="mrs_01_1589__l1674d5b846004c9f963bc287e7d5661f">Execution environment<p id="mrs_01_1589__a56c8fa16838c48b88c68fc44c64b5323"><a name="mrs_01_1589__l1674d5b846004c9f963bc287e7d5661f"></a><a name="l1674d5b846004c9f963bc287e7d5661f"></a>A Flink program runs in an execution environment, which defines the default DOP for its operators, data sources, and data sinks.</p>
<p id="mrs_01_1589__a8a64bef094734c168d47ed55d9489e07">Call the <strong id="mrs_01_1589__b186995584910346">setParallelism()</strong> method to specify the default DOP of the execution environment. Example:</p>
<pre class="screen" id="mrs_01_1589__scfdd722aba884d4c8701a6b1c86d979a">final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(3);
DataStream&lt;String&gt; text = [...]
DataStream&lt;Tuple2&lt;String, Integer&gt;&gt; wordCounts = [...]
wordCounts.print();
env.execute("Word Count Example");</pre>
</li><li id="mrs_01_1589__lb6deda212d1f4a6eb43b4d52e1bd2411">Client<div class="p" id="mrs_01_1589__a81e348fa6c3f44fe81ccd5eac0841ab4"><a name="mrs_01_1589__lb6deda212d1f4a6eb43b4d52e1bd2411"></a><a name="lb6deda212d1f4a6eb43b4d52e1bd2411"></a>Specify the DOP when submitting jobs to Flink on the client. If you use the CLI client, specify the DOP using the <strong id="mrs_01_1589__b18173877410346">-p</strong> parameter. Example:<pre class="screen" id="mrs_01_1589__s6c05c5528e884678928761ea99506ff2">./bin/flink run -p 10 ../examples/*WordCount-java*.jar</pre>
</div>
</li><li id="mrs_01_1589__l49568d1a77ab470ca44e72fb8ee48165">System<p id="mrs_01_1589__a420da1b80baa436abf044ea6acf35496"><a name="mrs_01_1589__l49568d1a77ab470ca44e72fb8ee48165"></a><a name="l49568d1a77ab470ca44e72fb8ee48165"></a>On the Flink client, modify the <span class="parmname" id="mrs_01_1589__p3df71d3f8ec8464d899efc4a0adaec02"><b>parallelism.default</b></span> parameter in the <span class="filepath" id="mrs_01_1589__fa6cae941d14e416eb70369b9f8176e4f"><b>flink-conf.yaml</b></span> file in the conf directory to specify the default DOP for all execution environments.</p>
</li></ul>
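<p>The priority order of the four layers can be modelled with the following sketch. It is not part of the Flink API: the class <strong>DopResolver</strong> and the method <strong>resolve()</strong> are hypothetical, and <strong>null</strong> is assumed to mean that no DOP is set at that layer. The sketch only shows which value takes effect when several layers are configured at the same time.</p>

```java
// Hypothetical model of the priority order described above; not Flink API.
final class DopResolver {
    private DopResolver() {}

    /**
     * Returns the DOP of the highest-priority layer that is set.
     * Layers are checked in this order: operator, execution environment,
     * client, system. A null argument means "not set at this layer".
     */
    static int resolve(Integer operator, Integer environment,
                       Integer client, int systemDefault) {
        if (operator != null) {
            return operator;
        }
        if (environment != null) {
            return environment;
        }
        if (client != null) {
            return client;
        }
        // Corresponds to parallelism.default in flink-conf.yaml.
        return systemDefault;
    }

    public static void main(String[] args) {
        // The operator-level value wins over all other layers.
        System.out.println(resolve(5, 3, 10, 1));
        // Without an operator-level value, the environment value applies.
        System.out.println(resolve(null, 3, 10, 1));
        // With only the client value set, the -p value applies.
        System.out.println(resolve(null, null, 10, 1));
        // If nothing else is set, the system default applies.
        System.out.println(resolve(null, null, null, 1));
    }
}
```

<p>In this model, an operator configured with <strong>setParallelism(5)</strong> keeps a DOP of 5 even if the job is submitted with <strong>-p 10</strong>.</p>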
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1587.html">Optimization DataStream</a></div>
</div>
</div>