Yang, Tong 6182f91ba8 MRS component operation guide_normal 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-09 14:55:21 +00:00

<a name="mrs_01_1979"></a><a name="mrs_01_1979"></a>
<h1 class="topictitle1">Using Broadcast Variables</h1>
<div id="body1595920217126"><div class="section" id="mrs_01_1979__sb97088d914234333a8d96e510644177a"><h4 class="sectiontitle">Scenario</h4><p id="mrs_01_1979__a37bb37bea01e44eea277186caf0708e1">A broadcast variable distributes a data set to each node once, so that Spark tasks can read the data locally whenever they need it. If broadcast is not used, the data set is serialized and sent with every task that requires it, which is time-consuming and makes the tasks larger. Broadcast is suited to the following cases:</p>
<ol id="mrs_01_1979__o87d5553d92f84038b05361be19c61108"><li id="mrs_01_1979__le527b0a832bc46c6b4c107f867f5132e">If a data set is used by every slice (partition) of a task, broadcast the data set to each node.</li><li id="mrs_01_1979__l7d4cf326ebde4a088bfed49fffb356d0">When a small table needs to be joined with a big table, broadcast the small table to each node. This eliminates the shuffle operation and turns the join into an ordinary map-side operation.</li></ol>
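<p>The small-table join case can be sketched roughly as follows. This is a minimal illustration, not the product's own implementation: the object name <code>BroadcastJoinSketch</code>, the method <code>joinWithSmallTable</code>, the sample data, and the <code>local[2]</code> master are all assumptions made for the example.</p>

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD

// Hypothetical example object; names and data are illustrative only.
object BroadcastJoinSketch {
  // Inner-joins a large RDD against a small in-memory table without a shuffle.
  def joinWithSmallTable(sc: SparkContext,
                         big: RDD[(Int, String)],
                         small: Map[Int, String]): RDD[(Int, (String, String))] = {
    // Send the small table to each node once.
    val smallBc: Broadcast[Map[Int, String]] = sc.broadcast(small)
    // Look up each key in the node-local copy; keys missing from the small table are dropped.
    big.flatMap { case (key, left) =>
      smallBc.value.get(key).map(right => (key, (left, right))).toList
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for demonstration only.
    val conf = new SparkConf().setAppName("BroadcastJoinSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val orders = sc.parallelize(Seq((1, "order-a"), (2, "order-b"), (3, "order-c")))
      val users = Map(1 -> "alice", 2 -> "bob")
      joinWithSmallTable(sc, orders, users).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

<p>In this sketch the record with key 3 has no match in the small table and is therefore dropped, as in an inner join. The approach only makes sense when the small table fits comfortably in the memory of the driver and of each executor.</p>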
</div>
<div class="section" id="mrs_01_1979__sfa41963d11614dc6b78d67460e542456"><h4 class="sectiontitle">Procedure</h4><p id="mrs_01_1979__ab6449ebad00148559e3d8cb0e146584a">Add the following code to broadcast the testArr data to each node:</p>
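<pre class="screen" id="mrs_01_1979__sc648a0c0d74c4a619e8ef4fa362ddbed">import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD

def main(args: Array[String]): Unit = {
...
// Data to share with every task.
val testArr: Array[Long] = new Array[Long](200)
// Send testArr to each node once; tasks then carry only a lightweight handle.
val testBroadcast: Broadcast[Array[Long]] = sc.broadcast(testArr)
val resultRdd: RDD[Long] = inputRdd.map(input =&gt; handleData(testBroadcast, input))
...
}
def handleData(broadcast: Broadcast[Array[Long]], input: String): Long = {
// Read the node-local copy of the broadcast data.
val value = broadcast.value
...
}</pre>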
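<p>For reference, a self-contained variant of this pattern might look like the sketch below. The body of <code>handleData</code>, the helper <code>run</code>, the sample inputs, and the <code>local[2]</code> master are hypothetical and stand in for the elided parts of the snippet. The final <code>destroy()</code> call is optional and is shown only as one way to release the broadcast data once no further job needs it.</p>

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD

// Hypothetical example object; it is not part of the original procedure.
object BroadcastSketch {
  // Hypothetical task logic: pick one array slot based on the input length.
  def handleData(broadcast: Broadcast[Array[Long]], input: String): Long = {
    val value = broadcast.value // node-local copy of the broadcast array
    value(input.length % value.length)
  }

  def run(sc: SparkContext, inputs: Seq[String]): Array[Long] = {
    // Sample data (assumption): slot i holds the value i.
    val testArr: Array[Long] = Array.tabulate(200)(i => i.toLong)
    val testBroadcast: Broadcast[Array[Long]] = sc.broadcast(testArr)
    val inputRdd: RDD[String] = sc.parallelize(inputs)
    // The closure captures only the broadcast handle, not the array itself.
    val resultRdd: RDD[Long] = inputRdd.map(input => handleData(testBroadcast, input))
    val result = resultRdd.collect()
    // Release the broadcast data; it cannot be used again after destroy().
    testBroadcast.destroy()
    result
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for demonstration only.
    val conf = new SparkConf().setAppName("BroadcastSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      run(sc, Seq("a", "bb", "ccc")).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

<p>With the sample inputs above, each string maps to the array slot equal to its length, so the results are 1, 2, and 3.</p>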
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1975.html">Spark Core Tuning</a></div>
</div>
</div>