Yang, Tong 6182f91ba8 MRS component operation guide_normal 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-09 14:55:21 +00:00


<a name="mrs_01_2059"></a>
<h1 class="topictitle1">Why Does Spark2x Fail to Export a Table with the Same Field Name?</h1>
<div id="body1595920225551"><div class="section" id="mrs_01_2059__s681957a75fbc4d1e9b081b339ce06c43"><h4 class="sectiontitle">Question</h4><p id="mrs_01_2059__af5e3908fc849431da39af60dd875535f">The following code fails when it is run in the spark-shell of Spark2x:</p>
<pre class="screen" id="mrs_01_2059__s43259df691794d40a593796ed5028b91">val acctId = List(("49562", "Amal", "Derry"), ("00000", "Fred", "Xanadu"))
val rddLeft = sc.makeRDD(acctId)
val dfLeft = rddLeft.toDF("Id", "Name", "City")
//dfLeft.show
val acctCustId = List(("Amal", "49562", "CO"), ("Dave", "99999", "ZZ"))
val rddRight = sc.makeRDD(acctCustId)
val dfRight = rddRight.toDF("Name", "CustId", "State")
//dfRight.show
val dfJoin = dfLeft.join(dfRight, dfLeft("Id") === dfRight("CustId"), "outer")
dfJoin.show
dfJoin.repartition(1).write.format("com.databricks.spark.csv").option("delimiter", "\t").option("header", "true").option("treatEmptyValuesAsNulls", "true").option("nullValue", "").save("/tmp/outputDir") </pre>
</div>
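<div class="section" id="mrs_01_2059__s2ddd7f66701f4aad8f2206b93656380d"><h4 class="sectiontitle">Answer</h4><p id="mrs_01_2059__a31aff00c93464ae0b936d9f12bfb8530">Spark2x checks for duplicate field names when the result of a <strong id="mrs_01_2059__b16606902298476">join</strong> statement is saved. In the code above, both <strong>dfLeft</strong> and <strong>dfRight</strong> contain a <strong>Name</strong> field, so <strong>dfJoin</strong> contains two fields named <strong>Name</strong> and the save operation fails. Modify the code so that the saved data contains no duplicate field names, for example, by renaming or dropping one of the <strong>Name</strong> fields before writing.</p>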
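<p>The following is a minimal sketch of one possible fix, not the only one. It assumes the same spark-shell session as in the question, and it assumes that the built-in <strong>csv</strong> data source is acceptable in place of <strong>com.databricks.spark.csv</strong>. The field name <strong>CustName</strong> is a hypothetical name chosen for illustration.</p>
<pre class="screen">// Sketch only: run in spark-shell, where sc and the toDF implicits are already available.
val acctId = List(("49562", "Amal", "Derry"), ("00000", "Fred", "Xanadu"))
val dfLeft = sc.makeRDD(acctId).toDF("Id", "Name", "City")
val acctCustId = List(("Amal", "49562", "CO"), ("Dave", "99999", "ZZ"))
val dfRight = sc.makeRDD(acctCustId).toDF("Name", "CustId", "State")
// Rename the right-hand Name field (CustName is a hypothetical name)
// so that the joined result has unique field names.
val dfRightRenamed = dfRight.withColumnRenamed("Name", "CustName")
val dfJoin = dfLeft.join(dfRightRenamed, dfLeft("Id") === dfRightRenamed("CustId"), "outer")
dfJoin.show
// Assumption: the built-in csv data source is used; /tmp/outputDir must not already exist.
dfJoin.repartition(1).write.format("csv").option("delimiter", "\t").option("header", "true").option("nullValue", "").save("/tmp/outputDir")</pre>
<p>If the second <strong>Name</strong> field is not needed in the output, dropping it from the joined result before writing, for example with <strong>dfJoin.drop(dfRight("Name"))</strong> on the original join, is likely to work as well.</p>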
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_2002.html">Common Issues About Spark2x</a></div>
</div>
</div>