doc-exports/docs/dli/dev/dli_09_0087.html
Hasko, Vladimir cfc48b3aed dli_dev_0104_version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-committed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
2024-05-06 09:14:57 +00:00


<a name="dli_09_0087"></a>
<h1 class="topictitle1">PySpark Example Code</h1>
<div id="body8662426"><div class="section" id="dli_09_0087__section1523149131718"><h4 class="sectiontitle">Scenario</h4><p id="dli_09_0087__p98722576293">This section provides PySpark example code that demonstrates how to use a Spark job to access data from the GaussDB(DWS) data source.</p>
<p id="dli_09_0087__en-us_topic_0200509991_p1944354710257">Before you run the examples, make sure that a datasource connection has been created and bound to a queue on the DLI management console.</p>
<div class="note" id="dli_09_0087__note17925192652815"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0087__p692572617287">Hard-coded or plaintext passwords pose significant security risks. To ensure security, encrypt your passwords, store them in configuration files or environment variables, and decrypt them when needed.</p>
</div></div>
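<p>One way to follow the recommendation in the note above is to read the password from an environment variable at run time instead of writing it into the script. The following is a minimal sketch under stated assumptions: the variable name <strong>DWS_PASSWORD</strong> and the helper <strong>get_dws_password</strong> are hypothetical and not part of DLI, and the decryption step is omitted because it depends on how the secret was encrypted.</p>

```python
import os


def get_dws_password(env_var="DWS_PASSWORD"):
    """Return the database password stored in an environment variable.

    The variable name is a hypothetical choice for this sketch.
    Fails loudly if the variable is missing or empty, so that the job
    does not silently try to connect with a blank password.
    """
    password = os.environ.get(env_var)
    if not password:
        raise RuntimeError("Environment variable %s is not set" % env_var)
    return password
```

<p>In the steps that follow, a call such as <strong>password = get_dws_password()</strong> could then replace the <strong>######</strong> placeholder. How environment variables are made available to the job on the queue is not covered by this sketch.</p>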
</div>
<div class="section" id="dli_09_0087__section25352219459"><h4 class="sectiontitle">Preparations</h4><ol id="dli_09_0087__en-us_topic_0197738139_ol12123050181818"><li id="dli_09_0087__en-us_topic_0197738139_li1612316509182">Import dependency packages.<div class="codecoloring" codetype="Python" id="dli_09_0087__en-us_topic_0197738139_screen195374592114"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
<span class="kn">from</span> <span class="nn">pyspark.sql.types</span> <span class="kn">import</span> <span class="n">StructType</span><span class="p">,</span> <span class="n">StructField</span><span class="p">,</span> <span class="n">IntegerType</span><span class="p">,</span> <span class="n">StringType</span>
<span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">SparkSession</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0087__en-us_topic_0197738139_li11272141817195">Create a session.<div class="codecoloring" codetype="Python" id="dli_09_0087__en-us_topic_0197738139_screen2658132002217"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">sparkSession</span> <span class="o">=</span> <span class="n">SparkSession</span><span class="o">.</span><span class="n">builder</span><span class="o">.</span><span class="n">appName</span><span class="p">(</span><span class="s2">&quot;datasource-dws&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">getOrCreate</span><span class="p">()</span>
</pre></div></td></tr></table></div>
</div>
</li></ol>
</div>
<div class="section" id="dli_09_0087__section127242716301"><h4 class="sectiontitle">Accessing a Data Source Using a DataFrame API</h4><ol id="dli_09_0087__en-us_topic_0197738139_ol121146133515"><li id="dli_09_0087__en-us_topic_0197738139_li811481318510">Set connection parameters.<div class="codecoloring" codetype="Python" id="dli_09_0087__en-us_topic_0197738139_screen97751620362"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">url</span> <span class="o">=</span> <span class="s2">&quot;jdbc:postgresql://to-dws-1174404951-W8W4cW8I.datasource.com:8000/postgres&quot;</span>
<span class="n">dbtable</span> <span class="o">=</span> <span class="s2">&quot;customer&quot;</span>
<span class="n">user</span> <span class="o">=</span> <span class="s2">&quot;dbadmin&quot;</span>
<span class="n">password</span> <span class="o">=</span> <span class="s2">&quot;######&quot;</span>
<span class="n">driver</span> <span class="o">=</span> <span class="s2">&quot;org.postgresql.Driver&quot;</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0087__en-us_topic_0197738139_li917316301757">Prepare the data to be written.<div class="codecoloring" codetype="Python" id="dli_09_0087__en-us_topic_0197738139_screen124802261619"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">dataList</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">sparkContext</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([(</span><span class="mi">1</span><span class="p">,</span> <span class="s2">&quot;Katie&quot;</span><span class="p">,</span> <span class="mi">19</span><span class="p">)])</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0087__en-us_topic_0197738139_li11507153517512">Configure the schema.<div class="codecoloring" codetype="Python" id="dli_09_0087__en-us_topic_0197738139_screen055217351661"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">schema</span> <span class="o">=</span> <span class="n">StructType</span><span class="p">([</span><span class="n">StructField</span><span class="p">(</span><span class="s2">&quot;id&quot;</span><span class="p">,</span> <span class="n">IntegerType</span><span class="p">(),</span> <span class="kc">False</span><span class="p">),</span>\
<span class="n">StructField</span><span class="p">(</span><span class="s2">&quot;name&quot;</span><span class="p">,</span> <span class="n">StringType</span><span class="p">(),</span> <span class="kc">False</span><span class="p">),</span>\
<span class="n">StructField</span><span class="p">(</span><span class="s2">&quot;age&quot;</span><span class="p">,</span> <span class="n">IntegerType</span><span class="p">(),</span> <span class="kc">False</span><span class="p">)])</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0087__en-us_topic_0197738139_li84599412516">Create a DataFrame.<div class="codecoloring" codetype="Python" id="dli_09_0087__en-us_topic_0197738139_screen1423112411066"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">dataFrame</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">dataList</span><span class="p">,</span> <span class="n">schema</span><span class="p">)</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0087__en-us_topic_0197738139_li6977134720517">Save the data to GaussDB(DWS).<div class="codecoloring" codetype="Python" id="dli_09_0087__en-us_topic_0197738139_screen1114648863"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">dataFrame</span><span class="o">.</span><span class="n">write</span> \
<span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;jdbc&quot;</span><span class="p">)</span> \
<span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;url&quot;</span><span class="p">,</span> <span class="n">url</span><span class="p">)</span> \
<span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;dbtable&quot;</span><span class="p">,</span> <span class="n">dbtable</span><span class="p">)</span> \
<span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;user&quot;</span><span class="p">,</span> <span class="n">user</span><span class="p">)</span> \
<span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;password&quot;</span><span class="p">,</span> <span class="n">password</span><span class="p">)</span> \
<span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;driver&quot;</span><span class="p">,</span> <span class="n">driver</span><span class="p">)</span> \
<span class="o">.</span><span class="n">mode</span><span class="p">(</span><span class="s2">&quot;Overwrite&quot;</span><span class="p">)</span> \
<span class="o">.</span><span class="n">save</span><span class="p">()</span>
</pre></div></td></tr></table></div>
</div>
<div class="note" id="dli_09_0087__en-us_topic_0197738139_note648119532618"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0087__en-us_topic_0197738139_p039712487568">The value of <strong id="dli_09_0087__en-us_topic_0197738139_b4865181715300">mode</strong> can be one of the following:</p>
<ul id="dli_09_0087__en-us_topic_0197738139_ul16164620151513"><li id="dli_09_0087__en-us_topic_0197738139_li1416452015156"><strong id="dli_09_0087__b49031049202717">ErrorIfExists</strong>: If the data already exists, the system throws an exception.</li><li id="dli_09_0087__en-us_topic_0197738139_li191651720151518"><strong id="dli_09_0087__b1623225216270">Overwrite</strong>: If the data already exists, the existing data is overwritten.</li><li id="dli_09_0087__en-us_topic_0197738139_li10165620111513"><strong id="dli_09_0087__b7721125312712">Append</strong>: If the data already exists, the new data is appended to it.</li><li id="dli_09_0087__en-us_topic_0197738139_li181651720161514"><strong id="dli_09_0087__b134435792719">Ignore</strong>: If the data already exists, the new data is not saved and the existing data is left unchanged. This is similar to the SQL statement <strong id="dli_09_0087__en-us_topic_0197738139_b1263102633014">CREATE TABLE IF NOT EXISTS</strong>.</li></ul>
</div></div>
</li><li id="dli_09_0087__en-us_topic_0197738139_li1663512531558">Read data from GaussDB(DWS).<div class="codecoloring" codetype="Python" id="dli_09_0087__en-us_topic_0197738139_screen44152614716"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">jdbcDF</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">read</span> \
<span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;jdbc&quot;</span><span class="p">)</span> \
<span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;url&quot;</span><span class="p">,</span> <span class="n">url</span><span class="p">)</span> \
<span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;dbtable&quot;</span><span class="p">,</span> <span class="n">dbtable</span><span class="p">)</span> \
<span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;user&quot;</span><span class="p">,</span> <span class="n">user</span><span class="p">)</span> \
<span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;password&quot;</span><span class="p">,</span> <span class="n">password</span><span class="p">)</span> \
<span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;driver&quot;</span><span class="p">,</span> <span class="n">driver</span><span class="p">)</span> \
<span class="o">.</span><span class="n">load</span><span class="p">()</span>
<span class="n">jdbcDF</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0087__en-us_topic_0197738139_li735618013814">View the operation result.<p id="dli_09_0087__en-us_topic_0197738139_p175852160389"><a name="dli_09_0087__en-us_topic_0197738139_li735618013814"></a><a name="en-us_topic_0197738139_li735618013814"></a><span><img id="dli_09_0087__en-us_topic_0197738139_image1858520168385" src="en-us_image_0000001757793769.png"></span></p>
</li></ol>
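<p>The write and read steps above pass the same five options one by one. As an illustration only, they can be collected once in a dictionary and reused for both operations. The helper name <strong>build_jdbc_options</strong> and its parameters are hypothetical. The host, port, database, table, and user values are the sample values from the steps above.</p>

```python
def build_jdbc_options(host, port, database, table, user, password):
    """Collect the JDBC options shared by the write and the read step."""
    return {
        "url": "jdbc:postgresql://%s:%d/%s" % (host, port, database),
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }


options = build_jdbc_options(
    host="to-dws-1174404951-W8W4cW8I.datasource.com",
    port=8000,
    database="postgres",
    table="customer",
    user="dbadmin",
    password="######",  # placeholder, as in the steps above
)

# With a live session, the dictionary can replace the repeated .option() calls:
#   dataFrame.write.format("jdbc").options(**options).mode("Overwrite").save()
#   jdbcDF = sparkSession.read.format("jdbc").options(**options).load()
```

<p>Keeping the options in one place means that a change to the connection, such as a different table name, is made once and applies to both the write and the read.</p>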
</div>
<div class="section" id="dli_09_0087__section141481614173016"><h4 class="sectiontitle">Accessing a Data Source Using a SQL API</h4><ol id="dli_09_0087__en-us_topic_0197738139_ol76641478578"><li id="dli_09_0087__en-us_topic_0197738139_li766415714575">Create a table to connect to a GaussDB(DWS) data source.<div class="codecoloring" codetype="Python" id="dli_09_0087__en-us_topic_0197738139_screen168567111813"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span>
    <span class="s2">&quot;CREATE TABLE IF NOT EXISTS dli_to_dws USING JDBC OPTIONS (</span><span class="se">\</span>
<span class="s2">    'url'='jdbc:postgresql://to-dws-1174404951-W8W4cW8I.datasource.com:8000/postgres',</span><span class="se">\</span>
<span class="s2">    'dbtable'='customer',</span><span class="se">\</span>
<span class="s2">    'user'='dbadmin',</span><span class="se">\</span>
<span class="s2">    'password'='######',</span><span class="se">\</span>
<span class="s2">    'driver'='org.postgresql.Driver')&quot;</span><span class="p">)</span>
</pre></div></td></tr></table></div>
</div>
<div class="note" id="dli_09_0087__en-us_topic_0197738139_note769511361632"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0087__en-us_topic_0197738139_p185611553167">For details about table creation parameters, see <a href="dli_09_0069.html#dli_09_0069__table193741955203417">Table 1</a>.</p>
</div></div>
</li><li id="dli_09_0087__en-us_topic_0197738139_li18200195616218">Insert data.<div class="codecoloring" codetype="Python" id="dli_09_0087__en-us_topic_0197738139_screen8334131819413"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;insert into dli_to_dws values(2,'John',24)&quot;</span><span class="p">)</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0087__en-us_topic_0197738139_li18415185942">Query data.<div class="codecoloring" codetype="Python" id="dli_09_0087__en-us_topic_0197738139_screen759010271648"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;select * from dli_to_dws&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0087__en-us_topic_0197738139_li96148241371">View the operation result.<p id="dli_09_0087__en-us_topic_0197738139_p199049444374"><a name="dli_09_0087__en-us_topic_0197738139_li96148241371"></a><a name="en-us_topic_0197738139_li96148241371"></a><span><img id="dli_09_0087__en-us_topic_0197738139_image7904744173713" src="en-us_image_0000001709994304.png"></span></p>
</li></ol>
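<p>The CREATE TABLE statement above is a single string that spans several lines, which makes it easy to lose a quotation mark or a continuation character. As a sketch only, the statement can instead be assembled from a dictionary of options. The helper name <strong>build_create_table_sql</strong> is hypothetical. It assumes Python 3.7 or later (so that dictionary order is preserved) and option values that do not contain single quotation marks, because no escaping is performed.</p>

```python
def build_create_table_sql(table_name, options):
    """Render CREATE TABLE ... USING JDBC OPTIONS (...) from a dict.

    Values are inserted as-is, so they must not contain single quotes.
    """
    rendered = ",\n  ".join(
        "'%s'='%s'" % (key, value) for key, value in options.items()
    )
    return (
        "CREATE TABLE IF NOT EXISTS %s USING JDBC OPTIONS (\n  %s)"
        % (table_name, rendered)
    )


statement = build_create_table_sql(
    "dli_to_dws",
    {
        "url": "jdbc:postgresql://to-dws-1174404951-W8W4cW8I.datasource.com:8000/postgres",
        "dbtable": "customer",
        "user": "dbadmin",
        "password": "######",  # placeholder, as in the steps above
        "driver": "org.postgresql.Driver",
    },
)

# With a live session: sparkSession.sql(statement)
```

<p>The generated statement contains line breaks between the options. They are ordinary whitespace in the SQL text and only make the statement easier to read when it is logged.</p>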
</div>
<div class="section" id="dli_09_0087__section11998171917307"><h4 class="sectiontitle">Submitting a Spark Job</h4><ol id="dli_09_0087__en-us_topic_0197738139_ol612481914610"><li id="dli_09_0087__en-us_topic_0197738139_li17148191617535">Upload the Python code file to DLI.</li><li id="dli_09_0087__en-us_topic_0197738139_li67827509599">In the Spark job editor, select the corresponding dependency module and execute the Spark job.<div class="note" id="dli_09_0087__en-us_topic_0197738139_note1435543551919"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="dli_09_0087__en-us_topic_0197738139_ul17825285811"><li id="dli_09_0087__en-us_topic_0197738142_li58215295819">If the Spark version is 2.3.2 (soon to be discontinued) or 2.4.5, set <strong id="dli_09_0087__b2223235122120">Module</strong> to <strong id="dli_09_0087__b3223835202111">sys.datasource.dws</strong> when you submit the job.</li><li id="dli_09_0087__li6624653171317">If the Spark version is 3.1.1, you do not need to select a module. Instead, set the following <strong id="dli_09_0087__b1395439172113">Spark parameters (--conf)</strong>:<p id="dli_09_0087__p1765215102311">spark.driver.extraClassPath=/usr/share/extension/dli/spark-jar/datasource/dws/*</p>
<p id="dli_09_0087__p1865215532311">spark.executor.extraClassPath=/usr/share/extension/dli/spark-jar/datasource/dws/*</p>
</li></ul>
</div></div>
</li></ol>
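<p>For Spark 3.1.1, the driver and the executor must point to the same JAR directory. The following sketch generates both parameter lines from a single path so that they cannot drift apart. The names <strong>dws_classpath_conf</strong> and <strong>render_conf_lines</strong> are hypothetical helpers and not part of DLI. The path is the one given in the note above.</p>

```python
DWS_JAR_PATH = "/usr/share/extension/dli/spark-jar/datasource/dws/*"


def dws_classpath_conf(jar_path=DWS_JAR_PATH):
    """Return the two class path settings, both pointing at jar_path."""
    return {
        "spark.driver.extraClassPath": jar_path,
        "spark.executor.extraClassPath": jar_path,
    }


def render_conf_lines(conf):
    """Format the settings as key=value lines, sorted by key."""
    return ["%s=%s" % (key, value) for key, value in sorted(conf.items())]


conf_lines = render_conf_lines(dws_classpath_conf())
```

<p>The resulting lines can be pasted into the Spark parameters field when the job is submitted.</p>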
</div>
<div class="section" id="dli_09_0087__section206008557104"><h4 class="sectiontitle">Complete Example Code</h4><ul id="dli_09_0087__ul59352091113"><li id="dli_09_0087__li593518910112">Connecting to data sources through DataFrame APIs<div class="note" id="dli_09_0087__note349152010285"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0087__p1449118206281">Hard-coded or plaintext passwords pose significant security risks. To ensure security, encrypt your passwords, store them in configuration files or environment variables, and decrypt them when needed.</p>
</div></div>
<div class="codecoloring" codetype="Python" id="dli_09_0087__en-us_topic_0197738140_screen19657164041313"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="c1"># _*_ coding: utf-8 _*_</span>
<span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
<span class="kn">from</span> <span class="nn">pyspark.sql.types</span> <span class="kn">import</span> <span class="n">StructType</span><span class="p">,</span> <span class="n">StructField</span><span class="p">,</span> <span class="n">IntegerType</span><span class="p">,</span> <span class="n">StringType</span>
<span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">SparkSession</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">&quot;__main__&quot;</span><span class="p">:</span>
    <span class="c1"># Create a SparkSession.</span>
    <span class="n">sparkSession</span> <span class="o">=</span> <span class="n">SparkSession</span><span class="o">.</span><span class="n">builder</span><span class="o">.</span><span class="n">appName</span><span class="p">(</span><span class="s2">&quot;datasource-dws&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">getOrCreate</span><span class="p">()</span>
    <span class="c1"># Set the datasource connection parameters.</span>
    <span class="n">url</span> <span class="o">=</span> <span class="s2">&quot;jdbc:postgresql://to-dws-1174404951-W8W4cW8I.datasource.com:8000/postgres&quot;</span>
    <span class="n">dbtable</span> <span class="o">=</span> <span class="s2">&quot;customer&quot;</span>
    <span class="n">user</span> <span class="o">=</span> <span class="s2">&quot;dbadmin&quot;</span>
    <span class="n">password</span> <span class="o">=</span> <span class="s2">&quot;######&quot;</span>
    <span class="n">driver</span> <span class="o">=</span> <span class="s2">&quot;org.postgresql.Driver&quot;</span>
    <span class="c1"># Create an RDD that holds the sample data.</span>
    <span class="n">dataList</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">sparkContext</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([(</span><span class="mi">1</span><span class="p">,</span> <span class="s2">&quot;Katie&quot;</span><span class="p">,</span> <span class="mi">19</span><span class="p">)])</span>
    <span class="c1"># Set the schema.</span>
    <span class="n">schema</span> <span class="o">=</span> <span class="n">StructType</span><span class="p">([</span><span class="n">StructField</span><span class="p">(</span><span class="s2">&quot;id&quot;</span><span class="p">,</span> <span class="n">IntegerType</span><span class="p">(),</span> <span class="kc">False</span><span class="p">),</span>\
                         <span class="n">StructField</span><span class="p">(</span><span class="s2">&quot;name&quot;</span><span class="p">,</span> <span class="n">StringType</span><span class="p">(),</span> <span class="kc">False</span><span class="p">),</span>\
                         <span class="n">StructField</span><span class="p">(</span><span class="s2">&quot;age&quot;</span><span class="p">,</span> <span class="n">IntegerType</span><span class="p">(),</span> <span class="kc">False</span><span class="p">)])</span>
    <span class="c1"># Create a DataFrame from the RDD and the schema.</span>
    <span class="n">dataFrame</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">dataList</span><span class="p">,</span> <span class="n">schema</span><span class="p">)</span>
    <span class="c1"># Write the data to the GaussDB(DWS) table.</span>
    <span class="n">dataFrame</span><span class="o">.</span><span class="n">write</span> \
        <span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;jdbc&quot;</span><span class="p">)</span> \
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;url&quot;</span><span class="p">,</span> <span class="n">url</span><span class="p">)</span> \
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;dbtable&quot;</span><span class="p">,</span> <span class="n">dbtable</span><span class="p">)</span> \
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;user&quot;</span><span class="p">,</span> <span class="n">user</span><span class="p">)</span> \
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;password&quot;</span><span class="p">,</span> <span class="n">password</span><span class="p">)</span> \
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;driver&quot;</span><span class="p">,</span> <span class="n">driver</span><span class="p">)</span> \
        <span class="o">.</span><span class="n">mode</span><span class="p">(</span><span class="s2">&quot;Overwrite&quot;</span><span class="p">)</span> \
        <span class="o">.</span><span class="n">save</span><span class="p">()</span>
    <span class="c1"># Read the data back from the GaussDB(DWS) table.</span>
    <span class="n">jdbcDF</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">read</span> \
        <span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;jdbc&quot;</span><span class="p">)</span> \
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;url&quot;</span><span class="p">,</span> <span class="n">url</span><span class="p">)</span> \
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;dbtable&quot;</span><span class="p">,</span> <span class="n">dbtable</span><span class="p">)</span> \
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;user&quot;</span><span class="p">,</span> <span class="n">user</span><span class="p">)</span> \
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;password&quot;</span><span class="p">,</span> <span class="n">password</span><span class="p">)</span> \
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;driver&quot;</span><span class="p">,</span> <span class="n">driver</span><span class="p">)</span> \
        <span class="o">.</span><span class="n">load</span><span class="p">()</span>
    <span class="n">jdbcDF</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
    <span class="c1"># Close the session.</span>
    <span class="n">sparkSession</span><span class="o">.</span><span class="n">stop</span><span class="p">()</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0087__li264133219111">Connecting to data sources through SQL APIs<div class="codecoloring" codetype="Python" id="dli_09_0087__en-us_topic_0197738140_screen107145273817"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="c1"># _*_ coding: utf-8 _*_</span>
<span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
<span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">SparkSession</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">&quot;__main__&quot;</span><span class="p">:</span>
    <span class="c1"># Create a SparkSession.</span>
    <span class="n">sparkSession</span> <span class="o">=</span> <span class="n">SparkSession</span><span class="o">.</span><span class="n">builder</span><span class="o">.</span><span class="n">appName</span><span class="p">(</span><span class="s2">&quot;datasource-dws&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">getOrCreate</span><span class="p">()</span>
    <span class="c1"># Create a DLI table associated with the GaussDB(DWS) table.</span>
    <span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span>
        <span class="s2">&quot;CREATE TABLE IF NOT EXISTS dli_to_dws USING JDBC OPTIONS (</span><span class="se">\</span>
<span class="s2">        'url'='jdbc:postgresql://to-dws-1174404951-W8W4cW8I.datasource.com:8000/postgres',</span><span class="se">\</span>
<span class="s2">        'dbtable'='customer',</span><span class="se">\</span>
<span class="s2">        'user'='dbadmin',</span><span class="se">\</span>
<span class="s2">        'password'='######',</span><span class="se">\</span>
<span class="s2">        'driver'='org.postgresql.Driver')&quot;</span><span class="p">)</span>
    <span class="c1"># Insert data into the DLI table.</span>
    <span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;insert into dli_to_dws values(2,'John',24)&quot;</span><span class="p">)</span>
    <span class="c1"># Read data from the DLI table and display it.</span>
    <span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;select * from dli_to_dws&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
    <span class="c1"># Close the session.</span>
    <span class="n">sparkSession</span><span class="o">.</span><span class="n">stop</span><span class="p">()</span>
</pre></div></td></tr></table></div>
</div>
</li></ul>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_09_0086.html">Connecting to GaussDB(DWS)</a></div>
</div>
</div>