doc-exports/docs/dli/dev/dli_09_0090.html
Hasko, Vladimir cfc48b3aed dli_dev_0104_version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-committed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
2024-05-06 09:14:57 +00:00


<a name="dli_09_0090"></a>
<h1 class="topictitle1">PySpark Example Code</h1>
<div id="body8662426"><div class="section" id="dli_09_0090__section16813101121217"><h4 class="sectiontitle">Prerequisites</h4><p id="dli_09_0090__p882246201220">A datasource connection has been created on the DLI management console. </p>
</div>
<div class="section" id="dli_09_0090__section516483461213"><h4 class="sectiontitle">CSS Non-Security Cluster</h4><ul id="dli_09_0090__ul1469475820121"><li id="dli_09_0090__li19694558151211">Development description<ul id="dli_09_0090__ul185891014161317"><li id="dli_09_0090__li10365118121416">Code implementation<ol id="dli_09_0090__en-us_topic_0197738142_ol12123050181818"><li id="dli_09_0090__en-us_topic_0197738142_li1612316509182">Import dependency packages.<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0197738142_screen195374592114"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
<span class="kn">from</span> <span class="nn">pyspark.sql.types</span> <span class="kn">import</span> <span class="n">StructType</span><span class="p">,</span> <span class="n">StructField</span><span class="p">,</span> <span class="n">IntegerType</span><span class="p">,</span> <span class="n">StringType</span><span class="p">,</span> <span class="n">Row</span>
<span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">SparkSession</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0090__en-us_topic_0197738142_li11272141817195">Create a session.<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0197738142_screen2658132002217"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span> <span class="o">=</span> <span class="n">SparkSession</span><span class="o">.</span><span class="n">builder</span><span class="o">.</span><span class="n">appName</span><span class="p">(</span><span class="s2">&quot;datasource-css&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">getOrCreate</span><span class="p">()</span>
</pre></div></td></tr></table></div>
</div>
</li></ol>
</li><li id="dli_09_0090__li18391148121418">Connecting to data sources through DataFrame APIs<ol id="dli_09_0090__en-us_topic_0197738142_ol127271626541"><li id="dli_09_0090__en-us_topic_0197738142_li147277210549">Set connection parameters.<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0197738142_screen8249222145511"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">resource</span> <span class="o">=</span> <span class="s2">&quot;/mytest&quot;</span>
<span class="n">nodes</span> <span class="o">=</span> <span class="s2">&quot;to-css-1174404953-hDTx3UPK.datasource.com:9200&quot;</span>
</pre></div></td></tr></table></div>
</div>
<div class="note" id="dli_09_0090__en-us_topic_0190067468_note2975123311388"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0090__p127645653210"><strong id="dli_09_0090__b7484173112151">resource</strong> indicates the name of the resource associated with the CSS. You can specify the resource location in <em id="dli_09_0090__i144670710165">/index/type</em> format. (The <strong id="dli_09_0090__b9303935111814">index</strong> can be the database and <strong id="dli_09_0090__b919474210180">type</strong> the table.)</p>
<ul id="dli_09_0090__en-us_topic_0190067468_ul4143105815201"><li id="dli_09_0090__en-us_topic_0190067468_li1614316583201">In Elasticsearch 6.X, a single index supports only one type, and the type name can be customized.</li><li id="dli_09_0090__en-us_topic_0190067468_li3144558182013">In Elasticsearch 7.X, every index uses <strong id="dli_09_0090__b9283845132014">_doc</strong> as the type name, and the type name cannot be customized. To access Elasticsearch 7.X, set this parameter to the <strong id="dli_09_0090__b887504714201">index</strong> alone, without a type.</li></ul>
</div></div>
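<p>The rule in the note above can be sketched as a small helper. This is only an illustration: <strong>css_resource</strong> is a hypothetical function, not part of DLI or PySpark, and the type name <strong>mytype</strong> is a made-up example.</p>

```python
def css_resource(index, doc_type=None):
    """Build the value of the `resource` parameter (hypothetical helper).

    Elasticsearch 6.X: pass the custom type name to get "/index/type".
    Elasticsearch 7.X: omit doc_type to get "/index".
    """
    index = index.strip("/")
    if not index:
        raise ValueError("index must not be empty")
    if doc_type:
        return "/{}/{}".format(index, doc_type.strip("/"))
    return "/{}".format(index)


print(css_resource("mytest"))            # Elasticsearch 7.X style
print(css_resource("mytest", "mytype"))  # Elasticsearch 6.X style
```

<p>The returned string can then be assigned to <strong>resource</strong> in the connection parameters.</p>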
</li><li id="dli_09_0090__en-us_topic_0197738142_li1983317185547">Create a schema and add data to it.<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0197738142_screen11743105965516"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">schema</span> <span class="o">=</span> <span class="n">StructType</span><span class="p">([</span><span class="n">StructField</span><span class="p">(</span><span class="s2">&quot;id&quot;</span><span class="p">,</span> <span class="n">IntegerType</span><span class="p">(),</span> <span class="kc">False</span><span class="p">),</span>
<span class="n">StructField</span><span class="p">(</span><span class="s2">&quot;name&quot;</span><span class="p">,</span> <span class="n">StringType</span><span class="p">(),</span> <span class="kc">False</span><span class="p">)])</span>
<span class="n">rdd</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">sparkContext</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([</span><span class="n">Row</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s2">&quot;John&quot;</span><span class="p">),</span> <span class="n">Row</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s2">&quot;Bob&quot;</span><span class="p">)])</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0090__en-us_topic_0197738142_li18999848205611">Construct a DataFrame.<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0197738142_screen842219555710"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">dataFrame</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">rdd</span><span class="p">,</span> <span class="n">schema</span><span class="p">)</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0090__en-us_topic_0197738142_li189396266549">Save data to CSS.<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0197738142_screen360873225719"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">dataFrame</span><span class="o">.</span><span class="n">write</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;css&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;resource&quot;</span><span class="p">,</span> <span class="n">resource</span><span class="p">)</span><span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.nodes&quot;</span><span class="p">,</span> <span class="n">nodes</span><span class="p">)</span><span class="o">.</span><span class="n">mode</span><span class="p">(</span><span class="s2">&quot;Overwrite&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
</pre></div></td></tr></table></div>
</div>
<div class="note" id="dli_09_0090__en-us_topic_0197738142_note2098134418572"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0090__en-us_topic_0197738142_p039712487568"><strong id="dli_09_0090__b217811320499">mode</strong> can be set to one of the following values:</p>
<ul id="dli_09_0090__en-us_topic_0197738142_ul1929273321915"><li id="dli_09_0090__en-us_topic_0197738142_li8292633161916"><strong id="dli_09_0090__b1377612742711">ErrorIfExists</strong>: If the data already exists, the system throws an exception.</li><li id="dli_09_0090__en-us_topic_0197738142_li1229213391913"><strong id="dli_09_0090__b068912942720">Overwrite</strong>: If the data already exists, the original data is overwritten.</li><li id="dli_09_0090__en-us_topic_0197738142_li7292833201912"><strong id="dli_09_0090__b19169311182711">Append</strong>: If the data already exists, the new data is appended to it.</li><li id="dli_09_0090__en-us_topic_0197738142_li1029353311911"><strong id="dli_09_0090__b56021912132711">Ignore</strong>: If the data already exists, the save operation is skipped and the existing data is left unchanged. This is similar to the SQL statement <strong id="dli_09_0090__en-us_topic_0197738142_b6417534109">CREATE TABLE IF NOT EXISTS</strong>.</li></ul>
</div></div>
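<p>To make the difference between the four modes concrete, the following toy model applies them to a plain Python dictionary. It is a sketch of the semantics described in the note only. It does not call Spark or CSS, and the <strong>save</strong> function is a hypothetical stand-in, not the connector's implementation.</p>

```python
def save(store, resource, rows, mode="errorifexists"):
    """Toy model of the four save modes, using a dict as the data store."""
    mode = mode.lower()
    exists = resource in store
    if mode in ("error", "errorifexists"):
        if exists:
            raise ValueError("data already exists at " + resource)
        store[resource] = list(rows)
    elif mode == "overwrite":
        store[resource] = list(rows)
    elif mode == "append":
        store.setdefault(resource, []).extend(rows)
    elif mode == "ignore":
        if not exists:
            store[resource] = list(rows)
    else:
        raise ValueError("unknown save mode: " + mode)
    return store


store = {"/mytest": [(1, "John")]}
save(store, "/mytest", [(2, "Bob")], mode="Append")
print(store["/mytest"])  # both rows are present
save(store, "/mytest", [(3, "tom")], mode="Ignore")
print(store["/mytest"])  # unchanged, because data already existed
save(store, "/mytest", [(3, "tom")], mode="Overwrite")
print(store["/mytest"])  # only the new row remains
```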
</li><li id="dli_09_0090__en-us_topic_0197738142_li952173912546">Read data from CSS.<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0197738142_screen1042041414589"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">jdbcDF</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;css&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;resource&quot;</span><span class="p">,</span> <span class="n">resource</span><span class="p">)</span><span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.nodes&quot;</span><span class="p">,</span> <span class="n">nodes</span><span class="p">)</span><span class="o">.</span><span class="n">load</span><span class="p">()</span>
<span class="n">jdbcDF</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0090__en-us_topic_0197738142_li04481816173018">View the operation result.<p id="dli_09_0090__en-us_topic_0197738142_p179471918125015"><a name="dli_09_0090__en-us_topic_0197738142_li04481816173018"></a><a name="en-us_topic_0197738142_li04481816173018"></a><span><img id="dli_09_0090__en-us_topic_0197738142_image10946918135013" src="en-us_image_0266332985.png"></span></p>
</li></ol>
</li><li id="dli_09_0090__li157584232155">Connecting to data sources through SQL APIs<ol id="dli_09_0090__en-us_topic_0197738142_ol564813553476"><li id="dli_09_0090__en-us_topic_0197738142_li19648135510475">Create a table to connect to a CSS data source.<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0197738142_screen822818915497"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span>
<span class="normal">5</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span>
<span class="s2">&quot;create table css_table(id long, name string) using css options( </span><span class="se">\</span>
<span class="s2"> 'es.nodes'='to-css-1174404953-hDTx3UPK.datasource.com:9200',</span><span class="se">\</span>
<span class="s2"> 'es.nodes.wan.only'='true',</span><span class="se">\</span>
<span class="s2"> 'resource'='/mytest')&quot;</span><span class="p">)</span>
</pre></div></td></tr></table></div>
</div>
<div class="note" id="dli_09_0090__en-us_topic_0197738142_note0745175005018"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0090__en-us_topic_0197738142_p193151638176">For details about the parameters for creating a CSS datasource connection table, see <a href="dli_09_0061.html#dli_09_0061__en-us_topic_0190067468_table569314388144">Table 1</a>.</p>
</div></div>
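<p>If the node address and resource are already held in variables, the statement can be assembled from them rather than typed by hand. The helper below is a minimal sketch: <strong>css_create_table_sql</strong> is a hypothetical name, and it assumes that option values contain no single quotes, because it performs no escaping.</p>

```python
def css_create_table_sql(table, columns, options):
    """Render a CREATE TABLE ... USING css statement (hypothetical helper)."""
    cols = ", ".join("{} {}".format(name, dtype) for name, dtype in columns)
    opts = ", ".join("'{}'='{}'".format(k, v) for k, v in options.items())
    return "create table {}({}) using css options({})".format(table, cols, opts)


statement = css_create_table_sql(
    "css_table",
    [("id", "long"), ("name", "string")],
    {
        "es.nodes": "to-css-1174404953-hDTx3UPK.datasource.com:9200",
        "es.nodes.wan.only": "true",
        "resource": "/mytest",
    },
)
print(statement)
# With a live session, the statement would be run as: sparkSession.sql(statement)
```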
</li><li id="dli_09_0090__en-us_topic_0197738142_li11669142935211">Insert data.<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0197738142_screen1488194510529"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;insert into css_table values(3,'tom')&quot;</span><span class="p">)</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0090__en-us_topic_0197738142_li15985054105216">Query data.<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0197738142_screen11514181135310"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">jdbcDF</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;select * from css_table&quot;</span><span class="p">)</span>
<span class="n">jdbcDF</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0090__en-us_topic_0197738142_li1969144743111">View the operation result.<p id="dli_09_0090__en-us_topic_0197738142_p3132154717503"><a name="dli_09_0090__en-us_topic_0197738142_li1969144743111"></a><a name="en-us_topic_0197738142_li1969144743111"></a><span><img id="dli_09_0090__en-us_topic_0197738142_image151321047135013" src="en-us_image_0223997308.png"></span></p>
</li></ol>
</li><li id="dli_09_0090__li64601159181511">Submitting a Spark job<ol id="dli_09_0090__en-us_topic_0197738142_ol612481914610"><li id="dli_09_0090__li123612149207">Upload the Python code file to DLI.<p id="dli_09_0090__p5849419152015"><a name="dli_09_0090__li123612149207"></a><a name="li123612149207"></a></p>
<p id="dli_09_0090__p1792110154205"></p>
</li><li id="dli_09_0090__li8115122122017">In the Spark job editor, select the corresponding dependency module and execute the Spark job.<p id="dli_09_0090__p181362615209"><a name="dli_09_0090__li8115122122017"></a><a name="li8115122122017"></a></p>
<div class="p" id="dli_09_0090__p18805202232014"><div class="note" id="dli_09_0090__en-us_topic_0197738142_note1435543551919"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="dli_09_0090__en-us_topic_0197738142_ul17825285811"><li id="dli_09_0090__en-us_topic_0197738142_li58215295819">If the Spark version is 2.3.2 (soon to be taken offline) or 2.4.5, set <strong id="dli_09_0090__b633421718352">Module</strong> to <strong id="dli_09_0090__b1533471714350">sys.datasource.css</strong> when you submit the job.</li><li id="dli_09_0090__li6624653171317">If the Spark version is 3.1.1, you do not need to select a module. Instead, configure the following <strong id="dli_09_0090__b137638181353">Spark parameters (--conf)</strong>:<p id="dli_09_0090__p13361102416273">spark.driver.extraClassPath=/usr/share/extension/dli/spark-jar/datasource/css/*</p>
<p id="dli_09_0090__p123611724162718">spark.executor.extraClassPath=/usr/share/extension/dli/spark-jar/datasource/css/*</p>
</li></ul>
</div></div>
</div>
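<p>The version-dependent settings in the note above can be summarized in a small lookup. This is a sketch only: <strong>css_job_settings</strong> is a hypothetical helper for illustration, not a DLI API, and it covers just the three Spark versions named on this page.</p>

```python
CSS_CLASSPATH = "/usr/share/extension/dli/spark-jar/datasource/css/*"


def css_job_settings(spark_version):
    """Return the module list and --conf entries for a Spark version."""
    if spark_version in ("2.3.2", "2.4.5"):
        # Older versions load the connector through a dependency module.
        return {"modules": ["sys.datasource.css"], "conf": {}}
    if spark_version == "3.1.1":
        # Spark 3.1.1 needs no module, only the two class path parameters.
        return {
            "modules": [],
            "conf": {
                "spark.driver.extraClassPath": CSS_CLASSPATH,
                "spark.executor.extraClassPath": CSS_CLASSPATH,
            },
        }
    raise ValueError("version not covered by this page: " + spark_version)


for version in ("2.4.5", "3.1.1"):
    print(version, css_job_settings(version))
```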
</li></ol>
</li></ul>
</li><li id="dli_09_0090__li13741133941619">Complete example code<ul id="dli_09_0090__ul1890464916173"><li id="dli_09_0090__li22432478175">Connecting to data sources through DataFrame APIs<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0197738143_screen172016130283"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
<span class="normal"> 2</span>
<span class="normal"> 3</span>
<span class="normal"> 4</span>
<span class="normal"> 5</span>
<span class="normal"> 6</span>
<span class="normal"> 7</span>
<span class="normal"> 8</span>
<span class="normal"> 9</span>
<span class="normal">10</span>
<span class="normal">11</span>
<span class="normal">12</span>
<span class="normal">13</span>
<span class="normal">14</span>
<span class="normal">15</span>
<span class="normal">16</span>
<span class="normal">17</span>
<span class="normal">18</span>
<span class="normal">19</span>
<span class="normal">20</span>
<span class="normal">21</span>
<span class="normal">22</span>
<span class="normal">23</span>
<span class="normal">24</span>
<span class="normal">25</span>
<span class="normal">26</span>
<span class="normal">27</span>
<span class="normal">28</span>
<span class="normal">29</span>
<span class="normal">30</span>
<span class="normal">31</span>
<span class="normal">32</span></pre></div></td><td class="code"><div><pre><span></span><span class="c1"># _*_ coding: utf-8 _*_</span>
<span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
<span class="kn">from</span> <span class="nn">pyspark.sql.types</span> <span class="kn">import</span> <span class="n">Row</span><span class="p">,</span> <span class="n">StructType</span><span class="p">,</span> <span class="n">StructField</span><span class="p">,</span> <span class="n">IntegerType</span><span class="p">,</span> <span class="n">StringType</span>
<span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">SparkSession</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">&quot;__main__&quot;</span><span class="p">:</span>
    <span class="c1"># Create a SparkSession.</span>
    <span class="n">sparkSession</span> <span class="o">=</span> <span class="n">SparkSession</span><span class="o">.</span><span class="n">builder</span><span class="o">.</span><span class="n">appName</span><span class="p">(</span><span class="s2">&quot;datasource-css&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">getOrCreate</span><span class="p">()</span>
    <span class="c1"># Set the datasource connection parameters.</span>
    <span class="n">resource</span> <span class="o">=</span> <span class="s2">&quot;/mytest&quot;</span>
    <span class="n">nodes</span> <span class="o">=</span> <span class="s2">&quot;to-css-1174404953-hDTx3UPK.datasource.com:9200&quot;</span>
    <span class="c1"># Define the schema.</span>
    <span class="n">schema</span> <span class="o">=</span> <span class="n">StructType</span><span class="p">([</span><span class="n">StructField</span><span class="p">(</span><span class="s2">&quot;id&quot;</span><span class="p">,</span> <span class="n">IntegerType</span><span class="p">(),</span> <span class="kc">False</span><span class="p">),</span>
                         <span class="n">StructField</span><span class="p">(</span><span class="s2">&quot;name&quot;</span><span class="p">,</span> <span class="n">StringType</span><span class="p">(),</span> <span class="kc">False</span><span class="p">)])</span>
    <span class="c1"># Construct the data.</span>
    <span class="n">rdd</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">sparkContext</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([</span><span class="n">Row</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s2">&quot;John&quot;</span><span class="p">),</span> <span class="n">Row</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s2">&quot;Bob&quot;</span><span class="p">)])</span>
    <span class="c1"># Create a DataFrame from the RDD and the schema.</span>
    <span class="n">dataFrame</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">rdd</span><span class="p">,</span> <span class="n">schema</span><span class="p">)</span>
    <span class="c1"># Write data to CSS.</span>
    <span class="n">dataFrame</span><span class="o">.</span><span class="n">write</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;css&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;resource&quot;</span><span class="p">,</span> <span class="n">resource</span><span class="p">)</span><span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.nodes&quot;</span><span class="p">,</span> <span class="n">nodes</span><span class="p">)</span><span class="o">.</span><span class="n">mode</span><span class="p">(</span><span class="s2">&quot;Overwrite&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
    <span class="c1"># Read data from CSS.</span>
    <span class="n">jdbcDF</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;css&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;resource&quot;</span><span class="p">,</span> <span class="n">resource</span><span class="p">)</span><span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.nodes&quot;</span><span class="p">,</span> <span class="n">nodes</span><span class="p">)</span><span class="o">.</span><span class="n">load</span><span class="p">()</span>
    <span class="n">jdbcDF</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
    <span class="c1"># Stop the session.</span>
    <span class="n">sparkSession</span><span class="o">.</span><span class="n">stop</span><span class="p">()</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0090__li844951121818">Connecting to data sources through SQL APIs<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0197738143_screen12862344162714"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
<span class="normal"> 2</span>
<span class="normal"> 3</span>
<span class="normal"> 4</span>
<span class="normal"> 5</span>
<span class="normal"> 6</span>
<span class="normal"> 7</span>
<span class="normal"> 8</span>
<span class="normal"> 9</span>
<span class="normal">10</span>
<span class="normal">11</span>
<span class="normal">12</span>
<span class="normal">13</span>
<span class="normal">14</span>
<span class="normal">15</span>
<span class="normal">16</span>
<span class="normal">17</span>
<span class="normal">18</span>
<span class="normal">19</span>
<span class="normal">20</span>
<span class="normal">21</span>
<span class="normal">22</span>
<span class="normal">23</span>
<span class="normal">24</span></pre></div></td><td class="code"><div><pre><span></span><span class="c1"># _*_ coding: utf-8 _*_</span>
<span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
<span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">SparkSession</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">&quot;__main__&quot;</span><span class="p">:</span>
    <span class="c1"># Create a SparkSession.</span>
    <span class="n">sparkSession</span> <span class="o">=</span> <span class="n">SparkSession</span><span class="o">.</span><span class="n">builder</span><span class="o">.</span><span class="n">appName</span><span class="p">(</span><span class="s2">&quot;datasource-css&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">getOrCreate</span><span class="p">()</span>
    <span class="c1"># Create a DLI table associated with the CSS data source.</span>
    <span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span>
        <span class="s2">&quot;create table css_table(id long, name string) using css options( </span><span class="se">\</span>
<span class="s2"> 'es.nodes'='to-css-1174404953-hDTx3UPK.datasource.com:9200',</span><span class="se">\</span>
<span class="s2"> 'es.nodes.wan.only'='true',</span><span class="se">\</span>
<span class="s2"> 'resource'='/mytest')&quot;</span><span class="p">)</span>
    <span class="c1"># Insert data into the DLI table.</span>
    <span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;insert into css_table values(3,'tom')&quot;</span><span class="p">)</span>
    <span class="c1"># Read data from the DLI table.</span>
    <span class="n">jdbcDF</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;select * from css_table&quot;</span><span class="p">)</span>
    <span class="n">jdbcDF</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
    <span class="c1"># Stop the session.</span>
    <span class="n">sparkSession</span><span class="o">.</span><span class="n">stop</span><span class="p">()</span>
</pre></div></td></tr></table></div>
</div>
</li></ul>
</li></ul>
</div>
<div class="section" id="dli_09_0090__section09040561187"><h4 class="sectiontitle">CSS Security Cluster</h4><ul id="dli_09_0090__ul129145641913"><li id="dli_09_0090__li13132103118206">Development description<ul id="dli_09_0090__ul4484204202118"><li id="dli_09_0090__li193996462233">Code implementation<ol id="dli_09_0090__en-us_topic_0199537139_ol12123050181818"><li id="dli_09_0090__en-us_topic_0199537139_li1612316509182">Import dependency packages.<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0199537139_screen195374592114"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
<span class="kn">from</span> <span class="nn">pyspark.sql.types</span> <span class="kn">import</span> <span class="n">StructType</span><span class="p">,</span> <span class="n">StructField</span><span class="p">,</span> <span class="n">IntegerType</span><span class="p">,</span> <span class="n">StringType</span><span class="p">,</span> <span class="n">Row</span>
<span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">SparkSession</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0090__en-us_topic_0199537139_li11272141817195">Create a session and set the AK and SK.<div class="note" id="dli_09_0090__note1358715714155"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0090__p1858718570154">Hard-coded or plaintext AK and SK pose significant security risks. To ensure security, encrypt your AK and SK, store them in configuration files or environment variables, and decrypt them when needed.</p>
</div></div>
<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0199537139_screen2658132002217"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span>
<span class="normal">5</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span> <span class="o">=</span> <span class="n">SparkSession</span><span class="o">.</span><span class="n">builder</span><span class="o">.</span><span class="n">appName</span><span class="p">(</span><span class="s2">&quot;datasource-css&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">getOrCreate</span><span class="p">()</span>
<span class="n">sparkSession</span><span class="o">.</span><span class="n">conf</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="s2">&quot;fs.obs.access.key&quot;</span><span class="p">,</span> <span class="n">ak</span><span class="p">)</span>
<span class="n">sparkSession</span><span class="o">.</span><span class="n">conf</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="s2">&quot;fs.obs.secret.key&quot;</span><span class="p">,</span> <span class="n">sk</span><span class="p">)</span>
<span class="n">sparkSession</span><span class="o">.</span><span class="n">conf</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="s2">&quot;fs.obs.endpoint&quot;</span><span class="p">,</span> <span class="n">endpoint</span><span class="p">)</span>
<span class="n">sparkSession</span><span class="o">.</span><span class="n">conf</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="s2">&quot;fs.obs.connection.ssl.enabled&quot;</span><span class="p">,</span> <span class="s2">&quot;false&quot;</span><span class="p">)</span>
</pre></div></td></tr></table></div>
</div>
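<p>The snippet above uses the AK, SK, and endpoint as variables without showing where they come from. One possible approach, in line with the security note, is to read them from environment variables. The sketch below assumes hypothetical variable names (<strong>OBS_ACCESS_KEY</strong>, <strong>OBS_SECRET_KEY</strong>, <strong>OBS_ENDPOINT</strong>) and a hypothetical helper <strong>load_obs_settings</strong>. Decryption of stored values is not shown.</p>

```python
import os


def load_obs_settings(env=None):
    """Read AK, SK and endpoint from environment variables.

    The variable names are hypothetical; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    names = ("OBS_ACCESS_KEY", "OBS_SECRET_KEY", "OBS_ENDPOINT")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in names)


# Demonstration with a stand-in mapping instead of the real environment.
demo_env = {
    "OBS_ACCESS_KEY": "example-ak",
    "OBS_SECRET_KEY": "example-sk",
    "OBS_ENDPOINT": "obs.example.com",
}
ak, sk, endpoint = load_obs_settings(demo_env)
print(endpoint)
# In a job, the values would be passed on, for example:
# sparkSession.conf.set("fs.obs.access.key", ak)
```

<p>Failing early when a variable is missing keeps the job from starting with empty credentials.</p>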
</li></ol>
</li><li id="dli_09_0090__li41489502413">Connecting to data sources through DataFrame APIs<ol id="dli_09_0090__en-us_topic_0199537139_ol127271626541"><li id="dli_09_0090__en-us_topic_0199537139_li147277210549">Set connection parameters.<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0199537139_screen8249222145511"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">resource</span> <span class="o">=</span> <span class="s2">&quot;/mytest&quot;</span>
<span class="n">nodes</span> <span class="o">=</span> <span class="s2">&quot;to-css-1174404953-hDTx3UPK.datasource.com:9200&quot;</span>
</pre></div></td></tr></table></div>
</div>
<div class="note" id="dli_09_0090__note13643210193819"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0090__p06433106387"><strong id="dli_09_0090__b98033502201">resource</strong> indicates the name of the resource associated with the CSS. You can specify the resource location in <em id="dli_09_0090__i5368951142011">/index/type</em> format. (The <strong id="dli_09_0090__b10369851182017">index</strong> can be the database and <strong id="dli_09_0090__b3369951142014">type</strong> the table.)</p>
<ul id="dli_09_0090__ul19643810103813"><li id="dli_09_0090__li16643141083810">In Elasticsearch 6.X, a single index supports only one type, and the type name can be customized.</li><li id="dli_09_0090__li464301014381">In Elasticsearch 7.X, a single index uses <strong id="dli_09_0090__b6268145613205">_doc</strong> as the type name and cannot be customized. To access Elasticsearch 7.X, set this parameter to <strong id="dli_09_0090__b15281557132015">index</strong>.</li></ul>
</div></div>
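<p>The rule in the note above can be sketched as a small helper. This is only an illustration: <strong>build_resource</strong> and its parameters are hypothetical names, not part of DLI or the Elasticsearch connector.</p>

```python
def build_resource(index, doc_type=None, es_major_version=7):
    """Return a value for the `resource` option (hypothetical helper)."""
    if not index:
        raise ValueError("index must not be empty")
    if es_major_version >= 7:
        # Elasticsearch 7.X: the type is always _doc, so only the index is given.
        return "/" + index
    if not doc_type:
        raise ValueError("Elasticsearch 6.X requires a type name")
    # Elasticsearch 6.X: /index/type, with a user-defined type name.
    return "/" + index + "/" + doc_type


print(build_resource("mytest"))               # /mytest
print(build_resource("mytest", "mytype", 6))  # /mytest/mytype
```

<p>The first call produces the same value as the <strong>resource</strong> variable used in this example.</p>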
</li><li id="dli_09_0090__en-us_topic_0199537139_li1983317185547">Create a schema and add data to it.<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0199537139_screen11743105965516"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">schema</span> <span class="o">=</span> <span class="n">StructType</span><span class="p">([</span><span class="n">StructField</span><span class="p">(</span><span class="s2">&quot;id&quot;</span><span class="p">,</span> <span class="n">IntegerType</span><span class="p">(),</span> <span class="kc">False</span><span class="p">),</span>
                     <span class="n">StructField</span><span class="p">(</span><span class="s2">&quot;name&quot;</span><span class="p">,</span> <span class="n">StringType</span><span class="p">(),</span> <span class="kc">False</span><span class="p">)])</span>
<span class="n">rdd</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">sparkContext</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([</span><span class="n">Row</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s2">&quot;John&quot;</span><span class="p">),</span> <span class="n">Row</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s2">&quot;Bob&quot;</span><span class="p">)])</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0090__en-us_topic_0199537139_li18999848205611">Construct a DataFrame.<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0199537139_screen842219555710"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">dataFrame</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">rdd</span><span class="p">,</span> <span class="n">schema</span><span class="p">)</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0090__en-us_topic_0199537139_li189396266549">Save data to CSS.<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0199537139_screen360873225719"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">dataFrame</span><span class="o">.</span><span class="n">write</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;css&quot;</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;resource&quot;</span><span class="p">,</span> <span class="n">resource</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.nodes&quot;</span><span class="p">,</span> <span class="n">nodes</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.ssl&quot;</span><span class="p">,</span> <span class="s2">&quot;true&quot;</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.ssl.keystore.location&quot;</span><span class="p">,</span> <span class="s2">&quot;obs://Bucket name/path/transport-keystore.jks&quot;</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.ssl.keystore.pass&quot;</span><span class="p">,</span> <span class="s2">&quot;***&quot;</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.ssl.truststore.location&quot;</span><span class="p">,</span> <span class="s2">&quot;obs://Bucket name/path/truststore.jks&quot;</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.ssl.truststore.pass&quot;</span><span class="p">,</span> <span class="s2">&quot;***&quot;</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.http.auth.user&quot;</span><span class="p">,</span> <span class="s2">&quot;admin&quot;</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.http.auth.pass&quot;</span><span class="p">,</span> <span class="s2">&quot;***&quot;</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">mode</span><span class="p">(</span><span class="s2">&quot;Overwrite&quot;</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">save</span><span class="p">()</span>
</pre></div></td></tr></table></div>
</div>
<div class="note" id="dli_09_0090__en-us_topic_0199537139_note2098134418572"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0090__en-us_topic_0199537139_p039712487568"><strong id="dli_09_0090__b1543610511495">mode</strong> can be set to one of the following values:</p>
<ul id="dli_09_0090__en-us_topic_0199537139_ul1929273321915"><li id="dli_09_0090__en-us_topic_0199537139_li8292633161916"><strong id="dli_09_0090__b49291435112616">ErrorIfExists</strong>: If the data already exists, the system throws an exception.</li><li id="dli_09_0090__en-us_topic_0199537139_li1229213391913"><strong id="dli_09_0090__b11825537202612">Overwrite</strong>: If the data already exists, the original data is overwritten.</li><li id="dli_09_0090__en-us_topic_0199537139_li7292833201912"><strong id="dli_09_0090__b287619399261">Append</strong>: If the data already exists, the new data is appended to it.</li><li id="dli_09_0090__en-us_topic_0199537139_li1029353311911"><strong id="dli_09_0090__b738304122610">Ignore</strong>: If the data already exists, the new data is not written and the existing data is left unchanged. This is similar to the SQL statement <strong id="dli_09_0090__b2062592685">CREATE TABLE IF NOT EXISTS</strong>.</li></ul>
</div></div>
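<p>To make the four modes concrete, the following sketch models them against an in-memory dictionary. It is a plain-Python illustration only: the <strong>save</strong> function is hypothetical, and it does not call Spark or CSS, whose actual write behavior may differ in details.</p>

```python
def save(store, key, rows, mode):
    """Model the four save modes on a dict of lists (illustration only)."""
    exists = key in store
    if mode == "ErrorIfExists":
        if exists:
            raise ValueError("data already exists: " + key)
        store[key] = list(rows)
    elif mode == "Overwrite":
        # Replace whatever was stored before.
        store[key] = list(rows)
    elif mode == "Append":
        # Keep the existing rows and add the new ones.
        store[key] = store.get(key, []) + list(rows)
    elif mode == "Ignore":
        # Write only if nothing is stored yet.
        if not exists:
            store[key] = list(rows)
    else:
        raise ValueError("unknown mode: " + mode)
    return store


store = {"/mytest": [(1, "John")]}
save(store, "/mytest", [(2, "Bob")], "Append")
print(store["/mytest"])  # [(1, 'John'), (2, 'Bob')]
save(store, "/mytest", [(3, "tom")], "Ignore")
print(store["/mytest"])  # [(1, 'John'), (2, 'Bob')]
save(store, "/mytest", [(3, "tom")], "Overwrite")
print(store["/mytest"])  # [(3, 'tom')]
```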
</li><li id="dli_09_0090__en-us_topic_0199537139_li952173912546">Read data from CSS.<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0199537139_screen1042041414589"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">jdbcDF</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;css&quot;</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;resource&quot;</span><span class="p">,</span> <span class="n">resource</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.nodes&quot;</span><span class="p">,</span> <span class="n">nodes</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.ssl&quot;</span><span class="p">,</span> <span class="s2">&quot;true&quot;</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.ssl.keystore.location&quot;</span><span class="p">,</span> <span class="s2">&quot;obs://Bucket name/path/transport-keystore.jks&quot;</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.ssl.keystore.pass&quot;</span><span class="p">,</span> <span class="s2">&quot;***&quot;</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.ssl.truststore.location&quot;</span><span class="p">,</span> <span class="s2">&quot;obs://Bucket name/path/truststore.jks&quot;</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.ssl.truststore.pass&quot;</span><span class="p">,</span> <span class="s2">&quot;***&quot;</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.http.auth.user&quot;</span><span class="p">,</span> <span class="s2">&quot;admin&quot;</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.http.auth.pass&quot;</span><span class="p">,</span> <span class="s2">&quot;***&quot;</span><span class="p">)</span>\
    <span class="o">.</span><span class="n">load</span><span class="p">()</span>
<span class="n">jdbcDF</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0090__en-us_topic_0199537139_li04481816173018">View the operation result.<p id="dli_09_0090__en-us_topic_0199537139_p179471918125015"><a name="dli_09_0090__en-us_topic_0199537139_li04481816173018"></a><a name="en-us_topic_0199537139_li04481816173018"></a><span><img id="dli_09_0090__en-us_topic_0199537139_image10946918135013" src="en-us_image_0266332986.png"></span></p>
</li></ol>
</li><li id="dli_09_0090__li1684632819462">Connecting to data sources through SQL APIs<ol id="dli_09_0090__en-us_topic_0199537139_ol564813553476"><li id="dli_09_0090__en-us_topic_0199537139_li19648135510475">Create a table to connect to a CSS data source.<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0199537139_screen822818915497"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span>
    <span class="s2">&quot;create table css_table(id long, name string) using css options(</span><span class="se">\</span>
<span class="s2">    'es.nodes'='to-css-1174404953-hDTx3UPK.datasource.com:9200',</span><span class="se">\</span>
<span class="s2">    'es.nodes.wan.only'='true',</span><span class="se">\</span>
<span class="s2">    'resource'='/mytest',</span><span class="se">\</span>
<span class="s2">    'es.net.ssl'='true',</span><span class="se">\</span>
<span class="s2">    'es.net.ssl.keystore.location'='obs://Bucket name/path/transport-keystore.jks',</span><span class="se">\</span>
<span class="s2">    'es.net.ssl.keystore.pass'='***',</span><span class="se">\</span>
<span class="s2">    'es.net.ssl.truststore.location'='obs://Bucket name/path/truststore.jks',</span><span class="se">\</span>
<span class="s2">    'es.net.ssl.truststore.pass'='***',</span><span class="se">\</span>
<span class="s2">    'es.net.http.auth.user'='admin',</span><span class="se">\</span>
<span class="s2">    'es.net.http.auth.pass'='***')&quot;</span><span class="p">)</span>
</pre></div></td></tr></table></div>
</div>
<div class="note" id="dli_09_0090__en-us_topic_0199537139_note0745175005018"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0090__en-us_topic_0199537139_p193151638176">For details about the parameters for creating a CSS datasource connection table, see <a href="dli_09_0061.html#dli_09_0061__en-us_topic_0190067468_table569314388144">Table 1</a>.</p>
</div></div>
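<p>Long statements with backslash continuations are easy to break. As an alternative sketch, the statement can be assembled from a dictionary of options. The helper name <strong>build_create_table_sql</strong> is hypothetical, and the sketch assumes that option values contain no single quotes.</p>

```python
def build_create_table_sql(table, columns, options):
    """Assemble a 'create table ... using css options(...)' statement.

    Hypothetical helper: `columns` is a list of (name, type) pairs and
    `options` is a dict of connector options. Values are not escaped.
    """
    cols = ", ".join(name + " " + dtype for name, dtype in columns)
    opts = ", ".join("'{}'='{}'".format(k, v) for k, v in options.items())
    return "create table {}({}) using css options({})".format(table, cols, opts)


options = {
    "es.nodes": "to-css-1174404953-hDTx3UPK.datasource.com:9200",
    "es.nodes.wan.only": "true",
    "resource": "/mytest",
}
statement = build_create_table_sql(
    "css_table", [("id", "long"), ("name", "string")], options)
print(statement)
```

<p>The resulting string can then be passed to <strong>sparkSession.sql()</strong> in the same way as the literal statement above.</p>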
</li><li id="dli_09_0090__en-us_topic_0199537139_li11669142935211">Insert data.<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0199537139_screen1488194510529"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;insert into css_table values(3,'tom')&quot;</span><span class="p">)</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0090__en-us_topic_0199537139_li15985054105216">Query data.<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0199537139_screen11514181135310"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">jdbcDF</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;select * from css_table&quot;</span><span class="p">)</span>
<span class="n">jdbcDF</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0090__en-us_topic_0199537139_li1969144743111">View the operation result.<p id="dli_09_0090__en-us_topic_0199537139_p3132154717503"><a name="dli_09_0090__en-us_topic_0199537139_li1969144743111"></a><a name="en-us_topic_0199537139_li1969144743111"></a><span><img id="dli_09_0090__en-us_topic_0199537139_image151321047135013" src="en-us_image_0266332987.png"></span></p>
</li></ol>
</li><li id="dli_09_0090__li5494818470">Submitting a Spark job<ol id="dli_09_0090__en-us_topic_0199537139_ol612481914610"><li id="dli_09_0090__en-us_topic_0199537139_li17148191617535">Upload the Python code file to DLI. </li><li id="dli_09_0090__en-us_topic_0199537139_li67827509599">In the Spark job editor, select the corresponding dependency module and execute the Spark job. <div class="note" id="dli_09_0090__en-us_topic_0199537139_note1435543551919"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="dli_09_0090__en-us_topic_0199537139_ul17825285811"><li id="dli_09_0090__en-us_topic_0199537139_li58215295819">When submitting a job, you need to specify a dependency module named <strong id="dli_09_0090__en-us_topic_0197738142_b1838454216111">sys.datasource.css</strong>.</li><li id="dli_09_0090__en-us_topic_0199537139_li14401129269">For details about how to submit a job on the DLI console, see </li><li id="dli_09_0090__en-us_topic_0199537139_li193313445818">For details about how to submit a job through an API, see the <strong id="dli_09_0090__b88941444171712">modules</strong> parameter in </li></ul>
</div></div>
</li></ol>
</li></ul>
</li><li id="dli_09_0090__li35131334124911">Complete example code<ul id="dli_09_0090__ul09131254184920"><li id="dli_09_0090__li1611295324919">Connecting to data sources through DataFrame APIs<div class="note" id="dli_09_0090__note84662050195919"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0090__p1140569541">Hard-coded or plaintext AK and SK pose significant security risks. To ensure security, encrypt your AK and SK, store them in configuration files or environment variables, and decrypt them when needed.</p>
</div></div>
<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0199537140_screen172016130283"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="c1"># _*_ coding: utf-8 _*_</span>
<span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
<span class="kn">from</span> <span class="nn">pyspark.sql.types</span> <span class="kn">import</span> <span class="n">Row</span><span class="p">,</span> <span class="n">StructType</span><span class="p">,</span> <span class="n">StructField</span><span class="p">,</span> <span class="n">IntegerType</span><span class="p">,</span> <span class="n">StringType</span>
<span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">SparkSession</span>
<span class="kn">import</span> <span class="nn">os</span>

<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">&quot;__main__&quot;</span><span class="p">:</span>
    <span class="c1"># Create a SparkSession.</span>
    <span class="n">sparkSession</span> <span class="o">=</span> <span class="n">SparkSession</span><span class="o">.</span><span class="n">builder</span><span class="o">.</span><span class="n">appName</span><span class="p">(</span><span class="s2">&quot;datasource-css&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">getOrCreate</span><span class="p">()</span>

    <span class="c1"># Read the AK, SK, and OBS endpoint from environment variables.</span>
    <span class="c1"># The variable names are placeholders; decrypt the values here if you store them encrypted.</span>
    <span class="n">ak</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s2">&quot;OBS_AK&quot;</span><span class="p">]</span>
    <span class="n">sk</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s2">&quot;OBS_SK&quot;</span><span class="p">]</span>
    <span class="n">endpoint</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s2">&quot;OBS_ENDPOINT&quot;</span><span class="p">]</span>
    <span class="n">sparkSession</span><span class="o">.</span><span class="n">conf</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="s2">&quot;fs.obs.access.key&quot;</span><span class="p">,</span> <span class="n">ak</span><span class="p">)</span>
    <span class="n">sparkSession</span><span class="o">.</span><span class="n">conf</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="s2">&quot;fs.obs.secret.key&quot;</span><span class="p">,</span> <span class="n">sk</span><span class="p">)</span>
    <span class="n">sparkSession</span><span class="o">.</span><span class="n">conf</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="s2">&quot;fs.obs.endpoint&quot;</span><span class="p">,</span> <span class="n">endpoint</span><span class="p">)</span>
    <span class="n">sparkSession</span><span class="o">.</span><span class="n">conf</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="s2">&quot;fs.obs.connection.ssl.enabled&quot;</span><span class="p">,</span> <span class="s2">&quot;false&quot;</span><span class="p">)</span>

    <span class="c1"># Set the datasource connection parameters.</span>
    <span class="n">resource</span> <span class="o">=</span> <span class="s2">&quot;/mytest&quot;</span>
    <span class="n">nodes</span> <span class="o">=</span> <span class="s2">&quot;to-css-1174404953-hDTx3UPK.datasource.com:9200&quot;</span>

    <span class="c1"># Define the schema.</span>
    <span class="n">schema</span> <span class="o">=</span> <span class="n">StructType</span><span class="p">([</span><span class="n">StructField</span><span class="p">(</span><span class="s2">&quot;id&quot;</span><span class="p">,</span> <span class="n">IntegerType</span><span class="p">(),</span> <span class="kc">False</span><span class="p">),</span>
                         <span class="n">StructField</span><span class="p">(</span><span class="s2">&quot;name&quot;</span><span class="p">,</span> <span class="n">StringType</span><span class="p">(),</span> <span class="kc">False</span><span class="p">)])</span>

    <span class="c1"># Construct the data.</span>
    <span class="n">rdd</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">sparkContext</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([</span><span class="n">Row</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s2">&quot;John&quot;</span><span class="p">),</span> <span class="n">Row</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s2">&quot;Bob&quot;</span><span class="p">)])</span>

    <span class="c1"># Create a DataFrame from the RDD and schema.</span>
    <span class="n">dataFrame</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">rdd</span><span class="p">,</span> <span class="n">schema</span><span class="p">)</span>

    <span class="c1"># Write data to CSS.</span>
    <span class="n">dataFrame</span><span class="o">.</span><span class="n">write</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;css&quot;</span><span class="p">)</span>\
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;resource&quot;</span><span class="p">,</span> <span class="n">resource</span><span class="p">)</span>\
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.nodes&quot;</span><span class="p">,</span> <span class="n">nodes</span><span class="p">)</span>\
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.ssl&quot;</span><span class="p">,</span> <span class="s2">&quot;true&quot;</span><span class="p">)</span>\
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.ssl.keystore.location&quot;</span><span class="p">,</span> <span class="s2">&quot;obs://Bucket name/path/transport-keystore.jks&quot;</span><span class="p">)</span>\
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.ssl.keystore.pass&quot;</span><span class="p">,</span> <span class="s2">&quot;***&quot;</span><span class="p">)</span>\
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.ssl.truststore.location&quot;</span><span class="p">,</span> <span class="s2">&quot;obs://Bucket name/path/truststore.jks&quot;</span><span class="p">)</span>\
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.ssl.truststore.pass&quot;</span><span class="p">,</span> <span class="s2">&quot;***&quot;</span><span class="p">)</span>\
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.http.auth.user&quot;</span><span class="p">,</span> <span class="s2">&quot;admin&quot;</span><span class="p">)</span>\
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.http.auth.pass&quot;</span><span class="p">,</span> <span class="s2">&quot;***&quot;</span><span class="p">)</span>\
        <span class="o">.</span><span class="n">mode</span><span class="p">(</span><span class="s2">&quot;Overwrite&quot;</span><span class="p">)</span>\
        <span class="o">.</span><span class="n">save</span><span class="p">()</span>

    <span class="c1"># Read data from CSS.</span>
    <span class="n">jdbcDF</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;css&quot;</span><span class="p">)</span>\
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;resource&quot;</span><span class="p">,</span> <span class="n">resource</span><span class="p">)</span>\
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.nodes&quot;</span><span class="p">,</span> <span class="n">nodes</span><span class="p">)</span>\
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.ssl&quot;</span><span class="p">,</span> <span class="s2">&quot;true&quot;</span><span class="p">)</span>\
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.ssl.keystore.location&quot;</span><span class="p">,</span> <span class="s2">&quot;obs://Bucket name/path/transport-keystore.jks&quot;</span><span class="p">)</span>\
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.ssl.keystore.pass&quot;</span><span class="p">,</span> <span class="s2">&quot;***&quot;</span><span class="p">)</span>\
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.ssl.truststore.location&quot;</span><span class="p">,</span> <span class="s2">&quot;obs://Bucket name/path/truststore.jks&quot;</span><span class="p">)</span>\
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.ssl.truststore.pass&quot;</span><span class="p">,</span> <span class="s2">&quot;***&quot;</span><span class="p">)</span>\
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.http.auth.user&quot;</span><span class="p">,</span> <span class="s2">&quot;admin&quot;</span><span class="p">)</span>\
        <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;es.net.http.auth.pass&quot;</span><span class="p">,</span> <span class="s2">&quot;***&quot;</span><span class="p">)</span>\
        <span class="o">.</span><span class="n">load</span><span class="p">()</span>
    <span class="n">jdbcDF</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>

    <span class="c1"># Stop the session.</span>
    <span class="n">sparkSession</span><span class="o">.</span><span class="n">stop</span><span class="p">()</span>
</pre></div></td></tr></table></div>
</div>
</li><li id="dli_09_0090__li682813483505">Connecting to data sources through SQL APIs<div class="codecoloring" codetype="Python" id="dli_09_0090__en-us_topic_0199537140_screen12862344162714"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="c1"># _*_ coding: utf-8 _*_</span>
<span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
<span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">SparkSession</span>
<span class="kn">import</span> <span class="nn">os</span>

<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">&quot;__main__&quot;</span><span class="p">:</span>
    <span class="c1"># Create a SparkSession.</span>
    <span class="n">sparkSession</span> <span class="o">=</span> <span class="n">SparkSession</span><span class="o">.</span><span class="n">builder</span><span class="o">.</span><span class="n">appName</span><span class="p">(</span><span class="s2">&quot;datasource-css&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">getOrCreate</span><span class="p">()</span>

    <span class="c1"># Create a DLI table associated with the CSS data source.</span>
    <span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;create table css_table(id int, name string) using css options(</span><span class="se">\</span>
<span class="s2">        'es.nodes'='192.168.6.204:9200',</span><span class="se">\</span>
<span class="s2">        'es.nodes.wan.only'='true',</span><span class="se">\</span>
<span class="s2">        'resource'='/mytest',</span><span class="se">\</span>
<span class="s2">        'es.net.ssl'='true',</span><span class="se">\</span>
<span class="s2">        'es.net.ssl.keystore.location' = 'obs://xietest1/lzq/keystore.jks',</span><span class="se">\</span>
<span class="s2">        'es.net.ssl.keystore.pass' = '**',</span><span class="se">\</span>
<span class="s2">        'es.net.ssl.truststore.location'='obs://xietest1/lzq/truststore.jks',</span><span class="se">\</span>
<span class="s2">        'es.net.ssl.truststore.pass'='**',</span><span class="se">\</span>
<span class="s2">        'es.net.http.auth.user'='admin',</span><span class="se">\</span>
<span class="s2">        'es.net.http.auth.pass'='**')&quot;</span><span class="p">)</span>

    <span class="c1"># Insert data into the DLI table.</span>
    <span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;insert into css_table values(3,'tom')&quot;</span><span class="p">)</span>

    <span class="c1"># Read data from the DLI table.</span>
    <span class="n">jdbcDF</span> <span class="o">=</span> <span class="n">sparkSession</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;select * from css_table&quot;</span><span class="p">)</span>
    <span class="n">jdbcDF</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>

    <span class="c1"># Stop the session.</span>
    <span class="n">sparkSession</span><span class="o">.</span><span class="n">stop</span><span class="p">()</span>
</pre></div></td></tr></table></div>
</div>
</li></ul>
</li></ul>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_09_0089.html">Connecting to CSS</a></div>
</div>
</div>