Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-committed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
<a name="dli_09_0069"></a><a name="dli_09_0069"></a>
|
|
|
|
<h1 class="topictitle1">Scala Example Code</h1>
|
|
<div id="body8662426"><div class="section" id="dli_09_0069__section1523149131718"><h4 class="sectiontitle">Scenario</h4><p id="dli_09_0069__p98722576293">This section provides Scala example code that demonstrates how to use a Spark job to access data from the GaussDB(DWS) data source.</p>
|
|
<p id="dli_09_0069__en-us_topic_0200509991_p1944354710257">Prerequisite: a datasource connection has been created and bound to a queue on the DLI management console.</p>
|
|
<div class="note" id="dli_09_0069__note17925192652815"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0069__p692572617287">Hard-coded or plaintext passwords pose significant security risks. To ensure security, encrypt your passwords, store them in configuration files or environment variables, and decrypt them when needed.</p>
|
|
</div></div>
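<p>As a minimal sketch of the advice in the note above, the password can be read from an environment variable instead of being written into the code. The variable name <strong>DWS_PASSWORD</strong> and the object name are assumptions made for this illustration, and decryption of an encrypted value is not shown.</p>

```scala
object DwsCredentials {
  // DWS_PASSWORD is a hypothetical variable name chosen for this sketch.
  val PasswordVar = "DWS_PASSWORD"

  /** Returns the password from the given environment, or an error message if it is unset or empty. */
  def passwordFrom(env: Map[String, String]): Either[String, String] =
    env.get(PasswordVar).filter(_.nonEmpty)
      .toRight(s"Environment variable $PasswordVar is not set")

  def main(args: Array[String]): Unit =
    passwordFrom(sys.env) match {
      case Right(_)  => println("Password loaded from the environment.")
      case Left(msg) => println(msg)
    }
}
```

<p>Passing the environment in as a parameter keeps the lookup easy to check without changing the real process environment.</p>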
|
|
</div>
|
|
<div class="section" id="dli_09_0069__section534162612148"><h4 class="sectiontitle">Preparations</h4><div class="p" id="dli_09_0069__p143609414346">Configure the dependencies and create a Spark session.<ol id="dli_09_0069__en-us_topic_0190920191_ol831808585"><li id="dli_09_0069__en-us_topic_0190920191_li1822810810586">Import dependencies.<p id="dli_09_0069__en-us_topic_0190920191_p9751145613019"><a name="dli_09_0069__en-us_topic_0190920191_li1822810810586"></a><a name="en-us_topic_0190920191_li1822810810586"></a>Required Maven dependency:</p>
|
|
<div class="codecoloring" codetype="Scala" id="dli_09_0069__en-us_topic_0190920191_screen5760163172012"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span></pre></div></td><td class="code"><div><pre><span></span><span class="o">&lt;</span><span class="n">dependency</span><span class="o">&gt;</span>
<span class="w">  </span><span class="o">&lt;</span><span class="n">groupId</span><span class="o">&gt;</span><span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">spark</span><span class="o">&lt;/</span><span class="n">groupId</span><span class="o">&gt;</span>
<span class="w">  </span><span class="o">&lt;</span><span class="n">artifactId</span><span class="o">&gt;</span><span class="n">spark</span><span class="o">-</span><span class="n">sql_2</span><span class="mf">.11</span><span class="o">&lt;/</span><span class="n">artifactId</span><span class="o">&gt;</span>
<span class="w">  </span><span class="o">&lt;</span><span class="n">version</span><span class="o">&gt;</span><span class="mf">2.3.2</span><span class="o">&lt;/</span><span class="n">version</span><span class="o">&gt;</span>
<span class="o">&lt;/</span><span class="n">dependency</span><span class="o">&gt;</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<div class="p" id="dli_09_0069__en-us_topic_0190920191_p13761330205">Import dependency packages.<div class="codecoloring" codetype="Scala" id="dli_09_0069__en-us_topic_0190920191_screen1761153192016"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">import</span><span class="w"> </span><span class="nn">java</span><span class="p">.</span><span class="nn">util</span><span class="p">.</span><span class="nc">Properties</span>
|
|
<span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.{</span><span class="nc">Row</span><span class="p">,</span><span class="nc">SparkSession</span><span class="p">}</span>
|
|
<span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.</span><span class="nc">SaveMode</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</div>
|
|
</li><li id="dli_09_0069__en-us_topic_0190920191_li663417557599">Create a session.<div class="codecoloring" codetype="Scala" id="dli_09_0069__en-us_topic_0190920191_screen1363475510592"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">sparkSession</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">SparkSession</span><span class="p">.</span><span class="n">builder</span><span class="p">().</span><span class="n">getOrCreate</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li></ol>
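<p>If the project is built with sbt rather than Maven, the same dependency might be declared as follows. This is a sketch only: the Scala patch version and the <strong>provided</strong> scope (which assumes that the cluster supplies Spark at run time) are assumptions, not values taken from this guide.</p>

```scala
// build.sbt
// Scala 2.11 matches the spark-sql_2.11 artifact; the patch version is an assumption.
scalaVersion := "2.11.12"

// %% appends the Scala binary version, resolving to spark-sql_2.11:2.3.2.
// "provided" assumes Spark is already available on the cluster at run time.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.2" % "provided"
```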
|
|
</div>
|
|
</div>
|
|
<div class="section" id="dli_09_0069__section15219273356"><h4 class="sectiontitle">Accessing a Data Source Using a SQL API</h4><ol id="dli_09_0069__ol1237415523415"><li id="dli_09_0069__li73741155113418">Create a table to connect to a GaussDB(DWS) data source.<div class="codecoloring" codetype="Scala" id="dli_09_0069__screen1372165515342"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span>
|
|
<span class="normal">6</span>
|
|
<span class="normal">7</span>
|
|
<span class="normal">8</span>
<span class="normal">9</span></pre></div></td><td class="code"><div><pre><span></span><span class="c1">// passwdauth: name of the password-type datasource authentication created on DLI. With datasource authentication, the job does not need its own username and password.</span>
<span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span>
<span class="w">  </span><span class="s">"""CREATE TABLE IF NOT EXISTS dli_to_dws USING JDBC OPTIONS (</span>
<span class="s">    'url'='jdbc:postgresql://to-dws-1174404209-cA37siB6.datasource.com:8000/postgres',</span>
<span class="s">    'dbtable'='customer',</span>
<span class="s">    'user'='dbadmin',</span>
<span class="s">    'passwdauth'='######'</span>
<span class="s">  )"""</span>
<span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
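<p>The concurrent-read parameters described in Table 1 go into the same OPTIONS clause. The sketch below only builds and prints such a statement so that its shape can be inspected; the helper <strong>createTable</strong> is hypothetical, the partition values mirror the example in Table 1, and it assumes that the <strong>customer</strong> table has a numeric <strong>id</strong> column. Option values containing single quotes are not escaped here.</p>

```scala
object PartitionedTableSql {
  /** Renders a CREATE TABLE ... USING JDBC statement from ordered option pairs (hypothetical helper). */
  def createTable(table: String, options: Seq[(String, String)]): String = {
    val body = options.map { case (k, v) => s"  '$k'='$v'" }.mkString(",\n")
    s"CREATE TABLE IF NOT EXISTS $table USING JDBC OPTIONS (\n$body\n)"
  }

  val sql: String = createTable("dli_to_dws", Seq(
    "url"             -> "jdbc:postgresql://to-dws-1174404209-cA37siB6.datasource.com:8000/postgres",
    "dbtable"         -> "customer",
    "passwdauth"      -> "######", // placeholder for the datasource authentication name
    "partitionColumn" -> "id",     // assumes a numeric id column
    "lowerBound"      -> "0",
    "upperBound"      -> "100",
    "numPartitions"   -> "2"
  ))

  def main(args: Array[String]): Unit = println(sql)
}
```

<p>The resulting string could then be passed to <strong>sparkSession.sql()</strong> in the same way as the statement above.</p>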
|
|
|
|
<div class="tablenoborder"><a name="dli_09_0069__table193741955203417"></a><a name="table193741955203417"></a><table cellpadding="4" cellspacing="0" summary="" id="dli_09_0069__table193741955203417" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameters for creating a table</caption><thead align="left"><tr id="dli_09_0069__row637215514345"><th align="left" class="cellrowborder" valign="top" width="15.440000000000001%" id="mcps1.3.3.2.1.2.2.3.1.1"><p id="dli_09_0069__p4372155143414">Parameter</p>
|
|
</th>
|
|
<th align="left" class="cellrowborder" valign="top" width="84.56%" id="mcps1.3.3.2.1.2.2.3.1.2"><p id="dli_09_0069__p1437225513341">Description</p>
|
|
</th>
|
|
</tr>
|
|
</thead>
|
|
<tbody><tr id="dli_09_0069__row113721255133414"><td class="cellrowborder" valign="top" width="15.440000000000001%" headers="mcps1.3.3.2.1.2.2.3.1.1 "><p id="dli_09_0069__p2372185523415">url</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="84.56%" headers="mcps1.3.3.2.1.2.2.3.1.2 "><p id="dli_09_0069__p737295518348">To obtain a GaussDB(DWS) IP address, you need to create a datasource connection first. Refer to <em id="dli_09_0069__i19522111718">Data Lake Insight User Guide</em> for more information.</p>
|
|
<p id="dli_09_0069__p133722554340">After an enhanced datasource connection is created, you can use the JDBC connection string (intranet) provided by GaussDB(DWS) or the intranet IP address and port number to connect to GaussDB(DWS). The format is <strong id="dli_09_0069__en-us_topic_0190920191_b8131210123519"><em id="dli_09_0069__i1312111016350">protocol header</em>://<em id="dli_09_0069__i612510173517">internal IP address</em>:<em id="dli_09_0069__i111371014358">internal network port number</em>/<em id="dli_09_0069__i4139104352">database name</em></strong>, for example: <strong id="dli_09_0069__en-us_topic_0190920191_b1513111019356">jdbc:postgresql://192.168.0.77:8000/postgres</strong>. For details about how to obtain the value, see <em id="dli_09_0069__en-us_topic_0190920191_i156701326173512">GaussDB(DWS) cluster information</em>.</p>
|
|
<div class="note" id="dli_09_0069__note19372105512346"><span class="notetitle"> NOTE: </span><div class="notebody"><p id="dli_09_0069__p1037215543418">The GaussDB(DWS) IP address is in the following format: <strong id="dli_09_0069__en-us_topic_0190920191_b221191581211"><em id="dli_09_0069__i16242112011220">protocol header</em>://<em id="dli_09_0069__i5216202681216">IP address</em>:<em id="dli_09_0069__i172713306123">port number</em>/<em id="dli_09_0069__i10160835131211">database name</em></strong></p>
|
|
<p id="dli_09_0069__p13372165513414">Example:</p>
|
|
<p id="dli_09_0069__p12372165563418">jdbc:postgresql://to-dws-1174405119-ihlUr78j.datasource.com:8000/postgres</p>
|
|
<p id="dli_09_0069__p837255510342">If you want to connect to a database created in GaussDB(DWS), change <strong id="dli_09_0069__en-us_topic_0190920191_b3374631141315">postgres</strong> to the corresponding database name in this connection.</p>
|
|
</div></div>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0069__row1437305516348"><td class="cellrowborder" valign="top" width="15.440000000000001%" headers="mcps1.3.3.2.1.2.2.3.1.1 "><p id="dli_09_0069__p5373155519346">passwdauth</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="84.56%" headers="mcps1.3.3.2.1.2.2.3.1.2 "><p id="dli_09_0069__p12373205523418">Name of datasource authentication of the password type created on DLI. If datasource authentication is used, you do not need to set the username and password for jobs.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0069__row133731155183418"><td class="cellrowborder" valign="top" width="15.440000000000001%" headers="mcps1.3.3.2.1.2.2.3.1.1 "><p id="dli_09_0069__p537311554340">dbtable</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="84.56%" headers="mcps1.3.3.2.1.2.2.3.1.2 "><p id="dli_09_0069__p1237312559347">Name of the table to access in the PostgreSQL database.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0069__row1237311553346"><td class="cellrowborder" valign="top" width="15.440000000000001%" headers="mcps1.3.3.2.1.2.2.3.1.1 "><p id="dli_09_0069__p83736558341">partitionColumn</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="84.56%" headers="mcps1.3.3.2.1.2.2.3.1.2 "><p id="dli_09_0069__p1737325518349">Numeric column used to split the data into ranges that are read concurrently.</p>
|
|
<div class="note" id="dli_09_0069__note18373185583417"><span class="notetitle"> NOTE: </span><div class="notebody"><ul id="dli_09_0069__ul14373655163418"><li id="dli_09_0069__li143733552348">The <strong id="dli_09_0069__en-us_topic_0190920191_b1681163223511">partitionColumn</strong>, <strong id="dli_09_0069__en-us_topic_0190920191_b581303243519">lowerBound</strong>, <strong id="dli_09_0069__en-us_topic_0190920191_b108148324356">upperBound</strong>, and <strong id="dli_09_0069__en-us_topic_0190920191_b1381503213517">numPartitions</strong> parameters must be set at the same time.</li><li id="dli_09_0069__li5373165512344">To improve the concurrent read performance, you are advised to use auto-increment columns.</li></ul>
|
|
</div></div>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0069__row15373185518347"><td class="cellrowborder" valign="top" width="15.440000000000001%" headers="mcps1.3.3.2.1.2.2.3.1.1 "><p id="dli_09_0069__p10373105523415">lowerBound</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="84.56%" headers="mcps1.3.3.2.1.2.2.3.1.2 "><p id="dli_09_0069__p1537310552345">Minimum value of a column specified by <strong id="dli_09_0069__en-us_topic_0190920191_b34789390353">partitionColumn</strong>. The value is contained in the returned result.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0069__row11373555133420"><td class="cellrowborder" valign="top" width="15.440000000000001%" headers="mcps1.3.3.2.1.2.2.3.1.1 "><p id="dli_09_0069__p14373955153414">upperBound</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="84.56%" headers="mcps1.3.3.2.1.2.2.3.1.2 "><p id="dli_09_0069__p193735554344">Maximum value of a column specified by <strong id="dli_09_0069__en-us_topic_0190920191_b68101651163514">partitionColumn</strong>. The value is not contained in the returned result.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0069__row1437315533412"><td class="cellrowborder" valign="top" width="15.440000000000001%" headers="mcps1.3.3.2.1.2.2.3.1.1 "><p id="dli_09_0069__p17373135553418">numPartitions</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="84.56%" headers="mcps1.3.3.2.1.2.2.3.1.2 "><p id="dli_09_0069__p1937317556341">Number of concurrent read operations.</p>
|
|
<div class="note" id="dli_09_0069__note133736557344"><span class="notetitle"> NOTE: </span><div class="notebody"><p id="dli_09_0069__p1037375573419">When data is read, the range between <strong id="dli_09_0069__en-us_topic_0190920191_b11596556143513">lowerBound</strong> and <strong id="dli_09_0069__en-us_topic_0190920191_b14597195663515">upperBound</strong> is divided evenly among the tasks. Example:</p>
|
|
<p id="dli_09_0069__p153737553346">'partitionColumn'='id',</p>
|
|
<p id="dli_09_0069__p113731755183419">'lowerBound'='0',</p>
|
|
<p id="dli_09_0069__p14373185510342">'upperBound'='100',</p>
|
|
<p id="dli_09_0069__p237395516348">'numPartitions'='2'</p>
|
|
<p id="dli_09_0069__p0373455133411">Two concurrent tasks are started in DLI. One task reads the rows whose id is greater than or equal to <strong id="dli_09_0069__en-us_topic_0190920191_b438714012361">0</strong> and less than <strong id="dli_09_0069__en-us_topic_0190920191_b20388150183619">50</strong>, and the other task reads the rows whose id is greater than or equal to <strong id="dli_09_0069__en-us_topic_0190920191_b638950173615">50</strong> and less than <strong id="dli_09_0069__en-us_topic_0190920191_b10391200153612">100</strong>.</p>
|
|
</div></div>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0069__row1237365593416"><td class="cellrowborder" valign="top" width="15.440000000000001%" headers="mcps1.3.3.2.1.2.2.3.1.1 "><p id="dli_09_0069__p2373175513420">fetchsize</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="84.56%" headers="mcps1.3.3.2.1.2.2.3.1.2 "><p id="dli_09_0069__p837375510341">Number of data records obtained in each batch during data reading. The default value is <strong id="dli_09_0069__en-us_topic_0190920191_b617417593618">1000</strong>. A larger value improves performance but occupies more memory, and an excessively large value may cause memory overflow.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0069__row63731955173410"><td class="cellrowborder" valign="top" width="15.440000000000001%" headers="mcps1.3.3.2.1.2.2.3.1.1 "><p id="dli_09_0069__p73731255123418">batchsize</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="84.56%" headers="mcps1.3.3.2.1.2.2.3.1.2 "><p id="dli_09_0069__p1373175511345">Number of data records written in each batch. The default value is <strong id="dli_09_0069__en-us_topic_0190920191_b108701111143615">1000</strong>. A larger value improves performance but occupies more memory, and an excessively large value may cause memory overflow.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0069__row19374855163416"><td class="cellrowborder" valign="top" width="15.440000000000001%" headers="mcps1.3.3.2.1.2.2.3.1.1 "><p id="dli_09_0069__p8374195563412">truncate</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="84.56%" headers="mcps1.3.3.2.1.2.2.3.1.2 "><p id="dli_09_0069__p183745552349">Indicates whether to clear the table without deleting the original table when <strong id="dli_09_0069__en-us_topic_0190920191_b108062153365">overwrite</strong> is executed. The options are as follows:</p>
|
|
<ul id="dli_09_0069__ul2374655193413"><li id="dli_09_0069__li337411555343">true</li><li id="dli_09_0069__li93746556342">false</li></ul>
|
|
<p id="dli_09_0069__p1437485503414">The default value is <span class="parmvalue" id="dli_09_0069__en-us_topic_0190920191_parmvalue1481918203611"><b>false</b></span>, indicating that the original table is deleted and then a new table is created when the <strong id="dli_09_0069__en-us_topic_0190920191_b184831814368">overwrite</strong> operation is performed.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="dli_09_0069__row6374105511349"><td class="cellrowborder" valign="top" width="15.440000000000001%" headers="mcps1.3.3.2.1.2.2.3.1.1 "><p id="dli_09_0069__p1637435593418">isolationLevel</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="84.56%" headers="mcps1.3.3.2.1.2.2.3.1.2 "><p id="dli_09_0069__p03741255103412">Transaction isolation level. The options are as follows:</p>
|
|
<ul id="dli_09_0069__ul1937417552349"><li id="dli_09_0069__li937455533416">NONE</li><li id="dli_09_0069__li15374135593410">READ_UNCOMMITTED</li><li id="dli_09_0069__li2374205583418">READ_COMMITTED</li><li id="dli_09_0069__li937465563420">REPEATABLE_READ</li><li id="dli_09_0069__li13374115563416">SERIALIZABLE</li></ul>
|
|
<p id="dli_09_0069__p173749551342">The default value is <span class="parmvalue" id="dli_09_0069__en-us_topic_0190920191_parmvalue41845227367"><b>READ_UNCOMMITTED</b></span>.</p>
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
</div>
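<p>The even split described for <strong>numPartitions</strong> in Table 1 can be illustrated in plain Scala. This is a simplified model of the description only, not Spark's internal code, which may differ in detail (for example, in how the first and last ranges are bounded). The object and method names are made up for this sketch.</p>

```scala
object PartitionRanges {
  /** Splits [lower, upper) into numPartitions contiguous half-open ranges of near-equal width. */
  def split(lower: Long, upper: Long, numPartitions: Int): Seq[(Long, Long)] = {
    require(numPartitions > 0, "numPartitions must be positive")
    require(lower < upper, "lowerBound must be less than upperBound")
    val width = upper - lower
    (0 until numPartitions).map { i =>
      val start = lower + width * i / numPartitions
      val end   = lower + width * (i + 1) / numPartitions
      (start, end)
    }
  }

  def main(args: Array[String]): Unit =
    // With the values from Table 1 this prints the two ranges [0, 50) and [50, 100).
    split(0L, 100L, 2).foreach { case (s, e) => println(s"id >= $s AND id < $e") }
}
```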
|
|
</li><li id="dli_09_0069__li17374955113415">Insert data<div class="codecoloring" codetype="Scala" id="dli_09_0069__screen1337415543417"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"insert into dli_to_dws values(1, 'John',24),(2, 'Bob',32)"</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0069__li10374195519343">Query data<div class="codecoloring" codetype="Scala" id="dli_09_0069__screen3374455193410"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">dataFrame</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"select * from dli_to_dws"</span><span class="p">)</span>
|
|
<span class="n">dataFrame</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="dli_09_0069__p133741455203417">Before data is inserted:</p>
|
|
<p id="dli_09_0069__p3374135513418"><span><img id="dli_09_0069__en-us_topic_0190920191_image13988182014307" src="en-us_image_0223997003.png"></span></p>
|
|
<p id="dli_09_0069__p33742055113410">Response:</p>
|
|
<p id="dli_09_0069__p12374115512347"><span><img id="dli_09_0069__en-us_topic_0190920191_image1835112103113" src="en-us_image_0223997004.png"></span></p>
|
|
</li><li id="dli_09_0069__li14374145523416">Delete the datasource connection table.<div class="codecoloring" codetype="Scala" id="dli_09_0069__screen1137475514341"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"drop table dli_to_dws"</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li></ol>
|
|
</div>
|
|
<div class="section" id="dli_09_0069__section519052144120"><a name="dli_09_0069__section519052144120"></a><a name="section519052144120"></a><h4 class="sectiontitle">Accessing a Data Source Using a DataFrame API</h4><ol id="dli_09_0069__en-us_topic_0190920191_ol115363171726"><li id="dli_09_0069__en-us_topic_0190920191_li1853621717220">Set connection parameters.<div class="codecoloring" codetype="Scala" id="dli_09_0069__en-us_topic_0190920191_screen15789351922"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">url</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"jdbc:postgresql://to-dws-1174405057-EA1Kgo8H.datasource.com:8000/postgres"</span>
|
|
<span class="kd">val</span><span class="w"> </span><span class="n">username</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"dbadmin"</span>
|
|
<span class="kd">val</span><span class="w"> </span><span class="n">password</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"######"</span>
|
|
<span class="kd">val</span><span class="w"> </span><span class="n">dbtable</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"customer"</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
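<p>The URL follows the format <em>protocol header</em>://<em>IP address</em>:<em>port number</em>/<em>database name</em> described in Table 1. As a small sketch, it could be assembled from its parts so that a missing or malformed component is caught early; the helper name <strong>build</strong> is hypothetical, and the sample values are the ones used in Table 1.</p>

```scala
object DwsJdbcUrl {
  /** Builds a URL in the form jdbc:postgresql://host:port/database. */
  def build(host: String, port: Int, database: String): String = {
    require(host.nonEmpty, "host must not be empty")
    require(port > 0 && port <= 65535, s"invalid port: $port")
    require(database.nonEmpty, "database must not be empty")
    s"jdbc:postgresql://$host:$port/$database"
  }

  def main(args: Array[String]): Unit =
    println(build("192.168.0.77", 8000, "postgres"))
}
```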
|
|
</li><li id="dli_09_0069__en-us_topic_0190920191_li176615551626">Create a DataFrame, add data, and rename fields<div class="codecoloring" codetype="Scala" id="dli_09_0069__en-us_topic_0190920191_screen764613543101"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">var</span><span class="w"> </span><span class="n">dataFrame_1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="nc">List</span><span class="p">((</span><span class="mi">8</span><span class="p">,</span><span class="w"> </span><span class="s">"Jack_1"</span><span class="p">,</span><span class="w"> </span><span class="mi">18</span><span class="p">)))</span>
|
|
<span class="kd">val</span><span class="w"> </span><span class="n">df</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dataFrame_1</span><span class="p">.</span><span class="n">withColumnRenamed</span><span class="p">(</span><span class="s">"_1"</span><span class="p">,</span><span class="w"> </span><span class="s">"id"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">withColumnRenamed</span><span class="p">(</span><span class="s">"_2"</span><span class="p">,</span><span class="w"> </span><span class="s">"name"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">withColumnRenamed</span><span class="p">(</span><span class="s">"_3"</span><span class="p">,</span><span class="w"> </span><span class="s">"age"</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
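<p>As an alternative sketch, the column names can be supplied when the DataFrame is created, which avoids the chain of <strong>withColumnRenamed</strong> calls. It relies on the implicit conversions of the Spark session and needs the spark-sql dependency on the classpath; the object and method names are chosen for this illustration only.</p>

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CustomerFrame {
  // Column names used in the example above.
  val columns: Seq[String] = Seq("id", "name", "age")

  /** Builds the same single-row DataFrame with named columns in one step. */
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._ // brings toDF into scope for local Seq values
    Seq((8, "Jack_1", 18)).toDF(columns: _*)
  }
}
```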
|
|
</li><li id="dli_09_0069__en-us_topic_0190920191_li1140812515417">Import data to GaussDB(DWS).<div class="codecoloring" codetype="Scala" id="dli_09_0069__en-us_topic_0190920191_screen41537824210"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span>
|
|
<span class="normal">6</span>
|
|
<span class="normal">7</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">df</span><span class="p">.</span><span class="n">write</span><span class="p">.</span><span class="n">format</span><span class="p">(</span><span class="s">"jdbc"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"url"</span><span class="p">,</span><span class="w"> </span><span class="n">url</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"dbtable"</span><span class="p">,</span><span class="w"> </span><span class="n">dbtable</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"user"</span><span class="p">,</span><span class="w"> </span><span class="n">username</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"password"</span><span class="p">,</span><span class="w"> </span><span class="n">password</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">mode</span><span class="p">(</span><span class="nc">SaveMode</span><span class="p">.</span><span class="nc">Append</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">save</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<div class="p" id="dli_09_0069__en-us_topic_0190920191_p133461428192612"><div class="note" id="dli_09_0069__en-us_topic_0190920191_note17397174817568"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0069__en-us_topic_0190920191_p039712487568">The options of <strong id="dli_09_0069__en-us_topic_0190920191_b17204250102510">SaveMode</strong> can be one of the following:</p>
|
|
<ul id="dli_09_0069__en-us_topic_0190920191_ul227384131711"><li id="dli_09_0069__en-us_topic_0190920191_li12273154181718"><strong id="dli_09_0069__b256713411281">ErrorIfExists</strong>: If the data already exists, the system throws an exception.</li><li id="dli_09_0069__en-us_topic_0190920191_li18273141161717"><strong id="dli_09_0069__b84094611288">Overwrite</strong>: If the data already exists, the original data is overwritten.</li><li id="dli_09_0069__en-us_topic_0190920191_li727344171714"><strong id="dli_09_0069__b4346580284">Append</strong>: If the data already exists, the new data is appended to it.</li><li id="dli_09_0069__en-us_topic_0190920191_li6273141171711"><strong id="dli_09_0069__b178618915282">Ignore</strong>: If the data already exists, the save operation is skipped and the existing data is left unchanged. This is similar to the SQL statement <strong id="dli_09_0069__en-us_topic_0190920191_b79689135264">CREATE TABLE IF NOT EXISTS</strong>.</li></ul>
|
|
</div></div>
|
|
</div>
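<p>The four modes listed above can be modelled in plain Scala, without Spark, to make their effect on existing data concrete. This is an illustrative model only: the types below are hypothetical stand-ins rather than Spark's <strong>SaveMode</strong> class, and <strong>None</strong> stands for a target table that does not exist yet.</p>

```scala
object SaveModeModel {
  sealed trait Mode
  case object ErrorIfExists extends Mode
  case object Overwrite extends Mode
  case object Append extends Mode
  case object Ignore extends Mode

  /** Returns the table contents after saving `incoming`; `existing` is None if the table is absent. */
  def save[A](existing: Option[Seq[A]], incoming: Seq[A], mode: Mode): Seq[A] =
    existing match {
      case None => incoming // nothing exists yet, so every mode simply writes the data
      case Some(rows) => mode match {
        case ErrorIfExists => throw new IllegalStateException("data already exists")
        case Overwrite     => incoming
        case Append        => rows ++ incoming
        case Ignore        => rows
      }
    }

  def main(args: Array[String]): Unit =
    println(save(Some(Seq(1, 2)), Seq(3), Append))
}
```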
|
|
</li><li id="dli_09_0069__en-us_topic_0190920191_li175338232031">Read data from GaussDB(DWS).<ul id="dli_09_0069__en-us_topic_0190920191_ul18485518142411"><li id="dli_09_0069__en-us_topic_0190920191_li184851618132412">Method 1: read.format()<div class="codecoloring" codetype="Scala" id="dli_09_0069__en-us_topic_0190920191_screen5984155015578"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span>
|
|
<span class="normal">5</span>
|
|
<span class="normal">6</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">jdbcDF</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">read</span><span class="p">.</span><span class="n">format</span><span class="p">(</span><span class="s">"jdbc"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"url"</span><span class="p">,</span><span class="w"> </span><span class="n">url</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"dbtable"</span><span class="p">,</span><span class="w"> </span><span class="n">dbtable</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"user"</span><span class="p">,</span><span class="w"> </span><span class="n">username</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"password"</span><span class="p">,</span><span class="w"> </span><span class="n">password</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">load</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0069__en-us_topic_0190920191_li485184512268">Method 2: read.jdbc()<div class="codecoloring" codetype="Scala" id="dli_09_0069__en-us_topic_0190920191_screen11319163542718"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span>
|
|
<span class="normal">3</span>
|
|
<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">properties</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">Properties</span><span class="p">()</span>
|
|
<span class="w"> </span><span class="n">properties</span><span class="p">.</span><span class="n">put</span><span class="p">(</span><span class="s">"user"</span><span class="p">,</span><span class="w"> </span><span class="n">username</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="n">properties</span><span class="p">.</span><span class="n">put</span><span class="p">(</span><span class="s">"password"</span><span class="p">,</span><span class="w"> </span><span class="n">password</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">jdbcDF2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">read</span><span class="p">.</span><span class="n">jdbc</span><span class="p">(</span><span class="n">url</span><span class="p">,</span><span class="w"> </span><span class="n">dbtable</span><span class="p">,</span><span class="w"> </span><span class="n">properties</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li></ul>
|
|
<p id="dli_09_0069__en-us_topic_0190920191_p1157018328288">Before data is inserted:</p>
|
|
<p id="dli_09_0069__en-us_topic_0190920191_p6871138295"><span><img id="dli_09_0069__en-us_topic_0190920191_image1426610158293" src="en-us_image_0000001757887441.png"></span></p>
|
|
<p id="dli_09_0069__en-us_topic_0190920191_p106817197297">Response:</p>
|
|
<p id="dli_09_0069__en-us_topic_0190920191_p1766622618299"><span><img id="dli_09_0069__en-us_topic_0190920191_image12169174452914" src="en-us_image_0000001710007784.png"></span></p>
|
|
<p id="dli_09_0069__en-us_topic_0190920191_p18381174713012">The DataFrame read by the <strong id="dli_09_0069__en-us_topic_0190920191_b231214593264">read.format()</strong> or <strong id="dli_09_0069__en-us_topic_0190920191_b1631316593263">read.jdbc()</strong> method can be registered as a temporary table. Then, you can use SQL statements to query data.</p>
|
|
<div class="codecoloring" codetype="Scala" id="dli_09_0069__en-us_topic_0190920191_screen177519171321"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">createOrReplaceTempView</span><span class="p">(</span><span class="s">"customer_test"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"select * from customer_test where id = 1"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="dli_09_0069__en-us_topic_0190920191_p425171673314">Query results</p>
|
|
<p id="dli_09_0069__en-us_topic_0190920191_p291618369330"><span><img id="dli_09_0069__en-us_topic_0190920191_image488080123418" src="en-us_image_0000001757807269.png"></span></p>
|
|
</li></ol>
|
|
</div>
|
|
<div class="section" id="dli_09_0069__section2767143444512"><h4 class="sectiontitle">DataFrame-Related Operations</h4><p id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_p256316251443">The data created by the <strong id="dli_09_0069__b11281141783412">createDataFrame()</strong> method and the data queried by the <strong id="dli_09_0069__b8282191733415">read.format()</strong> method and the <strong id="dli_09_0069__b17282181703420">read.jdbc()</strong> method are all DataFrame objects. You can directly query a single record. (In <a href="#dli_09_0069__section519052144120">Accessing a Data Source Using a DataFrame API</a>, the DataFrame data is registered as a temporary table.)</p>
|
|
<ul id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_ul42831734124912"><li id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_li1283134174915">where<p id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_p19687844194911"><a name="dli_09_0069__dli_09_0067_en-us_topic_0190647826_li1283134174915"></a><a name="dli_09_0067_en-us_topic_0190647826_li1283134174915"></a>The <strong id="dli_09_0069__b5648045163410">where</strong> statement can be combined with filter expressions such as AND and OR. The DataFrame object after filtering is returned. The following is an example:</p>
|
|
<div class="codecoloring" codetype="Scala" id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_screen33171610519"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">where</span><span class="p">(</span><span class="s">"id = 1 or age <=10"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_p23061540145118"><span><img id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_image1537955013517" src="en-us_image_0000001709848312.png"></span></p>
|
|
</li><li id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_li1052710112528">filter<p id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_p15820201214527"><a name="dli_09_0069__dli_09_0067_en-us_topic_0190647826_li1052710112528"></a><a name="dli_09_0067_en-us_topic_0190647826_li1052710112528"></a>The <strong id="dli_09_0069__b3871658123414">filter</strong> statement can be used in the same way as <strong id="dli_09_0069__b1788115893414">where</strong>. The DataFrame object after filtering is returned. The following is an example:</p>
|
|
<div class="codecoloring" codetype="Scala" id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_screen1430455175210"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">filter</span><span class="p">(</span><span class="s">"id = 1 or age <=10"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_p555112219531"><span><img id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_image12333183495310" src="en-us_image_0000001757887457.png"></span></p>
|
|
</li><li id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_li152231752155319">select<p id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_p19347919125416"><a name="dli_09_0069__dli_09_0067_en-us_topic_0190647826_li152231752155319"></a><a name="dli_09_0067_en-us_topic_0190647826_li152231752155319"></a>The <strong id="dli_09_0069__b12742013143515">select</strong> statement returns a DataFrame object that contains only the specified fields. Multiple fields can be specified.</p>
|
|
<ul id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_ul866719515557"><li id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_li11622335518">Example 1:<div class="codecoloring" codetype="Scala" id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_screen720712280554"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">select</span><span class="p">(</span><span class="s">"id"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_p376801511586"><span><img id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_image1521542615812" src="en-us_image_0000001710007804.png"></span></p>
|
|
</li><li id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_li282410115615">Example 2:<div class="codecoloring" codetype="Scala" id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_screen147057205560"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">select</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span><span class="w"> </span><span class="s">"name"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_p981913565812"><span><img id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_image493144518581" src="en-us_image_0000001757807293.png"></span></p>
|
|
</li><li id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_li569594313568">Example 3:<div class="codecoloring" codetype="Scala" id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_screen884051035712"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">select</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span><span class="s">"name"</span><span class="p">).</span><span class="n">where</span><span class="p">(</span><span class="s">"id<4"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_p969625418585"><span><img id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_image1857104115916" src="en-us_image_0000001709848328.png"></span></p>
|
|
</li></ul>
|
|
</li><li id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_li1933094065919">selectExpr<p id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_p133711121805"><a name="dli_09_0069__dli_09_0067_en-us_topic_0190647826_li1933094065919"></a><a name="dli_09_0067_en-us_topic_0190647826_li1933094065919"></a>The <strong id="dli_09_0069__b178831352183718">selectExpr</strong> statement is used to perform special processing on a field. For example, it can be used to change the field name. The following is an example:</p>
|
|
<p id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_p66334116219">To rename the <strong id="dli_09_0069__b154031184020">name</strong> field to <strong id="dli_09_0069__b7553104017">name_test</strong> and add 1 to the value of <strong id="dli_09_0069__b991831134018">age</strong>, run the following statement:</p>
|
|
<div class="codecoloring" codetype="Scala" id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_screen2312105913417"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">selectExpr</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span><span class="w"> </span><span class="s">"name as name_test"</span><span class="p">,</span><span class="w"> </span><span class="s">"age+1"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
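<p>As a rough equivalent, the same projection can be sketched with Column expressions instead of SQL strings. The object name <strong>SelectExprSketch</strong> and the method name <strong>renameAndIncrement</strong> are hypothetical and used for illustration only; the sketch assumes a DataFrame that contains the <strong>id</strong>, <strong>name</strong>, and <strong>age</strong> fields used in this section.</p>
<div class="codecoloring" codetype="Scala"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre>import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper, for illustration only.
object SelectExprSketch {
  // Keeps id, renames name to name_test, and adds 1 to age,
  // mirroring selectExpr("id", "name as name_test", "age+1").
  def renameAndIncrement(df: DataFrame): DataFrame =
    df.select(col("id"), col("name").as("name_test"), col("age") + 1)
}
</pre></div></td></tr></table></div></div>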
|
|
</li><li id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_li1249223720518">col<p id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_p119341053157"><a name="dli_09_0069__dli_09_0067_en-us_topic_0190647826_li1249223720518"></a><a name="dli_09_0067_en-us_topic_0190647826_li1249223720518"></a><strong id="dli_09_0069__b52431318419">col</strong> is used to obtain a specified field. Unlike <strong id="dli_09_0069__b17974132017413">select</strong>, <strong id="dli_09_0069__b119751620114119">col</strong> returns a Column object rather than a DataFrame, and only one field can be returned at a time. The following is an example:</p>
|
|
<div class="codecoloring" codetype="Scala" id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_screen5117162121019"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="kd">val</span><span class="w"> </span><span class="n">idCol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">col</span><span class="p">(</span><span class="s">"id"</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
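<p>Because <strong>col</strong> returns a Column object, the result is typically passed to other DataFrame operations. The following is a minimal sketch of that pattern; the object name <strong>ColSketch</strong>, the method name <strong>idsAbove</strong>, and the <strong>minId</strong> parameter are hypothetical, and the DataFrame is assumed to contain an integer <strong>id</strong> field.</p>
<div class="codecoloring" codetype="Scala"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre>import org.apache.spark.sql.{Column, DataFrame}

// Hypothetical helper, for illustration only.
object ColSketch {
  // Returns the id values that are greater than minId.
  def idsAbove(df: DataFrame, minId: Int): DataFrame = {
    // col returns a Column, not a DataFrame.
    val idCol: Column = df.col("id")
    // The Column is used to build a filter condition and a projection.
    df.filter(idCol > minId).select(idCol)
  }
}
</pre></div></td></tr></table></div></div>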
|
|
</li><li id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_li1743853613133">drop<p id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_p20754345201313"><a name="dli_09_0069__dli_09_0067_en-us_topic_0190647826_li1743853613133"></a><a name="dli_09_0067_en-us_topic_0190647826_li1743853613133"></a><strong id="dli_09_0069__b10228124174118">drop</strong> is used to delete a specified field. Pass the name of the field to delete (several names can be passed in one call), and a DataFrame object that does not contain the field is returned. The following is an example:</p>
|
|
<div class="codecoloring" codetype="Scala" id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_screen174231152181411"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">drop</span><span class="p">(</span><span class="s">"id"</span><span class="p">).</span><span class="n">show</span><span class="p">()</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_p41511136156"><span><img id="dli_09_0069__dli_09_0067_en-us_topic_0190647826_image4709299159" src="en-us_image_0000001757887477.png"></span></p>
|
|
</li></ul>
|
|
</div>
|
|
<div class="section" id="dli_09_0069__section3155845164618"><h4 class="sectiontitle">Submitting a Job</h4><ol id="dli_09_0069__ol262511093111"><li id="dli_09_0069__li17398198144711">Generate a JAR file based on the code and upload the file to DLI.<p id="dli_09_0069__p1273651654720"><a name="dli_09_0069__li17398198144711"></a><a name="li17398198144711"></a></p>
|
|
<p id="dli_09_0069__p75263173475"></p>
|
|
</li><li id="dli_09_0069__li584121319473">In the Spark job editor, select the corresponding dependency module and execute the Spark job.<p id="dli_09_0069__p356271814476"><a name="dli_09_0069__li584121319473"></a><a name="li584121319473"></a></p>
|
|
<p id="dli_09_0069__p133511914474"></p>
|
|
</li></ol>
|
|
</div>
|
|
<div class="section" id="dli_09_0069__section320419133336"><h4 class="sectiontitle">Complete Example Code</h4><ul id="dli_09_0069__ul8381132973312"><li id="dli_09_0069__li1381192914335">Maven dependency<div class="codecoloring" codetype="Scala" id="dli_09_0069__en-us_topic_0190920190_screen67618176298"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="o">&lt;</span><span class="n">dependency</span><span class="o">&gt;</span>
<span class="w"> </span><span class="o">&lt;</span><span class="n">groupId</span><span class="o">&gt;</span><span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">spark</span><span class="o">&lt;/</span><span class="n">groupId</span><span class="o">&gt;</span>
<span class="w"> </span><span class="o">&lt;</span><span class="n">artifactId</span><span class="o">&gt;</span><span class="n">spark</span><span class="o">-</span><span class="n">sql_2</span><span class="mf">.11</span><span class="o">&lt;/</span><span class="n">artifactId</span><span class="o">&gt;</span>
<span class="w"> </span><span class="o">&lt;</span><span class="n">version</span><span class="o">&gt;</span><span class="mf">2.3.2</span><span class="o">&lt;/</span><span class="n">version</span><span class="o">&gt;</span>
<span class="o">&lt;/</span><span class="n">dependency</span><span class="o">&gt;</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
</li><li id="dli_09_0069__li3559640114317">Connecting to data sources through SQL APIs<div class="note" id="dli_09_0069__note35914578265"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0069__p1217419587263">Hard-coded or plaintext passwords pose significant security risks. To ensure security, encrypt your passwords, store them in configuration files or environment variables, and decrypt them when needed.</p>
|
|
</div></div>
|
|
<div class="codecoloring" codetype="Scala" id="dli_09_0069__en-us_topic_0190920190_screen1026910495119"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">import</span><span class="w"> </span><span class="nn">java</span><span class="p">.</span><span class="nn">util</span><span class="p">.</span><span class="nc">Properties</span>
|
|
<span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.</span><span class="nc">SparkSession</span>
|
|
|
|
<span class="k">object</span><span class="w"> </span><span class="nc">Test_SQL_DWS</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="k">def</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="n">args</span><span class="p">:</span><span class="w"> </span><span class="nc">Array</span><span class="p">[</span><span class="nc">String</span><span class="p">]):</span><span class="w"> </span><span class="nc">Unit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="c1">// Create a SparkSession session.</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">sparkSession</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">SparkSession</span><span class="p">.</span><span class="n">builder</span><span class="p">().</span><span class="n">getOrCreate</span><span class="p">()</span>
|
|
<span class="w"> </span><span class="c1">// Create a data table for DLI-associated DWS</span>
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"""CREATE TABLE IF NOT EXISTS dli_to_dws USING JDBC OPTIONS (</span>
<span class="s"> 'url'='jdbc:postgresql://to-dws-1174405057-EA1Kgo8H.datasource.com:8000/postgres',</span>
<span class="s"> 'dbtable'='customer',</span>
<span class="s"> 'user'='dbadmin',</span>
<span class="s"> 'password'='######')"""</span><span class="p">)</span>
|
|
|
|
<span class="w"> </span><span class="c1">//*****************************SQL model***********************************</span>
|
|
<span class="w"> </span><span class="c1">//Insert data into the DLI data table</span>
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"insert into dli_to_dws values(1,'John',24),(2,'Bob',32)"</span><span class="p">)</span>
|
|
<span class="w"> </span>
|
|
<span class="w"> </span><span class="c1">//Read data from DLI data table</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">dataFrame</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"select * from dli_to_dws"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="n">dataFrame</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
|
|
<span class="w"> </span>
|
|
<span class="w"> </span><span class="c1">//drop table</span>
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"drop table dli_to_dws"</span><span class="p">)</span>
|
|
|
|
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">close</span><span class="p">()</span>
|
|
<span class="w"> </span><span class="p">}</span>
|
|
<span class="p">}</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
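<p>To avoid the hard-coded credentials shown in this example, the username and password can be read at run time. The following is a minimal sketch, not a definitive implementation: the object name <strong>DwsCredentials</strong> and the environment variable names <strong>DWS_USER</strong> and <strong>DWS_PASSWORD</strong> are hypothetical, it is assumed that these variables are visible to the Spark driver, and the decryption step is omitted.</p>
<div class="codecoloring" codetype="Scala"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre>// Hypothetical helper, for illustration only.
object DwsCredentials {
  // Reads the username and password from the given environment map.
  // DWS_USER and DWS_PASSWORD are assumed names, not DLI-defined variables.
  // The job fails fast if either variable is missing.
  def fromEnv(env: Map[String, String] = sys.env): (String, String) = {
    val user = env.getOrElse("DWS_USER", sys.error("DWS_USER is not set"))
    val password = env.getOrElse("DWS_PASSWORD", sys.error("DWS_PASSWORD is not set"))
    (user, password)
  }
}
</pre></div></td></tr></table></div></div>
<p>The returned values can then be used in place of the literal <strong>user</strong> and <strong>password</strong> values when the connection options are built.</p>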
|
|
</li><li id="dli_09_0069__li25621020104412">Connecting to data sources through DataFrame APIs<div class="note" id="dli_09_0069__note19162316182714"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_09_0069__p1616212166278">Hard-coded or plaintext passwords pose significant security risks. To ensure security, encrypt your passwords, store them in configuration files or environment variables, and decrypt them when needed.</p>
|
|
</div></div>
|
|
<div class="codecoloring" codetype="Scala" id="dli_09_0069__en-us_topic_0190920190_screen1234920217320"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">import</span><span class="w"> </span><span class="nn">java</span><span class="p">.</span><span class="nn">util</span><span class="p">.</span><span class="nc">Properties</span>
|
|
<span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.</span><span class="nc">SparkSession</span>
|
|
<span class="k">import</span><span class="w"> </span><span class="nn">org</span><span class="p">.</span><span class="nn">apache</span><span class="p">.</span><span class="nn">spark</span><span class="p">.</span><span class="nn">sql</span><span class="p">.</span><span class="nc">SaveMode</span>
|
|
|
|
<span class="k">object</span><span class="w"> </span><span class="nc">Test_SQL_DWS</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="k">def</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="n">args</span><span class="p">:</span><span class="w"> </span><span class="nc">Array</span><span class="p">[</span><span class="nc">String</span><span class="p">]):</span><span class="w"> </span><span class="nc">Unit</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
|
|
<span class="w"> </span><span class="c1">// Create a SparkSession session.</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">sparkSession</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nc">SparkSession</span><span class="p">.</span><span class="n">builder</span><span class="p">().</span><span class="n">getOrCreate</span><span class="p">()</span>
|
|
|
|
<span class="w"> </span><span class="c1">//*****************************DataFrame model***********************************</span>
|
|
<span class="w"> </span><span class="c1">// Set the connection configuration parameters. Contains url, username, password, dbtable.</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">url</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"jdbc:postgresql://to-dws-1174405057-EA1Kgo8H.datasource.com:8000/postgres"</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">username</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"dbadmin"</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">password</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"######"</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">dbtable</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"customer"</span>
|
|
|
|
<span class="w"> </span><span class="c1">//Create a DataFrame and initialize the DataFrame data.</span>
|
|
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="n">dataFrame_1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="nc">List</span><span class="p">((</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">"Jack"</span><span class="p">,</span><span class="w"> </span><span class="mi">18</span><span class="p">)))</span>
|
|
<span class="w"> </span>
|
|
<span class="w"> </span><span class="c1">//Rename the fields set by the createDataFrame() method.</span>
|
|
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">df</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dataFrame_1</span><span class="p">.</span><span class="n">withColumnRenamed</span><span class="p">(</span><span class="s">"_1"</span><span class="p">,</span><span class="w"> </span><span class="s">"id"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">withColumnRenamed</span><span class="p">(</span><span class="s">"_2"</span><span class="p">,</span><span class="w"> </span><span class="s">"name"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">withColumnRenamed</span><span class="p">(</span><span class="s">"_3"</span><span class="p">,</span><span class="w"> </span><span class="s">"age"</span><span class="p">)</span>
|
|
|
|
<span class="w"> </span><span class="c1">// Write data to the customer table in GaussDB(DWS)</span>
|
|
<span class="w"> </span><span class="n">df</span><span class="p">.</span><span class="n">write</span><span class="p">.</span><span class="n">format</span><span class="p">(</span><span class="s">"jdbc"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"url"</span><span class="p">,</span><span class="w"> </span><span class="n">url</span><span class="p">)</span><span class="w"> </span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"dbtable"</span><span class="p">,</span><span class="w"> </span><span class="n">dbtable</span><span class="p">)</span><span class="w"> </span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"user"</span><span class="p">,</span><span class="w"> </span><span class="n">username</span><span class="p">)</span><span class="w"> </span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"password"</span><span class="p">,</span><span class="w"> </span><span class="n">password</span><span class="p">)</span><span class="w"> </span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">mode</span><span class="p">(</span><span class="nc">SaveMode</span><span class="p">.</span><span class="nc">Append</span><span class="p">)</span><span class="w"> </span>
|
|
<span class="w"> </span><span class="p">.</span><span class="n">save</span><span class="p">()</span>
|
|
|
|
<span class="w"> </span><span class="c1">// DataFrame object for data manipulation</span>
|
|
<span class="w"> </span><span class="c1">// Keep only the rows whose id is not 1</span>
|
|
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="n">newDF</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">df</span><span class="p">.</span><span class="n">filter</span><span class="p">(</span><span class="s">"id!=1"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="n">newDF</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
|
|
<span class="w"> </span>
|
|
<span class="w"> </span><span class="c1">// Drop the id column</span>
|
|
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="n">newDF_1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">df</span><span class="p">.</span><span class="n">drop</span><span class="p">(</span><span class="s">"id"</span><span class="p">)</span>
|
|
<span class="w"> </span><span class="n">newDF_1</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
<span class="w"> </span><span class="c1">// Read the data of the customer table in the GaussDB(DWS) database</span>
<span class="w"> </span><span class="c1">// Method 1: Read data from GaussDB(DWS) using read.format()</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">jdbcDF</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">read</span><span class="p">.</span><span class="n">format</span><span class="p">(</span><span class="s">"jdbc"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"url"</span><span class="p">,</span><span class="w"> </span><span class="n">url</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"dbtable"</span><span class="p">,</span><span class="w"> </span><span class="n">dbtable</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"user"</span><span class="p">,</span><span class="w"> </span><span class="n">username</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"password"</span><span class="p">,</span><span class="w"> </span><span class="n">password</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">option</span><span class="p">(</span><span class="s">"driver"</span><span class="p">,</span><span class="w"> </span><span class="s">"org.postgresql.Driver"</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">load</span><span class="p">()</span>
<span class="w"> </span><span class="c1">// Method 2: Read data from GaussDB(DWS) using read.jdbc()</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">properties</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="nc">Properties</span><span class="p">()</span>
<span class="w"> </span><span class="n">properties</span><span class="p">.</span><span class="n">put</span><span class="p">(</span><span class="s">"user"</span><span class="p">,</span><span class="w"> </span><span class="n">username</span><span class="p">)</span>
<span class="w"> </span><span class="n">properties</span><span class="p">.</span><span class="n">put</span><span class="p">(</span><span class="s">"password"</span><span class="p">,</span><span class="w"> </span><span class="n">password</span><span class="p">)</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">jdbcDF2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">read</span><span class="p">.</span><span class="n">jdbc</span><span class="p">(</span><span class="n">url</span><span class="p">,</span><span class="w"> </span><span class="n">dbtable</span><span class="p">,</span><span class="w"> </span><span class="n">properties</span><span class="p">)</span>
<span class="w"> </span><span class="cm">/**</span>
<span class="cm"> * Register the DataFrame read by read.format() or read.jdbc() as a temporary table, and query the data</span>
<span class="cm"> * using an SQL statement.</span>
<span class="cm"> */</span>
<span class="w"> </span><span class="n">jdbcDF</span><span class="p">.</span><span class="n">createOrReplaceTempView</span><span class="p">(</span><span class="s">"customer_test"</span><span class="p">)</span>
<span class="w"> </span><span class="kd">val</span><span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">sql</span><span class="p">(</span><span class="s">"select * from customer_test where id = 1"</span><span class="p">)</span>
<span class="w"> </span><span class="n">result</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
<span class="w"> </span><span class="n">sparkSession</span><span class="p">.</span><span class="n">close</span><span class="p">()</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div></td></tr></table></div>
</div>
</li></ul>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_09_0086.html">Connecting to GaussDB(DWS)</a></div>
</div>
</div>