doc-exports/docs/dli/sqlreference/dli_08_0346.html
Su, Xiaomeng 04d4597cf3 dli_sqlreference_0511_version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
Co-committed-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
2023-11-02 14:34:08 +00:00


<a name="dli_08_0346"></a>
<h1 class="topictitle1">File System Result Table</h1>
<div id="body8662426"><div class="section" id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_dli_08_0256_en-us_topic_0132788972_section108631122164917"><h4 class="sectiontitle">Function</h4><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p4142325173817">You can create a file system result table to export data to a file system such as HDFS or OBS. After the data is written, you can create a non-DLI table directly on the generated directory and process that table using DLI SQL. The output data can be written to partitioned directories, so it can be stored in partitioned tables. This result table is suitable for scenarios such as data dumping, big data analysis, data backup, and active, deep, or cold archiving.</p>
</div>
<div class="section" id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_section434381984316"><h4 class="sectiontitle">Syntax</h4><div class="codecoloring" codetype="Sql" id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_screen153431199432"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">create</span><span class="w"> </span><span class="k">table</span><span class="w"> </span><span class="n">filesystemSink</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">attr_name</span><span class="w"> </span><span class="n">attr_type</span><span class="w"> </span><span class="p">(</span><span class="s1">','</span><span class="w"> </span><span class="n">attr_name</span><span class="w"> </span><span class="n">attr_type</span><span class="p">)</span><span class="w"> </span><span class="o">*</span>
<span class="p">)</span><span class="w"> </span><span class="k">with</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s1">'connector.type'</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'filesystem'</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'connector.file-path'</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'format.type'</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span>
<span class="p">);</span>
</pre></div></td></tr></table></div>
</div>
</div>
<div class="section" id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_dli_08_0256_en-us_topic_0132788972_section3126105364419"><h4 class="sectiontitle">Important Notes</h4><ul id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_ul128931432114310"><li id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_li136612281325">If the data output directory specified in the table creation statement is on OBS, the directory must be in a parallel file system and cannot be in an OBS bucket.</li><li id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_li8668242174312">When using a file system table, you must enable checkpointing to ensure job consistency.</li><li id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_li266804210437">When <strong id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_b16282155519018">format.type</strong> is <strong id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_b088318561705">parquet</strong>, the supported data types are string, boolean, tinyint, smallint, int, bigint, float, double, map&lt;string, string&gt;, timestamp(3), and time.</li><li id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_li12731122515445">To avoid data loss or data being overwritten, enable automatic restart upon job exceptions 
and select <span class="parmvalue" id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_parmvalue94865239110"><b>Restore Job from Checkpoint</b></span>.</li><li id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_li113421130114417">Choose the checkpoint interval (for example, 10 minutes) by weighing how quickly output files must become available against file size and recovery time.</li><li id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_li10668174218434">When using HDFS, you need to bind the data source and enter the host information.</li><li id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_li11524121165">When using HDFS, you need to configure information about the node where the active NameNode is located.</li></ul>
</div>
<div class="section" id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_dli_08_0256_section4299113491"><h4 class="sectiontitle">Parameter</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_table11617424154613" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameter description</caption><thead align="left"><tr id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_row146177242466"><th align="left" class="cellrowborder" valign="top" width="15.921592159215923%" id="mcps1.3.4.2.2.4.1.1"><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p1361712418461">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="5.840584058405841%" id="mcps1.3.4.2.2.4.1.2"><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p176171424114615">Mandatory</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="78.23782378237823%" id="mcps1.3.4.2.2.4.1.3"><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p261712247467">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_row136171242461"><td class="cellrowborder" valign="top" width="15.921592159215923%" headers="mcps1.3.4.2.2.4.1.1 "><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p9529259125720">connector.type</p>
</td>
<td class="cellrowborder" valign="top" width="5.840584058405841%" headers="mcps1.3.4.2.2.4.1.2 "><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p1452995915570">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="78.23782378237823%" headers="mcps1.3.4.2.2.4.1.3 "><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p175291659195715">The value is fixed to <strong id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_b5231173832214">filesystem</strong>.</p>
</td>
</tr>
<tr id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_row1961742414462"><td class="cellrowborder" valign="top" width="15.921592159215923%" headers="mcps1.3.4.2.2.4.1.1 "><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p205295599577">connector.file-path</p>
</td>
<td class="cellrowborder" valign="top" width="5.840584058405841%" headers="mcps1.3.4.2.2.4.1.2 "><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p1652995916573">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="78.23782378237823%" headers="mcps1.3.4.2.2.4.1.3 "><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p195309598576">Data output directory. The format is <em id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_i32361513115911">schema</em>://<em id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_i796473635811">file.path</em>.</p>
<div class="note" id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_note10683452185818"><span class="notetitle"> NOTE: </span><div class="notebody"><div class="p" id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p16299504598">Currently, only the OBS and HDFS schemas are supported.<ul id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_ul5489194714592"><li id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_li13859132517599">If <strong id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_b1044819812233">schema</strong> is set to <strong id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_b15448782232">obs</strong>, data is exported to OBS. <strong id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_b16612793519">Note that the OBS directory must be in a parallel file system and must not be in an OBS bucket.</strong><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p159112101356">For example, <strong id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_b1954445182316">obs://</strong><em id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_i12680458152319">bucketName</em>/<em id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_i1017418216248">fileName</em> indicates that data is exported to the <strong id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_b177004752412">fileName</strong> directory in the <strong id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_b6953101015245">bucketName</strong> bucket.</p>
</li><li id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_li19722927125911">If <strong id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_b1788912452412">schema</strong> is set to <strong id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_b158891724192416">hdfs</strong>, data is exported to HDFS.<p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p4951316510">For example, <strong id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_b27311329102415">hdfs://node-master1sYAx:9820/user/car_infos</strong>, where <strong id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_b573152910243">node-master1sYAx:9820</strong> is the host name and port of the node where the NameNode is located.</p>
</li></ul>
</div>
</div></div>
</td>
</tr>
<tr id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_row1761802415461"><td class="cellrowborder" valign="top" width="15.921592159215923%" headers="mcps1.3.4.2.2.4.1.1 "><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p053025919571">format.type</p>
</td>
<td class="cellrowborder" valign="top" width="5.840584058405841%" headers="mcps1.3.4.2.2.4.1.2 "><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p453095919579">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="78.23782378237823%" headers="mcps1.3.4.2.2.4.1.3 "><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p2530125915717">Output data encoding format. Only <strong id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_b19871411152511">parquet</strong> and <strong id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_b1734121617256">csv</strong> are supported.</p>
<ul id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_ul516620124111"><li id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_li171661912115">When <strong id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_b14215142314251">schema</strong> is set to <strong id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_b4215523172517">obs</strong>, the encoding format of the output data can only be <span class="parmvalue" id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_parmvalue3215523112517"><b>parquet</b></span>.</li><li id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_li128301713413">When <strong id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_b11439338152514">schema</strong> is set to <strong id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_b1543923842515">hdfs</strong>, the encoding format of the output data can be <span class="parmvalue" id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_parmvalue543933812257"><b>parquet</b></span> or <span class="parmvalue" id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_parmvalue14440183815253"><b>csv</b></span>.</li></ul>
</td>
</tr>
<tr id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_row76463895715"><td class="cellrowborder" valign="top" width="15.921592159215923%" headers="mcps1.3.4.2.2.4.1.1 "><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p653075925714">format.field-delimiter</p>
</td>
<td class="cellrowborder" valign="top" width="5.840584058405841%" headers="mcps1.3.4.2.2.4.1.2 "><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p16530859185714">No</p>
</td>
<td class="cellrowborder" valign="top" width="78.23782378237823%" headers="mcps1.3.4.2.2.4.1.3 "><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p11530105925720">Delimiter used to separate adjacent attributes.</p>
<p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p20530105917574">This parameter must be set when the CSV encoding format is used. You can define the delimiter yourself, for example, a comma (<span class="parmvalue" id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_parmvalue18294102292615"><b>,</b></span>).</p>
</td>
</tr>
<tr id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_row1960184355710"><td class="cellrowborder" valign="top" width="15.921592159215923%" headers="mcps1.3.4.2.2.4.1.1 "><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p253025955714">connector.ak</p>
</td>
<td class="cellrowborder" valign="top" width="5.840584058405841%" headers="mcps1.3.4.2.2.4.1.2 "><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p953035917572">No</p>
</td>
<td class="cellrowborder" valign="top" width="78.23782378237823%" headers="mcps1.3.4.2.2.4.1.3 "><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p20530759125711">Access key (AK) for accessing OBS.</p>
<p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p05301259195715">This parameter is mandatory when data is written to OBS.</p>
</td>
</tr>
<tr id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_row16484715572"><td class="cellrowborder" valign="top" width="15.921592159215923%" headers="mcps1.3.4.2.2.4.1.1 "><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p1953015590575">connector.sk</p>
</td>
<td class="cellrowborder" valign="top" width="5.840584058405841%" headers="mcps1.3.4.2.2.4.1.2 "><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p15530195915720">No</p>
</td>
<td class="cellrowborder" valign="top" width="78.23782378237823%" headers="mcps1.3.4.2.2.4.1.3 "><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p55306593574">Secret key (SK) for accessing OBS.</p>
<p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p65301259105710">This parameter is mandatory when data is written to OBS.</p>
</td>
</tr>
<tr id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_row546724919577"><td class="cellrowborder" valign="top" width="15.921592159215923%" headers="mcps1.3.4.2.2.4.1.1 "><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p1853085917576">connector.partitioned-by</p>
</td>
<td class="cellrowborder" valign="top" width="5.840584058405841%" headers="mcps1.3.4.2.2.4.1.2 "><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p16530135914571">No</p>
</td>
<td class="cellrowborder" valign="top" width="78.23782378237823%" headers="mcps1.3.4.2.2.4.1.3 "><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p853015917578">Partitioning field. Use commas (,) to separate multiple fields.</p>
</td>
</tr>
</tbody>
</table>
</div>
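<p>The parameters in the table above can be combined as in the following minimal sketch, which writes CSV data to HDFS and partitions the output by one field. The table name <strong>hdfsSink</strong> and the column names are hypothetical and only illustrate the shape of the statement. The HDFS address reuses the sample value from the <strong>connector.file-path</strong> description. Adjust all of them to your environment.</p>
<pre class="screen">-- Hypothetical sink: CSV output on HDFS, partitioned by the dt column.
create table hdfsSink (
  car_id string,
  car_owner string,
  car_price int,
  dt string
) with (
  'connector.type' = 'filesystem',
  -- Sample NameNode address taken from the parameter description.
  'connector.file-path' = 'hdfs://node-master1sYAx:9820/user/car_infos',
  -- csv is allowed only when the schema is hdfs.
  'format.type' = 'csv',
  -- Needed because the CSV encoding format is used.
  'format.field-delimiter' = ',',
  -- Separate multiple partitioning fields with commas.
  'connector.partitioned-by' = 'dt'
);</pre>
<p><strong>connector.ak</strong> and <strong>connector.sk</strong> are omitted here because they are required only when data is written to OBS.</p>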
</div>
<div class="section" id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_section8515152835418"><h4 class="sectiontitle">Example</h4><p id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_p5999124192910">Read data from Kafka and write the data in Parquet format to the <strong id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_b1917811404281">fileName</strong> directory in the <strong id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_b371164611283">bucketName</strong> bucket.</p>
<pre class="screen" id="dli_08_0346__en-us_topic_0000001335882901_en-us_topic_0000001201521669_screen11244155517544">create table kafkaSource(
attr0 string,
attr1 boolean,
attr2 TINYINT,
attr3 smallint,
attr4 int,
attr5 bigint,
attr6 float,
attr7 double,
attr8 timestamp(3),
attr9 time
) with (
'connector.type' = 'kafka',
'connector.version' = '0.11',
'connector.topic' = 'test_json',
'connector.properties.bootstrap.servers' = 'xx.xx.xx.xx:9092',
'connector.properties.group.id' = 'test_filesystem',
'connector.startup-mode' = 'latest-offset',
'format.type' = 'csv'
);
create table filesystemSink(
attr0 string,
attr1 boolean,
attr2 TINYINT,
attr3 smallint,
attr4 int,
attr5 bigint,
attr6 float,
attr7 double,
attr8 map &lt; string, string &gt;,
attr9 timestamp(3),
attr10 time
) with (
'connector.type' = 'filesystem',
'connector.file-path' = 'obs://bucketName/fileName',
'format.type' = 'parquet',
'connector.ak' = 'xxxx',
'connector.sk' = 'xxxxxx'
);
insert into
filesystemSink
select
attr0,
attr1,
attr2,
attr3,
attr4,
attr5,
attr6,
attr7,
map [attr0,attr0],
attr8,
attr9
from
kafkaSource;</pre>
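<p>As described in the Function section, a non-DLI table can then be created on the generated directory. The following is a minimal sketch of that step, assuming that the Spark SQL DataSource syntax (<strong>create table ... using parquet options (path ...)</strong>) is available on your SQL queue. The table name <strong>filesystem_result</strong> is hypothetical, and only a subset of the sink columns is declared, limited to types that are named the same in Flink SQL and Spark SQL.</p>
<pre class="screen">-- Hypothetical table name; the path is the sink's connector.file-path.
create table filesystem_result (
  attr0 string,
  attr1 boolean,
  attr4 int,
  attr5 bigint,
  attr7 double
) using parquet options (path 'obs://bucketName/fileName');

-- Query the exported data.
select attr0, attr4, attr7 from filesystem_result limit 10;</pre>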
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_08_0307.html">Creating a Result Table</a></div>
</div>
</div>