doc-exports/docs/dli/sqlreference/dli_08_0439.html
Su, Xiaomeng 76a5b1ee83 dli_sqlreference_20240227
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
Co-committed-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
2024-03-27 22:02:33 +00:00


<a name="dli_08_0439"></a>
<h1 class="topictitle1">FileSystem Result Table</h1>
<div id="body8662426"><div class="section" id="dli_08_0439__en-us_topic_0000001390352625_en-us_topic_0000001201521669_dli_08_0256_en-us_topic_0132788972_section108631122164917"><h4 class="sectiontitle">Function</h4><p id="dli_08_0439__en-us_topic_0000001390352625_p598464619410">The FileSystem result (sink) table is used to export data to the HDFS or OBS file system. It is applicable to scenarios such as data dumping, big data analysis, data backup, and active, deep, or cold archiving.</p>
<p id="dli_08_0439__en-us_topic_0000001390352625_p92981836174714">Because the input stream can be unbounded, the sink organizes the data in each bucket into <strong id="dli_08_0439__en-us_topic_0000001390352625_b17111125015245">part</strong> files of a limited size. Data can be assigned to buckets based on time. For example, with hourly buckets, each bucket contains the records received within one hour.</p>
<p id="dli_08_0439__en-us_topic_0000001390352625_p1529893619479">Data in the bucket directory is split into multiple <strong id="dli_08_0439__en-us_topic_0000001390352625_b59708598262">part</strong> files. Each bucket contains at least one <strong id="dli_08_0439__en-us_topic_0000001390352625_b95726812270">part</strong> file for each sink subtask that has received data for that bucket. Other <strong id="dli_08_0439__en-us_topic_0000001390352625_b12890016317">part</strong> files are created based on the configured rolling policy. For row formats, the default rolling policy rolls a <strong id="dli_08_0439__en-us_topic_0000001390352625_b166549314">part</strong> file based on its size, on the maximum duration the file can stay open, and on the maximum time the file can remain inactive before it is closed. Bulk formats are rolled each time a checkpoint is created. You can add other rolling conditions based on size or time.</p>
<div class="note" id="dli_08_0439__en-us_topic_0000001390352625_note11490175313473"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="dli_08_0439__en-us_topic_0000001390352625_ul856236220"><li id="dli_08_0439__en-us_topic_0000001390352625_li1674517377220">To use FileSink in STREAMING mode, you must enable checkpointing. <strong id="dli_08_0439__en-us_topic_0000001390352625_b19917131474111">Part</strong> files are finalized only when a checkpoint succeeds. If checkpointing is not enabled, the files remain in the in-progress or pending state, and downstream systems cannot safely read the file data.</li><li id="dli_08_0439__en-us_topic_0000001390352625_li16561361525">The number reported by the end operator of the sink is the number of checkpoints, not the actual volume of data written. For the actual volume, see the number reported by the streaming-writer or StreamingFileWriter operator.</li></ul>
</div></div>
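<p>As a minimal sketch of the hourly bucketing described above, the statements below derive day and hour partition values from the processing time. The table name <strong>hourly_sink</strong>, its columns, and the placeholder path are assumptions for illustration, and a source table <strong>orders</strong> with <strong>name</strong> and <strong>num</strong> columns (such as the datagen table in Example 1) is assumed to exist. <strong>DATE_FORMAT</strong> and <strong>LOCALTIMESTAMP</strong> are standard Flink SQL functions, but confirm that your DLI Flink version supports them.</p>
<pre class="screen">-- Hypothetical sink: one directory per day and hour.
CREATE TABLE hourly_sink (
  name string,
  num INT,
  p_day string,
  p_hour string
) PARTITIONED BY (p_day, p_hour) WITH (
  'connector' = 'filesystem',
  'path' = 'obs://bucketName/fileName',  -- placeholder path
  'format' = 'csv'
);

-- Each record is written to the bucket for the hour in which it is processed.
INSERT INTO hourly_sink
SELECT
  name,
  num,
  DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd') AS p_day,
  DATE_FORMAT(LOCALTIMESTAMP, 'HH') AS p_hour
FROM orders;</pre>
<p>Bucketing on processing time keeps the sketch short. If the records carry their own event timestamp, that column could be formatted instead.</p>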
</div>
<div class="section" id="dli_08_0439__en-us_topic_0000001390352625_en-us_topic_0000001201521669_section434381984316"><h4 class="sectiontitle">Syntax</h4><div class="codecoloring" codetype="Sql" id="dli_08_0439__en-us_topic_0000001390352625_en-us_topic_0000001201521669_screen153431199432"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">sink_table</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="n">string</span><span class="p">,</span>
<span class="w"> </span><span class="n">num</span><span class="w"> </span><span class="nb">INT</span><span class="p">,</span>
<span class="w"> </span><span class="n">p_day</span><span class="w"> </span><span class="n">string</span><span class="p">,</span>
<span class="w"> </span><span class="n">p_hour</span><span class="w"> </span><span class="n">string</span>
<span class="p">)</span><span class="w"> </span><span class="n">partitioned</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="p">(</span><span class="n">p_day</span><span class="p">,</span><span class="w"> </span><span class="n">p_hour</span><span class="p">)</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s1">'connector'</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'filesystem'</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'path'</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'obs://*** '</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'format'</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'parquet'</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'auto-compaction'</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'true'</span>
<span class="p">);</span>
</pre></div></td></tr></table></div>
</div>
</div>
<div class="section" id="dli_08_0439__en-us_topic_0000001390352625_en-us_topic_0000001201521669_dli_08_0256_en-us_topic_0132788972_section3126105364419"><h4 class="sectiontitle">Usage</h4><ul id="dli_08_0439__en-us_topic_0000001390352625_en-us_topic_0000001201521669_ul128931432114310"><li id="dli_08_0439__en-us_topic_0000001390352625_en-us_topic_0000001201521669_li11524121165"><strong id="dli_08_0439__en-us_topic_0000001390352625_b1889610586478">Rolling Policy</strong><p id="dli_08_0439__en-us_topic_0000001390352625_p1074651135416">The rolling policy defines when an in-progress part file is closed and moved to the pending state and later to the finished state. Part files in the finished state are ready to be read and are guaranteed to contain valid data that will not be reverted in case of failure.</p>
<p id="dli_08_0439__en-us_topic_0000001390352625_p1644716805414">In STREAMING mode, the rolling policy, in combination with the checkpoint interval (pending files become finished on the next checkpoint), controls how quickly part files become available to downstream readers, as well as the size and number of these files. For details, see <a href="#dli_08_0439__en-us_topic_0000001390352625_en-us_topic_0000001201521669_dli_08_0256_section4299113491">Parameters</a>.</p>
</li><li id="dli_08_0439__en-us_topic_0000001390352625_li235955595519"><strong id="dli_08_0439__en-us_topic_0000001390352625_b37471861501">Part File Lifecycle</strong><p id="dli_08_0439__en-us_topic_0000001390352625_p81373204562">To use the output of the FileSink in downstream systems, we need to understand the naming and lifecycle of the output files produced.</p>
<p id="dli_08_0439__en-us_topic_0000001390352625_p1013722075615">Part files can be in one of three states:</p>
<ul id="dli_08_0439__en-us_topic_0000001390352625_ul12488943105612"><li id="dli_08_0439__en-us_topic_0000001390352625_li721815405564"><strong id="dli_08_0439__en-us_topic_0000001390352625_b1760754165017">In-progress</strong>: The part file is currently being written to.</li><li id="dli_08_0439__en-us_topic_0000001390352625_li221894075610"><strong id="dli_08_0439__en-us_topic_0000001390352625_b8548141211517">Pending</strong>: The part file was in progress, has been closed because of the specified rolling policy, and is waiting to be committed.</li><li id="dli_08_0439__en-us_topic_0000001390352625_li3218124025616"><strong id="dli_08_0439__en-us_topic_0000001390352625_b630724615112">Finished</strong>: On successful checkpoints (STREAMING) or at the end of input (BATCH), pending files transition to <strong id="dli_08_0439__en-us_topic_0000001390352625_b3426603521">Finished</strong>.</li></ul>
<p id="dli_08_0439__en-us_topic_0000001390352625_p1997104910561">Only finished files are safe to read by downstream systems as those are guaranteed to not be modified later.</p>
<p id="dli_08_0439__en-us_topic_0000001390352625_p769164585617">By default, the file naming strategy is as follows:</p>
<ul id="dli_08_0439__en-us_topic_0000001390352625_ul557211549577"><li id="dli_08_0439__en-us_topic_0000001390352625_li679504965716"><strong id="dli_08_0439__en-us_topic_0000001390352625_b9258114775216">In-progress / Pending</strong>: part-&lt;uid&gt;-&lt;partFileIndex&gt;.inprogress.uid</li><li id="dli_08_0439__en-us_topic_0000001390352625_li3811134911572"><strong id="dli_08_0439__en-us_topic_0000001390352625_b1953917524524">Finished</strong>: part-&lt;uid&gt;-&lt;partFileIndex&gt;</li></ul>
<p id="dli_08_0439__en-us_topic_0000001390352625_p105105810576"><strong id="dli_08_0439__en-us_topic_0000001390352625_b123231434115313">uid</strong> is a random ID assigned to a subtask of the sink when the subtask is instantiated. This <strong id="dli_08_0439__en-us_topic_0000001390352625_b1459486546">uid</strong> is not fault-tolerant so it is regenerated when the subtask recovers from a failure.</p>
</li><li id="dli_08_0439__en-us_topic_0000001390352625_li12739111211583"><strong id="dli_08_0439__en-us_topic_0000001390352625_b628662514549">Compaction</strong><p id="dli_08_0439__en-us_topic_0000001390352625_p4332102215583">FileSink supports compaction of pending files, which allows the application to use a smaller checkpoint interval without generating a large number of small files.</p>
<p id="dli_08_0439__en-us_topic_0000001390352625_p203321022175810">Once enabled, compaction takes place after files become pending and before they are committed. The pending files are first committed to temporary files whose paths start with a dot (.). These files are then compacted according to the strategy of the user-specified compactor, which produces new compacted pending files. The compacted pending files are passed to the committer, which commits them as the final files. After that, the source files are removed.</p>
</li><li id="dli_08_0439__en-us_topic_0000001390352625_li6949112319011"><strong id="dli_08_0439__en-us_topic_0000001390352625_b429422616587">Partitions</strong><p id="dli_08_0439__en-us_topic_0000001390352625_p238420371009">Filesystem sink supports the partitioning function. Partitions are generated based on the selected fields by using the <strong id="dli_08_0439__en-us_topic_0000001390352625_b499281125912">partitioned by</strong> syntax. The following is an example:</p>
<pre class="screen" id="dli_08_0439__en-us_topic_0000001390352625_screen1838410371008">path
└── datetime=2022-06-25
    └── hour=10
        ├── part-0.parquet
        ├── part-1.parquet
└── datetime=2022-06-26
    └── hour=16
        ├── part-0.parquet
    └── hour=17
        ├── part-0.parquet</pre>
<p id="dli_08_0439__en-us_topic_0000001390352625_p23845371004">Similar to files, partitions also need to be committed to notify downstream applications that the files in a partition can be safely read. The filesystem sink provides several configurable partition commit policies.</p>
</li></ul>
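<p>A partition directory like the ones in the tree above can also be targeted explicitly. The following is a minimal sketch, assuming the partitioned <strong>sink_table</strong> from the Syntax section and a source table <strong>orders</strong> with <strong>name</strong> and <strong>num</strong> columns (such as the datagen table in Example 1). The partition values are illustrative.</p>
<pre class="screen">-- Static partition insert: every row goes to the same partition.
-- The partition columns are fixed in the PARTITION clause,
-- so the SELECT list supplies only the remaining columns.
INSERT INTO sink_table PARTITION (p_day = '2022-06-25', p_hour = '10')
SELECT name, num FROM orders;</pre>
<p>With the Hive-style directory naming shown in the tree, the files from this statement would be expected under the <strong>p_day=2022-06-25/p_hour=10</strong> subdirectory of the configured path.</p>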
</div>
<div class="section" id="dli_08_0439__en-us_topic_0000001390352625_en-us_topic_0000001201521669_dli_08_0256_section4299113491"><a name="dli_08_0439__en-us_topic_0000001390352625_en-us_topic_0000001201521669_dli_08_0256_section4299113491"></a><a name="en-us_topic_0000001390352625_en-us_topic_0000001201521669_dli_08_0256_section4299113491"></a><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="dli_08_0439__en-us_topic_0000001390352625_en-us_topic_0000001201521669_table11617424154613" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameter description</caption><thead align="left"><tr id="dli_08_0439__en-us_topic_0000001390352625_en-us_topic_0000001201521669_row146177242466"><th align="left" class="cellrowborder" valign="top" width="16.63%" id="mcps1.3.4.2.2.6.1.1"><p id="dli_08_0439__en-us_topic_0000001390352625_en-us_topic_0000001201521669_p1361712418461">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="12.120000000000001%" id="mcps1.3.4.2.2.6.1.2"><p id="dli_08_0439__en-us_topic_0000001390352625_en-us_topic_0000001201521669_p176171424114615">Mandatory</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="9.91%" id="mcps1.3.4.2.2.6.1.3"><p id="dli_08_0439__en-us_topic_0000001390352625_p25718295117">Default Value</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="11.23%" id="mcps1.3.4.2.2.6.1.4"><p id="dli_08_0439__en-us_topic_0000001390352625_p108628321814">Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50.11%" id="mcps1.3.4.2.2.6.1.5"><p id="dli_08_0439__en-us_topic_0000001390352625_en-us_topic_0000001201521669_p261712247467">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="dli_08_0439__en-us_topic_0000001390352625_en-us_topic_0000001201521669_row136171242461"><td class="cellrowborder" valign="top" width="16.63%" headers="mcps1.3.4.2.2.6.1.1 "><p id="dli_08_0439__en-us_topic_0000001390352625_p1571520572210">connector</p>
</td>
<td class="cellrowborder" valign="top" width="12.120000000000001%" headers="mcps1.3.4.2.2.6.1.2 "><p id="dli_08_0439__en-us_topic_0000001390352625_p137151257126">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="9.91%" headers="mcps1.3.4.2.2.6.1.3 "><p id="dli_08_0439__en-us_topic_0000001390352625_p1371510571213">None</p>
</td>
<td class="cellrowborder" valign="top" width="11.23%" headers="mcps1.3.4.2.2.6.1.4 "><p id="dli_08_0439__en-us_topic_0000001390352625_p117156578210">String</p>
</td>
<td class="cellrowborder" valign="top" width="50.11%" headers="mcps1.3.4.2.2.6.1.5 "><p id="dli_08_0439__en-us_topic_0000001390352625_p671525710212">The value is fixed at <strong id="dli_08_0439__en-us_topic_0000001390352625_b463015361115">filesystem</strong>.</p>
</td>
</tr>
<tr id="dli_08_0439__en-us_topic_0000001390352625_en-us_topic_0000001201521669_row1961742414462"><td class="cellrowborder" valign="top" width="16.63%" headers="mcps1.3.4.2.2.6.1.1 "><p id="dli_08_0439__en-us_topic_0000001390352625_p18715195712211">path</p>
</td>
<td class="cellrowborder" valign="top" width="12.120000000000001%" headers="mcps1.3.4.2.2.6.1.2 "><p id="dli_08_0439__en-us_topic_0000001390352625_p127151571826">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="9.91%" headers="mcps1.3.4.2.2.6.1.3 "><p id="dli_08_0439__en-us_topic_0000001390352625_p15715195713218">None</p>
</td>
<td class="cellrowborder" valign="top" width="11.23%" headers="mcps1.3.4.2.2.6.1.4 "><p id="dli_08_0439__en-us_topic_0000001390352625_p4715125716218">String</p>
</td>
<td class="cellrowborder" valign="top" width="50.11%" headers="mcps1.3.4.2.2.6.1.5 "><p id="dli_08_0439__en-us_topic_0000001390352625_p1071513570215">OBS path</p>
</td>
</tr>
<tr id="dli_08_0439__en-us_topic_0000001390352625_en-us_topic_0000001201521669_row1761802415461"><td class="cellrowborder" valign="top" width="16.63%" headers="mcps1.3.4.2.2.6.1.1 "><p id="dli_08_0439__en-us_topic_0000001390352625_p1271519571527">format</p>
</td>
<td class="cellrowborder" valign="top" width="12.120000000000001%" headers="mcps1.3.4.2.2.6.1.2 "><p id="dli_08_0439__en-us_topic_0000001390352625_p1471585713210">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="9.91%" headers="mcps1.3.4.2.2.6.1.3 "><p id="dli_08_0439__en-us_topic_0000001390352625_p17715757023">None</p>
</td>
<td class="cellrowborder" valign="top" width="11.23%" headers="mcps1.3.4.2.2.6.1.4 "><p id="dli_08_0439__en-us_topic_0000001390352625_p1071555715218">String</p>
</td>
<td class="cellrowborder" valign="top" width="50.11%" headers="mcps1.3.4.2.2.6.1.5 "><p id="dli_08_0439__en-us_topic_0000001390352625_p19887028171">File format</p>
<p id="dli_08_0439__en-us_topic_0000001390352625_p771585717220">Available values are <strong id="dli_08_0439__en-us_topic_0000001390352625_b7856126120">csv</strong> and <strong id="dli_08_0439__en-us_topic_0000001390352625_b99914272025">parquet</strong>.</p>
</td>
</tr>
<tr id="dli_08_0439__en-us_topic_0000001390352625_row1579114310217"><td class="cellrowborder" valign="top" width="16.63%" headers="mcps1.3.4.2.2.6.1.1 "><p id="dli_08_0439__en-us_topic_0000001390352625_p07162057224">sink.rolling-policy.file-size</p>
</td>
<td class="cellrowborder" valign="top" width="12.120000000000001%" headers="mcps1.3.4.2.2.6.1.2 "><p id="dli_08_0439__en-us_topic_0000001390352625_p271615579212">No</p>
</td>
<td class="cellrowborder" valign="top" width="9.91%" headers="mcps1.3.4.2.2.6.1.3 "><p id="dli_08_0439__en-us_topic_0000001390352625_p27161957921">128 MB</p>
</td>
<td class="cellrowborder" valign="top" width="11.23%" headers="mcps1.3.4.2.2.6.1.4 "><p id="dli_08_0439__en-us_topic_0000001390352625_p1271685717215">MemorySize</p>
</td>
<td class="cellrowborder" valign="top" width="50.11%" headers="mcps1.3.4.2.2.6.1.5 "><p id="dli_08_0439__en-us_topic_0000001390352625_p18716165718211">Maximum size of a part file. If the size of a part file exceeds this value, a new file will be generated.</p>
<div class="note" id="dli_08_0439__en-us_topic_0000001390352625_note075415147817"><span class="notetitle"> NOTE: </span><div class="notebody"><p id="dli_08_0439__en-us_topic_0000001390352625_p575518141388">The rolling policy defines when an in-progress part file is closed and moved to the pending state and later to the finished state. Part files in the finished state are ready to be read and are guaranteed to contain valid data that will not be reverted in case of failure. In STREAMING mode, the rolling policy, in combination with the checkpoint interval (pending files become finished on the next checkpoint), controls how quickly part files become available to downstream readers, as well as the size and number of these files.</p>
</div></div>
</td>
</tr>
<tr id="dli_08_0439__en-us_topic_0000001390352625_row67911443225"><td class="cellrowborder" valign="top" width="16.63%" headers="mcps1.3.4.2.2.6.1.1 "><p id="dli_08_0439__en-us_topic_0000001390352625_p6716457023">sink.rolling-policy.rollover-interval</p>
</td>
<td class="cellrowborder" valign="top" width="12.120000000000001%" headers="mcps1.3.4.2.2.6.1.2 "><p id="dli_08_0439__en-us_topic_0000001390352625_p1371665715214">No</p>
</td>
<td class="cellrowborder" valign="top" width="9.91%" headers="mcps1.3.4.2.2.6.1.3 "><p id="dli_08_0439__en-us_topic_0000001390352625_p671611571023">30 min</p>
</td>
<td class="cellrowborder" valign="top" width="11.23%" headers="mcps1.3.4.2.2.6.1.4 "><p id="dli_08_0439__en-us_topic_0000001390352625_p171616571124">Duration</p>
</td>
<td class="cellrowborder" valign="top" width="50.11%" headers="mcps1.3.4.2.2.6.1.5 "><p id="dli_08_0439__en-us_topic_0000001390352625_p1171715718212">Maximum duration that a part file can stay open. If a part file stays open longer than this duration, it is rolled and a new file is generated. The default value is 30 minutes, which avoids producing a large number of small files. The check frequency is specified by <strong id="dli_08_0439__en-us_topic_0000001390352625_b1415170767">sink.rolling-policy.check-interval</strong>.</p>
<div class="note" id="dli_08_0439__en-us_topic_0000001390352625_note141985161018"><span class="notetitle"> NOTE: </span><div class="notebody"><p id="dli_08_0439__en-us_topic_0000001390352625_p1232336105818">There must be a space between the number and the unit.</p>
<p id="dli_08_0439__en-us_topic_0000001390352625_p5606300569">The supported time units include <strong id="dli_08_0439__en-us_topic_0000001390352625_b9903415268">d</strong>, <strong id="dli_08_0439__en-us_topic_0000001390352625_b1415113177611">h</strong>, <strong id="dli_08_0439__en-us_topic_0000001390352625_b201354191466">min</strong>, <strong id="dli_08_0439__en-us_topic_0000001390352625_b911310211661">s</strong>, and <strong id="dli_08_0439__en-us_topic_0000001390352625_b1526619221363">ms</strong>.</p>
<p id="dli_08_0439__en-us_topic_0000001390352625_p154191255102">For bulk files (parquet, orc, and avro), the checkpoint interval also controls the maximum open duration of a part file.</p>
</div></div>
</td>
</tr>
<tr id="dli_08_0439__en-us_topic_0000001390352625_row879111436210"><td class="cellrowborder" valign="top" width="16.63%" headers="mcps1.3.4.2.2.6.1.1 "><p id="dli_08_0439__en-us_topic_0000001390352625_p17171857628">sink.rolling-policy.check-interval</p>
</td>
<td class="cellrowborder" valign="top" width="12.120000000000001%" headers="mcps1.3.4.2.2.6.1.2 "><p id="dli_08_0439__en-us_topic_0000001390352625_p67171571210">No</p>
</td>
<td class="cellrowborder" valign="top" width="9.91%" headers="mcps1.3.4.2.2.6.1.3 "><p id="dli_08_0439__en-us_topic_0000001390352625_p87179571923">1 min</p>
</td>
<td class="cellrowborder" valign="top" width="11.23%" headers="mcps1.3.4.2.2.6.1.4 "><p id="dli_08_0439__en-us_topic_0000001390352625_p671718573214">Duration</p>
</td>
<td class="cellrowborder" valign="top" width="50.11%" headers="mcps1.3.4.2.2.6.1.5 "><p id="dli_08_0439__en-us_topic_0000001390352625_p325552631110">Check interval of the time-based rolling policy</p>
<p id="dli_08_0439__en-us_topic_0000001390352625_p971714571329">This parameter controls the frequency of checking whether a file should be rolled based on <strong id="dli_08_0439__en-us_topic_0000001390352625_b1452014566713">sink.rolling-policy.rollover-interval</strong>.</p>
</td>
</tr>
<tr id="dli_08_0439__en-us_topic_0000001390352625_en-us_topic_0000001201521669_row76463895715"><td class="cellrowborder" valign="top" width="16.63%" headers="mcps1.3.4.2.2.6.1.1 "><p id="dli_08_0439__en-us_topic_0000001390352625_p271714574219">auto-compaction</p>
</td>
<td class="cellrowborder" valign="top" width="12.120000000000001%" headers="mcps1.3.4.2.2.6.1.2 "><p id="dli_08_0439__en-us_topic_0000001390352625_p771719574211">No</p>
</td>
<td class="cellrowborder" valign="top" width="9.91%" headers="mcps1.3.4.2.2.6.1.3 "><p id="dli_08_0439__en-us_topic_0000001390352625_p47174571927">false</p>
</td>
<td class="cellrowborder" valign="top" width="11.23%" headers="mcps1.3.4.2.2.6.1.4 "><p id="dli_08_0439__en-us_topic_0000001390352625_p137171957529">Boolean</p>
</td>
<td class="cellrowborder" valign="top" width="50.11%" headers="mcps1.3.4.2.2.6.1.5 "><p id="dli_08_0439__en-us_topic_0000001390352625_p123311547121110">Whether automatic compaction is enabled for the streaming sink. Data is first written to temporary files. After the checkpoint is complete, the temporary files generated by the checkpoint are compacted.</p>
</td>
</tr>
<tr id="dli_08_0439__en-us_topic_0000001390352625_en-us_topic_0000001201521669_row1960184355710"><td class="cellrowborder" valign="top" width="16.63%" headers="mcps1.3.4.2.2.6.1.1 "><p id="dli_08_0439__en-us_topic_0000001390352625_p18717155719212">compaction.file-size</p>
</td>
<td class="cellrowborder" valign="top" width="12.120000000000001%" headers="mcps1.3.4.2.2.6.1.2 "><p id="dli_08_0439__en-us_topic_0000001390352625_p16717105714219">No</p>
</td>
<td class="cellrowborder" valign="top" width="9.91%" headers="mcps1.3.4.2.2.6.1.3 "><p id="dli_08_0439__en-us_topic_0000001390352625_p59411626172614">Size of <strong id="dli_08_0439__en-us_topic_0000001390352625_b439598917">sink.rolling-policy.file-size</strong></p>
</td>
<td class="cellrowborder" valign="top" width="11.23%" headers="mcps1.3.4.2.2.6.1.4 "><p id="dli_08_0439__en-us_topic_0000001390352625_p07171457429">MemorySize</p>
</td>
<td class="cellrowborder" valign="top" width="50.11%" headers="mcps1.3.4.2.2.6.1.5 "><p id="dli_08_0439__en-us_topic_0000001390352625_p17177571023">Target size of the compacted files. The default value is the rolling file size.</p>
<div class="note" id="dli_08_0439__en-us_topic_0000001390352625_note14751752162511"><span class="notetitle"> NOTE: </span><div class="notebody"><ul id="dli_08_0439__en-us_topic_0000001390352625_ul57661119268"><li id="dli_08_0439__en-us_topic_0000001390352625_li97661111267">Only files in the same checkpoint are compacted. The number of final files is therefore greater than or equal to the number of checkpoints.</li><li id="dli_08_0439__en-us_topic_0000001390352625_li37662162612">If compaction takes a long time, back pressure may occur and checkpointing may take longer.</li><li id="dli_08_0439__en-us_topic_0000001390352625_li976611142613">After this function is enabled, final files are generated during checkpointing, and a new file is opened to receive the data for the next checkpoint.</li></ul>
</div></div>
</td>
</tr>
</tbody>
</table>
</div>
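<p>The following minimal sketch shows how the rolling-policy parameters in Table 1 combine. The table name <strong>rolling_sink</strong>, the placeholder path, and the chosen values are illustrative assumptions, not recommended settings. Note the space between the number and the unit in the duration values.</p>
<pre class="screen">CREATE TABLE rolling_sink (
  name string,
  num INT
) WITH (
  'connector' = 'filesystem',
  'path' = 'obs://bucketName/fileName',               -- placeholder path
  'format' = 'csv',
  'sink.rolling-policy.file-size' = '64m',            -- roll when a part file exceeds 64 MB
  'sink.rolling-policy.rollover-interval' = '15 min', -- or when it has been open for 15 minutes
  'sink.rolling-policy.check-interval' = '1 min'      -- how often the open duration is checked
);</pre>
<p>Because the open duration is evaluated only at each check, a part file may stay open somewhat longer than the rollover interval, by up to about one check interval.</p>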
</div>
<div class="section" id="dli_08_0439__en-us_topic_0000001390352625_en-us_topic_0000001201521669_section8515152835418"><h4 class="sectiontitle">Example 1</h4><div class="p" id="dli_08_0439__en-us_topic_0000001390352625_en-us_topic_0000001201521669_p5999124192910">Use datagen to randomly generate data and write the data into the <strong id="dli_08_0439__en-us_topic_0000001390352625_b194833145">fileName</strong> directory in the OBS bucket <strong id="dli_08_0439__en-us_topic_0000001390352625_b936959143">bucketName</strong>. File generation does not depend on the checkpoint interval. A new file is generated when the current file has been open for more than 30 minutes or is larger than 128 MB.<pre class="screen" id="dli_08_0439__en-us_topic_0000001390352625_screen52251636192412">-- Source: datagen produces 100 random rows per second.
CREATE TABLE orders (
  name string,
  num INT
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '100',
  'fields.name.kind' = 'random',
  'fields.name.length' = '5'
);
-- Sink: CSV files rolled by size (128 MB) or open duration (30 minutes).
CREATE TABLE sink_table (
  name string,
  num INT
) WITH (
  'connector' = 'filesystem',
  'path' = 'obs://bucketName/fileName',
  'format' = 'csv',
  'sink.rolling-policy.file-size' = '128m',
  'sink.rolling-policy.rollover-interval' = '30 min'
);
INSERT INTO sink_table SELECT * FROM orders;
</pre>
</div>
</div>
<div class="section" id="dli_08_0439__en-us_topic_0000001390352625_section174497311241"><h4 class="sectiontitle">Example 2</h4><div class="p" id="dli_08_0439__en-us_topic_0000001390352625_p1533619172414">Use datagen to randomly generate data and write the data into the <strong id="dli_08_0439__en-us_topic_0000001390352625_b819051415162">fileName</strong> directory in the OBS bucket <strong id="dli_08_0439__en-us_topic_0000001390352625_b1819011461613">bucketName</strong>. With auto-compaction enabled, file generation depends on the checkpoint. A new file is generated when the checkpoint interval is reached or the file size reaches 100 MB.<pre class="screen" id="dli_08_0439__en-us_topic_0000001390352625_screen7798231255">-- Source: datagen produces 100 random rows per second.
CREATE TABLE orders (
  name string,
  num INT
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '100',
  'fields.name.kind' = 'random',
  'fields.name.length' = '5'
);
-- Sink: CSV files compacted per checkpoint, with a 100 MB target size.
CREATE TABLE sink_table (
  name string,
  num INT
) WITH (
  'connector' = 'filesystem',
  'path' = 'obs://bucketName/fileName',
  'format' = 'csv',
  'sink.rolling-policy.file-size' = '128m',
  'sink.rolling-policy.rollover-interval' = '30 min',
  'auto-compaction' = 'true',
  'compaction.file-size' = '100m'
);
INSERT INTO sink_table SELECT * FROM orders;</pre>
</div>
</div>
<p id="dli_08_0439__en-us_topic_0000001390352625_p8060118"></p>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_08_0391.html">Creating Result Tables</a></div>
</div>
</div>