doc-exports/docs/dli/sqlreference/dli_08_0330.html
Su, Xiaomeng 04d4597cf3 dli_sqlreference_0511_version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
Co-committed-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
2023-11-02 14:34:08 +00:00


<a name="dli_08_0330"></a><a name="dli_08_0330"></a>
<h1 class="topictitle1">User-Defined Functions</h1>
<div id="body8662426"><div class="section" id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_section853916453261"><h4 class="sectiontitle">Overview</h4><p id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_p6430721172612">DLI supports the following three types of user-defined functions (UDFs):</p>
</div>
<ul id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_ul515821418301"><li id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_li6158314133010">Regular UDF: takes in one or more input parameters and returns a single result.</li><li id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_li181581014143010">User-defined table-generating function (UDTF): takes in one or more input parameters and returns multiple rows or columns.</li><li id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_li1159131412308">User-defined aggregate function (UDAF): aggregates multiple records into one value.</li></ul>
<div class="note" id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_note242023416269"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_p13657183532610">UDFs can only be used in dedicated queues.</p>
</div></div>
<div class="section" id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_section698052817425"><h4 class="sectiontitle">POM Dependency</h4><pre class="screen" id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_screen16979513144211">&lt;dependency&gt;
    &lt;groupId&gt;org.apache.flink&lt;/groupId&gt;
    &lt;artifactId&gt;flink-table-common&lt;/artifactId&gt;
    &lt;version&gt;1.10.0&lt;/version&gt;
    &lt;scope&gt;provided&lt;/scope&gt;
&lt;/dependency&gt;</pre>
</div>
<div class="section" id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_section6319112904614"><h4 class="sectiontitle">Important Notes</h4><ul id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_ul51090318498"><li id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_li1910933118496">Currently, Python is not supported for programming UDFs, UDTFs, and UDAFs.</li><li id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_li13251103010507">If you use IntelliJ IDEA to debug the created UDF, select <strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_b31304303341">include dependencies with "Provided" scope</strong>. Otherwise, the dependency packages in the POM file cannot be loaded for local debugging.<p id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_p1172263318466">The following uses IntelliJ IDEA 2020.2 as an example:</p>
<ol id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_ol14317162417547"><li id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_li931742413543">On the IntelliJ IDEA page, select the configuration file you need to debug and click <strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_b86401647353">Edit Configurations</strong>.<p id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_p365585819572"><span><img id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_image1265595855718" src="en-us_image_0000001282578329.png"></span></p>
</li><li id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_li1445714119566">On the <strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_b37847683516">Run/Debug Configurations</strong> page, select <strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_b1678417619357">include dependencies with "Provided" scope</strong>.<p id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_p2087914155814"><span><img id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_image12879841135819" src="en-us_image_0000001282578421.png"></span></p>
</li><li id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_li15130140175616">Click <strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_b17106163673419">OK</strong>.</li></ol>
</li></ul>
</div>
<div class="section" id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_section127111224513"><h4 class="sectiontitle">Using UDFs</h4><ol id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_ol15899113116191"><li id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_li98991431121915">Encapsulate the implemented UDFs into a JAR package and upload the package to OBS.</li><li id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_li4578444204">In the navigation pane of the DLI management console, choose <strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_b114219353413">Data Management</strong> &gt; <strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_b10854254344">Package Management</strong>. On the displayed page, click <strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_b720254953410">Create</strong> and use the JAR package uploaded to OBS to create a package.</li><li id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_li15899113112196">In the left navigation, choose <strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_b10160112819361">Job Management</strong> and click <span class="menucascade" id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_menucascade113294410610"><b><span class="uicontrol" id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_uicontrol23217443612">Flink Jobs</span></b></span>. 
Locate the row of the target job and click <span class="uicontrol" id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_uicontrol56543515169"><b>Edit</b></span> in the <span class="parmname" id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_parmname96593571619"><b>Operation</b></span> column to go to the job editing page.</li><li id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_li31121447112220">Click the <strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_b0731123183512">Running Parameters</strong> tab of your job, select the UDF JAR, and click <strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_b8271651163510">Save</strong>.</li><li id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_li18991931131914">Add a CREATE FUNCTION statement to the SQL statements of the job to register the function, and then call the function in your queries. The UDF, UDTF, and UDAF sections below each provide an example.</li></ol>
</div>
<div class="section" id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_section2840164931914"><h4 class="sectiontitle">UDF</h4><p id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_p2582162417489">A regular UDF must extend the ScalarFunction class and implement the eval method. The open and close methods are optional.</p>
</div>
<p id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_p1258222410484"><strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_b53991656192713">Example code</strong></p>
<pre class="screen" id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_screen458010514913">import org.apache.flink.table.functions.FunctionContext;
import org.apache.flink.table.functions.ScalarFunction;

public class UdfScalarFunction extends ScalarFunction {
    private int factor = 12;

    public UdfScalarFunction() {
        this.factor = 12;
    }

    /**
     * (Optional) Initialization.
     * @param context function context provided by the runtime
     */
    @Override
    public void open(FunctionContext context) {}

    /**
     * Custom logic.
     * @param s input string
     * @return hash code of the input multiplied by the factor
     */
    public int eval(String s) {
        return s.hashCode() * factor;
    }

    /**
     * (Optional) Cleanup.
     */
    @Override
    public void close() {}
}</pre>
<p id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_p11751270584"><strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_b132921143171316">Example</strong></p>
<div class="codecoloring" codetype="Sql" id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_screen79691822155813"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">udf_test</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="s1">'com.company.udf.UdfScalarFunction'</span><span class="p">;</span>
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">sink_stream</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">udf_test</span><span class="p">(</span><span class="n">attr</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">source_stream</span><span class="p">;</span>
</pre></div></td></tr></table></div>
</div>
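<p>For local debugging, the custom logic can also be checked without a Flink runtime. The following is a minimal sketch, not part of DLI: the class <strong>EvalLogicSketch</strong> is hypothetical and only mirrors the eval logic of the example above (hash code of the input multiplied by a factor) in plain Java.</p>

```java
// Hypothetical stand-in for the example UDF; it uses no Flink classes.
final class EvalLogicSketch {
    // Multiplier applied to the hash code (12 in the example UDF).
    private final int factor;

    EvalLogicSketch(int factor) {
        this.factor = factor;
    }

    // Mirrors eval(String): hash code of the input multiplied by the factor.
    // Integer overflow wraps around silently, and a null input throws
    // NullPointerException, exactly as the expression in the example would.
    int eval(String s) {
        return s.hashCode() * factor;
    }

    public static void main(String[] args) {
        EvalLogicSketch udf = new EvalLogicSketch(12);
        // "a".hashCode() is 97, so this prints 1164.
        System.out.println(udf.eval("a"));
    }
}
```

<p>Because the sketch throws NullPointerException for a null argument, a null check at the start of eval may be worth adding if the input column can contain null values.</p>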
<div class="section" id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_section1554503417406"><h4 class="sectiontitle">UDTF</h4><p id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_p91842023192819">A UDTF must extend the TableFunction class and implement the eval method. The open and close methods are optional. To return multiple columns, declare the return type as <strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_b4439546105017">Tuple</strong> or <strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_b455817488508">Row</strong>. If <strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_b1559418555500">Row</strong> is used, you must override the getResultType method to declare the types of the returned fields.</p>
</div>
<p id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_p779511345287"><strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_b4960155661317">Example code</strong></p>
<pre class="screen" id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_screen678715832812">import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.table.functions.FunctionContext;
import org.apache.flink.table.functions.TableFunction;
import org.apache.flink.types.Row;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class UdfTableFunction extends TableFunction&lt;Row&gt; {
    private static final Logger log = LoggerFactory.getLogger(UdfTableFunction.class);

    /**
     * (Optional) Initialization.
     * @param context function context provided by the runtime
     */
    @Override
    public void open(FunctionContext context) {}

    /**
     * Custom logic: emits one row (substring, substring length) for each part of the input.
     * @param str string to split
     * @param split separator, interpreted as a regular expression by String.split
     */
    public void eval(String str, String split) {
        for (String s : str.split(split)) {
            Row row = new Row(2);
            row.setField(0, s);
            row.setField(1, s.length());
            collect(row);
        }
    }

    /**
     * Declare the type returned by the function.
     * @return row type with a STRING field and an INT field
     */
    @Override
    public TypeInformation&lt;Row&gt; getResultType() {
        return Types.ROW(Types.STRING, Types.INT);
    }

    /**
     * (Optional) Cleanup.
     */
    @Override
    public void close() {}
}</pre>
<p id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_p14401211152914"><strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_b7182429191417">Example</strong></p>
<p id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_p176516349294">The UDTF supports CROSS JOIN and LEFT JOIN. When the UDTF is used, the <strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_b1149132155517">LATERAL</strong> and <strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_b18476162413557">TABLE</strong> keywords must be included.</p>
<ul id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_ul18302152614619"><li id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_li2302526164619">CROSS JOIN: If the UDTF produces no output for a row of the left table, that row is not output.</li><li id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_li113020265465">LEFT JOIN: If the UDTF produces no output for a row of the left table, that row is still output, with the UDTF-related fields padded with null.</li></ul>
<div class="codecoloring" codetype="Sql" id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_screen2193195515318"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span>
<span class="normal">5</span>
<span class="normal">6</span>
<span class="normal">7</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">udtf_test</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="s1">'com.company.udf.UdfTableFunction'</span><span class="p">;</span>
<span class="c1">-- CROSS JOIN</span>
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">sink_stream</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">subValue</span><span class="p">,</span><span class="w"> </span><span class="k">length</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">source_stream</span><span class="p">,</span><span class="w"> </span><span class="k">LATERAL</span>
<span class="k">TABLE</span><span class="p">(</span><span class="n">udtf_test</span><span class="p">(</span><span class="n">attr</span><span class="p">,</span><span class="w"> </span><span class="s1">','</span><span class="p">))</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">T</span><span class="p">(</span><span class="n">subValue</span><span class="p">,</span><span class="w"> </span><span class="k">length</span><span class="p">);</span>
<span class="c1">-- LEFT JOIN</span>
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">sink_stream</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">subValue</span><span class="p">,</span><span class="w"> </span><span class="k">length</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">source_stream</span><span class="w"> </span><span class="k">LEFT</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="k">LATERAL</span>
<span class="k">TABLE</span><span class="p">(</span><span class="n">udtf_test</span><span class="p">(</span><span class="n">attr</span><span class="p">,</span><span class="w"> </span><span class="s1">','</span><span class="p">))</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">T</span><span class="p">(</span><span class="n">subValue</span><span class="p">,</span><span class="w"> </span><span class="k">length</span><span class="p">)</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="k">TRUE</span><span class="p">;</span>
</pre></div></td></tr></table></div>
</div>
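<p>The difference between the two join types can be illustrated in plain Java. This is only a sketch of the semantics described above, not how Flink executes the join: the class <strong>LateralJoinSketch</strong> and its methods are hypothetical, and the stand-in split function is assumed to emit no rows for an empty input so that one source row has no UDTF output.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of CROSS JOIN and LEFT JOIN with a table function.
final class LateralJoinSketch {

    // Stand-in for the UDTF: one {subValue, length} row per part of the input.
    // Assumption for this sketch: a null or empty input emits no rows at all.
    static List<String[]> split(String attr, String separator) {
        List<String[]> rows = new ArrayList<>();
        if (attr == null || attr.isEmpty()) {
            return rows;
        }
        for (String s : attr.split(separator)) {
            rows.add(new String[] {s, String.valueOf(s.length())});
        }
        return rows;
    }

    // leftJoin == false: a source row without UDTF output is dropped.
    // leftJoin == true: it is kept once, with null in the UDTF columns.
    static List<String> join(List<String> source, boolean leftJoin) {
        List<String> out = new ArrayList<>();
        for (String attr : source) {
            List<String[]> rows = split(attr, ",");
            if (rows.isEmpty() && leftJoin) {
                out.add(attr + "|null|null");
            }
            for (String[] row : rows) {
                out.add(attr + "|" + row[0] + "|" + row[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("a,bc", "");
        // Prints [a,bc|a|1, a,bc|bc|2]
        System.out.println(join(source, false));
        // Prints [a,bc|a|1, a,bc|bc|2, |null|null]
        System.out.println(join(source, true));
    }
}
```

<p>In the sketch, the empty source row appears only in the LEFT JOIN result, with both UDTF columns set to null.</p>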
<div class="section" id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_section09770367469"><h4 class="sectiontitle">UDAF</h4><p id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_p126820532327">A UDAF must extend the AggregateFunction class. You also need to create an accumulator class that stores the intermediate computing result, for example, <strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_b18264137121712">WeightedAvgAccum</strong> in the following example code.</p>
</div>
<p id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_p195255616337"><strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_b1361201856">Example code</strong></p>
<pre class="screen" id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_screen46581774359">public class WeightedAvgAccum {
    public long sum = 0;
    public int count = 0;
}</pre>
<p id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_p119111759203720"></p>
<pre class="screen" id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_screen186844523816">import org.apache.flink.table.functions.AggregateFunction;
import java.util.Iterator;

/**
 * The first type parameter is the result type of the aggregate function,
 * and the second type parameter is the accumulator type.
 * This user-defined aggregate function computes the average of the input values.
 * Despite the accumulator's name, no weights are applied.
 */
public class UdfAggFunction extends AggregateFunction&lt;Long, WeightedAvgAccum&gt; {
    // Initialize the accumulator.
    @Override
    public WeightedAvgAccum createAccumulator() {
        return new WeightedAvgAccum();
    }

    // Compute the result from the intermediate values stored in the accumulator.
    // Returns null if no record has been accumulated. The division is an integer division.
    @Override
    public Long getValue(WeightedAvgAccum acc) {
        if (acc.count == 0) {
            return null;
        } else {
            return acc.sum / acc.count;
        }
    }

    // Update the intermediate computing value according to the input.
    public void accumulate(WeightedAvgAccum acc, long iValue) {
        acc.sum += iValue;
        acc.count += 1;
    }

    // Perform the retraction operation, which is opposite to the accumulate operation.
    public void retract(WeightedAvgAccum acc, long iValue) {
        acc.sum -= iValue;
        acc.count -= 1;
    }

    // Combine multiple accumulator values into acc.
    public void merge(WeightedAvgAccum acc, Iterable&lt;WeightedAvgAccum&gt; it) {
        Iterator&lt;WeightedAvgAccum&gt; iter = it.iterator();
        while (iter.hasNext()) {
            WeightedAvgAccum a = iter.next();
            acc.count += a.count;
            acc.sum += a.sum;
        }
    }

    // Reset the intermediate computing value.
    public void resetAccumulator(WeightedAvgAccum acc) {
        acc.count = 0;
        acc.sum = 0L;
    }
}</pre>
<p id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_p1193357173817"><strong id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_b978135240">Example</strong></p>
<div class="codecoloring" codetype="Sql" id="dli_08_0330__en-us_topic_0000001166031883_en-us_topic_0000001081131044_dli_08_0099_screen1727113190396"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">udaf_test</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="s1">'com.company.udf.UdfAggFunction'</span><span class="p">;</span>
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">sink_stream</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">udaf_test</span><span class="p">(</span><span class="n">attr2</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">source_stream</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">attr1</span><span class="p">;</span>
</pre></div></td></tr></table></div>
</div>
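<p>How the accumulator changes as records are accumulated, retracted, and merged can be traced in plain Java. The following is a minimal sketch under stated assumptions: the class <strong>AvgAccumulatorSketch</strong> is hypothetical, uses no Flink classes, and folds the accumulator fields and the methods of the example above into a single class for brevity.</p>

```java
// Plain-Java copy of the accumulator logic from the UDAF example.
final class AvgAccumulatorSketch {
    long sum = 0;
    int count = 0;

    // Add one input value to the intermediate result.
    void accumulate(long value) {
        sum += value;
        count += 1;
    }

    // Undo a previous accumulate call for the same value.
    void retract(long value) {
        sum -= value;
        count -= 1;
    }

    // Fold another partial result into this one.
    void merge(AvgAccumulatorSketch other) {
        sum += other.sum;
        count += other.count;
    }

    // Integer average, or null if nothing has been accumulated.
    Long getValue() {
        if (count == 0) {
            return null;
        }
        return sum / count;
    }

    public static void main(String[] args) {
        AvgAccumulatorSketch a = new AvgAccumulatorSketch();
        a.accumulate(10);
        a.accumulate(20);
        a.accumulate(30);
        System.out.println(a.getValue()); // sum 60, count 3: prints 20
        a.retract(30);
        System.out.println(a.getValue()); // sum 30, count 2: prints 15
        AvgAccumulatorSketch b = new AvgAccumulatorSketch();
        b.accumulate(6);
        a.merge(b);
        System.out.println(a.getValue()); // sum 36, count 3: prints 12
    }
}
```

<p>Because sum and count are integers, the result is truncated. If a fractional average is needed, one option is to declare the result type as Double and divide in floating point.</p>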
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_08_0329.html">Functions</a></div>
</div>
</div>