Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="mrs_01_1743"></a><a name="mrs_01_1743"></a>
<h1 class="topictitle1">Hive UDF Development and Application</h1>
<div id="body32001227"><p id="mrs_01_1743__en-us_topic_0000001173631252_p124035816151">You can write custom functions to extend SQL statements and meet personalized requirements. These functions are called user-defined functions (UDFs).</p>
<p id="mrs_01_1743__en-us_topic_0000001173631252_p08731157164015">This section describes how to develop Hive UDFs and use them in HetuEngine.</p>
<div class="section" id="mrs_01_1743__en-us_topic_0000001173631252_section18748111917814"><h4 class="sectiontitle">Developing Hive UDFs</h4><p id="mrs_01_1743__en-us_topic_0000001173631252_p864133031018">The sample in this section implements the Hive UDF described in the following table.</p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_1743__en-us_topic_0000001173631252_table43781635161110" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Hive UDF</caption><thead align="left"><tr id="mrs_01_1743__en-us_topic_0000001173631252_row637903520119"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.3.2.3.1.1"><p id="mrs_01_1743__en-us_topic_0000001173631252_p83791635151110">Function</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.3.2.3.1.2"><p id="mrs_01_1743__en-us_topic_0000001173631252_p7379035111116">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_1743__en-us_topic_0000001173631252_row23791356115"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.3.2.3.1.1 "><p id="mrs_01_1743__en-us_topic_0000001173631252_p203799351118">AutoAddOne</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.3.2.3.1.2 "><p id="mrs_01_1743__en-us_topic_0000001173631252_p4379103511120">Adds <strong id="mrs_01_1743__en-us_topic_0000001173631252_b155371852116">1</strong> to the input value and returns the result.</p>
</td>
</tr>
</tbody>
</table>
</div>
<div class="note" id="mrs_01_1743__en-us_topic_0000001173631252_note7542232154314"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="mrs_01_1743__en-us_topic_0000001173631252_ul254218327435"><li id="mrs_01_1743__en-us_topic_0000001173631252_li5542113214314">A common Hive UDF must inherit from <strong id="mrs_01_1743__en-us_topic_0000001173631252_b1647968413755">org.apache.hadoop.hive.ql.exec.UDF</strong>.</li><li id="mrs_01_1743__en-us_topic_0000001173631252_li20542173217437">A common Hive UDF must implement at least one <strong id="mrs_01_1743__en-us_topic_0000001173631252_b509058238755">evaluate()</strong> method. The <strong id="mrs_01_1743__en-us_topic_0000001173631252_b0898457297">evaluate</strong> method can be overloaded.</li><li id="mrs_01_1743__en-us_topic_0000001173631252_li165421832104313">Currently, only the following data types are supported:<ul id="mrs_01_1743__en-us_topic_0000001173631252_ul1954273214431"><li id="mrs_01_1743__en-us_topic_0000001173631252_li1554213217431">boolean, byte, short, int, long, float, and double</li><li id="mrs_01_1743__en-us_topic_0000001173631252_li5542232114313">Boolean, Byte, Short, Integer, Long, Float, and Double</li><li id="mrs_01_1743__en-us_topic_0000001173631252_li19542173211439">List and Map</li></ul>
<p id="mrs_01_1743__en-us_topic_0000001173631252_p5740124714469">UDFs, UDAFs, and UDTFs currently do not support complex data types other than those listed above.</p>
</li><li id="mrs_01_1743__en-us_topic_0000001173631252_li13542193219437">Currently, a Hive UDF can have at most five input parameters. UDFs with more than five input parameters fail to be registered.</li><li id="mrs_01_1743__en-us_topic_0000001173631252_li14542163254310">If an input parameter of a Hive UDF is <strong id="mrs_01_1743__en-us_topic_0000001173631252_b284612030755">null</strong>, the call returns <strong id="mrs_01_1743__en-us_topic_0000001173631252_b1515319193755">null</strong> directly without executing the Hive UDF logic. As a result, the execution result may differ from that of the same UDF executed in Hive.</li><li id="mrs_01_1743__en-us_topic_0000001173631252_li85438320438">You can obtain the <strong id="mrs_01_1743__en-us_topic_0000001173631252_b8758124132211">hive-exec-3.1.1</strong> dependency package required by the Maven project from the Hive installation directory.</li><li id="mrs_01_1743__en-us_topic_0000001173631252_li1854363216433">(Optional) If the Hive UDF depends on a configuration file, you are advised to save the configuration file as a resource file in the <strong id="mrs_01_1743__en-us_topic_0000001173631252_b333353253918">resources</strong> directory so that it is packaged into the Hive UDF function package.</li></ul>
</div></div>
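<p>The constraints in the preceding note can be checked before a UDF is packaged. The following Java sketch is a hypothetical helper (<strong>UdfSignatureCheck</strong> is not part of Hive or HetuEngine) that uses reflection to inspect the public <strong>evaluate()</strong> methods of a class and reports any method with more than five parameters or with a parameter type outside the list in the note. It approximates the documented rules only and may differ from the checks HetuEngine actually performs; return types are not checked.</p>

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical pre-check (not part of Hive or HetuEngine): reports evaluate() methods that
 * break the rules in the note, namely more than five parameters or an unlisted parameter type.
 */
final class UdfSignatureCheck {
    // Parameter types listed as supported in the note: primitives, their wrappers, List, and Map.
    private static final Set<Class<?>> SUPPORTED = Set.of(
            boolean.class, byte.class, short.class, int.class, long.class, float.class, double.class,
            Boolean.class, Byte.class, Short.class, Integer.class, Long.class, Float.class, Double.class,
            List.class, Map.class);

    // Maximum number of input parameters stated in the note.
    private static final int MAX_PARAMS = 5;

    private UdfSignatureCheck() {
    }

    /** Returns one message per problem found; an empty list means no problem was detected. */
    static List<String> check(Class<?> udfClass) {
        List<String> problems = new ArrayList<>();
        boolean hasEvaluate = false;
        // getMethods() returns all public methods, including inherited ones.
        for (Method method : udfClass.getMethods()) {
            if (!method.getName().equals("evaluate") || Modifier.isStatic(method.getModifiers())) {
                continue;
            }
            hasEvaluate = true;
            Class<?>[] params = method.getParameterTypes();
            if (params.length > MAX_PARAMS) {
                problems.add(method + ": " + params.length + " parameters, at most " + MAX_PARAMS + " allowed");
            }
            for (Class<?> param : params) {
                if (!SUPPORTED.contains(param)) {
                    problems.add(method + ": unsupported parameter type " + param.getName());
                }
            }
        }
        if (!hasEvaluate) {
            problems.add(udfClass.getName() + ": no public evaluate() method");
        }
        return problems;
    }
}
```

<p>For example, with hive-exec on the classpath, calling <strong>UdfSignatureCheck.check(AutoAddOne.class)</strong> on the sample class in this section should return an empty list, because its only <strong>evaluate()</strong> method takes a single <strong>int</strong>.</p>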
<ol id="mrs_01_1743__en-us_topic_0000001173631252_ol152112282303"><li id="mrs_01_1743__en-us_topic_0000001173631252_li182117286305"><span>Create a Maven project. Set <strong id="mrs_01_1743__en-us_topic_0000001173631252_b46531047183412">groupId</strong> to <strong id="mrs_01_1743__en-us_topic_0000001173631252_b566118477341">com.test.udf</strong> and <strong id="mrs_01_1743__en-us_topic_0000001173631252_b1666374711345">artifactId</strong> to <strong id="mrs_01_1743__en-us_topic_0000001173631252_b2664124720340">udf-test</strong>. You can customize the two values as required.</span></li><li id="mrs_01_1743__en-us_topic_0000001173631252_li536320406309"><span>Modify the <strong id="mrs_01_1743__en-us_topic_0000001173631252_b1629490027755">pom.xml</strong> file as follows:</span><p><pre class="screen" id="mrs_01_1743__en-us_topic_0000001173631252_screen10909112212313"><project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.test.udf</groupId>
    <artifactId>udf-test</artifactId>
    <version>0.0.1-SNAPSHOT</version>

    <dependencies>
        <!-- Provides org.apache.hadoop.hive.ql.exec.UDF -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>3.1.1</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- Builds the UDF function package during the package phase -->
            <plugin>
                <artifactId>maven-shade-plugin</artifactId>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <!-- Copies files in src/main/resources/ to the target directory -->
            <plugin>
                <artifactId>maven-resources-plugin</artifactId>
                <executions>
                    <execution>
                        <id>copy-resources</id>
                        <phase>package</phase>
                        <goals>
                            <goal>copy-resources</goal>
                        </goals>
                        <configuration>
                            <outputDirectory>${project.build.directory}/</outputDirectory>
                            <resources>
                                <resource>
                                    <directory>src/main/resources/</directory>
                                    <filtering>false</filtering>
                                </resource>
                            </resources>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project></pre>
</p></li><li id="mrs_01_1743__en-us_topic_0000001173631252_li165141153119"><span>Create the implementation class of the Hive UDF.</span><p><pre class="screen" id="mrs_01_1743__en-us_topic_0000001173631252_screen10386185323114">import org.apache.hadoop.hive.ql.exec.UDF;

/**
 * AutoAddOne: adds 1 to the input value.
 *
 * @since 2020-08-24
 */
public class AutoAddOne extends UDF {
    // Returns the input value plus 1. evaluate() can be overloaded to accept other types.
    public int evaluate(int data) {
        return data + 1;
    }
}</pre>
</p></li><li id="mrs_01_1743__en-us_topic_0000001173631252_li16170114019312"><span>Package the Maven project, for example, by running <strong>mvn package</strong>. The <strong id="mrs_01_1743__en-us_topic_0000001173631252_b16969336351">udf-test-0.0.1-SNAPSHOT.jar</strong> file generated in the <strong id="mrs_01_1743__en-us_topic_0000001173631252_b20977133113511">target</strong> directory is the Hive UDF function package.</span></li></ol>
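<p>The note on <strong>null</strong> inputs means that UDF logic written to handle <strong>null</strong> itself may never receive it in HetuEngine. The following Java sketch models only that documented behavior with <strong>java.util.function.Function</strong>. The <strong>NullShortCircuit</strong> class and the null-to-0 logic are hypothetical illustrations, not HetuEngine or Hive code.</p>

```java
import java.util.function.Function;

/**
 * Illustration only: models the documented behavior where a null input makes the call
 * return null directly, without running the UDF logic.
 */
final class NullShortCircuit {
    private NullShortCircuit() {
    }

    /** Wraps single-argument UDF logic so that a null input yields null without calling it. */
    static <T, R> Function<T, R> wrap(Function<T, R> udfLogic) {
        return input -> input == null ? null : udfLogic.apply(input);
    }

    public static void main(String[] args) {
        // Hypothetical UDF logic that maps null to 0 instead of returning null.
        Function<Integer, Integer> logic = x -> x == null ? 0 : x + 1;
        Function<Integer, Integer> called = wrap(logic);

        System.out.println(logic.apply(null));  // 0: the logic's own null handling
        System.out.println(called.apply(null)); // null: the logic is never invoked
        System.out.println(called.apply(1));    // 2: same as AutoAddOne(1)
    }
}
```

<p>The sample <strong>AutoAddOne.evaluate(int)</strong> takes a primitive <strong>int</strong>, so Java cannot pass it <strong>null</strong> in any case. If you add an overload that uses a wrapper type such as <strong>Integer</strong>, do not rely on it to turn <strong>null</strong> into a non-null value when it is called from HetuEngine.</p>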
</div>
<div class="section" id="mrs_01_1743__en-us_topic_0000001173631252_section156542610420"><h4 class="sectiontitle">Configuring Hive UDFs</h4><p id="mrs_01_1743__en-us_topic_0000001173631252_p167047494915">In the <span class="filepath" id="mrs_01_1743__en-us_topic_0000001173631252_filepath1880487472755"><b>udf.properties</b></span> configuration file, register each Hive UDF on a separate line in the "Function_name Class_path" format, that is, the function name followed by a space and the class path of the UDF implementation class.</p>
<p id="mrs_01_1743__en-us_topic_0000001173631252_p49497258428">The following example registers four Hive UDFs in the <strong id="mrs_01_1743__en-us_topic_0000001173631252_b169577465467">udf.properties</strong> file:</p>
<pre class="screen" id="mrs_01_1743__en-us_topic_0000001173631252_screen395002516424">booleanudf io.hetu.core.hive.dynamicfunctions.examples.udf.BooleanUDF
shortudf io.hetu.core.hive.dynamicfunctions.examples.udf.ShortUDF
byteudf io.hetu.core.hive.dynamicfunctions.examples.udf.ByteUDF
intudf io.hetu.core.hive.dynamicfunctions.examples.udf.IntUDF</pre>
<div class="note" id="mrs_01_1743__en-us_topic_0000001173631252_note11779471994"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="mrs_01_1743__en-us_topic_0000001173631252_ul99578559910"><li id="mrs_01_1743__en-us_topic_0000001173631252_li4950325174216">If a registration entry is incorrect, for example, its format is invalid or its class path does not exist, the system ignores the entry and prints related logs.</li><li id="mrs_01_1743__en-us_topic_0000001173631252_li295012574217">If the same Hive UDF is registered more than once, the system registers it only once and ignores the duplicate registrations.</li><li id="mrs_01_1743__en-us_topic_0000001173631252_li1695712555910">If a Hive UDF to be registered is the same as a function already registered in the system, the system throws an exception and fails to start. To solve this problem, delete the registration information of that Hive UDF.</li></ul>
</div></div>
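<p>To illustrate the registration rules in the preceding note, the following Java sketch parses lines in the "Function_name Class_path" format, skips malformed entries and entries whose class cannot be found, and keeps only the first registration of each function name. It is not HetuEngine's loader: treating entries with the same function name as duplicates and skipping blank lines are assumptions, and the class-existence check is passed in as a predicate so that the sketch runs without the UDF classes on the classpath.</p>

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/**
 * Illustration only, not HetuEngine's loader: parses udf.properties-style lines and applies
 * the rules described in the note (skip invalid entries, register duplicates once).
 */
final class UdfPropertiesSketch {
    private UdfPropertiesSketch() {
    }

    /** Returns function name -> class path in file order, skipping entries that fail a check. */
    static Map<String, String> parse(List<String> lines, Predicate<String> classExists) {
        Map<String, String> registered = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // assumption: blank lines carry no registration
            }
            String[] parts = line.split("\\s+");
            if (parts.length != 2) {
                System.err.println("Ignoring entry with invalid format: " + raw);
                continue;
            }
            if (!classExists.test(parts[1])) {
                System.err.println("Ignoring entry with unknown class path: " + raw);
                continue;
            }
            // Keep the first registration of a function name; later duplicates are ignored.
            registered.putIfAbsent(parts[0], parts[1]);
        }
        return registered;
    }
}
```

<p>In a real check, the predicate could call <strong>Class.forName(className, false, loader)</strong> with a class loader that can see the UDF function package and treat a <strong>ClassNotFoundException</strong> as a missing class.</p>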
</div>
<div class="section" id="mrs_01_1743__en-us_topic_0000001173631252_section119871146144711"><h4 class="sectiontitle">Deploying Hive UDFs</h4><p id="mrs_01_1743__en-us_topic_0000001173631252_p882514199479">To use an existing Hive UDF in <span id="mrs_01_1743__en-us_topic_0000001173631252_text15197175511819">HetuEngine</span>, upload the UDF function package, the <span class="filepath" id="mrs_01_1743__en-us_topic_0000001173631252_filepath1075011510312"><b>udf.properties</b></span> file, and any configuration files that the UDF depends on to a specified HDFS directory, for example, <span class="filepath" id="mrs_01_1743__en-us_topic_0000001173631252_filepath1623517528365"><b>/user/hetuserver/udf/</b></span>, and then restart the <span id="mrs_01_1743__en-us_topic_0000001173631252_text1099563411562">HetuEngine</span> compute instance.</p>
</div>
<ol id="mrs_01_1743__en-us_topic_0000001173631252_ol12825151912479"><li id="mrs_01_1743__en-us_topic_0000001173631252_li1282516192477"><span>Create the <strong id="mrs_01_1743__en-us_topic_0000001173631252_b1415178019755">/user/hetuserver/udf/data/externalFunctions</strong> directory. Then save the <strong id="mrs_01_1743__en-us_topic_0000001173631252_b607813456755">udf.properties</strong> file in the <strong id="mrs_01_1743__en-us_topic_0000001173631252_b662731899755">/user/hetuserver/udf</strong> directory, the UDF function package in the <strong id="mrs_01_1743__en-us_topic_0000001173631252_b1472408419755">/user/hetuserver/udf/data/externalFunctions</strong> directory, and the configuration files that the UDF depends on in the <strong id="mrs_01_1743__en-us_topic_0000001173631252_b1278783047755">/user/hetuserver/udf/data</strong> directory.</span><p><ul id="mrs_01_1743__en-us_topic_0000001173631252_ul1482541916470"><li id="mrs_01_1743__en-us_topic_0000001173631252_li169014196404">Upload the files using the HDFS web UI:<ol type="a" id="mrs_01_1743__en-us_topic_0000001173631252_ol15369144216613"><li id="mrs_01_1743__en-us_topic_0000001173631252_li1410104618619">Log in to FusionInsight Manager using the <span id="mrs_01_1743__en-us_topic_0000001173631252_text11891102115415">HetuEngine</span> username and choose <strong id="mrs_01_1743__en-us_topic_0000001173631252_b13653859134018">Cluster</strong> > <strong id="mrs_01_1743__en-us_topic_0000001173631252_b13655185914408">Services</strong> > <strong id="mrs_01_1743__en-us_topic_0000001173631252_b965715919405">HDFS</strong>.</li><li id="mrs_01_1743__en-us_topic_0000001173631252_li154083502619">In the <strong id="mrs_01_1743__en-us_topic_0000001173631252_b12257597417">Basic Information</strong> area on the <strong id="mrs_01_1743__en-us_topic_0000001173631252_b522612590410">Dashboard</strong> page, click the link next to <strong id="mrs_01_1743__en-us_topic_0000001173631252_b11226135919413">NameNode WebUI</strong>.</li><li id="mrs_01_1743__en-us_topic_0000001173631252_li163691942365">Choose <strong id="mrs_01_1743__en-us_topic_0000001173631252_b36741187391">Utilities</strong> > <strong id="mrs_01_1743__en-us_topic_0000001173631252_b77068229398">Browse the file system</strong> and click <span><img id="mrs_01_1743__en-us_topic_0000001173631252_image126191230477" src="en-us_image_0000001295740228.png"></span> to create the <span class="filepath" id="mrs_01_1743__en-us_topic_0000001173631252_filepath1949368780755"><b>/user/hetuserver/udf/data/externalFunctions</b></span> directory.</li><li id="mrs_01_1743__en-us_topic_0000001173631252_li9779151119">Go to the <span class="filepath" id="mrs_01_1743__en-us_topic_0000001173631252_filepath139359324238"><b>/user/hetuserver/udf</b></span> directory and click <span><img id="mrs_01_1743__en-us_topic_0000001173631252_image1417364610238" src="en-us_image_0000001349259325.png"></span> to upload the <span class="filepath" id="mrs_01_1743__en-us_topic_0000001173631252_filepath14469101513248"><b>udf.properties</b></span> file.</li><li id="mrs_01_1743__en-us_topic_0000001173631252_li47257316253">Go to the <span class="filepath" id="mrs_01_1743__en-us_topic_0000001173631252_filepath083419131253"><b>/user/hetuserver/udf/data/</b></span> directory and click <span><img id="mrs_01_1743__en-us_topic_0000001173631252_image740742017251" src="en-us_image_0000001349059877.png"></span> to upload the configuration files that the UDF depends on.</li><li id="mrs_01_1743__en-us_topic_0000001173631252_li5803118132411">Go to the <span class="filepath" id="mrs_01_1743__en-us_topic_0000001173631252_filepath13712163018241"><b>/user/hetuserver/udf/data/externalFunctions</b></span> directory and click <span><img id="mrs_01_1743__en-us_topic_0000001173631252_image1876111377245" src="en-us_image_0000001296060032.png"></span> to upload the UDF function package.</li></ol>
</li></ul>
<ul id="mrs_01_1743__en-us_topic_0000001173631252_ul182541944718"><li id="mrs_01_1743__en-us_topic_0000001173631252_li282561924712">Upload the files using the HDFS CLI:<ol type="a" id="mrs_01_1743__en-us_topic_0000001173631252_ol14191015201410"><li id="mrs_01_1743__en-us_topic_0000001173631252_li4307516142219">Log in to the node where the HDFS client is installed and switch to the client installation directory, for example, <strong id="mrs_01_1743__en-us_topic_0000001173631252_b16209171113461">/opt/</strong><strong id="mrs_01_1743__en-us_topic_0000001173631252_b3209111164613"></strong><strong id="mrs_01_1743__en-us_topic_0000001173631252_b182091911104613">client</strong>.<p id="mrs_01_1743__en-us_topic_0000001173631252_p310152911224"><strong id="mrs_01_1743__en-us_topic_0000001173631252_b31941511148">cd /opt/client</strong></p>
</li><li id="mrs_01_1743__en-us_topic_0000001173631252_li3910629141247">Run the following command to configure environment variables:<p id="mrs_01_1743__en-us_topic_0000001173631252_p30260676141247"><a name="mrs_01_1743__en-us_topic_0000001173631252_li3910629141247"></a><a name="en-us_topic_0000001173631252_li3910629141247"></a><strong id="mrs_01_1743__en-us_topic_0000001173631252_b10818837141247">source bigdata_env</strong></p>
</li><li id="mrs_01_1743__en-us_topic_0000001173631252_li1584913952810">If the cluster is in security mode, run the following command to authenticate the user. In normal mode, skip this step.<p id="mrs_01_1743__en-us_topic_0000001173631252_p63101812102911"><a name="mrs_01_1743__en-us_topic_0000001173631252_li1584913952810"></a><a name="en-us_topic_0000001173631252_li1584913952810"></a><strong id="mrs_01_1743__en-us_topic_0000001173631252_b6986152714296">kinit</strong><em id="mrs_01_1743__en-us_topic_0000001173631252_i643812375414"> </em><em id="mrs_01_1743__en-us_topic_0000001173631252_i1694713152515"><span id="mrs_01_1743__en-us_topic_0000001173631252_text1183395195115">HetuEngine</span></em> <em id="mrs_01_1743__en-us_topic_0000001173631252_i17950141565112">username</em></p>
<p id="mrs_01_1743__en-us_topic_0000001173631252_p3191715131413">Enter the password as prompted.</p>
</li><li id="mrs_01_1743__en-us_topic_0000001173631252_li7279132517313">Run the following commands to create the directory and upload the prepared UDF function package, <span class="filepath" id="mrs_01_1743__en-us_topic_0000001173631252_filepath1061518323618"><b>udf.properties</b></span> file, and configuration files that the UDF depends on to the target directories. The <strong>-p</strong> option also creates any missing parent directories.<p id="mrs_01_1743__en-us_topic_0000001173631252_p164261544716"><strong id="mrs_01_1743__en-us_topic_0000001173631252_b1572519133716">hdfs dfs -mkdir -p </strong><strong id="mrs_01_1743__en-us_topic_0000001173631252_b9820837107">/user/hetuserver/udf/data/externalFunctions</strong></p>
<p id="mrs_01_1743__en-us_topic_0000001173631252_p2825111994715"><strong id="mrs_01_1743__en-us_topic_0000001173631252_b9706134011217">hdfs dfs -put ./</strong><em id="mrs_01_1743__en-us_topic_0000001173631252_i1272804071215">Configuration files on which the UDF depends</em><strong id="mrs_01_1743__en-us_topic_0000001173631252_b14706104016126"> /user/hetuserver/udf/data</strong></p>
<p id="mrs_01_1743__en-us_topic_0000001173631252_p6825219144714"><strong id="mrs_01_1743__en-us_topic_0000001173631252_b16305165220121">hdfs dfs -put ./udf.properties /user/hetuserver/udf</strong></p>
<p id="mrs_01_1743__en-us_topic_0000001173631252_p8252854171212"><strong id="mrs_01_1743__en-us_topic_0000001173631252_b77861971315">hdfs dfs -put ./</strong><em id="mrs_01_1743__en-us_topic_0000001173631252_i47521720161315">UDF function package</em> <strong id="mrs_01_1743__en-us_topic_0000001173631252_b16295203501317">/user/hetuserver/udf/data/externalFunctions</strong></p>
</li></ol>
</li></ul>
</p></li><li id="mrs_01_1743__en-us_topic_0000001173631252_li13185112234"><span>Restart the <span id="mrs_01_1743__en-us_topic_0000001173631252_text444354171819">HetuEngine</span> compute instance.</span></li></ol>
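<p>If you deploy UDFs repeatedly, you may prefer to generate the CLI commands above rather than typing them. The following Java sketch is a hypothetical helper (<strong>UdfUploadPlan</strong> is not a product tool) that only builds the command strings for the example directories used in this section and does not run them. The local file names in <strong>main</strong>, such as <strong>./udf-config.xml</strong>, are placeholders.</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds the HDFS CLI upload commands for the example layout. */
final class UdfUploadPlan {
    // Example HDFS directories used in this section.
    static final String UDF_DIR = "/user/hetuserver/udf";
    static final String DATA_DIR = UDF_DIR + "/data";
    static final String FUNCTIONS_DIR = DATA_DIR + "/externalFunctions";

    private UdfUploadPlan() {
    }

    /** Builds (but does not run) the commands for the upload step, in the documented order. */
    static List<String> commands(String functionPackage, String udfProperties, List<String> configFiles) {
        List<String> cmds = new ArrayList<>();
        // -p also creates missing parent directories such as /user/hetuserver/udf/data.
        cmds.add("hdfs dfs -mkdir -p " + FUNCTIONS_DIR);
        for (String configFile : configFiles) {
            cmds.add("hdfs dfs -put " + configFile + " " + DATA_DIR);
        }
        cmds.add("hdfs dfs -put " + udfProperties + " " + UDF_DIR);
        cmds.add("hdfs dfs -put " + functionPackage + " " + FUNCTIONS_DIR);
        return cmds;
    }

    public static void main(String[] args) {
        // Placeholder local paths; the JAR name matches the package built earlier in this section.
        List<String> cmds = commands("./udf-test-0.0.1-SNAPSHOT.jar", "./udf.properties", List.of("./udf-config.xml"));
        cmds.forEach(System.out::println);
    }
}
```

<p>After the commands have been run, restart the HetuEngine compute instance as described in the last step.</p>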
<div class="section" id="mrs_01_1743__en-us_topic_0000001173631252_section43751414184815"><h4 class="sectiontitle">Using Hive UDFs</h4><p id="mrs_01_1743__en-us_topic_0000001173631252_p1382514197471">Use a client to access a Hive UDF:</p>
<ol id="mrs_01_1743__en-us_topic_0000001173631252_ol6304194111615"><li id="mrs_01_1743__en-us_topic_0000001173631252_li12593164118174">Log in to the HetuEngine client. For details, see <a href="mrs_01_1737.html">Using the HetuEngine Client</a>.</li><li id="mrs_01_1743__en-us_topic_0000001173631252_li142982318208">Run the following command to use a Hive UDF:<p id="mrs_01_1743__en-us_topic_0000001173631252_p996844819222"><a name="mrs_01_1743__en-us_topic_0000001173631252_li142982318208"></a><a name="en-us_topic_0000001173631252_li142982318208"></a><strong id="mrs_01_1743__en-us_topic_0000001173631252_b7404510162511">select AutoAddOne(1);</strong></p>
<pre class="screen" id="mrs_01_1743__en-us_topic_0000001173631252_screen1916721673315">select AutoAddOne(1);
 _col0
-------
     2
(1 row)</pre>
</li></ol>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_2338.html">Function & UDF Development and Application</a></div>
</div>
</div>