Yang, Tong 3f5759eed2 MRS comp-lts 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2023-01-19 17:08:45 +00:00


<a name="mrs_01_0963"></a><a name="mrs_01_0963"></a>
<h1 class="topictitle1">Creating User-Defined Hive Functions</h1>
<div id="body8662426"><p id="mrs_01_0963__en-us_topic_0000001173470732_a60a5a1b608ee44c68072f23f06efd943">If the built-in functions of Hive do not meet your requirements, you can write user-defined functions (UDFs) and use them in queries.</p>
<p id="mrs_01_0963__en-us_topic_0000001173470732_ad349741b3dac46cc9f4faa40d1955eae">By implementation, UDFs are classified as follows:</p>
<ul id="mrs_01_0963__en-us_topic_0000001173470732_uf147e22894d844908b644c6d8cdc6201"><li id="mrs_01_0963__en-us_topic_0000001173470732_l949bbfac83d741c487bd3bc76291c98f">Common UDFs: operate on a single data row and output a single data row.</li><li id="mrs_01_0963__en-us_topic_0000001173470732_lf71170cbd96f4fc7902d55bfac9d7a6d">User-defined aggregate functions (UDAFs): take multiple data rows as input and output a single data row.</li><li id="mrs_01_0963__en-us_topic_0000001173470732_l7232c97221be4281a191b8e17681e060">User-defined table-generating functions (UDTFs): operate on a single data row and output multiple data rows.</li></ul>
<p id="mrs_01_0963__en-us_topic_0000001173470732_af5567575102a40f4a425b15c9f054a5c">By usage, UDFs are classified as follows:</p>
<ul id="mrs_01_0963__en-us_topic_0000001173470732_u2a7ac4887a274417afe9b5a77c129870"><li id="mrs_01_0963__en-us_topic_0000001173470732_l373ce7074ebc4580885b5ff5495f4d69">Temporary functions: available only in the current session. They must be re-created in each new session.</li><li id="mrs_01_0963__en-us_topic_0000001173470732_l1e198b2726844384b6b647bc29fb2736">Permanent functions: available across sessions. You do not need to re-create them when a new session starts.</li></ul>
<div class="note" id="mrs_01_0963__en-us_topic_0000001173470732_note145022591363"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_0963__en-us_topic_0000001173470732_p8503125917619">Carefully manage the memory and threads used by your UDFs. Poor management may cause memory overflow or high CPU usage.</p>
</div></div>
<p id="mrs_01_0963__en-us_topic_0000001173470732_a29461dedc78148ec83cac953827764e5">The following uses AddDoublesUDF as an example to describe how to write and use UDFs.</p>
<div class="section" id="mrs_01_0963__en-us_topic_0000001173470732_sc8d5f41b8e7344b7a70edb1821f04cff"><h4 class="sectiontitle">Function</h4><p id="mrs_01_0963__en-us_topic_0000001173470732_a914e691e112d4ca2bd08852a1ae3abe6">AddDoublesUDF adds two or more floating-point numbers. It is a common UDF, so it illustrates the rules listed in the following note.</p>
<div class="note" id="mrs_01_0963__en-us_topic_0000001173470732_n33d39352c9d34d388c11c9d882d85da0"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="mrs_01_0963__en-us_topic_0000001173470732_u489e3518c6f44c4f98320121fd7cb5cf"><li id="mrs_01_0963__en-us_topic_0000001173470732_lbc7bf60a28c3429ab6ac104a8b267f9f">A common UDF must inherit from <strong id="mrs_01_0963__en-us_topic_0000001173470732_b842352706105541">org.apache.hadoop.hive.ql.exec.UDF</strong>.</li><li id="mrs_01_0963__en-us_topic_0000001173470732_le8ff31da611248c08a11f30fa4d128d4">A common UDF must implement at least one <strong id="mrs_01_0963__en-us_topic_0000001173470732_b84235270610568">evaluate()</strong> method. The method can be overloaded.</li><li id="mrs_01_0963__en-us_topic_0000001173470732_li450787289373">To develop a UDF, add the <span class="filepath" id="mrs_01_0963__en-us_topic_0000001173470732_filepath16521123820322"><b>hive-exec-3.1.0.jar</b></span> dependency package to the project. You can obtain the package from the Hive installation directory.</li></ul>
</div></div>
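<p>The rules in the note above can be illustrated with a minimal sketch of what the AddDoublesUDF class might look like. It is an illustration under stated assumptions, not the shipped sample: skipping NULL inputs and returning 0.0 for an empty argument list are choices made here. To keep the snippet compilable with only the JDK, the hive-exec import and the <strong>extends UDF</strong> clause are shown as comments. Restore both in a real project, because a common UDF must inherit from <strong>org.apache.hadoop.hive.ql.exec.UDF</strong>.</p>
<pre>
```java
// import org.apache.hadoop.hive.ql.exec.UDF;  // from hive-exec; restore in a real project

// In a real project, declare: public class AddDoublesUDF extends UDF
class AddDoublesUDF {

    // Varargs form of evaluate(): accepts any number of Double arguments.
    // More evaluate() overloads with other parameter types could be added.
    public Double evaluate(Double... values) {
        double total = 0.0;
        if (values == null) {
            return total;
        }
        for (Double value : values) {
            // Assumption: NULL inputs are skipped instead of making the result NULL.
            if (value != null) {
                total += value;
            }
        }
        return total;
    }
}
```
</pre>
<p>In the example that follows, the class is assumed to live in the <strong>com.xxx.bigdata.hive.example.udf</strong> package and to be packaged as <strong>AddDoublesUDF.jar</strong>.</p>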
</div>
<div class="section" id="mrs_01_0963__en-us_topic_0000001173470732_s365ed277169640a58ffa5fddd248363d"><h4 class="sectiontitle">How to Use</h4><ol id="mrs_01_0963__en-us_topic_0000001173470732_o55b3780bd33a4e99a90a25eb66c73be9"><li id="mrs_01_0963__en-us_topic_0000001173470732_li443984543315"><span>Package the program as <strong id="mrs_01_0963__en-us_topic_0000001173470732_b3528019184515">AddDoublesUDF.jar</strong> on the client node, and upload the package to a specified directory in HDFS, for example, <strong id="mrs_01_0963__en-us_topic_0000001173470732_b1618194914454">/user/hive_examples_jars</strong>.</span><p><p id="mrs_01_0963__en-us_topic_0000001173470732_p10942104593318">Both the user who creates the function and the user who uses the function must have the read permission on the file.</p>
<p id="mrs_01_0963__en-us_topic_0000001173470732_p1774301114334">The following are example commands:</p>
<p id="mrs_01_0963__en-us_topic_0000001173470732_a0894581a020844afa187a0527068a9a5"><strong id="mrs_01_0963__en-us_topic_0000001173470732_a4e12b401cbca4bccbabb253ebcb2fde7">hdfs dfs -put ./hive_examples_jars /user/hive_examples_jars</strong></p>
<p id="mrs_01_0963__en-us_topic_0000001173470732_ae0d7a9c93f6b405ab26ca3d3d25fb3c1"><strong id="mrs_01_0963__en-us_topic_0000001173470732_aa97fcf7c4aef41a7803a832654842305">hdfs dfs -chmod 777 /user/hive_examples_jars</strong></p>
</p></li><li id="mrs_01_0963__en-us_topic_0000001173470732_l40e080dd84f04a1aa35a7bf075d8cf3f"><span>Check the cluster authentication mode.</span><p><ul id="mrs_01_0963__en-us_topic_0000001173470732_u49f107234b184807a4786d87c4fce397"><li id="mrs_01_0963__en-us_topic_0000001173470732_lfd2467f253494013b1f12eb7b9a008cf">In security mode, log in to the beeline client as a user with the Hive management permission and run the following commands:<p id="mrs_01_0963__en-us_topic_0000001173470732_a0c37def60e9d4e37ae8facefad47ace4"><a name="mrs_01_0963__en-us_topic_0000001173470732_lfd2467f253494013b1f12eb7b9a008cf"></a><a name="en-us_topic_0000001173470732_lfd2467f253494013b1f12eb7b9a008cf"></a><strong id="mrs_01_0963__en-us_topic_0000001173470732_b82538355547">kinit</strong> <em id="mrs_01_0963__en-us_topic_0000001173470732_i82589357542">Hive service user</em></p>
<p id="mrs_01_0963__en-us_topic_0000001173470732_a2009b9ae87034e4daced4ba9105df504"><strong id="mrs_01_0963__en-us_topic_0000001173470732_a2d671abe4e6d41dcac433684011047b1">beeline</strong></p>
<p class="litext" id="mrs_01_0963__en-us_topic_0000001173470732_a59c3078e67b643c9b92e74d57fd0e1a2"><strong id="mrs_01_0963__en-us_topic_0000001173470732_a60c3f6a8d0e846e599917b8d8163a1d3">set role admin;</strong></p>
</li><li id="mrs_01_0963__en-us_topic_0000001173470732_l659c67d920e7492a92a8afa6b8f9a3dc">In common mode, run the following command:<p id="mrs_01_0963__en-us_topic_0000001173470732_ad995c65aa3064db69efc613ebe4dca07"><a name="mrs_01_0963__en-us_topic_0000001173470732_l659c67d920e7492a92a8afa6b8f9a3dc"></a><a name="en-us_topic_0000001173470732_l659c67d920e7492a92a8afa6b8f9a3dc"></a><strong id="mrs_01_0963__en-us_topic_0000001173470732_acee70dec3f4044f782a44b9eb22dd864">beeline -n</strong> <i><span class="varname" id="mrs_01_0963__en-us_topic_0000001173470732_v449abd5b9d1d4dc49d4cc6115006533d">Hive service user</span></i></p>
</li></ul>
</p></li><li id="mrs_01_0963__en-us_topic_0000001173470732_l20db81d1f9904dd1b7002dc7232adf79"><span>Define the function in HiveServer. Run the following SQL statement to create a permanent function:</span><p><p class="litext" id="mrs_01_0963__en-us_topic_0000001173470732_a3a06344ed1fa4faea7607e9904b748c6"><strong id="mrs_01_0963__en-us_topic_0000001173470732_a2f1d3b5bbce0427391908db672266153">CREATE FUNCTION </strong><i><span class="varname" id="mrs_01_0963__en-us_topic_0000001173470732_va89f38d4568d4b2e91195cda5a69ab13">addDoubles</span></i><strong id="mrs_01_0963__en-us_topic_0000001173470732_b915198712651"> AS 'com.xxx.bigdata.hive.example.udf.AddDoublesUDF' <strong id="mrs_01_0963__en-us_topic_0000001173470732_b5321267012651">using jar 'hdfs://hacluster</strong>/user/hive_examples_jars/AddDoublesUDF.jar<strong id="mrs_01_0963__en-us_topic_0000001173470732_a8dc8148d493a4b2bab730c6e54f719af">'</strong>;</strong></p>
<p id="mrs_01_0963__en-us_topic_0000001173470732_a8b202ecadcff4dd4bc43841fab503f24"><i><span class="varname" id="mrs_01_0963__en-us_topic_0000001173470732_v1ae1119ab9ab49d181c4edf35842a50a">addDoubles</span></i> is the function alias used in SELECT queries.</p>
<p id="mrs_01_0963__en-us_topic_0000001173470732_a0bc1f49f6c3d4c508b45fe2ad200caf7">Run the following statement to create a temporary function:</p>
<p class="litext" id="mrs_01_0963__en-us_topic_0000001173470732_a3b75404b2f69429eaf35af310244f8fb"><strong id="mrs_01_0963__en-us_topic_0000001173470732_b16421141246">CREATE TEMPORARY FUNCTION addDoubles AS 'com.xxx.bigdata.hive.example.udf.AddDoublesUDF' <strong id="mrs_01_0963__en-us_topic_0000001173470732_b150955381246">using jar 'hdfs://hacluster</strong>/user/hive_examples_jars/AddDoublesUDF.jar<strong id="mrs_01_0963__en-us_topic_0000001173470732_a78d8f53a684449db8c547440f4cb1de0">'</strong>;</strong></p>
<ul class="subitemlist" id="mrs_01_0963__en-us_topic_0000001173470732_uced363da54414b978d5f19ee3fed696c"><li id="mrs_01_0963__en-us_topic_0000001173470732_l6baeebb205514efb854dcb5e2eb57b08"><em id="mrs_01_0963__en-us_topic_0000001173470732_en-us_topic_0035209743_i38788755">addDoubles</em> is the function alias used in SELECT queries.</li><li id="mrs_01_0963__en-us_topic_0000001173470732_l29607837122a4f50a64a23fd3be5c673"><strong id="mrs_01_0963__en-us_topic_0000001173470732_b8423527061152">TEMPORARY</strong> indicates that the function is available only in the current session with HiveServer.</li></ul>
</p></li><li id="mrs_01_0963__en-us_topic_0000001173470732_l537ef8fb81144045b9d8cd7d3376b05a"><span>Run the following SQL statement to use the function on the HiveServer:</span><p><p id="mrs_01_0963__en-us_topic_0000001173470732_aa95def24601f4796b10086b41eba78de"><strong id="mrs_01_0963__en-us_topic_0000001173470732_afc48bc1771744895914c77f7793c57d6">SELECT addDoubles(1,2,3);</strong></p>
<div class="note" id="mrs_01_0963__en-us_topic_0000001173470732_n824507b89e8347a692579d82510873c0"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_0963__en-us_topic_0000001173470732_aa4dbfdf5766c45b6aa8e82397432734c">If [Error 10011] is reported after you log in to the client again, run the <strong id="mrs_01_0963__en-us_topic_0000001173470732_af531a0e1e31545159ffd6333d77df7dd">reload function;</strong> command before using the function.</p>
</div></div>
</p></li><li id="mrs_01_0963__en-us_topic_0000001173470732_lcf3b374c6266450eae1b0b94109c0e0d"><span>Run the following SQL statement to delete the function from the HiveServer:</span><p><p id="mrs_01_0963__en-us_topic_0000001173470732_ae46ccf57ec234be29e25ad5ad56c16d1"><strong id="mrs_01_0963__en-us_topic_0000001173470732_a9b0eda085945411f9961c8a68d986717">DROP FUNCTION addDoubles;</strong></p>
</p></li></ol>
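<p>The create, use, and delete statements from the steps above can also be issued from a program. The following is a rough sketch that assumes a JDBC connection to HiveServer has already been opened elsewhere (the driver setup and connection URL are not shown and depend on your cluster). The class <strong>AddDoublesLifecycle</strong> and its methods are hypothetical helpers, not part of Hive or MRS. The alias, class name, and JAR path are the example values used above. In security mode, the connected user still needs the Hive management permission, and <strong>set role admin</strong> must be run on the same connection first.</p>
<pre>
```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper for illustration only.
class AddDoublesLifecycle {
    // Example values taken from the steps above.
    static final String ALIAS = "addDoubles";
    static final String CLASS_NAME = "com.xxx.bigdata.hive.example.udf.AddDoublesUDF";
    static final String JAR_URI = "hdfs://hacluster/user/hive_examples_jars/AddDoublesUDF.jar";

    // Builds the CREATE statement; temporary=true adds the TEMPORARY keyword.
    static String createSql(boolean temporary) {
        return "CREATE " + (temporary ? "TEMPORARY " : "") + "FUNCTION " + ALIAS
                + " AS '" + CLASS_NAME + "' USING JAR '" + JAR_URI + "'";
    }

    // Builds the matching DROP statement (Hive also has DROP TEMPORARY FUNCTION).
    static String dropSql(boolean temporary) {
        return "DROP " + (temporary ? "TEMPORARY " : "") + "FUNCTION " + ALIAS;
    }

    // Creates the function, calls it once, then drops it.
    // The caller supplies an open connection to HiveServer.
    static double createUseDrop(Connection conn, boolean temporary) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(createSql(temporary));
            double sum;
            try (ResultSet rs = stmt.executeQuery("SELECT " + ALIAS + "(1,2,3)")) {
                if (!rs.next()) {
                    throw new SQLException("The query returned no row");
                }
                sum = rs.getDouble(1);
            }
            stmt.execute(dropSql(temporary));
            return sum;
        }
    }
}
```
</pre>
<p>Keeping the statement text in small builder methods makes it easy to check the SQL without a cluster. Only <strong>createUseDrop</strong> needs a live connection.</p>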
</div>
<div class="section" id="mrs_01_0963__en-us_topic_0000001173470732_s19faa45c00ac4afd8db3ed07e533acf8"><h4 class="sectiontitle">Extended Applications</h4><p id="mrs_01_0963__en-us_topic_0000001173470732_a934629ff8cf04f689148968a3eba9a58">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_0581.html">Using Hive</a></div>
</div>
</div>