DLI lets you use a Hive user-defined aggregation function (UDAF) to process multiple rows of data and return a single result. A Hive UDAF is usually used together with GROUP BY. Like the SUM() and AVG() functions commonly used in SQL, it is an aggregation function.
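As a rough illustration of what "aggregation together with GROUP BY" means, the following plain-Java sketch groups a few rows by a key and reduces each group to one value, much as SUM() and AVG() do in SQL. It does not use DLI or Hive; the class `GroupAggregationSketch`, the `Row` type, and the sample data are hypothetical names introduced only for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of DLI or Hive.
class GroupAggregationSketch {

    // One input row: a grouping key and a numeric value.
    static final class Row {
        final String city;
        final long amount;

        Row(String city, long amount) {
            this.city = city;
            this.amount = amount;
        }
    }

    // Analogous to: SELECT city, SUM(amount) ... GROUP BY city
    static Map<String, Long> sumByCity(List<Row> rows) {
        return rows.stream().collect(
                Collectors.groupingBy(r -> r.city, TreeMap::new,
                        Collectors.summingLong(r -> r.amount)));
    }

    // Analogous to: SELECT city, AVG(amount) ... GROUP BY city
    static Map<String, Double> avgByCity(List<Row> rows) {
        return rows.stream().collect(
                Collectors.groupingBy(r -> r.city, TreeMap::new,
                        Collectors.averagingLong(r -> r.amount)));
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("Paris", 10L), new Row("Paris", 20L), new Row("Rome", 5L));
        System.out.println(sumByCity(rows)); // {Paris=30, Rome=5}
        System.out.println(avgByCity(rows)); // {Paris=15.0, Rome=5.0}
    }
}
```

A UDAF plays the role of the downstream collector here: it receives many rows of a group and returns one value for that group.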
To grant required permissions, log in to the DLI console and choose Data Management > Package Management. On the displayed page, select your UDAF Jar package and click Manage Permissions in the Operation column. On the permission management page, click Grant Permission in the upper right corner and select the required permissions.
Before you start, set up the development environment.
| Item | Description |
|---|---|
| OS | Windows 7 or later |
| JDK | JDK 1.8 (Java downloads). |
| IntelliJ IDEA | IntelliJ IDEA is used for application development. The version of the tool must be 2019.1 or later. |
| Maven | Basic configuration of the development environment. For details about how to get started, see Downloading Apache Maven and Installing Apache Maven. Maven is used for project management throughout the lifecycle of software development. |
The following figure shows the process of developing a UDAF.
| No. | Phase | Software Portal | Description |
|---|---|---|---|
| 1 | Create a Maven project and configure the POM file. | IntelliJ IDEA | Compile the UDAF function code by referring to the Procedure description. |
| 2 | Edit the UDAF code. | IntelliJ IDEA | Compile the UDAF function code by referring to the Procedure description. |
| 3 | Debug, compile, and pack the code into a Jar package. | IntelliJ IDEA | Compile the UDAF function code by referring to the Procedure description. |
| 4 | Upload the Jar package to OBS. | OBS console | Upload the UDAF Jar file to an OBS path. |
| 5 | Create a DLI package. | DLI console | Select the UDAF Jar file that has been uploaded to OBS for management. |
| 6 | Create a UDAF on DLI. | DLI console | Create a UDAF on the SQL job management page of the DLI console. |
| 7 | Verify and use the UDAF. | DLI console | Use the UDAF in your DLI job. |
Add the following dependency to the POM file of the Maven project:

    <dependencies>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>1.2.1</version>
        </dependency>
    </dependencies>

Create a Java class file in the package path. In this example, the class is named AvgFilterUDAFDemo.
For details about how to implement the UDAF, see the following sample code:

    package com.dli.demo;

    import org.apache.hadoop.hive.ql.exec.UDAF;
    import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

    /***
     * @jdk jdk1.8.0
     * @version 1.0
     ***/
    public class AvgFilterUDAFDemo extends UDAF {

        /**
         * Defines the static inner class PartialResult, which holds the
         * intermediate aggregation state.
         */
        public static class PartialResult {
            public Long sum;
        }

        public static class VarianceEvaluator implements UDAFEvaluator {

            // Holds the partial aggregation result.
            private AvgFilterUDAFDemo.PartialResult partial;

            // Declares a VarianceEvaluator constructor that has no parameters.
            public VarianceEvaluator() {
                this.partial = new AvgFilterUDAFDemo.PartialResult();
                init();
            }

            /**
             * Initializes the UDAF, which is similar to a constructor.
             */
            @Override
            public void init() {
                // Sets the initial value of sum.
                this.partial.sum = 0L;
            }

            /**
             * Receives input parameters for internal iteration.
             * Note that, despite the class name, this sample combines values
             * with a bitwise OR.
             * @param x input value; null values are skipped
             */
            public void iterate(Long x) {
                if (x == null) {
                    return;
                }
                this.partial.sum = this.partial.sum | x;
            }

            /**
             * Returns the data obtained after the iterate traversal is complete.
             * terminatePartial is similar to Hadoop Combiner.
             * @return the partial result
             */
            public AvgFilterUDAFDemo.PartialResult terminatePartial() {
                return this.partial;
            }

            /**
             * Receives the return values of terminatePartial and merges the data.
             * @param pr partial result to merge; null values are skipped
             */
            public void merge(AvgFilterUDAFDemo.PartialResult pr) {
                if (pr == null) {
                    return;
                }
                this.partial.sum = this.partial.sum | pr.sum;
            }

            /**
             * Returns the aggregated result.
             * @return the final value, or 0 if no value was aggregated
             */
            public Long terminate() {
                if (this.partial.sum == null) {
                    return 0L;
                }
                return this.partial.sum;
            }
        }
    }

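To see how the evaluator methods fit together, the following plain-Java sketch mimics the order in which Hive typically invokes them: iterate for each input row, terminatePartial to hand over an intermediate result, merge to combine intermediate results, and terminate to produce the final value. It has no Hive dependency, so it can be run locally without the hive-exec Jar. The class `OrAggregatorSketch` and its `aggregate` method are hypothetical names introduced for illustration, and the split of the rows into two partitions is an assumption about how a job might distribute the data, not a description of what DLI actually does.

```java
// Hypothetical stand-in that mirrors the methods of VarianceEvaluator in the
// sample above, without extending any Hive class.
class OrAggregatorSketch {

    // Running aggregation state, combined with bitwise OR as in the sample.
    private Long sum;

    OrAggregatorSketch() {
        init();
    }

    void init() {
        sum = 0L;
    }

    void iterate(Long x) {
        if (x == null) {
            return; // null input rows are skipped
        }
        sum = sum | x;
    }

    Long terminatePartial() {
        return sum;
    }

    void merge(Long partial) {
        if (partial == null) {
            return;
        }
        sum = sum | partial;
    }

    Long terminate() {
        return sum;
    }

    // Simulates two evaluators that each see one partition of the rows,
    // followed by a third evaluator that merges their partial results.
    static Long aggregate(Long[] partitionA, Long[] partitionB) {
        OrAggregatorSketch mapA = new OrAggregatorSketch();
        for (Long v : partitionA) {
            mapA.iterate(v);
        }
        OrAggregatorSketch mapB = new OrAggregatorSketch();
        for (Long v : partitionB) {
            mapB.iterate(v);
        }
        OrAggregatorSketch reducer = new OrAggregatorSketch();
        reducer.merge(mapA.terminatePartial());
        reducer.merge(mapB.terminatePartial());
        return reducer.terminate();
    }

    public static void main(String[] args) {
        Long result = aggregate(new Long[] {1L, null, 4L}, new Long[] {2L});
        System.out.println(result); // 1 | 4 | 2 = 7
    }
}
```

Because the combining step is a bitwise OR, the result does not depend on how the rows are split between partitions, which is the property an aggregation needs for the partial results to be merged safely.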
After the compilation is successful, click package.
The region of the OBS bucket to which the Jar package is uploaded must be the same as the region of the DLI queue. Cross-region operations are not allowed.
If the reloading function of the UDAF is enabled, the create statement changes.

    CREATE FUNCTION AvgFilterUDAFDemo AS 'com.dli.demo.AvgFilterUDAFDemo' USING JAR 'obs://dli-test-obs01/MyUDAF-1.0-SNAPSHOT.jar';

Or

    CREATE OR REPLACE FUNCTION AvgFilterUDAFDemo AS 'com.dli.demo.AvgFilterUDAFDemo' USING JAR 'obs://dli-test-obs01/MyUDAF-1.0-SNAPSHOT.jar';

Use the UDAF created in step 6 in a query statement:

    SELECT AvgFilterUDAFDemo(real_stock_rate) AS show_rate FROM dw_ad_estimate_real_stock_rate LIMIT 1000;

If the UDAF is no longer used, run the following statement to delete it:

    DROP FUNCTION AvgFilterUDAFDemo;