When the built-in functions of Hive cannot meet your requirements, you can write user-defined functions (UDFs) and use them in queries.
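According to implementation methods, UDFs are classified as common UDFs, which operate on a single row; user-defined aggregating functions (UDAFs), which aggregate multiple rows; and user-defined table-generating functions (UDTFs), which output multiple rows for one input row.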
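According to use methods, UDFs are classified as temporary functions, which are available only in the current session, and permanent functions, which remain available across sessions.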
You need to control how much memory the variables in a UDF hold and how many threads the UDF uses. Improper control may cause memory overflow or high CPU usage.
The following uses AddDoublesUDF, which adds two or more floating point numbers, as an example to describe how to write and use UDFs.
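The summing logic of AddDoublesUDF can be sketched as follows. This is a minimal illustration, not the shipped sample code. It assumes that NULL inputs are skipped. To keep the sketch compilable without Hive on the classpath, it also leaves out the package declaration, the public modifier, and the parent class org.apache.hadoop.hive.ql.exec.UDF that a deployable UDF would normally have.

```java
// Sketch only: a deployable Hive UDF would be declared public, placed in the
// package com.xxx.bigdata.hive.example.udf, and extend
// org.apache.hadoop.hive.ql.exec.UDF. Those parts are omitted here so that the
// class compiles on its own.
class AddDoublesUDF {

    // For classes that extend UDF, Hive looks up a method named evaluate.
    // The varargs parameter lets the function accept two or more values.
    public Double evaluate(Double... values) {
        double total = 0.0;
        if (values == null) {
            return total;
        }
        for (Double value : values) {
            // Assumed policy: skip NULL inputs instead of failing.
            if (value != null) {
                total += value;
            }
        }
        return total;
    }
}
```

Called directly in Java, new AddDoublesUDF().evaluate(1.0, 2.0, 3.0) returns 6.0.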
Both the user who creates the function and the user who uses the function must have the read permission on the UDF JAR file in HDFS.
The following are example statements:
hdfs dfs -put ./hive_examples_jars /user/hive_examples_jars
hdfs dfs -chmod 777 /user/hive_examples_jars
CREATE FUNCTION addDoubles AS 'com.xxx.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/hive_examples_jars/AddDoublesUDF.jar';
addDoubles is the function alias that is used in SELECT queries.
Run the following statement to create a temporary function:
CREATE TEMPORARY FUNCTION addDoubles AS 'com.xxx.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/hive_examples_jars/AddDoublesUDF.jar';
SELECT addDoubles(1,2,3);
If the [Error 10011] error is displayed when you log in to the client again, run the reload function; statement and then use the function.
DROP FUNCTION addDoubles;