| UDF Classification | Description |
| --- | --- |
| UDF (User Defined Scalar Function) | A custom scalar function, commonly referred to simply as a UDF. Input and output are one-to-one: it reads one row of data and returns a single output value. |
| UDTF (User Defined Table-valued Function) | A custom table-valued function, used in scenarios where a single function call outputs multiple rows of data. It is also the only type of custom function that can return multiple fields. |
| UDAF (User Defined Aggregation Function) | A custom aggregation function where input and output are many-to-one. It aggregates multiple input records into a single output value and can be used together with the GROUP BY clause in SQL. |
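The three input/output shapes in the table can be illustrated without Hive at all. The following is a minimal plain-Java sketch, not Hive API code: the class `UdfShapes` and its methods `scalar`, `tableValued`, and `aggregate` are hypothetical names chosen for illustration, and a real implementation would extend Hive's own base classes instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; none of these names come from Hive.
final class UdfShapes {

    // UDF shape: one input row in, exactly one value out.
    static String scalar(String s) {
        return s == null ? null : s.toUpperCase(Locale.ROOT);
    }

    // UDTF shape: one input row in, zero or more output rows out.
    static List<String> tableValued(String csv) {
        List<String> rows = new ArrayList<>();
        if (csv == null || csv.isEmpty()) {
            return rows; // no output rows for empty input
        }
        rows.addAll(Arrays.asList(csv.split(",")));
        return rows;
    }

    // UDAF shape: many input rows in, one aggregated value out.
    static long aggregate(List<Integer> values) {
        long sum = 0;
        for (Integer v : values) {
            if (v != null) { // skip nulls, as SQL SUM does
                sum += v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(scalar("hive"));                    // HIVE
        System.out.println(tableValued("a,b,c"));              // [a, b, c]
        System.out.println(aggregate(Arrays.asList(1, 2, 3))); // 6
    }
}
```

Only the cardinality differs between the three methods: `scalar` maps a row to a value, `tableValued` maps a row to a list of rows, and `aggregate` collapses a list of rows into one value.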
<groupId>org.example</groupId>
<artifactId>hive-udf</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>3.1.3</version>
    <exclusions>
        <exclusion>
            <groupId>org.pentaho</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>
package org.example;

import org.apache.hadoop.hive.ql.exec.UDF;

public class nvl extends UDF {
    public String evaluate(final String s) {
        if (s == null) {
            return null;
        }
        return s + ":HelloWorld";
    }
}
package org.example;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;

@Description(name = "nvl",
        value = "nvl(value, default_value) - Returns default value if value is null else returns value",
        extended = "Example: SELECT nvl(null, default_value);")
public class MyUDF extends GenericUDF {
    private GenericUDFUtils.ReturnObjectInspectorResolver returnOIResolver;
    private ObjectInspector[] argumentOIs;

    /**
     * Determine the return type based on the parameter types of the function.
     */
    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        argumentOIs = arguments;
        if (arguments.length != 2) {
            throw new UDFArgumentException("The operator 'NVL' accepts 2 arguments.");
        }
        returnOIResolver = new GenericUDFUtils.ReturnObjectInspectorResolver(true);
        // Both arguments must resolve to a common type.
        if (!(returnOIResolver.update(arguments[0]) && returnOIResolver.update(arguments[1]))) {
            throw new UDFArgumentTypeException(2,
                    "The 1st and 2nd args of function NVL should have the same type, "
                    + "but they are different: \"" + arguments[0].getTypeName()
                    + "\" and \"" + arguments[1].getTypeName() + "\"");
        }
        return returnOIResolver.get();
    }

    /**
     * Calculate the result. The final result's data type will be determined
     * based on the return type specified in the initialize method.
     */
    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object retVal = returnOIResolver.convertIfNecessary(arguments[0].get(), argumentOIs[0]);
        if (retVal == null) {
            // First argument is null, so fall back to the default value.
            retVal = returnOIResolver.convertIfNecessary(arguments[1].get(), argumentOIs[1]);
        }
        return retVal;
    }

    /**
     * Get the string to display in the explain output.
     */
    @Override
    public String getDisplayString(String[] children) {
        StringBuilder builder = new StringBuilder();
        builder.append("if ");
        builder.append(children[0]);
        builder.append(" is null ");
        builder.append("returns ");
        builder.append(children[1]);
        return builder.toString();
    }
}
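The behaviour that `MyUDF` is meant to provide can be checked without a Hive runtime. The sketch below is plain Java that only mirrors the logic of `evaluate` and `getDisplayString`; `NvlSketch` and its methods are hypothetical names introduced here, and the sketch does not exercise Hive's `ObjectInspector` type resolution.

```java
// Hypothetical stand-in for the NVL logic; it does not use any Hive classes.
final class NvlSketch {

    // Mirrors evaluate(): return the value unless it is null, else the default.
    static <T> T nvl(T value, T defaultValue) {
        return value != null ? value : defaultValue;
    }

    // Mirrors getDisplayString(): builds the text describing the call.
    static String displayString(String[] children) {
        return "if " + children[0] + " is null returns " + children[1];
    }

    public static void main(String[] args) {
        String kept = nvl("tur", "def");
        String replaced = nvl(null, "def");
        System.out.println(kept);     // tur
        System.out.println(replaced); // def
        System.out.println(displayString(new String[] {"a", "b"})); // if a is null returns b
    }
}
```

Because the real UDF resolves a common return type in `initialize`, the generic parameter `T` is only a loose analogy for the requirement that both arguments share a type.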
mvn clean package -DskipTests
scp ./target/hive-udf-1.0-SNAPSHOT.jar root@${master_public_ip}:/usr/local/service/hive
su hadoop
hadoop fs -put ./hive-udf-1.0-SNAPSHOT.jar /
hadoop fs -ls /
Found 5 items
drwxr-xr-x - hadoop supergroup 0 2023-08-22 09:20 /data
drwxrwx--- - hadoop supergroup 0 2023-08-22 09:20 /emr
-rw-r--r-- 2 hadoop supergroup 3235 2023-08-22 15:39 /hive-udf-1.0-SNAPSHOT.jar
drwx-wx-wx - hadoop supergroup 0 2023-08-22 09:20 /tmp
drwxr-xr-x - hadoop supergroup 0 2023-08-22 09:20 /user
hive
hive> create function nvl as "org.example.MyUDF" using jar "hdfs:///hive-udf-1.0-SNAPSHOT.jar";
Added [/data/emr/hive/tmp/1b0f12a6-3406-4700-8227-37dec721297b_resources/hive-udf-1.0-SNAPSHOT.jar] to class path
Added resources: [hdfs:///hive-udf-1.0-SNAPSHOT.jar]
OK
Time taken: 1.549 seconds
hive> select nvl("tur", "def");
OK
tur
Time taken: 0.344 seconds, Fetched: 1 row(s)
hive> select nvl(null, "def");
OK
def
Time taken: 0.471 seconds, Fetched: 1 row(s)