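UDF type | Description |
UDF (User Defined Scalar Function) | A user-defined scalar function, usually just called a UDF. Input and output are one-to-one: it reads one row and writes one output value. |
UDTF (User Defined Table-valued Function) | A user-defined table-valued function, used when a single function call must output multiple rows. It is also the only kind of user-defined function that can return multiple fields. |
UDAF (User Defined Aggregation Function) | A user-defined aggregate function. Input and output are many-to-one: it aggregates multiple input records into one output value, and can be used together with the SQL GROUP BY clause. |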
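The three input/output shapes in the table above can be sketched in plain Java, without any Hive classes. This is only an illustration of the row cardinalities; the class `UdfShapes` and its method names are hypothetical and are not part of the Hive API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical, Hive-free illustration of the three UDF shapes.
class UdfShapes {

    // UDF-like: one input row in, exactly one output value out.
    static String scalar(String s) {
        return s == null ? null : s.toUpperCase(Locale.ROOT);
    }

    // UDTF-like: one input row in, zero or more output rows out,
    // each row carrying two fields (position, element).
    static List<String[]> tableValued(String csv) {
        List<String[]> rows = new ArrayList<>();
        if (csv == null || csv.isEmpty()) {
            return rows;
        }
        String[] parts = csv.split(",");
        for (int i = 0; i < parts.length; i++) {
            rows.add(new String[] {String.valueOf(i), parts[i]});
        }
        return rows;
    }

    // UDAF-like: many input rows in, one aggregated value out (nulls skipped).
    static long aggregate(List<Integer> values) {
        long sum = 0;
        for (Integer v : values) {
            if (v != null) {
                sum += v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(scalar("tur"));
        for (String[] row : tableValued("a,b,c")) {
            System.out.println(row[0] + "\t" + row[1]);
        }
        System.out.println(aggregate(Arrays.asList(1, 2, 3)));
    }
}
```

In Hive itself these shapes correspond to different base classes and lifecycles; the sketch only mirrors how many rows go in and come out.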
<groupId>org.example</groupId>
<artifactId>hive-udf</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>3.1.3</version>
    <exclusions>
        <exclusion>
            <groupId>org.pentaho</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>
package org.example;

import org.apache.hadoop.hive.ql.exec.UDF;

public class nvl extends UDF {
    public String evaluate(final String s) {
        if (s == null) {
            return null;
        }
        return s + ":HelloWorld";
    }
}
package org.example;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;

@Description(
    name = "nvl",
    value = "nvl(value, default_value) - Returns default value if value is null else returns value",
    extended = "Example: SELECT nvl(null, default_value);")
public class MyUDF extends GenericUDF {
    private GenericUDFUtils.ReturnObjectInspectorResolver returnOIResolver;
    private ObjectInspector[] argumentOIs;

    /**
     * Determines the return type from the types of the input arguments.
     */
    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        argumentOIs = arguments;
        if (arguments.length != 2) {
            throw new UDFArgumentException("The operator 'NVL' accepts 2 arguments.");
        }
        returnOIResolver = new GenericUDFUtils.ReturnObjectInspectorResolver(true);
        if (!(returnOIResolver.update(arguments[0]) && returnOIResolver.update(arguments[1]))) {
            throw new UDFArgumentTypeException(2,
                "The 1st and 2nd args of function NVL should have the same type, "
                + "but they are different: \"" + arguments[0].getTypeName()
                + "\" and \"" + arguments[1].getTypeName() + "\"");
        }
        return returnOIResolver.get();
    }

    /**
     * Computes the result. Its data type is the one determined by the
     * ObjectInspector returned from initialize.
     */
    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object retVal = returnOIResolver.convertIfNecessary(arguments[0].get(), argumentOIs[0]);
        if (retVal == null) {
            retVal = returnOIResolver.convertIfNecessary(arguments[1].get(), argumentOIs[1]);
        }
        return retVal;
    }

    /**
     * Returns the string shown for this function in EXPLAIN output.
     */
    @Override
    public String getDisplayString(String[] children) {
        StringBuilder builder = new StringBuilder();
        builder.append("if ");
        builder.append(children[0]);
        builder.append(" is null ");
        builder.append("returns ");
        builder.append(children[1]);
        return builder.toString();
    }
}
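The evaluation logic of the GenericUDF above reads the second argument only when the first one is null. The following is a minimal, Hive-free sketch of that pattern. It assumes `java.util.function.Supplier` as a stand-in for Hive's `DeferredObject`; the class `NvlSketch` is hypothetical and leaves out type resolution through ObjectInspectors entirely.

```java
import java.util.function.Supplier;

// Hypothetical, Hive-free model of the evaluate() logic above.
// Supplier stands in for DeferredObject: the value is produced only on get().
class NvlSketch {

    // Returns the first value if it is non-null; otherwise evaluates
    // and returns the default. The default is not touched when unused.
    static Object nvl(Supplier<Object> value, Supplier<Object> defaultValue) {
        Object result = value.get();
        if (result == null) {
            result = defaultValue.get();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(nvl(() -> "tur", () -> "def"));
        System.out.println(nvl(() -> null, () -> "def"));
    }
}
```

The two calls in `main` mirror the `select nvl("tur", "def")` and `select nvl(null, "def")` queries used later to verify the registered function.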
mvn clean package -DskipTests
scp ./target/hive-udf-1.0-SNAPSHOT.jar root@${master_public_ip}:/usr/local/service/hive
su hadoop
hadoop fs -put ./hive-udf-1.0-SNAPSHOT.jar /
hadoop fs -ls /
Found 5 items
drwxr-xr-x - hadoop supergroup 0 2023-08-22 09:20 /data
drwxrwx--- - hadoop supergroup 0 2023-08-22 09:20 /emr
-rw-r--r-- 2 hadoop supergroup 3235 2023-08-22 15:39 /hive-udf-1.0-SNAPSHOT.jar
drwx-wx-wx - hadoop supergroup 0 2023-08-22 09:20 /tmp
drwxr-xr-x - hadoop supergroup 0 2023-08-22 09:20 /user
hive
hive> create function nvl as "org.example.MyUDF" using jar "hdfs:///hive-udf-1.0-SNAPSHOT.jar";
Added [/data/emr/hive/tmp/1b0f12a6-3406-4700-8227-37dec721297b_resources/hive-udf-1.0-SNAPSHOT.jar] to class path
Added resources: [hdfs:///hive-udf-1.0-SNAPSHOT.jar]
OK
Time taken: 1.549 seconds
hive> select nvl("tur", "def");
OK
tur
Time taken: 0.344 seconds, Fetched: 1 row(s)
hive> select nvl(null, "def");
OK
def
Time taken: 0.471 seconds, Fetched: 1 row(s)