UDF Overview

IoTDB provides multiple built-in functions and user-defined functions (UDFs) to meet users' computing requirements.

UDF Types

Table 1 lists the UDF types supported by IoTDB.

Table 1 UDF types

Type | Description

User-defined timeseries generating function (UDTF) | This type of function can take multiple time series as input and generate one time series, which can contain any number of data points.

UDTF

To write a UDTF, implement the org.apache.iotdb.db.query.udf.api.UDTF interface and override at least the beforeStart method and one of the transform methods.

Table 2 describes all interfaces that can be implemented by users.

Table 2 Interface description

Interface Definition | Description | Mandatory

void validate(UDFParameterValidator validator) throws Exception | Validates the UDFParameters. It is executed before beforeStart is called. | No

void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception | Initialization method that runs user-defined initialization logic before the UDTF processes any input data. Each time a user executes a UDTF query, the framework constructs a new UDF instance and calls this method. It is called only once in the lifecycle of each UDF instance. | Yes

void transform(Row row, PointCollector collector) throws Exception | Called by the framework when RowByRowAccessStrategy is selected in beforeStart to consume raw data. The input data is passed in as a Row, and the result is output through the PointCollector. Call the data collection methods provided by collector in this method to determine the output data. | Use either this method or transform(RowWindow rowWindow, PointCollector collector).

void transform(RowWindow rowWindow, PointCollector collector) throws Exception | Called by the framework when SlidingSizeWindowAccessStrategy or SlidingTimeWindowAccessStrategy is selected in beforeStart to consume raw data. The input data is passed in as a RowWindow, and the result is output through the PointCollector. Call the data collection methods provided by collector in this method to determine the output data. | Use either this method or transform(Row row, PointCollector collector).

void terminate(PointCollector collector) throws Exception | Called by the framework after all transform calls have been executed and before beforeDestroy is called. In a single UDF query, this method is called only once. Call the data collection methods provided by collector in this method to determine the output data. | No

void beforeDestroy() | Called by the framework after the last input data is processed. It is called only once in the lifecycle of each UDF instance. | No

Calling sequence of each method:

  1. void validate(UDFParameterValidator validator) throws Exception
  2. void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception
  3. void transform(Row row, PointCollector collector) throws Exception or void transform(RowWindow rowWindow, PointCollector collector) throws Exception
  4. void terminate(PointCollector collector) throws Exception
  5. void beforeDestroy()
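
The calling sequence above can be illustrated with a small self-contained sketch. The types below (LifecycleDemo, SketchUdtf, SummingUdtf, runQuery) are hypothetical stand-ins and are not IoTDB classes. Method signatures are simplified (no parameter or configuration objects, rows reduced to long values), and the driver only mimics the order in which the framework is described as invoking each method.

```java
import java.util.ArrayList;
import java.util.List;

public class LifecycleDemo {

    /** Hypothetical, simplified stand-in for a UDTF. NOT the IoTDB interface. */
    public interface SketchUdtf {
        default void validate() {}                        // optional
        void beforeStart();                               // mandatory
        void transform(long row, List<Long> collector);   // one transform variant
        default void terminate(List<Long> collector) {}   // optional
        default void beforeDestroy() {}                   // optional
    }

    /** Records every call and emits the sum of all rows in terminate. */
    public static class SummingUdtf implements SketchUdtf {
        private final List<String> calls;
        private long sum = 0;

        public SummingUdtf(List<String> calls) {
            this.calls = calls;
        }

        @Override public void validate() { calls.add("validate"); }
        @Override public void beforeStart() { calls.add("beforeStart"); }
        @Override public void transform(long row, List<Long> collector) {
            calls.add("transform");
            sum += row; // per-instance state, one instance per query
        }
        @Override public void terminate(List<Long> collector) {
            calls.add("terminate");
            collector.add(sum);
        }
        @Override public void beforeDestroy() { calls.add("beforeDestroy"); }
    }

    /** Hypothetical driver that mimics the documented calling sequence. */
    public static List<Long> runQuery(SketchUdtf udtf, long[] rows) {
        List<Long> collector = new ArrayList<>();
        udtf.validate();
        udtf.beforeStart();
        for (long row : rows) {
            udtf.transform(row, collector);
        }
        udtf.terminate(collector);
        udtf.beforeDestroy();
        return collector;
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        List<Long> out = runQuery(new SummingUdtf(calls), new long[] {1, 2, 3});
        System.out.println(calls);
        System.out.println(out); // [6]
    }
}
```

In this sketch, transform runs once per input row, while every other method runs exactly once per instance.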

Each time the framework executes a UDTF query, it constructs a new UDF instance and destroys that instance when the query ends. The internal data of instances in different UDTF queries (even within the same SQL statement) is therefore isolated, so you can keep state data in a UDTF without having to handle concurrent access from other queries.

Interface usage:

UDFParameters

UDFParameters is used to parse the UDF parameters in SQL statements (the part in parentheses following the UDF name). The parameters consist of two parts. The first part is the paths and data types of the time series to be processed by the UDF. The second part is a set of key-value pair attributes used for customization.

Example:

SELECT UDF(s1, s2, 'key1'='iotdb', 'key2'='123.45') FROM root.sg.d;

Usage:

void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception {
  // parameters
  for (PartialPath path : parameters.getPaths()) {
    TSDataType dataType = parameters.getDataType(path);
    // do something
  }
  String stringValue = parameters.getString("key1"); // iotdb
  Float floatValue = parameters.getFloat("key2"); // 123.45
  Double doubleValue = parameters.getDouble("key3"); // null
  int intValue = parameters.getIntOrDefault("key4", 678); // 678
  // do something

  // configurations
  // ...
}
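
The behavior suggested by the comments in the usage example (a typed value for a present key, null for a missing key, and a fallback for the OrDefault variant) can be mimicked with a small self-contained sketch. SketchParameters is a hypothetical stand-in backed by a plain map. It is not the IoTDB UDFParameters class, and the real implementation may differ.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the attribute part of UDFParameters. NOT an IoTDB class. */
public class SketchParameters {
    private final Map<String, String> attributes;

    public SketchParameters(Map<String, String> attributes) {
        this.attributes = new HashMap<>(attributes);
    }

    /** Returns the raw string value, or null if the key is absent. */
    public String getString(String key) {
        return attributes.get(key);
    }

    /** Parses the value as a Float, or returns null if the key is absent. */
    public Float getFloat(String key) {
        String value = attributes.get(key);
        return value == null ? null : Float.valueOf(value);
    }

    /** Parses the value as a Double, or returns null if the key is absent. */
    public Double getDouble(String key) {
        String value = attributes.get(key);
        return value == null ? null : Double.valueOf(value);
    }

    /** Parses the value as an int, or returns defaultValue if the key is absent. */
    public int getIntOrDefault(String key, int defaultValue) {
        String value = attributes.get(key);
        return value == null ? defaultValue : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        // Attributes from: SELECT UDF(s1, s2, 'key1'='iotdb', 'key2'='123.45') ...
        Map<String, String> attrs = new HashMap<>();
        attrs.put("key1", "iotdb");
        attrs.put("key2", "123.45");
        SketchParameters parameters = new SketchParameters(attrs);

        System.out.println(parameters.getString("key1"));            // iotdb
        System.out.println(parameters.getFloat("key2"));             // 123.45
        System.out.println(parameters.getDouble("key3"));            // null
        System.out.println(parameters.getIntOrDefault("key4", 678)); // 678
    }
}
```

Because the non-default getters return boxed types in this sketch, a missing key surfaces as null rather than as an exception, so callers should check for null before unboxing.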

UDTFConfigurations

You can use UDTFConfigurations to specify the strategy used by the UDF to access raw data and the type of the output time series.

Usage:

void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception {
  // parameters
  // ...

  // configurations
  configurations
    .setAccessStrategy(new RowByRowAccessStrategy())
    .setOutputDataType(TSDataType.INT32);
}

The setAccessStrategy method is used to set the strategy used by the UDF to access raw data. The setOutputDataType method is used to set the data type of the output time series.
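
To show how the choice of access strategy changes what a transform method sees, the following self-contained sketch processes the same values once row by row and once in fixed-size windows. AccessStrategyDemo and its methods are hypothetical and do not use IoTDB classes. The windows here do not overlap and the last window may be shorter than the others, both of which are assumptions of this sketch rather than documented framework behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of row-by-row versus size-window consumption. */
public class AccessStrategyDemo {

    /** Row-by-row style: the processing step sees one value at a time. */
    public static List<Integer> rowByRow(int[] values) {
        List<Integer> collector = new ArrayList<>();
        for (int value : values) {
            collector.add(value * 2); // one output point per input row
        }
        return collector;
    }

    /** Size-window style: the processing step sees up to windowSize consecutive values. */
    public static List<Integer> bySizeWindow(int[] values, int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        List<Integer> collector = new ArrayList<>();
        for (int start = 0; start < values.length; start += windowSize) {
            int end = Math.min(start + windowSize, values.length);
            int sum = 0;
            for (int i = start; i < end; i++) {
                sum += values[i];
            }
            collector.add(sum); // one output point per window
        }
        return collector;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4, 5};
        System.out.println(rowByRow(values));        // [2, 4, 6, 8, 10]
        System.out.println(bySizeWindow(values, 2)); // [3, 7, 5]
    }
}
```

Both methods emit int values, which corresponds to declaring an INT32 output type as in the configuration example.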