DLI's table lifecycle management lets you specify the lifecycle of a table when you create it. DLI decides whether to reclaim a table based on the table's last modification time and its lifecycle. Setting a lifecycle makes it easier to manage a large number of tables: tables that have not been used for a long time are deleted automatically, which simplifies reclamation. Data restoration settings are also supported to prevent data loss caused by misoperations.
For a non-partitioned table, the system decides whether to reclaim the table by checking whether the lifecycle has elapsed since the table's last modification time.
For a partitioned table, the system decides whether to reclaim each partition based on the last modification time (LAST_ACCESS_TIME) of that partition. The table itself is not deleted, even after its last partition has been reclaimed.
For partitioned tables, the lifecycle can be set only at the table level, not for individual partitions.
Lifecycle reclamation starts at a specified time every day. Data is reclaimed only if, when the full partition scan runs, the time elapsed since the last modification time of the table data (LAST_ACCESS_TIME) exceeds the lifecycle.
Assume that the lifecycle of a partitioned table is one day and the data in a partition was last modified at 15:00 on May 20, 2023. If the table is scanned before 15:00 on May 21, 2023 (less than one day after the modification), the partition is not reclaimed. If a reclamation scan runs after 15:00 on May 21, 2023, the time since the partition's last data modification time (LAST_ACCESS_TIME) exceeds the lifecycle, so the partition is reclaimed.
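The timing rule in the example above can be sketched in Python. This is only an illustration of the comparison, not DLI's implementation: the function name `is_reclaimable` and the two scan times are hypothetical, and the strict "greater than" comparison is an assumption read from the word "exceeds".

```python
from datetime import datetime, timedelta


def is_reclaimable(last_modified: datetime, lifecycle_days: int, scan_time: datetime) -> bool:
    """Return True if the data has outlived its lifecycle at scan time."""
    if lifecycle_days <= 0:
        raise ValueError("lifecycle_days must be a positive integer")
    # Assumption: reclaim only when the elapsed time strictly exceeds the lifecycle.
    return scan_time - last_modified > timedelta(days=lifecycle_days)


# One-day lifecycle, partition last modified at 15:00 on May 20, 2023.
last_modified = datetime(2023, 5, 20, 15, 0)

# Hypothetical scan times; the actual daily scan time is set by the service.
early_scan = datetime(2023, 5, 21, 10, 0)  # 19 hours after the modification
late_scan = datetime(2023, 5, 21, 16, 0)   # 25 hours after the modification

print(is_reclaimable(last_modified, 1, early_scan))  # False
print(is_reclaimable(last_modified, 1, late_scan))   # True
```

Because the scan runs once a day, a partition may remain for up to one extra day after its lifecycle expires before it is actually reclaimed.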
CREATE TABLE table_name(name string, id int) USING parquet TBLPROPERTIES( "dli.lifecycle.days"=1 );
CREATE TABLE table_name(name string, id int) STORED AS parquet TBLPROPERTIES( "dli.lifecycle.days"=1 );
CREATE TABLE table_name(name string, id int) USING parquet OPTIONS (path "obs://dli-test/table_name") TBLPROPERTIES( "dli.lifecycle.days"=1, "external.table.purge"='true', "dli.lifecycle.trash.dir"='obs://dli-test/Lifecycle-Trash' );
CREATE TABLE table_name(name string, id int) STORED AS parquet LOCATION 'obs://dli-test/table_name' TBLPROPERTIES( "dli.lifecycle.days"=1, "external.table.purge"='true', "dli.lifecycle.trash.dir"='obs://dli-test/Lifecycle-Trash' );
Parameter | Mandatory | Description |
---|---|---|
table_name | Yes | Name of the table whose lifecycle needs to be set |
dli.lifecycle.days | Yes | Lifecycle duration. The value must be a positive integer, in days. |
external.table.purge | No | This parameter is available only for OBS tables. Whether to clear data in the path when deleting a table or partition. The data is not cleared by default. When this parameter is set to true: |
dli.lifecycle.trash.dir | No | This parameter is available only for OBS tables. When external.table.purge is set to true, the backup directory will be deleted. By default, backup data is deleted seven days later. |
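The constraints in the parameter table can be checked before a statement is submitted. The following Python sketch builds a TBLPROPERTIES clause and rejects values the table rules out. The helper `lifecycle_tblproperties` and its arguments are hypothetical names introduced for illustration; they are not part of DLI or any SDK.

```python
def sql_quote(value: str) -> str:
    """Wrap a value in single quotes, as in the statements above."""
    return "'" + value + "'"


def lifecycle_tblproperties(days, purge=None, trash_dir=None, is_obs_table=False):
    """Build a TBLPROPERTIES clause that follows the parameter table."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(days, bool) or not isinstance(days, int) or days <= 0:
        raise ValueError("dli.lifecycle.days must be a positive integer")
    # The two optional parameters are documented as OBS-only.
    if (purge is not None or trash_dir is not None) and not is_obs_table:
        raise ValueError(
            "external.table.purge and dli.lifecycle.trash.dir apply only to OBS tables"
        )
    props = ['"dli.lifecycle.days"=' + str(days)]
    if purge is not None:
        props.append('"external.table.purge"=' + sql_quote(str(bool(purge)).lower()))
    if trash_dir is not None:
        props.append('"dli.lifecycle.trash.dir"=' + sql_quote(trash_dir))
    return "TBLPROPERTIES( " + ", ".join(props) + " )"


# DLI table: only the mandatory lifecycle property.
print(lifecycle_tblproperties(100))

# OBS table: lifecycle plus the optional purge and trash directory settings.
print(lifecycle_tblproperties(
    100,
    purge=True,
    trash_dir="obs://dli-test/Lifecycle-Trash",
    is_obs_table=True,
))
```

The returned string can be appended to a CREATE TABLE statement such as the ones shown in this section.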
CREATE TABLE test_datasource_lifecycle(id int) USING parquet TBLPROPERTIES( "dli.lifecycle.days"=100);
CREATE TABLE test_hive_lifecycle(id int) STORED AS parquet TBLPROPERTIES( "dli.lifecycle.days"=100);
CREATE TABLE test_datasource_lifecycle_obs(name string, id int) USING parquet OPTIONS (path "obs://dli-test/xxx") TBLPROPERTIES( "dli.lifecycle.days"=100, "external.table.purge"='true', "dli.lifecycle.trash.dir"='obs://dli-test/Lifecycle-Trash' );
CREATE TABLE test_hive_lifecycle_obs(name string, id int) STORED AS parquet LOCATION 'obs://dli-test/xxx' TBLPROPERTIES( "dli.lifecycle.days"=100, "external.table.purge"='true', "dli.lifecycle.trash.dir"='obs://dli-test/Lifecycle-Trash' );