The CDLService web UI provides a visual interface for quickly creating CDL jobs and importing real-time data into the data lake.

Prerequisites: a user with the CDL management permission has been created for the cluster with Kerberos authentication enabled.
Job parameters

Parameter | Description | Example Value
---|---|---
Name | Job name | job_pgsqltokafka
Desc | Job description | xxx
Double-click the two elements to connect them, and set the related parameters as required. To delete an element, select it and click Delete in the lower right corner of the page.
MySQL job parameters

Parameter | Description | Example Value
---|---|---
Link | Created MySQL link | mysqllink
Tasks Max | Maximum number of tasks that can be created by a connector. For a connector of the database type, this parameter must be set to 1. | 1
Mode | Types of CDC events to be captured by the job. The options are insert, update, and delete. | insert, update, and delete
DB Name | MySQL database name | cdl-test
Schema Auto Create | Whether to create table schemas after the job is started | No
Connect With Hudi | Whether to connect to Hudi | Yes
DBZ Snapshot Locking Mode | Lock mode used when a task starts to execute a snapshot. The options include minimal, extended, and none. | none
WhiteList | Whitelisted tables to be captured. Separate multiple tables with commas (,). Wildcards are supported. (Optional) This parameter is displayed only after you click the expand icon. | testtable
BlackList | Blacklisted tables that are not captured. Separate multiple tables with commas (,). Wildcards are supported. (Optional) This parameter is displayed only after you click the expand icon. | -
Multi Partition | Whether to enable multi-partition mode for topics. If enabled, you must also set Topic Table Mapping and specify the number of topic partitions; the data of a single table is then scattered across multiple partitions. (Optional) This parameter is displayed only after you click the expand icon. NOTE: The data receiving sequence cannot be ensured, so exercise caution when setting this parameter. | No
Topic Table Mapping | Mapping between topics and tables. If configured, table data can be sent to the specified topic. If multi-partitioning is enabled, you must also set the number of partitions, which must be greater than 1. This parameter is displayed only after you click the expand icon. | testtable -> testtable_topic
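MySQL CDC capture relies on the binary log, so before starting the job it can be worth confirming that binlog is enabled in row format on the source instance. A minimal pre-check sketch, assuming a hypothetical host and account (mysql.example.com, cdl_user):

```python
# A minimal sketch, assuming a hypothetical MySQL host and account: binlog-based
# CDC capture requires row-format binary logging on the source instance.
import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(
    host="mysql.example.com",  # hypothetical host
    user="cdl_user",           # hypothetical user
    password="***",
    database="cdl-test",       # DB Name from the table above
)
cur = conn.cursor()
cur.execute("SHOW VARIABLES LIKE 'log_bin'")
print(cur.fetchone())          # expect ('log_bin', 'ON')
cur.execute("SELECT @@binlog_format")
print(cur.fetchone())          # expect ('ROW',)
conn.close()
```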
PgSQL job parameters

Parameter | Description | Example Value
---|---|---
Link | Created PgSQL link | pgsqllink
Tasks Max | Maximum number of tasks that can be created by a connector. For a connector of the database type, this parameter must be set to 1. | 1
Mode | Types of CDC events to be captured by the job. The options are insert, update, and delete. | insert, update, and delete
dbName Alias | Database name | test
Schema | Schema of the database to be connected to | public
Slot Name | Name of the PostgreSQL logical replication slot. The value can contain lowercase letters, digits, and underscores (_), and must be unique across jobs. | test_slot_1
Enable FailOver Slot | Whether to enable the failover slot function. When enabled, the information about the logical replication slot specified as the failover slot is synchronized from the active instance to the standby instance, so logical subscription can continue even after an active/standby switchover, implementing failover of the logical replication slot. | No
Slot Drop | Whether to delete the slot when a task is stopped | No
Connect With Hudi | Whether to connect to Hudi | Yes
Use Exist Publication | Whether to use an existing publication | Yes
Publication Name | Name of an existing publication. This parameter is available when Use Exist Publication is set to Yes. | test
Start Time | Start time for synchronizing tables | 2022/03/16 11:33:37
WhiteList | Whitelisted tables to be captured. Separate multiple tables with commas (,). Wildcards are supported. (Optional) This parameter is displayed only after you click the expand icon. | testtable
BlackList | Blacklisted tables that are not captured. Separate multiple tables with commas (,). Wildcards are supported. (Optional) This parameter is displayed only after you click the expand icon. | -
Start Position | Start LSN of the data captured by a task | -
Start Txid | Start TXID of the data captured by a task | -
Multi Partition | Whether to enable multi-partition mode for topics. If enabled, you must also set Topic Table Mapping and specify the number of topic partitions; the data of a single table is then scattered across multiple partitions. (Optional) This parameter is displayed only after you click the expand icon. NOTE: The data receiving sequence cannot be ensured, so exercise caution when setting this parameter. | No
Topic Table Mapping | Mapping between topics and tables. If configured, table data can be sent to the specified topic. If multi-partitioning is enabled, you must also set the number of partitions, which must be greater than 1. This parameter is displayed only after you click the expand icon. | testtable -> testtable_topic
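Before starting the job, the replication slot and publication it depends on can be inspected directly on the PostgreSQL side. A minimal sketch, assuming a hypothetical host and account:

```python
# A minimal sketch, assuming a hypothetical PostgreSQL host and account: inspect
# the logical replication slot and the publication the PgSQL job relies on.
import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect(host="pg.example.com", dbname="test",
                        user="cdl_user", password="***")  # hypothetical credentials
with conn.cursor() as cur:
    # Slot Name from the table above; it must be unique across jobs.
    cur.execute("SELECT slot_name, plugin, active FROM pg_replication_slots"
                " WHERE slot_name = %s", ("test_slot_1",))
    print(cur.fetchall())
    # When Use Exist Publication is Yes, the named publication must already exist.
    cur.execute("SELECT pubname FROM pg_publication WHERE pubname = %s", ("test",))
    print(cur.fetchall())
conn.close()
```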
Hudi job parameters (Hudi as the source)

Parameter | Description | Example Value
---|---|---
Link | Link used by the Hudi app | hudilink
Interval | Interval for synchronizing the Hudi table, in seconds | 10
Start Time | Start time for synchronizing tables | 2022/03/16 11:40:52
Max Commit Number | Maximum number of commits that can be pulled from an incremental view at a time | 10
Hudi Custom Config | Custom configuration related to Hudi | -
Table Info | Detailed configuration of the tables to be synchronized. The Hudi and DWS tables must have the same table names and field types. | {"table1":[{"source.database":"base1","source.tablename":"table1"}],"table2":[{"source.database":"base2","source.tablename":"table2"}],"table3":[{"source.database":"base3","source.tablename":"table3"}]}
Execution Env | Environment variable required for running the Hudi app. If no ENV is available, manually create one. | defaultEnv
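The Table Info value above is easier to read when built programmatically; the sketch below reproduces exactly the example mapping and prints the compact single-line JSON the field expects.

```python
# The Table Info example above, rebuilt as a Python dict for readability.
# Each top-level key is a table to be synchronized, mapped to its source
# database and source table name.
import json

table_info = {
    "table1": [{"source.database": "base1", "source.tablename": "table1"}],
    "table2": [{"source.database": "base2", "source.tablename": "table2"}],
    "table3": [{"source.database": "base3", "source.tablename": "table3"}],
}
# json.dumps with compact separators yields the single-line form shown above.
print(json.dumps(table_info, separators=(",", ":")))
```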
Kafka job parameters

Parameter | Description | Example Value
---|---|---
Link | Created Kafka link | kafkalink
thirdparty-kafka job parameters

Parameter | Description | Example Value
---|---|---
Link | Created thirdparty-kafka link | thirdparty-kafkalink
DB Name | Name of the database to be connected to | opengaussdb
Schema | Schema of the database to be connected to | opengaussschema
Datastore Type | Type of the upper-layer source. The options include ogg and opengauss. | opengauss
Avro Schema Topic | Schema topic used by OGG Kafka to store table schemas in JSON format. NOTE: This parameter is available only when Datastore Type is set to ogg. | ogg_topic
Source Topics | Source topics. The value can contain letters, digits, and the special characters hyphen (-) and underscore (_). Separate multiple topics with commas (,). | topic1
Tasks Max | Maximum number of tasks that can be created by a connector. For a connector of the database type, this parameter must be set to 1. | 10
Tolerance | Fault tolerance policy. The options are none and all. | all
Start Time | Start time for synchronizing tables | 2022/03/16 14:14:50
Multi Partition | Whether to enable multi-partition mode for topics. If enabled, you must also set Topic Table Mapping and specify the number of topic partitions; the data of a single table is then scattered across multiple partitions. | No
Topic Table Mapping | Mapping between topics and tables. If configured, table data can be sent to the specified topic. If multi-partitioning is enabled, you must also set the number of partitions, which must be greater than 1. | testtable -> testtable_topic
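To confirm that upstream records are actually arriving on a source topic before the CDL job consumes it, the topic can be peeked at with a plain consumer. A minimal sketch, assuming a hypothetical broker address and ignoring the SASL/Kerberos settings a secured cluster would require:

```python
# A minimal sketch, assuming a hypothetical broker: peek at a source topic to
# confirm that upstream records are arriving before wiring it into a CDL job.
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "topic1",                                    # Source Topics value above
    bootstrap_servers="kafka.example.com:9092",  # hypothetical broker
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,                    # stop after 5 s of silence
)
for record in consumer:
    print(record.partition, record.offset, record.value[:80])
consumer.close()
```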
Hudi job parameters (Hudi as the sink)

Parameter | Description | Example Value
---|---|---
Link | Created Hudi link | hudilink
Path | Path for storing data | /cdldata
Interval | Spark RDD execution interval, in seconds | 1
Max Rate Per Partition | Maximum rate at which data is read from each Kafka partition using the Kafka direct stream API, in records per second. 0 indicates that the rate is not limited. | 0
Parallelism | Parallelism for writing data to Hudi | 100
Target Hive Database | Target Hive database | default
Configuring Hudi Table Attributes | View used to configure the Hudi table attributes, for example, Visual View | Visual View
Global Configuration of Hudi Table Attributes | Global Hudi parameters | -
Configuring the Attributes of the Hudi Table | Configuration of the Hudi table attributes | -
Configuring the Attributes of the Hudi Table: Table Name | Hudi table name, which must be the same as the source table name | -
Configuring the Attributes of the Hudi Table: Table Type Opt Key | Hudi table type. The options are COPY_ON_WRITE and MERGE_ON_READ. | MERGE_ON_READ
Configuring the Attributes of the Hudi Table: Hudi TableName Mapping | Hudi table name. If this parameter is not set, the Hudi table name is the same as the source table name by default. | -
Configuring the Attributes of the Hudi Table: Hive TableName Mapping | Mapping between Hudi tables and Hive tables | -
Configuring the Attributes of the Hudi Table: Table Primarykey Mapping | Primary key mapping of the Hudi table | id
Configuring the Attributes of the Hudi Table: Table Hudi Partition Type | Mapping between the Hudi table and partition fields. If the Hudi table is partitioned, you need to configure the mapping between the table name and partition fields. The value can be time or customized. | time
Configuring the Attributes of the Hudi Table: Custom Config | Custom configuration | -
Execution Env | Environment variable required for running the Hudi App. If no ENV is available, create one by referring to Managing ENV. | defaultEnv
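Once the job runs, the Hudi table under Path can be read back with Spark to confirm that data is landing. A minimal sketch, assuming a Spark session with the Hudi bundle on its classpath and a hypothetical table subdirectory:

```python
# A minimal sketch, assuming the Hudi-Spark bundle is on the classpath and a
# hypothetical table subdirectory under Path (/cdldata): read back the Hudi
# table to confirm the sink job is writing data.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cdl-hudi-check").getOrCreate()
df = spark.read.format("hudi").load("/cdldata/testtable")  # hypothetical subpath
df.select("id").show(10)  # id is the primary key mapped above
spark.stop()
```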
Kafka job parameters

Parameter | Description | Example Value
---|---|---
Link | Created Kafka link | kafkalink
DWS job parameters

Parameter | Description | Example Value
---|---|---
Link | Link used by the connector | dwslink
Query Timeout | Timeout interval for connecting to DWS, in milliseconds | 180000
Batch Size | Amount of data written to DWS in a batch | 50
Sink Task Number | Maximum number of concurrent tasks when a table is written to DWS | -
DWS Custom Config | Custom configuration | -
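DWS is PostgreSQL-compatible, so a plain row count on the target table is a quick way to confirm the sink is writing. A minimal sketch with a hypothetical endpoint and target table:

```python
# A minimal sketch, assuming a hypothetical DWS endpoint and target table:
# DWS speaks the PostgreSQL protocol, so a simple row count confirms that
# the sink is writing data.
import psycopg2

conn = psycopg2.connect(host="dws.example.com", port=8000,  # hypothetical endpoint
                        dbname="testdb", user="dws_user", password="***")
with conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM public.testtable")    # hypothetical table
    print(cur.fetchone())
conn.close()
```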
ClickHouse job parameters

Parameter | Description | Example Value
---|---|---
Link | Link used by the connector | clickhouselink
Query Timeout | Timeout interval for connecting to ClickHouse, in milliseconds | 60000
Batch Size | Amount of data written to ClickHouse in a batch. NOTE: Set this parameter to a large value; the recommended range is 10000 to 100000. | 100000
Check whether the data transmission takes effect: for example, insert data into the table in the MySQL database and view the content of the file imported to Hudi, as in the sketch below.
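A minimal end-to-end sketch of that check, assuming hypothetical hosts, credentials, and a testtable(id, name) schema:

```python
# A minimal end-to-end sketch, assuming hypothetical hosts, credentials, and a
# testtable(id, name) schema: insert a row at the MySQL source, wait for a
# capture/write cycle, then look for the row in the Hudi table.
import time
import mysql.connector
from pyspark.sql import SparkSession

src = mysql.connector.connect(host="mysql.example.com", user="cdl_user",
                              password="***", database="cdl-test")
cur = src.cursor()
cur.execute("INSERT INTO testtable (id, name) VALUES (%s, %s)", (1001, "cdl-check"))
src.commit()
src.close()

time.sleep(30)  # allow at least one synchronization interval to elapse

spark = SparkSession.builder.appName("cdl-e2e-check").getOrCreate()
spark.read.format("hudi").load("/cdldata/testtable").filter("id = 1001").show()
spark.stop()
```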