CDL Usage Instructions

CDL is a simple, efficient real-time data integration service. It captures data change events from OLTP databases and pushes them to Kafka; the Sink Connector then consumes the data from the Kafka topics and writes it to applications in the big data ecosystem, so data reaches the data lake in real time.

The CDL service has two roles: CDLConnector, the instance that executes data capture jobs, and CDLService, the instance that creates and manages jobs.

You can create data synchronization and comparison tasks on the CDLService WebUI.

Data synchronization task

Data Types and Mapping Supported by CDL Synchronization Tasks

This section describes the data types supported by CDL synchronization tasks and the mapping between data types of the source database and Spark data types.

Table 2 Mapping between PostgreSQL and Spark data types

| PostgreSQL Data Type | Spark (Hudi) Data Type |
|----------------------|------------------------|
| int2                 | int                    |
| int4                 | int                    |
| int8                 | bigint                 |
| numeric(p, s)        | decimal[p,s]           |
| bool                 | boolean                |
| char                 | string                 |
| varchar              | string                 |
| text                 | string                 |
| timestamptz          | timestamp              |
| timestamp            | timestamp              |
| date                 | date                   |
| json, jsonb          | string                 |
| float4               | float                  |
| float8               | double                 |
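To make the mapping concrete, the sketch below restates Table 2 as a lookup function. This is purely illustrative and is not part of any CDL API; the function name `pg_to_spark` and the handling of the parameterized `numeric(p, s)` type are assumptions for the example.

```python
import re

# Illustrative restatement of Table 2 (not a CDL API):
# PostgreSQL type name -> Spark (Hudi) type name.
PG_TO_SPARK = {
    "int2": "int",
    "int4": "int",
    "int8": "bigint",
    "bool": "boolean",
    "char": "string",
    "varchar": "string",
    "text": "string",
    "timestamptz": "timestamp",
    "timestamp": "timestamp",
    "date": "date",
    "json": "string",
    "jsonb": "string",
    "float4": "float",
    "float8": "double",
}


def pg_to_spark(pg_type: str) -> str:
    """Map a PostgreSQL type name to its Spark (Hudi) type per Table 2.

    numeric(p, s) is parameterized, so it is matched separately and
    carries its precision and scale into decimal[p,s].
    """
    m = re.fullmatch(r"numeric\((\d+),\s*(\d+)\)", pg_type.strip())
    if m:
        return f"decimal[{m.group(1)},{m.group(2)}]"
    return PG_TO_SPARK[pg_type.strip()]


print(pg_to_spark("numeric(10, 2)"))  # decimal[10,2]
print(pg_to_spark("float8"))          # double
```

Note that a parameterized type keeps its precision and scale, while all character and JSON types collapse to a single Spark string type.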

Table 3 Mapping between MySQL and Spark data types

| MySQL Data Type | Spark (Hudi) Data Type |
|-----------------|------------------------|
| int             | int                    |
| integer         | int                    |
| bigint          | bigint                 |
| double          | double                 |
| decimal[p,s]    | decimal[p,s]           |
| varchar         | string                 |
| char            | string                 |
| text            | string                 |
| timestamp       | timestamp              |
| datetime        | timestamp              |
| date            | date                   |
| json            | string                 |
| float           | double                 |

Table 4 Mapping between Ogg and Spark data types

| Oracle Data Type         | Spark (Hudi) Data Type |
|--------------------------|------------------------|
| NUMBER(3), NUMBER(5)     | bigint                 |
| INTEGER                  | decimal                |
| NUMBER(20)               | decimal                |
| NUMBER                   | decimal                |
| BINARY_DOUBLE            | double                 |
| CHAR                     | string                 |
| VARCHAR                  | string                 |
| TIMESTAMP, DATETIME      | timestamp              |
| timestamp with time zone | timestamp              |
| DATE                     | timestamp              |

Table 5 Mapping between Spark (Hudi) and DWS data types

| Spark (Hudi) Data Type | DWS Data Type |
|------------------------|---------------|
| int                    | int           |
| long                   | bigint        |
| float                  | float         |
| double                 | double        |
| decimal[p,s]           | decimal[p,s]  |
| boolean                | boolean       |
| string                 | varchar       |
| date                   | date          |
| timestamp              | timestamp     |

Table 6 Mapping between Spark (Hudi) and ClickHouse data types

| Spark (Hudi) Data Type | ClickHouse Data Type |
|------------------------|----------------------|
| int                    | Int32                |
| long                   | Int64 (bigint)       |
| float                  | Float32 (float)      |
| double                 | Float64 (double)     |
| decimal[p,s]           | Decimal(P,S)         |
| boolean                | bool                 |
| string                 | String (LONGTEXT, MEDIUMTEXT, TINYTEXT, TEXT, LONGBLOB, MEDIUMBLOB, TINYBLOB, BLOB, VARCHAR, CHAR) |
| date                   | Date                 |
| timestamp              | DateTime             |
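The sink-side mapping in Table 6 can be sketched the same way. The code below only restates the table for illustration (it is not a CDL API); the function name `spark_to_clickhouse` is an assumption, and `decimal[p,s]` is again handled as a parameterized case so its precision and scale carry over to ClickHouse `Decimal(P,S)`.

```python
import re

# Illustrative restatement of Table 6 (not a CDL API):
# Spark (Hudi) type name -> ClickHouse type name.
SPARK_TO_CLICKHOUSE = {
    "int": "Int32",
    "long": "Int64",
    "float": "Float32",
    "double": "Float64",
    "boolean": "bool",
    "string": "String",
    "date": "Date",
    "timestamp": "DateTime",
}


def spark_to_clickhouse(spark_type: str) -> str:
    """Map a Spark (Hudi) type name to its ClickHouse type per Table 6."""
    m = re.fullmatch(r"decimal\[(\d+),\s*(\d+)\]", spark_type.strip())
    if m:
        return f"Decimal({m.group(1)},{m.group(2)})"
    return SPARK_TO_CLICKHOUSE[spark_type.strip()]


print(spark_to_clickhouse("decimal[10,2]"))  # Decimal(10,2)
print(spark_to_clickhouse("timestamp"))      # DateTime
```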

Data comparison task

A data comparison task checks the consistency between data in the source database and data in the target Hive table. If inconsistencies are found, CDL can attempt to repair them. For details, see Creating a CDL Data Comparison Job.