The MySQL CDC source table, also called the MySQL streaming source table, first reads all historical data in the database and then switches seamlessly to reading the Binlog, which ensures data integrity.
create table mySqlCdcSource (
  attr_name attr_type
  (',' attr_name attr_type)*
  (',' PRIMARY KEY (attr_name, ...) NOT ENFORCED)
)
with (
  'connector' = 'mysql-cdc',
  'hostname' = 'mysqlHostname',
  'username' = 'mysqlUsername',
  'password' = 'mysqlPassword',
  'database-name' = 'mysqlDatabaseName',
  'table-name' = 'mysqlTableName'
);
Parameter | Mandatory | Default Value | Data Type | Description |
---|---|---|---|---|
connector | Yes | None | String | Connector to be used. Set this parameter to mysql-cdc. |
hostname | Yes | None | String | IP address or hostname of the MySQL database. |
username | Yes | None | String | Username of the MySQL database. |
password | Yes | None | String | Password of the MySQL database. |
database-name | Yes | None | String | Name of the database to connect to. The database name supports regular expressions, so that data can be read from multiple databases. For example, flink(.)* matches all database names starting with flink. |
table-name | Yes | None | String | Name of the table to read data from. The table name supports regular expressions, so that data can be read from multiple tables. For example, cdc_order(.)* matches all table names starting with cdc_order. |
port | No | 3306 | Integer | Port number of the MySQL database. |
server-id | No | A random value from 5400 to 6400 | String | A numeric ID of the database client, which must be globally unique in the MySQL cluster. You are advised to set a unique ID for each job that reads from the same database. If this parameter is not set, a random value ranging from 5400 to 6400 is generated. |
scan.startup.mode | No | initial | String | Startup mode for consuming data. |
server-time-zone | No | None | String | Time zone of the session used by the database. |
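
As an illustration of how the optional parameters in the table can be combined, the following is a minimal sketch rather than a recommended configuration. The table name mysqlCdcSourceWithOptions, the two columns, the server ID 5401, and the time zone Asia/Shanghai are assumptions chosen for this example; replace them with values that match your environment.

```sql
-- Sketch only: the table name, columns, server-id, and time zone are illustrative assumptions.
create table mysqlCdcSourceWithOptions (
  order_id string,
  order_channel string,
  primary key (order_id) not enforced
) with (
  'connector' = 'mysql-cdc',
  'hostname' = 'mysqlHostname',
  'port' = '3306',                       -- default port, written explicitly here
  'username' = 'mysqlUsername',
  'password' = 'mysqlPassword',
  'database-name' = 'flink',
  'table-name' = 'cdc_order(.)*',        -- regular expression: all tables starting with cdc_order
  'server-id' = '5401',                  -- assumed value; must be unique in the MySQL cluster
  'scan.startup.mode' = 'initial',       -- default: read historical data, then the Binlog
  'server-time-zone' = 'Asia/Shanghai'   -- assumed session time zone
);
```

Setting server-id explicitly, instead of relying on the random default, makes it easier to guarantee that two jobs reading from the same database do not share an ID.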
In this example, MySQL-CDC is used to read data from RDS for MySQL in real time and write the data to the Print result table. The procedure is as follows (MySQL 5.7.32 is used in this example):
Create the cdc_order table in the flink database of MySQL:
CREATE TABLE `flink`.`cdc_order` (
  `order_id` VARCHAR(32) NOT NULL,
  `order_channel` VARCHAR(32) NULL,
  `order_time` VARCHAR(32) NULL,
  `pay_amount` DOUBLE NULL,
  `real_pay` DOUBLE NULL,
  `pay_time` VARCHAR(32) NULL,
  `user_id` VARCHAR(32) NULL,
  `user_name` VARCHAR(32) NULL,
  `area_id` VARCHAR(32) NULL,
  PRIMARY KEY (`order_id`)
) ENGINE = InnoDB
  DEFAULT CHARACTER SET = utf8mb4
  COLLATE = utf8mb4_general_ci;
Create a Flink SQL job that reads from the MySQL CDC source table and writes to the Print result table:
-- Source table: replace the placeholder connection values with those of your MySQL instance.
create table mysqlCdcSource(
  order_id string,
  order_channel string,
  order_time string,
  pay_amount double,
  real_pay double,
  pay_time string,
  user_id string,
  user_name string,
  area_id string
) with (
  'connector' = 'mysql-cdc',
  'hostname' = 'mysqlHostname',
  'username' = 'mysqlUsername',
  'password' = 'mysqlPassword',
  'database-name' = 'mysqlDatabaseName',  -- flink in this example
  'table-name' = 'mysqlTableName'         -- cdc_order in this example
);

-- Result table: prints every change record.
create table printSink(
  order_id string,
  order_channel string,
  order_time string,
  pay_amount double,
  real_pay double,
  pay_time string,
  user_id string,
  user_name string,
  area_id string,
  primary key(order_id) not enforced
) with (
  'connector' = 'print'
);

insert into printSink select * from mysqlCdcSource;
Insert test data into the cdc_order table in MySQL, delete one row, and then insert another row:
insert into cdc_order values
  ('202103241000000001','webShop','2021-03-24 10:00:00','100.00','100.00','2021-03-24 10:02:03','0001','Alice','330106'),
  ('202103241606060001','appShop','2021-03-24 16:06:06','200.00','180.00','2021-03-24 16:10:06','0001','Alice','330106');

delete from cdc_order where order_channel = 'webShop';

insert into cdc_order values
  ('202103251202020001','miniAppShop','2021-03-25 12:02:02','60.00','60.00','2021-03-25 12:03:00','0002','Bob','330110');
The data result is as follows (+I marks an inserted row and -D marks a deleted row):
+I(202103241000000001,webShop,2021-03-24 10:00:00,100.0,100.0,2021-03-24 10:02:03,0001,Alice,330106)
+I(202103241606060001,appShop,2021-03-24 16:06:06,200.0,180.0,2021-03-24 16:10:06,0001,Alice,330106)
-D(202103241000000001,webShop,2021-03-24 10:00:00,100.0,100.0,2021-03-24 10:02:03,0001,Alice,330106)
+I(202103251202020001,miniAppShop,2021-03-25 12:02:02,60.0,60.0,2021-03-25 12:03:00,0002,Bob,330110)
Q: How do I perform window aggregation if the MySQL CDC source table does not support definition of watermarks?
A: You can use the non-window aggregation method. That is, convert the time field into a window value, and then use GROUP BY to perform aggregation based on the window value.
For example, you can use the following script to collect statistics on the number of orders per minute (order_time indicates the order time, in the string format):
insert into printSink
select
  DATE_FORMAT(order_time, 'yyyy-MM-dd HH:mm'),
  count(*)
from mysqlCdcSource
group by DATE_FORMAT(order_time, 'yyyy-MM-dd HH:mm');
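
The query above produces two columns, so the result table it writes to needs a matching two-column schema. The following is a minimal sketch of a complete per-minute count, assuming the mysqlCdcSource table from the earlier example; the result table name orderCountSink and the column names order_minute and order_count are hypothetical and introduced only for illustration.

```sql
-- Hypothetical result table whose schema matches the two aggregated columns.
create table orderCountSink (
  order_minute string,   -- order time truncated to the minute, used as the window value
  order_count bigint,    -- number of orders that fall into that minute
  primary key (order_minute) not enforced
) with (
  'connector' = 'print'
);

-- Non-window aggregation: group by the minute-level string instead of a time window.
insert into orderCountSink
select
  DATE_FORMAT(order_time, 'yyyy-MM-dd HH:mm') as order_minute,
  count(*) as order_count
from mysqlCdcSource
group by DATE_FORMAT(order_time, 'yyyy-MM-dd HH:mm');
```

Because the source is a change stream, the count for a given minute is not final: it is updated whenever a row belonging to that minute is inserted or deleted in MySQL.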