DataGen is used to generate random data for debugging and testing.
    create table dataGenSource(
      attr_name attr_type
      (',' attr_name attr_type)*
      (',' WATERMARK FOR rowtime_column_name AS watermark-strategy_expression)
    )
    with (
      'connector' = 'datagen'
    );
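As a hedged illustration of the syntax above, the following sketch declares a DataGen table with a watermark on a time column. The table and field names (dataGenEventSource, user_id, event_time) and the 5-second delay are hypothetical, and the sketch assumes that the DataGen connector in your Flink version can generate TIMESTAMP(3) values.

```sql
-- Illustrative sketch only: table name, field names, and the 5-second delay are assumptions.
create table dataGenEventSource(
  user_id string,
  event_time timestamp(3),
  -- Watermark strategy: tolerate events that arrive up to 5 seconds out of order.
  watermark for event_time as event_time - interval '5' second
) with (
  'connector' = 'datagen'
);
```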
Parameter | Mandatory | Default Value | Data Type | Description |
---|---|---|---|---|
connector | Yes | None | String | Connector to be used. Set this parameter to datagen. |
rows-per-second | No | 10000 | Long | Number of rows generated per second, which is used to control the emit rate. |
fields.#.kind | No | random | String | Generator of the # field. The # field must be an actual field in the DataGen table. Replace # with the corresponding field name. The meanings of the # field for other parameters are the same. The value can be sequence or random. |
fields.#.min | No | Minimum value of the field type specified by # | Field type specified by # | This parameter is valid only when fields.#.kind is set to random. Minimum value of the random generator. It applies only to numeric field types specified by #. |
fields.#.max | No | Maximum value of the field type specified by # | Field type specified by # | This parameter is valid only when fields.#.kind is set to random. Maximum value of the random generator. It applies only to numeric field types specified by #. |
fields.#.length | No | 100 | Integer | This parameter is valid only when fields.#.kind is set to random. Length of the characters generated by the random generator. It applies only to char, varchar, and string types specified by #. |
fields.#.start | No | None | Field type specified by # | This parameter is valid only when fields.#.kind is set to sequence. Start value of a sequence generator. |
fields.#.end | No | None | Field type specified by # | This parameter is valid only when fields.#.kind is set to sequence. End value of a sequence generator. |
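The parameters above can be combined in one table definition. The following is a minimal sketch, not a definitive configuration: the table and field names (dataGenSeqSource, order_id, price, code) and all values are hypothetical. It uses a sequence generator for one field, a bounded random generator for a numeric field, and a fixed length for a string field.

```sql
-- Illustrative sketch only: table name, field names, and values are assumptions.
create table dataGenSeqSource(
  order_id bigint,
  price int,
  code string
) with (
  'connector' = 'datagen',
  'rows-per-second' = '5',               --Emits up to five rows per second.
  'fields.order_id.kind' = 'sequence',   --Uses a sequence generator for order_id.
  'fields.order_id.start' = '1',         --First value of the sequence.
  'fields.order_id.end' = '1000',        --Last value of the sequence.
  'fields.price.kind' = 'random',        --Uses a random generator for price (the default kind).
  'fields.price.min' = '1',              --Lower bound of the random values.
  'fields.price.max' = '100',            --Upper bound of the random values.
  'fields.code.length' = '8'             --Generates 8-character strings for code.
);
```

Both fields.order_id.start and fields.order_id.end are set because a sequence needs both bounds. In open-source Flink, a DataGen table with a sequence field stops emitting once the sequence reaches its end value, so this sketch would produce at most 1000 rows.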
Create a Flink OpenSource SQL job. Run the following script to generate random data through the DataGen table and output the data to the Print result table.
When you create a job, set Flink Version to 1.12 on the Running Parameters tab. Select Save Job Log, and specify the OBS bucket for saving job logs.
    create table dataGenSource(
      user_id string,
      amount int
    ) with (
      'connector' = 'datagen',
      'rows-per-second' = '1', --Generates one row per second.
      'fields.user_id.kind' = 'random', --Specifies a random generator for the user_id field.
      'fields.user_id.length' = '3' --Limits the length of user_id to 3.
    );

    create table printSink(
      user_id string,
      amount int
    ) with (
      'connector' = 'print'
    );

    insert into printSink select * from dataGenSource;
After the job is submitted, the job status changes to Running. You can perform the following operations to view the output result: