Job Pipeline
A pipeline consists of multiple Flink jobs connected through TCP. Upstream jobs can send data to downstream jobs. The flow diagram about data transmission is called a job pipeline, as shown in Figure 1.
Job Pipeline Principles
In a pipeline, upstream jobs and downstream jobs communicate with each other through Netty. The Sink operator of the upstream job works as a server and the Source operator of the downstream job works as a client. The Sink operator of the upstream job is called NettySink, and the Source operator of the downstream job is called NettySource.
NettySink functions as the server of Netty. In NettySink, NettyServer achieves the function of a server. NettySource functions as the client of Netty. In NettySource, NettyClient achieves the function of a client.
The job that sends data to downstream jobs through NettySink is called a publisher.
The job that receives data from upstream jobs through NettySource is called a subscriber.
RegisterServer is the third-party memory that stores the IP address, port number, and concurrency information about NettyServer.
Job Pipeline Functions
NettySink consists of the following major modules:
NettySink inherits RichParallelSinkFunction and attributes of Sink operators. The RichParallelSinkFunction API implements following functions:
Following information can be obtained using the attribute of RichParallelSinkFunction:
RegisterServerHandler interacts with the component of RegisterServer and defines following APIs:
Namespace |---Topic-1 |---parallel-1 |---parallel-2 |.... |---parallel-n |---Topic-2 |---parallel-1 |---parallel-2 |.... |---parallel-m |...
nettyconnector.registerserver.topic.storage: /flink/nettyconnector
NettyServer is the core of the NettySink operator, whose main function is to create a NettyServer and receive connection requests from NettyClient. Use NettyServerHandler to send data received from upstream operators of a same job. The port number and subnet of NettyServer needs to be configured in the flink-conf.yaml file.
nettyconnector.sinkserver.port.range: 28444-28943
nettyconnector.sinkserver.subnet: 10.162.222.123/24
The nettyconnector.sinkserver.subnet parameter is set to the subnet (service IP address) of the Flink client by default. If the client and TaskManager are not in the same subnet, an error may occur. Therefore, you need to manually set this parameter to the subnet (service IP address) of TaskManager.
The handler enables the interaction between NettySink and subscribers. After NettySink receives messages, the handler sends these messages out. To ensure data transmission security, this channel is encrypted using SSL. The nettyconnector.ssl.enabled configures whether to enable SSL encryption. The SSL encryption is enabled only when nettyconnector.ssl.enabled is set to true.
NettySource consists of the following major modules:
NettySource inherits RichParallelSinkFunction and attributes of Source operators. The RichParallelSourceFunction API implements following functions:
Following information can be obtained using the attribute of RichParallelSourceFunction:
When the NettySource operator enters the running stage, the NettyClient status is monitored. Once abnormality occurs, NettyClient is restarted and reconnected to NettyServer, preventing data confusion.
RegisterServerHandler of NettySource has similar function as the RegisterServerHandler of NettySink. It obtains the IP address, port number, and information of concurrent operators of each subscribed job obtained in the NettySource operator.
NettyClient establishes a connection with NettyServer and uses NettyClientHandler to receive data. Each NettySource operator must have a unique name (specified by the user). NettyServer determines whether each client comes from different NettySources based on unique names. When a connection is established between NettyClient and NettyServer, NettyClient is registered with NettyServer and the NettySource name of NettyClient is transferred to NettyServer.
The NettyClientHandler enables the interaction with publishers and other operators of the job. When messages are received, NettyClientHandler transfers these messages to the job. To ensure secure data transmission, SSL encryption is enabled for the communication with NettySink. The SSL encryption is enabled only when SSL is enabled and nettyconnector.ssl.enabled is set to true.