Configuring Reliability for Connected Kafka

Scenario

When a Spark Streaming application that is connected to Kafka is restarted, it resumes reading data from Kafka based on the topic offset it last read and the current latest offset of that topic.

If the leader of a Kafka topic fails while its offset is far ahead of the follower's offset, the follower and leader are switched over after the Kafka service is restarted. As a result, the offset of the topic is lower after the restart than it was before the failure.
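The offset regression described above can be pictured with a small sketch. It is illustrative only: the function name, the dictionaries, and the sample offsets are hypothetical, and it does not show how Spark Streaming or the reliability function works internally. It only checks, per partition, whether the offset an application last read is now ahead of the latest offset reported for the topic.

```python
from typing import Dict, List


def find_regressed_partitions(last_read: Dict[int, int],
                              latest: Dict[int, int]) -> List[int]:
    """Return the partitions whose last-read offset exceeds the topic's latest offset.

    Both arguments map a partition id to an offset. The names and the data
    layout are assumptions made for this illustration.
    """
    return sorted(
        partition
        for partition, offset in last_read.items()
        if partition in latest and offset > latest[partition]
    )


# Hypothetical offsets: partition 0 fell back from 1500 to 1200 after the switchover.
last_read_offsets = {0: 1500, 1: 980}
latest_offsets = {0: 1200, 1: 1000}

print(find_regressed_partitions(last_read_offsets, latest_offsets))
```

In this made-up data, only partition 0 has a last-read offset beyond the topic's latest offset, which is the situation the restart scenario produces.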

To resolve this problem, you can enable the reliability function for Kafka connected to Spark Streaming. After the reliability function of connected Kafka is enabled:

If the state function is used in the Spark Streaming application, do not enable the reliability function of connected Kafka.

Configuration

Configure the following parameter in the spark-defaults.conf file of the Spark client.

Table 1 Parameter description

Parameter: spark.streaming.Kafka.reliability

Description: Indicates whether to enable the reliability function for Kafka connected to Spark Streaming.

  • true: The reliability function is enabled.
  • false: The reliability function is disabled.

Default Value: false
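
As an illustration, enabling the function would mean adding a line such as the following to the spark-defaults.conf file of the Spark client. This is a sketch that assumes the usual spark-defaults.conf layout of one property per line, with the property name and its value separated by whitespace. The parameter name is taken from Table 1.

```
spark.streaming.Kafka.reliability true
```

If the parameter is left unset, the default value false applies and the reliability function stays disabled.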