The system checks the checkpointing duration of Flink jobs every 30 seconds. This alarm is generated if the checkpointing duration of a Flink job exceeds the threshold (600 seconds by default). This alarm is cleared when the checkpointing duration of the job is less than or equal to the threshold.
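The generate-and-clear rule described above can be illustrated with a minimal sketch. It assumes the check reduces to a simple comparison against the threshold. The names `AlarmState` and `evaluate` are hypothetical and are not part of the product. Only the 30-second interval and the 600-second default threshold come from the alarm description.

```python
from dataclasses import dataclass

CHECK_INTERVAL_SECONDS = 30      # check interval stated in the alarm description
DEFAULT_THRESHOLD_SECONDS = 600  # default threshold stated in the alarm description


@dataclass
class AlarmState:
    """Tracks whether the alarm is active for one job (hypothetical helper)."""
    active: bool = False


def evaluate(state: AlarmState, checkpoint_seconds: float,
             threshold_seconds: float = DEFAULT_THRESHOLD_SECONDS) -> str:
    """Return 'raise', 'clear', or 'none' for one periodic check."""
    if checkpoint_seconds > threshold_seconds:
        # Longer than the threshold: report the alarm once, then stay active.
        if not state.active:
            state.active = True
            return "raise"
        return "none"
    # Less than or equal to the threshold: clear the alarm if it was active.
    if state.active:
        state.active = False
        return "clear"
    return "none"


if __name__ == "__main__":
    state = AlarmState()
    # One sample per check; in the real system samples arrive every 30 seconds.
    for seconds in [120, 610, 700, 600, 90]:
        print(seconds, evaluate(state, seconds))
```

In this sketch a sample of exactly 600 seconds clears an active alarm, which matches the "less than or equal to the threshold" clearing condition.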
Alarm ID | Alarm Severity | Auto Clear |
---|---|---|
45639 | Minor | Yes |

Name | Meaning |
---|---|
Source | Specifies the cluster for which the alarm is generated. |
ServiceName | Specifies the service for which the alarm is generated. |
JobName | Specifies the job for which the alarm is generated. |
UserName | Specifies the username for which the alarm is generated. |
This alarm has no impact on the system.
The job may be in the sub-healthy state. The possible causes are as follows:
You can also log in to Manager as a user who has the FlinkServer management permission. Choose Cluster > Services > Flink, and click the link next to Flink WebUI. On the displayed Flink web UI, click Job Management, click More in the Operation column, and select Job Monitoring to view TaskManager logs.
If logs are unavailable on the Yarn page, download logs from HDFS.
This alarm is cleared when the checkpointing duration of a Flink job is less than or equal to the threshold.
None