ALM-45428 ClickHouse Disk I/O Exception
Description
The alarm module checks ClickHouse read and write operations for EIO or EROFS errors every 60 seconds. This alarm is generated when such an error is detected.
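To illustrate what "detecting EIO or EROFS during read and write" can mean, here is a minimal Python sketch. It is not the alarm module's actual implementation, which this document does not describe. It assumes that writing, syncing, and reading back a small temporary file in a ClickHouse data directory is a reasonable stand-in for a ClickHouse read/write, and the helpers `classify_os_error`, `probe_directory`, and `watch` are hypothetical names.

```python
import errno
import os
import tempfile
import time

# The two errno values that, per the alarm description, indicate a disk I/O exception.
ALARM_ERRNOS = {errno.EIO: "EIO", errno.EROFS: "EROFS"}


def classify_os_error(exc: OSError):
    """Return "EIO" or "EROFS" if exc is one of the alarm's errors, else None."""
    return ALARM_ERRNOS.get(exc.errno)


def probe_directory(path: str):
    """Hypothetical probe: write, fsync, and read back a temporary file in path.

    Returns "EIO" or "EROFS" if that error occurs, or None if the round trip
    succeeds. Any other OSError (for example, EACCES) is re-raised because it
    does not indicate a disk I/O exception.
    """
    try:
        # The temporary file is deleted automatically when the with block exits.
        with tempfile.NamedTemporaryFile(dir=path) as f:
            f.write(b"probe")
            f.flush()
            os.fsync(f.fileno())  # force the write down to the disk
            f.seek(0)
            f.read()
    except OSError as exc:
        kind = classify_os_error(exc)
        if kind is None:
            raise
        return kind
    return None


def watch(paths, interval_s: float = 60.0, report=print):
    """Hypothetical loop: probe each directory once per interval (60 s, as described)."""
    while True:
        for path in paths:
            kind = probe_directory(path)
            if kind is not None:
                report(f"{path}: {kind} detected")
        time.sleep(interval_s)
```

A call such as `watch(["/path/to/clickhouse/data"])` (placeholder path) would report a directory only when one of the two errno values appears. Note that the read-back is likely served from the page cache, so this sketch mainly exercises the write path; EROFS typically surfaces as soon as the file is created on a file system that has been remounted read-only.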
Attribute
Alarm ID | Alarm Severity | Auto Clear |
---|---|---|
45428 | Major (default) | No |
Parameters
Name | Meaning |
---|---|
Source | Specifies the cluster for which the alarm is generated. |
ServiceName | Specifies the service for which the alarm is generated. |
RoleName | Specifies the role for which the alarm is generated. |
HostName | Specifies the host for which the alarm is generated. |
Impact on the System
- ClickHouse fails to read or write data. INSERT, SELECT, and CREATE operations on local tables may be abnormal. Distributed tables are not affected.
- I/O operations fail, and services are affected.
Possible Causes
The disk is aged or has bad sectors.
Procedure
1. On FusionInsight Manager, choose O&M > Alarm > Alarms > ALM-45428 ClickHouse Disk I/O Exception. In Location, check the role name and the IP address of the host where the alarm is generated.
2. Use PuTTY to log in to the node where the alarm is generated as user root.
3. Run the df -h command to check the mount directories and find the disk mounted to the faulty directory.
4. Run the smartctl -a /dev/sd* command to check the disk.
   - If SMART Health Status: OK is displayed, as shown in the following figure, the disk is healthy. In this case, go to 7.
   - If the value of Elements in grown defect list is not 0, as shown in the following figure, the disk may have bad sectors. If SMART Health Status: FAILURE is displayed, the disk is in the sub-health state. In either case, go to 5.
5. Rectify the fault by following the instructions provided in "Hard Disk Mounted to the ClickHouse Partition Directory Is Faulty" in .
6. After the fault is rectified, manually clear the alarm on FusionInsight Manager and check whether the alarm is generated again during the next periodic check.
   - If yes, go to 7.
   - If no, no further action is required.
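The disk check above (mapping the faulty directory to a disk with df, then reading the smartctl report) can be partially scripted. The following Python sketch is illustrative only; `device_for_path`, `disk_for_partition`, `next_step`, and `check_directory` are hypothetical helpers. It assumes `df -hP` output (the POSIX format keeps each file system on one line), SCSI-style smartctl output containing the SMART Health Status and Elements in grown defect list lines quoted in the procedure, and partition names of the form /dev/sdXN. When a disk reports OK but also has grown defects, the sketch assumes the bad-sector condition takes precedence. The return values 5 and 7 correspond to "go to 5" and "go to 7".

```python
import re
import subprocess


def device_for_path(df_output: str, path: str):
    """Return the Filesystem column of the df entry whose mount point contains path.

    Assumes one entry per line (as produced by `df -hP`). The longest matching
    mount point wins, so /srv/data1/x maps to /srv/data1 rather than /.
    """
    best_mount, best_dev = "", None
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 5)  # the 6th field keeps mount points with spaces
        if len(fields) < 6:
            continue
        dev, mount = fields[0], fields[5]
        if path == mount or path.startswith(mount.rstrip("/") + "/"):
            if len(mount) > len(best_mount):
                best_mount, best_dev = mount, dev
    return best_dev


def disk_for_partition(dev: str) -> str:
    """Strip a trailing partition number: /dev/sdb1 -> /dev/sdb (sdX names only)."""
    return re.sub(r"(/dev/sd[a-z]+)\d+$", r"\1", dev)


def next_step(smart_output: str):
    """Map `smartctl -a` output to the procedure's next step (5, 7, or None)."""
    status, defects = None, None
    for line in smart_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "SMART Health Status":
            status = value
        elif key == "Elements in grown defect list":
            try:
                defects = int(value)
            except ValueError:
                pass
    # Assumption: bad sectors or a FAILURE status take precedence over OK.
    if (defects is not None and defects != 0) or (status and "FAILURE" in status):
        return 5
    if status == "OK":
        return 7
    return None  # inconclusive, e.g. the drive does not report these fields


def check_directory(path: str):
    """Run df and smartctl (smartctl needs root) and return the suggested next step."""
    df_out = subprocess.run(["df", "-hP"], capture_output=True, text=True, check=True).stdout
    dev = device_for_path(df_out, path)
    if dev is None:
        return None
    # smartctl uses a bit-mask exit status, so a nonzero code is not treated as an error here.
    smart_out = subprocess.run(["smartctl", "-a", disk_for_partition(dev)],
                               capture_output=True, text=True).stdout
    return next_step(smart_out)
```

Keeping the parsers as pure functions of the command output makes them easy to check against saved output before running `check_directory` on a live node. A `None` result means the output did not match the assumed format, and the disk should be checked manually.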
Collect the fault information.
7. On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
8. Expand the Service drop-down list and select ClickHouse for the target cluster.
9. Select the corresponding host from the host list.
10. Click the icon in the upper right corner, and set Start Date and End Date for log collection to 1 hour before and 1 hour after the alarm generation time, respectively. Then, click Download.
11. Contact O&M personnel and provide the collected logs.
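The log collection window is simple arithmetic, but it crosses a date boundary when the alarm is generated near midnight, which is easy to get wrong when entering dates by hand. A minimal Python sketch; `log_window` is a hypothetical helper, and the alarm time in the example is made up.

```python
from datetime import datetime, timedelta


def log_window(alarm_time: datetime, margin: timedelta = timedelta(hours=1)):
    """Return (start, end) for log collection: margin before and after the alarm."""
    return alarm_time - margin, alarm_time + margin


# Example: an alarm generated at 00:20 needs logs starting on the previous day.
start, end = log_window(datetime(2024, 5, 1, 0, 20))
print(start, end)  # prints 2024-04-30 23:20:00 2024-05-01 01:20:00
```

Using `timedelta` lets the date roll over automatically instead of requiring the hour to be adjusted by hand.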
Alarm Clearing
If the alarm has no impact, manually clear the alarm.
Related Information
None
Parent topic: Alarm Reference (Applicable to MRS 3.x)