For MRS 3.3.0 and its later versions:
The collection period is 3 seconds, and the detection period is 30 or 300 seconds. This alarm is automatically cleared when neither of the preceding conditions is met for three consecutive detection periods (30 or 300 seconds).
For versions earlier than MRS 3.3.0:
This alarm is automatically cleared when the preceding conditions have not been met for 15 minutes.
The svctm value can be obtained as follows:
Run the iostat -x -t command in the OS.
svctm = (tot_ticks_new - tot_ticks_old)/(rd_ios_new + wr_ios_new - rd_ios_old - wr_ios_old)
When the detection period is 30 seconds, if rd_ios_new + wr_ios_new - rd_ios_old - wr_ios_old = 0, then svctm = 0.
When the detection period is 300 seconds and rd_ios_new + wr_ios_new - rd_ios_old - wr_ios_old = 0, if tot_ticks_new - tot_ticks_old = 0, then svctm = 0; otherwise, the value of svctm is infinite.
If rd_ios_new + wr_ios_new - rd_ios_old - wr_ios_old is 0, then svctm is 0.
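The formula and its zero-I/O special cases can be sketched in Python as follows. This is a minimal illustration, not the product's implementation: the function name `svctm` and the `detection_period_s` parameter are hypothetical. It assumes that any detection period other than 300 seconds falls back to the "svctm = 0" rule when no I/O completed.

```python
import math


def svctm(rd_ios_old, wr_ios_old, tot_ticks_old,
          rd_ios_new, wr_ios_new, tot_ticks_new,
          detection_period_s=30):
    """Return the service time per completed I/O between two samples."""
    ios_delta = (rd_ios_new + wr_ios_new) - (rd_ios_old + wr_ios_old)
    ticks_delta = tot_ticks_new - tot_ticks_old

    if ios_delta == 0:
        # No I/O completed in the window. With a 300-second detection
        # period, busy time without completed I/O counts as infinite.
        if detection_period_s == 300 and ticks_delta != 0:
            return math.inf
        # Assumption: every other case falls back to 0.
        return 0.0

    return ticks_delta / ios_delta
```

For instance, `svctm(10, 20, 100, 10, 20, 150, detection_period_s=300)` yields infinity, while the same counters with a 30-second period yield 0.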
The parameters can be obtained as follows:
The system runs the cat /proc/diskstats command every 3 seconds to collect data. For example:
In the output of two consecutive collections:
In the data collected for the first time, the number in the fourth column is the rd_ios_old value, the number in the eighth column is the wr_ios_old value, and the number in the thirteenth column is the tot_ticks_old value.
In the data collected for the second time, the number in the fourth column is the rd_ios_new value, the number in the eighth column is the wr_ios_new value, and the number in the thirteenth column is the tot_ticks_new value.
In this case, the value of svctm is as follows:
(19571460 - 19569526)/(1101553 + 28747977 - 1101553 - 28744856) = 0.6197
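The column lookup and the calculation above can be reproduced with a short Python sketch. The helper name `parse_diskstats_line` is hypothetical, and the two sample lines are constructed for illustration: only the three counters are taken from the calculation above, and every other field is a zero placeholder rather than real command output. The zero-I/O special cases are not handled here.

```python
def parse_diskstats_line(line):
    """Extract the counters used for svctm from one /proc/diskstats line."""
    fields = line.split()
    return {
        "device": fields[2],
        "rd_ios": int(fields[3]),      # fourth column
        "wr_ios": int(fields[7]),      # eighth column
        "tot_ticks": int(fields[12]),  # thirteenth column
    }


# Hypothetical samples: placeholder zeros everywhere except the three counters.
old_line = "8 0 sda 1101553 0 0 0 28744856 0 0 0 0 19569526 0"
new_line = "8 0 sda 1101553 0 0 0 28747977 0 0 0 0 19571460 0"

old = parse_diskstats_line(old_line)
new = parse_diskstats_line(new_line)

ios_delta = (new["rd_ios"] + new["wr_ios"]) - (old["rd_ios"] + old["wr_ios"])
ticks_delta = new["tot_ticks"] - old["tot_ticks"]
value = ticks_delta / ios_delta

print(round(value, 4))  # 0.6197
```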
Alarm ID | Alarm Severity | Auto Clear |
---|---|---|
12033 | | Yes |
Name | Meaning |
---|---|
Source | Specifies the cluster or system for which the alarm is generated. |
ServiceName | Specifies the service for which the alarm is generated. |
RoleName | Specifies the role for which the alarm is generated. |
HostName | Specifies the host for which the alarm is generated. |
DiskName | Specifies the disk for which the alarm is generated. |
Service performance deteriorates, service processing capabilities become poor, and services may be unavailable.
The disk is aged or has bad sectors.
Check the disk status.
Example:
lsscsi | grep "/dev/sda"
In the command output, if ATA, SATA, or SAS is displayed in the third column, the disk has not been organized into a RAID group. If other information is displayed, RAID has been set up.
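A script could apply the same check to a line of lsscsi output. The sketch below reads the check as a test on the third whitespace-separated field of the matching line, which is an assumption about how the output is laid out. The function name `is_standalone_disk` and both sample lines are hypothetical and are not captured output.

```python
def is_standalone_disk(lsscsi_line):
    """Return True if the third field marks a disk outside a RAID group."""
    fields = lsscsi_line.split()
    # ATA, SATA, or SAS in the third field: no RAID group.
    return len(fields) >= 3 and fields[2] in {"ATA", "SATA", "SAS"}


# Hypothetical sample lines, for illustration only.
plain = "[0:0:0:0]    disk    ATA      EXAMPLE-MODEL    0001  /dev/sda"
raid = "[0:2:0:0]    disk    EXAMPLEVENDOR  EXAMPLE-RAID  0001  /dev/sda"

print(is_standalone_disk(plain))  # True
print(is_standalone_disk(raid))   # False
```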
Example:
smartctl -i /dev/sda
In the command output, if "SMART support is: Enabled" is displayed, the hardware supports SMART. If "Device does not support SMART" or other information is displayed, the hardware does not support SMART.
Example:
smartctl -H --all /dev/sda
Check the value of SMART overall-health self-assessment test result in the command output. If the value is FAILED, the disk is faulty and needs to be replaced. If the value is PASSED, check the value of Reallocated_Sector_Ct or Elements in grown defect list. If the value is greater than 100, the disk is faulty and needs to be replaced.
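The decision rule above can be sketched as a small Python function that takes the smartctl output as text. The function name `disk_needs_replacement` is hypothetical. The sketch assumes that the raw value of Reallocated_Sector_Ct is the last field of its attribute row and that the grown defect count follows the "Elements in grown defect list:" label. Verify both against the output of your smartctl version.

```python
import re


def disk_needs_replacement(smartctl_output, threshold=100):
    """Apply the FAILED / defect-count rule to smartctl output text."""
    health = re.search(
        r"SMART overall-health self-assessment test result:\s*(\S+)",
        smartctl_output,
    )
    if health and health.group(1) == "FAILED":
        return True

    for line in smartctl_output.splitlines():
        fields = line.split()
        # Assumed attribute-row layout: raw value is the last field.
        if "Reallocated_Sector_Ct" in fields and fields[-1].isdigit():
            if int(fields[-1]) > threshold:
                return True
        # Assumed defect-count layout: "Elements in grown defect list: N".
        grown = re.match(r"\s*Elements in grown defect list:\s*(\d+)", line)
        if grown and int(grown.group(1)) > threshold:
            return True

    return False
```

The function only interprets text. Running smartctl and collecting its output is left to the caller.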
Example:
smartctl -l error -H /dev/sda
Check the Command/Feature_name column in the command output. If READ SECTOR(S) or WRITE SECTOR(S) is displayed, the disk has bad sectors. If other errors occur, the disk circuit board is faulty. Both errors indicate that the disk is abnormal and needs to be replaced.
If "No Errors Logged" is displayed, no error log exists. You can trigger the disk SMART self-check.
Example:
smartctl -t long /dev/sda
For example, run the following commands in sequence:
smartctl -d sat+megaraid,0 -H --all /dev/sda
smartctl -d sat+megaraid,1 -H --all /dev/sda
smartctl -d sat+megaraid,2 -H --all /dev/sda
...
Try the command combinations of different disk types and slot information. If "SMART support is: Enabled" is displayed in the command output, the disk supports SMART. Record the parameters of the disk type and slot information when a command is successfully executed. If "SMART support is: Enabled" is not displayed in the command output, the disk does not support SMART.
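The trial-and-error step above amounts to enumerating disk type and slot combinations. The following Python sketch only builds the command strings and does not run them. The function name `candidate_commands` and its parameters are hypothetical, and the disk types to try depend on the RAID controller card in use.

```python
def candidate_commands(device, disk_types=("sat+megaraid",), max_slots=3):
    """Build smartctl commands for each disk type and slot combination."""
    return [
        f"smartctl -d {disk_type},{slot} -H --all {device}"
        for disk_type in disk_types
        for slot in range(max_slots)
    ]


for command in candidate_commands("/dev/sda"):
    print(command)
```

With the default arguments, the loop prints the three commands listed above for slots 0, 1, and 2.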
Example:
smartctl -d sat+megaraid,2 -H --all /dev/sda
Check the value of SMART overall-health self-assessment test result in the command output. If the value is FAILED, the disk is faulty and needs to be replaced. If the value is PASSED, check the value of Reallocated_Sector_Ct or Elements in grown defect list. If the value is greater than 100, the disk is faulty and needs to be replaced.
Example:
smartctl -d sat+megaraid,2 -l error -H /dev/sda
Check the Command/Feature_name column in the command output. If READ SECTOR(S) or WRITE SECTOR(S) is displayed, the disk has bad sectors. If other errors occur, the disk circuit board is faulty. Both errors indicate that the disk is abnormal and needs to be replaced.
If "No Errors Logged" is displayed, no error log exists. You can trigger the disk SMART self-check.
Example:
smartctl -d sat+megaraid,2 -t long /dev/sda
For example, the tool for LSI RAID controller cards is MegaCLI.
If the alarm is reported three times, replace the disk.
Replace the disk.
Collect the fault information.
This alarm is automatically cleared after the fault is rectified.
None