Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
12 KiB
ALM-45436 Skew ClickHouse Table Data
Alarm Description
This alarm is generated when data skew occurs in the local table of a distributed table between ClickHouse nodes. This alarm is automatically cleared when data becomes balanced.
Data skew check method:
- If min_table_check_data_bytes is set to 0, data skew check is disabled.
- If min_table_check_data_bytes is greater than 0, data skew check is enabled.
After data skew check is enabled, if the data volume in a table is less than the min_table_check_data_bytes value, no alarm will be reported due to data skew. When the data volume is greater than the min_table_check_data_bytes value and the data volume difference between the same table on different nodes is greater than the percentage specified in min_table_data_varies_rate, data skew occurs and this alarm is reported.
Alarm Attributes
Alarm ID |
Alarm Severity |
Auto Cleared |
---|---|---|
45436 |
Minor |
Yes |
Alarm Parameters
Parameter |
Description |
---|---|
Source |
Specifies the cluster or system for which the alarm is generated. |
ServiceName |
Specifies the service for which the alarm is generated. |
RoleName |
Specifies the role for which the alarm is generated. |
Table |
Specifies the database name and table name for which the alarm is generated. |
Impact on the System
SQL execution efficiency may be lowered.
Possible Causes
The data write policy is improper, causing unbalanced data among nodes.
Handling Procedure
- Log in to FusionInsight Manager, choose O&M > Alarm > Alarms, and view the role name and the IP address for the hostname in Location.
- Log in to the node where the client is installed as the client installation user and run the following commands:
cd {Client installation path}
source bigdata_env
- Security mode (with Kerberos enabled):
clickhouse client --host IP address of the ClickHouseServer instance that reports the alarm --port 9440 --secure
- Normal mode (with Kerberos disabled):
clickhouse client --host IP address of the ClickHouseServer instance that reports the alarm --user Username --password --port 9000
- Security mode (with Kerberos enabled):
- View data distribution.
select FQDN(), database, table, sum(data_compressed_bytes) from clusterAllReplicas(Name of the logical cluster, system.parts) where database='Database name' and table='Table name' and active=1 group by (FQDN(), database, table);
- Balance data with a few clicks or migrate data based on service requirements.
- Check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to 6.
Collect fault information.
- On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
- Expand the Service drop-down list, and select ClickHouse for the target cluster.
- Expand the Hosts drop-down list. In the Select Host dialog box that is displayed, select the abnormal host, and click OK.
- Click the edit icon in the upper right corner, and set Start Date and End Date for log collection to 1 hour ahead of and after the alarm generation time, respectively. Then, click Download.
- Contact O&M personnel and provide the collected logs.
Alarm Clearance
This alarm is automatically cleared after the fault is rectified.
Related Information
None.