Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Lu, Huayi <luhuayi@huawei.com> Co-committed-by: Lu, Huayi <luhuayi@huawei.com>
165 KiB
Monitoring Metrics
You can check the status and available resources of a cluster and learn about its real-time resource consumption through the GaussDB(DWS) monitoring items.
Table 1 describes GaussDB(DWS) monitoring metrics.
Monitored Object |
Metric |
Description |
Value Range |
Monitoring Period (Raw Data) |
---|---|---|---|---|
Cluster Overview |
Cluster Status |
Status of a cluster. |
Normal/Abnormal/Degraded |
30s |
Nodes |
Number of available nodes and total number of nodes (Available/Total) in a cluster. |
≥ 0 |
60s |
|
CNs |
Number of CNs in a cluster. |
≥ 0 |
60s |
|
Databases |
Number of created databases in a cluster. |
≥ 0 |
90s |
|
Resource Consumption |
CPU Usage |
Average real-time CPU usage of all nodes in a cluster. |
0% to 100% |
30s |
Memory Usage |
Average real-time memory usage of all nodes in a cluster. |
0% to 100% |
30s |
|
Disk Usage |
Average real-time disk usage of all nodes in a cluster. |
0% to 100% |
30s |
|
Disk I/O |
Average real-time disk I/O of all nodes in a cluster. |
≥ 0 KB/s |
30s |
|
Network I/O |
Average real-time network I/O of all NICs in a cluster. |
≥ 0 KB/s |
30s |
|
Top 5 Time-Consuming Queries |
Query ID |
ID of a query, which is automatically generated by the database. |
≥ 0 |
180s |
SQL Statement |
Query statement executed by a user. |
Character string |
180s |
|
Execution Time |
Execution time of a query statement (unit: ms). |
≥ 0 ms |
180s |
|
Top 5 Queries with Most Data Written to Disk |
Query ID |
ID of a query, which is automatically generated by the database. |
≥ 0 |
180s |
SQL Statement |
Query statement executed by a user. |
Character string |
180s |
|
Data Written to Disk |
Data to be written to disks after a user runs a statement (unit: MB). |
≥ 0 MB |
180s |
|
Cluster Resource Metrics |
CPU Usage |
Average CPU usage and skew ratio of all nodes in the cluster. The formula for calculating the skew is (max-avg)/max. |
0% to 100% |
30s |
Memory Usage |
Average memory usage and skew ratio of all nodes in the cluster. The formula for calculating the skew is (max-avg)/max. |
0% to 100% |
30s |
|
Disk Usage |
Average usage and skew ratio of all disks in the cluster. The formula for calculating the skew is (max-avg)/max. |
0% to 100% |
30s |
|
Disk I/O Usage |
Average I/O usage and skew rate of all disks in the cluster. The formula for calculating the skew is (max-avg)/max. |
0% to 100% |
30s |
|
Network I/O Usage |
Average I/O usage and skew rate of all NICs in the cluster. The formula for calculating the skew is (max-avg)/max. |
0% to 100% |
30s |
|
Key Database Metrics |
Cluster Status |
Cluster running status. |
Normal/Degraded/Abnormal |
30s |
Cluster Abnormal CNs |
Number of abnormal CNs in the cluster |
≥ 0 |
60s |
|
Cluster Read-only |
Whether the cluster is in the read-only state |
Yes/No |
30s |
|
Concurrent Sessions |
Number of concurrent sessions in a cluster within a specified period. |
≥ 0 |
30s |
|
Concurrent Queries |
Number of concurrent queries in a cluster within a specified period. |
≥ 0 |
30s |
|
Node Monitoring-Overview |
Node Name |
Name of a node in a cluster. |
Character string |
30s |
CPU Usage |
CPU usage of a host. |
0% to 100% |
30s |
|
Memory Usage |
Memory usage of a host. |
0% to 100% |
30s |
|
Average Disk Usage (%) |
Disk usage of a host. |
0% to 100% |
30s |
|
IP Address |
Service IP address of a host. |
Character string |
30s |
|
Disk I/O |
Disk I/O of a host (unit: KB/s) |
≥ 0 KB/s |
30s |
|
TCP Protocol Stack Retransmission Rate |
Retransmission rate of TCP packets per unit time. |
0% to 100% |
30s |
|
Status |
Running status of a host |
Online/Offline |
30s |
|
Node Monitoring-Disks |
Node Name |
Name of a node in a cluster. |
Character string |
30s |
Disk Name |
Name of a disk on a host. |
Character string |
30s |
|
Disk Capacity |
Disk capacity of the host (unit: GB) |
≥ 0 GB |
30s |
|
Disk Usage |
Disk usage of a host. |
0% to 100% |
30s |
|
Disk Read Rate |
Disk read rate of the host (unit: KB/s) |
≥ 0 KB/s |
30s |
|
Disk Write Rate |
Disk write rate of the host (unit: KB/s) |
≥ 0 KB/s |
30s |
|
I/O Wait Time (await, ms) |
Average waiting time for each I/O request (unit: ms) |
≥ 0 ms |
30s |
|
I/O Service Time (svctm, ms) |
Average processing time for each I/O request (unit: ms) |
≥ 0 ms |
30s |
|
I/O Utility (util, %) |
Disk I/O usage of a host. |
0% to 100% |
30s |
|
Node Monitoring-Network |
Node Name |
Name of a node in a cluster. |
Character string |
30s |
NIC Name |
Name of the NIC on a host. |
Character string |
30s |
|
NIC Status |
NIC status. |
up/down |
30s |
|
NIC Speed |
Working rate of a NIC, in Mbit/s. |
≥ 0 |
30s |
|
Received Packets |
Number of received packets of a NIC. |
≥ 0 |
30s |
|
Sent Packets |
Number of sent packets of a NIC. |
≥ 0 |
30s |
|
Lost Packets Received |
Number of received lost packets of a NIC. |
≥ 0 |
30s |
|
Receive Rate |
Number of bytes received by a NIC per unit of time (KB/s). |
≥ 0 KB/s |
30s |
|
Transmit Rate |
Number of bytes sent by a NIC per unit of time (unit: KB/s) |
≥ 0 KB/s |
30s |
|
Database Monitoring |
Database Name |
Name of the database created by a user in a cluster. |
Character string |
60s |
Usage |
Used capacity of the current database (unit: GB). |
≥ 0 GB |
86400s |
|
Users |
Number of users in the current database. |
≥ 0 |
30s |
|
Sessions |
Number of sessions in the current database. |
≥ 0 |
30s |
|
Applications |
Number of applications in the current database. |
≥ 0 |
30s |
|
Queries |
Number of active queries in the current database. |
≥ 0 |
30s |
|
Scanning Rows |
Number of rows returned by the full table scan query in the current database. |
≥ 0 |
60s |
|
Index Query Rows |
Number of rows returned by the index query in the current database. |
≥ 0 |
60s |
|
Inserted Rows |
Number of rows inserted in the current database. |
≥ 0 |
60s |
|
Updated Rows |
Number of rows updated in the current database. |
≥ 0 |
60s |
|
Deleted Rows |
Number of rows deleted from the current database. |
≥ 0 |
60s |
|
Executed Transactions |
Number of transaction executions on the current database. |
≥ 0 |
60s |
|
Transaction Rollbacks |
Number of transactions in the current database that have been rolled back. |
≥ 0 |
60s |
|
Deadlocks |
Number of deadlocks detected in the current database. |
≥ 0 |
60s |
|
Physical Read Times |
Number of disk blocks read in the current database. |
≥ 0 |
60s |
|
Logical Read Times |
Number of times that disk blocks are found in the cache. |
≥ 0 |
60s |
|
Temporary Files |
Number of temporary files created in the current database. |
≥ 0 |
60s |
|
Temporary File Capacity |
Size of temporary files written by the current database, in GB. |
≥ 0 |
60s |
|
Performance Monitoring |
Cluster CPU Usage |
Historical trend of the average CPU usage and skew of all nodes in the cluster. The formula for calculating the skew is (max-avg)/max. |
0% to 100% |
30s |
Cluster Memory Usage |
Historical trend of the average memory usage and skew of all nodes in the cluster. The formula for calculating the skew is (max-avg)/max. |
0% to 100% |
30s |
|
Cluster Disk Usage |
Historical trend of the average disk usage and skew of all nodes in the cluster. The formula for calculating the skew is (max-avg)/max. |
0% to 100% |
30s |
|
Cluster Disk I/O |
Historical trend of the average disk I/O and skew of all disks in the cluster. The formula for calculating the skew is (max-avg)/max. |
0% to 100% |
30s |
|
Cluster Network I/O |
Historical trend of the average network I/O value and skew of all NICs in the cluster. The formula for calculating the skew is (max-avg)/max. |
0% to 100% |
30s |
|
Cluster Status |
Historical trend of the cluster status. |
Normal/Abnormal/Degraded |
30s |
|
Cluster Read-only |
Historical trend of the cluster read-only status change trend. |
Yes/No |
30s |
|
Cluster Abnormal CNs |
Historical trend of the number of abnormal CNs in the cluster. |
≥ 0 |
60s |
|
Cluster Abnormal DNs |
Historical trend of the number of abnormal DNs in the cluster. |
≥ 0 |
60s |
|
Cluster CPU Usage of DNs |
Historical trends of the average CPU usage and skew ratio changes of all DNs in the cluster. The formula for calculating the skew ratio is (max – avg)/max. |
0% to 100% |
60s |
|
Cluster Sessions |
Historical trend of the number of sessions in a cluster. |
≥ 0 |
30s |
|
Cluster Queries |
Historical change trend of the number of queries in the cluster. |
≥ 0 |
30s |
|
Cluster Deadlocks |
Historical trend of the number of deadlocks in a cluster. |
≥ 0 |
60s |
|
Cluster TPS |
Average number of transactions per second of all databases in a cluster. Formula: (delta_xact_commit + delta_xact_rollback)/current_collect_rate |
≥0 |
60s |
|
Cluster QPS |
Average number of concurrent requests per second of all databases in a cluster. Formula: delta_query_count/current_collect_rate |
≥ 0 |
60s |
|
Database Sessions |
Historical trend of the number of sessions on a single database in a cluster. |
≥ 0 |
30s |
|
Database Queries |
Historical trend of the number of queries on a single database in a cluster. |
≥ 0 |
30s |
|
Database Submitted Transactions |
Historical trend of the number of transactions submitted on a single database in a cluster. |
≥ 0 |
60s |
|
Database Rollback Transactions |
Historical trend of the number of rollback transactions on a single database in a cluster. |
≥ 0 |
60s |
|
Cluster Scanning Rows |
Historical trend of the number of rows returned by a full table scan on a single database in a cluster. |
≥ 0 |
60s |
|
Database Index Query Rows |
Historical trend of the number of rows returned by an index query in a single database of a cluster. |
≥ 0 |
60s |
|
Database Inserted Rows |
Historical trend of the number of rows inserted into a single database in a cluster. |
≥ 0 |
60s |
|
Database Updated Rows |
Historical trend of the number of updated rows in a single database in a cluster. |
≥ 0 |
60s |
|
Database Deleted Rows |
Historical trend of the number of deleted rows in a single database in a cluster. |
≥ 0 |
60s |
|
Database Capacity |
Historical trend of the capacity in a single database in a cluster. |
≥ 0 |
86400s |
|
Database Length of the Request Waiting Queue |
Historical trend of the waiting queue length on a single database in a cluster. |
≥ 0 |
30s |
|
Database TPS |
Number of transactions per second of each database in a cluster. Formula: (delta_xact_commit + delta_xact_rollback)/current_collect_rate. |
≥0 |
60s |
|
Session Monitoring |
Session ID |
ID of the current session (query thread ID). |
Character string |
30s |
User Name |
Name of the user who executes the current session. |
Character string |
30s |
|
Database Name |
Name of the database connected to the current session. |
Character string |
30s |
|
Session Duration |
Duration of the current session (unit: ms). |
≥ 0 ms |
30s |
|
Application Name |
Name of the application that creates the current session. |
Character string |
30s |
|
Queries |
Number of SQL statements executed in the current session. |
≥ 0 |
30s |
|
Latest Query Duration |
Duration for executing the previous SQL statement in the current session. |
≥ 0 ms |
30s |
|
Client IP Address |
IP address of the client that initiates the current session. |
Character string |
30s |
|
Connected CN |
Connected CN of the current session. |
Character string |
30s |
|
Session Status |
Execution status of the current session. |
Running/Idle/Retry |
30s |
|
Query Monitoring-Real-Time |
Query ID |
Query ID of a current query statement, which is a unique identifier allocated by the kernel to each query statement. |
Character string |
30s |
User Name |
Name of the user who submits the current query statement. |
Character string |
30s |
|
Database Name |
Name of the database corresponding to the current query statement. |
Character string |
30s |
|
Application Name |
Name of the application corresponding to the current query statement. |
Character string |
30s |
|
Workload Queue |
Name of the workload queue that carries the current query statement. |
Character string |
30s |
|
Submitted |
Timestamp when the current query statement is submitted. |
Character string |
30s |
|
Blocking Time |
Waiting time before the current query statement is executed, in ms. |
≥ 0 |
30s |
|
Execution Time |
Execution time of the current query statement, in ms. |
≥ 0 |
30s |
|
CPU Time |
Total CPU time spent by the current query statement on all DNs, in ms. |
≥ 0 |
30s |
|
CPU Time Skew |
CPU time skew of the current query statement among all DNs. |
0% to 100% |
30s |
|
Average Written Data |
Average data size of the current query statement flushed to disks on all DNs, in MB. |
≥ 0 |
30s |
|
Statement |
Query statement that is being executed. |
Character string |
30s |
|
Connected CN |
Name of the CN that submits the current query statement. |
Character string |
30s |
|
Client IP Address |
IP address of the client that submits the current query statement. |
Character string |
30s |
|
Lane |
Lane where the current query statement is located. |
Fast Lane/Slow Lane |
30s |
|
Query Status |
Query status of the statement that is being executed. |
Character string |
30s |
|
Session ID |
Session ID of the current query statement, which is a unique identifier allocated by the kernel to each client connection. |
Character string |
30s |
|
Queuing Status |
Status of the current query execution in the database, indicating whether the query is queued in the workload queue. |
Yes/No |
30s |
|
Query Monitoring-History |
Query ID |
Query ID of a query statement, which is a unique identifier allocated by the kernel to each query statement. |
Character string |
180s |
User Name |
Name of the user who submits a query statement. |
Character string |
180s |
|
Application Name |
Application name corresponding to a query statement. |
Character string |
180s |
|
Database Name |
Name of the database corresponding to a query statement. |
Character string |
180s |
|
Workload Queue |
Name of the workload queue that carries the current query statement. |
Character string |
180s |
|
Submitted |
Timestamp when a query statement is submitted. |
Character string |
180s |
|
Blocking Time |
Waiting time before the query statement is executed, in ms. |
≥ 0 |
180s |
|
Execution Time |
Execution time of the query statement, in ms. |
≥ 0 |
180s |
|
CPU Time |
Total CPU time spent by the query statement on all DNs, in ms. |
≥ 0 |
180s |
|
CPU Time Skew |
CPU time skew of a query statement executed on all DNs. |
0% to 100% |
180s |
|
Average Written Data |
Average data size of the query statement flushed to disks on all DNs, in MB. |
≥ 0 |
180s |
|
Statement |
Query statements to be parsed |
Character string |
180s |
|
Slow Instance Monitoring |
Slow Instance |
Number of slow instances detected at the current time point. |
≥ 0 |
240s |
Detected |
Time when a slow instance is detected for the first time. |
Character string |
240s |
|
Node Name |
Name of the node where the slow instance is deployed. |
Character string |
240s |
|
Instance |
Name of an instance. |
Character string |
240s |
|
Slow Node Detections (within 24 hours) |
Number of times that a slow instance is detected within 24 hours. |
≥ 0 |
240s |
|
Workload Queue Monitoring |
Workload Queue |
Name of the workload queue in the cluster. |
Character string |
120s |
CPU Usage |
Real-time CPU usage of the workload queue. |
0% to 100% |
120s |
|
CPU Resource |
CPU usage quotas of the workload queue. |
0% to 100% |
120s |
|
Real-Time Concurrent Short Queries |
Number of real-time concurrent simple queries in a workload queue. |
≥ 0 |
120s |
|
Concurrent Short Queries |
Concurrent simple query quotas of a workload queue. |
≥ 0 |
120s |
|
Real-Time Concurrent Queries |
Number of real-time concurrent complex queries in a workload queue. |
≥ 0 |
120s |
|
Query Concurrency |
Concurrent complex query quotas of a workload queue. |
≥ 0 |
120s |
|
Waiting Queries |
User |
Name of the user of waiting queries |
Character string |
120s |
Application |
Name of the application to be queried. |
Character string |
120s |
|
Database |
Name of the database to be queried. |
Character string |
120s |
|
Queuing Status |
Execution status of a query in the database (CCN/CN/DN). |
Character string |
120s |
|
Wait Time |
Waiting time for a waiting query (unit: ms). |
≥ 0 ms |
120s |
|
Workload Queue |
Workload queue to which the waiting query belongs. |
Character string |
120s |
|
Statement |
Query statement for the waiting status. |
Character string |
120s |
|
Circuit Breaking Queries |
Query ID |
Query ID of the circuit breaking query statement. |
Character string |
120s |
Query Statement |
Query statement for the circuit breaking status. |
Character string |
120s |
|
Blocking Time |
Blocking time before the query statement triggers circuit breaking, in ms. |
≥ 0 |
120s |
|
Execution Time |
Execution time before the query statement triggers circuit breaking, in ms. |
≥ 0 |
120s |
|
CPU Time |
Average CPU time consumed by each DN before the query statement triggers circuit breaking, in ms. |
≥ 0 |
120s |
|
CPU Skew |
Skew rate of CPU time consumed by each DN before the query statement triggers circuit breaking. |
0% to 100% |
120s |
|
Exception Handling |
Handling method after the query statement triggers circuit breaking. |
Abort/Degrade |
120s |
|
Status |
Circuit breaking handling status of a query statement. |
Executing/Completed |
120s |
|
SQL Tuning |
Query ID |
IP address of the current query (query logic ID). |
Character string |
180s |
Database |
Name of the database where the current query is executed. |
Character string |
180s |
|
Schema Name |
Name of the current query schema. |
Character string |
180s |
|
User Name |
Name of the user who performs the query. |
Character string |
180s |
|
Client |
Name of the client that initiates the current query. |
Character string |
180s |
|
Client IP Address |
IP address of the client that initiates the current query. |
Character string |
180s |
|
Running Time |
Execution time of the current query, in ms. |
≥ 0 |
180s |
|
CPU Time |
CPU time of the current query, in ms. |
≥ 0 |
180s |
|
Scale-Out Started |
Start time of the current query. |
Timestamp |
180s |
|
Completed |
End time of the current query. |
Timestamp |
180s |
|
Details |
Details about the current query. |
Character string |
180s |
|
INODE |
Inode Usage |
Disk inode usage. |
0% to 100% |
30s |
SCHEMA |
Schema Usage |
Database schema usage. |
0% to 100% |
3600s |
Type |
Metric |
ECS Cluster |
BMS Cluster |
---|---|---|---|
Network |
NIC nominal speed |
None |
100/1000/10000/50000M |
NIC working mode (duplex) |
None |
full/half |