Monitoring Metrics
GaussDB(DWS) monitoring metrics let you check the status and available resources of a cluster and track its real-time resource consumption.
Table 1 describes GaussDB(DWS) monitoring metrics.
Monitored Object |
Metric |
Description |
Value Range |
Monitoring Period (Raw Data) |
---|---|---|---|---|
Cluster Overview |
Cluster Status |
Status of a cluster. |
Normal/Abnormal/Degraded |
30s |
Nodes |
Number of available nodes and total number of nodes (Available/Total) in a cluster. |
≥ 0 |
60s |
|
CNs |
Number of CNs in a cluster. |
≥ 0 |
60s |
|
Databases |
Number of created databases in a cluster. |
≥ 0 |
90s |
|
Resource Consumption |
CPU Usage |
Average real-time CPU usage of all nodes in a cluster. |
0% to 100% |
30s |
Memory Usage |
Average real-time memory usage of all nodes in a cluster. |
0% to 100% |
30s |
|
Disk Usage |
Average real-time disk usage of all nodes in a cluster. |
0% to 100% |
30s |
|
Disk I/O |
Average real-time disk I/O of all nodes in a cluster. |
≥ 0 KB/s |
30s |
|
Network I/O |
Average real-time network I/O of all NICs in a cluster. |
≥ 0 KB/s |
30s |
|
Top 5 Time-Consuming Queries |
Query ID |
ID of a query, which is automatically generated by the database. |
≥ 0 |
180s |
SQL Statement |
Query statement executed by a user. |
String |
180s |
|
Execution Time |
Execution time of a query statement (unit: ms). |
≥ 0 ms |
180s |
|
Top 5 Queries with Most Data Written to Disk |
Query ID |
ID of a query, which is automatically generated by the database. |
≥ 0 |
180s |
SQL Statement |
Query statement executed by a user. |
String |
180s |
|
Data Written to Disk |
Amount of data written to disk when a user runs a statement (unit: MB). |
≥ 0 MB |
180s |
|
Cluster Resource Metrics |
CPU Usage |
Average CPU usage of all nodes in a cluster. |
0% to 100% |
30s |
Memory Usage |
Average memory usage of all nodes in a cluster. |
0% to 100% |
30s |
|
Disk Usage |
Average usage of all disks in a cluster. |
0% to 100% |
30s |
|
Disk I/O Usage |
Average I/O usage of all disks in a cluster. |
0% to 100% |
30s |
|
Network I/O Usage |
Average I/O usage of all NICs in a cluster. |
0% to 100% |
30s |
|
Key Database Metrics |
Cluster Status |
Cluster running status. |
Normal/Abnormal/Degraded |
30s |
Cluster Abnormal CNs |
Number of abnormal CNs in the cluster. |
≥ 0 |
60s |
|
Cluster Read-only |
Whether the cluster is in the read-only state. |
Yes/No |
30s |
|
Concurrent Sessions |
Number of concurrent sessions in a cluster within a specified period. |
≥ 0 |
30s |
|
Concurrent Queries |
Number of concurrent queries in a cluster within a specified period. |
≥ 0 |
30s |
|
Node Monitoring-Overview |
Node Name |
Name of a node in a cluster. |
String |
30s |
CPU Usage |
CPU usage of a host. |
0% to 100% |
30s |
|
Memory Usage |
Memory usage of a host. |
0% to 100% |
30s |
|
Average Disk Usage (%) |
Average disk usage of a host. |
0% to 100% |
30s |
|
IP Address |
Service IP address of a host. |
String |
30s |
|
Disk I/O |
Disk I/O of a host (unit: KB/s). |
≥ 0 KB/s |
30s |
|
TCP Protocol Stack Retransmission Rate |
Retransmission rate of TCP packets per unit time. |
0% to 100% |
30s |
|
Status |
Running status of a host. |
Online/Offline |
30s |
|
Node Monitoring-Disks |
Node Name |
Name of a node in a cluster. |
String |
30s |
Disk Name |
Name of a disk on a host. |
String |
30s |
|
Disk Capacity |
Disk capacity of a host (unit: GB). |
≥ 0 GB |
30s |
|
Disk Usage |
Disk usage of a host. |
0% to 100% |
30s |
|
Disk Read Rate |
Disk read rate of a host (unit: KB/s). |
≥ 0 KB/s |
30s |
|
Disk Write Rate |
Disk write rate of a host (unit: KB/s). |
≥ 0 KB/s |
30s |
|
I/O Wait Time (await, ms) |
Average waiting time for each I/O request (unit: ms). |
≥ 0 ms |
30s |
|
I/O Service Time (svctm, ms) |
Average processing time for each I/O request (unit: ms). |
≥ 0 ms |
30s |
|
I/O Utility (util, %) |
Disk I/O usage of a host. |
0% to 100% |
30s |
|
Node Monitoring-Network |
Node Name |
Name of a node in a cluster. |
String |
30s |
NIC Name |
Name of the NIC on a host. |
String |
30s |
|
NIC Status |
NIC status. |
up/down |
30s |
|
NIC Speed |
Working rate of a NIC, in Mbit/s. |
≥ 0 |
30s |
|
Received Packets |
Number of received packets of a NIC. |
≥ 0 |
30s |
|
Sent Packets |
Number of sent packets of a NIC. |
≥ 0 |
30s |
|
Lost Packets Received |
Number of packets lost by a NIC while receiving. |
≥ 0 |
30s |
|
Receive Rate |
Number of bytes received by a NIC per unit of time (unit: KB/s). |
≥ 0 KB/s |
30s |
|
Transmit Rate |
Number of bytes sent by a NIC per unit of time (unit: KB/s). |
≥ 0 KB/s |
30s |
|
Database Monitoring |
Database Name |
Name of the database created by a user in a cluster. |
String |
60s |
Usage |
Used capacity of the current database (unit: GB). |
≥ 0 GB |
86400s |
|
Users |
Number of users in the current database. |
≥ 0 |
30s |
|
Sessions |
Number of sessions in the current database. |
≥ 0 |
30s |
|
Applications |
Number of applications in the current database. |
≥ 0 |
30s |
|
Queries |
Number of active queries in the current database. |
≥ 0 |
30s |
|
Scanning Rows |
Number of rows returned by full table scan queries in the current database. |
≥ 0 |
60s |
|
Index Query Rows |
Number of rows returned by index queries in the current database. |
≥ 0 |
60s |
|
Inserted Rows |
Number of rows inserted in the current database. |
≥ 0 |
60s |
|
Updated Rows |
Number of rows updated in the current database. |
≥ 0 |
60s |
|
Deleted Rows |
Number of rows deleted from the current database. |
≥ 0 |
60s |
|
Executed Transactions |
Number of transaction executions on the current database. |
≥ 0 |
60s |
|
Transaction Rollbacks |
Number of transactions in the current database that have been rolled back. |
≥ 0 |
60s |
|
Deadlocks |
Number of deadlocks detected in the current database. |
≥ 0 |
60s |
|
Temporary Files |
Number of temporary files created in the current database. |
≥ 0 |
60s |
|
Temporary File Capacity |
Size of temporary files written by the current database, in GB. |
≥ 0 |
60s |
|
Performance Monitoring |
Cluster CPU Usage |
Average CPU usage of all nodes in a cluster. |
0% to 100% |
30s |
Cluster Memory Usage |
Average memory usage of all nodes in a cluster. |
0% to 100% |
30s |
|
Cluster Disk Usage |
Average disk usage of all nodes in a cluster. |
0% to 100% |
30s |
|
Cluster Disk I/O |
Average I/O of all disks in a cluster. |
0% to 100% |
30s |
|
Cluster Network I/O |
Average I/O of all NICs in a cluster. |
0% to 100% |
30s |
|
Cluster Status |
Historical trend of the cluster status. |
Normal/Abnormal/Degraded |
30s |
|
Cluster Read-only |
Historical trend of the cluster read-only status. |
Yes/No |
30s |
|
Cluster Abnormal CNs |
Historical trend of the number of abnormal CNs in the cluster. |
≥ 0 |
60s |
|
Cluster Abnormal DNs |
Historical trend of the number of abnormal DNs in the cluster. |
≥ 0 |
60s |
|
Cluster CPU Usage of DNs |
Average CPU usage of all DNs in a cluster. |
0% to 100% |
60s |
|
Cluster Sessions |
Historical trend of the number of sessions in a cluster. |
≥ 0 |
30s |
|
Cluster Queries |
Historical trend of the number of queries in a cluster. |
≥ 0 |
30s |
|
Cluster Deadlocks |
Historical trend of the number of deadlocks in a cluster. |
≥ 0 |
60s |
|
Cluster TPS |
Average number of transactions per second of all databases in a cluster. Formula: (delta_xact_commit + delta_xact_rollback)/current_collect_rate |
≥ 0 |
60s |
|
Cluster QPS |
Average number of queries per second across all databases in a cluster. Formula: delta_query_count/current_collect_rate |
≥ 0 |
60s |
|
Database Sessions |
Historical trend of the number of sessions on a single database in a cluster. |
≥ 0 |
30s |
|
Database Queries |
Historical trend of the number of queries on a single database in a cluster. |
≥ 0 |
30s |
|
Database Inserted Rows |
Historical trend of the number of rows inserted into a single database in a cluster. |
≥ 0 |
60s |
|
Database Updated Rows |
Historical trend of the number of updated rows in a single database in a cluster. |
≥ 0 |
60s |
|
Database Deleted Rows |
Historical trend of the number of deleted rows in a single database in a cluster. |
≥ 0 |
60s |
|
Database Capacity |
Historical trend of the used capacity of a single database in a cluster. |
≥ 0 |
86400s |
|
Live Session |
Session ID |
ID of the current session (query thread ID). |
String |
30s |
User Name |
Name of the user who executes the current session. |
String |
30s |
|
Database Name |
Name of the database connected to the current session. |
String |
30s |
|
Session Duration |
Duration of the current session (unit: ms). |
≥ 0 ms |
30s |
|
Application Name |
Name of the application that creates the current session. |
String |
30s |
|
Queries |
Number of SQL statements executed in the current session. |
≥ 0 |
30s |
|
Latest Query Duration |
Duration of the most recent SQL statement executed in the current session (unit: ms). |
≥ 0 ms |
30s |
|
Client IP Address |
IP address of the client that initiates the current session. |
String |
30s |
|
Connected CN |
Connected CN of the current session. |
String |
30s |
|
Session Status |
Execution status of the current session. |
Running/Idle/Retry |
30s |
|
Real-Time Query |
Query ID |
Query ID of a current query statement, which is a unique identifier allocated by the kernel to each query statement. |
String |
30s |
User Name |
Name of the user who submits the current query statement. |
String |
30s |
|
Database Name |
Name of the database corresponding to the current query statement. |
String |
30s |
|
Application Name |
Name of the application corresponding to the current query statement. |
String |
30s |
|
Resource Pool |
Name of the resource pool for the current query statement. |
String |
30s |
|
Submitted |
Timestamp when the current query statement is submitted. |
String |
30s |
|
Blocking Time |
Waiting time before the current query statement is executed, in ms. |
≥ 0 |
30s |
|
Execution Time |
Execution time of the current query statement, in ms. |
≥ 0 |
30s |
|
CPU Time |
Total CPU time spent by the current query statement on all DNs, in ms. |
≥ 0 |
30s |
|
CPU Time Skew |
CPU time skew of the current query statement among all DNs. |
0% to 100% |
30s |
|
Statement |
Query statement that is being executed. |
String |
30s |
|
Connected CN |
Name of the CN that submits the current query statement. |
String |
30s |
|
Client IP Address |
IP address of the client that submits the current query statement. |
String |
30s |
|
Lane |
Lane where the current query statement is located. |
Fast Lane/Slow Lane |
30s |
|
Query Status |
Query status of the statement that is being executed. |
String |
30s |
|
Session ID |
Session ID of the current query statement, which is a unique identifier allocated by the kernel to each client connection. |
String |
30s |
|
Queuing Status |
Status of the current query execution in the database, indicating whether the query is queued in the resource pool. |
Yes/No |
30s |
|
Historical Query |
Query ID |
Query ID of a query statement, which is a unique identifier allocated by the kernel to each query statement. |
String |
180s |
User Name |
Name of the user who submits a query statement. |
String |
180s |
|
Application Name |
Application name corresponding to a query statement. |
String |
180s |
|
Database Name |
Name of the database corresponding to a query statement. |
String |
180s |
|
Resource Pool |
Name of the resource pool for a query statement. |
String |
180s |
|
Submitted |
Timestamp when a query statement is submitted. |
String |
180s |
|
Blocking Time |
Waiting time before the query statement is executed, in ms. |
≥ 0 |
180s |
|
Execution Time |
Execution time of the query statement, in ms. |
≥ 0 |
180s |
|
CPU Time |
Total CPU time spent by the query statement on all DNs, in ms. |
≥ 0 |
180s |
|
CPU Time Skew |
CPU time skew of a query statement executed on all DNs. |
0% to 100% |
180s |
|
Statement |
Query statement to be parsed. |
String |
180s |
|
Slow Instance Monitoring |
Slow Instance |
Number of slow instances detected at the current time point. |
≥ 0 |
240s |
Detected |
Time when a slow instance is detected for the first time. |
String |
240s |
|
Node Name |
Name of the node where the slow instance is deployed. |
String |
240s |
|
Instance |
Name of an instance. |
String |
240s |
|
Slow Node Detections (within 24 hours) |
Number of times that a slow instance is detected within 24 hours. |
≥ 0 |
240s |
|
Resource Pool Monitoring |
Resource Pool |
Name of a resource pool in a cluster. |
String |
120s |
CPU Usage |
Real-time CPU usage of a resource pool. |
0% to 100% |
120s |
|
CPU Resource |
CPU usage quota of a resource pool. |
0% to 100% |
120s |
|
Real-Time Concurrent Short Queries |
Real-time simple concurrency in a resource pool. |
≥ 0 |
120s |
|
Concurrent Short Queries |
Quota for simple concurrency in a resource pool. |
≥ 0 |
120s |
|
Real-Time Concurrent Queries |
Real-time complex concurrency in a resource pool. |
≥ 0 |
120s |
|
Query Concurrency |
Quota for complex concurrency in a resource pool. |
≥ 0 |
120s |
|
Storage |
Storage resource quota of a resource pool. |
≥ 0 |
120s |
|
Disk Usage |
Disk usage of a resource pool. |
0% to 100% |
120s |
|
Memory |
Memory quota of a resource pool. |
≥ 0 |
120s |
|
Memory Usage |
Memory usage of a resource pool. |
0% to 100% |
120s |
|
Queries Waiting in a Resource Pool |
User |
Name of the user who submitted the waiting query. |
String |
120s |
Application |
Name of the application that submitted the waiting query. |
String |
120s |
|
Database |
Name of the database to be queried. |
String |
120s |
|
Queuing Status |
Execution status of a query in the database (CCN/CN/DN). |
String |
120s |
|
Wait Time |
Waiting time for a waiting query (unit: ms). |
≥ 0 ms |
120s |
|
Resource Pool |
Resource pool of the waiting query. |
String |
120s |
|
Statement |
Query statement that is waiting. |
String |
120s |
|
Circuit Breaking Queries |
Query ID |
Query ID of the circuit breaking query statement. |
String |
120s |
Query Statement |
Query statement that triggered circuit breaking. |
String |
120s |
|
Blocking Time |
Blocking time before the query statement triggers circuit breaking, in ms. |
≥ 0 |
120s |
|
Execution Time |
Execution time before the query statement triggers circuit breaking, in ms. |
≥ 0 |
120s |
|
CPU Time |
Average CPU time consumed by each DN before the query statement triggers circuit breaking, in ms. |
≥ 0 |
120s |
|
CPU Skew |
Skew rate of CPU time consumed by each DN before the query statement triggers circuit breaking. |
0% to 100% |
120s |
|
Exception Handling |
Handling method after the query statement triggers circuit breaking. |
Abort/Degrade |
120s |
|
Status |
Circuit breaking handling status of a query statement. |
Executing/Completed |
120s |
|
SQL Tuning |
Query ID |
ID of the current query (query logic ID). |
String |
180s |
Database |
Name of the database where the current query is executed. |
String |
180s |
|
Schema Name |
Name of the schema used by the current query. |
String |
180s |
|
User Name |
Name of the user who performs the query. |
String |
180s |
|
Client |
Name of the client that initiates the current query. |
String |
180s |
|
Client IP Address |
IP address of the client that initiates the current query. |
String |
180s |
|
Running Time |
Execution time of the current query, in ms. |
≥ 0 |
180s |
|
CPU Time |
CPU time of the current query, in ms. |
≥ 0 |
180s |
|
Started |
Start time of the current query. |
Timestamp |
180s |
|
Completed |
End time of the current query. |
Timestamp |
180s |
|
Details |
Details about the current query. |
String |
180s |
|
INODE |
Inode Usage |
Disk inode usage. |
0% to 100% |
30s |
SCHEMA |
Schema Usage |
Database schema usage. |
0% to 100% |
3600s |
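
The Cluster TPS and Cluster QPS formulas in Table 1 can be illustrated with a short sketch. It assumes that `delta_xact_commit`, `delta_xact_rollback`, and `delta_query_count` are differences between two consecutive readings of cumulative counters, and that `current_collect_rate` is the collection interval in seconds (60s for both metrics in Table 1). The `CounterSample` class, the function names, and the sample values are hypothetical and are not part of GaussDB(DWS).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """Cumulative counters read at one collection point (hypothetical structure)."""
    xact_commit: int
    xact_rollback: int
    query_count: int


def cluster_tps(prev: CounterSample, curr: CounterSample, collect_rate_s: float) -> float:
    """(delta_xact_commit + delta_xact_rollback) / current_collect_rate."""
    if collect_rate_s <= 0:
        raise ValueError("collect_rate_s must be positive")
    delta_xact_commit = curr.xact_commit - prev.xact_commit
    delta_xact_rollback = curr.xact_rollback - prev.xact_rollback
    return (delta_xact_commit + delta_xact_rollback) / collect_rate_s


def cluster_qps(prev: CounterSample, curr: CounterSample, collect_rate_s: float) -> float:
    """delta_query_count / current_collect_rate."""
    if collect_rate_s <= 0:
        raise ValueError("collect_rate_s must be positive")
    delta_query_count = curr.query_count - prev.query_count
    return delta_query_count / collect_rate_s


# Two made-up readings taken 60 seconds apart (assumed collection interval).
prev = CounterSample(xact_commit=1_000, xact_rollback=20, query_count=5_000)
curr = CounterSample(xact_commit=1_540, xact_rollback=80, query_count=6_800)

print(f"TPS: {cluster_tps(prev, curr, 60):.1f}")  # (540 + 60) / 60
print(f"QPS: {cluster_qps(prev, curr, 60):.1f}")  # 1800 / 60
```

Because rolled-back transactions are counted alongside committed ones, a workload with many rollbacks can show a high TPS even when little work is actually committed; comparing the result with the Transaction Rollbacks metric helps tell the two cases apart.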