Reviewed-by: gtema <artem.goncharov@gmail.com> Co-authored-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-committed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Metric Databases
Metrics are stored in two different database types:
- Graphite time series database
- PostgreSQL relational database
Graphite
Graphite is an open-source, enterprise-ready time-series database. ApiMon, EpMon, and CloudMon data are stored in a clustered Graphite TSDB. Metrics emitted by the processes are collected by a set of statsd processes, which aggregate them to 10-second precision.
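As a rough illustration of what aggregating to 10-second precision means for timer values, the sketch below groups raw samples into 10-second windows and reduces each window to a few summary values. It is a simplified model and not the statsd implementation; the function name `aggregate_timers` and the sample data are hypothetical.

```python
from collections import defaultdict


def aggregate_timers(samples, flush_interval=10):
    """Group (unix_timestamp, value_ms) samples into flush windows and summarise each."""
    buckets = defaultdict(list)
    for timestamp, value_ms in samples:
        # Align each sample to the start of its flush window.
        window_start = int(timestamp // flush_interval) * flush_interval
        buckets[window_start].append(value_ms)

    summary = {}
    for window_start in sorted(buckets):
        values = buckets[window_start]
        summary[window_start] = {
            "count": len(values),
            "lower": min(values),
            "upper": max(values),
            "mean": sum(values) / len(values),
        }
    return summary


# Hypothetical samples: three fall into the window starting at 1000, one into 1010.
samples = [(1000, 120.0), (1004, 80.0), (1009, 100.0), (1012, 250.0)]
for window_start, stats in aggregate_timers(samples).items():
    print(window_start, stats)
```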
Parameter | Value |
---|---|
Grafana Datasource | apimon-carbonapi |
Database type | time series |
Main namespace | stats |
Metric type | OpenStack API metrics (including otcextensions) collecting response codes, latencies, and methods; ApiMon metrics (create_cce_cluster, delete_volume_eu-de-01, etc.); custom metrics which can be created by tags in Ansible playbooks |
Database attributes | "timers", "counters", "environment name", "monitoring location", "service", "request method", "resource", "response code", "result", custom metrics, etc |
Result of API calls | attempted, passed, failed |
All metrics are under the "stats" namespace. Under "stats" there are the following important metric types:
- counters
- timers
- gauges
Counters and timers have the following subbranches:
- apimon.metric → specific apimon metrics not gathered by the OpenStack API methods
- openstack.api → pure API request metrics
Every section has the following further branches:
- environment name (production_regA, production_regB, etc)
- monitoring location (production_regA, awx) - specification of the environment from which the metric is gathered
openstack.api
The OpenStack metrics branch is structured as follows:
- service (normally service_type from the service catalog, but sometimes differs slightly)
- request method (GET/POST/DELETE/PUT)
- resource (service resource, e.g. server, keypair, volume, etc). Sub-resources are joined with "_" (e.g. cluster_nodes)
- response code - received response code
- count/upper/lower/mean/etc - timer-specific metrics (available only under stats.timers.openstack.api.$environment.$zone.$service.$request_method.$resource.$status_code.{count,mean,upper,*})
- count/rate - counter-specific metrics (available only under stats.counters.openstack.api.$environment.$zone.$service.$request_method.$resource.$status_code.{count,rate})
- attempted - counter of attempted requests (only for counters)
- failed - counter of failed requests (no response received, connection problems, etc) (only for counters)
- passed - counter of requests that received any response (only for counters)
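To make the path layout concrete, here is a minimal sketch that joins the elements listed above into a dotted Graphite metric name. The helper `openstack_api_metric` and the example values (service `compute`, zone `awx`) are hypothetical and chosen only for illustration; the element order follows the path templates given in the list.

```python
def openstack_api_metric(kind, environment, zone, service, method,
                         resource_parts, status_code, leaf):
    """Join the documented path elements into one dotted Graphite metric name."""
    if kind not in ("timers", "counters"):
        raise ValueError("kind must be 'timers' or 'counters'")
    # Sub-resources are joined with "_", e.g. ("cluster", "nodes") -> "cluster_nodes".
    resource = "_".join(resource_parts)
    return ".".join([
        "stats", kind, "openstack", "api",
        environment, zone, service, method.upper(), resource,
        str(status_code), leaf,
    ])


# Hypothetical example: mean latency of successful GET requests on servers.
print(openstack_api_metric("timers", "production_regA", "awx",
                           "compute", "get", ["server"], 200, "mean"))
# Hypothetical example with a sub-resource.
print(openstack_api_metric("counters", "production_regA", "awx",
                           "cce", "get", ["cluster", "nodes"], 200, "count"))
```

Building names in one place like this may help keep dashboard queries consistent with the naming scheme, although the dashboards themselves can of course reference the paths directly.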
apimon.metric
- metric name (e.g. create_cce_cluster, delete_volume_eu-de-01, etc) - complex metrics branch
- attempted/failed/failedignored/passed/skipped - counters for the corresponding operation results (this branch element represents the status of the corresponding Ansible task)
- $az - some metrics include the availability zone of the operation at this level. Since this information is not always available, this path element varies
- curl - subtree for the curl type of metrics
- $name - short name of the host to be checked
stats.timers.apimon.metric.$environment.$zone.csm_lb_timings.{public,private}.{http,https,tcp}.$az.__VALUE__ - timer values for the loadbalancer test
stats.counters.apimon.metric.$environment.$zone.csm_lb_timings.{public,private}.{http,https,tcp}.$az.{attempted,passed,failed} - counter values for the loadbalancer test
stats.timers.apimon.metric.$environment.$zone.curl.$host.{passed,failed}.__VALUE__ - timer values for the curl test
stats.counters.apimon.metric.$environment.$zone.curl.$host.{attempted,passed,failed} - counter values for the curl test
stats.timers.apimon.metric.$environment.$zone.dns.$ns_name.$host - timer values for the NS lookup test. $ns_name is the DNS server used to query the records
stats.counters.apimon.metric.$environment.$zone.dns.$ns_name.$host.{attempted,passed,failed} - counter values for the NS lookup test
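The path templates above combine `$placeholders` with `{a,b,c}` alternatives. The sketch below shows one way to turn such a template into the concrete metric names it covers; it assumes non-nested brace groups, and the environment, zone and host values are hypothetical. Graphite-style query targets generally accept brace lists directly, so this expansion is mainly useful for documentation or validation purposes.

```python
import itertools
import re
from string import Template

_BRACE_GROUP = re.compile(r"\{([^{}]*)\}")


def expand_braces(pattern):
    """Expand every non-nested {a,b,c} group into all concrete combinations."""
    # re.split with one capture group alternates literal text (even indexes)
    # and group contents (odd indexes).
    parts = _BRACE_GROUP.split(pattern)
    choices = [part.split(",") if index % 2 else [part]
               for index, part in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


template = Template(
    "stats.counters.apimon.metric.$environment.$zone.curl.$host.{attempted,passed,failed}"
)
# Hypothetical placeholder values.
pattern = template.substitute(environment="production_regA", zone="awx", host="example_host")
for path in expand_braces(pattern):
    print(path)
```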
PostgreSQL
The relational database stores ApiMon playbook scenario results, which provide statistics about the most common service functionalities and use cases. Queries against this data are used mainly on the Test Results dashboard and the service-specific statistics dashboards.
Parameter | Value |
---|---|
Grafana Datasource | apimon-pg |
Database Type | relational |
Database Table | results_summary |
Metric type | apimon playbook result statistics |
Database Fields | "timestamp", "name", "job_id", "result", "duration", "result_task" |
result field values | 0 - success, 1 - ?, 2 - skipped, 3 - failed |
result_task object parameters | "timestamp", "name", "job_id", "result", "duration", "action", "environment", "zone", "anonymized_response" |
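
The kind of statistic a dashboard might derive from results_summary can be sketched as a grouped query. The example below uses an in-memory SQLite database purely as a stand-in for PostgreSQL so that it runs without a server; the column types, the sample rows, and the helper `summarize_results` are assumptions, and only the table name, field names and documented result codes come from the table above.

```python
import sqlite3

# Documented result codes; the meaning of value 1 is not given in the table above.
RESULT_LABELS = {0: "success", 2: "skipped", 3: "failed"}


def summarize_results(conn):
    """Count runs and average duration per scenario name and result code."""
    rows = conn.execute(
        """
        SELECT name, result, COUNT(*), AVG(duration)
        FROM results_summary
        GROUP BY name, result
        ORDER BY name, result
        """
    ).fetchall()
    return [
        (name, RESULT_LABELS.get(result, f"unknown ({result})"), count, avg_duration)
        for name, result, count, avg_duration in rows
    ]


conn = sqlite3.connect(":memory:")
# Assumed column types; the real schema may differ.
conn.execute(
    'CREATE TABLE results_summary ("timestamp" TEXT, name TEXT, job_id TEXT, '
    "result INTEGER, duration REAL)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO results_summary VALUES (?, ?, ?, ?, ?)",
    [
        ("2024-01-01T00:00:00", "scenario_a", "job-1", 0, 12.0),
        ("2024-01-01T00:10:00", "scenario_a", "job-2", 0, 18.0),
        ("2024-01-01T00:20:00", "scenario_a", "job-3", 3, 30.0),
        ("2024-01-01T00:30:00", "scenario_b", "job-4", 2, 0.5),
    ],
)
for row in summarize_results(conn):
    print(row)
```

The same SELECT statement should carry over to PostgreSQL largely unchanged, since it uses only standard grouping and aggregate functions.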