.. Reviewed-by: Gode, Sebastian <sebastian.gode@t-systems.com>
   Co-authored-by: tischrei <tino.schreiber@t-systems.com>
   Co-committed-by: tischrei <tino.schreiber@t-systems.com>

.. _sd2_metric_databases:

================
Metric Databases
================

Metrics are stored in the Graphite time series database (TSDB), split across two databases:

- cloudmon-metrics
- cloudmon

cloudmon database
=================

EpMon data is stored in the clustered Graphite TSDB.
Metrics emitted by the processes are collected by a row of statsd processes,
which aggregate them to 10-second precision.

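The 10-second aggregation can be pictured with a small sketch. It is not statsd's actual implementation; it only illustrates, assuming every sample is a ``(unix_timestamp, latency_ms)`` pair, how raw timer samples collapse into one data point per 10-second window. The function name ``aggregate_timers`` and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

FLUSH_INTERVAL = 10  # seconds, the precision stated above


def aggregate_timers(samples, interval=FLUSH_INTERVAL):
    """Group (timestamp, value) samples into fixed windows and summarise each."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Align every sample to the start of its window.
        window_start = int(timestamp) - int(timestamp) % interval
        buckets[window_start].append(value)

    return {
        start: {
            "count": len(values),
            "lower": min(values),
            "upper": max(values),
            "mean": mean(values),
        }
        for start, values in sorted(buckets.items())
    }


# Hypothetical latencies in milliseconds
samples = [(1700000001, 120), (1700000004, 80), (1700000012, 200)]
result = aggregate_timers(samples)
```

The first two samples fall into the same window and are summarised together, while the third one starts a new window.
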
+---------------------+-----------------------------------------------------------------------------------------------+
| Parameter           | Value                                                                                         |
+=====================+===============================================================================================+
| Grafana Datasource  | cloudmon                                                                                      |
+---------------------+-----------------------------------------------------------------------------------------------+
| Database type       | time series                                                                                   |
+---------------------+-----------------------------------------------------------------------------------------------+
| Main namespace      | stats                                                                                         |
+---------------------+-----------------------------------------------------------------------------------------------+
| Metric type         | OpenStack API metrics (including otcextensions) collecting response codes, latencies, methods |
+---------------------+-----------------------------------------------------------------------------------------------+
| Database attributes | "timers", "counters", "environment name", "monitoring location", "service", "request method", |
|                     | "resource", "response code", "result", custom metrics, etc.                                   |
+---------------------+-----------------------------------------------------------------------------------------------+
| Result of API calls | attempted                                                                                     |
|                     | passed                                                                                        |
|                     | failed                                                                                        |
+---------------------+-----------------------------------------------------------------------------------------------+

.. image:: training_images/graphite_query.png

All metrics are under the "stats" namespace.

Under "stats" there are the following important metric types:

- counters
- timers
- gauges

Counters and timers have the following subbranch:

- openstack.api → pure API request metrics

Every section has the following further branches:

- environment name (production_regA, production_regB, etc.)

- monitoring location (production_regA, awx) - the environment from which the metric is gathered

openstack.api
-------------

The OpenStack metrics branch is structured as follows:

- service (normally service_type from the service catalog, but sometimes differs slightly)

- request method (GET/POST/DELETE/PUT)

- resource (service resource, e.g. server, keypair, volume). Sub-resources are joined with "_" (e.g. cluster_nodes)

- response code - received response code

- count/upper/lower/mean/etc. - timer-specific metrics (available only under stats.timers.openstack.api.$environment.$zone.$service.$request_method.$resource.$status_code.{count,mean,upper,*})
- count/rate - counter-specific metrics (available only under stats.counters.openstack.api.$environment.$zone.$service.$request_method.$resource.$status_code.{count,rate})

- attempted - counter of attempted requests (only for counters)
- failed - counter of failed requests (no response received, connection problems, etc.) (only for counters)
- passed - counter of requests that received any response (only for counters)

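The path layout above can be turned into a query with a few lines of Python. The following is a minimal sketch, not part of the monitoring tooling: the helper names ``timer_path`` and ``render_url``, the host ``graphite.example.com`` and the concrete service, method and resource values are hypothetical. Only the ``/render`` endpoint with its ``target``, ``from`` and ``format`` parameters comes from the Graphite render API.

```python
from urllib.parse import urlencode

# Layout taken from the timer path described above
TIMER_TEMPLATE = (
    "stats.timers.openstack.api."
    "{environment}.{zone}.{service}.{method}.{resource}.{status_code}.{stat}"
)


def timer_path(environment, zone, service, method, resource, status_code, stat="mean"):
    """Fill the timer path; passing '*' for a segment acts as a Graphite wildcard."""
    return TIMER_TEMPLATE.format(
        environment=environment,
        zone=zone,
        service=service,
        method=method,
        resource=resource,
        status_code=status_code,
        stat=stat,
    )


def render_url(base_url, target, since="-1h"):
    """Build a Graphite render API URL that returns JSON for one target."""
    query = urlencode({"target": target, "from": since, "format": "json"})
    return f"{base_url}/render?{query}"


# Hypothetical values: mean latency of successful GET requests on compute servers
path = timer_path("production_regA", "production_regA", "compute", "GET", "server", "200")

# Same metric across all environments, locations and response codes
wildcard = timer_path("*", "*", "compute", "GET", "server", "*")
url = render_url("https://graphite.example.com", wildcard)
```

In Grafana the same path (with or without wildcards) would be entered as the query against the ``cloudmon`` datasource instead of being sent to the render endpoint directly.
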
cloudmon-metrics database
=========================

Cloudmon data is stored in the clustered Graphite TSDB.
Metrics are emitted by the Metric Processor.
The Metric Processor processes the cloudmon metrics (from EpMon) and, based on the defined flag metrics
(https://github.com/opentelekomcloud-infra/stackmon-config/blob/main/mp-prod/conf.d/flag_metrics.yaml)
and the defined thresholds
(https://github.com/opentelekomcloud-infra/stackmon-config/blob/main/mp-prod/conf.d/metric_templates.yaml),
produces the health metrics
(https://github.com/opentelekomcloud-infra/stackmon-config/blob/main/mp-prod/conf.d/health_metrics.yaml)
with different impact levels.
The final health metrics are then sent to the Status Dashboard, which visualizes them as semaphore lights.

+---------------------+-----------------------------------------------------------------------------------------------+
| Parameter           | Value                                                                                         |
+=====================+===============================================================================================+
| Grafana Datasource  | cloudmon-metrics                                                                              |
+---------------------+-----------------------------------------------------------------------------------------------+
| Database type       | time series                                                                                   |
+---------------------+-----------------------------------------------------------------------------------------------+
| Main namespace      | stats                                                                                         |
+---------------------+-----------------------------------------------------------------------------------------------+
| Metric type         | Metric Processor produces flag metric values (0,1) and health metric values (0,1,2)           |
+---------------------+-----------------------------------------------------------------------------------------------+
| Database attributes | "health", "flag", "environment name", "service", "service type", "flag metric type"           |
+---------------------+-----------------------------------------------------------------------------------------------+
| Result              | 0                                                                                             |
|                     | 1                                                                                             |
|                     | 2                                                                                             |
+---------------------+-----------------------------------------------------------------------------------------------+

.. image:: training_images/mp_query.png

All metrics are under the "stats" namespace and are grouped by metric type.

Under "cloudmon-metrics" there are the following important metric types:

- flag
- health

Both metric types are further split by:

- environment name (production_regA, production_regB, etc.)

Flag metrics
------------

The flag metrics branch is structured as follows:

- environment name (production_regA, production_regB, etc.)

- service type (service type from the service catalog)

- flag metric type (api_slow, api_down, api_success_rate_low, ...)

Flag metrics take the following values:

- 0 - the flag metric is not breaching the defined threshold
- 1 - the flag metric is breaching the defined threshold

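The 0/1 semantics can be illustrated with a short sketch. It does not reproduce the Metric Processor logic or the real configuration: the function ``evaluate_flag``, its ``breach_when`` parameter and all threshold values are hypothetical and only show how a measured value is reduced to a flag.

```python
def evaluate_flag(value, threshold, breach_when="above"):
    """Return 1 if the value breaches the threshold, otherwise 0."""
    if breach_when == "above":
        breached = value > threshold
    elif breach_when == "below":
        breached = value < threshold
    else:
        raise ValueError(f"unknown comparison: {breach_when!r}")
    return int(breached)


# Hypothetical thresholds, not taken from the real configuration.
# Latency in milliseconds: breaching means being slower than the limit.
api_slow = evaluate_flag(value=1500, threshold=1000, breach_when="above")

# Success rate in percent: breaching means dropping below the limit.
api_success_rate_low = evaluate_flag(value=99.5, threshold=90, breach_when="below")
```

Here ``api_slow`` is raised because the latency exceeds its limit, while ``api_success_rate_low`` stays at 0 because the success rate is still above its limit.
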
Health metrics
--------------

The health metrics branch is structured as follows:

- environment name (production_regA, production_regB, etc.)

- service (cloud service)

Health metrics take the following values:

- 0 - Service operates normally
- 1 - Service has a minor issue, caused by one or more flag metrics reaching their defined thresholds
- 2 - Service has an outage, caused by one or more flag metrics reaching their defined thresholds

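How flag values could roll up into a health value is sketched below. This is an assumption made for illustration only: the real rules live in ``health_metrics.yaml``, and the impact table ``FLAG_IMPACT``, the colour mapping ``SEMAPHORE`` and the "highest impact wins" rule are hypothetical.

```python
# Hypothetical impact per flag metric type: 1 = minor issue, 2 = outage
FLAG_IMPACT = {"api_slow": 1, "api_success_rate_low": 1, "api_down": 2}

# Assumed colour mapping for the semaphore lights
SEMAPHORE = {0: "green", 1: "yellow", 2: "red"}


def health_value(flags, impact=FLAG_IMPACT):
    """Return the highest impact among raised flags, or 0 if none is raised."""
    raised = [impact[name] for name, value in flags.items() if value == 1]
    return max(raised, default=0)


# Hypothetical flag values for one service in one environment
flags = {"api_slow": 1, "api_success_rate_low": 0, "api_down": 0}
health = health_value(flags)
light = SEMAPHORE[health]
```

With only ``api_slow`` raised, the sketch reports a minor issue; raising ``api_down`` as well would turn the result into an outage.
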