====================
Dashboard Management
====================

As explained in previous pages, the resulting metrics of the configured
monitor plugins (mainly of EpMon, but possibly also from other plugins)
are first stored in a Graphite time series database, before they are
further processed as flags and semaphores for the actual public dashboard.

However, Service Engineers or Service Managers sometimes benefit from
deeper inspection of this time series data for debugging purposes.
Therefore, a Grafana frontend may be used to visualize and drill down
into the data. The entry point to a set of predefined dashboards is:

https://dashboard.tsi-dev.otc-service.com/dashboards/f/CloudMon/cloudmon

Access to this dashboard is only available to OTC staff members.
Authentication is managed by Keycloak, which in turn uses the OTC LDAP
directory.

The dashboards are grouped by the type of service:

- The **Squad Flag and Health** dashboard provides a high-level overview
  of the service health and the flag metric status for each service of a
  squad.
- The **Cloud Service Statistics** dashboard monitors the health of each
  endpoint URL listed by an EpMon configuration entry.
- Dashboards can be replicated and customized for individual squad needs.

The Cloud Service Statistics dashboards honor the ``Environment`` (target
monitored platform) and ``Zone`` (monitoring source location) variables
at the top of each dashboard, so the views can be adjusted to the chosen
values.

All Squad Flag and Health dashboards support the ``Environment`` (target
monitored platform) variable at the top of each dashboard.

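Grafana also accepts dashboard variables as ``var-<name>`` URL query
parameters, so a preselected view can be shared as a link. The following is
a minimal sketch, assuming that the variables are named ``Environment`` and
``Zone`` exactly as displayed at the top of the dashboard. The values
``production`` and ``eu-de`` and the helper name are hypothetical
placeholders.

.. code-block:: python

   from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


   def dashboard_url_with_variables(dashboard_url, **variables):
       """Return dashboard_url with Grafana ``var-<name>`` parameters appended."""
       parts = urlsplit(dashboard_url)
       # Keep existing parameters such as orgId and append the variables.
       query = parse_qsl(parts.query, keep_blank_values=True)
       query += [(f"var-{name}", value) for name, value in variables.items()]
       return urlunsplit(parts._replace(query=urlencode(query)))


   base = (
       "https://dashboard.tsi-dev.otc-service.com/d/"
       "b4560ed6-95f0-45c0-904c-6ff9f8a491e8/sfs-service-statistics?orgId=1"
   )
   url = dashboard_url_with_variables(
       base,
       Environment="production",  # hypothetical value
       Zone="eu-de",              # hypothetical value
   )
   print(url)

If the variable names on a given dashboard differ from their displayed
labels, the parameter names have to be adjusted accordingly.
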
Squad Flag and Health Dashboard
===============================

The dashboard provides deeper insight into the metrics generated by the
Metric Processor. Flag panels show whether a service has exceeded the
threshold of a predefined flag metric type. Health panels show the
resulting service health status based on the evaluated flag metrics.

The resulting flag values are visualized in state timeline panels with the
following values:

- 0 - flag metric is not breaching the defined threshold.
- 1 - flag metric is breaching the defined threshold.

The resulting health values are mapped and visualized in state timeline
panels with the following values:

- 0 - Service operates normally.
- 1 - Service has a minor issue caused by defined flag metric(s) that
  reached their threshold.
- 2 - Service has an outage caused by defined flag metric(s) that
  reached their threshold.

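When the raw series are inspected outside Grafana, the numeric values
listed above can be translated back into readable states. The following is
a minimal sketch of such a mapping. The enum and function names are
hypothetical and are not part of any stackmon component.

.. code-block:: python

   from enum import IntEnum


   class FlagState(IntEnum):
       OK = 0        # flag metric is not breaching the defined threshold
       BREACHED = 1  # flag metric is breaching the defined threshold


   class HealthState(IntEnum):
       NORMAL = 0       # service operates normally
       MINOR_ISSUE = 1  # minor issue caused by flag metric(s)
       OUTAGE = 2       # outage caused by flag metric(s)


   def describe_health(value):
       """Translate a numeric health value into a readable state name."""
       # Time series values are often floats, so convert to int first.
       return HealthState(int(value)).name.replace("_", " ").lower()


   print(describe_health(2))
   print(describe_health(1.0))

A value outside the documented range raises ``ValueError``, which makes
unexpected data visible instead of silently mapping it to a state.
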
Example at https://dashboard.tsi-dev.otc-service.com/d/s75qyOU4z/compute-flags?orgId=1

.. image:: training_images/flag_and_health_dashboard.png

Cloud Service Statistics Dashboard
==================================

The Cloud Service Statistics dashboards use metrics from GET query
requests towards the OTC platform (:ref:`EpMon Overview <sd2_epmon_overview>`)
and visualize them as:

- API call duration per URL query.
- API call duration (aggregated).
- API call response codes.

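The three panel types listed above correspond to simple aggregations over
per-request samples. The sketch below illustrates them on hypothetical
in-memory data. The sample tuples, URL paths, and field order are
assumptions made for illustration and do not reflect the actual EpMon
metric format.

.. code-block:: python

   from collections import Counter, defaultdict
   from statistics import mean

   # Hypothetical samples: (queried URL, duration in ms, HTTP response code)
   samples = [
       ("/v2/shares", 120.0, 200),
       ("/v2/shares", 180.0, 200),
       ("/v2/share-networks", 300.0, 503),
   ]

   durations_by_url = defaultdict(list)
   for url, duration_ms, _code in samples:
       durations_by_url[url].append(duration_ms)

   # API call duration per URL query
   mean_per_url = {url: mean(values) for url, values in durations_by_url.items()}
   # API call duration (aggregated over all URLs)
   mean_overall = mean(duration_ms for _url, duration_ms, _code in samples)
   # API call response codes
   code_counts = Counter(code for _url, _duration_ms, code in samples)
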
Example at https://dashboard.tsi-dev.otc-service.com/d/b4560ed6-95f0-45c0-904c-6ff9f8a491e8/sfs-service-statistics?orgId=1&refresh=10s

.. image:: training_images/cloud_service_statistics.png

Custom Dashboards
=================

The dashboards described above are predefined and read-only. Further
customization is currently possible via system-config in GitHub:

https://github.com/stackmon/apimon-tests/tree/main/dashboards/grafana

The simplified predefined dashboard panel, written in YAML syntax, is
defined in the Stackmon GitHub repository:

https://github.com/stackmon/apimon-tests/tree/main/dashboards

Dashboards can also be customized directly in Grafana by using the
copy/save function. The whole dashboard can be saved under a new name and
then edited without any restrictions.

This approach is suitable for proofs of concept, temporary solutions, and
investigations, but it should not be used as a permanent solution:
customized dashboards that are not properly stored in GitHub repositories
might be permanently deleted if the dashboard service is fully
reinstalled.

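To keep a manually customized dashboard from being lost, its JSON model can
be exported and committed to a repository. The following is a minimal
sketch, assuming that the Grafana HTTP API endpoint
``/api/dashboards/uid/<uid>`` is reachable on this instance and that an API
token is available, neither of which is confirmed here. The token value and
the function names are placeholders; the UID is taken from the
compute-flags example above.

.. code-block:: python

   import json
   import urllib.request


   def build_export_request(base_url, uid, token):
       """Build (but do not send) a request for the dashboard JSON model."""
       return urllib.request.Request(
           f"{base_url.rstrip('/')}/api/dashboards/uid/{uid}",
           headers={"Authorization": f"Bearer {token}"},
       )


   def dashboard_for_git(api_response):
       """Extract the dashboard model and clear the instance-specific id."""
       dashboard = dict(api_response["dashboard"])
       dashboard["id"] = None
       return json.dumps(dashboard, indent=2, sort_keys=True)


   request = build_export_request(
       "https://dashboard.tsi-dev.otc-service.com", "s75qyOU4z", "<api-token>"
   )
   # Sending the request requires network access and a valid token:
   # with urllib.request.urlopen(request) as response:
   #     print(dashboard_for_git(json.load(response)))

The exported JSON is only a backup or a starting point. A permanent
dashboard still needs a proper definition in the repository layout linked
above.
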