=====================
Dashboards management
=====================

The CloudMon dashboards are available at
https://dashboard.tsi-dev.otc-service.com/dashboards/f/CloudMon/cloudmon

Authentication is centrally managed by OTC LDAP.

The CloudMon dashboards are organized by the type of service:

- The "Squad Flag and Health" dashboard provides a high-level overview of the
  service health and flag metric status of each service of the respective squad.
- The "Cloud Service Statistics" dashboard monitors the health of every endpoint
  URL listed in the EpMon configuration.
- Dashboards can be replicated and customized for individual squad needs.

All Cloud Service Statistics dashboards provide Environment (target monitored
platform) and Zone (monitoring source location) variables at the top of each
dashboard, so the views can be adjusted to the chosen values.

All Squad Flag and Health dashboards provide the Environment (target monitored
platform) variable at the top of each dashboard.

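Grafana also accepts template variable values as ``var-<name>`` query parameters in the dashboard URL, which can be handy for sharing a preselected view. The following is a minimal sketch of that idea; the variable name ``environment`` and the value ``production_eu-de`` are assumptions for illustration and may not match the names configured in the actual dashboards.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_variables(dashboard_url, **variables):
    """Return dashboard_url with Grafana template variables added as var-<name>."""
    parts = urlsplit(dashboard_url)
    # Keep existing query parameters (for example orgId) and add the variables.
    query = dict(parse_qsl(parts.query))
    for name, value in variables.items():
        query[f"var-{name}"] = value
    return urlunsplit(parts._replace(query=urlencode(query)))


url = with_variables(
    "https://dashboard.tsi-dev.otc-service.com/d/s75qyOU4z/compute-flags?orgId=1",
    environment="production_eu-de",  # hypothetical variable name and value
)
print(url)
```

The real variable names can be read from the URL in the browser after changing a value in the dashboard's drop-down menus.
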
Squad Flag and Health Dashboard
===============================

The dashboard provides deeper insight into the metrics generated by the Metric
Processor. Flag panels show whether a service has breached the thresholds of
the predefined flag metric types. Health panels show the resulting service
health status based on the evaluated flag metrics.

The resulting flag values are visualized in state timeline panels with the
following values:

- 0 - the flag metric is not breaching the defined threshold
- 1 - the flag metric is breaching the defined threshold

The resulting health values are visualized in state timeline panels with the
following values:

- 0 - the service operates normally
- 1 - the service has a minor issue resulting from the defined flag metric(s) being reached
- 2 - the service has an outage resulting from the defined flag metric(s) being reached

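When these values are consumed outside Grafana, for example in a script or a report, the numeric states listed above can be translated into labels. This is only an illustrative sketch: the names ``FLAG_STATES``, ``HEALTH_STATES`` and ``describe`` are hypothetical and are not part of CloudMon.

```python
# Numeric states as listed above; the label wording is illustrative.
FLAG_STATES = {
    0: "threshold not breached",
    1: "threshold breached",
}
HEALTH_STATES = {
    0: "normal operation",
    1: "minor issue",
    2: "outage",
}


def describe(kind, value):
    """Translate a flag or health value into a human-readable label."""
    states = FLAG_STATES if kind == "flag" else HEALTH_STATES
    return states.get(value, "unknown")


print(describe("flag", 1))    # threshold breached
print(describe("health", 2))  # outage
```
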
Example at https://dashboard.tsi-dev.otc-service.com/d/s75qyOU4z/compute-flags?orgId=1

.. image:: training_images/flag_and_health_dashboard.png

Cloud Service Statistics dashboard
==================================

Cloud Service Statistics dashboards use metrics from GET query requests towards
the OTC platform (:ref:`EpMon Overview <sd2_epmon_overview>`) and visualize
them as:

- API call duration per URL query
- API call duration (aggregated)
- API call response codes

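The three views listed above correspond to three simple aggregations over the same request samples. The sketch below shows the idea on made-up data; the URL paths, durations and the ``summarize`` helper are hypothetical and do not reflect how EpMon or Grafana compute the panels internally.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical samples: (queried URL path, HTTP status code, duration in ms).
samples = [
    ("/v2/shares", 200, 100),
    ("/v2/shares", 200, 300),
    ("/v2/snapshots", 500, 500),
]


def summarize(samples):
    """Aggregate samples into per-URL duration, overall duration and code counts."""
    durations_by_url = defaultdict(list)
    codes = Counter()
    for url, status, duration in samples:
        durations_by_url[url].append(duration)
        codes[status] += 1
    return {
        "duration_per_url": {u: mean(d) for u, d in durations_by_url.items()},
        "duration_aggregated": mean([d for _, _, d in samples]),
        "response_codes": dict(codes),
    }


summary = summarize(samples)
print(summary)
```
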
Example at https://dashboard.tsi-dev.otc-service.com/d/b4560ed6-95f0-45c0-904c-6ff9f8a491e8/sfs-service-statistics?orgId=1&refresh=10s

.. image:: training_images/cloud_service_statistics.png

Custom Dashboards
=================

The dashboards described above are predefined and read-only. Further
customization is currently possible via system-config in GitHub:

https://github.com/stackmon/apimon-tests/tree/main/dashboards/grafana

A predefined, simplified dashboard panel in YAML syntax is defined in the
Stackmon GitHub repository
(https://github.com/stackmon/apimon-tests/tree/main/dashboards).

Dashboards can also be customized directly in Grafana using the copy/save
function. The whole dashboard can be saved under a new name and then edited
without any restrictions.

This approach is suitable for PoCs, temporary solutions and investigations, but
it should not be used as a permanent solution: customized dashboards that are
not properly stored in GitHub repositories might be permanently deleted if the
dashboard service is fully reinstalled.
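One way to avoid losing a dashboard edited directly in Grafana is to export its JSON model and keep a copy under version control. The sketch below assumes the response shape of Grafana's ``GET /api/dashboards/uid/<uid>`` HTTP API (a ``dashboard`` object next to a ``meta`` object); the helper name ``prepare_for_repo`` is hypothetical, and the format expected by the repository above (YAML) may differ, so treat this only as a starting point.

```python
import json


def prepare_for_repo(api_response):
    """Return the dashboard model as stable, diff-friendly JSON text.

    api_response is assumed to be the parsed JSON body of Grafana's
    GET /api/dashboards/uid/<uid> call.
    """
    dashboard = dict(api_response["dashboard"])
    # The numeric id is instance specific; null lets Grafana assign a new one.
    dashboard["id"] = None
    # The version counter changes on every save and only adds noise to diffs.
    dashboard.pop("version", None)
    return json.dumps(dashboard, indent=2, sort_keys=True)


# Hypothetical, heavily shortened API response used for illustration.
response = {
    "meta": {"slug": "my-copy"},
    "dashboard": {"id": 42, "uid": "abc123", "title": "My copy", "version": 7},
}
print(prepare_for_repo(response))
```
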