Hasko, Vladimir f114248cfb adding SD2 training content
Reviewed-by: Gode, Sebastian <sebastian.gode@t-systems.com>
Co-authored-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-committed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
2023-10-04 10:07:42 +00:00


=====================
Dashboards management
=====================

https://dashboard.tsi-dev.otc-service.com/dashboards/f/CloudMon/cloudmon

Authentication is centrally managed by OTC LDAP.

The CloudMon dashboards are segregated based on the type of service:

- The "Squad Flag and Health" dashboard provides a high-level overview of the
  service health and flag metric status of each service of the respective squad.
- The "Cloud Service Statistics" dashboard monitors the health of every endpoint
  URL listed in the EpMon config entry.
- Dashboards can be replicated and customized for individual squad needs.

All Cloud Service Statistics dashboards provide Environment (target monitored
platform) and Zone (monitoring source location) variables at the top of each
dashboard, so the views can be adjusted to the chosen values.

All Squad Flag and Health dashboards provide an Environment (target monitored
platform) variable at the top of each dashboard.

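
Grafana also accepts dashboard variable values as ``var-<name>`` URL query
parameters, so a preselected view can be shared as a link. The following is a
minimal sketch only: the variable names ``environment`` and ``zone`` and the
values used are assumptions, and the real names should be checked in the
dashboard settings.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_dashboard_variables(dashboard_url, variables):
    """Return dashboard_url with Grafana ``var-<name>`` query parameters appended."""
    parts = urlsplit(dashboard_url)
    # Keep the query parameters that are already present (for example orgId).
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend((f"var-{name}", value) for name, value in variables.items())
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical variable names and values, for illustration only.
url = with_dashboard_variables(
    "https://dashboard.tsi-dev.otc-service.com/d/b4560ed6-95f0-45c0-904c-6ff9f8a491e8/"
    "sfs-service-statistics?orgId=1",
    {"environment": "production_eu-de", "zone": "eu-de"},
)
print(url)
```
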

Squad Flag and Health Dashboard
===============================

The dashboard provides deeper insight into the metrics generated by the Metric
Processor. Flag panels show whether a service has breached the thresholds of the
predefined flag metric types. Health panels show the resulting service health
status based on the evaluated flag metrics.

The resulting flag values are visualized in state timeline panels with the
following values:

- 0 - Flag metric is not breaching the defined threshold
- 1 - Flag metric is breaching the defined threshold

The resulting health values are visualized in state timeline panels with the
following values:

- 0 - Service operates normally
- 1 - Service has a minor issue resulting from the defined flag metric(s) being reached
- 2 - Service has an outage resulting from the defined flag metric(s) being reached

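
The value tables above can be sketched as simple lookups, for example when
post-processing exported panel data. This is an illustration only: the label
texts and helper names are hypothetical and are not part of CloudMon.

```python
# Numeric values as shown in the state timeline panels; labels are illustrative.
FLAG_STATES = {0: "threshold not breached", 1: "threshold breached"}
HEALTH_STATES = {0: "normal", 1: "minor issue", 2: "outage"}


def describe_health(value):
    """Translate a numeric health value into a readable label."""
    return HEALTH_STATES.get(value, f"unknown ({value})")


def worst_health(values):
    """Return the most severe health value of a series (higher means worse).

    Returns None for an empty series instead of guessing a state.
    """
    return max(values, default=None)


timeline = [0, 0, 1, 2, 1, 0]  # hypothetical series of health values
print(describe_health(worst_health(timeline)))
```
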

Example at https://dashboard.tsi-dev.otc-service.com/d/s75qyOU4z/compute-flags?orgId=1

.. image:: training_images/flag_and_health_dashboard.png


Cloud Service Statistics dashboard
==================================

Cloud Service Statistics dashboards use metrics from GET query requests towards
the OTC platform (:ref:`EpMon Overview <sd2_epmon_overview>`) and visualize
them as:

- API call duration per URL query
- API call duration (aggregated)
- API call response codes

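
To illustrate what the three views summarize, the sketch below aggregates a
handful of made-up samples in plain Python. The sample layout, URL paths and
numbers are assumptions for illustration. The real panels are computed by the
dashboard queries against the stored metrics, not by code like this.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical samples: (queried URL path, duration in ms, HTTP response code)
samples = [
    ("/v2/shares", 120.0, 200),
    ("/v2/shares", 180.0, 200),
    ("/v2/share-networks", 90.0, 200),
    ("/v2/share-networks", 310.0, 503),
]

# View 1: mean duration per URL query
durations_by_url = defaultdict(list)
for url, duration_ms, _code in samples:
    durations_by_url[url].append(duration_ms)
per_url = {url: mean(values) for url, values in durations_by_url.items()}

# View 2: mean duration aggregated over all queries
aggregated = mean(duration_ms for _url, duration_ms, _code in samples)

# View 3: number of responses per HTTP status code
response_codes = Counter(code for _url, _duration_ms, code in samples)

print(per_url, aggregated, dict(response_codes))
```
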

Example at https://dashboard.tsi-dev.otc-service.com/d/b4560ed6-95f0-45c0-904c-6ff9f8a491e8/sfs-service-statistics?orgId=1&refresh=10s

.. image:: training_images/cloud_service_statistics.png


Custom Dashboards
=================

The dashboards described above are predefined and read-only.
Further customization is currently possible via system-config on GitHub:
https://github.com/stackmon/apimon-tests/tree/main/dashboards/grafana

The predefined simplified dashboard panel in YAML syntax is defined in the
Stackmon GitHub repository
(https://github.com/stackmon/apimon-tests/tree/main/dashboards).

Dashboards can also be customized with the copy/save function directly in
Grafana. The whole dashboard can be saved under a new name and then edited
without any restrictions.

This approach is valid for PoCs, temporary solutions and investigations, but it
should not be used as a permanent solution: customized dashboards that are not
properly stored in GitHub repositories might be permanently deleted in case of a
full re-installation of the dashboard service.
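
One way to keep a copy of such a dashboard is to export its JSON model through
the Grafana HTTP API (``GET /api/dashboards/uid/<uid>``) and commit the file to
a repository. The sketch below assumes that an API token is available for this
Grafana instance, which is not confirmed here. The helper names, the token
placeholder and the output path are hypothetical.

```python
import json
import urllib.request


def build_export_request(base_url, uid, token):
    """Prepare the Grafana API request that returns a dashboard by its UID."""
    url = f"{base_url.rstrip('/')}/api/dashboards/uid/{uid}"
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers)


def export_dashboard(base_url, uid, token, path):
    """Download the dashboard and write its JSON model to ``path``."""
    with urllib.request.urlopen(build_export_request(base_url, uid, token)) as response:
        payload = json.load(response)
    # The API response wraps the dashboard model in the "dashboard" key.
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(payload["dashboard"], handle, indent=2, sort_keys=True)


# Usage (not executed here; token and output file name are placeholders):
# export_dashboard("https://dashboard.tsi-dev.otc-service.com",
#                  "s75qyOU4z", "<api-token>", "compute-flags.json")
```

The UID is the path segment that follows ``/d/`` in the dashboard URL.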