Hasko, Vladimir f114248cfb adding SD2 training content
Reviewed-by: Gode, Sebastian <sebastian.gode@t-systems.com>
Co-authored-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-committed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
2023-10-04 10:07:42 +00:00

Dashboards management

https://dashboard.tsi-dev.otc-service.com/dashboards/f/CloudMon/cloudmon

Authentication is centrally managed by OTC LDAP.

The CloudMon dashboards are grouped by the type of service:

  • The "Squad Flag and Health" dashboards provide a high-level overview of the service health and flag metric status for each service of the respective squad.
  • The "Cloud Service Statistics" dashboards monitor the health of every endpoint URL listed in the EpMon configuration.
  • Dashboards can be replicated and customized for individual squad needs.

All Cloud Service Statistics dashboards provide Environment (the monitored target platform) and Zone (the location of the monitoring source) variables at the top of each dashboard, so the views can be adjusted to the chosen values.

All Squad Flag and Health dashboards provide the Environment (the monitored target platform) variable at the top of each dashboard.
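
Grafana also accepts dashboard variables as `var-<name>` URL query parameters, so a preselected view can be shared as a link. The following is a minimal sketch of building such a link. The variable names `environment` and `zone` and the values used in the example are assumptions; the actual names depend on how each dashboard defines its variables.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_dashboard_variables(dashboard_url, **variables):
    """Return dashboard_url with Grafana variables set as var-<name> parameters."""
    parts = urlsplit(dashboard_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Drop any var-<name> parameter that is about to be overridden.
    overridden = {f"var-{name}" for name in variables}
    query = [(key, value) for key, value in query if key not in overridden]
    query += [(f"var-{name}", value) for name, value in variables.items()]
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    base = (
        "https://dashboard.tsi-dev.otc-service.com/d/"
        "b4560ed6-95f0-45c0-904c-6ff9f8a491e8/sfs-service-statistics"
        "?orgId=1&refresh=10s"
    )
    # "production_eu-de" and "eu-de" are hypothetical example values.
    print(with_dashboard_variables(base, environment="production_eu-de", zone="eu-de"))
```

Existing query parameters such as `orgId` and `refresh` are preserved, and only the named variables are added or replaced.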

Squad Flag and Health Dashboard

The dashboard provides deeper insight into the metrics generated by the Metric Processor. Flag panels show whether a service has breached the thresholds of the predefined flag metric types. Health panels show the resulting service health status, which is derived from the evaluated flag metrics.

The resulting flag values are visualized in state timeline panels with the following values:

  • 0 - the flag metric does not breach the defined threshold
  • 1 - the flag metric breaches the defined threshold

The resulting health values are visualized in state timeline panels with the following values:

  • 0 - the service operates normally
  • 1 - the service has a minor issue resulting from the defined breached flag metric(s)
  • 2 - the service has an outage resulting from the defined breached flag metric(s)
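
When the raw values are read outside Grafana (for example in a script or a report), the flag and health values listed above can be decoded with a simple lookup. This is only an illustrative sketch; the names `FLAG_STATES`, `HEALTH_STATES` and `describe_state` are hypothetical and not part of CloudMon.

```python
# Meaning of the values shown in the flag state timeline panels.
FLAG_STATES = {
    0: "flag metric does not breach the defined threshold",
    1: "flag metric breaches the defined threshold",
}

# Meaning of the values shown in the health state timeline panels.
HEALTH_STATES = {
    0: "service operates normally",
    1: "service has a minor issue",
    2: "service has an outage",
}


def describe_state(value, states):
    """Return the description for a numeric state, or a marker for unknown values."""
    return states.get(value, f"unknown value: {value!r}")


if __name__ == "__main__":
    print(describe_state(1, FLAG_STATES))
    print(describe_state(2, HEALTH_STATES))
```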

Example at https://dashboard.tsi-dev.otc-service.com/d/s75qyOU4z/compute-flags?orgId=1


Cloud Service Statistics dashboard

Cloud Service Statistics dashboards use metrics from GET query requests sent to the OTC platform (see EpMon Overview <sd2_epmon_overview>) and visualize them as:

  • API call duration per URL query
  • API call duration (aggregated)
  • API call response codes
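
To make the three views above concrete, the following sketch computes the same kinds of summaries from a list of request samples. It is an illustration only: the real dashboards compute these values with Grafana queries, and the record fields `url`, `duration_ms` and `status` as well as the sample values are assumptions made for this example.

```python
from collections import Counter, defaultdict
from statistics import mean


def summarize(samples):
    """Summarize request samples by URL, overall duration, and response code."""
    durations_per_url = defaultdict(list)
    response_codes = Counter()
    for sample in samples:
        durations_per_url[sample["url"]].append(sample["duration_ms"])
        response_codes[sample["status"]] += 1
    all_durations = [d for ds in durations_per_url.values() for d in ds]
    return {
        # API call duration per URL query
        "per_url_mean_ms": {url: mean(ds) for url, ds in durations_per_url.items()},
        # API call duration (aggregated over all URLs)
        "overall_mean_ms": mean(all_durations) if all_durations else None,
        # API call response codes
        "response_codes": dict(response_codes),
    }


if __name__ == "__main__":
    # Hypothetical samples, not real measurements.
    samples = [
        {"url": "/v2/shares", "duration_ms": 100, "status": 200},
        {"url": "/v2/shares", "duration_ms": 200, "status": 200},
        {"url": "/v2/snapshots", "duration_ms": 300, "status": 500},
    ]
    print(summarize(samples))
```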

Example at https://dashboard.tsi-dev.otc-service.com/d/b4560ed6-95f0-45c0-904c-6ff9f8a491e8/sfs-service-statistics?orgId=1&refresh=10s


Custom Dashboards

The dashboards described above are predefined and read-only. Further customization is currently possible via system-config in GitHub:

https://github.com/stackmon/apimon-tests/tree/main/dashboards/grafana

The predefined, simplified dashboard panel in YAML syntax is defined in the Stackmon GitHub repository (https://github.com/stackmon/apimon-tests/tree/main/dashboards).

Dashboards can also be customized with the copy/save function directly in Grafana. The whole dashboard can be saved under a new name and then edited without any restrictions.

This approach is suitable for PoCs, temporary solutions and investigations, but it should not be used as a permanent solution: customized dashboards that are not properly stored in GitHub repositories might be permanently deleted if the dashboard service is fully re-installed.
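
One way to avoid losing a dashboard customized in Grafana is to export its JSON model and commit it to a repository. Grafana's HTTP API returns a dashboard under `GET /api/dashboards/uid/<uid>` as an object with `meta` and `dashboard` keys. The sketch below only shows how such a response could be prepared for version control; the function name `dashboard_for_git` is hypothetical, and the format expected by the Stackmon repositories may differ (the predefined panels there use YAML).

```python
import json


def dashboard_for_git(api_response):
    """Turn a Grafana dashboard API response into stable JSON text for a repository."""
    # Copy the model so the caller's response object is not modified.
    dashboard = dict(api_response["dashboard"])
    # The numeric id is specific to one Grafana instance; null lets an import assign a new one.
    dashboard["id"] = None
    # Sorted keys and fixed indentation keep diffs between exports small.
    return json.dumps(dashboard, indent=2, sort_keys=True) + "\n"


if __name__ == "__main__":
    # Hypothetical, heavily shortened API response.
    response = {
        "meta": {"slug": "my-squad-copy"},
        "dashboard": {"id": 42, "uid": "abc123", "title": "My squad copy", "panels": []},
    }
    print(dashboard_for_git(response), end="")
```

The `uid` is kept so that a later import updates the same dashboard instead of creating a duplicate.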