Hasko, Vladimir f114248cfb adding SD2 training content
Reviewed-by: Gode, Sebastian <sebastian.gode@t-systems.com>
Co-authored-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-committed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
2023-10-04 10:07:42 +00:00

Introduction

The Open Telekom Cloud is represented to users and customers by its API endpoints and the services behind them. Customers need a reliable way to check whether these services are actually available to them via the Internet.

The Status Dashboard 2 (SD2) is a facility for monitoring all OTC services, intended to give customers an overview of service availability. It comprises a set of monitoring zones, each monitoring the services of a monitoring environment (a.k.a. a region, such as eu-de or eu-nl). The mapping of monitoring zones to monitoring sites is configured as a mesh matrix, so that both internal and external connections to the cloud are validated.
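
The mesh matrix can be pictured as the cross product of monitoring zones and monitored environments. The sketch below is illustrative only. The zone and environment names are placeholders, not values from stackmon-config. It also assumes that a probe running inside the region it checks validates an internal connection, and that every other pair validates an external one.

```python
from itertools import product

# Placeholder names: monitoring zones (where probes run) and
# monitored environments (regions whose APIs are checked).
MONITORING_ZONES = ["eu-de", "eu-nl", "swiss"]
ENVIRONMENTS = ["eu-de", "eu-nl", "swiss"]


def build_mesh(zones, environments):
    """Return every (zone, environment, kind) combination of the mesh.

    Assumption: a same-region pair checks an internal connection,
    any other pair checks an external connection.
    """
    mesh = []
    for zone, env in product(zones, environments):
        kind = "internal" if zone == env else "external"
        mesh.append((zone, env, kind))
    return mesh


if __name__ == "__main__":
    for zone, env, kind in build_mesh(MONITORING_ZONES, ENVIRONMENTS):
        print(f"{zone:>6} -> {env:<6} ({kind})")
```

With three zones and three environments, the mesh has nine checks: three internal and six external.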

The SD2 framework:

  • Developed to supervise the public APIs of the OTC platform 24/7
  • GET requests are repeatedly sent to the APIs
  • The requests are grouped into service metrics, which are sent to the Metric Processor
  • The Metric Processor defines so-called flag metrics, which evaluate whether service metrics reach the defined thresholds
  • Health metrics are produced based on the severity of the flag metrics
  • The Status Dashboard visualizes the health of each service based on the health metrics
  • Green: the service is OK; Yellow: the service has a minor issue; Red: the service has an outage
  • When a service's health turns yellow or red, an incident is created on the Status Dashboard and the MOD / 24/7 squad is notified
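
The chain from service metrics to flag metrics to health metrics can be illustrated with a small sketch. The metric names, threshold values and flag definitions below are hypothetical. In SD2 the real definitions live in stackmon-config and are evaluated by the Metric Processor, whose internals may differ. The health value is taken here as the highest severity among the raised flags.

```python
from dataclasses import dataclass

# Health values mapped to the semaphore colours on the dashboard.
GREEN, YELLOW, RED = 0, 1, 2


@dataclass
class FlagMetric:
    """Hypothetical flag: raised when a service metric exceeds its threshold."""
    name: str
    metric: str        # name of the service metric to inspect
    threshold: float   # flag is raised when value > threshold
    severity: int      # YELLOW or RED when the flag is raised


# Illustrative flags for one service; all values are assumptions.
FLAGS = [
    FlagMetric("api_slow", "response_time_ms", 1000.0, YELLOW),
    FlagMetric("api_failing", "failure_rate_pct", 50.0, RED),
]


def evaluate_flags(flags, service_metrics):
    """Return the names of the flags whose thresholds are exceeded."""
    return [f.name for f in flags
            if service_metrics.get(f.metric, 0.0) > f.threshold]


def health(flags, service_metrics):
    """Highest severity among raised flags, or GREEN if none is raised."""
    severities = [f.severity for f in flags
                  if service_metrics.get(f.metric, 0.0) > f.threshold]
    return max(severities, default=GREEN)


def needs_incident(health_value):
    """Yellow and red health trigger an incident and a squad notification."""
    return health_value in (YELLOW, RED)


if __name__ == "__main__":
    sample = {"response_time_ms": 1500.0, "failure_rate_pct": 5.0}
    h = health(FLAGS, sample)
    print(h, needs_incident(h))  # prints: 1 True
```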

SD2 Architecture Summary

  • EpMon sends various HTTP requests to service endpoints and generates metrics
  • The HTTP request metrics (generated by OpenStackSDK) are collected by statsd
  • The time series database (Graphite) receives the metrics from statsd
  • The Metric Processor processes the request metrics and, based on the defined thresholds, evaluates the resulting service health metrics
  • The Status Dashboard visualizes service health based on the health metrics produced by the Metric Processor and stored in an SQL database
  • Grafana dashboards visualize data from Graphite as well as from the Metric Processor
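
The first two steps above can be made concrete with a sketch that times a single GET request and reports the result to statsd. It uses the plain statsd text protocol over UDP: `<name>:<value>|ms` for timers and `<name>:1|c` for counters. In SD2 these metrics are emitted by OpenStackSDK rather than by hand-written code. The metric naming scheme and the statsd address (`localhost:8125`, the usual statsd default port) are assumptions for illustration. The helper names `statsd_lines` and `probe` are also hypothetical.

```python
import socket
import time
import urllib.error
import urllib.request

STATSD_ADDR = ("localhost", 8125)  # assumed statsd host and default UDP port


def statsd_lines(prefix, elapsed_ms, status):
    """Build statsd text-protocol lines for one request (hypothetical naming)."""
    return [
        f"{prefix}.response_time:{elapsed_ms:.0f}|ms",  # timer in milliseconds
        f"{prefix}.status_{status}:1|c",                # counter per result
    ]


def probe(url, prefix, timeout=10.0):
    """Send one GET request, time it, and push the metrics to statsd over UDP."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code        # HTTP error responses still carry a status code
    except OSError:
        status = "failed"        # URLError, timeouts, refused connections
    elapsed_ms = (time.monotonic() - start) * 1000
    lines = statsd_lines(prefix, elapsed_ms, status)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode("ascii"), STATSD_ADDR)
    return lines
```

Calling `probe()` with a service endpoint URL and a per-service prefix returns the lines it sent. Graphite can then store them once statsd flushes its aggregates.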

SD2 features

SD2 comes with the following features:

  • Support for service health with 5 service statuses (3 generated semaphore lights, 1 custom semaphore light, 1 maintenance status)
  • Support for HTTP requests (GET) for endpoint monitoring
  • Support for custom metrics and custom thresholds
  • Support for automatically generated incidents as well as custom incidents
  • Support for all OTC environments:
    • EU-DE
    • EU-NL
    • Swisscloud
  • Support for multiple monitoring sources:
    • EU-DE
    • EU-NL
    • Swisscloud
  • Internal dashboards to understand the root cause for service health changes
  • Each squad can control and manage their metrics and dashboards
  • All parameters are configured in a single place (stackmon-config) in human-readable form (YAML)
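
The five service statuses can be modelled as an enumeration. In this model, the three generated semaphore lights follow from the health metric, while the custom light and the maintenance status are set by operators. The member names and the precedence order (maintenance over custom over generated) are assumptions made for this sketch, not SD2 definitions.

```python
from enum import Enum


class ServiceStatus(Enum):
    """Hypothetical names for the five service statuses."""
    GREEN = "green"              # generated: service OK
    YELLOW = "yellow"            # generated: minor issue
    RED = "red"                  # generated: outage
    CUSTOM = "custom"            # custom semaphore light, set manually
    MAINTENANCE = "maintenance"  # planned maintenance


# Generated statuses derived from the health metric value (0, 1, 2).
GENERATED = {0: ServiceStatus.GREEN, 1: ServiceStatus.YELLOW, 2: ServiceStatus.RED}


def displayed_status(health_value: int,
                     custom: bool = False,
                     maintenance: bool = False) -> ServiceStatus:
    """Pick the status to display; the precedence order is an assumption."""
    if maintenance:
        return ServiceStatus.MAINTENANCE
    if custom:
        return ServiceStatus.CUSTOM
    return GENERATED[health_value]
```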