Reviewed-by: Gode, Sebastian <sebastian.gode@t-systems.com> Co-authored-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-committed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Introduction
The Open Telekom Cloud is represented to users and customers by the API endpoints and the various services behind them. Customers are interested in a reliable way to check and verify if the services are actually available to them via the Internet.
The Status Dashboard 2 (SD2) is a monitoring facility for all OTC services, intended to give customers an overview of service availability. It comprises a set of monitoring zones, each monitoring the services of a monitoring environment (a.k.a. regions such as eu-de, eu-nl, etc.). The mapping of monitoring zones to monitoring sites is configured as a mesh matrix to validate internal as well as external connections to the cloud.
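To make the mesh matrix idea concrete, here is a minimal Python sketch. It is not the SD2 configuration format: the zone and environment lists, the `build_mesh` helper, and the rule that a zone checking its own region counts as "internal" are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical zone and environment names; the text only mentions
# regions such as eu-de and eu-nl.
MONITORING_ZONES = ["eu-de", "eu-nl"]
MONITORED_ENVIRONMENTS = ["eu-de", "eu-nl"]


def build_mesh(zones, environments):
    """Return every (zone, environment) pair of a full mesh.

    Assumption: a zone checking its own region is an "internal"
    connection, any other pairing is "external".
    """
    mesh = []
    for zone, env in product(zones, environments):
        kind = "internal" if zone == env else "external"
        mesh.append({"zone": zone, "environment": env, "connection": kind})
    return mesh


for entry in build_mesh(MONITORING_ZONES, MONITORED_ENVIRONMENTS):
    print(f"{entry['zone']} -> {entry['environment']} ({entry['connection']})")
```

With two zones and two environments, the sketch yields four checks, two of each kind, so every environment is observed both from inside and from outside.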
The SD2 framework:
- Developed to supervise the public APIs of the OTC platform 24/7.
- GET requests are sent repeatedly to the APIs.
- Requests, grouped into service metrics, are sent to the Metric Processor.
- The Metric Processor defines so-called flag metrics, which evaluate whether service metrics reach the defined thresholds.
- Health metrics are produced based on the severity of the flag metrics.
- The Status Dashboard visualizes the health of each service based on the health metrics.
- Green - service is ok, Yellow - service has a minor issue, Red - service has an outage.
- When a service's health is yellow or red, an incident is created on the Status Dashboard and the MOD / 24/7 squad is notified.
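The chain from service metrics through flag metrics to a health color can be sketched roughly as follows. This is an illustrative Python model, not the Metric Processor's actual implementation: the `FlagRule` class, the metric names, the thresholds, and the "worst raised flag wins" rule are assumptions.

```python
from dataclasses import dataclass

# Severity levels for the three generated semaphore colors.
GREEN, YELLOW, RED = 0, 1, 2
COLOR_NAMES = {GREEN: "green", YELLOW: "yellow", RED: "red"}


@dataclass
class FlagRule:
    name: str         # hypothetical flag metric name
    metric: str       # service metric the flag watches
    threshold: float  # the flag is raised when the value exceeds this
    severity: int     # health level contributed when the flag is raised


def evaluate_flags(rules, metrics):
    """Return the rules whose service metric exceeds its threshold."""
    return [r for r in rules if metrics.get(r.metric, 0.0) > r.threshold]


def health(rules, metrics):
    """Assumed aggregation: the most severe raised flag decides the health."""
    raised = evaluate_flags(rules, metrics)
    return max((r.severity for r in raised), default=GREEN)


# Hypothetical rules and measurements for one service.
rules = [
    FlagRule("api_slow", "latency_ms", 500.0, YELLOW),
    FlagRule("api_down", "failure_rate", 0.5, RED),
]
metrics = {"latency_ms": 820.0, "failure_rate": 0.1}

level = health(rules, metrics)
print(COLOR_NAMES[level])                  # yellow
print("incident needed:", level >= YELLOW)  # incident needed: True
```

Here only the latency flag is raised, so the service is yellow and, following the list above, an incident would be opened.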
SD2 Architecture Summary
- EpMon executes various HTTP requests against service endpoints and generates metrics.
- The HTTP request metrics (generated by OpenStackSDK) are collected by statsd.
- The time series database (Graphite) stores the metrics collected by statsd.
- The Metric Processor processes the request metrics and, based on the defined thresholds, evaluates the resulting service health metrics.
- The Status Dashboard visualizes service health based on the health metrics produced by the Metric Processor and stored in an SQL database.
- Grafana dashboards visualize data from Graphite as well as from the Metric Processor.
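To illustrate the first hop of this pipeline, the sketch below formats request measurements in the statsd plain-text line protocol (`<bucket>:<value>|ms` for timers, `<bucket>:<value>|c` for counters) and shows how they could be sent over UDP. In SD2 these metrics are generated by OpenStackSDK; this hand-rolled version only shows the wire format, and the bucket names, host, and port are assumptions.

```python
import socket


def statsd_timer(bucket: str, millis: float) -> str:
    """Format a timing sample as a statsd timer line."""
    return f"{bucket}:{millis:.0f}|ms"


def statsd_counter(bucket: str, count: int = 1) -> str:
    """Format an event count as a statsd counter line."""
    return f"{bucket}:{count}|c"


def send(lines, host="127.0.0.1", port=8125):
    """Send newline-separated statsd lines in one UDP datagram.

    Host and port are assumptions; 8125 is the conventional statsd port.
    """
    payload = "\n".join(lines).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Hypothetical bucket names for one GET request that took 131.6 ms.
lines = [
    statsd_timer("api.compute.GET.servers.200", 131.6),
    statsd_counter("api.compute.GET.servers.attempted"),
]
print(lines)
```

Calling `send(lines)` would hand both samples to a statsd daemon, from where they reach the time series database and the dashboards described above.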
SD2 features
SD2 comes with the following features:
- Support of service health with 5 service statuses (3 generated semaphore lights, 1 custom semaphore light, 1 maintenance status)
- Support of HTTP requests (GET) for Endpoint Monitoring
- Support of custom metrics and custom thresholds
- Support of automatically generated incidents as well as custom incidents
- Support of all OTC environments:
  - EU-DE
  - EU-NL
  - Swisscloud
- Support of multiple monitoring sources:
  - EU-DE
  - EU-NL
  - Swisscloud
- Internal dashboards to understand the root cause for service health changes
- Each squad can control and manage their metrics and dashboards
- All parameters are configured from a single place (stackmon-config) in human-readable form (YAML)