============
Introduction
============

The Open Telekom Cloud (OTC) is represented to users and customers by its API
endpoints and the various services behind them. Customers want a reliable way
to check and verify whether these services are actually available to them via
the Internet.

The Status Dashboard 2 (SD2) is a facility for monitoring all OTC services,
intended to give customers an overview of service availability. It comprises a
set of **monitoring zones**, each monitoring the services of a **monitoring
environment** (a.k.a. regions such as eu-de, eu-nl, etc.). The mapping of
monitoring zones to monitoring sites is configured as a mesh matrix to validate
internal as well as external connections to the cloud.

The SD2 framework:

- Is developed to supervise the public APIs of the OTC platform 24/7.
- GET requests are sent repeatedly to the APIs.
- Requests, grouped into service metrics, are sent to the Metric Processor.
- The Metric Processor defines so-called flag metrics, which evaluate whether
  service metrics reach the defined thresholds.
- Health metrics are produced based on the severity of the flag metrics.
- The Status Dashboard visualizes the health of each service based on the
  health metrics.
- Green: the service is OK; yellow: the service has a minor issue; red: the
  service has an outage.
- For yellow and red service health, an incident is created on the Status
  Dashboard and the MOD / 24/7 squad is notified.

.. image:: https://stackmon.github.io/assets/images/solution-diagram.svg

SD2 Architecture Summary
------------------------

- EpMon executes various HTTP query requests against service endpoints and
  generates metrics.
- The HTTP request metrics (generated by OpenStackSDK) are collected by statsd.
- The time series database (Graphite) pulls metrics from statsd.
- The Metric Processor processes the request metrics and, based on the defined
  thresholds, evaluates the resulting service health metrics.
- The Status Dashboard visualizes service health based on the health metrics
  produced by the Metric Processor and stored in an SQL database.
- Grafana dashboards visualize data from Graphite as well as from the Metric
  Processor.

SD2 features
------------

SD2 comes with the following features:

- Support for service health with 5 service statuses (3 generated semaphore
  lights, 1 custom semaphore light, 1 maintenance status)
- Support for HTTP requests (GET) for endpoint monitoring
- Support for custom metrics and custom thresholds
- Support for automatically generated incidents as well as custom incidents
- Support for all OTC environments:

  - EU-DE
  - EU-NL
  - Swisscloud

- Support for multiple monitoring sources:

  - EU-DE
  - EU-NL
  - Swisscloud

- Internal dashboards to understand the root cause of service health changes
- Each squad can control and manage its metrics and dashboards
- All parameters are configured in a single place (stackmon-config) in
  human-readable form (YAML)
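
The chain from service metrics through flag metrics to a semaphore colour can
be illustrated with a small sketch. This is not the Metric Processor's actual
code or the stackmon-config schema: the ``FlagMetric`` class, the
``service_health`` function, the metric names, the use of a mean over recent
samples, and the threshold values are all assumptions made for illustration.

.. code-block:: python

    from dataclasses import dataclass
    from statistics import mean


    @dataclass(frozen=True)
    class FlagMetric:
        """Hypothetical flag: raised when a service metric reaches its threshold."""

        name: str
        source: str       # key of the service metric being evaluated
        threshold: float
        severity: int     # assumed scale: 1 = minor issue, 2 = outage

        def is_raised(self, samples):
            values = samples.get(self.source, [])
            # Without data there is nothing to evaluate, so the flag stays down.
            return bool(values) and mean(values) >= self.threshold


    # Assumed mapping from the worst raised severity to a semaphore colour.
    HEALTH_COLOURS = {0: "green", 1: "yellow", 2: "red"}


    def service_health(flags, samples):
        """Return the colour for the highest severity among the raised flags."""
        worst = max((f.severity for f in flags if f.is_raised(samples)), default=0)
        return HEALTH_COLOURS[worst]


    flags = [
        FlagMetric("api_slow", "latency_ms", threshold=500.0, severity=1),
        FlagMetric("api_down", "failure_rate", threshold=0.5, severity=2),
    ]
    samples = {
        "latency_ms": [620.0, 580.0, 710.0],   # mean is above 500 ms
        "failure_rate": [0.0, 0.0, 0.1],       # mean is well below 0.5
    }
    print(service_health(flags, samples))  # prints "yellow"

Here only the latency flag is raised, so the service is reported as yellow. If
the failure-rate flag were raised as well, the higher severity would win and
the service would turn red.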