============
Introduction
============

The Open Telekom Cloud (OTC) is represented to users and customers by its API
endpoints and the various services behind them. Customers are
interested in a reliable way to check and verify that the services are actually
available to them via the Internet.

The Status Dashboard 2 (SD2) is a monitoring facility for all OTC
services, intended to give customers an overview of service
availability. It comprises a set of **monitoring zones**, each
monitoring the services of a **monitoring environment** (also known as a
region, such as eu-de or eu-nl). The mapping of monitoring zones to monitoring
sites is configured as a mesh matrix to validate internal as well as external
connections to the cloud.

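As a rough illustration of the mesh idea, the sketch below enumerates every
pairing of a monitoring zone with a monitored environment. It assumes a full
mesh over three hypothetical identifiers (``eu-de``, ``eu-nl``,
``swisscloud``) and is not taken from the SD2 configuration. The function name
``build_mesh`` and the internal/external labelling are likewise assumptions of
this example.

```python
from itertools import product

# Hypothetical identifiers; the real zone and environment names
# live in the SD2 configuration.
ZONES = ["eu-de", "eu-nl", "swisscloud"]
ENVIRONMENTS = ["eu-de", "eu-nl", "swisscloud"]


def build_mesh(zones, environments):
    """Return (zone, environment, kind) tuples for a full mesh.

    A check is labelled "internal" when a zone monitors its own
    environment and "external" otherwise (an assumption of this sketch).
    """
    return [
        (zone, env, "internal" if zone == env else "external")
        for zone, env in product(zones, environments)
    ]


if __name__ == "__main__":
    for zone, env, kind in build_mesh(ZONES, ENVIRONMENTS):
        print(f"{zone} -> {env} ({kind})")
```

With three zones and three environments, the sketch yields nine checks, three
of them internal.
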
The SD2 framework works as follows:

- It was developed to supervise the public APIs of the OTC platform 24/7.
- GET requests are sent repeatedly to the APIs.
- The requests are grouped into service metrics, which are sent to the Metric Processor.
- The Metric Processor defines so-called flag metrics, which evaluate whether service metrics reach the defined thresholds.
- Health metrics are produced based on the severity of the flag metrics.
- The Status Dashboard visualizes the health of each service based on its health metrics.
- Green means the service is OK, yellow means the service has a minor issue, and red means the service has an outage.
- When a service's health is yellow or red, an incident is created on the Status Dashboard and the MOD / 24/7 squad is notified.

.. image:: https://stackmon.github.io/assets/images/solution-diagram.svg

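The evaluation chain described above (service metrics, flag metrics, health
metrics) can be sketched in a few lines. This is a minimal illustration, not
the Metric Processor's actual implementation: the threshold values, the metric
names, and the helpers ``evaluate_flags`` and ``health_from_flags`` are
assumptions made for the example.

```python
from dataclasses import dataclass

# Semaphore values: 0 = green (OK), 1 = yellow (minor issue), 2 = red (outage).
GREEN, YELLOW, RED = 0, 1, 2


@dataclass(frozen=True)
class FlagRule:
    """Hypothetical flag metric: raised when a service metric reaches a threshold."""
    name: str
    metric: str
    threshold: float
    severity: int  # health level produced when the flag is raised


# Illustrative rules; real thresholds are defined in the SD2 configuration.
RULES = [
    FlagRule("api_slow", "latency_ms", 1000.0, YELLOW),
    FlagRule("api_down", "failure_rate", 0.5, RED),
]


def evaluate_flags(service_metrics, rules):
    """Return the rules whose metric reaches (>=) the defined threshold."""
    return [
        rule for rule in rules
        if service_metrics.get(rule.metric, 0.0) >= rule.threshold
    ]


def health_from_flags(raised):
    """Health is the highest severity among raised flags, green if none."""
    return max((rule.severity for rule in raised), default=GREEN)


if __name__ == "__main__":
    metrics = {"latency_ms": 1500.0, "failure_rate": 0.1}
    raised = evaluate_flags(metrics, RULES)
    print([rule.name for rule in raised], health_from_flags(raised))
```

In this sketch, a yellow or red result marks the point at which an incident
would be opened and the responsible squad notified.
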
SD2 Architecture Summary
------------------------

- EpMon executes various HTTP query requests against service endpoints and
  generates metrics.
- The HTTP request metrics (generated by OpenStackSDK) are collected by
  statsd.
- The time series database (Graphite) pulls metrics from statsd.
- The Metric Processor processes the request metrics and, based on the defined thresholds, evaluates the resulting service health metrics.
- The Status Dashboard visualizes service health based on the health metrics produced by the Metric Processor and stored in an SQL database.
- Grafana dashboards visualize data from Graphite as well as from the Metric Processor.

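To make the first two steps more concrete, the sketch below times a call and
formats the result as a statsd timer line (``<name>:<value>|ms``, the
plain-text statsd wire format). The metric name and the ``timed_call`` helper
are hypothetical, and the HTTP request is replaced by a stand-in callable so
the example runs without network access. It does not reproduce how EpMon or
OpenStackSDK actually emit metrics.

```python
import time


def statsd_timing(name, elapsed_ms):
    """Format a statsd timer payload: '<name>:<milliseconds>|ms'."""
    return f"{name}:{int(round(elapsed_ms))}|ms"


def timed_call(name, func):
    """Run func() and return its result plus a statsd timer line for its duration."""
    start = time.perf_counter()
    result = func()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, statsd_timing(name, elapsed_ms)


if __name__ == "__main__":
    # Stand-in for an HTTP GET against a service endpoint (hypothetical).
    def fake_get():
        time.sleep(0.01)
        return 200

    status, line = timed_call("api.compute.GET.servers", fake_get)
    print(status, line)
```

A real monitor would send each such line to the statsd daemon, typically over
UDP, instead of printing it.
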
SD2 Features
------------

SD2 comes with the following features:

- Support of service health with 5 service statuses (3 generated semaphore lights, 1 custom semaphore light, 1 maintenance status)
- Support of HTTP requests (GET) for Endpoint Monitoring
- Support of custom metrics and custom thresholds
- Support of automatically generated incidents as well as custom incidents
- Support of all OTC environments:

  - EU-DE
  - EU-NL
  - Swisscloud

- Support of multiple monitoring sources:

  - EU-DE
  - EU-NL
  - Swisscloud

- Internal dashboards to understand the root cause of service health changes
- Each squad can control and manage its own metrics and dashboards
- All parameters are configured in a single place (stackmon-config) in human-readable form (YAML)