tischrei 0618989a8a hc_ops
Reviewed-by: Gode, Sebastian <sebastian.gode@t-systems.com>
Co-authored-by: tischrei <tino.schreiber@t-systems.com>
Co-committed-by: tischrei <tino.schreiber@t-systems.com>
2024-02-22 14:55:55 +00:00

Status Dashboard 2 Frontend

The web-based frontend of SD2 provides public (and, after authentication, internal) status information about OTC cloud services across all configured regions. It supports these features:

  • Displays service health using five service statuses.
  • Authentication through OpenID Connect (which in turn is connected to the OTC LDAP directory).
  • Categories - services are grouped into categories.
  • Regions - services are available in one or more regions.
  • Incidents - entries describing issues that affect specific regions and services.
  • Support for all OTC environments.
  • Incident data is available through an API.
  • RSS notification (for the OTC mobile app and other integrations).
  • SLA view of the services.
  • Incident history.

Two Status Dashboard portals are available:

Service Health View

image

From an architecture point of view, Status Dashboard is a Flask-based web server that serves an API and renders web content, backed by a PostgreSQL database. The project source is available at https://github.com/stackmon/status-dashboard

The configuration of the Status Dashboard frontend is located on GitHub: https://github.com/opentelekomcloud-infra/stackmon-config/blob/main/sdb_prod/catalog.yaml The catalog.yaml file contains the definitions of service names, service types, service categories and regions.

Example of the Auto Scaling service entries (one per region) in the SD catalog:

- attributes:
    category: Compute
    region: EU-DE
    type: as
  name: Auto Scaling
- attributes:
    category: Compute
    region: EU-NL
    type: as
  name: Auto Scaling
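
To illustrate how such entries relate services to categories and regions, here is a minimal Python sketch. It is not taken from the Status Dashboard code base: the `CATALOG` list and the `group_catalog` helper are hypothetical, and the entries are written as the Python structures a YAML parser would be expected to produce from the example above.

```python
from collections import defaultdict

# Hypothetical in-memory copy of the two catalog.yaml entries shown above,
# in the shape a YAML parser would return (a list of mappings).
CATALOG = [
    {"attributes": {"category": "Compute", "region": "EU-DE", "type": "as"},
     "name": "Auto Scaling"},
    {"attributes": {"category": "Compute", "region": "EU-NL", "type": "as"},
     "name": "Auto Scaling"},
]


def group_catalog(entries):
    """Return {category: {service name: sorted list of regions}}."""
    grouped = defaultdict(lambda: defaultdict(set))
    for entry in entries:
        attrs = entry["attributes"]
        # One catalog entry exists per (service, region) pair.
        grouped[attrs["category"]][entry["name"]].add(attrs["region"])
    return {
        category: {name: sorted(regions) for name, regions in services.items()}
        for category, services in grouped.items()
    }


if __name__ == "__main__":
    print(group_catalog(CATALOG))
```

The sketch shows why the same service name appears twice in the catalog: each entry binds one service to one region, and the category groups the resulting services.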

Applying Catalog Changes

After the SD changes are merged in the GitHub repository, the CloudMon operators must execute the following rollout steps on the CloudMon platform:

  1. #~ cloudmon --config-dir prod --config-repo https://github.com/opentelekomcloud-infra/stackmon-config.git status dashboard provision
  2. #~ kubectl exec into status dashboard container
  3. #~ export FLASK_APP=status_dashboard.py
  4. #~ flask bootstrap provision

SLA view

The SLA view at https://status.cloudmon.eco.tsi-dev.otc-service.com/sla is calculated only from the "outage" service health status and provides six months of SLA history for each service.

image
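
The exact formula used by the SLA view is not described here. As a rough illustration only, the following Python sketch assumes that the SLA of a period is the share of time without an "outage" status; the function name `sla_percent` and its inputs are hypothetical and not part of the Status Dashboard API.

```python
from datetime import datetime


def sla_percent(outages, period_start, period_end):
    """Availability in percent for the period [period_start, period_end).

    `outages` is a list of (start, end) datetime pairs with "outage" status.
    Only the part of each outage inside the period is counted. This sketch
    assumes that outages do not overlap each other.
    """
    period = (period_end - period_start).total_seconds()
    down = 0.0
    for start, end in outages:
        # Clip the outage to the reporting period.
        overlap_start = max(start, period_start)
        overlap_end = min(end, period_end)
        if overlap_end > overlap_start:
            down += (overlap_end - overlap_start).total_seconds()
    return round(100.0 * (period - down) / period, 3)


if __name__ == "__main__":
    # A 30-day month (720 hours) with a single outage of 7 h 12 min.
    april = (datetime(2024, 4, 1), datetime(2024, 5, 1))
    outage = (datetime(2024, 4, 10, 0, 0), datetime(2024, 4, 10, 7, 12))
    print(sla_percent([outage], *april))
```

Under this assumption, incidents with a status other than "outage" would simply not be passed to the function and would therefore not reduce the SLA value.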

Details on how to work with incidents are described on the incidents <sd2_incidents> page.