adding SD2 training content
Reviewed-by: Gode, Sebastian <sebastian.gode@t-systems.com>
Co-authored-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-committed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
doc/source/internal/sd2_training/contact.rst
@ -0,0 +1,21 @@
Contact - Whom to address for Feedback?
=======================================

If you have any feedback or proposals, or have found any issues regarding the
Status Dashboard, EpMon or CloudMon, you can address them in the corresponding
GitHub repositories of the OpenTelekomCloud-Infra or StackMon organizations.

Issues or feedback regarding **ApiMon, EpMon, Status Dashboard and Metric
Processor**, as well as new feature requests, can be addressed by filing an
issue in the **GitHub** repository
https://github.com/opentelekomcloud-infra/stackmon-config

If you have found any problems that affect the **internal dashboard design**,
please open an issue or pull request on **GitHub** at
https://github.com/stackmon/apimon-tests

For any other general issue, demand or request, try to locate the proper
repository in https://github.com/orgs/stackmon/repositories

For general questions, you can write an e-mail to the `Ecosystems Squad
<mailto:dl-pbcotcdeleco@t-systems.com>`_.
doc/source/internal/sd2_training/dashboards.rst
@ -0,0 +1,88 @@
=====================
Dashboards management
=====================

The CloudMon dashboards are available at
https://dashboard.tsi-dev.otc-service.com/dashboards/f/CloudMon/cloudmon

Authentication is centrally managed by OTC LDAP.

The CloudMon dashboards are segregated based on the type of service:

- The "Squad Flag and Health" dashboards provide a high-level overview of the
  service health and flag metric status of each service of the respective squad.
- The "Cloud Service Statistics" dashboards monitor the health of every endpoint
  URL listed in the EpMon configuration entries.
- Dashboards can be replicated and customized for individual squad needs.

All Cloud Service Statistics dashboards provide the Environment (target
monitored platform) and Zone (monitoring source location) variables at the top
of each dashboard, so the views can be adjusted to the chosen values.

All Squad Flag and Health dashboards provide the Environment (target monitored
platform) variable at the top of each dashboard.

Squad Flag and Health Dashboard
===============================

The dashboard provides deeper insight into the metrics generated by the Metric
Processor. Flag panels show whether a service has breached the thresholds of
the predefined flag metric types. Health panels show the resulting service
health status based on the evaluated flag metrics.

The resulting flag values are visualized in state timeline panels with the
following values:

- 0 - the flag metric is not breaching the defined threshold
- 1 - the flag metric is breaching the defined threshold

The resulting health values are visualized in state timeline panels with the
following values:

- 0 - the service operates normally
- 1 - the service has a minor issue resulting from the defined flag metric(s)
  being reached
- 2 - the service has an outage resulting from the defined flag metric(s) being
  reached
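
The following minimal Python sketch shows how flag values and health values relate. It is only an illustration under stated assumptions: each flag is treated as a simple "metric above threshold" check, and each flag carries a hypothetical severity (1 = minor issue, 2 = outage). The health value is the highest severity among the raised flags. The names ``Flag``, ``evaluate_flags`` and ``evaluate_health``, the metric names and the thresholds are all invented for this example. The real rules are defined in the Metric Processor configuration.

.. code:: python

   from __future__ import annotations

   from dataclasses import dataclass


   @dataclass(frozen=True)
   class Flag:
       """Hypothetical flag definition: raised when `metric` exceeds `threshold`."""

       name: str
       metric: str
       threshold: float
       severity: int  # assumed convention: 1 = minor issue, 2 = outage


   def evaluate_flags(metrics: dict[str, float], flags: list[Flag]) -> dict[str, int]:
       """Return 1 for every flag whose metric breaches its threshold, else 0."""
       return {
           flag.name: int(metrics.get(flag.metric, 0.0) > flag.threshold)
           for flag in flags
       }


   def evaluate_health(flag_values: dict[str, int], flags: list[Flag]) -> int:
       """Combine the raised flags into 0 (normal), 1 (minor issue) or 2 (outage)."""
       raised = [flag.severity for flag in flags if flag_values[flag.name] == 1]
       return max(raised, default=0)


   FLAGS = [
       Flag("api_slow", metric="duration_ms", threshold=1000.0, severity=1),
       Flag("api_down", metric="failed_pct", threshold=90.0, severity=2),
   ]

For example, a 1500 ms duration raises only ``api_slow``, which yields a health value of 1. Taking the maximum severity is just one possible combination rule.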

Example: https://dashboard.tsi-dev.otc-service.com/d/s75qyOU4z/compute-flags?orgId=1

.. image:: training_images/flag_and_health_dashboard.png

Cloud Service Statistics dashboard
==================================

Cloud Service Statistics dashboards use the metrics of GET query requests
towards the OTC platform (:ref:`EpMon Overview <sd2_epmon_overview>`) and
visualize them as:

- API call duration per URL query
- API call duration (aggregated)
- API call response codes
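
The three panel types are essentially simple aggregations over the collected samples. The Python sketch below assumes that each EpMon sample is a ``(url, duration_ms, status_code)`` tuple. This record layout and the function name ``summarize`` are hypothetical. The real dashboards query the metrics stored in Graphite.

.. code:: python

   from collections import Counter, defaultdict
   from statistics import mean


   def summarize(samples):
       """Aggregate (url, duration_ms, status_code) samples like the three panels."""
       per_url = defaultdict(list)
       codes = Counter()
       for url, duration_ms, status_code in samples:
           per_url[url].append(duration_ms)  # API call duration per URL query
           codes[status_code] += 1           # API call response codes
       all_durations = [d for durations in per_url.values() for d in durations]
       return {
           "duration_per_url": {url: mean(ds) for url, ds in per_url.items()},
           "duration_aggregated": mean(all_durations) if all_durations else None,
           "response_codes": dict(codes),
       }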

Example: https://dashboard.tsi-dev.otc-service.com/d/b4560ed6-95f0-45c0-904c-6ff9f8a491e8/sfs-service-statistics?orgId=1&refresh=10s

.. image:: training_images/cloud_service_statistics.png

Custom Dashboards
=================

The dashboards above are predefined and read-only.
Further customization is currently possible via system-config in GitHub:

https://github.com/stackmon/apimon-tests/tree/main/dashboards/grafana

The predefined, simplified dashboard panels in YAML syntax are defined in the
StackMon GitHub repository
(https://github.com/stackmon/apimon-tests/tree/main/dashboards).

Dashboards can also be customized simply by using the copy/save function
directly in Grafana: the whole dashboard can be saved under a new name and then
edited without any restrictions.

This approach is valid for PoCs, temporary solutions and investigations, but it
should not be used as a permanent solution. Customized dashboards that are not
properly stored in the GitHub repositories might be permanently deleted in case
of a full re-installation of the dashboard service.
doc/source/internal/sd2_training/databases.rst
doc/source/internal/sd2_training/epmon_checks.rst
@ -0,0 +1,82 @@
.. _sd2_epmon_overview:

============================
Endpoint Monitoring overview
============================

EpMon is a standalone Python-based process targeting every OTC service. It
finds the service in the service catalog and sends GET requests to the
configured endpoints.

Performing extensive tests, such as provisioning a server, gives great
coverage. However, such tests usually cannot be performed very often, and they
leave certain gaps on the monitoring timescale. To cover these gaps, the EpMon
component sends GET requests to the given URLs, relying on the API discovery of
the OpenStack cloud (for example, a GET request to ``/servers`` of the compute
endpoint). Such requests are cheap and can be performed in a loop, e.g. every 5
seconds. The latency of those calls, as well as the return codes, are captured
and sent to the metrics storage.
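
A single EpMon-style probe can be sketched as follows: time a GET request, then report the latency and the response code as statsd metrics. This is a minimal Python sketch, not the EpMon code. The helpers ``probe``, ``statsd_lines`` and ``monitor``, the metric names and the injected ``fetch`` and ``send`` callables are all assumptions. The requests are sent unauthenticated, whereas the real EpMon goes through the SDK. The metric lines use the plain statsd text format (``<name>:<value>|ms`` for timers, ``<name>:<value>|c`` for counters).

.. code:: python

   import time
   import urllib.error
   import urllib.request


   def http_get_status(url: str, timeout: float = 10.0) -> int:
       """Send an (unauthenticated) GET request and return the HTTP status code."""
       try:
           with urllib.request.urlopen(url, timeout=timeout) as response:
               return response.status
       except urllib.error.HTTPError as err:
           return err.code  # 4xx/5xx answers are results to record, not crashes


   def probe(url, fetch=http_get_status):
       """Return (status_code, latency_ms) for one GET request."""
       start = time.monotonic()
       status = fetch(url)
       latency_ms = (time.monotonic() - start) * 1000.0
       return status, latency_ms


   def statsd_lines(prefix, status, latency_ms):
       """Format one timer and one response-code counter in the statsd text format."""
       return [
           f"{prefix}.latency:{latency_ms:.0f}|ms",
           f"{prefix}.status.{status}:1|c",
       ]


   def monitor(url, prefix, send, fetch=http_get_status, interval=5.0, iterations=1):
       """Probe `url` every `interval` seconds and pass the statsd lines to `send`."""
       for i in range(iterations):
           status, latency_ms = probe(url, fetch)
           for line in statsd_lines(prefix, status, latency_ms):
               send(line)
           if i + 1 < iterations:
               time.sleep(interval)

To feed a real statsd daemon, ``send`` could write each line to a UDP socket created with ``socket.socket(socket.AF_INET, socket.SOCK_DGRAM)``. For example, ``sock.sendto(line.encode(), ("localhost", 8125))`` sends to port 8125, the statsd default.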

Currently, the EpMon configuration is located in stackmon-config:

https://github.com/opentelekomcloud-infra/stackmon-config/blob/main/epmon/config.yaml

It defines the HTTP query targets (URLs) for every single OTC service.

An entry for the service in the OTC Service Catalog
(https://git.tsi-dev.otc-service.com/ecosystem/service_catalog) is a
prerequisite for the service to be queried by EpMon. If there are multiple
entries in the service catalog, obsolete entries can be marked to be skipped.
The EpMon ``config.yaml`` only defines the service queries; it does not say how
and when to use them. For actual use across different monitoring sources and
targets, the configuration matrix is defined in:

https://github.com/opentelekomcloud-infra/stackmon-config/blob/main/config.yaml

The following example shows the autoscaling service configuration in EpMon:

.. code:: yaml

   as:
     service_type: as
     sdk_proxy: auto_scaling
     urls:
       - /
       - /scaling_group
       - /scaling_configuration
       - /scaling_policy
   as_swiss:
     service_type: as
     sdk_proxy: auto_scaling
     urls:
       - /
       - /scaling_group
       - /scaling_configuration
   as_skip_v1:
     service_type: asv1
     urls: []

There are three entries for the autoscaling service:

- The ``as`` entry is the default one and is used for the public cloud regions.
- The ``as_swiss`` entry is specific to Swisscloud.
- The ``as_skip_v1`` entry is skipped by EpMon.

By default, all entries in the service catalog are triggered for EpMon.

The mandatory parameter for all entries is ``service_type``. It must match the
``service_type`` entry in the service catalog.

Another important parameter is ``sdk_proxy``. This attribute identifies which
otcextensions module is used to execute the HTTP GET queries.

The most important parameter is ``urls``. It defines the list of URLs that are
queried for the specific service. As the ``service_type`` is known, the full URL
does not need to be defined; only the path that follows the service URL from
the service catalog is required.

If a specific service (or a specific service version) is supposed to be
skipped from endpoint monitoring, it must be defined in the EpMon config with
the ``urls`` parameter set to an empty list. This ensures that even the default
queries from the service catalog are overwritten by the empty list in this
config. In this example, the service type ``asv1`` (an entry from the service
catalog) is not queried by EpMon at all, as its ``urls`` list is empty.
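
The following Python sketch shows how the ``urls`` paths combine with the service catalog. It makes several assumptions. The EpMon config is assumed to be already loaded into a dictionary, for example with PyYAML. The catalog is reduced to a hypothetical ``service_type -> endpoint`` mapping. Services without their own EpMon entry fall back to a hypothetical ``DEFAULT_URLS`` list. The endpoint URLs and the names ``resolve_urls`` and ``plan_queries`` are illustrative only.

.. code:: python

   from __future__ import annotations

   # Hypothetical default query for catalog services without their own EpMon entry.
   DEFAULT_URLS = ["/"]

   # Two of the autoscaling entries from the example above, as loaded from YAML.
   EPMON_CONFIG = {
       "as": {
           "service_type": "as",
           "sdk_proxy": "auto_scaling",
           "urls": ["/", "/scaling_group", "/scaling_configuration", "/scaling_policy"],
       },
       "as_skip_v1": {"service_type": "asv1", "urls": []},
   }


   def resolve_urls(endpoint: str, entry: dict | None) -> list[str]:
       """Append the configured paths to the endpoint taken from the catalog."""
       paths = DEFAULT_URLS if entry is None else entry["urls"]  # [] disables checks
       base = endpoint.rstrip("/")
       return [base + path for path in paths]


   def plan_queries(catalog: dict[str, str], config: dict,
                    selected: list[str]) -> dict[str, list[str]]:
       """Map every catalog service_type to the URLs that would be queried.

       `selected` lists the EpMon entry names used for one environment
       (e.g. "as" for public regions, "as_swiss" for Swisscloud).
       """
       entries = {config[name]["service_type"]: config[name] for name in selected}
       return {
           service_type: resolve_urls(endpoint, entries.get(service_type))
           for service_type, endpoint in catalog.items()
       }

With the sketch above, a catalog containing ``as``, ``asv1`` and an unrelated service resolves to four autoscaling URLs, no queries for ``asv1``, and only the default query for the unrelated service.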

Collected response codes and response times are sent to Graphite for further
processing by the Metric Processor.
doc/source/internal/sd2_training/incidents.rst
@ -0,0 +1,68 @@
.. _sd2_incidents:

=========
Incidents
=========

Incidents inform customers about the reason why a cloud service has changed its
status from "green" (normal operation) to any other state.

Incidents are created under the following conditions:

- The Metric Processor evaluates the value 1 or 2 for the health metric of a
  specific cloud service, and an incident is created automatically on the SD.
- The Service Incident Manager (SIM) manually creates an incident on the SD for
  one or more cloud services.

Each cloud service on the SD is represented by its name and a semaphore color
icon representing its current health status. The following service states can
be shown on SD2:

- Operational - green "check" mark icon
- Maintenance - blue "wrench" mark icon
- Minor Issue - yellow "cross" mark icon
- Major Issue - brown "cross" mark icon
- Service Outage - red "cross" mark icon

These 5 states can be set manually for specific service(s) during incident
creation. Only 2 states (Minor Issue and Service Outage) are set automatically
from the Metric Processor health metrics.

Incidents are visualized in the respective color scheme at the top of the SD
page. It is also possible to navigate to the related incident by clicking the
service state icon next to the service.

Once the service health status has changed and an incident has been created,
there is no automated clean-up of the incident. The incident must be handled by
the respective SIM. Only after the incident is closed does the service change
its state back to the "green" Operational state.
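
The rules above can be sketched in Python:

- Health 1 or 2 opens an incident automatically.
- The impact follows the health value.
- Nothing is cleaned up automatically.

This is an illustration only. The ``Incident`` class, the in-memory list and the functions ``on_health_value`` and ``service_state`` are hypothetical and are not the Status Dashboard implementation. The sketch also assumes that a repeated health signal for a service that already has an open incident does not open a second one.

.. code:: python

   from __future__ import annotations

   from dataclasses import dataclass

   # Health metric value -> service state set automatically (see the states above).
   AUTO_IMPACT = {1: "Minor Issue", 2: "Service Outage"}


   @dataclass
   class Incident:
       service: str
       region: str
       impact: str
       is_open: bool = True


   def find_open(incidents: list[Incident], service: str, region: str) -> Incident | None:
       """Return the open incident for a service in a region, if there is one."""
       for incident in incidents:
           if incident.is_open and (incident.service, incident.region) == (service, region):
               return incident
       return None


   def on_health_value(incidents: list[Incident], service: str, region: str,
                       health: int) -> Incident | None:
       """Open an incident for health 1 or 2; health 0 never closes anything."""
       if health not in AUTO_IMPACT:
           return None
       existing = find_open(incidents, service, region)
       if existing is not None:
           return existing  # the SIM handles it from here on
       incident = Incident(service, region, AUTO_IMPACT[health])
       incidents.append(incident)
       return incident


   def service_state(incidents: list[Incident], service: str, region: str) -> str:
       """'Operational' unless an incident for the service is still open."""
       incident = find_open(incidents, service, region)
       return "Operational" if incident is None else incident.impact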

Incident manual creation process
================================

Besides the automated incident creation, incidents can also be created
manually. The Service Incident Manager must authenticate before being able to
create an incident. Login is provided by the OpenID Connect feature on the page
https://status.cloudmon.eco.tsi-dev.otc-service.com/login/openid

Once logged in, the new option "Open new incident" appears in the top right
corner of the page.

.. image:: training_images/sd2_incident.jpg

The incident creation form consists of these mandatory fields:

- Incident Summary - description of the incident
- Incident Impact - drop-down menu of 4 service states (Scheduled Maintenance,
  Minor Issue, Major Issue, Service Outage)
- Affected services - list of all OTC cloud services in conjunction with
  regions; one or more items can be chosen
- Start - timestamp when the incident started

Incident update process
=======================

During the incident lifecycle, the SIM can update the incident with relevant
information. The incident update form consists of these optional fields:

- Incident title - change the title of the incident
- Update Message - additional details related to the current status of the
  incident
- Update Status - drop-down menu of 4 incident statuses (Analyzing incident,
  Fixing incident, Observing fix, Incident resolved)
- Next Update by - timestamp when the incident is expected to be updated with
  further information

Incident manual closure process
===============================

An incident is never closed automatically. The SIM needs to close the incident
by changing its status to "Incident resolved" during the incident update
process. After that, the incident disappears from the list of active incidents
and the service health status changes back to the "green" Operational state.
Every closed incident is recorded in the Incident History.
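
The update and closure rules can be sketched as a small state holder. Any of the optional fields may be changed, and the status "Incident resolved" moves the incident from the active list to the history. This is hypothetical Python. The ``Incident`` and ``IncidentBoard`` classes and their field names are assumptions; only the four status strings come from the update form described above.

.. code:: python

   from __future__ import annotations

   from dataclasses import dataclass, field
   from datetime import datetime

   UPDATE_STATUSES = (
       "Analyzing incident",
       "Fixing incident",
       "Observing fix",
       "Incident resolved",
   )


   @dataclass
   class Incident:
       title: str
       updates: list[tuple[str, str]] = field(default_factory=list)
       next_update_by: datetime | None = None


   @dataclass
   class IncidentBoard:
       active: list[Incident] = field(default_factory=list)
       history: list[Incident] = field(default_factory=list)

       def update(self, incident: Incident, status: str, message: str = "",
                  title: str | None = None,
                  next_update_by: datetime | None = None) -> None:
           """Apply one SIM update; the status 'Incident resolved' closes the incident."""
           if status not in UPDATE_STATUSES:
               raise ValueError(f"unknown update status: {status!r}")
           if title is not None:
               incident.title = title
           if next_update_by is not None:
               incident.next_update_by = next_update_by
           incident.updates.append((status, message))
           if status == "Incident resolved":
               self.active.remove(incident)   # leaves the list of active incidents
               self.history.append(incident)  # recorded in the Incident History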

Incident notifications
======================

The Status Dashboard supports RSS feeds for incident notifications. Details on
how to set up an RSS feed are described on the
:ref:`notifications <sd2_notifications>` page.
@ -6,3 +6,14 @@ Status Dashboard 2 Training
   :maxdepth: 1

   onepager
   introduction
   workflow
   status_dashboard_frontend
   monitoring_coverage
   epmon_checks
   dashboards
   metrics
   databases
   incidents
   notifications
   contact
doc/source/internal/sd2_training/introduction.rst
@ -0,0 +1,68 @@
============
Introduction
============

The Open Telekom Cloud is represented to users and customers by its API
endpoints and the various services behind them. Customers are interested in a
reliable way to check and verify whether the services are actually available to
them via the Internet.

The Status Dashboard 2 (SD2) is a facility for monitoring all OTC services,
intended to give customers an overview of service availability. It comprises a
set of **monitoring zones**, each monitoring the services of a **monitoring
environment** (a.k.a. regions such as eu-de, eu-nl, etc.). The mapping of
monitoring zones to monitoring sites is configured in a mesh matrix to validate
internal as well as external connections to the cloud.
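
The mesh idea can be sketched with Python's ``itertools.product``. Every monitoring zone (the source running the checks) is paired with every monitoring environment (the target region). The zone and environment names and the "internal" flag below are illustrative assumptions. The real matrix is defined in the stackmon-config ``config.yaml``.

.. code:: python

   from itertools import product

   # Assumed names; the real matrix lives in stackmon-config's config.yaml.
   ZONES = ["eu-de", "eu-nl", "swiss"]          # monitoring sources
   ENVIRONMENTS = ["eu-de", "eu-nl", "swiss"]   # monitored target regions


   def mesh(zones, environments):
       """Pair every monitoring zone with every environment.

       A pair is marked internal when the checks run inside the monitored region
       and external otherwise (an assumed interpretation of the mesh matrix).
       """
       return [
           {"zone": zone, "environment": env, "internal": zone == env}
           for zone, env in product(zones, environments)
       ]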

The SD2 framework:

- Has been developed to supervise the public APIs of the OTC platform 24/7.
- Repeatedly sends GET requests to the APIs.
- Sends the request results, grouped into service metrics, to the Metric
  Processor.
- The Metric Processor defines so-called flag metrics, which evaluate whether
  the service metrics reach the defined thresholds.
- Based on the severity of the flag metrics, the health metrics are produced.
- The Status Dashboard visualizes the health of each service based on the
  health metrics.
- Green - the service is OK, Yellow - the service has a minor issue, Red - the
  service has an outage.
- Based on yellow or red service health, an incident is created on the Status
  Dashboard and the MOD / 24/7 squad is notified.

.. image:: https://stackmon.github.io/assets/images/solution-diagram.svg

SD2 Architecture Summary
------------------------

- EpMon executes various HTTP query requests towards the service endpoints and
  generates metrics.
- The HTTP request metrics (generated by OpenStackSDK) are collected by
  statsd.
- The time series database (Graphite) receives the metrics aggregated by
  statsd.
- The Metric Processor processes the request metrics and, based on the defined
  thresholds, evaluates the resulting service health metrics.
- The Status Dashboard visualizes service health based on the health metrics
  produced by the Metric Processor and stored in an SQL database.
- Grafana dashboards visualize data from Graphite as well as from the Metric
  Processor.

SD2 features
------------

SD2 comes with the following features:

- Support of service health with 5 service statuses (3 generated semaphore
  lights, 1 custom semaphore light, 1 maintenance status)
- Support of HTTP requests (GET) for Endpoint Monitoring
- Support of custom metrics and custom thresholds
- Support of automatically generated incidents as well as custom incidents
- Support of all OTC environments:

  - EU-DE
  - EU-NL
  - Swisscloud

- Support of multiple monitoring sources:

  - EU-DE
  - EU-NL
  - Swisscloud

- Internal dashboards to understand the root cause of service health changes
- Each squad can control and manage its metrics and dashboards
- All parameters are configured from a single place (stackmon-config) in a
  human-readable form (YAML)
doc/source/internal/sd2_training/metrics.rst
doc/source/internal/sd2_training/monitoring_coverage.rst
doc/source/internal/sd2_training/notifications.rst
@ -0,0 +1,25 @@
.. _sd2_notifications:

=============
Notifications
=============

The Status Dashboard application comes with RSS feeds that provide information
about incidents.

The current RSS feeds are based on the "feedgen" library:
https://pypi.org/project/feedgen/

RSS feeds support queries based on region, service name and service category.

Example of a region-based query:

https://status.cloudmon.eco.tsi-dev.otc-service.com/rss/?mt=EU-DE

Example of a service category based query:

https://status.cloudmon.eco.tsi-dev.otc-service.com/rss/?srvc=Compute

Example of a query based on region and service name:

https://status.cloudmon.eco.tsi-dev.otc-service.com/rss/?mt=EU-DE&srv=Data%20Warehouse%20Service
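
The query URLs above can be built and consumed with the Python standard library alone. The sketch reuses only the ``mt`` (region), ``srv`` (service name) and ``srvc`` (category) parameters from the examples. It assumes the feed is plain RSS 2.0 with ``<item><title>`` elements, as feedgen produces by default. The function names are hypothetical, and ``fetch_titles`` performs a real network request.

.. code:: python

   from __future__ import annotations

   import urllib.request
   import xml.etree.ElementTree as ET
   from urllib.parse import quote, urlencode

   RSS_BASE = "https://status.cloudmon.eco.tsi-dev.otc-service.com/rss/"


   def rss_url(region: str | None = None, service: str | None = None,
               category: str | None = None) -> str:
       """Build a feed URL from the mt, srv and srvc query parameters."""
       params = {"mt": region, "srv": service, "srvc": category}
       # quote_via=quote encodes spaces as %20, matching the example URLs above.
       query = urlencode({k: v for k, v in params.items() if v}, quote_via=quote)
       return f"{RSS_BASE}?{query}" if query else RSS_BASE


   def item_titles(rss_xml: str | bytes) -> list[str]:
       """Extract the <item><title> texts from an RSS 2.0 document."""
       root = ET.fromstring(rss_xml)
       return [item.findtext("title", default="") for item in root.iter("item")]


   def fetch_titles(url: str) -> list[str]:
       """Download a feed and return its item titles (performs a network request)."""
       with urllib.request.urlopen(url, timeout=10) as response:
           return item_titles(response.read())

For example, ``rss_url(region="EU-DE", service="Data Warehouse Service")`` reproduces the last example URL above.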
@ -0,0 +1,62 @@
=========================
Status Dashboard Frontend
=========================

The Status Dashboard provides status information about OTC cloud services
across different regions.

The following features are supported by the Status Dashboard:

- Support of service health with 5 service statuses
- Authentication via OpenID Connect
- Service categories - meta grouping of services into groups
- Regions - services exist in different regions
- Incidents - entries about issues affecting certain regions and certain
  services
- Support of all OTC environments
- Built-in API support
- RSS notifications
- SLA view of all services
- Incident history

Two Status Dashboard portals are available:

- public status dashboard: https://status.cloudmon.eco.tsi-dev.otc-service.com/
- hybrid status dashboard: https://status-ch2.cloudmon.eco.tsi-dev.otc-service.com/

Service Health View
===================

.. image:: training_images/sd2_frontend.jpg

From an architecture point of view, the Status Dashboard is a Flask-based web
server that serves an API and renders web content, with PostgreSQL as its
database. The source code can be found at
https://github.com/stackmon/status-dashboard

The configuration of the Status Dashboard frontend is located on GitHub:
https://github.com/opentelekomcloud-infra/stackmon-config/blob/main/sdb_prod/catalog.yaml

The catalog YAML file contains the definitions of service names, service types,
service categories and regions.

Example of the AutoScaling service entries in the SD catalog:

.. code:: yaml

   - attributes:
       category: Compute
       region: EU-DE
       type: as
     name: Auto Scaling
   - attributes:
       category: Compute
       region: EU-NL
       type: as
     name: Auto Scaling

SLA view
========

The SLA view (https://status.cloudmon.eco.tsi-dev.otc-service.com/sla) is
calculated only from the "outage" service health status and provides a
6-month SLA history for each service.
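
The following minimal sketch computes such an SLA value. It assumes the SLA of a calendar month is ``100 * (1 - outage time in the month / length of the month)``. Outages are given as non-overlapping ``(start, end)`` datetime pairs, and an outage spanning two months is split between them. The formula and the function names are assumptions; the Status Dashboard may compute the value differently.

.. code:: python

   from __future__ import annotations

   from datetime import datetime, timedelta


   def month_bounds(year: int, month: int) -> tuple[datetime, datetime]:
       """Return the first moment of the month and of the following month."""
       start = datetime(year, month, 1)
       end = datetime(year + month // 12, month % 12 + 1, 1)
       return start, end


   def monthly_sla(outages: list[tuple[datetime, datetime]], year: int, month: int) -> float:
       """Availability in percent, counting only (non-overlapping) outage periods."""
       start, end = month_bounds(year, month)
       downtime = timedelta()
       for outage_start, outage_end in outages:
           # Clip each outage to the month so outages spanning months are split.
           overlap = min(outage_end, end) - max(outage_start, start)
           if overlap > timedelta():
               downtime += overlap
       return 100.0 * (1 - downtime / (end - start))


   def sla_history(outages, year: int, month: int, months: int = 6):
       """Return [((year, month), sla_percent), ...] for the last `months` months."""
       history = []
       for offset in range(months - 1, -1, -1):
           y, m0 = divmod(year * 12 + (month - 1) - offset, 12)
           history.append(((y, m0 + 1), monthly_sla(outages, y, m0 + 1)))
       return history

For example, a single 7 h 12 min outage in June 2024 (a 720-hour month) gives a June SLA of 99.0 %.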

.. image:: training_images/sd2_sla.jpg

Details on how to work with incidents can be found on the
:ref:`incidents <sd2_incidents>` page.
doc/source/internal/sd2_training/training_images/cloud_service_statistics.png (binary image, 457 KiB)
doc/source/internal/sd2_training/training_images/flag_and_health_dashboard.png (binary image, 74 KiB)
doc/source/internal/sd2_training/training_images/graphite_query.png (binary image, 190 KiB)
doc/source/internal/sd2_training/training_images/mp_query.png (binary image, 60 KiB)
doc/source/internal/sd2_training/training_images/sd2_data_flow.svg (74 KiB)
doc/source/internal/sd2_training/training_images/sd2_frontend.jpg (binary image, 123 KiB)
doc/source/internal/sd2_training/training_images/sd2_incident.jpg (binary image, 72 KiB)
doc/source/internal/sd2_training/training_images/sd2_sla.jpg (binary image, 107 KiB)
doc/source/internal/sd2_training/workflow.rst
@ -0,0 +1,26 @@
.. _sd2_flow:

SD2 Flow Process
================

.. image:: training_images/sd2_data_flow.svg
   :target: training_images/sd2_data_flow.svg
   :alt: sd2_data_flow

#. The service squad adds new data entries in the GitHub repository for EpMon
   (service URL queries), adjusts the flag and health metrics if required, and
   adds the service entry to the SD catalog.
#. CloudMon fetches the public configuration from GitHub and the internal
   configuration (credentials, certificates, keys, ...) from a local location,
   and generates the final configuration.
#. The EpMon plugin is executed and triggers HTTP requests from the defined
   configuration.
#. Metrics from the HTTP requests are collected by statsd.
#. The collected metrics are stored in the time series database Graphite.
#. The Metric Processor evaluates the HTTP metrics from the Graphite TSDB and
   generates new flag and health metrics based on the rules and thresholds
   defined in the configuration.
#. The Status Dashboard changes the service health semaphore light based on the
   resulting health metrics from the Metric Processor.
#. Grafana uses the metrics and statistics databases as data sources for the
   dashboards. The dashboards with various panels show the real-time status of
   the platform. Grafana also supports historical views and trends.
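
Step 2 combines a public and an internal configuration into the final one. A recursive dictionary merge is one common way to do this. The Python sketch below assumes both configurations are already loaded into dictionaries and that internal values (such as credentials) override public ones. The function name ``merge_config`` and the example keys are hypothetical and do not describe the actual CloudMon implementation.

.. code:: python

   import copy


   def merge_config(public: dict, internal: dict) -> dict:
       """Return a new dict: `public` overlaid with `internal`, merged recursively."""
       result = copy.deepcopy(public)  # never mutate the public input
       for key, value in internal.items():
           if isinstance(value, dict) and isinstance(result.get(key), dict):
               result[key] = merge_config(result[key], value)
           else:
               result[key] = copy.deepcopy(value)  # internal values win
       return result

Nested sections are merged key by key. Lists and scalar values are replaced as a whole, which keeps the override rule predictable.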