diff --git a/doc/source/internal/sd2_training/dashboards.rst b/doc/source/internal/sd2_training/dashboards.rst index d95ba07..0ef681d 100644 --- a/doc/source/internal/sd2_training/dashboards.rst +++ b/doc/source/internal/sd2_training/dashboards.rst @@ -1,47 +1,60 @@ -===================== -Dashboards management -===================== +==================== +Dashboard Management +==================== -https://dashboard.tsi-dev.otc-service.com/dashboards/f/CloudMon/cloudmon +As explained on the previous pages, the resulting metrics of the configured +monitoring plugins (mainly EpMon, but possibly also other plugins) +are first stored in a Graphite time series database before they are +further processed into flags and semaphores for the actual public dashboard. -The authentication is centrally managed by OTC LDAP. +However, Service Engineers or Service Managers sometimes benefit from +deeper inspection of this time series data for debugging purposes. +For this purpose, a Grafana frontend can be used to visualize and drill down into +the data. The entry point to a set of predefined dashboards is: + https://dashboard.tsi-dev.otc-service.com/dashboards/f/CloudMon/cloudmon -The CloudMon Dashboards are segregated based on the type of service: +Access to this dashboard is restricted to OTC staff members. +Authentication is managed by Keycloak, which in turn uses the OTC LDAP directory. - - The “Squad Flag and Health" dashboard provides high level overview about the service health - and flag metric status per each service from respective squad. - - “Cloud Service" Statistics dashboard monitors health of every endpoint url listed - by EpMon config entry. - - Dashboards can be replicated/customized for individual Squad needs. +The dashboards are grouped by type of service: + - The **Squad Flag and Health** dashboard provides a high-level overview + of the service health and flag metric status for each service of a + squad.
+ - The **Cloud Service Statistics** dashboard monitors the health of each + endpoint URL listed by an EpMon configuration entry. + - Dashboards can be replicated and customized for individual squad needs. -All the Cloud Service Statistics dashboards support Environment (target monitored platform) and Zone -(monitoring source location) variables at the top of each dashboard so these -views can be adjusted based on chosen value. +The Cloud Service Statistics dashboards honor the ``Environment`` (target +monitored platform) and ``Zone`` (monitoring source location) variables +at the top of each dashboard, so these views can be adjusted based on +the chosen values. -All the Squad Flag And Health dashboards support Environment (target monitored platform) variables at the top of each dashboard. +All Squad Flag and Health dashboards support the ``Environment`` (target +monitored platform) variable at the top of each dashboard. Squad Flag and Health Dashboard =============================== The dashboard provides deeper insight in Metric Processor generated metrics. -Flag panels provide information whether service has breached the thresholds -of predefined flag metric types. -Health panels provide information about resulting service health status based on evaluated flag metrics. +Flag panels show whether a service has exceeded a threshold +of a predefined flag metric type. Health panels show the +resulting service health status based on the evaluated flag metrics. -The resulting flag values are visualized in state timeline panels with following values: +The resulting flag values are visualized in state timeline panels with the +following values: -- 0 - flag metric is not breaching the defined threshold -- 1 - flag metric is breaching the defined threshold +- 0 - flag metric is not breaching the defined threshold. +- 1 - flag metric is breaching the defined threshold.
+The resulting health values are visualized and mapped in state timeline +panels with the following values: -The resulting health values are visualized in state timeline panels with following values: - -- 0 - Service operates normally -- 1 - Service has a minor issue resulting from defined reached flag metric(s) -- 2 - Service has an outage resulting from defined reached flag metrics(s) +- 0 - Service operates normally. +- 1 - Service has a minor issue caused by one or more reached flag metrics. +- 2 - Service has an outage caused by one or more reached flag metrics. Example at https://dashboard.tsi-dev.otc-service.com/d/s75qyOU4z/compute-flags?orgId=1 @@ -51,12 +64,13 @@ Example at https://dashboard.tsi-dev.otc-service.com/d/s75qyOU4z/compute-flags?o Cloud Service Statistics dashboard ================================== -Cloud Service Statistics dashboards uses metrics from GET query requests towards OTC -platform (:ref:`EpMon Overview `) and visualize it in: +The Cloud Service Statistics dashboards use metrics from GET query +requests towards the OTC platform (:ref:`EpMon Overview `) +and visualize them in: - - API calls duration per each URL query - - API calls duration (aggregated) - - API calls response codes + - API call duration per URL query. + - API call duration (aggregated). + - API call response codes. Example at https://dashboard.tsi-dev.otc-service.com/d/b4560ed6-95f0-45c0-904c-6ff9f8a491e8/sfs-service-statistics?orgId=1&refresh=10s @@ -66,23 +80,21 @@ Example at https://dashboard.tsi-dev.otc-service.com/d/b4560ed6-95f0-45c0-904c-6 Custom Dashboards ================= -Previous dashboards are predefined and read-only. -The further customization is currently possible via system-config in github: +The dashboards described above are predefined and read-only.
Further +customization is currently possible via system-config in GitHub: -https://github.com/stackmon/apimon-tests/tree/main/dashboards/grafana + https://github.com/stackmon/apimon-tests/tree/main/dashboards/grafana -The predefined simplified dashboard panel in yaml syntax -is defined in Stackmon Github repository -(https://github.com/stackmon/apimon-tests/tree/main/dashboards) +The predefined simplified dashboard panel in YAML syntax is defined in +the Stackmon GitHub repository: + + https://github.com/stackmon/apimon-tests/tree/main/dashboards Dashboards can be customized also just by copy/save function directly in Grafana. The whole dashboard can be saved under new name and then edited without any restrictions. -This approach is valid for PoC, temporary solutions and investigations but -should not be used as permanent solution as customized dashboards which are not -properly stored on Github repositories might be permanently deleted in case of -full dashboard service re-installation. - - - +This approach is suitable for proofs of concept, temporary solutions, +and investigations, but should not be used as a permanent solution: +customized dashboards that are not properly stored in GitHub repositories +might be permanently deleted if the dashboard service is fully re-installed. \ No newline at end of file diff --git a/doc/source/internal/sd2_training/epmon_checks.rst b/doc/source/internal/sd2_training/epmon_checks.rst index 2b92024..8d6abca 100644 --- a/doc/source/internal/sd2_training/epmon_checks.rst +++ b/doc/source/internal/sd2_training/epmon_checks.rst @@ -5,34 +5,43 @@ Endpoint Monitoring overview ============================ -EpMon is a standalone python based process targeting every OTC service. It -finds service in the service catalogs and sends GET requests to the configured -endpoints. +EpMon is a standalone Python-based process targeting every OTC service.
It +looks up the services from the service catalog and sends GET requests to +the configured endpoints. -Performing extensive tests like provisioning a server is giving a great -coverage, but is usually not something what can be performed very often and -leaves certain gaps on the timescale of monitoring. In order to cover this gap -EpMon component is capable to send GET requests to the given URLs relying on the -API discovery of the OpenStack cloud (perform GET request to /servers or the -compute endpoint). Such requests are cheap and can be performed in the loop, i.e. -every 5 seconds. Latency of those calls, as well as the return codes, are being -captured and sent to the metrics storage. +While extensive tests like provisioning a server provide great +coverage and deep insights, they are rather expensive and complex activities. +They can only be performed occasionally and leave certain gaps on the +monitoring timescale. To cover these gaps, the EpMon plugin sends +GET requests to a list of URL endpoints discovered from the OTC service +catalog and augmented by simple paths like ``/servers``. Such requests +are cheap and can be sent in a loop, e.g. every five seconds. +The latency and the HTTP status code of those calls are captured, +stored in a time series database, and further processed by the Metric +Processor. +Currently, the EpMon configuration is located in the ``stackmon-config`` project: + https://github.com/opentelekomcloud-infra/stackmon-config/blob/main/epmon/config.yaml -Currently EpMon configuration is located in stackmon-config: -https://github.com/opentelekomcloud-infra/stackmon-config/blob/main/epmon/config.yaml +It defines the HTTP query targets (URLs) for every OTC service. -And defines the query HTTP targets (urls) for every single OTC service.
+An entry in the OTC service catalog is a prerequisite for a service +to be queried by EpMon: -Service entry in OTC Service Catalog (https://git.tsi-dev.otc-service.com/ecosystem/service_catalog) is a prerequisite to enable service to be queried by EpMon. -If there are multiple entries in service catalog, such service entries can be marked for skip in case they are obsolete. -EpMon config.yaml only defines the service queries but doesn't say how and when to use them. -For actual use across different monitoring sources and targets the configuration matrix is defined in: -https://github.com/opentelekomcloud-infra/stackmon-config/blob/main/config.yaml + https://git.tsi-dev.otc-service.com/ecosystem/service_catalog +If there are multiple entries for a service in the service catalog, obsolete +entries can be marked to be skipped. The EpMon ``config.yaml`` +only defines the service queries, but not how and when to use them. +For actual use across different monitoring sources and targets, the +configuration matrix is defined in: -In the following example autoscaling service confiration in EpMon is shown: + https://github.com/opentelekomcloud-infra/stackmon-config/blob/main/config.yaml + +The following example configures the autoscaling service (``as``) in +EpMon and adds four paths to the service endpoint (three URL paths +for the Swisscloud): .. code:: yaml @@ -55,28 +64,32 @@ In the following example autoscaling service confiration in EpMon is shown: service_type: asv1 urls: [] +There are three separate items defined for the autoscaling service: - -There are 3 entries of autoscaling service. - -- "as" entry is default one and used for public cloud regions. -- "as_swiss" entry is specific for Swisscloud -- "as_skip_v1" entry is entry to be skipped from EpMon +- The ``as`` entry is the default. It is used for the public OTC regions. +- The ``as_swiss`` entry defines the specific settings for the Swisscloud.
+- The ``as_skip_v1`` entry is skipped by EpMon. By default all entries in service catalog are triggered for EpMon. -The mandatory parameter for all entries is "service_type". This must match the service_type entry in service catalog. +The mandatory parameter for all entries is ``service_type``. This has to +match the ``service_type`` entry in the OTC service catalog. -Another important parameter is "sdk_proxy". This attribute identifies which otcextension module should be used -for execution of HTTP GET queries. +Another important parameter is ``sdk_proxy``. This attribute identifies +which otcextensions module should be used to execute the HTTP GET queries. -The most important parameter is "urls". It defines list of URLs which will be triggered for the specific service. -As service_type is known then not full url is required to be defined but only required is its path which appears after predefined url from service catalog. +The most important parameter is ``urls``. It defines a list of URLs that +EpMon queries for this service. Because the ``service_type`` is known, the +full URL is not required; only the path that appears after the +endpoint URL from the OTC service catalog needs to be defined. -If some specific service (or some specific service version) is supposed to be skipped from endpoint monitoring then it must -defined in epmon config with urls parameter setting the empty list. This ensures that even default queries from service catalog are overwritten -by the empty list in this config. In this example service type asv1 (entry from service catalog) is not being triggered by EpMon at all -as it contains empty urls list. +If a specific service (or a specific service version) should be +skipped from endpoint monitoring, the value of its ``urls`` key has to +be set to an empty list in the EpMon configuration file.
This ensures that +even default queries from the service catalog are overwritten by the empty +list in this configuration. In this example, the service type ``asv1`` (an entry from +the OTC service catalog) is not queried by EpMon at all, as it +contains an empty ``urls`` list. - -Collected response codes and response times are sent to graphite for further processing by Metrics Processor. +Collected response codes and response times are sent to the Graphite time +series database for further processing by the Metric Processor. \ No newline at end of file diff --git a/doc/source/internal/sd2_training/introduction.rst b/doc/source/internal/sd2_training/introduction.rst index 7ccbb23..2c1bef5 100644 --- a/doc/source/internal/sd2_training/introduction.rst +++ b/doc/source/internal/sd2_training/introduction.rst @@ -1,44 +1,73 @@ -============ -Introduction -============ +====================================== +Introduction to the Status Dashboard 2 +====================================== The Open Telekom Cloud is represented to users and customers by the API endpoints and the various services behind them. Customers are -interested in a reliable way to check and verify if the services are actually +interested in a reliable way to check and verify if those services are actually available to them via the Internet. The Status Dashboard 2 (SD2) is a service facility monitoring of all OTC -services, intended for customers to grasp an overview of the service +services, intended for customers to grasp a quick overview of the service availability. It comprises of a set of **monitoring zones**, each monitoring services of an **monitoring environment** (a. k. a. regions like eu-de, eu-nl, etc.). The mapping of monitoring zones to monitoring -sites is configured in a mesh matrix to validate internal as well as external connections to cloud. +sites is configured in a mesh matrix to validate internal as well as +external connections to the cloud.
-The SD2 framework: +Monitoring can be a tricky process, as there are many approaches of how +deep, realistic, practical, synthetic, and reliable to measure the systems +and services. The SD2 provides a reliable, quick, and comprehensive view +on the OTC, and makes some opinionated, deliberate simplifications. This +document guides through the architecture and necessary steps to maintain +the monitoring process by all OTC staff roles involved in providing a +service. - - Developed with aim to supervise 24/7 the public APIs of OTC platform. - - GET Requests repeatedly sent to the API. - - Requests grouped in service metrics are sent to Metric Processor - - Metric Processor defines so called Flag metrics which evaluate whether service metrics reach the defined thresholds - - Based on severity of the flag metrics the health metrics are produced - - Status Dashboard visualizes health of the service based health metrics - - Green - service is ok, Yellow - service has a minor issue, Red - service has an outage - - Based on yellow and red service health the incident is created on Status Dashboard and MOD / 24/7 squad is notified +Key features of the SD2 framework: + + - Developed to **supervise the 24/7 availability** of the public APIs + of the OTC platform. + - SD2 **sends GET-requests that list resources** to API-endoints. It + does explicitly not simulate more complex, multi-stage use-cases. + - Answers to such requests (status, roundtrip time) are grouped by + **service** and considered as **metrics**. They are sent to the + **Metric Processor**. + - The Metric Processor maps the metrics to **flags**, that are raised + for certain situations, like request probes not being answered (API + down), a majority not answering within a defined threshold period + (API slow) or other situations. + - Based on a combination of raised flags and their severity, the Metric + Processor calculates health metrics as **semaphores**. 
No flags result in + a green semaphore, minor issues result in a yellow semaphore (service + degradation), while severe situations lead to red semaphores (service + unavailable). + - The **SD2 frontend** visualizes the health of each service based on the + semaphores on a website. + - Each non-green semaphore automatically raises an **issue**, which is displayed + on the website. MODs and/or service squad owners should then take over. + - The **manual intervention** of the affected service's owners is required + to review, document, and resolve the issue, and eventually to clear the issue condition. .. image:: https://stackmon.github.io/assets/images/solution-diagram.svg + SD2 Architecture Summary ------------------------ - - EpMon executes various HTTP query requests towards service endpoints and - generates metrics - - The HTTP requests metrics (generated by OpenStackSDK) are collected by - statsd. - - Time Series database (graphite) is pulling metrics from statsd. - - Metric Processor processes the requests metrics and based on defined thresholds evaluates the resulting service health metrics - - Status Dashboard visualize service health based on health metrics produced by metric processor and stored in SQL database - - Grafana dashboards visualize data from graphite as well as from metric processor - + - The **EpMon** plugin (endpoint monitoring) sends several HTTP query + requests to service endpoints and generates metrics. + - HTTP request metrics (status code, round-trip time) are generated by + the OpenStack SDK and collected by Statsd. + - A time series database (Graphite) pulls metrics from Statsd. + - The Metric Processor (MP) processes the request metrics and flags + certain circumstances. Based on defined rules and thresholds, the + MP computes the resulting service health metrics (semaphores). + - The MP raises an issue for any non-green semaphore and stores it in + the SQL-based incident database that is part of the frontend component.
+ - The Status Dashboard frontend visualizes the incidents on a website. + - Grafana dashboards visualize data from Graphite as well as from the + Metric Processor for OTC staff members. + - Service levels are computed based on how long incidents last. SD2 features @@ -46,23 +75,19 @@ SD2 features SD2 comes with the following features: -- Support of service health with 5 service statuses (3 generated semaphore lights, 1 custom semaphore light, 1 maintenance status) -- Support of HTTP requests (GET) for Endpoint Monitoring -- Support of custom metrics and custom thresholds -- Support of automatically generated incidents as well as custom incidents -- Support of all OTC environments - - - EU-DE - - EU-NL - - Swisscloud - -- Support of multiple Monitoring sources: - - - EU-DE - - EU-NL - - Swisscloud - -- Internal dashboards to understand the root cause for service health changes -- Each squad can control and manage their metrics and dashboards -- All parameters configured from single place (stackmon-config) in human readable form (yaml) - +- Service health with five service statuses (three generated + semaphores, one custom semaphore, one maintenance status). +- HTTP GET requests for endpoint monitoring. +- Custom metrics and custom thresholds. +- Incidents are generated automatically once non-green semaphores are detected. + Incidents can also be raised manually, for example as maintenance + downtimes. +- All OTC environments, including eu-de, eu-nl, and eu-ch2, are covered. +- The monitoring environments are decoupled from the monitoring zones + that obtain the metrics; the zones include eu-de, eu-nl, eu-ch2, and GCP. +- Linked Grafana dashboards help service squad members and MODs + understand the root cause of service health changes. +- Each service squad can control and manage their metrics as well as + dashboards individually. +- All parameters are configured in a single place (stackmon-config) in + human-readable form (YAML).
\ No newline at end of file diff --git a/doc/source/internal/sd2_training/monitoring_coverage.rst b/doc/source/internal/sd2_training/monitoring_coverage.rst index ae056f3..e86a963 100644 --- a/doc/source/internal/sd2_training/monitoring_coverage.rst +++ b/doc/source/internal/sd2_training/monitoring_coverage.rst @@ -2,49 +2,59 @@ Monitoring coverage =================== -Multiple factors define the monitoring coverage to simulate common customer use -cases. The overall matrix configuration of all combined targets, sources and scopes is located at: -https://github.com/opentelekomcloud-infra/stackmon-config/blob/main/config.yaml +While monitoring the cloud services of the OTC (which we call +monitoring environments) from within the OTC is convenient and effective most of +the time, in corner cases the servers performing +the actual monitoring (which we call monitoring zones) should +also include external zones. Who monitors whom (and how) is +configured in a matrix definition: + + https://github.com/opentelekomcloud-infra/stackmon-config/blob/main/config.yaml -Monitored targets -################# +Monitoring Environments +----------------------- -* EU-DE -* EU-NL -* EU-CH2 (Swisscloud) +These targets are covered by the SD2 monitoring setup and are +displayed in separate tabs (or on separate pages for the Swisscloud): + +* eu-de, +* eu-nl, and +* eu-ch2 (Swisscloud).
-Monitoring sources -################## +Monitoring Zones +---------------- +From these zones, the monitoring probes are sent to the targets: * Inside OTC (eu-de, eu-ch2) * Outside OTC (Swisscloud) Scope of monitoring -################### +------------------- -* Endpoints and HTTP query requests +The SD2 is a special application of the more generic Stackmon project +and uses several plugins to collect its metrics: - * all services - * multiple GET queries +* HTTP GET queries are sent to service API endpoints: + * applies to all services from the service catalog, + * multiple GET queries may be configured per service. * Static Resources - - * not yet in SD2 - * specific services - * availability of the resource or resource functionality + * not yet implemented in SD2 (planned for Q1 2024), + * specific services, + * availability of the resource or resource functionality. * Global resources - - * not yet in SD2 - * OTC console - * OTC docs portal - * OTC public site + * not yet implemented in SD2 (planned for 2024), + * OTC console, + * OTC Help Center, + * OTC community portal, + * OTC public website. -Example of monitoring coverage: +Example configuration of the monitoring matrix and covered services: .. code:: yaml @@ -171,21 +181,31 @@ Example of monitoring coverage: - volume_skip_v2 - volume +Note that Service Managers or Service Engineers usually do not need to +touch this configuration. Details should be negotiated with +Platform Engineers. -Parameter "env" defines what is the target for monitoring (which region is to be monitored). +The attribute ``env`` defines the target for monitoring (which +region is to be monitored). The attribute ``monitoring_zone`` +defines the source of monitoring (from which region the monitoring +will be triggered). -Parameter "monitoring_zone" defines the source of monitoring (from which region the monitoring will be triggered) - -As Cloudmon is plugin based framework there's possibility to add as many plugins as required.
-Currently 2 plugins are enabled: +Note that this configuration covers not only the SD2 component, but +also the more generic Stackmon framework. It is plugin-based, +so additional plugins can be added. Currently, two plugins are enabled: - apimon - epmon -Apimon plugin triggers scenario-based Ansible playbooks which simulate the customer use-cases including also creation of resources (POST requests). -Currently only one scenario is enabled for token authorization (scenario1_token.yaml). As Status Dasbhoard only evaluates the HTTP GET metrics -other scenarios are not yet enabled. Playbooks are stored on github (https://github.com/stackmon/apimon-tests/tree/main/playbooks). +The Apimon plugin triggers scenario-based Ansible playbooks that +simulate customer use cases, including the creation of +resources (POST requests). Currently, only one scenario is enabled: +token authorization (``scenario1_token.yaml``). As the SD2 only +evaluates HTTP GET metrics, other scenarios are not yet enabled. +Playbooks are stored on GitHub at: -EpMon plugin defines which service entries will be used in which specific environment. -Services which are not present in respective environment won't have entry in this config as well. + https://github.com/stackmon/apimon-tests/tree/main/playbooks +The EpMon plugin configuration defines which service entries are used in which +specific environment. Services that are not available in an environment +have no entry in this configuration.
\ No newline at end of file diff --git a/doc/source/internal/sd2_training/status_dashboard_frontend.rst b/doc/source/internal/sd2_training/status_dashboard_frontend.rst index d632194..cbf8803 100644 --- a/doc/source/internal/sd2_training/status_dashboard_frontend.rst +++ b/doc/source/internal/sd2_training/status_dashboard_frontend.rst @@ -1,24 +1,26 @@ -========================= -Status Dashboard Frontend -========================= +=========================== +Status Dashboard 2 Frontend +=========================== -Status Dashboard provides the status information of OTC cloud services across different regions. +The web-based frontend of the SD2 provides public (and internal, +after authentication) status information of OTC cloud services +across all configured regions. It supports these features: -The following features are supported on Status Dashboard: - -- Support of service health with 5 service statuses -- Authentication by OpenID connect -- Service categories - meta grouping of services into groups -- Regions - different services are existing in regions -- Incidents - entry about issues affecting certain regions and certain services +- Displays the service health through five service statuses. +- Authentication by OpenID Connect (which in turn is connected + to the OTC LDAP directory). +- Services are grouped into categories. +- Regions: services are organized by the regions in which they are available. +- Incidents: entries about issues affecting specific regions and + specific services. - Support of all OTC environments -- built-in API support -- RSS notification -- SLA view on all services -- Incident history - +- Incident data is available through an API. +- RSS notifications (for the OTC mobile app and other integrations). +- SLA view of the services. +- Incident history.
Two Status Dashboard portals are available: + - public status dashboard: https://status.cloudmon.eco.tsi-dev.otc-service.com/ - hybrid status dashboard: https://status-ch2.cloudmon.eco.tsi-dev.otc-service.com/ @@ -27,12 +29,15 @@ Service Health View .. image:: training_images/sd2_frontend.jpg +From an architectural point of view, the Status Dashboard is a Flask-based +web server that serves an API and renders web content, backed by a +PostgreSQL database. The project source is available at +https://github.com/stackmon/status-dashboard -From the architecture POV Status Dashboard is a flask based web server serving API and rendering web content with the postgresql as database. -Source can be found at https://github.com/stackmon/status-dashboard - -Configuration of the status dashboard frontend is located at github: https://github.com/opentelekomcloud-infra/stackmon-config/blob/main/sdb_prod/catalog.yaml -The catalog yaml file contains definitions of service name, service type, service categories and regions. +The configuration of the Status Dashboard frontend is located +on GitHub: https://github.com/opentelekomcloud-infra/stackmon-config/blob/main/sdb_prod/catalog.yaml +The ``catalog.yaml`` file contains the definitions of +service names, service types, service categories, and regions. Example of AutoScaling service entry in SD catalog: @@ -57,6 +62,4 @@ SLA view https://status.cloudmon.eco.tsi-dev.otc-service.com/sla is calculated o .. image:: training_images/sd2_sla.jpg -Details how to work with incidents can be found at :ref:`incidents ` page. - - +Details on how to work with incidents are described on the :ref:`incidents ` page. \ No newline at end of file diff --git a/doc/source/internal/sd2_training/workflow.rst b/doc/source/internal/sd2_training/workflow.rst index 42bc584..25ad8c8 100644 --- a/doc/source/internal/sd2_training/workflow.rst +++ b/doc/source/internal/sd2_training/workflow.rst @@ -1,7 +1,7 @@ ..
_sd2_flow: -SD2 Flow Process -================ +SD2 Data Flow Process +===================== .. image:: training_images/sd2_data_flow.svg @@ -9,18 +9,22 @@ SD2 Flow Process :alt: sd2_data_flow -#. Service squad adds new data entries in github repository for - EpMOn (service URL queries), - adjusts flag and health metrics if required, - and adds service entry in SD catalog. -#. Cloudmon fetches public configuration from GitHub - and internal configuration (credentials, certs, keys,...) from local place and generate final configuration. -#. EpMon plugin is executed and triggers HTTP requests from defined configuration -#. Metrics from HTTP requests are collected by Statsd. -#. Collected metrics are stored in time-series database Graphite. -#. Metric Processor evaluates HTTP metrics from Graphite TSDB. - and generates new flag and health metrics based on defined rules and thresholds in configuration. -#. Status Dashboard changing service health semaphore light based on resulting health metrics from Metric Procesor. -#. Grafana uses metrics and statistics databases as the data sources for the - dashboards. The dashboard with various panels show the real-time status of - the platform. Grafana supports also historical views and trends. +#. The service squad adds new data entries in the GitHub repository for + EpMon (service URL queries), adjusts flag and health + metrics if required, and adds a service entry in the SD catalog. +#. Cloudmon fetches the public configuration from GitHub and the internal + configuration (credentials, certificates, keys, ...) from a local + repository to generate the final configuration. +#. The EpMon plugin is executed and sends HTTP requests as defined + by the configuration. +#. Metrics resulting from the HTTP requests are collected by Statsd. +#. The collected metrics are stored in the time series database Graphite. +#.
The Metric Processor evaluates HTTP metrics from the Graphite TSDB + and generates new flag and health metrics based on the + rules and thresholds defined in the configuration. +#. The Status Dashboard changes the service health semaphore based on the + resulting health metrics from the Metric Processor. +#. Grafana uses metrics and statistics databases as the data + sources for the dashboards. The dashboards with various panels + show the real-time status of the platform. Grafana also + supports historical views and trends. \ No newline at end of file
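As a reading aid for the flag and semaphore flow documented in `introduction.rst` and `dashboards.rst` (EpMon probes, flags with values 0 and 1, health values 0, 1, and 2), here is a minimal Python sketch that sits outside the patch itself. It is not the Metric Processor's actual code or configuration format. The flag names `api_down` and `api_slow`, the `HealthRule` type, the majority rule for slowness, and the 500 ms threshold are all hypothetical assumptions chosen only to mirror the prose.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Health values as shown in the "Squad Flag and Health" dashboard panels.
HEALTH_OK = 0      # green: service operates normally
HEALTH_MINOR = 1   # yellow: minor issue (service degradation)
HEALTH_OUTAGE = 2  # red: outage (service unavailable)

# One probe result: (HTTP status code, or None if unanswered; latency in ms).
Sample = Tuple[Optional[int], float]


def compute_flags(samples: List[Sample], slow_threshold_ms: float) -> Dict[str, int]:
    """Derive hypothetical flag values (0 = not breached, 1 = breached)."""
    total = len(samples)
    unanswered = sum(1 for status, _ in samples if status is None)
    slow = sum(
        1 for status, latency in samples
        if status is not None and latency > slow_threshold_ms
    )
    return {
        # "API down": none of the probes in the window was answered.
        "api_down": int(total > 0 and unanswered == total),
        # "API slow": a majority of probes was not answered within the threshold.
        "api_slow": int(total > 0 and 2 * (unanswered + slow) > total),
    }


@dataclass
class HealthRule:
    """Hypothetical rule: if all listed flags are raised, health is at least `level`."""
    flags: List[str]
    level: int


def evaluate_health(flags: Dict[str, int], rules: List[HealthRule]) -> int:
    """Return the most severe level among the rules whose flags are all raised."""
    health = HEALTH_OK
    for rule in rules:
        if all(flags.get(name, 0) == 1 for name in rule.flags):
            health = max(health, rule.level)
    return health


# Hypothetical rule set: a slow API degrades the service, a down API is an outage.
RULES = [
    HealthRule(flags=["api_slow"], level=HEALTH_MINOR),
    HealthRule(flags=["api_down"], level=HEALTH_OUTAGE),
]

if __name__ == "__main__":
    window = [(200, 120.0), (200, 900.0), (200, 950.0)]
    flags = compute_flags(window, slow_threshold_ms=500.0)
    print(flags, evaluate_health(flags, RULES))  # {'api_down': 0, 'api_slow': 1} 1
```

Taking the maximum over all matching rules reflects the documented behavior that the most severe raised condition decides the semaphore color. The real flag and health definitions are maintained in stackmon-config and evaluated by the Metric Processor, whose rule syntax may differ from this sketch.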