adding new content for ApiMon training

Reviewed-by: gtema <artem.goncharov@gmail.com> Co-authored-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-committed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
2023-05-25 13:25:47 +00:00 · 2023-05-25 13:25:47 +00:00 · 2838ebed03
commit 2838ebed03
parent 12d8d43edb
37 changed files with 1124 additions and 0 deletions
--- a/doc/source/internal/apimon_training/alerts.rst
+++ b/doc/source/internal/apimon_training/alerts.rst
@ -0,0 +1,31 @@
+======
+Alerts
+======
+
+Alerta is the ApiMon component designed to integrate alerts from multiple
+sources. It supports many standard sources such as Syslog, SNMP, Prometheus,
+Nagios, Zabbix, etc. Additionally, any other type of source that can issue a
+URL request or run a command line can be integrated as well.
+
+Native functions like correlation and de-duplication help to manage thousands
+of alerts in a transparent way and to consolidate alerts into proper
+categories based on environment, service, resource, failure type, etc.
+
+Alerta is hosted at https://alerts.eco.tsi-dev.otc-service.com/.
+The authentication is centrally managed by OTC LDAP.
+
+The Zulip API is integrated with Alerta to send notifications of errors and
+alerts to Zulip streams.
+
+Alerts displayed on OTC Alerta are generated by the Executor, Scheduler,
+EpMon, or Grafana.
+
+ - "Executor alerts" report playbook results, i.e. whether a playbook has
+   completed or failed.
+ - "Grafana alerts" report breaches of the defined thresholds, for example an
+   API response time higher than the defined threshold.
+ - "Scheduler alerts" TBD
+ - "EpMon alerts" provide information about failed endpoint queries, with the
+   request in curl form and the respective error response details.
+
+.. image:: training_images/alerta_dashboard.png
--- a/doc/source/internal/apimon_training/contact.rst
+++ b/doc/source/internal/apimon_training/contact.rst
@ -0,0 +1,29 @@
+Contact - Whom to address for Feedback?
+=======================================
+
+If you have feedback or proposals, or have found issues regarding ApiMon,
+EpMon or CloudMon, you can address them in the corresponding GitHub
+OpenTelekomCloud-Infra repositories or StackMon repositories.
+
+Issues or feedback regarding the **ApiMon, EpMon, Status Dashboard, Metric
+processor** as well as new feature requests can be addressed by filing an issue
+on the **GitHub** repository under
+https://github.com/opentelekomcloud-infra/system-config/blob/main/inventory/service/group_vars/apimon.yaml (CMO)
+https://github.com/opentelekomcloud-infra/stackmon-config (FMO)
+
+If you have found any problems which affect the **ApiMon dashboard design**
+please open an issue/PR on **GitHub**
+https://github.com/opentelekomcloud-infra/system-config/tree/main/playbooks/templates/grafana/apimon (CMO)
+https://github.com/stackmon/apimon-tests (FMO)
+
+
+If you have found any problems which affect the **ApiMon playbook scenarios**
+please open an issue/PR on **GitHub**
+https://github.com/opentelekomcloud-infra/apimon-tests (CMO)
+https://github.com/stackmon/apimon-tests (FMO).
+
+For any other issue, demand or request, try to locate the proper repository in
+https://github.com/orgs/stackmon/repositories
+
+For general questions you can write an email to the `Ecosystems Squad
+<mailto:dl-pbcotcdeleco@t-systems.com>`_.
--- a/doc/source/internal/apimon_training/dashboards.rst
+++ b/doc/source/internal/apimon_training/dashboards.rst
--- a/doc/source/internal/apimon_training/databases.rst
+++ b/doc/source/internal/apimon_training/databases.rst
--- a/doc/source/internal/apimon_training/difference_cmo_fmo.rst
+++ b/doc/source/internal/apimon_training/difference_cmo_fmo.rst
@ -0,0 +1,34 @@
+.. _difference_apimon_cmo_fmo:
+
+===================================
+Difference ApiMon(CMO)/ApiMon(FMO)
+===================================
+
+Due to the ongoing transformation of ApiMon and its integration into the more
+robust CloudMon, there are two operation modes right now. Therefore it's
+important to understand what is supported in which mode.
+
+This page provides navigation links and helps to understand what changes once
+the transformation is completed, as some of the locations will change.
+
+The most important differences are described in the table below:
+
+---------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
+|   **Differences**   |                                              **ApiMon (CMO)**                                              |                             **ApiMon(FMO)**                              |
+=====================+============================================================================================================+==========================================================================+
+| Playbook scenarios  | https://github.com/opentelekomcloud-infra/apimon-test                                                      | https://github.com/stackmon/apimon-tests/tree/main/playbooks             |
+---------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
+| Dashboards setup    | https://github.com/opentelekomcloud-infra/system-config/tree/main/playbooks/templates/grafana/apimon       | https://github.com/stackmon/apimon-tests/tree/main/dashboards            |
+---------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
+| Environment setup   | https://github.com/opentelekomcloud-infra/system-config/blob/main/inventory/service/group_vars/apimon.yaml | https://github.com/opentelekomcloud-infra/stackmon-config                |
+---------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
+| Implementation mode | standalone app                                                                                             | plugin based                                                             |
+---------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
+| Organization        | opentelekomcloud-infra                                                                                     | stackmon                                                                 |
+---------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
+| Dashboards          | https://dashboard.tsi-dev.otc-service.com/                                                                 | https://dashboard.tsi-dev.otc-service.com/                               |
+|                     | https://dashboard.tsi-dev.otc-service.com/dashboards/f/UaB8meoZk/apimon                                    | https://dashboard.tsi-dev.otc-service.com/dashboards/f/CloudMon/cloudmon |
+---------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
+| Documentation       | https://confluence.tsi-dev.otc-service.com/display/ES/API-Monitoring                                       | https://stackmon.github.io/                                              |
+|                     |                                                                                                            | https://stackmon-cloudmon.readthedocs.io/en/latest/index.html            |
+---------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
--- a/doc/source/internal/apimon_training/epmon_checks.rst
+++ b/doc/source/internal/apimon_training/epmon_checks.rst
@ -0,0 +1,40 @@
+.. _epmon_overview:
+
+============================
+Endpoint Monitoring overview
+============================
+
+
+EpMon is a standalone Python-based process targeting every OTC service. It
+finds services in the service catalogs and sends GET requests to the configured
+endpoints.
+
+Extensive tests such as provisioning a server give great coverage, but they
+usually cannot be performed very often and therefore leave gaps on the
+monitoring timescale. To cover these gaps, the EpMon component sends GET
+requests to the given URLs, relying on the API discovery of the OpenStack cloud
+(for example, a GET request to /servers or to the compute endpoint). Such
+requests are cheap and can be performed in a loop, e.g. every 5 seconds. The
+latency of those calls, as well as the return codes, is captured and sent to
+the metrics storage.
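+
+The probing loop described above can be sketched in Python as follows. This is
+a minimal illustration under stated assumptions, not the actual EpMon code: the
+function names ``http_get_status``, ``probe`` and ``run``, the ``-1`` marker for
+"no response at all", and the returned tuple layout are all hypothetical.
+Sending the results to the metrics storage is left out.
+

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Tuple


def http_get_status(url: str, timeout: float = 5.0) -> int:
    """Send one GET request and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers still carry a status code worth recording.
        return err.code


def probe(url: str, fetch: Callable[[str], int] = http_get_status) -> Tuple[int, float]:
    """Return (status_code, latency_ms) for a single GET request.

    A status of -1 marks "no response at all" (hypothetical convention).
    """
    started = time.perf_counter()
    try:
        status = fetch(url)
    except OSError:
        # URLError and socket timeouts are subclasses of OSError.
        status = -1
    latency_ms = (time.perf_counter() - started) * 1000.0
    return status, latency_ms


def run(
    urls: Iterable[str],
    interval_s: float = 5.0,
    rounds: int = 1,
    fetch: Callable[[str], int] = http_get_status,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, int, float]]:
    """Probe every URL `rounds` times, pausing `interval_s` between rounds."""
    targets = list(urls)
    results = []
    for round_no in range(rounds):
        for url in targets:
            status, latency_ms = probe(url, fetch)
            results.append((url, status, latency_ms))
        if round_no < rounds - 1:
            sleep(interval_s)
    return results
```

+
+The ``fetch`` and ``sleep`` parameters are injectable only so that the loop can
+be exercised without network access; a real deployment would take its targets
+from the EpMon configuration rather than from a hard-coded list.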
+
+
+
+Currently the EpMon configuration is located in system-config:
+https://github.com/opentelekomcloud-infra/system-config/blob/main/inventory/service/group_vars/apimon.yaml
+(this will change in the future once CloudMon takes over).
+
+It defines the HTTP query targets for every single OTC service.
+
+The EpMon dashboard provides the general availability status of every service
+definition from the service catalog:
+
+.. image:: training_images/epmon_status_dashboard.jpg
+
+Additionally, it provides further details for the endpoints such as response
+times, detected error codes, or missing responses.
+
+.. image:: training_images/epmon_dashboard_details.jpg
+
+EpMon findings are also reported to Alerta, and notifications are sent to the
+dedicated Zulip topic "apimon_endpoint_monitoring".
--- a/doc/source/internal/apimon_training/faq/faq_images/alerta_alerts_detail.png
+++ b/doc/source/internal/apimon_training/faq/faq_images/alerta_alerts_detail.png
--- a/doc/source/internal/apimon_training/faq/faq_images/annotations.jpg
+++ b/doc/source/internal/apimon_training/faq/faq_images/annotations.jpg
--- a/doc/source/internal/apimon_training/faq/faq_images/dashboard_log_links.jpg
+++ b/doc/source/internal/apimon_training/faq/faq_images/dashboard_log_links.jpg
--- a/doc/source/internal/apimon_training/faq/faq_images/zulip_notification_links.jpg
+++ b/doc/source/internal/apimon_training/faq/faq_images/zulip_notification_links.jpg
--- a/doc/source/internal/apimon_training/faq/how_can_i_access_dashboard.rst
+++ b/doc/source/internal/apimon_training/faq/how_can_i_access_dashboard.rst
@ -0,0 +1,7 @@
+===============================
+How Can I Access The Dashboard?
+===============================
+
+OTC LDAP authentication is supported on
+https://dashboard.tsi-dev.otc-service.com.
+
--- a/doc/source/internal/apimon_training/faq/how_to_read_the_logs_and_understand_the_issue.rst
+++ b/doc/source/internal/apimon_training/faq/how_to_read_the_logs_and_understand_the_issue.rst
@ -0,0 +1,80 @@
+.. _working_with_logs:
+
+=============================================
+How To Read The Logs And Understand The Issue
+=============================================
+
+
+Logs are stored on Swift OBS and expire after ~1 week. The logs can be
+accessed from multiple locations:
+
+  - Zulip notifications:
+
+
+    .. image:: faq_images/zulip_notification_links.jpg
+
+
+  - Alerts in Alerta
+
+
+    .. image:: faq_images/alerta_alerts_detail.png
+
+
+  - Tables in dashboards
+
+
+    .. image:: faq_images/dashboard_log_links.jpg
+
+
+The logs contain the whole Ansible playbook output and help to analyze the
+problem in detail. For example, the following log excerpt describes a failed
+scenario for an ECS deployment::
+
+    2023-05-17 21:08:09.038955 | TASK [server_create_delete : Try connecting]
+    2023-05-17 21:08:09.485569 | localhost | ERROR
+    2023-05-17 21:08:09.485862 | localhost | {
+    2023-05-17 21:08:09.485922 | localhost |   "changed": true,
+    2023-05-17 21:08:09.485950 | localhost |   "cmd": [
+    2023-05-17 21:08:09.485984 | localhost |     "ssh",
+    2023-05-17 21:08:09.486016 | localhost |     "-o",
+    2023-05-17 21:08:09.486052 | localhost |     "UserKnownHostsFile=/dev/null",
+    2023-05-17 21:08:09.486076 | localhost |     "-o",
+    2023-05-17 21:08:09.486097 | localhost |     "StrictHostKeyChecking=no",
+    2023-05-17 21:08:09.486118 | localhost |     "linux@80.158.60.117",
+    2023-05-17 21:08:09.486138 | localhost |     "-i",
+    2023-05-17 21:08:09.486160 | localhost |     "~/.ssh/scenario2a-162b6915911748c5809474be69d2a3b3-kp.pem"
+    2023-05-17 21:08:09.486192 | localhost |   ],
+    2023-05-17 21:08:09.486221 | localhost |   "delta": "0:00:00.127394",
+    2023-05-17 21:08:09.486242 | localhost |   "end": "2023-05-17 21:08:09.454247",
+    2023-05-17 21:08:09.486262 | localhost |   "invocation": {
+    2023-05-17 21:08:09.486283 | localhost |     "module_args": {
+    2023-05-17 21:08:09.486314 | localhost |       "_raw_params": "ssh -o 'UserKnownHostsFile=/dev/null' -o 'StrictHostKeyChecking=no' linux@80.158.60.117 -i ~/.ssh/scenario2a-162b6915911748c5809474be69d2a3b3-kp.pem",
+    2023-05-17 21:08:09.486373 | localhost |       "_uses_shell": false,
+    2023-05-17 21:08:09.486397 | localhost |       "argv": null,
+    2023-05-17 21:08:09.486428 | localhost |       "chdir": null,
+    2023-05-17 21:08:09.486455 | localhost |       "creates": null,
+    2023-05-17 21:08:09.486487 | localhost |       "executable": null,
+    2023-05-17 21:08:09.486513 | localhost |       "removes": null,
+    2023-05-17 21:08:09.486533 | localhost |       "stdin": null,
+    2023-05-17 21:08:09.486553 | localhost |       "stdin_add_newline": true,
+    2023-05-17 21:08:09.486573 | localhost |       "strip_empty_ends": true,
+    2023-05-17 21:08:09.486593 | localhost |       "warn": false
+    2023-05-17 21:08:09.486613 | localhost |     }
+    2023-05-17 21:08:09.486633 | localhost |   },
+    2023-05-17 21:08:09.486657 | localhost |   "msg": "non-zero return code",
+    2023-05-17 21:08:09.486689 | localhost |   "rc": 255,
+    2023-05-17 21:08:09.486713 | localhost |   "start": "2023-05-17 21:08:09.326853",
+    2023-05-17 21:08:09.486734 | localhost |   "stderr": "Pseudo-terminal will not be allocated because stdin is not a terminal.\r\nWarning: Permanently added '80.158.60.117' (ED25519) to the list of known hosts.\r\nlinux@80.158.60.117: Permission denied (publickey).",
+    2023-05-17 21:08:09.486755 | localhost |   "stderr_lines": [
+    2023-05-17 21:08:09.486776 | localhost |     "Pseudo-terminal will not be allocated because stdin is not a terminal.",
+    2023-05-17 21:08:09.486808 | localhost |     "Warning: Permanently added '80.158.60.117' (ED25519) to the list of known hosts.",
+    2023-05-17 21:08:09.486834 | localhost |     "linux@80.158.60.117: Permission denied (publickey)."
+    2023-05-17 21:08:09.486855 | localhost |   ]
+    2023-05-17 21:08:09.486875 | localhost | }
+
+In this case it seems that the deployed ECS doesn't contain the injected public
+SSH key, which can point to an issue with cloud-init or the metadata server.
+
+The playbooks can also be run manually on any OTC tenant and can be used
+for further investigation and analysis.
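+
+When a log is long, the failed tasks can be located programmatically. The
+following Python sketch is an illustration only and not part of ApiMon: it
+assumes the line layout visible in the excerpts (``<timestamp> | TASK [<name>]``
+and ``<timestamp> | <host> | <message>``) and treats a message starting with
+``ERROR`` or ``MODULE FAILURE`` as a failure marker. The function name
+``failed_tasks`` is hypothetical.
+

```python
import re
from typing import Dict, List

# "<date> <time> | TASK [<name>]" opens a new task.
TASK_RE = re.compile(r"^\S+ \S+ \| TASK \[(?P<name>.+)\]\s*$")
# "<date> <time> | <host> | <message>" belongs to the task opened last.
HOST_RE = re.compile(r"^\S+ \S+ \| (?P<host>\S+) \| (?P<msg>.*)$")
# Assumed failure markers, taken from the log excerpts in this training.
FAILURE_MARKERS = ("ERROR", "MODULE FAILURE")


def failed_tasks(log_text: str) -> Dict[str, List[str]]:
    """Map each failed task name to the messages logged after the failure."""
    current = None
    failures: Dict[str, List[str]] = {}
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        task = TASK_RE.match(line)
        if task:
            current = task.group("name")
            continue
        host = HOST_RE.match(line)
        if host is None or current is None:
            continue
        msg = host.group("msg")
        if msg.startswith(FAILURE_MARKERS):
            # Start collecting the detail lines of this failure.
            failures[current] = []
        elif current in failures:
            failures[current].append(msg)
    return failures
```

+
+Calling ``failed_tasks`` on the text of a downloaded log returns a dictionary
+keyed by task name, which makes it easy to jump straight to the failing step
+before reading the full output.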
+
--- a/doc/source/internal/apimon_training/faq/index.rst
+++ b/doc/source/internal/apimon_training/faq/index.rst
@ -0,0 +1,10 @@
+==========================
+Frequently Asked Questions
+==========================
+
+.. toctree::
+   :maxdepth: 1
+
+   how_can_i_access_dashboard
+   how_to_read_the_logs_and_understand_the_issue
+   what_are_the_annotations
--- a/doc/source/internal/apimon_training/faq/what_are_the_annotations.rst
+++ b/doc/source/internal/apimon_training/faq/what_are_the_annotations.rst
@ -0,0 +1,22 @@
+#########################
+What Are The Annotations?
+#########################
+
+Annotations provide a way to mark points on the graph with rich events. When you
+hover over an annotation you can see the event description and event tags. The
+text field can include links to other systems with more detail.
+
+.. image:: faq_images/annotations.jpg
+
+
+In ApiMon dashboards, annotations mark JIRA change issues when they transition
+from SCHEDULED to IN EXECUTION. This helps to identify in real time whether a
+JIRA change has a negative impact on the platform. The annotations contain
+several fields which help to correlate the platform behavior with the
+respective change directly on the dashboard:
+
+ - JIRA Change issue ID
+ - Impacted Availability Zone
+ - Affected Environment
+ - Main component
+ - Summary
--- a/doc/source/internal/apimon_training/index.rst
+++ b/doc/source/internal/apimon_training/index.rst
@ -0,0 +1,21 @@
+===================
+ApiMon Training
+===================
+
+.. toctree::
+   :maxdepth: 1
+
+   introduction
+   workflow
+   monitoring_coverage
+   test_scenarios
+   epmon_checks
+   dashboards
+   metrics
+   databases
+   alerts
+   notifications
+   logs
+   difference_cmo_fmo
+   contact
+   faq/index
--- a/doc/source/internal/apimon_training/introduction.rst
+++ b/doc/source/internal/apimon_training/introduction.rst
--- a/doc/source/internal/apimon_training/logs.rst
+++ b/doc/source/internal/apimon_training/logs.rst
@ -0,0 +1,45 @@
+.. _logs:
+
+====
+Logs
+====
+
+
+- Every job run log is stored on OpenStack Swift object storage.
+- Each job log file has a unique URL which can be accessed to see the log
+  details.
+- These URLs are available on all ApiMon levels:
+
+  - In Zulip alarm messages
+  - In Alerta events
+  - In Grafana Dashboards
+
+- Logs are plain text files containing the whole playbook output::
+
+    2020-07-12 05:54:04.661170 | TASK [List Servers]
+    2020-07-12 05:54:09.050491 | localhost | ok
+    2020-07-12 05:54:09.067582 | TASK [Create Server in default AZ]
+    2020-07-12 05:54:46.055650 | localhost | MODULE FAILURE:
+    2020-07-12 05:54:46.055873 | localhost | Traceback (most recent call last):
+    2020-07-12 05:54:46.057441 | localhost |
+    2020-07-12 05:54:46.057499 | localhost | During handling of the above exception, another exception occurred:
+    2020-07-12 05:54:46.057535 | localhost |
+    …
+    2020-07-12 05:54:46.063992 | localhost |  File "/tmp/ansible_os_server_payload_uz1c7_iw/ansible_os_server_payload.zip/ansible/modules/cloud/openstack/os_server.py", line 500, in _create_server
+    2020-07-12 05:54:46.065152 | localhost |  return self._send_request(
+    2020-07-12 05:54:46.065186 | localhost |  File "/root/.local/lib/python3.8/site-packages/keystoneauth1/session.py", line 1020, in _send_request
+    2020-07-12 05:54:46.065334 | localhost |  raise exceptions.ConnectFailure(msg)
+    2020-07-12 05:54:46.065378 | localhost | keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to https://ims.eu-de.otctest.t-systems.com/v2/images: ('Connection aborted.', OSError(107, 'Transport endpoint is not connected'))
+    2020-07-12 05:54:46.295035 |
+    2020-07-12 05:54:46.295241 | TASK [Delete server]
+    2020-07-12 05:54:48.481374 | localhost | ok
+    2020-07-12 05:54:48.505761 |
+    2020-07-12 05:54:48.505906 | TASK [Delete SecurityGroup]
+    2020-07-12 05:54:50.727174 | localhost | changed
+    2020-07-12 05:54:50.745541 |
+
+
+For further details on how to work with logs, please refer to the
+:ref:`How To Read The Logs And Understand The Issue <working_with_logs>` FAQ
+page.
+
--- a/doc/source/internal/apimon_training/metrics.rst
+++ b/doc/source/internal/apimon_training/metrics.rst
@ -0,0 +1,57 @@
+.. _metrics_definition:
+
+=======
+Metrics
+=======
+
+The Ansible playbook scenarios generate metrics in two ways:
+
+- The Ansible playbook internally invokes method calls to **OpenStack SDK
+  libraries.** They in turn generate metrics about each API call they make.
+  This requires some special configuration in the clouds.yaml file (currently
+  exposing metrics to statsd and InfluxDB is supported). For details refer
+  to the `config
+  documentation <https://docs.openstack.org/openstacksdk/latest/user/guides/stats.html>`_
+  of the OpenStack SDK. The following metrics are captured:
+
+  - response HTTP code
+  - duration of API call
+  - name of API call
+  - method of API call
+  - service type
+
+- Ansible plugins may **expose additional metrics** (e.g. whether the overall
+  scenario succeeded or not) with the help of a `callback
+  plugin <https://github.com/stackmon/apimon/tree/main/apimon/ansible/callback>`_.
+  Since it is sometimes not sufficient to know only the timings of each API
+  call, Ansible callbacks are utilized to report the overall execution time and
+  result (whether the scenario succeeded and how long it took). The following
+  metrics are captured:
+
+  - test case
+  - playbook name
+  - environment
+  - action name
+  - result code
+  - result string
+  - service type
+  - state type
+  - total number of failed, passed, ignored, and skipped tests
+
+Custom metrics:
+
+In some situations a more complex metric is required which spans the execution
+of multiple tasks in a scenario. For such cases, the tags parameter is used.
+Once specific tasks in a playbook are tagged with a metric name, the metric is
+calculated as the sum over all executed tasks with that tag. This is useful
+when the measured metric covers multiple steps needed to reach the desired
+state of a service or service resource, for example the boot of a virtual
+machine from deployment until successful login via SSH.
+
+.. code-block::
+
+    tags: ["metric=delete_server"]
+    tags: ["az={{ availability_zone }}", "service=compute", "metric=create_server{{ metric_suffix }}"]
+
+More details on how to query metrics from the databases are described on the
+:ref:`Metric databases <metric_databases>` page.
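+
+The tag-based aggregation of custom metrics can be illustrated with a short
+Python sketch. It is not the ApiMon callback plugin itself: the function names
+``metric_names`` and ``sum_by_metric``, the input shape (a list of tags plus a
+duration in seconds per executed task), and the sample values are assumptions
+made for illustration. Only the ``metric=<name>`` tag convention is taken from
+the examples above.
+

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def metric_names(tags: Iterable[str]) -> List[str]:
    """Extract metric names from tags of the form "metric=<name>"."""
    return [tag.split("=", 1)[1] for tag in tags if tag.startswith("metric=")]


def sum_by_metric(tasks: Iterable[Tuple[List[str], float]]) -> Dict[str, float]:
    """Sum task durations (seconds) per metric name found in the task tags."""
    totals: Dict[str, float] = defaultdict(float)
    for tags, duration_s in tasks:
        for name in metric_names(tags):
            totals[name] += duration_s
    return dict(totals)


# Hypothetical executed tasks: (tags, duration in seconds).
executed = [
    (["az=eu-de-01", "service=compute", "metric=create_server"], 30.0),
    (["metric=create_server"], 12.5),
    (["metric=delete_server"], 4.0),
    (["service=compute"], 1.0),  # no metric tag, so it is not counted
]
totals = sum_by_metric(executed)
```

+
+Here both tasks tagged ``metric=create_server`` contribute to one combined
+value, which is the behavior that makes multi-step measurements such as
+"deployment until successful SSH login" possible.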
--- a/doc/source/internal/apimon_training/monitoring_coverage.rst
+++ b/doc/source/internal/apimon_training/monitoring_coverage.rst
@ -0,0 +1,51 @@
+===================
+Monitoring coverage
+===================
+
+Multiple factors define the monitoring coverage to simulate common customer use
+cases.
+
+
+Monitored locations
+###################
+
+* EU-DE
+* EU-NL
+* PREPROD (EU-DE)
+* EU-CH2 (Swisscloud)
+
+
+Monitoring sources
+##################
+
+* Inside OTC (eu-de, eu-ch2)
+* Outside OTC (Swisscloud)
+
+
+Monitored targets
+#################
+
+* Endpoints and HTTP query requests
+
+  * all services
+  * multiple GET queries
+
+* Static resources
+
+  * specific services
+  * availability of the resource or resource functionality
+
+* Dynamic resources
+
+  * Ansible playbooks
+  * specific services
+  * monitoring of most common use cases in cloud services
+
+
+Monitoring dashboards
+#####################
+
+* KPI dashboards
+* 24/7 dashboards
+* Test results dashboards
+* Specific service dashboards
--- a/doc/source/internal/apimon_training/notifications.rst
+++ b/doc/source/internal/apimon_training/notifications.rst
@ -0,0 +1,68 @@
+=============
+Notifications
+=============
+
+Zulip, the official OTC communication channel, provides an API for pushing
+notifications from ApiMon to various Zulip streams:
+
+ -  #Alerts Stream
+ -  #Alerts-Hybrid Stream
+ -  #Alerts-Preprod Stream
+
+Every stream contains topics based on the service type (if represented by a
+standalone Ansible playbook) and a general apimon_endpoint_monitor topic which
+contains alerts of GET queries towards all services.
+
+
+.. image:: training_images/zulip_notifications.png
+
+
+If an error has been acknowledged on Alerta, no new notification message is
+posted on Zulip when the error repeats.
+
+Notifications contain further details which help to identify the root cause
+faster and more effectively.
+
+Notification parameters
+#######################
+
+The ApiMon notification consists of several fields:
+
+---------------------------+------------------------------------------------------------------------+
+|    Notification Field     |                              Description                               |
+===========================+========================================================================+
+| **APIMon Alert link**     | Reference to alert in Alerta                                           |
+---------------------------+------------------------------------------------------------------------+
+| **Status**                | Status of the alert in Alerta                                          |
+---------------------------+------------------------------------------------------------------------+
+| **Environment**           | Information about affected environment/region                          |
+---------------------------+------------------------------------------------------------------------+
+| **Severity**              | Severity of the alarm                                                  |
+---------------------------+------------------------------------------------------------------------+
+| **Origin**                | Information about origin location from where the job has been executed |
+---------------------------+------------------------------------------------------------------------+
+| **Service**               | Information about affected service and type of monitoring              |
+---------------------------+------------------------------------------------------------------------+
+| **Resource**              | Further details in which particular resource issue has happened        |
+---------------------------+------------------------------------------------------------------------+
+| **Error message Summary** | Short description of error result                                      |
+---------------------------+------------------------------------------------------------------------+
+| **Execution Log link**    | Reference to job execution output on Swift object storage              |
+---------------------------+------------------------------------------------------------------------+
+
+The EpMon notification consists of several fields:
+
+----------------------------+------------------------------------------------------------------+
+|     Notification Field     |                           Description                            |
+============================+==================================================================+
+| **APIMon Alert link**      | Reference to alert in Alerta                                     |
+----------------------------+------------------------------------------------------------------+
+| **Environment**            | Information about affected environment/region                    |
+----------------------------+------------------------------------------------------------------+
+| **Curl command**           | Interpreted request in curl format for reproducible applications |
+----------------------------+------------------------------------------------------------------+
+| **Request error response** | Error result of the requested API call                           |
+----------------------------+------------------------------------------------------------------+
+
+  
+
--- a/doc/source/internal/apimon_training/test_scenarios.rst
+++ b/doc/source/internal/apimon_training/test_scenarios.rst
--- a/doc/source/internal/apimon_training/training_images/24_7_dashboard.jpg
+++ b/doc/source/internal/apimon_training/training_images/24_7_dashboard.jpg
--- a/doc/source/internal/apimon_training/training_images/alerta_alerts.png
+++ b/doc/source/internal/apimon_training/training_images/alerta_alerts.png
--- a/doc/source/internal/apimon_training/training_images/alerta_dashboard.png
+++ b/doc/source/internal/apimon_training/training_images/alerta_dashboard.png
--- a/doc/source/internal/apimon_training/training_images/apimon_data_flow.svg
+++ b/doc/source/internal/apimon_training/training_images/apimon_data_flow.svg
--- a/doc/source/internal/apimon_training/training_images/apimon_test_results.jpg
+++ b/doc/source/internal/apimon_training/training_images/apimon_test_results.jpg
--- a/doc/source/internal/apimon_training/training_images/compute_service_statistics_1.jpg
+++ b/doc/source/internal/apimon_training/training_images/compute_service_statistics_1.jpg
--- a/doc/source/internal/apimon_training/training_images/compute_service_statistics_2.jpg
+++ b/doc/source/internal/apimon_training/training_images/compute_service_statistics_2.jpg
--- a/doc/source/internal/apimon_training/training_images/dashboards.png
+++ b/doc/source/internal/apimon_training/training_images/dashboards.png
--- a/doc/source/internal/apimon_training/training_images/epmon_dashboard_details.jpg
+++ b/doc/source/internal/apimon_training/training_images/epmon_dashboard_details.jpg
--- a/doc/source/internal/apimon_training/training_images/epmon_status_dashboard.jpg
+++ b/doc/source/internal/apimon_training/training_images/epmon_status_dashboard.jpg
--- a/doc/source/internal/apimon_training/training_images/graphite_query.jpg
+++ b/doc/source/internal/apimon_training/training_images/graphite_query.jpg
--- a/doc/source/internal/apimon_training/training_images/kpi_dashboard.png
+++ b/doc/source/internal/apimon_training/training_images/kpi_dashboard.png
--- a/doc/source/internal/apimon_training/training_images/postgresql_query.jpg
+++ b/doc/source/internal/apimon_training/training_images/postgresql_query.jpg
--- a/doc/source/internal/apimon_training/training_images/zulip_notifications.png
+++ b/doc/source/internal/apimon_training/training_images/zulip_notifications.png
--- a/doc/source/internal/apimon_training/workflow.rst
+++ b/doc/source/internal/apimon_training/workflow.rst
@ -0,0 +1,28 @@
+.. _apimon_flow:
+
+ApiMon Flow Process
+===================
+
+
+.. image:: training_images/apimon_data_flow.svg
+   :target: training_images/apimon_data_flow.svg
+   :alt: apimon_data_flow
+
+
+#. A service squad adds a test scenario to the GitHub repository.
+#. The Scheduler fetches test scenarios from GitHub and adds them to the queue.
+#. The Executor plays the Ansible test scenario playbooks. Up to 8 parallel
+   threads are enabled.
+#. A test scenario which has finished is removed from its thread and the next
+   playbook in the queue is added to the free thread. The finished playbook is
+   added to the queue at the last position.
+#. Test scenario statistics are stored in the PostgreSQL database.
+#. Metrics from HTTP requests are collected by StatsD.
+#. Collected metrics are stored in the Graphite time-series database.
+#. Grafana uses the metrics and statistics databases as data sources for the
+   dashboards. The dashboards with their various panels show the real-time
+   status of the platform. Grafana also supports historical views and trends.
+#. Breached thresholds as well as failed test scenarios result in alerts
+   generated on Alerta.
+#. Notifications containing alert details are sent to Zulip.
+#. Every test scenario stores its job output log in Swift object storage for
+   further analysis and investigation.
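+
+The queue rotation between the Scheduler and the Executor threads can be
+modelled with a small Python simulation. This is a simplified sketch, not the
+ApiMon scheduler: the function name ``simulate`` is hypothetical, and it
+assumes that the playbook which has been running longest is always the next
+one to finish, whereas real playbooks finish in arbitrary order.
+

```python
from collections import deque
from typing import Deque, List


def simulate(playbooks: List[str], slots: int = 8, steps: int = 1) -> List[List[str]]:
    """Return the running playbooks after the initial fill and after each step.

    In every step the longest-running playbook finishes, rejoins the end of
    the queue, and the head of the queue takes over the freed slot.
    """
    queue: Deque[str] = deque(playbooks)
    running: Deque[str] = deque()
    # Fill the free slots from the head of the queue.
    while queue and len(running) < slots:
        running.append(queue.popleft())
    history = [list(running)]
    for _ in range(steps):
        if not running:
            break
        finished = running.popleft()
        queue.append(finished)            # finished playbook goes to the last position
        running.append(queue.popleft())   # next queued playbook uses the free slot
        history.append(list(running))
    return history


# Three hypothetical scenarios sharing two slots.
history = simulate(["a", "b", "c"], slots=2, steps=2)
```

+
+With three scenarios and two slots, the rotation makes every scenario run
+again after the others have had their turn, which is how the continuous
+monitoring cycle in the steps above is kept going.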
+    
--- a/doc/source/internal/index.rst
+++ b/doc/source/internal/index.rst
@ -6,3 +6,4 @@ Internal Documentation
   :maxdepth: 1

   helpcenter_training/index
+   apimon_training/index