update based on review

Hasko, Vladimir 2023-05-24 12:37:58 +00:00
parent f7ad5765ac
commit e7301444c6
12 changed files with 88 additions and 72 deletions


@ -4,7 +4,7 @@ Alerts
 Alerta is the component of the ApiMon that is designed to integrate alerts
 from multiple sources. It supports many different standard sources like Syslog,
-SNMP, Prometheus, Nagios, Zabbix, etc. Additioanlly any other type of source
+SNMP, Prometheus, Nagios, Zabbix, etc. Additionally, any other type of source
 using URL requests or the command line can be integrated as well.
 Native functions like correlation and de-duplication help to manage thousands of
@ -12,10 +12,10 @@ alerts in transparent way and consolidate alerts in proper categories based on
 environment, service, resource, failure type, etc.
 Alerta is hosted on https://alerts.eco.tsi-dev.otc-service.com/.
-The authentication is centrally managed by LDAP.
+The authentication is centrally managed by OTC LDAP.
 The Zulip API is integrated with Alerta to send notifications of errors/alerts
-on zulip stream.
+to Zulip streams.
 Alerts displayed on OTC Alerta are generated either by Executor, Scheduler,
 EpMon, or Grafana.
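De-duplication can be illustrated with a short Python sketch. It assumes Alerta's documented rule that an incoming alert is a duplicate when its environment, resource, event and severity all match an existing alert. The `Alert` record and the `deduplicate` helper are hypothetical and not part of Alerta's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    # Hypothetical minimal alert record; real Alerta alerts carry many more fields.
    environment: str
    resource: str
    event: str
    severity: str


def deduplicate(alerts):
    """Group alerts by environment/resource/event/severity.

    Returns (key, duplicate_count) pairs in first-seen order, where
    duplicate_count is the number of occurrences after the first one.
    """
    counts = {}
    for alert in alerts:
        key = (alert.environment, alert.resource, alert.event, alert.severity)
        counts[key] = counts.get(key, 0) + 1
    return [(key, n - 1) for key, n in counts.items()]


incoming = [
    Alert("production", "compute", "HttpError", "major"),
    Alert("production", "compute", "HttpError", "major"),
    Alert("production", "volume", "HttpError", "major"),
]
print(deduplicate(incoming))
```

Alerta tracks this as a duplicate count on the stored alert rather than returning a list. The list form here only makes the grouping easy to inspect.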

File diff suppressed because it is too large.


@ -75,7 +75,7 @@ OpenStack metrics branch is structured as following:
 - request method (GET/POST/DELETE/PUT)
-- resource (service resource, i.e. server, keypair, volume, etc). Subresources are joined with "_" (i.e. cluster_nodes)
+- resource (service resource, e.g. server, keypair, volume, etc.). Sub-resources are joined with "_" (e.g. cluster_nodes)
 - response code - received response code
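The naming rule for the OpenStack metrics branch can be sketched in Python. Only the three components and the joining of sub-resources with "_" come from the list above. The "." separator, the `example_prefix` value standing in for the leading part of the branch, and the `metric_path` helper are assumptions for illustration.

```python
ALLOWED_METHODS = {"GET", "POST", "DELETE", "PUT"}


def metric_path(prefix, method, resources, status_code):
    """Build a metric path from request method, resource and response code.

    Assumption: components are separated by '.', and `prefix` stands in
    for whatever precedes the request method in the real branch.
    """
    method = method.upper()
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unexpected request method: {method}")
    # Sub-resources are joined with "_", e.g. ["cluster", "nodes"] -> "cluster_nodes"
    resource = "_".join(resources)
    return ".".join([prefix, method, resource, str(status_code)])


print(metric_path("example_prefix", "get", ["cluster", "nodes"], 200))
```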


@ -8,6 +8,9 @@ Due to the ongoing transformation of ApiMon and integration to a more robust
 CloudMon, there are two operation modes right now. Therefore, it's important to
 understand what is supported in which mode.
+This page aims to provide navigation links and to explain the changes once the
+transformation is completed and some of the locations change.
 The most important differences are described in the table below:
 +-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+


@ -5,7 +5,7 @@ Endpoint Monitoring overview
 ============================
-EpMon is a standalone python based process targetting every OTC service. Tt
+EpMon is a standalone Python-based process targeting every OTC service. It
 finds the service in the service catalogs and sends GET requests to the configured
 endpoints.
@ -14,8 +14,8 @@ coverage, but is usually not something what can be performed very often and
 leaves certain gaps on the timescale of monitoring. In order to cover this gap,
 the EpMon component is capable of sending GET requests to the given URLs, relying on the
 API discovery of the OpenStack cloud (perform a GET request to /servers of the
-compute endpoint). Such requests are cheap and can be performed in the loop i.e.
-every 5 seconds. Latency of those calls, as well as the return codes are being
+compute endpoint). Such requests are cheap and can be performed in a loop, e.g.
+every 5 seconds. The latency of those calls, as well as the return codes, is
 captured and sent to the metrics storage.
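A simplified EpMon-style probe loop can be written in Python with only the standard library. It assumes the endpoint URLs are already known, whereas the real EpMon discovers them from the service catalog. It also replaces the metrics storage with a `report` callback. `probe` and `monitor` are hypothetical names.

```python
import time
import urllib.error
import urllib.request


def probe(url, opener=urllib.request.urlopen, timeout=10.0):
    """Send one GET request and return (status_code, latency_seconds).

    HTTP error responses (4xx/5xx) are reported with their status code
    instead of being raised, because the return code is what is measured.
    Connection-level failures (urllib.error.URLError) still propagate.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code
    return status, time.monotonic() - start


def monitor(urls, interval=5.0, rounds=1, opener=urllib.request.urlopen, report=print):
    """Probe every URL once per round, sleeping `interval` seconds between rounds."""
    for i in range(rounds):
        for url in urls:
            status, latency = probe(url, opener=opener)
            report(url, status, latency)  # stand-in for sending to metrics storage
        if i + 1 < rounds:
            time.sleep(interval)
```

The `opener` parameter makes the loop testable without network access. In a real deployment it would stay the default `urllib.request.urlopen`.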


@ -1,3 +1,5 @@
+.. _working_with_logs:
+
 =============================================
 How To Read The Logs And Understand The Issue
 =============================================
@ -24,7 +26,7 @@ accessed from multiple locations:
 .. image:: faq_images/dashboard_log_links.jpg
-The logs contain whole ansible playbook output and help to analyse the problem
+The logs contain the whole Ansible playbook output and help to analyze the problem
 in detail.
 For example, the following log excerpt describes the failed scenario for an ECS deployment::


@ -13,7 +13,7 @@ field can include links to other systems with more detail.
 In CloudMon dashboards, annotations are used to show the JIRA change issues
 whose status transitions from SCHEDULED to IN EXECUTION. This helps to
 identify in real time whether a JIRA change has a negative impact on the platform. The
-annotations contain several fields which help to corelate the platform behaviour
+annotations contain several fields which help to correlate the platform behavior
 with the respective change directly on the dashboard:
 - JIRA Change issue ID
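The annotation for a change could be assembled as a Grafana annotation body. This is a sketch under assumptions. The dict follows the shape of Grafana's `POST /api/annotations` request (`time` in epoch milliseconds, `tags`, `text`). The tag values, the text format and the `change_annotation` helper are hypothetical and are not known to be how CloudMon builds its annotations.

```python
from datetime import datetime, timezone


def change_annotation(issue_id, summary, transitioned_at):
    """Build an annotation body for a JIRA change entering IN EXECUTION.

    `transitioned_at` must be timezone-aware; naive datetimes would be
    interpreted as local time by datetime.timestamp().
    """
    if transitioned_at.tzinfo is None:
        raise ValueError("transitioned_at must be timezone-aware")
    return {
        "time": int(transitioned_at.timestamp() * 1000),  # epoch milliseconds
        "tags": ["jira-change", issue_id],
        "text": f"{issue_id}: {summary} (SCHEDULED -> IN EXECUTION)",
    }


# Hypothetical issue ID and summary, for illustration only.
example = change_annotation("CHANGE-1", "Upgrade", datetime(2023, 5, 24, 12, 0, tzinfo=timezone.utc))
print(example)
```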


@ -34,8 +34,8 @@ ApiMon Architecture Summary
 `Github <https://github.com/opentelekomcloud-infra/apimon-test>`_.
 - EpMon executes various HTTP query requests towards service endpoints and
-generates statistsic
-- Scheduler fetches the latest playbooks from repo and puts them in
+generates statistics
+- Scheduler fetches the latest playbooks from the repo and puts them in a
 queue to run in an endless loop.
 - Executor runs the playbooks from the queue and captures the metrics.
 - The Ansible playbook results generate the metrics (duration, result).
@ -69,8 +69,8 @@ ApiMon comes with the following features:
 - internal (OTC)
 - external (vCloud)
-- Alerts agregated in Alerta and notifications sent to zulip
-- Various dasbhoards
+- Alerts aggregated in Alerta and notifications sent to Zulip
+- Various dashboards
 - KPI dashboards
 - 24/7 squad dashboards
@ -102,7 +102,7 @@ possible):
 - No synthetic workloads: The service is not simulating any workloads (for
 example, a benchmark suite) on the provisioned resources. Instead, it measures
 and reports only if APIs are available and return expected results with an
-expected behaviour.
+expected behavior.
 - No monitoring of every single API: The API-Monitoring focuses on basic API
 functionality of selected components. It doesn't cover every single API call
 available in the OTC API product portfolio.
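The Scheduler/Executor split from the architecture summary can be sketched as a producer and a consumer sharing a queue. This single-process, single-pass version is a simplification. The real components run continuously and separately, `run_playbook` stands in for an actual Ansible run, and the playbook names are hypothetical.

```python
import queue
import time


def schedule(playbooks, work_queue):
    """Scheduler role: put the current set of playbooks into the queue (one pass)."""
    for playbook in playbooks:
        work_queue.put(playbook)


def execute(work_queue, run_playbook):
    """Executor role: drain the queue, run each playbook, record duration and result."""
    metrics = []
    while True:
        try:
            playbook = work_queue.get_nowait()
        except queue.Empty:
            break
        start = time.monotonic()
        result = "passed" if run_playbook(playbook) else "failed"
        metrics.append({
            "playbook": playbook,
            "duration": time.monotonic() - start,
            "result": result,
        })
        work_queue.task_done()
    return metrics


work = queue.Queue()
schedule(["scenario_compute.yaml", "scenario_volume.yaml"], work)
# Stand-in runner: pretend only the compute scenario succeeds.
print(execute(work, lambda playbook: "compute" in playbook))
```

In ApiMon the Scheduler repeats this in an endless loop with freshly fetched playbooks, so the queue never stays empty for long.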


@ -9,7 +9,7 @@ Logs
 - Every single job run log is stored on OpenStack Swift object storage.
 - Each job log file has a unique URL which can be accessed to see the log
 details.
-- These URLs are available on all APIMON levels:
+- These URLs are available on all ApiMon levels:
 - In Zulip alarm messages
 - In Alerta events
@ -38,3 +38,9 @@ Logs
 2020-07-12 05:54:48.505906 | TASK [Delete SecurityGroup]
 2020-07-12 05:54:50.727174 | localhost | changed
 2020-07-12 05:54:50.745541 |
+For further details on how to work with logs, please refer to the :ref:`How To Read The
+Logs And Understand The Issue
+<working_with_logs>` FAQ page.
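Log lines follow a `<timestamp> | <message>` layout, so a small parser can turn them into per-task timings. The sketch assumes that layout, based only on the excerpt above. It treats a task as running until the next `TASK [...]` line or the end of the log. `parse_line` and `task_durations` are hypothetical helpers.

```python
from datetime import datetime

SAMPLE = """\
2020-07-12 05:54:48.505906 | TASK [Delete SecurityGroup]
2020-07-12 05:54:50.727174 | localhost | changed
2020-07-12 05:54:50.745541 |"""


def parse_line(line):
    """Split '<timestamp> | <message>' into (datetime, message)."""
    stamp, _, message = line.partition(" |")
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f"), message.strip()


def task_durations(lines):
    """Return {task_name: seconds}, measured from a TASK line to the next
    TASK line, or to the last log line for the final task."""
    entries = [parse_line(line) for line in lines if line.strip()]
    durations = {}
    current, started = None, None
    for ts, msg in entries:
        if msg.startswith("TASK ["):
            if current is not None:
                durations[current] = (ts - started).total_seconds()
            current, started = msg[len("TASK ["):msg.rfind("]")], ts
    if current is not None:
        durations[current] = (entries[-1][0] - started).total_seconds()
    return durations


print(task_durations(SAMPLE.splitlines()))
```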


@ -4,7 +4,7 @@
 Metrics
 =======
-The ansible playbook scenarios generate metrics in two ways:
+The Ansible playbook scenarios generate metrics in two ways:
 - The Ansible playbook internally invokes method calls to **OpenStack SDK
 libraries.** They, in turn, generate metrics about each API call they make. This
@ -41,16 +41,17 @@ The ansible playbook scenarios generate metrics in two ways:
 Custom metrics:
 In some situations, more complex metric generation is required, which consists of
-execution of multiple tasks in scenario. For such cases the tags parameter is
+execution of multiple tasks in a scenario. For such cases, the tags parameter is
 used. Once the specific tasks in a playbook are tagged with some specific metric
 name, the metric is calculated as the sum of all executed tasks with the respective
-tag. It's useful in cases where measured metric contains multiple steps to
-achieve the desired state of service or service resource. For example boot up of
-virtual machine from deployment until succesfull login via SSH.
+tag. It's useful in cases where the measured metric contains multiple steps to
+achieve the desired state of service or service resource. For example, boot up of
+a virtual machine from deployment until successful login via SSH.
 .. code-block::
 tags: ["metric=delete_server"]
 tags: ["az={{ availability_zone }}", "service=compute", "metric=create_server{{ metric_suffix }}"]
-More details how to query metrics from databases are described on :ref:`Metric databases <metric_databases>` page.
+More details on how to query metrics from databases are described on the :ref:`Metric
+databases <metric_databases>` page.
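Summing tagged tasks into a custom metric can be sketched in Python. The sketch assumes each executed task is available as a pair of its tag list and its duration in seconds, and that the summed quantity is the task duration. The templated tag values from the example are shown already rendered, with a hypothetical `az` value. `custom_metrics` is a hypothetical helper.

```python
from collections import defaultdict


def custom_metrics(task_results):
    """Sum task durations per 'metric=<name>' tag.

    `task_results` is an iterable of (tags, duration_seconds) pairs, where
    tags use the 'key=value' convention from the tags examples.
    """
    totals = defaultdict(float)
    for tags, duration in task_results:
        for tag in tags:
            key, sep, value = tag.partition("=")
            if sep and key == "metric":
                totals[value] += duration
    return dict(totals)


tasks = [
    (["az=eu-de-01", "service=compute", "metric=create_server"], 12.0),  # e.g. boot VM
    (["metric=create_server"], 8.5),                                     # e.g. wait for SSH login
    (["metric=delete_server"], 4.0),
]
print(custom_metrics(tasks))
```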


@ -2,15 +2,15 @@
 Notifications
 =============
-Zulip as officialt OTC communication channels supports API interface for pushing
-the notifications from ApiMon to various zulip streams:
+Zulip, as the official OTC communication channel, supports an API for pushing
+the notifications from ApiMon to various Zulip streams:
 - #Alerts Stream
 - #Alerts-Hybrid Stream
 - #Alerts-Preprod Stream
 Every stream contains topics based on the service type (if represented by
-standalone ansible playbook) and general apimon_endpoint_monitor topic whihc
+a standalone Ansible playbook) and a general apimon_endpoint_monitor topic, which
 contains alerts of GET queries towards all services.
If the error has been acknowledged on Alerta, the new notification message for If the error has been acknowledged on Alerta, the new notification message for

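Topic selection for notifications can be sketched as a small routing function. The stream names and the `apimon_endpoint_monitor` topic come from the text above. The alert dictionary with `origin` and `service_type` keys and the `route` helper are hypothetical.

```python
STREAMS = {"Alerts", "Alerts-Hybrid", "Alerts-Preprod"}
ENDPOINT_TOPIC = "apimon_endpoint_monitor"


def route(stream, alert):
    """Return the (stream, topic) pair for an alert.

    Hypothetical alert shape: endpoint-monitoring alerts (GET queries) are
    marked with origin 'epmon' and go to the shared topic; playbook alerts
    use their service type as the topic.
    """
    if stream not in STREAMS:
        raise ValueError(f"unknown stream: {stream}")
    if alert.get("origin") == "epmon":
        return stream, ENDPOINT_TOPIC
    return stream, alert["service_type"]


print(route("Alerts", {"origin": "epmon"}))
print(route("Alerts-Preprod", {"origin": "executor", "service_type": "compute"}))
```

The resulting pair could be passed as the `to` and `topic` fields of a stream message, for example via the `send_message` method of the `zulip` Python client.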
File diff suppressed because it is too large.