update based on review

Hasko, Vladimir 2023-05-24 12:37:58 +00:00
parent f7ad5765ac
commit e7301444c6
12 changed files with 88 additions and 72 deletions


@@ -4,7 +4,7 @@ Alerts
 Alerta is the component of the ApiMon that is designed to integrate alerts
 from multiple sources. It supports many different standard sources like Syslog,
-SNMP, Prometheus, Nagios, Zabbix, etc. Additioanlly any other type of source
+SNMP, Prometheus, Nagios, Zabbix, etc. Additionally any other type of source
 using URL request or command line can be integrated as well.
 Native functions like correlation and de-duplication help to manage thousands of
@@ -12,10 +12,10 @@ alerts in transparent way and consolidate alerts in proper categories based on
 environment, service, resource, failure type, etc.
 Alerta is hosted on https://alerts.eco.tsi-dev.otc-service.com/ .
-The authentication is centrally managed by LDAP.
+The authentication is centrally managed by OTC LDAP.
 The Zulip API was integrated with Alerta, to send notification of errors/alerts
-on zulip stream.
+on Zulip stream.
 Alerts displayed on OTC Alerta are generated either by Executor, Scheduler,
 EpMon or by Grafana.

File diff suppressed because it is too large.


@@ -75,7 +75,7 @@ OpenStack metrics branch is structured as following:
 - request method (GET/POST/DELETE/PUT)
-- resource (service resource, i.e. server, keypair, volume, etc). Subresources are joined with "_" (i.e. cluster_nodes)
+- resource (service resource, i.e. server, keypair, volume, etc). Sub-resources are joined with "_" (i.e. cluster_nodes)
 - response code - received response code


@@ -8,6 +8,9 @@ Due to the ongoing transformation of ApiMon and integration to a more robust
 CloudMon there are two operation modes right now. Therefore it's important to
 understand what is supported in which mode.
+This pages aims to provide navigation links and understand the changes once the
+transformation is completed and some of the locations will change.
 The most important differences are described in the table below:
 +-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+


@@ -5,7 +5,7 @@ Endpoint Monitoring overview
 ============================
-EpMon is a standalone python based process targetting every OTC service. Tt
+EpMon is a standalone python based process targeting every OTC service. It
 finds service in the service catalogs and sends GET requests to the configured
 endpoints.
@@ -14,8 +14,8 @@ coverage, but is usually not something what can be performed very often and
 leaves certain gaps on the timescale of monitoring. In order to cover this gap
 EpMon component is capable to send GET requests to the given URLs relying on the
 API discovery of the OpenStack cloud (perform GET request to /servers or the
-compute endpoint). Such requests are cheap and can be performed in the loop i.e.
-every 5 seconds. Latency of those calls, as well as the return codes are being
+compute endpoint). Such requests are cheap and can be performed in the loop, i.e.
+every 5 seconds. Latency of those calls, as well as the return codes, are being
 captured and sent to the metrics storage.


@@ -1,3 +1,5 @@
+.. _working_with_logs:
 =============================================
 How To Read The Logs And Understand The Issue
 =============================================
@@ -24,7 +26,7 @@ accessed from multiple locations:
 .. image:: faq_images/dashboard_log_links.jpg
-The logs contain whole ansible playbook output and help to analyse the problem
+The logs contain whole ansible playbook output and help to analyze the problem
 in detail.
 For example following log detail describes the failed scenario for ECS deployment::


@@ -13,7 +13,7 @@ field can include links to other systems with more detail.
 In Cloudmon Dashboards annotations are used to show the JIRA change issue types
 which change the transition from SCHEDULED to IN EXECUTION. This helps to
 identify if some JIRA change has negative impact on platform in real time. The
-annotations contain several fields which help to corelate the platform behaviour
+annotations contain several fields which help to correlate the platform behavior
 with the respective change directly on the dashboard:
 - JIRA Change issue ID


@@ -34,8 +34,8 @@ ApiMon Architecture Summary
 `Github <https://github.com/opentelekomcloud-infra/apimon-test>`_.
 - EpMon executes various HTTP query requests towards service endpoints and
-generates statistsic
-- Scheduler fetches the latest playbooks from repo and puts them in
+generates statistics
+- Scheduler fetches the latest playbooks from repo and puts them in a
 queue to run in a endless loop.
 - Executor is running the playbooks from queue and capturing the metrics
 - The ansible playbook results generates the metrics (duration, result).
@@ -69,8 +69,8 @@ ApiMon comes with the following features:
 - internal (OTC)
 - external (vCloud)
-- Alerts agregated in Alerta and notifications sent to zulip
-- Various dasbhoards
+- Alerts aggregated in Alerta and notifications sent to zulip
+- Various dashboards
 - KPI dashboards
 - 24/7 squad dashboards
@@ -102,7 +102,7 @@ possible):
 - No synthetic workloads: The service is not simulating any workloads (for
 example a benchmark suite) on the provisioned resources. Instead it measures
 and reports only if APIs are available and return expected results with an
-expected behaviour.
+expected behavior.
 - No every single API monitoring .The API-Monitoring focuses on basic API
 functionality of selected components. It doesn't cover every single API call
 available in OTC API product portfolio.


@@ -9,7 +9,7 @@ Logs
 - Every single job run log is stored on OpenStack Swift object storage.
 - Each single job log file provides unique URL which can be accessed to see log
 details
-- These URLs are available on all APIMON levels:
+- These URLs are available on all ApiMon levels:
 - In Zulip alarm messages
 - In Alerta events
@@ -38,3 +38,9 @@ Logs
 2020-07-12 05:54:48.505906 | TASK [Delete SecurityGroup]
 2020-07-12 05:54:50.727174 | localhost | changed
 2020-07-12 05:54:50.745541 |
+For further details how to work with logs please refer to :ref:`How To Read The
+Logs And Understand The Issue
+<working_with_logs>` FAQ page.


@@ -4,7 +4,7 @@
 Metrics
 =======
-The ansible playbook scenarios generate metrics in two ways:
+The Ansible playbook scenarios generate metrics in two ways:
 - The Ansible playbook internally invokes method calls to **OpenStack SDK
 libraries.** They in turn generate metrics about each API call they do. This
@@ -41,16 +41,17 @@ The ansible playbook scenarios generate metrics in two ways:
 Custom metrics:
 In some situations more complex metric generation is required which consists of
-execution of multiple tasks in scenario. For such cases the tags parameter is
+execution of multiple tasks in scenario. For such cases, the tags parameter is
 used. Once the specific tasks in playbook are tagged with some specific metric
 name the metrics are calculated as sum of all executed tasks with respective
-tag. It's useful in cases where measured metric contains multiple steps to
-achieve the desired state of service or service resource. For example boot up of
-virtual machine from deployment until succesfull login via SSH.
+tag. It's useful in cases where the measured metric contains multiple steps to
+achieve the desired state of service or service resource. For example, boot up of
+virtual machine from deployment until successful login via SSH.
 .. code-block::
 tags: ["metric=delete_server"]
 tags: ["az={{ availability_zone }}", "service=compute", "metric=create_server{{ metric_suffix }}"]
-More details how to query metrics from databases are described on :ref:`Metric databases <metric_databases>` page.
+More details how to query metrics from databases are described on :ref:`Metric
+databases <metric_databases>` page.


@@ -2,15 +2,15 @@
 Notifications
 =============
-Zulip as officialt OTC communication channels supports API interface for pushing
-the notifications from ApiMon to various zulip streams:
+Zulip as official OTC communication channel supports API interface for pushing
+the notifications from ApiMon to various Zulip streams:
 - #Alerts Stream
 - #Alerts-Hybrid Stream
 - #Alerts-Preprod Stream
 Every stream contains topics based on the service type (if represented by
-standalone ansible playbook) and general apimon_endpoint_monitor topic whihc
+standalone Ansible playbook) and general apimon_endpoint_monitor topic which
 contains alerts of GET queries towards all services.
 If the error has been acknowledged on Alerta, the new notification message for

File diff suppressed because it is too large.