update based on review
parent f7ad5765ac
commit e7301444c6
@@ -4,7 +4,7 @@ Alerts

 Alerta is the component of the ApiMon that is designed to integrate alerts
 from multiple sources. It supports many different standard sources like Syslog,
-SNMP, Prometheus, Nagios, Zabbix, etc. Additioanlly any other type of source
+SNMP, Prometheus, Nagios, Zabbix, etc. Additionally any other type of source
 using URL request or command line can be integrated as well.

 Native functions like correlation and de-duplication help to manage thousands of
@@ -12,10 +12,10 @@ alerts in transparent way and consolidate alerts in proper categories based on
 environment, service, resource, failure type, etc.

 Alerta is hosted on https://alerts.eco.tsi-dev.otc-service.com/ .
-The authentication is centrally managed by LDAP.
+The authentication is centrally managed by OTC LDAP.

 The Zulip API was integrated with Alerta, to send notification of errors/alerts
-on zulip stream.
+on Zulip stream.

 Alerts displayed on OTC Alerta are generated either by Executor, Scheduler,
 EpMon or by Grafana.
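The de-duplication and categorisation described in the Alerts text can be pictured with a small sketch. This is a simplified, in-memory model and not Alerta's implementation: the `Alert` and `AlertStore` names, the choice of key fields, and the severity handling are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    # Field names mirror the categories named in the text; the class itself is hypothetical.
    environment: str
    service: str
    resource: str
    event: str
    severity: str
    duplicate_count: int = 0


class AlertStore:
    """Toy in-memory store that folds repeated alerts into one entry."""

    def __init__(self):
        self._alerts = {}

    def receive(self, alert):
        # Assumption: alerts with the same environment, resource and event
        # describe the same problem.
        key = (alert.environment, alert.resource, alert.event)
        current = self._alerts.get(key)
        if current is None:
            self._alerts[key] = alert
            return alert
        if current.severity == alert.severity:
            # Same problem, same severity: count it instead of storing it again.
            current.duplicate_count += 1
        else:
            # Same problem, new severity: update the existing entry.
            current.severity = alert.severity
            current.duplicate_count = 0
        return current

    def by_service(self):
        # Group the consolidated alerts by service.
        grouped = {}
        for alert in self._alerts.values():
            grouped.setdefault(alert.service, []).append(alert)
        return grouped
```

With such a model, a thousand identical alerts for one resource collapse into a single entry with a counter, which is the effect the text attributes to the native de-duplication.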
File diff suppressed because it is too large
@@ -75,7 +75,7 @@ OpenStack metrics branch is structured as following:

 - request method (GET/POST/DELETE/PUT)

-- resource (service resource, i.e. server, keypair, volume, etc). Subresources are joined with "_" (i.e. cluster_nodes)
+- resource (service resource, i.e. server, keypair, volume, etc). Sub-resources are joined with "_" (i.e. cluster_nodes)

 - response code - received response code

@@ -8,6 +8,9 @@ Due to the ongoing transformation of ApiMon and integration to a more robust
 CloudMon there are two operation modes right now. Therefore it's important to
 understand what is supported in which mode.

+This pages aims to provide navigation links and understand the changes once the
+transformation is completed and some of the locations will change.
+
 The most important differences are described in the table below:

 +-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
@@ -5,7 +5,7 @@ Endpoint Monitoring overview
 ============================


-EpMon is a standalone python based process targetting every OTC service. Tt
+EpMon is a standalone python based process targeting every OTC service. It
 finds service in the service catalogs and sends GET requests to the configured
 endpoints.

@@ -14,8 +14,8 @@ coverage, but is usually not something what can be performed very often and
 leaves certain gaps on the timescale of monitoring. In order to cover this gap
 EpMon component is capable to send GET requests to the given URLs relying on the
 API discovery of the OpenStack cloud (perform GET request to /servers or the
-compute endpoint). Such requests are cheap and can be performed in the loop i.e.
-every 5 seconds. Latency of those calls, as well as the return codes are being
+compute endpoint). Such requests are cheap and can be performed in the loop, i.e.
+every 5 seconds. Latency of those calls, as well as the return codes, are being
 captured and sent to the metrics storage.


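The polling behaviour described for EpMon (a cheap GET in a loop, recording latency and return code) could look roughly like the sketch below. It is not EpMon's code: the `probe` and `run_loop` names, the `emit` callback standing in for the metrics storage, and the `-1` marker for a missing HTTP response are assumptions for illustration.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Send one GET request and return (status_code, latency_in_seconds)."""
    started = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers still carry a return code worth recording.
        status = err.code
    except (urllib.error.URLError, OSError):
        # Hypothetical marker: no HTTP response was received at all.
        status = -1
    return status, time.monotonic() - started


def run_loop(urls, emit, interval=5.0, rounds=None,
             sleep=time.sleep, opener=urllib.request.urlopen):
    """Probe every URL once per round and hand each result to emit()."""
    done = 0
    while rounds is None or done < rounds:
        for url in urls:
            status, latency = probe(url, opener=opener)
            emit(url, status, latency)
        done += 1
        if rounds is None or done < rounds:
            sleep(interval)
```

The `opener` and `sleep` parameters exist only so the loop can be exercised without network access or real waiting; with the defaults it polls every 5 seconds until interrupted.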
@@ -1,3 +1,5 @@
+.. _working_with_logs:
+
 =============================================
 How To Read The Logs And Understand The Issue
 =============================================
@@ -24,7 +26,7 @@ accessed from multiple locations:
 .. image:: faq_images/dashboard_log_links.jpg


-The logs contain whole ansible playbook output and help to analyse the problem
+The logs contain whole ansible playbook output and help to analyze the problem
 in detail.
 For example following log detail describes the failed scenario for ECS deployment::

@@ -13,7 +13,7 @@ field can include links to other systems with more detail.
 In Cloudmon Dashboards annotations are used to show the JIRA change issue types
 which change the transition from SCHEDULED to IN EXECUTION. This helps to
 identify if some JIRA change has negative impact on platform in real time. The
-annotations contain several fields which help to corelate the platform behaviour
+annotations contain several fields which help to correlate the platform behavior
 with the respective change directly on the dashboard:

 - JIRA Change issue ID
@@ -34,8 +34,8 @@ ApiMon Architecture Summary
 `Github <https://github.com/opentelekomcloud-infra/apimon-test>`_.

 - EpMon executes various HTTP query requests towards service endpoints and
-generates statistsic
-- Scheduler fetches the latest playbooks from repo and puts them in
+generates statistics
+- Scheduler fetches the latest playbooks from repo and puts them in a
 queue to run in a endless loop.
 - Executor is running the playbooks from queue and capturing the metrics
 - The ansible playbook results generates the metrics (duration, result).
@@ -69,8 +69,8 @@ ApiMon comes with the following features:
 - internal (OTC)
 - external (vCloud)

-- Alerts agregated in Alerta and notifications sent to zulip
-- Various dasbhoards
+- Alerts aggregated in Alerta and notifications sent to zulip
+- Various dashboards

 - KPI dashboards
 - 24/7 squad dashboards
@@ -102,7 +102,7 @@ possible):
 - No synthetic workloads: The service is not simulating any workloads (for
 example a benchmark suite) on the provisioned resources. Instead it measures
 and reports only if APIs are available and return expected results with an
-expected behaviour.
+expected behavior.
 - No every single API monitoring .The API-Monitoring focuses on basic API
 functionality of selected components. It doesn't cover every single API call
 available in OTC API product portfolio.
@@ -9,7 +9,7 @@ Logs
 - Every single job run log is stored on OpenStack Swift object storage.
 - Each single job log file provides unique URL which can be accessed to see log
 details
-- These URLs are available on all APIMON levels:
+- These URLs are available on all ApiMon levels:

 - In Zulip alarm messages
 - In Alerta events
@@ -38,3 +38,9 @@ Logs
 2020-07-12 05:54:48.505906 | TASK [Delete SecurityGroup]
 2020-07-12 05:54:50.727174 | localhost | changed
 2020-07-12 05:54:50.745541 |
+
+
+For further details how to work with logs please refer to :ref:`How To Read The
+Logs And Understand The Issue
+<working_with_logs>` FAQ page.
+
@@ -4,7 +4,7 @@
 Metrics
 =======

-The ansible playbook scenarios generate metrics in two ways:
+The Ansible playbook scenarios generate metrics in two ways:

 - The Ansible playbook internally invokes method calls to **OpenStack SDK
 libraries.** They in turn generate metrics about each API call they do. This
@@ -41,16 +41,17 @@ The ansible playbook scenarios generate metrics in two ways:
 Custom metrics:

 In some situations more complex metric generation is required which consists of
-execution of multiple tasks in scenario. For such cases the tags parameter is
+execution of multiple tasks in scenario. For such cases, the tags parameter is
 used. Once the specific tasks in playbook are tagged with some specific metric
 name the metrics are calculated as sum of all executed tasks with respective
-tag. It's useful in cases where measured metric contains multiple steps to
-achieve the desired state of service or service resource. For example boot up of
-virtual machine from deployment until succesfull login via SSH.
+tag. It's useful in cases where the measured metric contains multiple steps to
+achieve the desired state of service or service resource. For example, boot up of
+virtual machine from deployment until successful login via SSH.

 .. code-block::

 tags: ["metric=delete_server"]
 tags: ["az={{ availability_zone }}", "service=compute", "metric=create_server{{ metric_suffix }}"]

-More details how to query metrics from databases are described on :ref:`Metric databases <metric_databases>` page.
+More details how to query metrics from databases are described on :ref:`Metric
+databases <metric_databases>` page.
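The custom-metric rule in the Metrics text (a metric is the sum over all executed tasks carrying the same `metric=` tag) can be illustrated with a short sketch. The task dictionaries, their `duration` field, and the helper names are hypothetical; only the `metric=<name>` tag format comes from the documented examples.

```python
from collections import defaultdict


def metric_names(tags):
    """Return the metric names found in a task's tag list ("metric=<name>")."""
    return [tag.split("=", 1)[1] for tag in tags if tag.startswith("metric=")]


def sum_by_metric(tasks):
    """Add up task durations per metric tag.

    Each task is assumed to be a dict with "tags" (list of str) and
    "duration" (seconds); this layout is an illustration, not ApiMon's format.
    """
    totals = defaultdict(float)
    for task in tasks:
        for name in metric_names(task["tags"]):
            totals[name] += task["duration"]
    return dict(totals)


tasks = [
    {"name": "Create server", "tags": ["service=compute", "metric=create_server"], "duration": 30.0},
    {"name": "Wait for SSH login", "tags": ["metric=create_server"], "duration": 12.5},
    {"name": "Delete server", "tags": ["metric=delete_server"], "duration": 4.0},
]
totals = sum_by_metric(tasks)
```

Here the two tasks tagged `metric=create_server` contribute to a single value, which matches the "deployment until successful login via SSH" example from the text.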
@@ -2,15 +2,15 @@
 Notifications
 =============

-Zulip as officialt OTC communication channels supports API interface for pushing
-the notifications from ApiMon to various zulip streams:
+Zulip as official OTC communication channel supports API interface for pushing
+the notifications from ApiMon to various Zulip streams:

 - #Alerts Stream
 - #Alerts-Hybrid Stream
 - #Alerts-Preprod Stream

 Every stream contains topics based on the service type (if represented by
-standalone ansible playbook) and general apimon_endpoint_monitor topic whihc
+standalone Ansible playbook) and general apimon_endpoint_monitor topic which
 contains alerts of GET queries towards all services.

 If the error has been acknowledged on Alerta, the new notification message for
File diff suppressed because it is too large