Fix wrong bullets in multiple places

Hasko, Vladimir 2023-05-21 21:05:03 +00:00
parent efadfc75fd
commit 5d25748c82
6 changed files with 53 additions and 63 deletions


@@ -6,16 +6,17 @@ https://alerts.eco.tsi-dev.otc-service.com/
The authentication is centrally managed by LDAP.

- Alerta is a monitoring tool to integrate alerts from multiple sources.
- The alerts from different sources can be consolidated and de-duplicated.
- On ApiMon it is hosted on the same instance as Grafana, just listening on
  a different port.
- The Zulip API was integrated with Alerta to send notifications of
  errors/alerts on the Zulip stream.
- Alerts displayed on OTC Alerta are generated either by Executor or by
  Grafana.

  - “Executor alerts” focus on playbook results, i.e. whether a playbook has
    completed or failed.
  - “Grafana alerts” focus on breaching the defined thresholds, for example
    when the API response time is higher than the defined threshold.
.. image:: training_images/alerta_dashboard.png
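
As a rough illustration of the integration surface, an alert could be raised
against Alerta's REST API from an Ansible task like the following sketch (the
endpoint, the API key variable and all field values are hypothetical, not the
actual OTC configuration)::

    - name: Raise an alert in Alerta
      ansible.builtin.uri:
        url: "https://alerta.example.com/api/alert"   # hypothetical endpoint
        method: POST
        headers:
          Authorization: "Key {{ alerta_api_key }}"   # hypothetical API key variable
        body_format: json
        body:
          resource: "compute"
          event: "PlaybookFailed"
          environment: "Production"
          severity: "major"
          service: ["apimon"]
          text: "Scenario scenario1_test_server failed"
        status_code: 201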


@@ -57,7 +57,6 @@ Counters and timers have the following subbranches:

- apimon.metric → specific apimon metrics not gathered by the OpenStack API
  methods
- openstack.api → pure API request metrics

Every section further has the following branches:
@@ -81,13 +80,10 @@ OpenStack metrics branch is structured as follows:

- response code - the received response code
- count/upper/lower/mean/etc - timer-specific metrics (available only under stats.timers.openstack.api.$environment.$zone.$service.$request_method.$resource.$status_code.{count,mean,upper,*})
- count/rate - counter-specific metrics (available only under stats.counters.openstack.api.$environment.$zone.$service.$request_method.$resource.$status_code.{count,rate})
- attempted - counter for the attempted requests (only for counters)
- failed - counter of failed requests (no response received, connection problems, etc.) (only for counters)
- passed - counter of requests receiving any response back (only for counters)
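
For example, substituting the hypothetical values $environment=production and
$zone=eu-de, the mean duration and total count of successful GET requests
against the compute service's servers resource would live under::

    stats.timers.openstack.api.production.eu-de.compute.GET.servers.200.mean
    stats.counters.openstack.api.production.eu-de.compute.GET.servers.200.count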
@@ -106,15 +102,10 @@ apimon.metric

- stats.timers.apimon.metric.$environment.$zone.**csm_lb_timings**.{public,private}.{http,https,tcp}.$az.__VALUE__ - timer values for the loadbalancer test
- stats.counters.apimon.metric.$environment.$zone.**csm_lb_timings**.{public,private}.{http,https,tcp}.$az.{attempted,passed,failed} - counter values for the loadbalancer test
- stats.timers.apimon.metric.$environment.$zone.**curl**.$host.{passed,failed}.__VALUE__ - timer values for the curl test
- stats.counters.apimon.metric.$environment.$zone.**curl**.$host.{attempted,passed,failed} - counter values for the curl test
- stats.timers.apimon.metric.$environment.$zone.**dns**.$ns_name.$host - timer values for the NS lookup test; $ns_name is the DNS server used to query the records
- stats.counters.apimon.metric.$environment.$zone.**dns**.$ns_name.$host.{attempted,passed,failed} - counter values for the NS lookup test
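
As a concrete (hypothetical) substitution, the NS lookup counters for host
example.com queried via name server ns1 in environment production, zone
eu-de, would appear under::

    stats.counters.apimon.metric.production.eu-de.dns.ns1.example.com.{attempted,passed,failed}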


@@ -79,7 +79,7 @@ ApiMon comes with the following features:

- Each squad can control and manage their test scenarios and dashboards
- Every execution of an Ansible playbook stores the log file for further
  investigation/analysis on Swift object storage

What ApiMon is NOT


@@ -6,14 +6,16 @@ Logs

- Every single job run log is stored on OpenStack Swift object storage.
- Each job log file provides a unique URL which can be accessed to see the log
  details.
- These URLs are available on all ApiMon levels:

  - In Zulip alarm messages
  - In Alerta events
  - In Grafana Dashboards

- Logs are simple plain-text files of the whole playbook output::

    2020-07-12 05:54:04.661170 | TASK [List Servers]
    2020-07-12 05:54:09.050491 | localhost | ok


@@ -6,34 +6,37 @@ Metrics

The ansible playbook scenarios generate metrics in two ways:

- The Ansible playbook internally invokes method calls to **OpenStack SDK
  libraries.** They in turn generate metrics about each API call they make. This
  requires some special configuration in the clouds.yaml file (currently
  exposing metrics into statsd and InfluxDB is supported; a minimal
  clouds.yaml sketch follows this list). For details refer
  to the `config
  documentation <https://docs.openstack.org/openstacksdk/latest/user/guides/stats.html>`_
  of the OpenStack SDK. The following metrics are captured:

  - response HTTP code
  - duration of API call
  - name of API call
  - method of API call
  - service type

- Ansible plugins may **expose additional metrics** (e.g. whether the overall
  scenario succeeded or not) with the help of the `callback
  plugin <https://github.com/stackmon/apimon/tree/main/apimon/ansible/callback>`_.
  Since it is sometimes not sufficient to know only the timings of each API
  call, Ansible callbacks are utilized to report the overall execution time and
  result (whether the scenario succeeded and how long it took). The following
  metrics are captured:

  - test case
  - playbook name
  - environment
  - action name
  - result code
  - result string
  - service type
  - state type
  - total amount of (failed, passed, ignored, skipped) tests
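
The clouds.yaml part mentioned above could, as a minimal sketch, look like
this (host, port and the cloud entry are illustrative; see the linked SDK
documentation for the full set of statsd/InfluxDB options)::

    metrics:
      statsd:
        host: 127.0.0.1    # illustrative statsd endpoint
        port: 8125
    clouds:
      production:          # hypothetical cloud entry; the usual auth settings go here
        region_name: eu-de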
Custom metrics:


@@ -66,7 +66,6 @@ ensure sustainability of the endless execution of such scenarios:

`Openstack.Cloud
<https://docs.ansible.com/ansible/latest/collections/openstack/cloud/index.html>`_
collections for native interaction with the cloud in Ansible.

- In case there are features not supported by the collection, you can still use
  the script module and call a Python SDK script directly to invoke the required
  request towards the cloud (see the sketch below)
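
Such a fallback could look like the following sketch (the script name is
hypothetical; the script itself would typically use openstack.connect() from
the Python SDK)::

    - name: Invoke a request not covered by the collection via the Python SDK
      ansible.builtin.script: files/list_server_tags.py
      args:
        executable: python3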
@@ -80,7 +79,6 @@ ensure sustainability of the endless execution of such scenarios:

- Make sure that deletion / cleanup of the resources is triggered even if some
  of the tasks in the playbook fail (a block/always sketch follows below)
- Make sure that deletion / cleanup is triggered in the right order
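
A common way to guarantee this is Ansible's block/always construct, sketched
here with shortened, hypothetical parameters; the always section runs even
when a task in the block fails::

    - name: Server lifecycle scenario
      block:
        - name: Create server
          openstack.cloud.server:
            name: test-server            # hypothetical resource name
            image: "{{ image_name }}"
            flavor: "{{ flavor_name }}"
            state: present
      always:
        - name: Delete server even if previous tasks failed
          openstack.cloud.server:
            name: test-server
            state: absent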
- **Simplicity**
@@ -92,10 +90,8 @@ ensure sustainability of the endless execution of such scenarios:

- ApiMon is not supposed to validate full service functionality. For such
  cases we have a different team / framework within the QA responsibility
- Focus only on core functions which are critical for the basic operation /
  lifecycle of the service.
- The fewer functions you use, the lower the potential failure rate of the
  running scenario
@@ -103,7 +99,6 @@ ensure sustainability of the endless execution of such scenarios:

- Every single hardcoded parameter in a scenario can later lead to a failure of
  the scenario's run when that parameter changes
- Try to obtain all such parameters dynamically from the cloud directly, as in
  the example below
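
For instance, instead of hardcoding an image ID, the image can be resolved at
runtime (the image name below is hypothetical)::

    - name: Look up the image dynamically instead of hardcoding its ID
      openstack.cloud.image_info:
        image: Standard_Debian_11        # hypothetical image name
      register: image_result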
- **Special tags for combined metrics**
@@ -112,7 +107,6 @@ ensure sustainability of the endless execution of such scenarios:
metric, you can do so by using the tags parameter in the tasks
Custom metrics in Test Scenarios
================================
@@ -196,4 +190,3 @@ In the following example the custom metric stores the result of multiple tasks in sp
command: "ssh -o 'UserKnownHostsFile=/dev/null' -o 'StrictHostKeyChecking=no' linux@{{ server_ip }} -i ~/.ssh/{{ test_keypair_name }}.pem"
tags: ["az=default", "service=compute", "metric=create_server"]