forked from docs/docsportal
fixing wrong bullets in multiple places
This commit is contained in:
parent efadfc75fd
commit 5d25748c82
@ -6,16 +6,17 @@ https://alerts.eco.tsi-dev.otc-service.com/

The authentication is centrally managed by LDAP.

- Alerta is a monitoring tool that integrates alerts from multiple sources.

- The alerts from different sources can be consolidated and de-duplicated.

- On ApiMon it is hosted on the same instance as Grafana, just listening on a
  different port.

- The Zulip API was integrated with Alerta to send notifications of
  errors/alerts on a Zulip stream.

- Alerts displayed on OTC Alerta are generated either by the Executor or by
  Grafana.

  - “Executor alerts” focus on playbook results, i.e. whether a playbook has
    completed or failed.
  - “Grafana alerts” focus on breaches of the defined thresholds, for example
    an API response time higher than the defined threshold.

.. image:: training_images/alerta_dashboard.png

@ -57,7 +57,6 @@ Counters and timers have the following subbranches:

- apimon.metric → specific apimon metrics not gathered by the OpenStack API
  methods

- openstack.api → pure API request metrics

Every section further has the following branches:
@ -81,13 +80,10 @@ The OpenStack metrics branch is structured as follows:

- response code - the received response code

- count/upper/lower/mean/etc - timer-specific metrics (available only under
  stats.timers.openstack.api.$environment.$zone.$service.$request_method.$resource.$status_code.{count,mean,upper,*})

- count/rate - counter-specific metrics (available only under
  stats.counters.openstack.api.$environment.$zone.$service.$request_method.$resource.$status_code.{count,rate})

- attempted - counter of the attempted requests (only for counters)

- failed - counter of failed requests (no response received, connection
  problems, etc.) (only for counters)

- passed - counter of requests that received any response back (only for
  counters)

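As a sketch of how these branches compose into a full metric path, the following Python snippet assembles a timer path from its parts; the component values used below (environment, zone, and so on) are hypothetical examples, not taken from a real deployment:

```python
# Sketch: assemble a stats.timers.openstack.api.* metric path from the
# branches documented above. All component values here are hypothetical.

def timer_metric_path(environment, zone, service, method, resource,
                      status_code, stat="mean"):
    """Build a full timer metric path following the documented layout."""
    return ".".join([
        "stats.timers.openstack.api",
        environment, zone, service, method, resource,
        str(status_code), stat,
    ])

path = timer_metric_path("production_eu-de", "default", "compute",
                         "GET", "servers", 200)
print(path)
# → stats.timers.openstack.api.production_eu-de.default.compute.GET.servers.200.mean
```

The same composition applies to the counters branch by swapping the `stats.timers` prefix for `stats.counters` and using the counter-specific leaves.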
@ -106,15 +102,10 @@ apimon.metric

- stats.timers.apimon.metric.$environment.$zone.**csm_lb_timings**.{public,private}.{http,https,tcp}.$az.__VALUE__ - timer values for the loadbalancer test

- stats.counters.apimon.metric.$environment.$zone.**csm_lb_timings**.{public,private}.{http,https,tcp}.$az.{attempted,passed,failed} - counter values for the loadbalancer test

- stats.timers.apimon.metric.$environment.$zone.**curl**.$host.{passed,failed}.__VALUE__ - timer values for the curl test

- stats.counters.apimon.metric.$environment.$zone.**curl**.$host.{attempted,passed,failed} - counter values for the curl test

- stats.timers.apimon.metric.$environment.$zone.**dns**.$ns_name.$host - timer values for the NS lookup test; $ns_name is the DNS server used to query the records

- stats.counters.apimon.metric.$environment.$zone.**dns**.$ns_name.$host.{attempted,passed,failed} - counter values for the NS lookup test

@ -79,7 +79,7 @@ ApiMon comes with the following features:

- Each squad can control and manage their own test scenarios and dashboards
- Every execution of the Ansible playbooks stores its log file on Swift object
  storage for further investigation/analysis

What ApiMon is NOT
@ -6,14 +6,16 @@ Logs

- Every single job run log is stored on OpenStack Swift object storage.

- Each job's log file has a unique URL which can be accessed to see the log
  details.

- These URLs are available on all ApiMon levels:

  - in Zulip alarm messages
  - in Alerta events
  - in Grafana dashboards

- Logs are simple plain-text files of the whole playbook output::

   2020-07-12 05:54:04.661170 | TASK [List Servers]
   2020-07-12 05:54:09.050491 | localhost | ok

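Since the log format is a plain timestamp, a separator, and a message, a line can be split apart with a small sketch like the following (using only the sample lines shown above):

```python
# Sketch: parse one line of the plain-text playbook log into a timestamp
# and the remaining message part.
from datetime import datetime

def parse_log_line(line):
    # The timestamp and the rest of the line are separated by " | ";
    # partition() splits only on the first occurrence, so messages that
    # themselves contain " | " stay intact.
    stamp, _, message = line.partition(" | ")
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f"), message

ts, msg = parse_log_line("2020-07-12 05:54:04.661170 | TASK [List Servers]")
print(ts.year, msg)
# → 2020 TASK [List Servers]
```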
@ -6,34 +6,37 @@ Metrics

The Ansible playbook scenarios generate metrics in two ways:

- The Ansible playbook internally invokes method calls to the **OpenStack SDK
  libraries**. They in turn generate metrics about each API call they make.
  This requires some special configuration in the clouds.yaml file (currently
  exposing metrics into statsd and InfluxDB is supported). For details refer
  to the `config documentation
  <https://docs.openstack.org/openstacksdk/latest/user/guides/stats.html>`_
  of the OpenStack SDK. The following metrics are captured:

  - response HTTP code
  - duration of the API call
  - name of the API call
  - method of the API call
  - service type

- Ansible plugins may **expose additional metrics** (e.g. whether the overall
  scenario succeeded or not) with the help of the `callback plugin
  <https://github.com/stackmon/apimon/tree/main/apimon/ansible/callback>`_.
  Since it is sometimes not sufficient to know only the timings of each API
  call, Ansible callbacks are utilized to report the overall execution time
  and result (whether the scenario succeeded and how long it took). The
  following metrics are captured:

  - test case
  - playbook name
  - environment
  - action name
  - result code
  - result string
  - service type
  - state type
  - total number of failed, passed, ignored, and skipped tests

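On the wire, both kinds of metrics typically end up as plain statsd UDP datagrams: `name:value|ms` for timers and `name:value|c` for counters, with the `stats.timers.` / `stats.counters.` prefixes seen in the paths above usually added by statsd itself. The sketch below is illustrative only, with a hypothetical metric name and a default local statsd endpoint:

```python
# Sketch: the statsd wire format behind these metrics. The metric name and
# value below are hypothetical; host/port assume a local statsd instance.
import socket

def statsd_datagram(name, value, kind):
    """Format a statsd metric; kind is 'ms' for timers, 'c' for counters."""
    return f"{name}:{value}|{kind}".encode()

def send_metric(datagram, host="127.0.0.1", port=8125):
    # Fire-and-forget UDP send, as statsd clients do.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(datagram, (host, port))
    finally:
        sock.close()

print(statsd_datagram(
    "openstack.api.production_eu-de.default.compute.GET.servers.200", 153, "ms"))
# → b'openstack.api.production_eu-de.default.compute.GET.servers.200:153|ms'
```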
Custom metrics:
@ -66,7 +66,6 @@ ensure sustainability of the endless execution of such scenarios:

`Openstack.Cloud
<https://docs.ansible.com/ansible/latest/collections/openstack/cloud/index.html>`_
collections for native interaction with the cloud in Ansible.

- In case there are features not supported by the collection, you can still
  use the script module and call a Python SDK script directly to invoke the
  required request towards the cloud

@ -80,7 +79,6 @@ ensure sustainability of the endless execution of such scenarios:

- Make sure that deletion / cleanup of the resources is triggered even if
  some of the tasks in the playbook fail

- Make sure that deletion / cleanup is triggered in the right order

- **Simplicity**

@ -92,10 +90,8 @@ ensure sustainability of the endless execution of such scenarios:

- ApiMon is not supposed to validate full service functionality. For such
  cases we have a different team / framework within the QA responsibility.

- Focus only on the core functions which are critical for the basic
  operation / lifecycle of the service.

- The fewer functions you use, the lower the potential failure rate of the
  running scenario, for whatever reason.

@ -103,7 +99,6 @@ ensure sustainability of the endless execution of such scenarios:

- Every single hardcoded parameter in a scenario can later lead to an outage
  of the scenario's run when that parameter changes.

- Try to obtain all such parameters dynamically from the cloud directly.

- **Special tags for combined metrics**

@ -112,7 +107,6 @@ ensure sustainability of the endless execution of such scenarios:

metric, you can do so by using the tags parameter in the tasks


Custom metrics in Test Scenarios
================================

@ -196,4 +190,3 @@ In the following example the custom metric stores the result of multiple tasks in sp

      command: "ssh -o 'UserKnownHostsFile=/dev/null' -o 'StrictHostKeyChecking=no' linux@{{ server_ip }} -i ~/.ssh/{{ test_keypair_name }}.pem"
      tags: ["az=default", "service=compute", "metric=create_server"]