fixing other format issues

Hasko, Vladimir 2023-05-20 23:10:05 +00:00
parent 2e341f4a19
commit 40423a0364
10 changed files with 42 additions and 56 deletions


@ -51,7 +51,7 @@ of the specific service.
24/7 Mission control squads use CloudMon, ApiMon and EpMon metrics and present
them on their own customized dashboards that fulfill their
requirements.
https://dashboard.tsi-dev.otc-service.com/d/eBQoZU0nk/overview?orgId=1&refresh=1m
@ -98,7 +98,7 @@ Service Based Dashboard
=======================
The dashboard provides deeper insight into a single service with tailored views,
graphs and tables that address the service's major functionalities and specifics.
https://dashboard.tsi-dev.otc-service.com/d/APImonCompute/compute-service-statistics?orgId=1


@ -8,7 +8,7 @@ Metrics are stored in 2 different database types:
- Graphite time series database
- PostgreSQL relational database
Graphite
========
@ -61,8 +61,8 @@ Counters and timers have following subbranches:
Every section has the following further branches:
- environment name (production_regA, production_regB, etc.)
- monitoring location (production_regA, awx) - specification of the environment from which the metric is gathered
openstack.api
@ -70,27 +70,35 @@ openstack.api
The OpenStack metrics branch is structured as follows:
- service (normally service_type from the service catalog, but sometimes differs slightly)
- request method (GET/POST/DELETE/PUT)
- resource (service resource, e.g. server, keypair, volume, etc.). Subresources are joined with "_" (e.g. cluster_nodes)
- response code - received response code
- count/upper/lower/mean/etc - timer specific metrics (available only under stats.timers.openstack.api.$environment.$zone.$service.$request_method.$resource.$status_code.{count,mean,upper,*})
- count/rate - counter specific metrics (available only under stats.counters.openstack.api.$environment.$zone.$service.$request_method.$resource.$status_code.{count,mean,upper,*})
- attempted - counter for the attempted requests (only for counters)
- failed - counter of failed requests (no response received, connection problems, etc.) (only for counters)
- passed - counter of requests receiving any response back (only for counters)
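The branch structure above can be sketched as a small path builder. This is only an illustration of how the documented segments line up, not code taken from ApiMon: the class `ApiMetric`, the helper `join_resource` and the sample values are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


def join_resource(*names: str) -> str:
    """Join a resource and its subresources with "_" (e.g. cluster_nodes)."""
    return "_".join(names)


@dataclass(frozen=True)
class ApiMetric:
    """Hypothetical container for the documented openstack.api path segments."""

    environment: str   # e.g. production_regA
    zone: str          # monitoring location, e.g. awx
    service: str       # normally service_type from the service catalog
    method: str        # GET/POST/DELETE/PUT
    resource: str      # e.g. server, keypair, cluster_nodes
    status_code: int   # received response code

    def path(self, kind: str, leaf: str) -> str:
        """Build the full Graphite path; kind is "timers" or "counters"."""
        if kind not in ("timers", "counters"):
            raise ValueError(f"unknown metric kind: {kind!r}")
        parts = [
            "stats", kind, "openstack", "api",
            self.environment, self.zone, self.service,
            self.method, self.resource, str(self.status_code), leaf,
        ]
        return ".".join(parts)


if __name__ == "__main__":
    metric = ApiMetric("production_regA", "awx", "compute", "GET", "server", 200)
    print(metric.path("timers", "mean"))
    print(metric.path("counters", "count"))
```

Keeping the segments as named fields makes it harder to swap, for example, the environment and the monitoring location when assembling a query by hand.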
apimon.metric
-------------
- metric name (e.g. create_cce_cluster, delete_volume_eu-de-01, etc.) - complex metrics branch
- attempted/failed/failedignored/passed/skipped - counters for the corresponding operation results (this branch element represents the status of the corresponding Ansible task)
- $az - some metrics have the availability zone of the operation on that level. Since this information is not always available, this is a varying path
- curl - subtree for the curl type of metrics
- $name - short name of the host to be checked
- stats.timers.apimon.metric.$environment.$zone.**csm_lb_timings**.{public,private}.{http,https,tcp}.$az.__VALUE__ - timer values for the loadbalancer test
- stats.counters.apimon.metric.$environment.$zone.**csm_lb_timings**.{public,private}.{http,https,tcp}.$az.{attempted,passed,failed} - counter values for the loadbalancer test
- stats.timers.apimon.metric.$environment.$zone.**curl**.$host.{passed,failed}.__VALUE__ - timer values for the curl test
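As a rough illustration of how such a path might be queried, the sketch below builds a Graphite render API URL for the curl timer branch. Several things here are assumptions rather than facts from this documentation: the base URL `https://graphite.example.com` is a placeholder, `mean` is used in place of `__VALUE__` on the assumption that the usual StatsD timer leaves are present, and the function names are hypothetical.

```python
from urllib.parse import urlencode


def curl_timer_target(environment: str, zone: str, host: str,
                      result: str = "passed", value: str = "mean") -> str:
    """Assemble the documented curl timer path; `value` stands in for __VALUE__."""
    return (
        f"stats.timers.apimon.metric.{environment}.{zone}"
        f".curl.{host}.{result}.{value}"
    )


def render_url(base_url: str, target: str, since: str = "-1h") -> str:
    """Build a Graphite render API URL returning JSON datapoints."""
    query = urlencode({"target": target, "from": since, "format": "json"})
    return f"{base_url.rstrip('/')}/render?{query}"


if __name__ == "__main__":
    # Placeholder host and environment values, for illustration only.
    target = curl_timer_target("production_regA", "awx", "myhost")
    print(render_url("https://graphite.example.com", target))
```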
@ -128,4 +136,4 @@ These queries are used mainly on Test Results dashboard and Service specific sta
+-------------------------------+-------------------------------------------------------------------------------------------------------------+
.. image:: training_images/postgresql_query.jpg


@ -37,4 +37,4 @@ detected error codes or no responses at all.
.. image:: training_images/epmon_dashboard_details.jpg
EpMon findings are also reported to Alerta, and notifications are sent to the
dedicated Zulip topic "apimon_endpoint_monitoring".


@ -1,6 +1,6 @@
============================
How Can I Access Dashboard ?
============================
OTC LDAP authentication is supported on
https://dashboard.tsi-dev.otc-service.com.


@ -8,4 +8,3 @@ Frequently Asked Questions
how_can_i_access_dashboard
how_to_read_the_logs_and_understand_the_issue
what_are_the_annotations


@ -18,6 +18,6 @@ with the respective change directly on the dashboard:
- JIRA Change issue ID
- Impacted Availability Zone
- Affected Environment
- Main component
- Summary


@ -9,7 +9,7 @@ available to them via the Internet. While internal monitoring checks on the OTC
backplane are necessary, they are not sufficient to detect failures that
manifest in the interface, network connectivity, or the API logic itself. Simple
HTTP requests to the REST endpoints that check for 200 status codes are also
helpful, but not sufficient.
ApiMon is an Open Telekom Cloud product developed by the
Ecosystem squad.


@ -8,50 +8,29 @@ Logs
- Each job log file has a unique URL that can be opened to view the log
  details
- These URLs are available on all APIMON levels:
  - In Zulip alarm messages
  - In Alerta events
  - In Grafana Dashboards
- Logs are simple plain text files of the whole playbook output::

    2020-07-12 05:54:04.661170 | TASK [List Servers]
    2020-07-12 05:54:09.050491 | localhost | ok
    2020-07-12 05:54:09.067582 | TASK [Create Server in default AZ]
    2020-07-12 05:54:46.055650 | localhost | MODULE FAILURE:
    2020-07-12 05:54:46.055873 | localhost | Traceback (most recent call last):
    2020-07-12 05:54:46.057441 | localhost |
    2020-07-12 05:54:46.057499 | localhost | During handling of the above exception, another exception occurred:
    2020-07-12 05:54:46.057535 | localhost |
    2020-07-12 05:54:46.063992 | localhost | File "/tmp/ansible_os_server_payload_uz1c7_iw/ansible_os_server_payload.zip/ansible/modules/cloud/openstack/os_server.py", line 500, in _create_server
    2020-07-12 05:54:46.065152 | localhost | return self._send_request(
    2020-07-12 05:54:46.065186 | localhost | File "/root/.local/lib/python3.8/site-packages/keystoneauth1/session.py", line 1020, in _send_request
    2020-07-12 05:54:46.065334 | localhost | raise exceptions.ConnectFailure(msg)
    2020-07-12 05:54:46.065378 | localhost | keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to https://ims.eu-de.otctest.t-systems.com/v2/images: ('Connection aborted.', OSError(107, 'Transport endpoint is not connected'))
    2020-07-12 05:54:46.295035 |
    2020-07-12 05:54:46.295241 | TASK [Delete server]
    2020-07-12 05:54:48.481374 | localhost | ok
    2020-07-12 05:54:48.505761 |
    2020-07-12 05:54:48.505906 | TASK [Delete SecurityGroup]
    2020-07-12 05:54:50.727174 | localhost | changed
    2020-07-12 05:54:50.745541 |
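When reading such a log, the first step is usually to find which task failed and what the last lines of its output say. The sketch below shows one possible way to do that for the `timestamp | host | message` layout seen above. It is not part of ApiMon: the function name `failed_tasks` is hypothetical, and treating messages that start with `MODULE FAILURE`, `FAILED` or `ERROR` as failures is an assumption based only on this sample.

```python
import re

# Layout assumed from the sample log: "<timestamp> | <body>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s*\|\s*(?P<body>.*)$"
)
TASK_RE = re.compile(r"^TASK \[(?P<name>.+)\]$")
HOST_RE = re.compile(r"^(?P<host>\S+)\s*\|\s*(?P<msg>.*)$")
# Assumption: these prefixes mark a failed task result.
FAILURE_PREFIXES = ("MODULE FAILURE", "FAILED", "ERROR")


def failed_tasks(log_text: str) -> list:
    """Return one record per failed task with its timestamp and detail lines."""
    failures = []
    current_task = None
    record = None  # failure record currently collecting detail lines
    for raw in log_text.splitlines():
        line = LINE_RE.match(raw.strip())
        if not line:
            continue
        body = line.group("body").strip()
        task = TASK_RE.match(body)
        if task:
            current_task = task.group("name")
            record = None
            continue
        host = HOST_RE.match(body)
        if not host or current_task is None:
            continue
        msg = host.group("msg").strip()
        if record is None and msg.startswith(FAILURE_PREFIXES):
            record = {"task": current_task, "timestamp": line.group("ts"), "detail": []}
            failures.append(record)
        elif record is not None and msg:
            record["detail"].append(msg)
    return failures


if __name__ == "__main__":
    sample = """\
2020-07-12 05:54:09.067582 | TASK [Create Server in default AZ]
2020-07-12 05:54:46.055650 | localhost | MODULE FAILURE:
2020-07-12 05:54:46.065334 | localhost | raise exceptions.ConnectFailure(msg)
2020-07-12 05:54:46.295241 | TASK [Delete server]
2020-07-12 05:54:48.481374 | localhost | ok
"""
    for failure in failed_tasks(sample):
        print(failure["timestamp"], failure["task"], failure["detail"][-1])
```

In the sample above, the last detail line of the failed task is the `ConnectFailure` message, which is typically the most useful line for understanding the issue.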


@ -46,6 +46,6 @@ achieve the desired state of service or service resource. For example boot up of
virtual machine from deployment until successful login via SSH.
.. code-block::

   tags: ["metric=delete_server"]
   tags: ["az={{ availability_zone }}", "service=compute", "metric=create_server{{ metric_suffix }}"]
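The tags above follow a `key=value` convention. As a minimal sketch of how such tags could be turned into metric attributes, the hypothetical helper below splits each tag at the first `=`. It assumes the Jinja expressions have already been rendered, and it is not a description of how ApiMon itself processes the tags.

```python
def parse_tags(tags: list) -> dict:
    """Split "key=value" tags into a dict; tags without "=" are ignored."""
    result = {}
    for tag in tags:
        key, sep, value = tag.partition("=")
        if sep:  # only keep tags that actually contain "="
            result[key] = value
    return result


if __name__ == "__main__":
    # Rendered example values; "eu-de-01" is an illustrative availability zone.
    print(parse_tags(["az=eu-de-01", "service=compute", "metric=create_server"]))
```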


@ -48,4 +48,4 @@ Monitoring dashboards
* KPI dashboards
* 24/7 dashboards
* Test results dashboards
* Specific service dashboards