.. _test_scenarios:

==============
Test Scenarios
==============

The Executor role of each API-Monitoring environment is responsible for
executing individual jobs (scenarios). These can be defined as Ansible
playbooks (which allows them to be pretty much anything) or in any other
executable form (e.g. a Python script). Since Ansible on its own can execute
nearly anything, ApiMon can do pretty much anything as well. The only
expectation is that whatever is being done produces some form of metric for
further analysis and evaluation; otherwise there is no point in monitoring it.
The scenarios are collected in a `Github repository
<https://github.com/opentelekomcloud-infra/apimon-tests>`_ and updated in
real-time.

In general the test jobs do not need to take care of generating metric data
explicitly. Since the API related tasks in the playbooks rely on the Python
OpenStack SDK (and its OTC extensions), metric data is generated automatically
by the logging interface of the SDK ('openstack_api' metrics). Those metrics
are collected by statsd and stored in the :ref:`graphite TSDB
<metric_databases>`. Additionally, metric data is generated by the executor
service itself, which collects the playbook names, results and durations
('ansible_stats' metrics) and stores them in the :ref:`postgresql relational
database <metric_databases>`.

The playbooks with monitoring scenarios are stored in a separate repository on
`Github <https://github.com/opentelekomcloud-infra/apimon-tests>`_ (the
location will change with the CloudMon replacement in the `future
<https://stackmon.github.io/>`_). The playbooks address the most common use
cases of the cloud services as conducted by end customers.

The metrics generated by the Executor are described on the :ref:`Metric
Definitions <metrics_definition>` page.

In addition to the metrics generated and captured by a playbook, ApiMon also
captures the :ref:`stdout of the execution <logs>` and saves this log for
additional analysis to OpenStack Swift storage, where logs are kept with a
configurable retention policy.

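The SDK emits these 'openstack_api' metrics on its own once it knows where
statsd is listening. Below is a minimal sketch of such a configuration,
assuming the statsd reporter is set up via clouds.yaml; the host, port and
cloud name are illustrative assumptions, not the values used by the production
ApiMon deployment::

  # clouds.yaml (sketch): point the SDK's statsd reporter at a statsd
  # instance so that every API call emits an 'openstack_api' metric
  metrics:
    statsd:
      host: 127.0.0.1     # assumed statsd host
      port: 8125          # default statsd UDP port
  clouds:
    apimon-test:          # hypothetical cloud entry used by the scenarios
      auth:
        auth_url: https://iam.eu-de.otc.t-systems.com/v3
        # credentials omitted
      region_name: eu-de
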
New Test Scenario introduction
==============================

As already mentioned, the playbook scenarios are stored in a separate
repository on `Github
<https://github.com/opentelekomcloud-infra/apimon-tests>`_. Because the
monitored environments differ from each other in location, supported services,
available flavors, etc., a monitoring configuration matrix is required which
defines the monitoring standard and scope for each environment. Therefore, to
enable a playbook in some of the monitored environments (PROD EU-DE, EU-NL,
PREPROD, Swisscloud) a further update is required in the `monitoring matrix
<https://github.com/opentelekomcloud-infra/system-config/blob/main/inventory/service/group_vars/apimon.yaml>`_.
This will also be subject to change in the future once `StackMon
<https://stackmon.github.io/>`_ takes over.

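Purely to illustrate the idea (the actual schema is defined in the apimon.yaml
file linked above and may differ), a matrix entry could map a scenario to the
environments it runs in roughly like this; all names and keys below are
hypothetical::

  # hypothetical sketch of a monitoring-matrix entry, not the real schema
  scenarios:
    scenario2_simple_ecs:
      environments:         # where this playbook is enabled
        - production_eu-de
        - production_eu-nl
        - preprod
      interval: 300         # assumed scheduling interval in seconds
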
Rules for Test Scenarios
========================

Ansible playbooks need to follow some basic regression testing principles to
ensure the sustainability of the endless execution of such scenarios:

- **OpenTelekomCloud and OpenStack collections**

  - When developing test scenarios use the available `Opentelekomcloud.Cloud
    <https://docs.otc-service.com/ansible-collection-cloud/>`_ or
    `Openstack.Cloud
    <https://docs.ansible.com/ansible/latest/collections/openstack/cloud/index.html>`_
    collections for native interaction with the cloud in Ansible.
  - In case a feature is not supported by the collections you can still use the
    script module and call a Python SDK script directly to invoke the required
    request towards the cloud.

- **Unique names of resources**

  - Make sure that resources don't conflict with each other and are easily
    trackable by their unique names.

- **Teardown of the resources**

  - Make sure that deletion / cleanup of the resources is triggered even if
    some of the tasks in the playbook fail (see the sketch after this list).
  - Make sure that deletion / cleanup is triggered in the right order.

- **Simplicity**

  - Do not over-complicate the test scenario. Use default auto-filled
    parameters wherever possible.

- **Only basic / core functions in scope of testing**

  - ApiMon is not supposed to validate full service functionality. For such
    cases there is a different team / framework within QA responsibility.
  - Focus only on core functions which are critical for the basic operation /
    lifecycle of the service.
  - The fewer functions you use, the lower the potential failure rate of the
    running scenario.

- **No hardcoding**

  - Every single hardcoded parameter in a scenario can lead to an outage of the
    scenario's run in the future when such a parameter changes.
  - Try to obtain all such parameters dynamically from the cloud directly.

- **Special tags for combined metrics**

  - In case you want to combine multiple tasks of a playbook into a single
    custom metric you can do so using the tags parameter on the tasks (see the
    next section).

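A minimal sketch of how unique resource names and a guaranteed teardown can be
combined in a scenario is shown below; the variables, resource names and
cleanup tasks are illustrative assumptions rather than an excerpt from an
existing ApiMon playbook::

  - name: Scenario with guaranteed cleanup (sketch)
    hosts: localhost
    vars:
      # unique, trackable resource name (prefix and timestamp are assumptions)
      test_server_name: "apimon-scn-{{ ansible_date_time.epoch }}"
    tasks:
      - block:
          - name: Create test server
            openstack.cloud.server:
              name: "{{ test_server_name }}"
              image: "{{ test_image }}"
              flavor: "{{ test_flavor }}"
              network: "{{ test_network_name }}"
              auto_ip: false
            register: server

          - name: Run further checks on the server (placeholder)
            debug:
              var: server

        always:
          # cleanup runs even if a task above failed; delete resources in
          # reverse order of creation when more than one is involved
          - name: Delete test server
            openstack.cloud.server:
              name: "{{ test_server_name }}"
              state: absent
            ignore_errors: true
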
Custom metrics in Test Scenarios
================================

The OpenStack SDK and otcextensions (otcextensions covers services which are
out of scope of the OpenStack SDK and extends its functionality with services
provided by OTC) support metric generation natively for every single API call,
and the ApiMon executor collects Ansible playbook statistics, so every single
scenario and task can store its result, duration and name in the metric
database. But in some cases there is a need to measure multiple tasks together
because they represent an important aspect of a customer use case, for example
the time and overall result from the VM deployment until a successful login
via SSH. The single task results are stored as metrics in the metric database,
but it would be complicated to move the processing logic for such combined
measurements into Grafana. Therefore the tags feature on task level introduces
the possibility to define custom metrics.

In the following example (snippet from `scenario2_simple_ecs.yaml
<https://github.com/opentelekomcloud-infra/apimon-tests/blob/master/playbooks/scenario2_simple_ecs.yaml>`_)
a custom metric stores the result of multiple tasks under the special metric
name create_server::

  - name: Create Server in default AZ
    openstack.cloud.server:
      auto_ip: false
      name: "{{ test_server_fqdn }}"
      image: "{{ test_image }}"
      flavor: "{{ test_flavor }}"
      key_name: "{{ test_keypair_name }}"
      network: "{{ test_network_name }}"
      security_groups: "{{ test_security_group_name }}"
    tags:
      - "metric=create_server"
      - "az=default"
    register: server

  - name: get server id
    set_fact:
      server_id: "{{ server.id }}"

  - name: Attach FIP
    openstack.cloud.floating_ip:
      server: "{{ server_id }}"
    tags:
      - "metric=create_server"
      - "az=default"

  - name: get server info
    openstack.cloud.server_info:
      server: "{{ server_id }}"
    register: server
    tags:
      - "metric=create_server"
      - "az=default"

  - set_fact:
      server_ip: "{{ server['openstack_servers'][0]['public_v4'] }}"
    tags:
      - "metric=create_server"
      - "az=default"

  - name: find servers by name
    openstack.cloud.server_info:
      server: "{{ test_server_fqdn }}"
    register: servers
    tags:
      - "metric=create_server"
      - "az=default"

  - name: Debug server info
    debug:
      var: servers

  # Wait for the server to really start and become accessible
  - name: Wait for SSH port to become active
    wait_for:
      port: 22
      host: "{{ server_ip }}"
      timeout: 600
    tags: ["az=default", "service=compute", "metric=create_server"]

  - name: Try connecting
    retries: 10
    delay: 1
    command: "ssh -o 'UserKnownHostsFile=/dev/null' -o 'StrictHostKeyChecking=no' linux@{{ server_ip }} -i ~/.ssh/{{ test_keypair_name }}.pem"
    tags: ["az=default", "service=compute", "metric=create_server"]