Nils Magnus 6e2da0d05c review of training material
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Nils Magnus <magnus@linuxtag.org>
Co-committed-by: Nils Magnus <magnus@linuxtag.org>
2023-10-12 18:02:41 +00:00


Monitoring coverage

Monitoring the cloud services of the OTC (which we call monitoring environments) from within the OTC is convenient and effective most of the time. In corner cases, however, for example when a problem inside the OTC also affects the monitoring servers, the servers performing the actual monitoring (which we call monitoring zones) should also include external zones. Who monitors whom (and how) is configured in a matrix definition:

https://github.com/opentelekomcloud-infra/stackmon-config/blob/main/config.yaml

Monitoring Environments

The following targets are covered by the SD2 monitoring setup and are displayed in separate tabs (for the Swisscloud, on separate pages):

  • eu-de,
  • eu-nl, and
  • eu-ch2 (Swisscloud).

Monitoring Zones

From these zones the monitoring probes are sent to the targets:

  • inside the OTC (eu-de, eu-ch2), and
  • outside the OTC (Swisscloud).
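The reason for external zones can be expressed as a simple coverage check. The following Python sketch is hypothetical and not part of Stackmon: it treats eu-de and eu-ch2 as internal zones (as listed above), takes pairs of target environment and monitoring zone (the env and monitoring_zone attributes of the matrix definition), and reports environments that are probed only from inside the OTC. The zone name external-1 is an invented placeholder.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Zones located inside the OTC, as listed in the text above.
INTERNAL_ZONES: Set[str] = {"eu-de", "eu-ch2"}


def envs_without_external_zone(
    matrix: Iterable[Tuple[str, str]],
    internal_zones: Set[str] = INTERNAL_ZONES,
) -> List[str]:
    """Return environments that are monitored only from zones inside the OTC.

    `matrix` is a sequence of (env, monitoring_zone) pairs.
    """
    zones_per_env: Dict[str, Set[str]] = {}
    for env, zone in matrix:
        zones_per_env.setdefault(env, set()).add(zone)
    # An environment is "internal only" if all its zones are internal ones.
    return sorted(
        env for env, zones in zones_per_env.items() if zones <= internal_zones
    )


if __name__ == "__main__":
    # "external-1" is a hypothetical zone name outside the OTC.
    matrix = [
        ("production_eu-de", "eu-de"),
        ("production_eu-nl", "eu-de"),
        ("production_eu-nl", "external-1"),
    ]
    print(envs_without_external_zone(matrix))  # ['production_eu-de']
```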

Scope of monitoring

The SD2 is a special application of the more generic Stackmon project and utilizes several plugins to collect its metrics:

  • HTTP-GET queries are sent to service API endpoints:
    • applies to all services from the service catalog,
    • multiple GET queries may be configured per service.
  • Static resources:
    • not yet implemented in SD2 (expected in Q1 2024),
    • covers specific services,
    • checks the availability of a resource or of its functionality.
  • Global resources:
    • not yet implemented in SD2 (expected in 2024),
    • OTC console,
    • OTC Help Center,
    • OTC community portal,
    • OTC public website.
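To illustrate the first item (HTTP GET queries, possibly several per service), the following Python sketch sends the configured GET queries of one service and records the status code and latency of each. It is not the EpMon implementation: the function names, the 10-second timeout, and the rule that a service counts as available only if all of its queries return a 2xx status are assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, List, NamedTuple


class ProbeResult(NamedTuple):
    url: str
    status: int        # HTTP status code, or 0 if no response was received
    elapsed_ms: float  # wall-clock duration of the request


def http_get_status(url: str, timeout: float = 10.0) -> int:
    """Send a single HTTP GET request and return the status code."""
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers still count as a response from the endpoint.
        return err.code
    except OSError:
        # DNS errors, refused connections and timeouts: no response at all.
        return 0


def probe_service(
    urls: List[str],
    fetch: Callable[[str], int] = http_get_status,
) -> List[ProbeResult]:
    """Run every configured GET query of one service and time each request."""
    results = []
    for url in urls:
        start = time.monotonic()
        status = fetch(url)
        elapsed_ms = (time.monotonic() - start) * 1000.0
        results.append(ProbeResult(url, status, elapsed_ms))
    return results


def service_is_available(results: List[ProbeResult]) -> bool:
    """Hypothetical rule: available only if every GET query returned 2xx."""
    return bool(results) and all(200 <= r.status < 300 for r in results)
```

For offline testing, a stub such as `fetch=lambda url: 200` can replace the real HTTP call; the endpoint URLs would come from the service catalog.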

Example configuration of the monitoring matrix and covered services:

# Mapping of environments to test projects
- env: production_eu-de
  monitoring_zone: eu-de
  db_entry: apimon.apimon
  plugins:
    - name: apimon
      schedulers_inventory_group_name: schedulers
      executors_inventory_group_name: executors
      tests_project: apimon
      tasks:
        - scenario1_token.yaml
    - name: epmon
      epmon_inventory_group_name: epmon_de
      cloud_name: production_eu-de # the env in this zone has several credentials; pick one
      config_elements:
        - antiddos
        - antiddos_skip_bad_type
        - as
        - as_skip_v1
        - bms_skip
        - cce_skip_unver
        - cce
        - ces
        - ces_skip_v1
        - compute
        - css
        - cts_skip_unver
        - cts
        - data_protect_skip
        - database_skip
        - dcs
        - dcs_skip_v1
        - dds
        - deh
        - dis_skip_unver
        - dis
        - dms
        - dms_skip_v2
        - dns
        - dws
        - dws_skip_v1
        - identity
        - image
        - kms_skip_unver
        - kms
        - mrs
        - nat
        - network
        - object_skip
        - object_store
        - orchestration
        - rds_skip_unver
        - rds_skip_v1
        - rds
        - sdrs
        - sfsturbo
        - share
        - smn
        - smn_skip_v2
        - volume_skip_v2
        - volume
- env: production_eu-nl
  monitoring_zone: eu-de
  db_entry: apimon.apimon
  plugins:
    - name: apimon
      schedulers_inventory_group_name: schedulers
      executors_inventory_group_name: executors
      #epmons_inventory_group_name: epmons
      tests_project: apimon
      tasks:
        - scenario1_token.yaml
    - name: epmon
      epmon_inventory_group_name: epmon_de
      cloud_name: production_eu-nl # the env in this zone has several credentials; pick one
      config_elements:
        - antiddos
        - antiddos_skip_bad_type
        - as
        - as_skip_v1
        - bms_skip
        - cce_skip_unver
        - cce
        - ces
        - ces_skip_v1
        - compute
        - css
        - cts_skip_unver
        - cts
        - data_protect_skip
        - database_skip
        - dcs
        - dcs_skip_v1
        - dds
        - deh
        - dis_skip_unver
        - dis
        - dms
        - dms_skip_v2
        - dns
        - dws
        - dws_skip_v1
        - identity
        - image
        - kms_skip_unver
        - kms
        - mrs
        - nat
        - network
        - object_skip
        - object_store
        - orchestration
        - rds_skip_unver
        - rds_skip_v1
        - rds
        - sdrs
        - sfsturbo
        - share
        - smn
        - smn_skip_v2
        - volume_skip_v2
        - volume

Note that Service Managers and Engineers usually do not need to modify this configuration. Any details should be agreed with the Platform Engineers.

The attribute env defines the target of monitoring (which region is monitored). The attribute monitoring_zone defines the source of monitoring (from which region the probes are triggered). In the example above, for instance, production_eu-nl is monitored from the eu-de zone.
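Read together, the two attributes form a matrix with monitoring zones as sources and environments as targets. The Python sketch below builds this "who monitors whom" view from a hand-copied excerpt of the example configuration; parsing the actual YAML file is left out to avoid a third-party dependency, and the function name who_monitors_whom is hypothetical.

```python
from typing import Any, Dict, List

# Hand-copied excerpt of the example matrix above (plugin details omitted).
MATRIX: List[Dict[str, Any]] = [
    {"env": "production_eu-de", "monitoring_zone": "eu-de",
     "plugins": [{"name": "apimon"}, {"name": "epmon"}]},
    {"env": "production_eu-nl", "monitoring_zone": "eu-de",
     "plugins": [{"name": "apimon"}, {"name": "epmon"}]},
]


def who_monitors_whom(matrix: List[Dict[str, Any]]) -> Dict[str, Dict[str, List[str]]]:
    """Group matrix entries by monitoring zone (source of the probes).

    For each zone, map every environment it monitors (target) to the names
    of the plugins used for that environment.
    """
    view: Dict[str, Dict[str, List[str]]] = {}
    for entry in matrix:
        plugins = [plugin["name"] for plugin in entry.get("plugins", [])]
        view.setdefault(entry["monitoring_zone"], {})[entry["env"]] = plugins
    return view


if __name__ == "__main__":
    for zone, targets in who_monitors_whom(MATRIX).items():
        for env, plugins in targets.items():
            print(f"{zone} -> {env}: {', '.join(plugins)}")
```

Run as a script, it prints one line per zone and environment pair, for example `eu-de -> production_eu-nl: apimon, epmon`.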

Note that this configuration covers not only the SD2 component but also the more generic Stackmon framework. Stackmon is plugin-based, so additional plugins can be added. Currently, two plugins are enabled:

  • apimon
  • epmon
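The plugin mechanism can be pictured as a registry that maps the name field of each plugin section to a handler. The Python sketch below is a hypothetical illustration of that idea, not Stackmon's actual plugin loader; the register decorator, the handler signature, and the returned summary strings are all assumptions.

```python
from typing import Any, Callable, Dict, List

# A handler receives the matrix entry and its own plugin section and
# returns a short description of the work it would schedule.
PluginHandler = Callable[[Dict[str, Any], Dict[str, Any]], str]

PLUGINS: Dict[str, PluginHandler] = {}


def register(name: str) -> Callable[[PluginHandler], PluginHandler]:
    """Decorator that makes a handler available under the given plugin name."""
    def decorator(handler: PluginHandler) -> PluginHandler:
        PLUGINS[name] = handler
        return handler
    return decorator


@register("apimon")
def run_apimon(entry: Dict[str, Any], section: Dict[str, Any]) -> str:
    return f"apimon: {len(section.get('tasks', []))} scenario(s) for {entry['env']}"


@register("epmon")
def run_epmon(entry: Dict[str, Any], section: Dict[str, Any]) -> str:
    return f"epmon: {len(section.get('config_elements', []))} service entries for {entry['env']}"


def dispatch(entry: Dict[str, Any]) -> List[str]:
    """Call the registered handler for every plugin listed in a matrix entry."""
    results = []
    for section in entry.get("plugins", []):
        handler = PLUGINS.get(section["name"])
        if handler is None:
            raise KeyError(f"no plugin registered under {section['name']!r}")
        results.append(handler(entry, section))
    return results
```

Keeping the handlers in a dictionary keyed by name means a new plugin only has to register itself; the dispatch loop and the matrix format stay unchanged.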

The Apimon plugin triggers scenario-based Ansible playbooks that simulate customer use cases, including the creation of resources (POST requests). Currently, only one scenario is enabled: token authorization (scenario1_token.yaml). Because SD2 evaluates only the HTTP GET metrics, the other scenarios are not yet enabled. The playbooks are stored on GitHub at:

https://github.com/stackmon/apimon-tests/tree/main/playbooks
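Why further scenarios would not yet add value to SD2 can be illustrated with a sketch of the evaluation step. The following Python code assumes a simplified metric record (service, HTTP method, status code); the record layout and the success-rate calculation are illustrative assumptions, not the actual SD2 data model. Requests other than GET, such as the POST requests issued by resource-creating scenarios, are simply skipped.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Metric:
    service: str
    method: str  # HTTP method of the probed request, e.g. "GET" or "POST"
    status: int  # HTTP status code returned by the endpoint


def get_success_rate(metrics: Iterable[Metric]) -> Dict[str, float]:
    """Per-service share of successful (2xx) GET requests; other methods are ignored."""
    total: Dict[str, int] = defaultdict(int)
    ok: Dict[str, int] = defaultdict(int)
    for m in metrics:
        if m.method.upper() != "GET":
            continue  # e.g. POST requests from resource-creating scenarios
        total[m.service] += 1
        if 200 <= m.status < 300:
            ok[m.service] += 1
    return {service: ok[service] / count for service, count in total.items()}
```

A service that only produced POST metrics does not appear in the result at all, which mirrors why such scenarios are not enabled for SD2 yet.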

The EpMon plugin defines which service entries (config_elements) are used in each environment. Services that are not available in an environment accordingly have no entry in that environment's configuration.
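Because the entry lists differ only where services differ, comparing them across environments shows which services an environment lacks. The following Python sketch is a hypothetical helper, not part of EpMon; the entry names service_a to service_c are placeholders, not real OTC availability data. In the example configuration above, both environments list identical entries, so such a comparison would report no gaps.

```python
from typing import Dict, List, Set


def missing_entries(config_elements: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """For each environment, return the EpMon entries that at least one
    other environment has but this one lacks."""
    all_elements: Set[str] = set()
    for elements in config_elements.values():
        all_elements.update(elements)
    return {
        env: all_elements - set(elements)
        for env, elements in config_elements.items()
    }


if __name__ == "__main__":
    # Placeholder entry names; real lists come from the config_elements sections.
    example = {
        "production_eu-de": ["service_a", "service_b", "service_c"],
        "production_eu-nl": ["service_a", "service_b"],
    }
    print(missing_entries(example))
```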