Nils Magnus 6e2da0d05c review of training material
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Nils Magnus <magnus@linuxtag.org>
Co-committed-by: Nils Magnus <magnus@linuxtag.org>
2023-10-12 18:02:41 +00:00

Endpoint Monitoring overview

EpMon is a standalone Python-based process targeting every OTC service. It looks up the services in the service catalog and sends GET requests to the configured endpoints.

While extensive tests such as provisioning a server provide great coverage and deep insights, they are rather expensive and complex. They can only be performed at longer intervals, which leaves gaps in the monitoring timeline. To cover these gaps, the EpMon plugin sends GET requests to a list of endpoint URLs discovered from the OTC service catalog and augmented by simple paths like /server. Such requests are cheap and can be sent in a loop, e.g. every five seconds. The latency and the HTTP status code of those calls are captured, stored in a time series database, and further processed by the Metric Processor.
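The probing idea described above can be sketched in a few lines of Python. This is a minimal illustration, not EpMon's actual implementation: the names `ProbeResult`, `http_get_status`, `probe`, and `probe_loop` are hypothetical, the requests are unauthenticated, and a status of 0 for "no response" is a convention of this sketch only.

```python
import time
from typing import Callable, List, NamedTuple
from urllib import error, request


class ProbeResult(NamedTuple):
    url: str
    status: int        # HTTP status code, or 0 if no response was received
    latency_ms: float  # wall-clock duration of the call in milliseconds


def http_get_status(url: str, timeout: float = 5.0) -> int:
    """Send one GET request and return the HTTP status code."""
    try:
        with request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except error.HTTPError as exc:
        return exc.code  # 4xx/5xx responses still carry a status code
    except OSError:
        return 0  # connection failure or timeout (URLError is an OSError)


def probe(url: str, fetch: Callable[[str], int] = http_get_status) -> ProbeResult:
    """Measure the latency and status code of a single GET request."""
    start = time.perf_counter()
    status = fetch(url)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return ProbeResult(url, status, latency_ms)


def probe_loop(urls: List[str], interval: float = 5.0, rounds: int = 1,
               fetch: Callable[[str], int] = http_get_status,
               sleep: Callable[[float], None] = time.sleep) -> List[ProbeResult]:
    """Probe every URL once per round, pausing `interval` seconds between rounds."""
    results: List[ProbeResult] = []
    for i in range(rounds):
        results.extend(probe(u, fetch) for u in urls)
        if i + 1 < rounds:
            sleep(interval)
    return results
```

The `fetch` and `sleep` parameters are injectable so that the loop can be exercised without network access, for example `probe_loop(["a"], fetch=lambda u: 200, sleep=lambda s: None)`.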

Currently, the EpMon configuration is located in the stackmon-config project:

https://github.com/opentelekomcloud-infra/stackmon-config/blob/main/epmon/config.yaml

It defines the HTTP query targets (urls) for every OTC service.

An entry in the OTC service catalog is a prerequisite for a service to be queried by EpMon:

https://git.tsi-dev.otc-service.com/ecosystem/service_catalog

If a service has multiple entries in the service catalog, obsolete entries can be marked to be skipped. The EpMon config.yaml only defines the service queries; it does not say how and when to use them. The configuration matrix that governs their actual use across different monitoring sources and targets is defined in:

https://github.com/opentelekomcloud-infra/stackmon-config/blob/main/config.yaml

The following example configures the autoscaling service (as) in EpMon and adds four URL paths to the service endpoint (three URL paths for the Swisscloud):

as:
  service_type: as
  sdk_proxy: auto_scaling
  urls:
    - /
    - /scaling_group
    - /scaling_configuration
    - /scaling_policy
as_swiss:
  service_type: as
  sdk_proxy: auto_scaling
  urls:
    - /
    - /scaling_group
    - /scaling_configuration
as_skip_v1:
  service_type: asv1
  urls: []

There are three separate items defined for the autoscaling service:

  • The as entry is the default. It is used for the public OTC regions.
  • The as_swiss entry defines the specific settings for the Swisscloud.
  • The as_skip_v1 entry marks a service version that EpMon skips.

By default, all entries in the service catalog are queried by EpMon.

The mandatory parameter for all entries is service_type. This has to match the service_type entry in the OTC service catalog.

Another important parameter is sdk_proxy. This attribute identifies which otcextensions module is used to execute the HTTP GET queries.

The most important parameter is urls. It defines the list of URLs that EpMon queries for this service. Because the service_type is known, the full URL is not required; only the path that follows the predefined endpoint URL from the OTC service catalog needs to be specified.
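To illustrate how a catalog endpoint and the configured paths combine, here is a small sketch. The function name `build_urls` and the endpoint `https://as.example.com/autoscaling-api/v1` are made up for this example; how EpMon and the SDK actually join the two parts is not described here.

```python
from typing import List


def build_urls(endpoint: str, paths: List[str]) -> List[str]:
    """Append each configured path to the endpoint URL from the service catalog."""
    base = endpoint.rstrip("/")
    # Plain concatenation is deliberate: urllib.parse.urljoin would treat a
    # leading "/" as an absolute path and drop the endpoint's own path prefix.
    return [base + "/" + path.lstrip("/") for path in paths]


# Hypothetical catalog endpoint combined with two paths from the "as" entry.
for url in build_urls("https://as.example.com/autoscaling-api/v1",
                      ["/", "/scaling_group"]):
    print(url)
```

With these inputs the sketch yields the endpoint itself (with a trailing slash) and the endpoint followed by `/scaling_group`.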

If a specific service (or a specific service version) should be skipped from endpoint monitoring, the value of the urls key has to be set to the empty list in the EpMon configuration file. This ensures that even the default queries from the service catalog are overridden by the empty list. In this example, the service type asv1 (an entry from the OTC service catalog) is not queried by EpMon at all because its urls list is empty.
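The override rule can be expressed as a short lookup. The following is a sketch under assumptions: the function `urls_for_service_type`, the default path `/`, and the "first matching entry wins" behaviour are illustrative only. Choosing between several entries with the same service_type (such as as and as_swiss) is the job of the configuration matrix and is not modelled here.

```python
from typing import Dict, List, Sequence

# Subset of the example configuration above, expressed as a Python dict.
EPMON_CONFIG: Dict[str, dict] = {
    "as": {
        "service_type": "as",
        "sdk_proxy": "auto_scaling",
        "urls": ["/", "/scaling_group", "/scaling_configuration", "/scaling_policy"],
    },
    "as_skip_v1": {"service_type": "asv1", "urls": []},
}


def urls_for_service_type(config: Dict[str, dict], service_type: str,
                          default: Sequence[str] = ("/",)) -> List[str]:
    """Return the paths to query; an explicit empty list disables the service."""
    for entry in config.values():
        if entry.get("service_type") == service_type:
            urls = entry.get("urls")
            # Compare against None, not truthiness: an empty list is a valid
            # setting meaning "skip" and must not fall back to the default.
            if urls is not None:
                return list(urls)
    return list(default)  # hypothetical default for unconfigured services
```

The important detail is the distinction between a missing urls key and an empty one: `urls_for_service_type(EPMON_CONFIG, "asv1")` returns an empty list, so nothing is queried, while a service type without any configuration entry falls back to the assumed default.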

Collected response codes and response times are sent to the Graphite time series database for further processing by the Metrics Processor.
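For orientation, Graphite's plaintext protocol expects one line per data point in the form `metric.path value timestamp`. The sketch below formats a probe result that way. The metric naming scheme (`epmon.<service>.<path>.status`) and the function name `graphite_lines` are assumptions made for this example, and whether EpMon writes to Graphite directly or through an intermediary is not stated here.

```python
import time
from typing import List, Optional


def graphite_lines(prefix: str, service: str, path: str, status: int,
                   latency_ms: float, timestamp: Optional[float] = None) -> List[str]:
    """Format one probe result as Graphite plaintext lines (path value timestamp)."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    # Dots separate hierarchy levels in Graphite, so the URL path is flattened.
    leaf = path.strip("/").replace("/", "_").replace(".", "_") or "root"
    base = f"{prefix}.{service}.{leaf}"
    return [
        f"{base}.status {status} {ts}",
        f"{base}.latency_ms {latency_ms:.1f} {ts}",
    ]


# Hypothetical result for the /scaling_group path of the "as" service.
for line in graphite_lines("epmon", "as", "/scaling_group", 200, 12.5, 1700000000):
    print(line)
```

Each returned line would be terminated with a newline when written to a Carbon listener.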