Configuring Resource Groups

Resource Group Introduction

The resource group mechanism controls the overall query load of an instance through resource allocation and implements queuing policies for queries. Multiple resource groups can be created under the resources of a compute instance, and each submitted query is assigned to a specific resource group for execution. Before a resource group runs a new query, it checks whether its current resource load exceeds the amount of resources that the instance allocates to it. If it does, new incoming queries are blocked and placed in a queue, or even rejected directly. However, a resource group never causes a running query to fail.

Application Scenarios of Resource Groups

Resource groups are used to manage resources in compute instances. Allocating different resource groups to different users and queries isolates their resources, which prevents a single user or query from monopolizing the compute instance. In addition, the weight and priority of resource groups can be configured to ensure that important tasks are executed first. Table 1 describes the typical application scenarios of resource groups.

Table 1 Typical application scenarios of resource groups

| Typical Scenario | Solution |
| --- | --- |
| As more business teams use the compute instance, a team whose tasks become more important does not want to find that no resources are available when it runs a query. | Allocate a dedicated resource group to each team, and assign important tasks to resource groups with more resources. When the sum of the proportions of the sub-resource groups is less than or equal to 100%, the resources of one resource group cannot be preempted by other resource groups. This is similar to static resource allocation. |
| When the instance resource load is high, two users submit queries at the same time. Initially, both queries are queued. When resources become idle, the query of a specific user should be scheduled to obtain resources first. | Allocate different resource groups to the two users. Important tasks can be assigned to resource groups with higher weights or priorities. The scheduling policy is configured by schedulingPolicy, and different scheduling policies allocate resources in different orders. |
| Ad hoc queries and batch queries need resources to be allocated appropriately according to SQL type. | Match different query types, such as EXPLAIN, INSERT, SELECT, and DATA_DEFINITION, to different resource groups, and allocate different resources to each group. |
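
As a minimal sketch of the first scenario, the fragment below gives two teams their own sub-resource groups whose memory shares sum to 100%. The rootGroups key, the group names (global, team_a, team_b), and all limit values are illustrative assumptions. The property names are those described in Table 2.

```json
{
  "rootGroups": [
    {
      "name": "global",
      "softMemoryLimit": "100%",
      "hardConcurrencyLimit": 100,
      "maxQueued": 1000,
      "subGroups": [
        {
          "name": "team_a",
          "softMemoryLimit": "60%",
          "hardConcurrencyLimit": 60,
          "maxQueued": 600
        },
        {
          "name": "team_b",
          "softMemoryLimit": "40%",
          "hardConcurrencyLimit": 40,
          "maxQueued": 400
        }
      ]
    }
  ]
}
```

In this sketch, team_a is assumed to run the more important tasks and therefore receives the larger share.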

Enabling a Resource Group

When creating a compute instance, add custom configuration parameters to the resource-groups.json file. For details, see 3.e in Creating HetuEngine Compute Instances.

Resource Group Properties

For details about how to configure resource group attributes, see Table 2.

Table 2 Resource group properties

| Configuration Item | Mandatory | Description |
| --- | --- | --- |
| name | Yes | Resource group name. |
| maxQueued | Yes | Maximum number of queued queries. When this threshold is reached, new queries are rejected. |
| hardConcurrencyLimit | Yes | Maximum number of running queries. |
| softMemoryLimit | No | Maximum memory usage of the resource group. When the memory usage reaches this threshold, new queries are queued. The value can be an absolute value (for example, 10 GB) or a percentage of the cluster memory (for example, 10%). |
| softCpuLimit | No | CPU time that can be used in a period (see the cpuQuotaPeriod parameter in Global Attributes). The hardCpuLimit parameter must also be specified. When this threshold is reached, the CPU resources of the query that occupies the most CPU resources in the resource group are reduced. |
| hardCpuLimit | No | Maximum CPU time that can be used in a period. |
| schedulingPolicy | No | Policy that determines the order in which queued queries move to the running state. Values: fair (default): when multiple sub-resource groups in a resource group have queued queries, the sub-resource groups obtain resources in turn in the order in which they are defined, and queries in the same sub-resource group obtain resources on a first-come, first-served basis; weighted_fair: the schedulingWeight attribute is configured for each sub-resource group, each sub-resource group calculates the ratio (number of queries in the sub-resource group)/(schedulingWeight), and the sub-resource group with the smaller ratio obtains resources first; weighted: each sub-resource group is configured with schedulingWeight (default value: 1), and a sub-resource group with a larger schedulingWeight obtains resources earlier; query_priority: all sub-resource groups must also be configured with query_priority, and resources are obtained in the order specified by query_priority. |
| schedulingWeight | No | Weight of the group. For details, see schedulingPolicy. The default value is 1. |
| jmxExport | No | If this parameter is set to true, group statistics are exported to JMX for monitoring. The default value is false. |
| subGroups | No | List of sub-resource groups. |
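
The weighted_fair policy can be sketched as follows. The rootGroups key, the hypothetical group names (global, etl, adhoc), and all numeric values are assumptions chosen for illustration. Only the property names come from Table 2.

```json
{
  "rootGroups": [
    {
      "name": "global",
      "softMemoryLimit": "80%",
      "hardConcurrencyLimit": 20,
      "maxQueued": 200,
      "schedulingPolicy": "weighted_fair",
      "jmxExport": true,
      "subGroups": [
        {
          "name": "etl",
          "hardConcurrencyLimit": 15,
          "maxQueued": 100,
          "schedulingWeight": 3
        },
        {
          "name": "adhoc",
          "hardConcurrencyLimit": 5,
          "maxQueued": 100,
          "schedulingWeight": 1
        }
      ]
    }
  ]
}
```

Under the ratio described for weighted_fair, if etl has 3 queries and adhoc has 2, the ratios are 3/3 = 1 and 2/1 = 2, so etl obtains resources first.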

Selector Rules

Selectors are evaluated in sequence, and the first selector that matches a query determines its resource group. You are advised to configure a default resource group. If no default resource group is configured and a query matches none of the selectors, the query is rejected. For details about how to set selector rule parameters, see Table 3.

Table 3 Selector rules

| Configuration Item | Mandatory | Description |
| --- | --- | --- |
| user | No | Regular expression for matching the user name. |
| source | No | Data source to match, such as JDBC, HBase, and Hive. For details, see the value of --source in Configuration of Selector Attributes. |
| queryType | No | Query type to match. DATA_DEFINITION: queries that modify, create, or delete the metadata of schemas, tables, and views, and queries that manage prepared statements, permissions, sessions, and transactions. DELETE: DELETE queries. DESCRIBE: DESCRIBE, DESCRIBE INPUT, DESCRIBE OUTPUT, and SHOW queries. EXPLAIN: EXPLAIN queries. INSERT: INSERT and CREATE TABLE AS queries. SELECT: SELECT queries. |
| clientTags | No | Client tags to match. Each tag must be in the tag list of the task submitted by the user. For details, see the value of --client-tags in Configuration of Selector Attributes. |
| group | Yes | Resource group in which the matched queries run. |
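
As an illustration of routing by query type with a default group, consider the following sketch. The group names (global.explain, global.batch, global.adhoc, global.default) are hypothetical and would have to exist in the resource group definition.

```json
{
  "selectors": [
    {
      "queryType": "EXPLAIN",
      "group": "global.explain"
    },
    {
      "queryType": "INSERT",
      "group": "global.batch"
    },
    {
      "queryType": "SELECT",
      "group": "global.adhoc"
    },
    {
      "group": "global.default"
    }
  ]
}
```

The last selector has no conditions, so it acts as the default resource group for queries that match none of the earlier selectors. Because the first match wins, it must be placed last.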

Global Attributes

For details about how to configure global attributes, see Table 4.

Table 4 Global attributes

| Configuration Item | Mandatory | Description |
| --- | --- | --- |
| cpuQuotaPeriod | No | Time range during which the CPU quota takes effect. This parameter is used together with softCpuLimit and hardCpuLimit in Resource Group Properties. |
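
The relationship between cpuQuotaPeriod and the per-group CPU limits might look like the following sketch. The rootGroups key, the group name, and the duration strings (30m, 1h) are assumptions about the accepted value format rather than values taken from this document.

```json
{
  "rootGroups": [
    {
      "name": "global",
      "softMemoryLimit": "80%",
      "hardConcurrencyLimit": 50,
      "maxQueued": 500,
      "softCpuLimit": "30m",
      "hardCpuLimit": "1h"
    }
  ],
  "cpuQuotaPeriod": "1h"
}
```

In this sketch, the group may consume up to 30 minutes of CPU time per one-hour period before the soft limit applies, and at most one hour of CPU time per period.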

Configuration of Selector Attributes

The data source name (source) can be set as follows:

  • CLI: Use the --source option.

The client tag (clientTags) can be configured as follows:

  • CLI: Use the --client-tags option.

Configuration Example

Figure 1 Configuration example

As shown in Figure 1, the example configuration contains multiple resource groups, some of which are templates. HetuEngine administrators can use templates to build a resource group tree dynamically. For example, in the pipeline_${USER} group, ${USER} is replaced with the name of the user who submits the query. ${SOURCE} is also supported and is replaced with the source from which the query is submitted. You can also use custom named variables in the source and user regular expressions.
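
A template group such as pipeline_${USER} might be declared as in the following sketch. The rootGroups key, the nesting under global and pipeline, and all limit values are illustrative assumptions, not values taken from Figure 1.

```json
{
  "rootGroups": [
    {
      "name": "global",
      "softMemoryLimit": "80%",
      "hardConcurrencyLimit": 100,
      "maxQueued": 1000,
      "subGroups": [
        {
          "name": "pipeline",
          "softMemoryLimit": "50%",
          "hardConcurrencyLimit": 45,
          "maxQueued": 100,
          "subGroups": [
            {
              "name": "pipeline_${USER}",
              "softMemoryLimit": "30%",
              "hardConcurrencyLimit": 5,
              "maxQueued": 100
            }
          ]
        }
      ]
    }
  ]
}
```

With a selector whose group is global.pipeline.pipeline_${USER}, a query submitted by a hypothetical user alice would run in global.pipeline.pipeline_alice.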

The following is an example of a resource group selector:

"selectors": [{
	"user": "bob",
	"group": "admin"
},
{
	"source": ".*pipeline.*",
	"queryType": "DATA_DEFINITION",
	"group": "global.data_definition"
},
{
	"source": ".*pipeline.*",
	"group": "global.pipeline.pipeline_${USER}"
},
{
	"source": "jdbc#(?<toolname>.*)",
	"clientTags": ["hipri"],
	"group": "global.adhoc.bi-${toolname}.${USER}"
},
{
	"group": "global.adhoc.other.${USER}"
}]

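There are five selectors that define which resource group runs a query. They are evaluated in order:

  • The first selector places queries submitted by user bob in the admin group.
  • The second selector places DATA_DEFINITION queries whose source name contains pipeline in the global.data_definition group.
  • The third selector places the remaining queries whose source name contains pipeline in a per-user group, pipeline_${USER}, under global.pipeline.
  • The fourth selector matches queries whose source matches the regular expression jdbc#(?<toolname>.*) and whose client tags include hipri, and places them in a per-user group under global.adhoc.bi-${toolname}.
  • The last selector has no conditions and places all other queries in a per-user group under global.adhoc.other.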

These selectors work together to implement the following policies:

The description of the query match selector is as follows: