Optimizing Statement Pushdown

Statement Pushdown

Currently, the GaussDB(DWS) optimizer can use three methods to develop statement execution policies in the distributed framework: generating a statement pushdown plan, a distributed execution plan, or a distributed execution plan for sending statements.

The third policy sends many intermediate results from the DNs to a CN for further execution. In this case, the CN performance bottleneck (in bandwidth, storage, and computing) is caused by statements that cannot be pushed down to DNs. Therefore, you are not advised to use the query statements that only the third policy is applicable to.

Statements cannot be pushed down to DNs if they have Functions That Do Not Support Pushdown or Syntax That Does Not Support Pushdown. Generally, you can rewrite the execution statements to solve the problem.

Viewing Whether the Execution Plan Has Been Pushed Down to DNs

Perform the following procedure to quickly determine whether the execution plan can be pushed down to DNs:

  1. Set the GUC parameter enable_fast_query_shipping to off to use the distributed framework policy for the query optimizer.

    1
    SET enable_fast_query_shipping = off;
    

  2. View the execution plan.

    If the execution plan contains Data Node Scan, the SQL statements cannot be pushed down to DNs. If the execution plan contains Streaming, the SQL statements can be pushed down to DNs.

    For example:

    1
    2
    3
    4
    5
    select
    count(ss.ss_sold_date_sk order by ss.ss_sold_date_sk)c1 
    from store_sales ss, store_returns sr 
    where 
    sr.sr_customer_sk = ss.ss_customer_sk;
    

    The execution plan is as follows, which indicates that the SQL statement cannot be pushed down.

                                  QUERY PLAN
    --------------------------------------------------------------------------
    Aggregate
    ->  Hash Join
    Hash Cond: (ss.ss_customer_sk = sr.sr_customer_sk)
    ->  Data Node Scan on store_sales "_REMOTE_TABLE_QUERY_"
    Node/s: All datanodes
    ->  Hash
    ->  Data Node Scan on store_returns "_REMOTE_TABLE_QUERY_"
    Node/s: All datanodes
    (8 rows)

Syntax That Does Not Support Pushdown

SQL syntax that does not support pushdown is described using the following table definition examples:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
CREATE TABLE CUSTOMER1
(
    C_CUSTKEY     BIGINT NOT NULL
  , C_NAME        VARCHAR(25) NOT NULL
  , C_ADDRESS     VARCHAR(40) NOT NULL
  , C_NATIONKEY   INT NOT NULL
  , C_PHONE       CHAR(15) NOT NULL
  , C_ACCTBAL     DECIMAL(15,2)   NOT NULL
  , C_MKTSEGMENT  CHAR(10) NOT NULL
  , C_COMMENT     VARCHAR(117) NOT NULL
)
DISTRIBUTE BY hash(C_CUSTKEY);
CREATE TABLE test_stream(a int, b float);--float does not support redistribution.
CREATE TABLE sal_emp ( c1 integer[] ) DISTRIBUTE BY replication;

Functions That Do Not Support Pushdown

This module describes the variability of functions. The function variability in GaussDB(DWS) is as follows:

The volatility of a function can be obtained by querying its provolatile column in pg_proc. The value i indicates immutable, s indicates stable, and v indicates volatile. The valid values of the proshippable column in pg_proc are t, f, and NULL. This column and the provolatile column together describe whether a function is pushed down.

For a UDF, you can specify the values of provolatile and proshippable during its creation. For details, see CREATE FUNCTION.

In scenarios where a function does not support pushdown, perform one of the following as required:

Example: UDF

Define a user-defined function that generates fixed output for a certain input as the immutable type.

Take the sales information of TPCDS as an example. If you want to write a function to calculate the discount data of a product, you can define the function as follows:

1
2
3
4
CREATE FUNCTION func_percent_2 (NUMERIC, NUMERIC) RETURNS NUMERIC
AS 'SELECT $1 / $2 WHERE $2 > 0.01'
LANGUAGE SQL
VOLATILE;

Run the following statement:

1
2
SELECT func_percent_2(ss_sales_price, ss_list_price)
FROM store_sales;

The execution plan is as follows:

func_percent_2 is not pushed down, and ss_sales_price and ss_list_price are executed on a CN. In this case, a large amount of resources on the CN is consumed, and the performance deteriorates as a result.

In this example, the function returns certain output when certain input is entered. Therefore, we can modify the function to the following one:

1
2
3
4
CREATE FUNCTION func_percent_1 (NUMERIC, NUMERIC) RETURNS NUMERIC
AS 'SELECT $1 / $2 WHERE $2 > 0.01'
LANGUAGE SQL
IMMUTABLE;

Run the following statement:

1
2
SELECT func_percent_1(ss_sales_price, ss_list_price)
FROM store_sales;

The execution plan is as follows:

func_percent_1 is pushed down to DNs for quicker execution. (In TPCDS 1000X, where three CNs and 18 DNs are used, the query efficiency is improved by over 100 times).

Example 2: Pushing Down the Sorting Operation

For details, see Case: Pushing Down Sort Operations to DNs.