This section takes the statements in TPC-DS (Q24) as an example to describe how to optimize an execution plan by using hints in 1000X+24DN environments. For example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 | select avg(netpaid) from (select c_last_name ,c_first_name ,s_store_name ,ca_state ,s_state ,i_color ,i_current_price ,i_manager_id ,i_units ,i_size ,sum(ss_sales_price) netpaid from store_sales ,store_returns ,store ,item ,customer ,customer_address where ss_ticket_number = sr_ticket_number and ss_item_sk = sr_item_sk and ss_customer_sk = c_customer_sk and ss_item_sk = i_item_sk and ss_store_sk = s_store_sk and c_birth_country = upper(ca_country) and s_zip = ca_zip and s_market_id=7 group by c_last_name ,c_first_name ,s_store_name ,ca_state ,s_state ,i_color ,i_current_price ,i_manager_id ,i_units ,i_size); |
In this plan, the performance of the layer-10 broadcast is poor because the estimation result generated at layer 11 is 2140 rows, which is much less than the actual number of rows. The inaccurate estimation is mainly caused by the underestimated number of rows in layer-13 hash join. In this layer, store_sales and store_returns are joined (based on the ss_ticket_number and ss_item_sk columns in store_sales and the sr_ticket_number and sr_item_sk columns in store_returns) but the multi-column correlation is not considered.
1 2 | select avg(netpaid) from (select /*+rows(store_sales store_returns * 11270)*/ c_last_name ... |
The execution takes a longer time because layer-9 redistribute is slow. Considering that data skew does not occur at layer-9 redistribute, the slow redistribution is caused by the slow layer-8 hashjoin due to data skew at layer-18 redistribute.
1 2 3 4 | select avg(netpaid) from (select /*+rows(store_sales store_returns *11270) leading((store_sales store_returns store item customer) customer_address)*/ c_last_name ... |
Most of the time is spent on layer-6 redistribute. The plan needs to be further optimized.
1 2 3 4 | select avg(netpaid) from (select /*+rows(store_sales store_returns *11270) leading((customer_address (store_sales store_returns store customer) item)) c_last_name ... |
Data skew occurs after the join of item and customer_address because item is broadcasted at layer-22. As a result, layer-6 redistribute is still slow.
1 2 3 4 5 | select avg(netpaid) from (select /*+rows(store_sales store_returns *11270) leading((customer_address (store_sales store_returns store customer) item)) no broadcast(item)*/ c_last_name ... |
If the query performance deteriorates due to statistics changes, you can use hints to optimize the query plan. Take TPCH-Q17 as an example. The query performance deteriorates after the value of default_statistics_target is changed from the default one to –2 for statistics collection.