In an Oracle to PostgreSQL migration project, one of the most common SQL performance issues arises when we migrate a scalar subquery in the SELECT list. There are two advantages to using a scalar subquery in Oracle:
1) With its scalar subquery caching mechanism, Oracle avoids repetitive execution of the scalar subquery. Instead of re-evaluating the subquery for each row, Oracle caches the result and reuses the cached value for subsequent rows that pass in the same correlation value.
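As an illustration (using a hypothetical EMP/DEPT schema, not tables from this article): in the query below, Oracle caches the subquery result per distinct DEPT_NO, so a million employee rows spread over a hundred departments execute the subquery roughly a hundred times, not a million.

```sql
-- Hypothetical example: EMP and DEPT are illustrative tables.
-- Oracle's scalar subquery cache is keyed on the correlation value
-- (E.DEPT_NO), so the subquery runs once per distinct department.
SELECT E.EMP_NM
     , (SELECT D.DEPT_NM
          FROM DEPT D
         WHERE D.DEPT_NO = E.DEPT_NO) AS DEPT_NM
  FROM EMP E;
```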
2) Oracle has a specific mechanism, which Koreans call "partial range processing", that initially processes only up to the transport unit (the array size), rather than the entire range that satisfies the WHERE clause. When the transport unit (15 rows by default) is filled, Oracle sends the intermediate result set to the client. The client can therefore receive the first 15 rows quickly while the Oracle database keeps consuming resources to retrieve the remaining rows. To exploit this feature, application developers use scalar subqueries or induce a nested loop join even when a hash join might be more performant; a hash join cannot use this feature.
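PostgreSQL has no direct equivalent, but the closest idiom for "give me the first rows fast" is an explicit LIMIT/FETCH FIRST clause, which can stop early only when the plan produces rows incrementally; a sketch (against the ONLINE_ORDER table created below):

```sql
-- Closest PostgreSQL idiom: ask for the first rows explicitly.
-- This stops early only if the plan streams rows (e.g., an index scan
-- in ORD_NO order); a sort or hash join must still consume all input.
SELECT *
  FROM ONLINE_ORDER
 ORDER BY ORD_NO
 FETCH FIRST 15 ROWS ONLY;
```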
Up to now, I haven't been able to find a solution in PostgreSQL that can replicate these two Oracle features. So when we migrate a scalar subquery, we have to consider the overall query performance rather than focusing on how quickly we retrieve the first 15 rows or similar optimizations.
Recently, I was asked to tune a query that was blazingly fast in Oracle but much slower in PostgreSQL. Due to the architectural differences, I couldn't achieve the same level of performance as in Oracle, but I managed to improve the query's performance to some extent.
Here is the SQL to create the data, and a version of the query that was migrated from Oracle 12.2:
create table online_order (
    ord_no        bigint       not null,
    cust_no       int          not null,
    ord_date      timestamp(0) not null,
    ord_dt        varchar(8)   not null,
    ord_status_cd varchar(1)   not null,
    comment       varchar(100)
);

insert into online_order
select i
     , mod(i,1000000) as cust_no
     , current_date - mod(i,1000) as ord_date
     , to_char((current_date - mod(i,1000)),'yyyymmdd') as ord_dt
     , (mod(i,4) + 1) as ord_status_cd
     , lpad('x',100,'x')
  from generate_series(1,2000000,2) a(i);

alter table online_order add constraint online_order_pk primary key (ord_no);
create index online_order_x02 on online_order(cust_no);
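The original script omits the DDL for the CUSTOMER table that the next INSERT populates. The following definition is my reconstruction: the column names come from the INSERT and the customer_pk index referenced in the execution plans, while the data types are assumptions.

```sql
-- Assumed DDL: column names inferred from the INSERT below,
-- data types are guesses consistent with the inserted values.
create table customer (
    cust_no             int          not null,
    cust_nm             varchar(50),
    register_date       timestamp(0),
    register_dt         varchar(8),
    cust_status_cd      varchar(1),
    register_channel_cd varchar(1),
    age                 int,
    active_yn           varchar(1),
    sigungu_cd          varchar(5),
    sido_cd             varchar(2)
);
alter table customer add constraint customer_pk primary key (cust_no);
```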
insert into customer
select i
     , chr(65+mod(i,26))||i::text||'CUST_NM'
     , current_date - mod(i,10000)
     , to_char((current_date - mod(i,10000)),'yyyymmdd') as register_dt
     , mod(i,5)+1 as cust_status_cd
     , mod(i,2)+1 as register_channel_cd
     , trunc(random() * 100) + 1 as age
     , case when mod(i,22) = 0 then 'N' else 'Y' end as active_yn
     , case when mod(i,1000) = 0 then '11007'
            when mod(i,1000) = 1 then '11006'
            when mod(i,1000) = 2 then '11005'
            when mod(i,1000) = 3 then '11004'
            when mod(i,1000) = 4 then '11003'
            when mod(i,1000) = 5 then '11002'
            else '11001'
       end as sigungu_cd
     , case when mod(i,4) = 0 then '01'
            when mod(i,4) = 1 then '02'
            else '03'
       end as sido_cd
  from generate_series(1,1000000) a(i);
SELECT A.CUST_NM
, (SELECT ORD_DATE
FROM ONLINE_ORDER
WHERE CUST_NO = A.CUST_NO
ORDER BY ORD_DATE DESC
FETCH NEXT 1 ROWS ONLY) AS MAX_ORD_DATE
, (SELECT ORD_STATUS_CD
FROM ONLINE_ORDER
WHERE CUST_NO = A.CUST_NO
ORDER BY ORD_DATE DESC
FETCH NEXT 1 ROWS ONLY) AS MAX_ORD_STATUS_CD
FROM CUSTOMER A WHERE A.CUST_NO BETWEEN 1 AND 5000;
This query retrieves customer names along with the most recent order date and order status code for each customer whose CUST_NO falls between 1 and 5000. It employs two scalar subqueries to find the most recent order information for each customer individually.
Now take a look at the execution plan for the query, obtained by running the EXPLAIN command:
What do you think is suboptimal in the plan? The obvious issue is that we access ONLINE_ORDER twice. Let's rewrite the query to access the ONLINE_ORDER table only once. I have used an ARRAY constructor to combine the two subqueries:
SELECT CUST_NM
     , AA[1] AS ord_date
     , AA[2] AS ORD_STATUS_CD
  FROM ( SELECT A.CUST_NM
              , (SELECT ARRAY[ORD_DATE::text, ORD_STATUS_CD]
                   FROM ONLINE_ORDER
                  WHERE CUST_NO = A.CUST_NO
                  ORDER BY ORD_DATE DESC
                  FETCH NEXT 1 ROWS ONLY) AS AA
           FROM CUSTOMER A
          WHERE A.CUST_NO BETWEEN 1 AND 5000
         OFFSET 0 ) FOO;
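A side benefit of the ARRAY constructor over plain || concatenation is NULL handling; a quick sanity check:

```sql
SELECT ARRAY[NULL::text, 'A'];   -- {NULL,A}: the non-null element survives
SELECT NULL::text || 'A';        -- NULL: || propagates any null operand
```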
The ORD_DATE column must be cast to text (using the ::text syntax) because all elements of an array share a single data type. Also keep in mind that if you combine the values with the "||" concatenation operator instead, none of the values may be null: if either value is null, the concatenated result becomes null. Here is the execution plan from PostgreSQL 15.1:
Subquery Scan on foo (actual time=0.052..86.908 rows=5000 loops=1)
  Buffers: shared hit=115072
  ->  Index Scan using customer_pk on customer a (actual time=0.050..85.758 rows=5000 loops=1)
        Index Cond: ((cust_no >= 1) AND (cust_no <= 5000))
        Buffers: shared hit=115072
        SubPlan 1
          ->  Limit (actual time=0.017..0.017 rows=1 loops=5000)
                Buffers: shared hit=115000
                ->  Sort (actual time=0.016..0.016 rows=1 loops=5000)
                      Sort Key: online_order.ord_date DESC
                      Sort Method: top-N heapsort  Memory: 25kB
                      Buffers: shared hit=115000
                      ->  Bitmap Heap Scan on online_order (actual time=0.003..0.014 rows=20 loops=5000)
                            Recheck Cond: (cust_no = a.cust_no)
                            Heap Blocks: exact=100000
                            Buffers: shared hit=115000
                            ->  Bitmap Index Scan on online_order_x02 (actual time=0.002..0.002 rows=20 loops=5000)
                                  Index Cond: (cust_no = a.cust_no)
                                  Buffers: shared hit=15000
Planning:
  Buffers: shared hit=8
Planning Time: 0.136 ms
Execution Time: 87.237 ms
Note that the block I/O decreased from 230072 to 115072, and the elapsed time dropped from 128 ms to 87 ms. One flaw in this query is that it requires converting ORD_DATE to text in order to use the ARRAY constructor. To eliminate the need for the data type conversion, I modified the query to use a CTID.
SELECT C.CUST_NM
     , D.ORD_DATE
     , D.ORD_STATUS_CD
  FROM ( SELECT A.CUST_NM
              , (SELECT CTID
                   FROM ONLINE_ORDER
                  WHERE CUST_NO = A.CUST_NO
                  ORDER BY ORD_DATE DESC
                  FETCH NEXT 1 ROWS ONLY) AS RID
           FROM CUSTOMER A
          WHERE A.CUST_NO BETWEEN 1 AND 5000 ) C
     , ONLINE_ORDER D
 WHERE C.RID = D.CTID;
The rewritten query extracts the CTID of the most recent ONLINE_ORDER row in the subquery and then joins back to the ONLINE_ORDER table to retrieve the ORD_DATE and ORD_STATUS_CD values. This approach removes the need for the data type conversion and makes the query more efficient.
Here is the execution plan obtained:
Nested Loop (actual time=0.048..67.956 rows=5000 loops=1)
  Buffers: shared hit=120072
  ->  Index Scan using customer_pk on customer a (actual time=0.010..0.913 rows=5000 loops=1)
        Index Cond: ((cust_no >= 1) AND (cust_no <= 5000))
        Buffers: shared hit=72
  ->  Tid Scan on online_order d (actual time=0.013..0.013 rows=1 loops=5000)
        TID Cond: (ctid = (SubPlan 1))
        Buffers: shared hit=120000
        SubPlan 1
          ->  Limit (actual time=0.012..0.012 rows=1 loops=5000)
                Buffers: shared hit=115000
                ->  Sort (actual time=0.012..0.012 rows=1 loops=5000)
                      Sort Key: online_order.ord_date DESC
                      Sort Method: top-N heapsort  Memory: 25kB
                      Buffers: shared hit=115000
                      ->  Bitmap Heap Scan on online_order (actual time=0.003..0.010 rows=20 loops=5000)
                            Recheck Cond: (cust_no = a.cust_no)
                            Heap Blocks: exact=100000
                            Buffers: shared hit=115000
                            ->  Bitmap Index Scan on online_order_x02 (actual time=0.001..0.001 rows=20 loops=5000)
                                  Index Cond: (cust_no = a.cust_no)
                                  Buffers: shared hit=15000
Planning:
  Buffers: shared hit=8
Planning Time: 0.181 ms
Execution Time: 68.256 ms
The number of block I/O operations increased from 115072 to 120072 because we now access ONLINE_ORDER twice. However, the elapsed time decreased from 87 ms to 68 ms.
Finally, let's transform the scalar subquery into a LATERAL left join:

SELECT A.CUST_NM
     , B.ORD_DATE
     , B.ORD_STATUS_CD
  FROM CUSTOMER A
  LEFT JOIN LATERAL
       (SELECT ORD_DATE
             , ORD_STATUS_CD
          FROM ONLINE_ORDER B
         WHERE A.CUST_NO = B.CUST_NO
         ORDER BY B.ORD_DATE DESC
         FETCH NEXT 1 ROWS ONLY) B ON TRUE
 WHERE A.CUST_NO BETWEEN 1 AND 5000;
Nested Loop Left Join (actual time=0.043..67.475 rows=5000 loops=1)
  Buffers: shared hit=115072
  ->  Index Scan using customer_pk on customer a (actual time=0.010..0.924 rows=5000 loops=1)
        Index Cond: ((cust_no >= 1) AND (cust_no <= 5000))
        Buffers: shared hit=72
  ->  Limit (actual time=0.013..0.013 rows=1 loops=5000)
        Buffers: shared hit=115000
        ->  Sort (actual time=0.013..0.013 rows=1 loops=5000)
              Sort Key: b.ord_date DESC
              Sort Method: top-N heapsort  Memory: 25kB
              Buffers: shared hit=115000
              ->  Bitmap Heap Scan on online_order b (actual time=0.003..0.010 rows=20 loops=5000)
                    Recheck Cond: (a.cust_no = cust_no)
                    Heap Blocks: exact=100000
                    Buffers: shared hit=115000
                    ->  Bitmap Index Scan on online_order_x02 (actual time=0.001..0.001 rows=20 loops=5000)
                          Index Cond: (cust_no = a.cust_no)
                          Buffers: shared hit=15000
Planning:
  Buffers: shared hit=8
Planning Time: 0.156 ms
Execution Time: 67.785 ms
Now the block I/O is the same as in the query using the ARRAY constructor, but the elapsed time decreased from 87 ms to 67 ms. In this specific case, the left join strategy proves to be the best choice.
Conclusion

If you see scalar subqueries in your SELECT statements, it is worth trying a left join rewrite, as it may offer some performance benefits.
FOOTNOTE
Given that PostgreSQL lacks a partial range processing feature and a scalar subquery caching mechanism, you might encounter performance degradation after migration. One of the methods to mitigate the issue is to transform the subquery in the SELECT list into a left join. Some developers using PostgreSQL argue that they cannot find a compelling reason to use a scalar subquery in the SELECT list. However, there are situations in which a scalar subquery is more efficient than a left outer join in PostgreSQL. I will present a scenario in which a scalar subquery is the better choice in a future article.