In an Oracle to PostgreSQL migration project, one of the most common SQL performance issues arises when we migrate a scalar subquery in the SELECT list. There are two advantages to using a scalar subquery in Oracle:
1) With its scalar subquery caching mechanism, Oracle avoids repetitive execution of the scalar subquery. Instead of re-evaluating the subquery for each row, Oracle caches the result and reuses the cached value for subsequent rows that pass in the same correlation value.
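As an illustration (using a hypothetical EMP/DEPT schema, not tables from this article): in the query below, Oracle caches the subquery result per distinct DEPT_NO, so a million employee rows spread over a hundred departments execute the subquery roughly a hundred times, not a million.

```sql
-- Hypothetical example: EMP and DEPT are illustrative tables.
-- Oracle's scalar subquery cache is keyed on the correlation value
-- (E.DEPT_NO), so the subquery runs once per distinct department.
SELECT E.EMP_NM
     , (SELECT D.DEPT_NM
          FROM DEPT D
         WHERE D.DEPT_NO = E.DEPT_NO) AS DEPT_NM
  FROM EMP E;
```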
2) Oracle has a specific mechanism, which Koreans call "partial range processing", that initially processes only up to the transport unit (the array size), rather than the entire range that satisfies the WHERE clause. When the transport unit (15 rows by default) is filled, Oracle sends the intermediate result set to the client. The client can therefore receive the first 15 rows quickly while the Oracle database keeps consuming resources to retrieve the remaining rows. To exploit this feature, application developers use scalar subqueries or induce a nested loop join even when a hash join might be more performant; a hash join cannot use this feature.
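PostgreSQL has no direct equivalent, but the closest idiom for "give me the first rows fast" is an explicit LIMIT/FETCH FIRST clause, which can stop early only when the plan produces rows incrementally; a sketch (against the ONLINE_ORDER table created below):

```sql
-- Closest PostgreSQL idiom: ask for the first rows explicitly.
-- This stops early only if the plan streams rows (e.g., an index scan
-- in ORD_NO order); a sort or hash join must still consume all input.
SELECT *
  FROM ONLINE_ORDER
 ORDER BY ORD_NO
 FETCH FIRST 15 ROWS ONLY;
```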
Up to now, I haven't been able to find a solution in PostgreSQL that can replicate these two Oracle features. So when we migrate a scalar subquery, we have to consider the overall query performance rather than focusing on how quickly we retrieve the first 15 rows or similar optimizations.
Recently, I was asked to tune a query that was blazingly fast in Oracle but much slower in PostgreSQL. Due to the architectural differences, I couldn't achieve the same level of performance as in Oracle, but I managed to improve the query's performance to some extent.
Here is the SQL to create the data, and a version of the query that was migrated from Oracle 12.2:
create table online_order (
    ord_no        bigint       not null,
    cust_no       int          not null,
    ord_date      timestamp(0) not null,
    ord_dt        varchar(8)   not null,
    ord_status_cd varchar(1)   not null,
    comment       varchar(100)
);

insert into online_order
select i
     , mod(i,1000000) as cust_no
     , current_date - mod(i,1000) as ord_date
     , to_char((current_date - mod(i,1000)),'yyyymmdd') as ord_dt
     , (mod(i,4) + 1) as ord_status_cd
     , lpad('x',100,'x')
  from generate_series(1,2000000,2) a(i);

alter table online_order add constraint online_order_pk primary key (ord_no);
create index online_order_x02 on online_order(cust_no);
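The original script omits the DDL for the CUSTOMER table that the next INSERT populates. The following definition is my reconstruction: the column names come from the INSERT and the customer_pk index referenced in the execution plans, while the data types are assumptions.

```sql
-- Assumed DDL: column names inferred from the INSERT below,
-- data types are guesses consistent with the inserted values.
create table customer (
    cust_no             int          not null,
    cust_nm             varchar(50),
    register_date       timestamp(0),
    register_dt         varchar(8),
    cust_status_cd      varchar(1),
    register_channel_cd varchar(1),
    age                 int,
    active_yn           varchar(1),
    sigungu_cd          varchar(5),
    sido_cd             varchar(2)
);
alter table customer add constraint customer_pk primary key (cust_no);
```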
insert into customer
select i
     , chr(65+mod(i,26))||i::text||'CUST_NM'
     , current_date - mod(i,10000)
     , to_char((current_date - mod(i,10000)),'yyyymmdd') as register_dt
     , mod(i,5)+1 as cust_status_cd
     , mod(i,2)+1 as register_channel_cd
     , trunc(random() * 100) + 1 as age
     , case when mod(i,22) = 0 then 'N' else 'Y' end as active_yn
     , case when mod(i,1000) = 0 then '11007'
            when mod(i,1000) = 1 then '11006'
            when mod(i,1000) = 2 then '11005'
            when mod(i,1000) = 3 then '11004'
            when mod(i,1000) = 4 then '11003'
            when mod(i,1000) = 5 then '11002'
            else '11001'
       end as sigungu_cd
     , case when mod(i,4) = 0 then '01'
            when mod(i,4) = 1 then '02'
            else '03'
       end as sido_cd
  from generate_series(1,1000000) a(i);
SELECT A.CUST_NM
, (SELECT ORD_DATE
FROM ONLINE_ORDER
WHERE CUST_NO = A.CUST_NO
ORDER BY ORD_DATE DESC
FETCH NEXT 1 ROWS ONLY) AS MAX_ORD_DATE
, (SELECT ORD_STATUS_CD
FROM ONLINE_ORDER
WHERE CUST_NO = A.CUST_NO
ORDER BY ORD_DATE DESC
FETCH NEXT 1 ROWS ONLY) AS MAX_ORD_STATUS_CD
FROM CUSTOMER A WHERE A.CUST_NO BETWEEN 1 AND 5000;
This query retrieves customer names along with the most recent order date and order status code for each customer whose CUST_NO falls between 1 and 5000. It employs two scalar subqueries to find the most recent order information for each customer individually.
Now take a look at the execution plan for the query, obtained by running the EXPLAIN command:
What do you think is suboptimal in the plan? The obvious issue is that we access ONLINE_ORDER twice. Let's rewrite the query to access the ONLINE_ORDER table only once. I have used an ARRAY constructor to combine the two subqueries:
SELECT CUST_NM
     , AA[1] AS ord_date
     , AA[2] AS ORD_STATUS_CD
  FROM ( SELECT A.CUST_NM
              , (SELECT ARRAY[ORD_DATE::text, ORD_STATUS_CD]
                   FROM ONLINE_ORDER
                  WHERE CUST_NO = A.CUST_NO
                  ORDER BY ORD_DATE DESC
                  FETCH NEXT 1 ROWS ONLY) AS AA
           FROM CUSTOMER A
          WHERE A.CUST_NO BETWEEN 1 AND 5000
         OFFSET 0 ) FOO;
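A side benefit of the ARRAY constructor over plain || concatenation is NULL handling; a quick sanity check:

```sql
SELECT ARRAY[NULL::text, 'A'];   -- {NULL,A}: the non-null element survives
SELECT NULL::text || 'A';        -- NULL: || propagates any null operand
```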
The ORD_DATE column must be cast to text (using the ::text syntax) because all elements of an array share a single data type. Also keep in mind that if you combine the values with the "||" concatenation operator instead, none of the values may be null: if either value is null, the concatenated result becomes null. Here is the execution plan from PostgreSQL 15.1:
Subquery Scan on foo (actual time=0.052..86.908 rows=5000 loops=1)
  Buffers: shared hit=115072
  ->  Index Scan using customer_pk on customer a (actual time=0.050..85.758 rows=5000 loops=1)
        Index Cond: ((cust_no >= 1) AND (cust_no <= 5000))
        Buffers: shared hit=115072
        SubPlan 1
          ->  Limit (actual time=0.017..0.017 rows=1 loops=5000)
                Buffers: shared hit=115000
                ->  Sort (actual time=0.016..0.016 rows=1 loops=5000)
                      Sort Key: online_order.ord_date DESC
                      Sort Method: top-N heapsort  Memory: 25kB
                      Buffers: shared hit=115000
                      ->  Bitmap Heap Scan on online_order (actual time=0.003..0.014 rows=20 loops=5000)
                            Recheck Cond: (cust_no = a.cust_no)
                            Heap Blocks: exact=100000
                            Buffers: shared hit=115000
                            ->  Bitmap Index Scan on online_order_x02 (actual time=0.002..0.002 rows=20 loops=5000)
                                  Index Cond: (cust_no = a.cust_no)
                                  Buffers: shared hit=15000
Planning:
  Buffers: shared hit=8
Planning Time: 0.136 ms
Execution Time: 87.237 ms
Note that the block I/O decreased from 230072 to 115072, and the elapsed time dropped from 128 ms to 87 ms. One flaw in this query is that it requires converting ORD_DATE to text in order to use the ARRAY constructor. To eliminate the need for the data type conversion, I modified the query to use a CTID.
SELECT C.CUST_NM
     , D.ORD_DATE
     , D.ORD_STATUS_CD
  FROM ( SELECT A.CUST_NM
              , (SELECT CTID
                   FROM ONLINE_ORDER
                  WHERE CUST_NO = A.CUST_NO
                  ORDER BY ORD_DATE DESC
                  FETCH NEXT 1 ROWS ONLY) AS RID
           FROM CUSTOMER A
          WHERE A.CUST_NO BETWEEN 1 AND 5000 ) C
     , ONLINE_ORDER D
 WHERE C.RID = D.CTID;
The rewritten query extracts the CTID of the most recent ONLINE_ORDER row in the subquery and then joins back to the ONLINE_ORDER table to retrieve the ORD_DATE and ORD_STATUS_CD values. This approach removes the need for the data type conversion and makes the query more efficient.
Here is the execution plan obtained:
Nested Loop (actual time=0.048..67.956 rows=5000 loops=1)
  Buffers: shared hit=120072
  ->  Index Scan using customer_pk on customer a (actual time=0.010..0.913 rows=5000 loops=1)
        Index Cond: ((cust_no >= 1) AND (cust_no <= 5000))
        Buffers: shared hit=72
  ->  Tid Scan on online_order d (actual time=0.013..0.013 rows=1 loops=5000)
        TID Cond: (ctid = (SubPlan 1))
        Buffers: shared hit=120000
        SubPlan 1
          ->  Limit (actual time=0.012..0.012 rows=1 loops=5000)
                Buffers: shared hit=115000
                ->  Sort (actual time=0.012..0.012 rows=1 loops=5000)
                      Sort Key: online_order.ord_date DESC
                      Sort Method: top-N heapsort  Memory: 25kB
                      Buffers: shared hit=115000
                      ->  Bitmap Heap Scan on online_order (actual time=0.003..0.010 rows=20 loops=5000)
                            Recheck Cond: (cust_no = a.cust_no)
                            Heap Blocks: exact=100000
                            Buffers: shared hit=115000
                            ->  Bitmap Index Scan on online_order_x02 (actual time=0.001..0.001 rows=20 loops=5000)
                                  Index Cond: (cust_no = a.cust_no)
                                  Buffers: shared hit=15000
Planning:
  Buffers: shared hit=8
Planning Time: 0.181 ms
Execution Time: 68.256 ms
The number of block I/O operations increased from 115072 to 120072 because we now access ONLINE_ORDER twice. However, the elapsed time decreased from 87 ms to 68 ms.
Finally, let's transform the scalar subquery into a LATERAL left join:

SELECT A.CUST_NM
     , B.ORD_DATE
     , B.ORD_STATUS_CD
  FROM CUSTOMER A
  LEFT JOIN LATERAL
       (SELECT ORD_DATE
             , ORD_STATUS_CD
          FROM ONLINE_ORDER B
         WHERE A.CUST_NO = B.CUST_NO
         ORDER BY B.ORD_DATE DESC
         FETCH NEXT 1 ROWS ONLY) B ON TRUE
 WHERE A.CUST_NO BETWEEN 1 AND 5000;
Nested Loop Left Join (actual time=0.043..67.475 rows=5000 loops=1)
  Buffers: shared hit=115072
  ->  Index Scan using customer_pk on customer a (actual time=0.010..0.924 rows=5000 loops=1)
        Index Cond: ((cust_no >= 1) AND (cust_no <= 5000))
        Buffers: shared hit=72
  ->  Limit (actual time=0.013..0.013 rows=1 loops=5000)
        Buffers: shared hit=115000
        ->  Sort (actual time=0.013..0.013 rows=1 loops=5000)
              Sort Key: b.ord_date DESC
              Sort Method: top-N heapsort  Memory: 25kB
              Buffers: shared hit=115000
              ->  Bitmap Heap Scan on online_order b (actual time=0.003..0.010 rows=20 loops=5000)
                    Recheck Cond: (a.cust_no = cust_no)
                    Heap Blocks: exact=100000
                    Buffers: shared hit=115000
                    ->  Bitmap Index Scan on online_order_x02 (actual time=0.001..0.001 rows=20 loops=5000)
                          Index Cond: (cust_no = a.cust_no)
                          Buffers: shared hit=15000
Planning:
  Buffers: shared hit=8
Planning Time: 0.156 ms
Execution Time: 67.785 ms
Now the block I/O is the same as in the query using the ARRAY constructor, but the elapsed time decreased from 87 ms to 67 ms. In this specific case, the left join strategy proves to be the best choice.
Conclusion

If you see scalar subqueries in your SELECT statements, it is worth trying a left join rewrite, as it may offer some performance benefits.
FOOTNOTE
Given that PostgreSQL lacks a partial range processing feature and a scalar subquery caching mechanism, you might encounter performance degradation after migration. One of the methods to mitigate the issue is to transform the subquery in the SELECT list into a left join. Some developers using PostgreSQL argue that they cannot find a compelling reason to use a scalar subquery in the SELECT list. However, there are situations in which a scalar subquery is more efficient than a left outer join in PostgreSQL. I will present a scenario in which a scalar subquery is the better choice in a future article.