In the last article, Scalar Subquery Optimization, I mentioned that some scalar subqueries perform poorly in PostgreSQL compared with their counterparts in Oracle, which starts sending the intermediate result set to the client as soon as it has retrieved the first 15 rows.
I also showed you how to rewrite the query to avoid the problematic scalar subquery.
Another mechanism that PostgreSQL lacks is scalar subquery caching.
Scalar subquery caching stores the result of a subquery in a hash table; when the same input value arrives again, Oracle fetches the result from that hash table instead of executing the subquery again.
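Conceptually, the cache behaves like a small hash table keyed by the subquery's input value, holding one result per distinct input. Using the emp and dept tables from the demonstration below, the effect can be imitated by hand in SQL: compute the lookup once per distinct input, then let every row probe those precomputed results. This is only an illustration of the idea, not what Oracle actually executes internally.

```sql
-- A manual imitation of scalar subquery caching (illustration only):
-- compute dept_name once per distinct dept_id (200 executions),
-- then let each of the 500,000 employee rows probe the cached results.
with cache as (
    select dept.id, dept.dept_name
    from dept
    where dept.id in (select distinct dept_id from emp)
)
select emp.id, emp.ename, cache.dept_name
from emp
join cache on cache.id = emp.dept_id;
```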
While writing the previous article, I asked myself, "Is there a similar mechanism in PostgreSQL?"
Recently I discovered a new feature in PostgreSQL 14 that works much like Oracle's scalar subquery caching.
This pleasant new feature is controlled by the new parameter enable_memoize.
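On a PostgreSQL 14 instance you can check whether the parameter is active; it is on by default. Toggling it at the session level is handy for before/after plan comparisons:

```sql
show enable_memoize;          -- 'on' by default in PostgreSQL 14
set enable_memoize = off;     -- disable memoization for this session
set enable_memoize = on;      -- re-enable it
```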
Here is a little demonstration of what memoization in PostgreSQL 14 does. It starts with a dept table holding 200 departments and an emp table holding 500,000 employees spread across those departments.
drop table dept cascade;
create table dept as
select i id,
       lpad(i::varchar,100,'k') dept_name,
       rpad('x',100,'y') padding
from generate_series(1,200) a(i);
alter table dept add constraint dept_pk primary key(id);
drop table emp;
create table emp as
select i id,
       1 + mod(i-1,200) dept_id,
       lpad(i::varchar,20,'nname') ename,
       rpad('x',100,'y') padding
from generate_series(1,500000) a(i);
alter table emp add constraint emp_pk primary key (id);
alter table emp add constraint emp_fk_dept foreign key(dept_id) references dept;
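After bulk-loading it is worth running ANALYZE so the planner has fresh statistics; the memoization decisions in PostgreSQL 14 depend in particular on the estimated number of distinct values of the cache key. A quick sanity check also confirms the data distribution:

```sql
analyze dept;
analyze emp;

-- sanity check: 500,000 employees over 200 distinct departments
select count(*) total_rows,
       count(distinct dept_id) distinct_depts
from emp;
```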
The following query performs poorly in PostgreSQL.
I disabled parallel processing to keep the explanation simple and clear.
set max_parallel_workers_per_gather = 0;

select emp.id, emp.ename,
       (select dept.dept_name from dept where dept.id = emp.dept_id) dept_name
from emp
order by emp.ename;
Sort (actual time=673.812..725.878 rows=500000 loops=1)
  Output: emp.id, emp.ename, ((SubPlan 1))
  Sort Key: emp.ename
  Sort Method: external merge  Disk: 66552kB
  Buffers: shared hit=1010205, temp read=8319 written=8321
  ->  Seq Scan on public.emp (actual time=0.065..490.400 rows=500000 loops=1)
        Output: emp.id, emp.ename, (SubPlan 1)
        Buffers: shared hit=1010205
        SubPlan 1
          ->  Index Scan using dept_pk on public.dept (actual time=0.001..0.001 rows=1 loops=500000)
                Output: dept.dept_name
                Index Cond: (dept.id = emp.dept_id)
                Buffers: shared hit=1000000
Planning Time: 0.252 ms
Execution Time: 750.811 ms
If you are not fluent in reading execution plans, here is how to read this one.
1. The Sort operation calls Seq Scan.
2. Seq Scan retrieves 500,000 rows and calls SubPlan 1 with its first row.
3. For each row from Seq Scan, SubPlan 1 calls Index Scan.
4. Index Scan probes the dept_pk index to find a match for emp.dept_id.
5. When Index Scan finds a match, it passes it up to SubPlan 1.
6. SubPlan 1 passes the result to Seq Scan.
7. Seq Scan calls SubPlan 1 again with its second row.
8. Steps 3 to 6 repeat until Seq Scan exhausts all 500,000 rows,
which means that SubPlan 1 is executed 500,000 times.
9. Seq Scan can now pass the result set up to Sort.
10. Finally, Sort sorts the rows by emp.ename.
When you look at the execution plan closely, you can see that the Seq Scan took 490 ms, and you can infer that the Index Scan took about 260 ms (750 - 490). The Index Scan took so long because it had to be performed 500,000 times: a single index access took merely 0.001 ms, but 500,000 repetitions killed the performance. In PostgreSQL there is no way to avoid this inefficiency without rewriting the query. If my memory serves me well, in Oracle the Index Scan would have taken just 1 or 2 ms because it would access the index only 200 times.
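For completeness, the classic way to sidestep the repeated subquery execution is to rewrite the scalar subquery as a join, which lets the planner hash the small dept table once instead of probing its index per row. Whether the planner actually picks a hash join here depends on costs and settings, so treat this as a sketch of the rewrite rather than a guaranteed plan:

```sql
-- Equivalent join rewrite: dept (200 rows) can be hashed once
-- instead of being probed 500,000 times via the scalar subquery.
select emp.id, emp.ename, dept.dept_name
from emp
join dept on dept.id = emp.dept_id
order by emp.ename;
```

Note that a scalar subquery returns NULL when no match exists, while an inner join drops such rows; because of the foreign key from emp to dept, every emp row has a matching dept row, so the two forms return the same result here. Without that guarantee you would need a left join.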
When I looked at the execution plan, I found another inefficiency. The optimizer sorts a result set consisting of id, ename, and dept_name, and dept_name is 100 bytes long, which is large compared with the integer dept_id.
We can safely infer that sorting the narrow data set is more efficient than sorting the wide one, because it decreases the overall work_mem usage. So I revised the query to get the optimizer to sort the narrower data set.
select v1.id, v1.ename,
(select dept.dept_name from dept where dept.id = v1.dept_id) dept_name
from (
select emp.id, emp.ename, emp.dept_id
from emp
order by emp.ename
offset 0 --I added offset 0 to prevent subquery collapse.
) v1;
Subquery Scan on v1 (actual time=207.296..644.353 rows=500000 loops=1)
  Output: v1.id, v1.ename, (SubPlan 1)
  Buffers: shared hit=1010208
  ->  Sort (actual time=207.237..226.089 rows=500000 loops=1)
        Output: emp.id, emp.ename, emp.dept_id
        Sort Key: emp.ename
        Sort Method: quicksort  Memory: 50878kB
        Buffers: shared hit=10208
        ->  Seq Scan on public.emp (actual time=0.005..53.995 rows=500000 loops=1)
              Output: emp.id, emp.ename, emp.dept_id
              Buffers: shared hit=10205
  SubPlan 1
    ->  Index Scan using dept_pk on public.dept (actual time=0.000..0.001 rows=1 loops=500000)
          Output: dept.dept_name
          Index Cond: (dept.id = v1.dept_id)
          Buffers: shared hit=1000000
Planning:
  Buffers: shared hit=75
Planning Time: 0.363 ms
Execution Time: 666.771 ms
--Due to the volume limitation of this bulletin board, I put up the rest of this article in the next post.