When you want to extract a year or month value from a column of a timestamp data type, you shoud use the date_part() function, not the extract() or the to_char() function.
Both functions have the same cost value. But the date_part function always runs faster. I do not know the reason. The extract function is ANSI compliant, while the date_part function is Postgres specific.
Here is a simple data set to start with:
create table online_order (
ord_no bigint not null,
cust_no int not null,
ord_date
timestamp(0) not null,
ord_dt varchar(8) not null,
ord_status_cd varchar(1) not null,
comment varchar(100)
);
insert into online_order
select i, mod(i,1000000) as cust_no
,current_date - mod(i,1000) as ord_date
,to_char((current_date - mod(i,1000)),'yyyymmdd') as ord_dt
,(mod(i,4) + 1) as ord_status_cd
,lpad('x',100,'x')
from generate_series(1,2000000,2) a(i);
alter table online_order add constraint online_order_pk primary key (ord_no);
create index online_order_x01 on online_order(ord_date);
I haven't bothered to gather statistics - it is not needed in this experiment.
Here is an SQL statement which counts the number of orders by year and month.
set max_parallel_workers_per_gather = 0;
select
date_part('year',ord_date) as year
,
date_part('month', ord_date) as month
, count(*)
from online_order
group by date_part('year',ord_date), date_part('month', ord_date)
order by date_part('year',ord_date), date_part('month', ord_date);
Sort (cost=62229.26..
62231.76 rows=1000 width=24) (actual time=623.706..623.709 rows=34 loops=1)
Sort Key: (date_part('year'::text, ord_date)), (date_part('month'::text, ord_date))
Sort Method: quicksort Memory: 27kB
Buffers: shared hit=3781
-> HashAggregate (cost=62164.43..62179.43 rows=1000 width=24) (actual time=623.686..623.694 rows=34 loops=1)
Group Key: date_part('year'::text, ord_date), date_part('month'::text, ord_date)
Batches: 1 Memory Usage: 73kB
Buffers: shared hit=3781
-> Index Only Scan using online_order_x01 on online_order (cost=0.43..47164.43 rows=2000000 width=16) (actual time=0.018..377.527 rows=2000000 loops=1)
Heap Fetches: 28
Buffers: shared hit=3781
Planning Time: 0.097 ms
Execution Time: 623.753 m
Let's replace the DATE_PART with the EXTRACT and run the query again. So we can check that the DATE_PART function goes faster.select
extract(year from ord_date) as year
,
extract(month from ord_date) as month
, count(*)
from online_order
group by extract(year from ord_date), extract(month from ord_date)
order by extract(year from ord_date), extract(month from ord_date);
Sort (cost=62229.26..
62231.76 rows=1000 width=72) (actual time=867.696..867.699 rows=34 loops=1)
Sort Key: (EXTRACT(year FROM ord_date)), (EXTRACT(month FROM ord_date))
Sort Method: quicksort Memory: 27kB
Buffers: shared hit=3781
-> HashAggregate (cost=62164.43..62179.43 rows=1000 width=72) (actual time=867.669..867.677 rows=34 loops=1)
Group Key: EXTRACT(year FROM ord_date), EXTRACT(month FROM ord_date)
Batches: 1 Memory Usage: 73kB
Buffers: shared hit=3781
-> Index Only Scan using online_order_x01 on online_order (cost=0.43..47164.43 rows=2000000 width=64) (actual time=0.020..498.403 rows=2000000 loops=1)
Heap Fetches: 28
Buffers: shared hit=3781
Planning Time: 0.061 ms
Execution Time: 867.730 ms
As you can see, the estimated cost did not change, though, the elapsed time rose from 623 ms to 867 ms.
Conclusion
The date_part() function is a PostgreSQL specific funtion, while the extract() function is an ANSI-compliant function. If you do not plan to move your PostgreSQL database to a different database, keep the extract() function in quarantine.
Addendum
We can use the to_char function to get the same result. But, it is the worst in performance.select
to_char(ord_date,'yyyy') as year
,
to_char(ord_date,'mm') as month
, count(*)
from online_order
group by to_char(ord_date,'yyyy'), to_char(ord_date,'mm')
order by to_char(ord_date,'yyyy'), to_char(ord_date,'mm');
Sort (cost=62229.26..
62231.76 rows=1000 width=72) (actual time=1023.938..1023.941 rows=34 loops=1)
Sort Key: (to_char(ord_date, 'yyyy'::text)), (to_char(ord_date, 'mm'::text))
Sort Method: quicksort Memory: 26kB
Buffers: shared hit=3781
-> HashAggregate (cost=62164.43..62179.43 rows=1000 width=72) (actual time=1023.847..1023.880 rows=34 loops=1)
Group Key: to_char(ord_date, 'yyyy'::text), to_char(ord_date, 'mm'::text)
Batches: 1 Memory Usage: 73kB
Buffers: shared hit=3781
-> Index Only Scan using online_order_x01 on online_order (cost=0.43..47164.43 rows=2000000 width=64) (actual time=0.023..711.328 rows=2000000 loops=1)
Heap Fetches: 28
Buffers: shared hit=3781
Planning Time: 0.063 ms
Execution Time: 1023.974 ms
Added on the next day of this publication
The above experiment was conducted on PostgreSQL 15.1.