설문조사
PostgreSQL/PPAS 관련 듣고 싶은 교육은


총 게시물 163건, 최근 0 건
   

Social distancing for to_char() and to_extract()

글쓴이 : 모델광 날짜 : 2023-09-23 (토) 09:43 조회 : 377
When you want to extract a year or month value from a column of a timestamp data type, you shoud use the date_part() function, not the extract() or the to_char() function.
Both functions have the same cost value. But the date_part function always runs faster. I do not know the reason. The extract function is ANSI compliant, while the date_part function is Postgres specific.

Here is a simple data set to start with:

create table online_order (
ord_no        bigint not null,
cust_no       int not null,
ord_date       timestamp(0) not null,
ord_dt         varchar(8) not null,
ord_status_cd  varchar(1) not null,
comment      varchar(100)
);

​insert into online_order
select i, mod(i,1000000) as cust_no
        ,current_date - mod(i,1000) as ord_date
        ,to_char((current_date - mod(i,1000)),'yyyymmdd') as ord_dt
        ,(mod(i,4) + 1) as ord_status_cd
        ,lpad('x',100,'x')
  from generate_series(1,2000000,2) a(i);

alter table online_order add constraint online_order_pk primary key (ord_no);
create index online_order_x01 on online_order(ord_date);


I haven't bothered to gather statistics - it is not needed in this experiment.
Here is an SQL statement which counts the number of orders by year and month.


set max_parallel_workers_per_gather = 0;

select date_part('year',ord_date) as year
     , date_part('month', ord_date) as month
      , count(*)
  from online_order
group by date_part('year',ord_date), date_part('month', ord_date)
order by date_part('year',ord_date), date_part('month', ord_date);

Sort  (cost=62229.26..62231.76 rows=1000 width=24) (actual time=623.706..623.709 rows=34 loops=1)
  Sort Key: (date_part('year'::text, ord_date)), (date_part('month'::text, ord_date))
  Sort Method: quicksort  Memory: 27kB
  Buffers: shared hit=3781
  ->  HashAggregate  (cost=62164.43..62179.43 rows=1000 width=24) (actual time=623.686..623.694 rows=34 loops=1)
        Group Key: date_part('year'::text, ord_date), date_part('month'::text, ord_date)
        Batches: 1  Memory Usage: 73kB
        Buffers: shared hit=3781
        ->  Index Only Scan using online_order_x01 on online_order  (cost=0.43..47164.43 rows=2000000 width=16) (actual time=0.018..377.527 rows=2000000 loops=1)
              Heap Fetches: 28
              Buffers: shared hit=3781
Planning Time: 0.097 ms
Execution Time: 623.753 m

Let's replace the DATE_PART with the EXTRACT and run the query again. So we can check that the DATE_PART function goes faster.


select extract(year from ord_date) as year
     , extract(month from ord_date) as month
     , count(*)
 from online_order
group by extract(year from ord_date), extract(month from ord_date)
order by extract(year from ord_date), extract(month from ord_date);

Sort  (cost=62229.26..62231.76 rows=1000 width=72) (actual time=867.696..867.699 rows=34 loops=1)
  Sort Key: (EXTRACT(year FROM ord_date)), (EXTRACT(month FROM ord_date))
  Sort Method: quicksort  Memory: 27kB
  Buffers: shared hit=3781
  ->  HashAggregate  (cost=62164.43..62179.43 rows=1000 width=72) (actual time=867.669..867.677 rows=34 loops=1)
        Group Key: EXTRACT(year FROM ord_date), EXTRACT(month FROM ord_date)
        Batches: 1  Memory Usage: 73kB
        Buffers: shared hit=3781
        ->  Index Only Scan using online_order_x01 on online_order  (cost=0.43..47164.43 rows=2000000 width=64) (actual time=0.020..498.403 rows=2000000 loops=1)
              Heap Fetches: 28
              Buffers: shared hit=3781
Planning Time: 0.061 ms
Execution Time: 867.730 ms

As you can see, the estimated cost did not change, though, the elapsed time rose from 623 ms to 867 ms.

Conclusion
The date_part() function is a PostgreSQL specific funtion, while the extract() function is an ANSI-compliant function. If you do not plan to move your PostgreSQL database to a different database, keep the extract() function in quarantine.

Addendum
We can use the to_char function to get the same result. But, it is the worst in performance.


select to_char(ord_date,'yyyy') as year
     , to_char(ord_date,'mm') as month
     , count(*)
 from online_order
group by to_char(ord_date,'yyyy'), to_char(ord_date,'mm')
order by to_char(ord_date,'yyyy'), to_char(ord_date,'mm');

Sort  (cost=62229.26..62231.76 rows=1000 width=72) (actual time=1023.938..1023.941 rows=34 loops=1)
  Sort Key: (to_char(ord_date, 'yyyy'::text)), (to_char(ord_date, 'mm'::text))
  Sort Method: quicksort  Memory: 26kB
  Buffers: shared hit=3781
  ->  HashAggregate  (cost=62164.43..62179.43 rows=1000 width=72) (actual time=1023.847..1023.880 rows=34 loops=1)
        Group Key: to_char(ord_date, 'yyyy'::text), to_char(ord_date, 'mm'::text)
        Batches: 1  Memory Usage: 73kB
        Buffers: shared hit=3781
        ->  Index Only Scan using online_order_x01 on online_order  (cost=0.43..47164.43 rows=2000000 width=64) (actual time=0.023..711.328 rows=2000000 loops=1)
              Heap Fetches: 28
              Buffers: shared hit=3781
Planning Time: 0.063 ms
Execution Time: 1023.974 ms

Added on the next day of this publication
The above experiment was conducted on PostgreSQL 15.1.

   

postgresdba.com