설문조사
PostgreSQL/PPAS 관련 듣고 싶은 교육은


총 게시물 163건, 최근 0 건
   

Gap-Free Sequence

글쓴이 : 모델광 날짜 : 2023-12-23 (토) 11:31 조회 : 456
A few years ago, I worked on a project for a public institution, and one of the requirements was that the application numbers of the citizens' requests had to be managed as consecutive values. In other words, there should be no gaps in the application numbers. The staff of the public institution couldn't provide a clear reason why there should be no gaps in the application numbers, but they insisted that this requirement had to be met.

Faced with this challenging mandate, the first thing I did was to have the discussion with the staff because I was fully aware that such a business requirement could be a noose around our neck. To comply, we would need to serialize all transactions requesting the application numbers. This is like turning our PostgreSQL server into a single-user system. It was obvious that a lot of citizens were supposed to apply simultaneously. Luckily, after some discussion, the staff and I came to a compromise that we didn't have to preserve the order of the application numbers.

PostgreSQL sequences are contention-free and efficient in obtaining sequential numbers. But, if we need to implement gap-freee seqneces, we can not use PostgreSQL sequences to meet this management requirement. However, there is a compromised solution that is not as performant as a sequence but is scalable and workable. The secret to having a scalable gap-free solution is to pre-allocate your sequence numbers.

We have a table called COM_SEQ and there'll be effectively a range of numbers as many as you require for each of the sequences you created.

CREATE TABLE COM_SEQ (
SEQ_NAME  VARCHAR(100) NOT NULL,
SEQ_VAL   BIGINT       NOT NULL,
SEQ_STS   BOOLEAN      NOT NULL DEFAULT TRUE,
CONSTRAINT COM_SEQ_PK PRIMARY KEY (SEQ_NAME, SEQ_VAL)
);

I am going to insert up to 100 pre-allocated numbers. You can literally create as many as you want. This is one of the things that needs to reset periodically, for example each year.

INSERT INTO com_seq
SELECT 'REGISTRATION_SEQUENCE', i, true
  FROM generate_series(1,10000) a(i);

seq_name             |seq_val|seq_sts|
---------------------+-------+-------+
REGISTRATION_SEQUENCE|      1|true   |
REGISTRATION_SEQUENCE|      2|true   |
REGISTRATION_SEQUENCE|      3|true   |
.....

What we have is all the pre-allocated sequence numbers for each sequence and they all have a sequence state. The default  state is TRUE, which means the sequence value is currently eligible for use. This is the critical part because they are all pre-allocated now we'll be able to give them out to application programs in a nice efficient manner.

The fist thing we do is to build a function that every application program is going to be able to call the fuction to get a sequence number.

CREATE OR REPLACE FUNCTION get_seqno(p_name varchar)
RETURNS bigint
LANGUAGE plpgsql
AS
$$
DECLARE
  l_seq   bigint;
BEGIN
  UPDATE com_seq
     SET SEQ_STS = false             --set its status to FALSE
   WHERE SEQ_NAME = p_name
     AND SEQ_VAL = (SELECT seq_val   --fetch the first available non locked row
                      FROM com_seq
                     WHERE SEQ_NAME = p_name
                       AND SEQ_STS is true
                     ORDER BY SEQ_VAL
                     FETCH NEXT 1 ROWS ONLY
                     FOR UPDATE SKIP LOCKED)
  RETURNING SEQ_VAL
       INTO l_seq;
      
  RETURN l_seq;
END;
$$

Notice I have used the FOR UPDATE SKIP LOCKED clause. This is the key thing in this function, which allows many transactions to extract sequence numbers simultaneoulsy. This is where the magic happens.  Every time someone asks for a sequence we are going to get a sequence number for the given sequence name where the status equals TRUE. We are going to lock that row and we are going to skip any other rows that are currently locked. We are going to walk down the list of available numbers where they are TRUE. Any numbers which have been used and committed will have a different status value but any one value which is in an active transaction they will still have a STATUS of TRUE from this person's perspective but they will be locked and we are going to simply skip the locked ones. We will simply fetch the first available non-locked row and then set its status to FALSE. So it is fairly simple. Let's have a look at it in action.

--    on session 1

begin;
select get_seqno('REGISTRATION_SEQUENCE');
get_seqno|
---------+
        1|
 SELECT * FROM COM_SEQ order by seq_val fetch next 10 rows only;
seq_name             |seq_val|seq_sts|
---------------------+-------+-------+
REGISTRATION_SEQUENCE|      1|false  |
REGISTRATION_SEQUENCE|      2|true   |
REGISTRATION_SEQUENCE|      3|true   |
......


On session 1, I call the fuction and I get a value of 1. We can view the COM_SEQ table and this is currently an active transaction and it says the sequence status is false even though I have not committed the session yet. If I ask for another sequence number, I get the value of 2.

--    on session 1

select get_seqno('REGISTRATION_SEQUENCE');
get_seqno|
---------+
        2|
 SELECT * FROM COM_SEQ order by seq_val fetch next 10 rows only;
seq_name             |seq_val|seq_sts|
---------------------+-------+-------+
REGISTRATION_SEQUENCE|      1|false  |
REGISTRATION_SEQUENCE|      2|false   |
REGISTRATION_SEQUENCE|      3|true   |


Let's now simulate a different session asking for a sequence number.

--    on session 2

 SELECT * FROM COM_SEQ order by seq_val fetch next 10 rows only;
seq_name             |seq_val|seq_sts|
---------------------+-------+-------+
REGISTRATION_SEQUENCE|      1|true   |
REGISTRATION_SEQUENCE|      2|true   |
REGISTRATION_SEQUENCE|      3|true   |
....


In another session, the SEQ_STS values are still TRUE because the session 1 has not committed yet. Let's now call the function in Session 2.

--    on session 2

begin;
select get_seqno('REGISTRATION_SEQUENCE');
get_seqno|
---------+
        3
commit;     --Note that I have committed this transaction.


Note that I get a sequence number 3 on Session 2. The reason for that is those values of 1 and 2 are currently locked by the transaction in Session 1.

Now, let's observe what happens when we roll back the transaction in Session 1.

--​    on session 1

rollback;
SELECT * FROM COM_SEQ order by seq_val fetch next 10 rows only;
seq_name             |seq_val|seq_sts|
---------------------+-------+-------+
REGISTRATION_SEQUENCE|      1|true   |
REGISTRATION_SEQUENCE|      2|true   |
REGISTRATION_SEQUENCE|      3|false  |
REGISTRATION_SEQUENCE|      4|true   |
......
.....

If we go look at our table we can observe that only the value of 3 committed by session 2 is false. We can see how we're going to fill in all the gaps but we are sacrificing order here. This is the only way we can get a scalable gap-free solution. If we ask for another sequence number in any sessions, we will get number one because it is the first availabe value that the value of SEQ_STS is TRUE that is not locked.

Conclusion
In a project, it is not always possible to meet all of the customer's requirements. If a customer's request has a significant impact on PostgreSQL, it's necessay to exert persuasive power to convince the customer to modify their demands to something more reasonable.

If we have to gurantee the preservation of order, that kills our database server because we have to serialize all transactions requesting the registration number. And there is no scalable solution to gurantee the preservation of order.

   

postgresdba.com