DEV Community

vbilopav
vbilopav

Posted on

Unit Testing and TDD With PostgreSQL is Easy

Note: this is a repost from my personal blog: Unit Testing and TDD With PostgreSQL is Easy

I believe this subject matter is important and it deserves a bigger audience.

I keep hearing that PostgreSQL, as well as all other databases - are not testable. That is, of course, completely wrong. Not only it is possible but I found it to be even easier and faster than traditional methods.

And no, I'm not talking about some integration testing or some Docker magic. Just plain-old PostgreSQL, that's all.

So, let's do some TDD with PostgreSQL, shall we?

Note: this article is neither an endorsement nor even a criticism of TDD, it is merely a demonstration of how easy is to do such things with PostgreSQL.

The Problem

Suppose we have a schema:

create table devices (
    device_id int generated always as identity primary key,
    name text
);

create table measurements (
    timestamp timestamp not null,
    device_id int not null references devices(device_id),
    primary key(timestamp, device_id),
    value numeric not null
);
Enter fullscreen mode Exit fullscreen mode

We want to write a functionality that will have the following parameters:

  • Period (starting and ending timestamps).
  • Time interval.
  • Device.

The result will be a cumulative sum for a device and for each given period between start and end, divided by the interval parameter.

Sounds good?

Let's totally do it, it will be fun, I promise.

Database Setup

First, we need a little schema setup to make it a little bit more testable.

Unit tests, by definition, must not interfere with each other in any shape or form and they must be able to run in parallel. That means that tests will have their own transactions, and any test data inserted will be rollback-ed at the end of tests.

However, inserting test data into a relational database can be a bit tricky. Usually, tables will reference some other tables that will reference some other tables too, and before you know it - in order to insert a few test records - we must insert data in all tables in the database. That is certainly possible, but still, inconvenient and tedious. So we don't want to do that. Luckily, PostgreSQL offers a simple solution to this.

In our example, we have table measurements that reference the devices table. This relation will be checked immediately: meaning, a moment when we insert a new measurement the PostgreSQL will check does that device even exists in a database to keep data integrity in check. We can change the behavior of that check to be performed at the end of the transaction. And since the transaction in our unit tests will be rollback-ed anyhow, that will allow us to insert some test data safely without inserting it into a dozen other tables, not concerned with our tests.

To enable this deferred check, a reference has to be created with deferrable initially deferred declaration:

create table measurements (
    timestamp timestamp not null,
    device_id int not null references devices(device_id) deferrable initially deferred,
    primary key(timestamp, device_id),
    value numeric not null
);
Enter fullscreen mode Exit fullscreen mode

Note, there is ˙the other option too: deferrable initially immediate or simply deferrable. That means that we can tell the running transaction to defer all reference checks until the end of transactions with the declaration set all constraints deferred (docs).

If your table is already created then you'll have to recreate the constraint:

begin;
alter table only measurements drop constraint measurements_device_id_fkey;
alter table only measurements add constraint measurements_device_id_fkey 
    foreign key (device_id) references devices(device_id) deferrable initially deferred;
end;
Enter fullscreen mode Exit fullscreen mode

Or, if you want to do that for the entire database because you had an architect or ORM who wasn't aware of this feature, you can simply execute this script:

$$
declare 
    _table text; 
    _fk text;
    _def text;
begin
    for _table, _fk, _def in ( 
        select conrelid::regclass, conname, pg_get_constraintdef(oid)
        from
            pg_constraint 
        where
            contype = 'f' 
            and condeferrable is false
            and connamespace 'public'
    )
    loop
        raise info 'setting fk % on table % to deferrable', _fk, _table;
        execute(format('alter table only %s drop constraint %s', _table, _fk));
        execute(format('alter table only %s add constraint %s %s deferrable initially deferred', _table, _fk, _def));
    end loop;
end;
$$
Enter fullscreen mode Exit fullscreen mode

That will do it. And one more little thing: I usually like to create a special schema for unit tests called simply: test. And now we're ready.

TDD

Ok, let's first create a fresh SQL file, add an empty test and then execute it immediately:

create or replace procedure test.cumulative_sum() language plpgsql as $$
begin
    -- arange
    -- act
    -- assert

    rollback;
end;
$$;
call test.cumulative_sum();
Enter fullscreen mode Exit fullscreen mode

With this approach when we execute a file in the editor, our changes to the test are applied, and the test is immediately executed. This allows for an extremely fast test loop.

Now, first, let's arrange some data:

create or replace procedure test.cumulative_sum() language plpgsql as $$
begin
    -- arange
    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:02:00', 0, 0.5);
    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:03:00', 0, 2.5);

    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:07:00', 0, 2.5);
    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:08:00', 0, 3.5);

    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:11:00', 0, 3.0);
    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:12:00', 0, 4.0);

    -- act
    -- assert

    rollback;
end;
$$;
call test.cumulative_sum();
Enter fullscreen mode Exit fullscreen mode

This will add some measurements to a non-existing device (id = 0).

If we execute our hypothetical calculation for this device from timestamps between 2021-01-01 00:00:00 and 2021-01-01 00:15:00 for 5-minute intervals, we should get the following results:

timestamp sum
2021-01-01 00:05:00 3
2021-01-01 00:10:00 9
2021-01-01 00:15:00 16

You can use Excel or a calculator to verify the validity of these cumulative sum calculations.

Fine, now that we know what we must get, we can add act and assertion parts. First, we will add the act part, which we will call our non-existing function. Since we need to assert results multiple times (count and for each row), we can put the results into a temporary table:

create or replace procedure test.cumulative_sum() language plpgsql as $$
begin
    -- arange
    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:02:00', 0, 0.5);
    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:03:00', 0, 2.5);

    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:07:00', 0, 2.5);
    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:08:00', 0, 3.5);

    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:11:00', 0, 3.0);
    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:12:00', 0, 4.0);

    -- act
    create temp table result on commit drop as
    select * from cumulative_sum('2021-01-01 00:00:00', '2021-01-01 00:15:00', '5 minutes', 0);

    -- assert

    rollback;
end;
$$;
call test.cumulative_sum();
Enter fullscreen mode Exit fullscreen mode

This, of course, will fail, because we haven't written this cumulative_sum function yet. But, before we do that, let's also add the assertion part to verify our results. Luckily for us, PostgreSQL supports assertions and assert statements:

create or replace procedure test.cumulative_sum() language plpgsql as $$
begin
    -- arange
    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:02:00', 0, 0.5);
    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:03:00', 0, 2.5);

    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:07:00', 0, 2.5);
    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:08:00', 0, 3.5);

    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:11:00', 0, 3.0);
    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:12:00', 0, 4.0);

    -- act
    create temp table result on commit drop as
    select * from cumulative_sum('2021-01-01 00:00:00', '2021-01-01 00:15:00', '5 minutes', 0);

    -- assert
    assert (select count(*) from result) = 3, 
        'Expected 3 rows, got ' || (select count(*) from result)::text;

    assert (select sum from result where timestamp = '2021-01-01 00:05:00') = 3, 
        'Expected 3, got ' || (select sum from result where timestamp = '2021-01-01 00:05:00')::text;

    assert (select sum from result where timestamp = '2021-01-01 00:10:00') = 9, 
        'Expected 9, got ' || (select sum from result where timestamp = '2021-01-01 00:10:00')::text;

    assert (select sum from result where timestamp = '2021-01-01 00:15:00') = 16, 
        'Expected 16, got ' || (select sum from result where timestamp = '2021-01-01 00:15:00')::text;

    rollback;
end;
$$;
call test.cumulative_sum();
Enter fullscreen mode Exit fullscreen mode

That looks a bit ugly, but to be fair, most of those assertions were autocompleted by the Copilot.

This will still fail because we haven't written this cumulative_sum function yet. But at least we can extract input and output data contracts for this function now.

  • The parameters will be these:
from timestamp,
to timestamp,
interval interval,
device_id int
Enter fullscreen mode Exit fullscreen mode
  • The resulting table will be this:
table (
    "timestamp" timestamp, 
    sum numeric
) 
Enter fullscreen mode Exit fullscreen mode

So now, we know what our function should look like. We can add the first prototype, just above our failing tests:

create or replace function cumulative_sum(
    _from timestamp,
    _to timestamp,
    _interval interval,
    _device_id int
) 
returns table (
    "timestamp" timestamp, 
    sum numeric
) 
language sql
as $$ 
select null::timestamp, null::numeric;
$$;
Enter fullscreen mode Exit fullscreen mode

Again, we've placed this create or replace function cumulative_sum above our failing tests, and again we are executing the entire file. Which in turn gives an extremely fast TDD refactor-red-green loop.

However, our tests are still in the red, since obviously, our newly created function is returning nonsense. So, let's refactor this:

create or replace function cumulative_sum(
    _from timestamp,
    _to timestamp,
    _interval interval,
    _device_id int
) 
returns table (
    "timestamp" timestamp, 
    sum numeric
) 
language sql
as $$ 
select
    p.period_to,
    sum(coalesce(m.sum, 0)) over (rows unbounded preceding)
from (
        select 
            series as period_from, 
            series + _interval as period_to 
        from generate_series(_from, _to - _interval, _interval) series
    ) p
    left join lateral (
        select 
            sum(coalesce(value, 0)) as sum
        from measurements 
        where 
            timestamp > p.period_from
            and timestamp <= p.period_to 
            and device_id = _device_id
    ) m on true
$$;
Enter fullscreen mode Exit fullscreen mode

And now, this seems to be correct, and our tests are in green now. We can continue improving and optimizing this function while our tests are green as much as we want without any fear that something will be broken.

Here is the final content of our work in a single file:

create or replace function cumulative_sum(
    _from timestamp,
    _to timestamp,
    _interval interval,
    _device_id int
) 
returns table (
    "timestamp" timestamp, 
    sum numeric
) 
language sql
as $$ 
select
    p.period_to,
    sum(coalesce(m.sum, 0)) over (rows unbounded preceding)
from (
        select 
            series as period_from, 
            series + _interval as period_to 
        from generate_series(_from, _to - _interval, _interval) series
    ) p
    left join lateral (
        select 
            sum(coalesce(value, 0)) as sum
        from measurements 
        where 
            timestamp > p.period_from
            and timestamp <= p.period_to 
            and device_id = _device_id
    ) m on true
$$;

create or replace procedure test.cumulative_sum() language plpgsql as $$
begin
    -- arange
    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:02:00', 0, 0.5);
    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:03:00', 0, 2.5);

    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:07:00', 0, 2.5);
    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:08:00', 0, 3.5);

    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:11:00', 0, 3.0);
    insert into measurements (timestamp, device_id, value) values ('2021-01-01 00:12:00', 0, 4.0);

    -- act
    create temp table result on commit drop as
    select * from cumulative_sum('2021-01-01 00:00:00', '2021-01-01 00:15:00', '5 minutes', 0);

    -- assert
    assert (select count(*) from result) = 3, 
        'Expected 3 rows, got ' || (select count(*) from result)::text;

    assert (select sum from result where timestamp = '2021-01-01 00:05:00') = 3, 
        'Expected 3, got ' || (select sum from result where timestamp = '2021-01-01 00:05:00')::text;

    assert (select sum from result where timestamp = '2021-01-01 00:10:00') = 9, 
        'Expected 9, got ' || (select sum from result where timestamp = '2021-01-01 00:10:00')::text;

    assert (select sum from result where timestamp = '2021-01-01 00:15:00') = 16, 
        'Expected 16, got ' || (select sum from result where timestamp = '2021-01-01 00:15:00')::text;

    rollback;
end;
$$;
call test.cumulative_sum();
Enter fullscreen mode Exit fullscreen mode

Conclusion

The arguments I hear all the time are as follows:

  • Testing in a database is impossible. No, it isn't. I just showed you how. It may be in other databases, but PostgreSQL isn't that other database.
  • Testing in a database is hard. No, it isn't. SQL may be hard if you haven't honed your skills. So, start learning.
  • Testing in a database is slow. No, it isn't. It is way faster than anything out there. Again, hone your skills.

But what about running all tests in parallel, perhaps in a CI/CD pipeline you might ask.

Well, it's just a matter of a runner that will run all those parameterless procedures in parallel connections under some criteria. On my setup, that is the test schema. So if, it is a procedure and it is in test schema and it doesn't have parameters - it's a test.

Writing such a test runner would be really, really simple. In fact, that is precisely what I did by using NodeJS, for my projects: @vbilopav/pgmigrations. Here is an example project and how it is used in the GitHub actions: teamserator/.github/workflows
/build-and-test.yml

But in all fairness, it's so easy that anyone could do it.

Top comments (0)