What are Postgres arrays?
Arrays are columns that can hold multiple values. They are useful when there is additional data that is tightly coupled to a row of data in a table.
Storing tags associated with a row, values from a web form where multiple options can be selected. These are both examples of where you could use an array.
Arrays do not replace lookup tables. Lookup tables can generally be accessed from multiple rows in a table and are not tightly coupled to a specific row.
Example without using arrays
Here is a simplified schema for a migraine tracker that stores both the start and end time, and a list of triggers.
Main table
CREATE TABLE
public.migraines (
id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
user_id integer NOT NULL,
start_dt timestamp without time zone NULL,
end_dt timestamp without time zone NULL,
);
Lookup table for trigger type names
CREATE TABLE
public.trigger_types (
id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
name character varying(30) NOT NULL
);
Table to store selected triggers
CREATE TABLE
public.migraine_triggers (
id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY,
user_id integer NOT NULL,
migraine_id integer NOT NULL,
trigger_id integer NOT NULL
);
Inserting Data
Inserting data requires two separate actions
- Insert data into the migraine table
- Insert Triggers into the migraine_triggers table
The insert into the migraine_triggers is likely a multi row insert.
INSERT INTO migraines (user_id,start_dt,end_dt)
VALUES (1,'2024-06-18 09:30:00', '2024-06-18 10:30:00')
INSERT INTO migraine_triggers (user_id,migraine_id,trigger_id)
VALUES (1,2, 3),(1,2, 4),(1,2, 5)
Updating Data
Updating data is not entirely straight forward, you have to decide what approach you want to take (or the approach that works best with your data).
1) Run a SELECT before UPDATE to find which ones already exist and INSERT items not already in the list. You may need to also delete rows that are no longer in the list.
2) Use a conflict resolution insert (if the table is indexed to allow it)
INSERT INTO migraine_triggers (user_id, migraine_id, trigger_id)
VALUES
(1, 2, 3),
(1, 2, 4),
(1, 2, 5)
ON CONFLICT (migraine_id, trigger_id)
DO NOTHING;
You may need to also delete rows that are no longer in the list with this method also.
3) Run a delete query to delete all rows related to the migraine_id and INSERT all the new items.
In all these scenarios multiple queries are required to update the data.
Selecting Data
A simple selection might be to find migraines where the migraine was triggered by trigger 3
SELECT migraines.*, migraine_triggers.trigger_id
FROM migraines
INNER JOIN migraine_triggers
ON migraine_triggers.migraine_id = migraines.id
WHERE migraine_triggers.trigger_id = 3
Now a slightly more complex query to bring back the name of the trigger from the trigger_types
table
SELECT migraines.*,trigger_types.name
FROM migraines
INNER JOIN migraine_triggers
ON migraine_triggers.migraine_id = migraines.id
INNER JOIN trigger_types
ON trigger_types.id = migraine_triggers.trigger_id
WHERE migraine_triggers.trigger_id = 3
Example using arrays
Using arrays we can simplify the database design and the queries needed to retrieve the same information in the above examples.
One of the features of arrays that separates it from JSON OR JSONB fields is that the data is strictly typed.
The data that goes into an array must be the right type of data.
This ensures that data integrity is maintained in the array.
In this example the data type would be INTEGER. A CHAR could be used but using an integer and utilizing a lookup table has some advantages over just storing the names in the array.
Adding an array field
Instead of using the migraine_triggers
table, we can add a column to the migraine table to hold the trigger_ids selected for the migraine.
This will prevent the need for multiple row inserts, deletes and updates. It can also improve select performance because the queries can be simplified in some cases. It also reduces the size of the database by not needing an additional, potentially large table.
To add an array column, add [] after the columns data type.
CREATE TABLE
public.migraines (
id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
user_id integer NOT NULL,
start_dt timestamp without time zone NULL,
end_dt timestamp without time zone NULL,
trigger_types integer[] NULL
);
Inserting Data
Inserting data will now require just one query, wrap the values for the array in {} to insert the array.
INSERT INTO public.migraines (user_id, start_dt, end_dt, trigger_types)
VALUES (2, '2024-06-18 09:30:00', '2024-06-18 10:00:00', '{1,2}');
Updating Data
Updating data is similar, just one query to update the migraine and the trigger data.
UPDATE public.migraines
SET end_dt='2024-06-18 11:00:00', trigger_types = '{1,3}'
WHERE id = 1;
Selecting Data
A simple selection to find migraines where the migraine was triggered by trigger 3 can now be simplified from what it was before.
SELECT * FROM migraines WHERE 3 = ANY(trigger_types);
In this case, there is no overhead from table joins.
Here is a more complex query where we want to pull in the trigger name from the trigger_types
table.
SELECT migraines.*, trigger_types.name
FROM migraines
INNER JOIN UNNEST(trigger_types) trigger_id ON trigger_id = 3
INNER JOIN trigger_types ON trigger_types.id = trigger_id;
In this case we can use unnest
to turn the array into rows and then join those rows with the trigger_types table.
Indexing Arrays
To improve performance, you can add an index to an array field.
Using a GIN (Generalized Inverted Index) is most likely the best index type to choose.
GIN is designed for fields where multiple values are present. Arrays, JSONB are both examples where you might want to use a GIN index.
CREATE INDEX idx_gin_triggers ON migraines USING GIN (trigger_types);
Arrays are not right for every situations, but can provide a efficient way to store row meta data.
They can simplify database design and queries, while maintaining data integrity and ease of access.
Further information on arrays can be found in the Postgres Manual
Top comments (0)