Most of the times we create a single-column index for join columns, but in practical I experienced the power of multiple column index which is called composite indexes.
There was a table with 10+ millions of records with approximate 3gb size, and query was already optimized to return response within 900ms.
But with the power of composite indexes I reduced response time to 50ms. This came after so many trials and errors, and I am glad to put this as a blog on internet.
We will dive into:
- Single-column indexes and how they work.
- An overview of B-tree, GIN, and GiST indexes.
- The power of composite indexes, partial composite indexes, and how prefix columns work in composite indexes.
1. Single-Column Indexes
A single-column index is the simplest form of an index, created on just one column. It works well for queries that frequently filter or sort based on that column alone.
Example:
Let’s say we have a table employees
:
CREATE TABLE employees (
emp_id SERIAL PRIMARY KEY,
name TEXT,
department TEXT,
salary NUMERIC
);
Now, if we frequently query based on the department
column:
SELECT * FROM employees WHERE department = 'Engineering';
We can create a single-column B-tree index to speed up this query:
CREATE INDEX idx_department ON employees (department);
PostgreSQL will use this index whenever you query the department
column, speeding up lookups by reducing the number of rows scanned.
2. B-Tree Indexes
The B-tree (Balanced Tree) index is the default and most commonly used index type in PostgreSQL. It is optimal for queries that involve comparison operators (=
, <
, >
, <=
, >=
), as it efficiently organizes data in sorted order.
When to Use B-tree:
- Exact matches (
=
). - Range queries (e.g.,
salary > 50000
). - Sorting and grouping.
Example:
CREATE INDEX idx_salary ON employees (salary);
Now, any query filtering or sorting by salary
will be optimized:
SELECT * FROM employees WHERE salary > 60000;
3. GIN and GiST Indexes
GIN (Generalized Inverted Index) and GiST (Generalized Search Tree) indexes are specialized for more complex data types like arrays, full-text search, and geometric types. We should stick to B-tree index for primitive data types as B-tree will outperform GIN and GiST for primitives. Usage is as below:
- GIN is ideal for indexing array values or documents (e.g., full-text search, JSONB fields).
- GiST is better suited for geometric data types and data that doesn't fit into a strict linear order, such as ranges or network types.
Example:
For a table with a JSONB field containing customer data:
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
info JSONB
);
A GIN index will optimize searches within the JSONB column:
CREATE INDEX idx_info ON customers USING gin (info);
Now, querying within the JSONB field becomes efficient:
SELECT * FROM customers WHERE info @> '{"status": "active"}';
4. Composite Indexes
A composite index is an index on multiple columns, and it can dramatically improve performance for queries involving multiple conditions.
When filtering is to be done on multiple columns in where
clause and order by
clause, composite indexes outperform single-column index with significant margin. Composite index uses prefix columns if not all the columns used in where clause for that we'll have to design composite index properly. More about prefix columns in this blog ahead.
Example:
For a table orders
:
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_id INT,
order_date DATE,
status TEXT
);
If you often query by both customer_id
and order_date
, a composite index can help:
CREATE INDEX idx_customer_order ON orders (customer_id, order_date);
With this index, queries filtering by both customer_id
and order_date
are optimized:
SELECT * FROM orders WHERE customer_id = 1001 AND order_date > '2024-01-01';
Composite Index Usage with Prefix Columns
When using composite indexes, PostgreSQL can efficiently use the prefix columns (the leading columns) even if not all the indexed columns are included in the query. For example, in the composite index (customer_id, order_date)
, queries filtering by customer_id
alone can still use the index.
SELECT * FROM orders WHERE customer_id = 1001;
However, PostgreSQL cannot use this index efficiently if the query filters only by order_date
because customer_id
is the leading column.
5. Partial Composite Indexes
A partial index is a powerful optimization technique in PostgreSQL, where the index is created on a subset of the table's data. This can reduce the size of the index and make lookups faster by indexing only the relevant rows.
Example:
Let’s say you only care about orders that are active
:
CREATE INDEX idx_active_orders ON orders (customer_id, order_date) WHERE status = 'active';
This index will optimize queries that filter by customer_id
, order_date
, and status = 'active'
.
SELECT * FROM orders WHERE customer_id = 1001 AND order_date > '2024-01-01' AND status = 'active';
By narrowing the index to active orders only, PostgreSQL can skip irrelevant rows, improving performance.
6. Single-Column vs Composite Indexes
The choice between single-column and composite indexes depends on your query patterns.
- Single-column indexes are optimal when your queries focus on individual columns.
- Composite indexes are better when your queries often filter or sort by multiple columns.
Key Point: Index Prefix Columns
In a composite index, PostgreSQL can use the index efficiently if the query involves leading columns (prefix columns). If your query only involves non-prefix columns, PostgreSQL cannot use the index optimally.
Example:
For a composite index on (customer_id, order_date, status)
:
- Queries using just
customer_id
will benefit from the index. - Queries using
customer_id
andorder_date
will also benefit. - But queries using only
status
ororder_date
will not benefit.
7. B-tree vs GIN vs GiST: Which to Use?
- B-tree is the go-to for most types of queries, especially those involving comparisons or sorting.
- GIN is best for full-text search, arrays, and documents like JSONB.
- GiST is used for geometric data, range queries, and other complex types.
Conclusion
Indexes are an essential tool for optimizing query performance in PostgreSQL, but choosing the right type of index can be a nuanced decision based on the nature of your queries and the data. Whether it’s single-column indexes for simplicity, composite indexes for multi-column queries, or GIN/GiST indexes for specialized data types, a well-thought-out indexing strategy can dramatically reduce query times.
For large tables and frequent queries, understanding the concepts of prefix columns and partial indexes is crucial. A carefully designed index can shave off milliseconds or even seconds from query performance — and over time, that difference adds up.
Happy optimizing!
Top comments (1)
Waoooo