Marcell Cruz

Posted on Jun 21, 2021 • Edited on Jun 24, 2021

Postgres indexes

#tutorial #postgres #webdev

How To Create An Index
Removing an Index
Identify If A Query Uses An Index
List All Indexes In A Table
Index Methods
Indexing Techniques
Who decides when to use an index or not?
You Need A Lot Of Data To Test Indexes
Creating Indexes Locks Your DB
Reindexing

Indexing is a technique to make queries run faster, different indexes are suitable for different queries, the main differences are between an exact match, ranges and a fulltext, in a nutshell if your query has an exact match or a range you should use the default method, otherwise use gin.

most of the information from this post came from here, have a read if you want to dig deeper.

How To Create An Index

CREATE INDEX [UNIQUE] INDEX_NAME ON TABLE_NAME [USING METHOD]
(
  COLUMN_NAME [ASC | DESC] [NULL { FIRST | LAST}]
)

e.g:

CREATE INDEX idx_name ON users(name)

Removing An Index

DROP INDEX  [ CONCURRENTLY]
[ IF EXISTS ]  index_name 
[ CASCADE | RESTRICT ];

Identify If A Query Uses An Index

You can use EXPLAIN to identify if a query is using an specific index

EXPLAIN SELECT * FROM BOOKS WHERE BOOK = "MY BOOK";

if your query is using the index something like "Index Scan "
will appear e.g:

Index Is Not Being Used

                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on cards  (cost=0.00..62.50 rows=828 width=242)

Index Is Being Used

                               QUERY PLAN
------------------------------------------------------------------------
 Index Scan using my_index on cards  (cost=0.28..8.29 rows=1 width=242)
   Index Cond: ((title)::text = 'a'::text)

my_index is the name of the index, cards is the name of the table and
title is the name of the column

List All Indexes In A Table

The following will list all indexes related to your tables, you can also
filter indexes for a specific table or field

select tablename, indexname from pg_indexes where schemaname = 'public';

Index Methods

The following are the most common index methods.

Generalized Inverted Indexes(GIN)

Good for full-text search, maps multiple values to a column

Generalized Search Tree(Gist)

Allows for balanced btree, good for full-text search, used for geometric data types(poligons points etc..)
are usefull for operations besides comparison and range.

Hash Index

Not worth the complexity and work to manage it, coudn't find any common example that couldn't be done with the other methods in an easier way with better results, you can try to convince me otherwise in the comments if you want.

Btree Index

Doesn't work with like queries, works pretty well with caching, it's the default index, if you don't pass the method option when creating the index it will be created with this method.

These are all the most common indexes, now let's talk about some techniques when creating indexes

Expression Indexes

expression indexes are usefull when you need to convert the data before searching it.

CREATE INDEX USERS_LOWER_EMAIL ON USERS(LOWER(EMAIL));

the above index will be used in searches like

WHERE lower(email) = '<lowercased-email>'

CREATE INDEX articles_day ON articles ( date(published_at) )

the above can will be used for

WHERE date(articles.published_at) = date('2011-03-07')

Partial Indexes

partial index is an index applied to just part of the data, the following is a usefull partial index

CREATE INDEX articles_flagged_created_at_index ON articles(created_at) WHERE flagged IS TRUE;

Unique Indexes

create unique <index_name> .....

Improves query performance, if you try to create a uniq index in a column that has rows with the same value the index creation will fail.
if you try to insert values that already exist in a column with a uniq index a error occurs

ERROR:  duplicate key value violates unique constraint "uniq_index"
DETAIL:  Key (title)=(title 1) already exists.

When it makes sense to use indexes?

The general rule is, the bigger your table is the better
index only makes sense when the cost of reading data from the index(hitting the disc) and then reading data from the table is lower than a table scan, so in general big tables
are better for indexes because you don't need to scan the whole table and the cost of reading two different values in two different tables isn't very high comparatively with the full table scan, when in doubt just create a test case and test it or better yet, always test it first.

What decides when to use an index or not?

The query planner decides to use an index or not, just because an index exists and your query
matches the initial criteria for using that index, doesn't mean that the index will be used.
the query planner decides when to use the index or not based on a multitude of factors, the biggest
one is how much data the query will hit, if it's a lot it's more likely that the query planner will use the index
indexes make more sense the bigger the table is.

You Need A Lot Of Data To Test Indexes Properly

A good general rule is to test as close to production as you can. the query planer might decide not to use your index if you test it locally in small sets of data, which will make you waste a lot of time benchmarking with non sensical data, trust me I've learned the hard way.
so you should create a lot of data to test your indexes, the general rule is to test as close to production as possible, also don't forget to activate /timing when you're testing your queries in psql.

Creating Indexes Locks Your DB

If it's a big table your db can be locked for hours, use create index concurrently to mitigate the problem

Reindexing

after sometime your index will not be optimized anymore, that's when you need to reindex it
a good reindex technique is to create another index with a different name and then droping the old one.

and that's pretty much all the basic information that you need to start creating indexes, go create some indexes and make your queries run faster!

DEV Community