DEV Community

Cover image for Improving Performance of PostgreSQL queries
Jakub Andrzejewski
Jakub Andrzejewski

Posted on

Improving Performance of PostgreSQL queries

I recently wanted to implement some raw SQL queries and work on the efficiency of them so that returning data from database will be faster. After reading few articles and implementing some of the changes myself I decided to summarize them here so that you could use it in your projects :)

Now, let's talk about PostgreSQL. It's a super cool and flexible open-source relational database management system. While PostgreSQL is already pretty awesome at what it does, there's always room to make those queries zip along even faster! Want to know how? We've got some nifty strategies to supercharge your PostgreSQL queries.

Enjoy!

Limit the number of rows returned by a query

In case of small tables, we may not need to take care about the number of returned rows (while I highy suggest to limit them anyway!) in case of bigger tables, this can be extremely useful for improving performance of queries.

We can achieve that in SQL by using the keyword LIMIT with following query:

SELECT * FROM products LIMIT 10
Enter fullscreen mode Exit fullscreen mode

Use JOIN instead of subqueries

If there are many records in the table, running a query with subqueries inside may not be performant - instead we could use the JOIN keyword which can be significantly faster.

Instead of using subqueries like following:

SELECT * FROM orders WHERE product_id IN (SELECT id FROM products WHERE name = 'Phone');
Enter fullscreen mode Exit fullscreen mode

We could use JOIN keyword like this:

SELECT orders.* FROM orders JOIN products ON orders.product_id = products.id WHERE products.name = 'Phone';
Enter fullscreen mode Exit fullscreen mode

This can be much faster if your table has big amount of rows.

Use efficient data types for columns

For rows where there are frequent calculations like average it can be beneficial to check what is the datatype for the column. For example, it will be more efficient to use smallint rather than integer. You can change it like following:

ALTER TABLE products ALTER COLUMN price TYPE smallint;
Enter fullscreen mode Exit fullscreen mode

Use PREPARE for frequently executed queries

If you know that some queries will be executed frequently, for example a Fetch_Popular_Products_Query, you can use a keyword PREPARE like following:

PREPARE get_products_by_name (text) AS SELECT * FROM products WHERE name = $1;
EXECUTE get_products_by_name('Phone');
Enter fullscreen mode Exit fullscreen mode

By using this approach, the query will be prepared once and the execution will be reused to improve performance.

Using INDEX for frequent queries

If you frequently search for products by their name, you can create an index of the name column like following:

CREATE INDEX products_name_idx ON products (name);
Enter fullscreen mode Exit fullscreen mode

It will be especially effective if you would use multiple WHERE or JOIN keywords in your regular query.

Not use wildcard characters

Let's say that we have a query that will search for all products to find those whose name ends with phone:

SELECT * FROM products WHERE name LIKE '%phone';
Enter fullscreen mode Exit fullscreen mode

This query will perform a sequential scan of the entire products table, which can be slow if the table is large. To optimize this query, you can rewrite it like this:

SELECT * FROM products WHERE name LIKE 'phone%';
Enter fullscreen mode Exit fullscreen mode

This query will use the index created in previous point and perform a much faster index scan.

Debugging queries with EXPLAIN keyword

The EXPLAIN command will output the execution plan for the query, which can help you identify any performance issues and you can use it like following:

EXPLAIN SELECT COUNT(*) FROM products JOIN orders ON products.id = orders.products_id;
Enter fullscreen mode Exit fullscreen mode

Use full text search service for faster searching by query/ID

Searching through milions of records to find a specific record by a name, or more difficult case - part of name, can be really slow. To solve this problem, you could implement a full test search service like ElasticSearch that can be much faster.

How this would work is that you will store key/value pairs inside your ElasticSearch database in a form of POSTGRESQL_ID/RECORD_NAME, then you will search through the values to find a full or part of the name and assigned PostgreSQL ID, and finally you will perform a PostgreSQL query WHERE id = 'POSTGRESQL_ID' instead of WHERE name LIKE 'RECORD_NAME' which can be significantly more performant as you will be querying for specific record rather than records that match certain search criteria.

This approach looks amazing for performance and optimizing queries, but you have to take into consideration that it will add additional layer of complexity and another service to your infrastructure that you would need to maintain.

πŸ“– Learn more

If you would like to learn more about Vue, Nuxt, JavaScript or other useful technologies, checkout VueSchool by clicking this link or by clicking the image below:

Vue School Link

It covers most important concepts while building modern Vue or Nuxt applications that can help you in your daily work or side projects πŸ˜‰

βœ… Summary

Well done! You have just learned how to check if a slot is empty in Vue.

Take care and see you next time!

And happy coding as always πŸ–₯️

Top comments (8)

Collapse
 
clasicrando profile image
Steven Thomson

I appreciate people trying to put out content that will help others learn but feel like I need to provide more details/clarify some of the points made here. I am going to assume that you are not specialized in SQL databases or query optimizations since your profile appears to be JS related (Side note: you might want to look at your summary since it looks like it came from another article). This is not to say I think you are not knowledgeable in tech, but I would recommend spending more time on a topic before posting.

Limit the number of rows returned by a query
Limiting can be helpful but you should only really ever do this while developing or analyzing data. The main case where limiting would be required/helpful would be paginating results for a web application (even then other strategies are much faster, see youtube.com/watch?v=zwDIN04lIpc). Otherwise, limiting your results for performance reasons wouldn't really be feasible.

Use JOIN instead of subqueries
This is almost never an issue with modern RDBMS systems. Your database's query optimizer will likely modify your query to convert it to a JOIN query before executing. You can actually see/check this by running an EXPLAIN on your query to see the plan it generates.

Use efficient data types for columns
I see this mentioned a lot when it comes to performance but it's mostly a myth that will likely never be required by 99.999% of developers. Although there might be a case for storage size with BILLIONS of rows of data, the difference between a 32bit integer and a 16bit integer computational will almost never matter and is generally a sign of premature optimizations.

Use PREPARE for frequently executed queries
This can be true if you are regularly accessing your data directly through a driver that does not provide a prepared statement mechanism or does not implicitly create prepared statements in the database. Chances are, your library will always provide a mechanism to do this and you should not try to prepare statements yourself. Although this can be an area of exploration once all other optimization routes have been considered.

CREATE INDEX products_name_idx ON products (name);
This advice is good and should be your first thought when you have verified your query is sound from a SQL perspective. However, I have seen people abuse indexes way too often. The thing to remember is that many indexes on a single table can hurt the performance of tables that are modified often and when you query a table, you can only access it through 1 index at a time. So it doesn't matter if you add indexes to a table if your query is getting complex and won't use the index you just created.

Not use wildcard characters
Not sure what this section is about? The 2 queries are fundamentally different since they search for rows that end with and start with 'phone' respectively. Yes, querying against a pattern will always be slower since you need to do a perform a scan of sorts rather than a seek. I think what you got here is that you noticed that using the filter name LIKE 'phone%' was faster. This is likely because it is checking the start of the name field (which has an index from your previous section) and thus able to perform a range scan of the table rather than a full scan. If that is true, the database would know what chunk of the index your values reside (indexes are by default sorted in ascending order so knowing the first few characters of a values to find helps) so it doesn't need to search the entire table.

Debugging queries with EXPLAIN keyword
Definitely solid advice for anyone trying to dig deeper into why queries are slower and what needs to change about it. I won't go super deep into this topic but it's DEEP and should be a last resort when conventional advice to improve your query has not yielded much of a performance improvement.

Collapse
 
vikas_rathod_a07c836ce34c profile image
vikas rathod

I'm working as a backend developer since 2 years and still afraid by looking at complex queries which includes multiple joins n all so want to get expertise in that so please anyone any suggestions n cource that will assist in as much as simplified way @clasicrando

Collapse
 
martinbaun profile image
Martin Baun

Start with the basics: ANALYZE, EXPLAIN, and add indexes. Great guideline! Thanks for sharing

Collapse
 
jacobandrewsky profile image
Jakub Andrzejewski

Thanks for the hint. Appreciate it! :)

Collapse
 
saleembasit profile image
Saleem Basit

I think, phone% and %phone are meant to return different results.

Or am I missing something?

Collapse
 
finndersen profile image
Finndersen

Yes they're completely different queries so can't be used interchangeably. This article has a lot of misleading information

Collapse
 
syxaxis profile image
George Johnson

Oracle DBA for 20+ years and the one thing we DBAs hate is people seeing databases a "black boxes", always ready to support any dev willing to do some basic performance tuning to get their queries up to scratch. Great article!

Collapse
 
nafg profile image
nafg

Have you benchmarked these rules? Or do you have a reliable source?