Summary of Chapter# 7 : "Heap Only Tuple and Index-Only Scans" from the book "The Internals of PostgreSQL" Part-2

#postgres #apacheage #database #agenssql

This blog aims to assist you in understanding the final concepts of Chapter:7 [Heap Only Tuple and Index-Only Scans] from the book The Internals of PostgreSQL.

Note: Ensure that you have a thorough understanding of
Chapter 7 Part-1 and basics of PostreSQL before we proceed to Chapter 7 Part-2, as it forms the foundation for our exploration.

So, Let's Continue:

PostgreSQL removes dead tuples if possible, as in the pruning process, at an appropriate time. In the document of PostgreSQL, this processing is called defragmentation.

Defragmentation of the dead tuples in PostgreSQL is depicted in figure below:

Note: the cost of defragmentation is less than the cost of normal VACUUM processing because defragmentation does not involve removing the index tuples.
HOT reduces the consumption of both indexes and tables of pages; this also reduces the number of tuples that the VACUUM processing has to process.
HOT has a good influence on performance because it eventually reduces the number of insertions of the index tuples by updating and the necessity of VACUUM processing.

The Cases in which HOT is not available in PostgreSQL is depicted in figure below:

When the updated tuple is stored in the other page, which does not store the old tuple, the index tuple that points to the tuple is also inserted in the index page. Refer to Figure(a) above.
When the key value of the index tuple is updated, the new index tuple is inserted in the index page.

Index-Only Scans

To reduce the I/O (input/output) cost, index-only scans (often called index-only access) directly use the index key without accessing the corresponding table pages when all of the target entries of the SELECT statement are included in the index key.
This technique is provided by almost all commercial RDBMS, such as DB2 and Oracle. PostgreSQL has introduced this option since version 9.2.
PostgreSQL utilizes the visibility map of the target table to address this issue.
The visibility map indicates whether all the tuples stored on a page are visible or not.
If all tuples on a page are visible, PostgreSQL can rely on the index tuple's key and avoid accessing the corresponding table page
to check visibility.
In such cases, the visibility information provided by the index tuple is sufficient.
However, if any tuple on a page is not visible, PostgreSQL needs to read the table tuple pointed to by the index tuple and verify its visibility.
This process ensures that the visibility of tuples is accurately determined, either by using the index tuple or by accessing the table data when necessary.
‘Tuple_18’ need not be accessed because the 0th page that stores ‘Tuple_18’ is visible, that is, all tuples including Tuple_18 in the 0th page are visible.
‘Tuple_19’ needs to be accessed to treat the concurrency control because the visibility of the 1st page is not visible.

How Index-Only Scans performs in PostgreSQL is depicted in figure below:

I hope, this blog has helped you in understanding the final concepts of Heap Only Tuple and Index-Only Scans in PostreSQL.

Check out summary of Chapter : 8 Part-1

If you want to understand PostgreSQL In-Depth.

DEV Community

Summary of Chapter# 7 : "Heap Only Tuple and Index-Only Scans" from the book "The Internals of PostgreSQL" Part-2

Index-Only Scans

Top comments (0)

Read next

Research AI Assistant Application - RAG

AI Bot: Intelligent Financial Query Assistant Powered by PostgreSQL and Ollama

🚀 pgai Vectorizer: Automate AI Embeddings With One SQL Command in PostgreSQL

Deploying PostgreSQL, MySQL, and MongoDB on Kubernetes: A Step-by-Step Guide