Hi, I just wanted to share a short summary of what I learned about PostgreSQl Database from Chapter 1 of The Internals of PostgreSQL.
1.1 Logical Structure of Database Clusters
A database cluster is a bunch of databases managed by a PostgreSQL server running on a single host. So, it's not a group of servers but just one server managing multiple databases. In PostgreSQL, databases contain many objects, such as tables, indexes, sequences, views, and functions, which store or reference data. Each database is separate, and its objects belong to it. For instance, tables, indexes, etc. belong to their respective databases.
1.2 Physical Structure of Database Clusters
The physical structure of a PostgreSQL database cluster is essentially a directory with a base directory as its root. The base directory is created when the initdb utility initializes a new database cluster, and it typically resides at a path specified by the environment variable PGDATA. Each database in the cluster is a subdirectory under the base directory, and each table or index is stored as at least one file under the subdirectory of the database to which it belongs. Additionally, there are subdirectories for particular data and configuration files. While PostgreSQL does support tablespaces, the meaning of the term is different from other RDBMS; in PostgreSQL, a tablespace is simply a directory that contains data outside of the base directory.
1.3 Internal Layout of a Heap Table File
In PostgreSQL, data files are divided into pages of fixed length. A page in a heap table file consists of heap tuples, line pointers, and header data. Heap tuples contain the actual record data and are stacked in order from the bottom of the page. Line pointers form an array that holds pointers to the heap tuples, and each index is numbered sequentially from 1. Header data, defined by the PageHeaderData structure, contains general information about the page, including the LSN of the last XLOG record written to the page and the checksum value of the page. To identify a tuple within the table, a tuple identifier (TID) is used, which comprises a pair of values: the block number of the page that contains the tuple and the offset number of the line pointer that points to the tuple.
1.4 The Methods of Writing and Reading Tuples
This section explains the methods of writing and reading heap tuples in PostgreSQL. When a new tuple is added to a table, a new line pointer is pushed onto the array to point to the new tuple, and the header data within the page is updated accordingly. To read heap tuples, two typical access methods are sequential scan and B-tree index scan. Sequential scan involves reading all tuples in all pages sequentially by scanning all line pointers in each page. In contrast, B-tree index scan involves using an index file to locate the desired heap tuple based on the TID value obtained from the index tuple.
Thats all for this time and I hope to share what I learn further in the future with you guys.
Top comments (0)