1.1 Logical Structure Of Database Cluster.
In PostgreSQL, the term database cluster refers to a collection of databases, not a group of servers. A PostgreSQL server runs on a single host and manages a single database cluster. A database in a cluster is a group of database objects. According to relational database theory, a data structure used to store or reference data is known as a database object. Examples include (heap) tables, indexes. Sequences, views, functions and more. PostgreSQL considers database themselves to be database objects that are logically separated from each other. While all other objects like indexes and tables belong to their own respective databases. These database objects are internally managed by their respective 4-byte integers called object identifiers (ODIs).
Figure 1. Logical structure of a database cluster
1.2 Physical Structure Of a Database Cluster.
A database cluster is a single directory which is known as base directory. Which contains some subdirectories and lots of files. On executing initdb command a new database cluster is initialized and a base directory will be created under the specified directory. You can set the path of the base directory in the environment variables though it's not mandatory. A database is a subdirectory located under the base subdirectory, and every table and index is stored under the subdirectory of it's respective database. There are various subdirectories that contain specific data and configuration files. Although PostgreSQL supports tablespaces, the definition of the term differs from other RDBMS. In PostgreSQL, a tablespace refers to a directory that holds some data outside of the base directory.
1.2.1 Tablespaces.
In PostgreSQL a tablespace is an additional area outside the base directory. This has been implemented in version 8.0.
Figure 2. A tablespace in database cluster
1.3 Internal Layout Of Heap Table Files
In PostgreSQL data files (heap table and index) are divided into pages of fixed length. And the default page length is 8192 bytes (8KB). Pages are numbered sequentially from 0 called block numbers, and if pages are filled up by data PostgreSQL adds new empty pages to the end of the file.
The internal layout of the pages depends on the data file types. Heap tuple is a record data itself and are stacked in order. The line pointer is 4 byte long which holds a pointer to each heap tuple. Also called item pointer.Header data is allocated in the beginning of the page. It is 24 byte long which contain general information about the page.
1.4 The Method Of Writing And Reading Tuples
After the insertion of a second tuple, it is positioned immediately after the first tuple and the second line pointer is added to the first one and directed to the second tuple. The pointers pointing to the second line pointer and second heap tuple are updated accordingly. And for reading we have two methods. Sequentially scan all the tuples in all pages by scanning line pointer in each page. A B-Tree index scan is a method that involves utilizing an index, instead of a sequential scan, to locate the desired tuple. This search is performed based on the TID values.
Hopefully this post has given you a solid foundation for further exploring the inner workings of PostgreSQL, and a depper appreciation, for the complexity and power of this powerful database management system.
Top comments (0)