DEV Community

Nile Lazarus
Nile Lazarus

Posted on • Edited on

Demystifying the Internals of PostgreSQL - Chapter 2

Welcome back to my blog series on the internal workings of PostgreSQL. Following up from my previous blog where we covered Chapter 1 of The Internals of PostgreSQL by Hironobu SUZUKI, in this blog I will be covering Chapter 2.
Chapter 2 explains the 'Process and Memory Architecture' which, as the name suggests, details the different server, backend, and background processes of PostgreSQL as well as the local and shared memory architecture.
Understanding these concepts can help developers and database administrators to optimise performance and finetune backend database processes.

Process Architecture

Fig.2.1
PostgreSQL uses a client-server type multi process architecture which runs on a single host. The PostgreSQL server is a collection of multiple processes all working together to manage one database cluster. It includes the following types of processes:

  1. Postgres server process: parent process for all database cluster management processes
  2. Backend process: responsible for handling all queries and statements issues by the client
  3. Background processes: performs processes of each feature for database management such as VACUUM and CHECKPOINT processes
  4. Replication associated processes: they streamline the application. They are discussed in further detail in Chapter 11
  5. Background worker processes: can perform any processing issued by the user This was the basic gist of each type of process. Now let's go into a bit more detail about the first three which are the main processes in PostgreSQL.

Postgres Server Process

As mentioned above, it is the parent of all processes in the PostgreSQL server and was also referred to as 'postmaster' in earlier versions of PostgreSQL.
The work flow of this process begins when the pg_ctl utility is executed with start option. It then allocates a shared memory area and starts background, replication associated, and background worker processes*. It then waits for connection requests, which when received, prompt it to also start backend processes. Once started, the background process handles all queries from the client.
The port on which the server process listens is 5432 by default but can be changed if required. It is best to assign different port numbers if you are running more than one PostgreSQL server at the same time.

Backend Processes

The backend process, also referred to as postgres, is called by the postgres server process as explained above and it is responsible for handling all queries issued by the connected client through a single TCP connection.
A backend process is allowed to operate on only one database, which must be explicitly specified during connection.
PostgreSQL allows multiple clients to connect simultaneously. The maximum number of clients which can do so can be set through the max_connections configuration parameter (default is 100). To improve performance when many clients frequently connect and disconnect, a connection pooling middleware like pgbouncer or pgpool-II is often used.

Background Processes

This chapter provides an introduction to background processes listed below. They are covered in further detail in later chapters.

  1. Background Writer: this process writes dirty pages from the shared buffer pool to persistent storage (e.g., HDD, SSD) gradually. In earlier versions (9.1 or earlier), it also performed checkpoint processes
  2. Checkpointer: starting from version 9.2, this process is responsible for performing checkpoint processes
  3. Autovacuum Launcher: this process periodically invokes autovacuum worker processes for vacuum processes. It requests the creation of autovacuum workers from the postgres server
  4. WAL Writer: this process writes and flushes WAL (Write-Ahead Log) data from the WAL buffer to persistent storage on a regular basis
  5. Statistics Collector: this process collects statistics information, such as pg_stat_activity and pg_stat_database, to provide insights into database activity
  6. Logging Collector (Logger): this process writes error messages into log files, providing a means to monitor and troubleshoot issues
  7. Archiver: this process performs archiving logging, which involves copying WAL files to an archive storage for backup or other purposes

Memory Architecture

Fig.2.2
PostgreSQL's memory architecture is divided into two categories: local memory area and shared memory area.

Local Memory Area

Each backend process is allocated a local memory area for query processing. This memory area is then divided into sub-areas with fixed or variable sizes. Listed below are the major sub-areas:

  1. work_mem: the Executor uses this area for sorting tuples during operations such as ORDER BY and DISTINCT. It is also utilised for joining tables through merge-join and hash-join operations
  2. maintenance_work_mem: this area is used for specific maintenance operations like VACUUM and REINDEX
  3. temp_buffers: the Executor utilises this area for storing temporary tables

Shared Memory Area

The PostgreSQL server allocates a shared memory area when its starts up. The shared memory area is divided into multiple fixed size sub-areas. Listed below are the main sub-areas:

  1. shared buffer pool: PostgreSQL loads pages from tables and indexes into this area from persistent storage and directly operates on them
  2. WAL buffer: PostgreSQL utilises the Write-Ahead Log (WAL) mechanism to ensure data integrity in case of server failures. The WAL buffer serves as a buffering area for WAL data before writing it to persistent storage.
  3. commit log: the Commit Log (CLOG) maintains the transaction states (e.g., in_progress, committed, aborted) for concurrency control mechanisms

In addition to these major sub-areas, PostgreSQL also allocates sub-areas for various access control mechanisms (e.g., semaphores, locks), background processes (e.g., checkpointer, autovacuum), and transaction processing (e.g., save-points, two-phase commit). Other sub-areas may exist for specific purposes.

Top comments (0)