Introduction
PostgreSQL, often referred to as Postgres, is a powerful and widely-used open-source relational database system with over 35 years of active development.
In this post, we will discuss the fundamentals of the architecture of postgres, exploring its key components and their interactions.
Architecture
PostgreSQL is a client/server type relational database management system with a multi-process architecture and runs on a single host.
Don't worry if you do not understand what the statement above means. We will take it in bits and discuss it individually.
The Client-Server Model
Postgres uses a client/server model just like in the case of modern web development. In this case, the clients connect to the server which is responsible for running the database. The client is simply the application that provides a query for the database server to run and as such could be anything ranging from a web server, a desktop application, or even other instances of Postgres databases. The client and server communicate via a TCP/IP network connection. TCP/IP stands for Transmission Control Protocol/Internet Protocol and is a suite of communication protocols used to by network devices to communicate and share data on the Internet.
Multi-Process Architecture
When we say Postgres uses a multi-process architecture, we are simply saying that for Postgres to handle multiple connections from clients to the database, the database server creates(forks) a new process for each of those connections. That process would then be responsible for serving that client only. Each process is responsible for managing its own memory space and resources, ensuring isolation and security.
There are different types of Postgres process types and they include
- Postmaster (Daemon) Process
- Backend Process
- Client Process
- Checkpointer Process
- Writer Processes
- Background Process
Postmaster (Daemon) Process: this is the first process that starts when Postgres is started. It is the job of the postmaster process to perform recovery, initialize shared memory, manage incoming client connections, and fork new backend processes to handle those connections.
Backend Process: this process is responsible for handling user queries and transmitting the result to the user.
Client Process: this is the process that is responsible for handling user connections. It is often forked by the postmaster process and dedicated to a particular client whenever a connection is made.
Writer Process: this process is responsible for asynchronously flushing modified data pages from memory to disk which improves performance.
Checkpointer Processes: this process is responsible for creating checkpoints after which dirty buffers are flushed to the disk.
Background Process: these are a collection of processing that are responsible for running tasks such as automatic vacuuming to reclaim disk space, analyzing statistics for query optimization, and managing replication.
Memory Architecture
Postgres memory can be classified into two broad categories:
- Local Memory: this is memory allocated by each backend process for its use. This memory is allocated by the backend process to enable it to process queries and can only be accessed by the backend process that owns it. The local memory can be divided into sub-areas;
Work Mem: this is used for sorting tuples by ORDER BY and DISTINCT operations, and for joining tables.
Maintenance work mem: this is used for some kinds of maintenance operations in Postgres.
Temp buffers: this is used for storing temporary files and tables.
- Shared Memory: this category of memory is allocated by the Postgres server when it starts up and is used by all the running Postgres processes. This allows processes to communicate with each other efficiently and share data without having to constantly read and write to disk. This area is further divided into several fixed-sized sub-areas.
Shared Buffer: this acts as a cache for frequently accessed data pages. They hold copies of data pages from disk in memory, reducing disk I/O and improving query performance. Written or modified data are called dirty pages.
WAL Buffer: WAL which stands for Write-Ahead Logging is a technique used by Postgres to guarantee the durability of database modifications. It ensures that changes are first written to a transaction log, known as the WAL, before being applied to the actual data files on disk. The WAL buffer is responsible for storing these changes until they are written to disk. This mechanism enables the recovery of data in the event of a system crash or failure.
Commit Log: the commit log is a component of the transaction log where information about the state of all transactions is recorded. There are four transaction states used by Postgres
IN_PROGRESS
,COMMITTED
,ABORTED
, andSUB-COMMITTED
.
Query Processing
When a query is executed in Postgres, it goes through several stages to be processed and produce the desired results:
Parsing: in this step, the parser analyzes the query analyzes the syntax of the query, and creates what is called a parse tree which is an internal representation of the query from the plain SQL text.
Query Rewriting: in this step, the parse tree is transformed and optimized by the PostgreSQL query optimizer.
Planning: here, the query planner generates an execution plan, which specifies the most effective and efficient sequence of operations needed to retrieve or modify the data
Execution: in this step, the execution engine carries out the execution plan by accessing the tables and indexes in the order that was created by the plan tree.
In this post, we reviewed the fundamentals of the architecture of Postgres. While this post is not exhaustive, it provides a basis from which you can dive deeper.
Top comments (0)