Often we may have heard of term Pages in database, Page is a fixed-length block of data, each page is of 8192 bytes, which is 8KB, page is also called as Block in postgres.
Whenever postgres reads or writes data to disk, it does so in pages.
postgres=# SELECT current_setting('block_size');
current_setting
-----------------
8192
(1 row)
PageLayout
Page is composed of attributes which holds following information:
- PageHeader — 24Bytes — hold information on offset of start freespace and end freespace. Within this Freespace, everytime new record is inserted into database, it allows us to append new entry.
- ItemIdData – pointer to actual data in block.
- Item – actual data residing in block or page.
Now, let's look at how database organises data resident in any table in form of pages on disk.
For ease of running postgres locally, I have started instance of postgres in my local using docker. All the data is organised in /postgresql directory.
Before we start creating our own tables, lets understand the structure of data organized in filesystem.
Since I am running postgres locally, data is resident in a container at /postgresql in data directory.
/postgresql$ ls -l
total 8
drwxrwxr-x 3 root root 4096 Jun 4 18:32 conf
drwx------ 19 1001 root 4096 Jul 24 17:29 data
data/base/ directory above, which has a subdirectory for each individual database in your cluster.
/postgresql/data/base$ ls -l
total 56
drwx------ 2 1001 root 4096 Jul 13 06:15 1
drwx------ 2 1001 root 12288 Jul 13 06:15 16384
drwx------ 2 1001 root 12288 Aug 7 08:10 24576
drwx------ 2 1001 root 12288 Aug 7 08:09 24604
drwx------ 2 1001 root 4096 Jun 4 18:32 4
drwx------ 2 1001 root 12288 Jul 24 17:53 5
Now, lets create new employee database, notice oid=24604 for new employee database.
postgres=# select oid, datname from pg_database;
oid | datname
------+-----------
5 | postgres
1 | template1
4 | template0
24604 | employee
Moving on, lets create new "Users" table with few attributes along with few indexes, and we shall inserting few records into it.
CREATE TABLE public.users (
id integer NOT NULL,
city character varying(255),
age integer,
company character varying(255),
gender text,
created_at timestamp without time zone,
first_name character varying(255),
last_name character varying(255)
);
CREATE INDEX user_id_idx_fname_lname ON public.users
USING btree (id) INCLUDE (first_name, last_name);
Once this table has been created, I will insert few records into them.
postgres=# select count(1) from users ;
count
-------
700
(1 row)
Since, I have created table in default database, we will see segment files created after insertion.
Clearly notice that, default page size if 8192 bytes i.e. 8KB, all the segments are in multiples of 8, which means if table is 32KB, it has 4 pages, this is done to achieve better write performance.
/postgresql/data/base/5$ ls -lh
total 7.7M
-rw------- 1 1001 root 8.0K Jun 4 18:32 112
-rw------- 1 1001 root 8.0K Jun 4 18:32 113
-rw------- 1 1001 root 120K Aug 7 17:45 1247
-rw------- 1 1001 root 24K Jun 4 18:32 1247_fsm
-rw------- 1 1001 root 8.0K Jul 15 05:32 1247_vm
-rw------- 1 1001 root 456K Aug 7 17:45 1249
-rw------- 1 1001 root 24K Jul 24 17:34 1249_fsm
-rw------- 1 1001 root 8.0K Jul 15 05:32 1249_vm
-rw------- 1 1001 root 776K Aug 7 17:45 1255
-rw------- 1 1001 root 24K Aug 7 08:21 1255_fsm
-rw------- 1 1001 root 8.0K Aug 7 08:21 1255_vm
-rw------- 1 1001 root 112K Jul 24 17:54 1259
-rw------- 1 1001 root 24K Jun 4 18:32 1259_fsm
-rw------- 1 1001 root 8.0K Jul 15 05:33 1259_vm
-rw------- 1 1001 root 64K Jun 4 18:32 13393
-rw------- 1 1001 root 24K Jun 4 18:32 13393_fsm
-rw------- 1 1001 root 8.0K Jun 4 18:32 13393_vm
In PostgreSQL, the row is stored as tuple in the Table in a column called ctid and looks like this:
What is ctid?
ctid is identifier of the tuple within its table. This is a pair (page number, tuple index) within page that identifies the physical location of the tuple.
(page_number, tuple_number)
page_number: number of the page in the shared_buffers that contains the record
tuple_number: offset number inside the in-memory page where the current version of this record resides
postgres=# select ctid, * from users limit 50;
ctid | id | city | age | company | gender | created_at | first_name | last_name
--------+-----+------------------------------------+-----+--------------------------------------+--------+---------------------+------------+--------------
(0,1) | 10 | Rivière-du-Loup | 28 | A Institute | male | 2019-04-09 06:11:03 | Margie | Beahan
(0,2) | 11 | Winterswijk | 35 | Eu Elit Nulla LLC | male | 2017-10-08 23:45:43 | Ferne | Hodkiewicz
(0,3) | 12 | Manoppello | 51 | Duis Mi Consulting | male | 2016-12-26 15:54:27 | Judy | Crona
(0,4) | 13 | Dietzenbach | 78 | Aliquam Gravida Consulting | male | 2017-08-29 20:46:38 | Dan | Krajcik
(0,5) | 14 | Parauapebas | 79 | Mauris Aliquam Inc. | male | 2017-12-31 02:09:34 | Chris | Mosciski
(0,6) | 16 | Dieppe | 52 | A Arcu Sed PC | male | 2018-06-06 23:27:14 | Summer | Schroeder
(0,7) | 17 | Waarmaarde | 34 | Dis Parturient Ltd | male | 2018-11-12 14:45:21 | Lupe | Pouros
(0,8) | 18 | San Pancrazio Salentino | 75 | Eget Odio Institute | male | 2017-03-30 11:01:13 | Demetrius | Rice
(0,9) | 19 | Mission | 25 | Ut Quam Vel Corporation | male | 2019-09-23 13:24:05 | Tyler | Purdy
(0,10) | 21 | Bahraich | 20 | Mi Consulting | male | 2016-10-30 14:08:42 | Otho | Hammes
(1,1) | 111 | Castiglione del Lago | 65 | Hendrerit Consectetuer Cursus Corp. | male | 2018-12-29 09:51:16 | Dustin | Yost
(1,2) | 112 | Alto Biobío | 36 | Sit Amet Consectetuer Inc. | male | 2018-01-17 18:23:59 | Micaela | Nicolas
(1,3) | 113 | Nieuwerkerken | 82 | Diam At PC | male | 2018-01-13 00:06:16 | Remington | Casper
(1,4) | 115 | Calera | 69 | Diam Nunc Foundation | male | 2017-09-30 06:13:46 | Greg | Gulgowski
(1,5) | 116 | Inverbervie | 25 | Venenatis Lacus Etiam Foundation | male | 2019-12-25 16:16:14 | Janae | Rau
(1,6) | 117 | Şanlıurfa | 26 | Ultrices Mauris Ipsum Ltd | male | 2019-04-23 04:10:37 | Kaitlin | Shanahan
(1,7) | 118 | Sh�diac | 45 | Malesuada Corporation | male | 2017-12-14 19:08:51 | Quinn | Trantow
(1,8) | 120 | Ruda | 49 | Consectetuer Mauris Incorporated | male | 2019-05-03 16:01:27 | Sydni | Harber
(1,9) | 121 | Jerez de la Frontera | 29 | In Hendrerit Limited | male | 2017-12-02 20:35:12 | Felipa | Hegmann
(1,10) | 122 | Curacaví | 82 | At Velit Industries | male | 2018-02-10 15:04:20 | Pat | Dickens
(1,11) | 123 | Söderhamn | 20 | Mauris Consulting | male | 2017-04-24 23:08:23 | Elouise | Schiller
(1,12) | 124 | Cochrane | 34 | Nisi Aenean Inc. | male | 2017-01-21 16:35:51 | Shirley | Armstrong
Taking few examples: (0,1) indicates, page0 and tuple1; similarly (1,7) indicates tuple #7 in page1.
postgres=# select oid, relname, relfilenode from pg_class where relnamespace = to_regnamespace('public')::oid;
oid | relname | relfilenode
-------+-------------------------+-------------
32782 | users | 32782
32790 | user_id_idx | 32790
32792 | user_id_idx_fname_lname | 32792
32793 | user_id_fname_lname | 32793
Indexes as tables!
🤔
Lets see how its organised in filesystem; it is evident that indexes like user_id_idx_fname_lname(32792) and user_id_fname_lname(32793) are also saved like tables, but they store indexed fields.
/postgresql/data/base/5$ ls -lh 32*
-rw------- 1 1001 root 72K Jul 24 17:39 32782
-rw------- 1 1001 root 24K Jul 24 17:39 32782_fsm
-rw------- 1 1001 root 8.0K Jul 24 17:36 32786
-rw------- 1 1001 root 32K Jul 24 17:40 32790
-rw------- 1 1001 root 48K Jul 24 17:49 32792
-rw------- 1 1001 root 40K Jul 24 17:53 32793
We can query each of these individual tables by using their page#.
postgres=# select * from page_header(get_raw_page('users', 8));
lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
0/25F1468 | 0 | 0 | 304 | 1024 | 8192 | 8192 | 4 | 0
(1 row)
When we query page-8, we see the lower offset is at 304 and higher offset is at 1024. Page-8 has (1024-304)=702 bytes of free space i.e upper-lower.
Fill Factor
Fill factor determines maximum percentage of space to be used within each page is initially reserved for future updates. By default fill factor is 90 for pages, when PostgreSQL inserts tuples to a page, it leaves 10% room to accommodate future updates to existing rows. When insertion happens, if size of page exceeds fill factor, it will lead to page split.
Page Split
When new record is inserted into table, this key has to be added into index as well. Postgres will attempt to find appropriate location beneath B-Tree, and during this process if page is already full this will invoke page split to accommodate new key. During page split, it has to shift/move other tuples out of page and appropriately add new key to page.
Top comments (0)