DEV Community

Dmitry Romanoff
Dmitry Romanoff

Posted on

Finding Tables Without Primary Keys in PostgreSQL Databases

In database management, primary keys are crucial for ensuring data integrity and establishing relationships between tables. However, sometimes, tables may be created without primary keys, which can lead to data anomalies. In this article, weโ€™ll explore a Bash script that connects to a PostgreSQL instance, iterates through its databases, and identifies tables that lack primary keys.

Overview of the Script

The script is designed to:

  1. Connect to a PostgreSQL database instance.
  2. Retrieve a list of databases, excluding system databases.
  3. For each database, check for tables that do not have primary keys.
  4. Output the results.

Prerequisites

To run the script, ensure you have:

  • Bash installed.
  • Access to a PostgreSQL instance with sufficient permissions to query system catalogs.
  • The psql command-line tool available.

The Script

#!/bin/bash

###################################################################################
#
# iterate_all_the_DBs_on_the_DB_instance_and_find_the_tables_with_no_pks_access.sh
#
# This scripts iterates databases and 
# find the tables with no primary keys
#
# Date: 10-Jan-2024
#
# Author: Dmitry
#
###################################################################################

trim() {
    local var="$*"
    # remove leading whitespace characters
    var="${var#"${var%%[![:space:]]*}"}"
    # remove trailing whitespace characters
    var="${var%"${var##*[![:space:]]}"}"
    printf '%s' "$var"
}

export PGCONNECT_TIMEOUT=15

export DB_HOST="localhost"
export DB_PORT="5432"

the_query_number_of_dbs_to_process="
select count(M.*) 
from 
   (select datname 
      from pg_database 
      where datname not in ('postgres', 'template0', 'template1') 
      order by 1) M
"

the_query_dbs_to_process="
select M.* 
from 
   (select datname 
   from pg_database 
   where datname not in ('postgres', 'template0', 'template1')  
   order by 1) M
"

num_of_dbs_to_process=$(psql -h $DB_HOST  -p $DB_PORT -U postgres -t -c "$the_query_number_of_dbs_to_process" postgres)
num_of_dbs_to_process=`trim $num_of_dbs_to_process`

echo "The number of DBs to process: $num_of_dbs_to_process"

export the_yyyymmdd=$(date '+%Y%m%d')
export hh24miss=$(date '+%H%M%S')

idx=1

for DBs_to_process in $(psql -h $DB_HOST  -p $DB_PORT -U postgres -t -c "$the_query_dbs_to_process" postgres); do

nTablesWithoutPKs=$(psql -h $DB_HOST  -p $DB_PORT -U postgres -d $DBs_to_process -t << EOF
select count(1)
from information_schema.tables tab
left join information_schema.table_constraints tco 
          on tab.table_schema = tco.table_schema
          and tab.table_name = tco.table_name 
          and tco.constraint_type = 'PRIMARY KEY'
where tab.table_type = 'BASE TABLE'
      and tab.table_schema not in ('pg_catalog', 'information_schema')
      and tco.constraint_name is null;
EOF
)

nTablesWithoutPKs=`trim $nTablesWithoutPKs`

if [ "$nTablesWithoutPKs" -gt "0" ]; then

the_output=$(psql -h $DB_HOST  -p $DB_PORT -U postgres -d $DBs_to_process -t << EOF
select tab.table_schema,
       tab.table_name
from information_schema.tables tab
left join information_schema.table_constraints tco
          on tab.table_schema = tco.table_schema
          and tab.table_name = tco.table_name
          and tco.constraint_type = 'PRIMARY KEY'
where tab.table_type = 'BASE TABLE'
      and tab.table_schema not in ('pg_catalog', 'information_schema')
      and tco.constraint_name is null
order by table_schema,
         table_name;
EOF)

echo "DB: $DBs_to_process"
echo "$the_output"

fi

idx=$(( idx + 1 ))

done

echo " "
echo "End"
echo " "
Enter fullscreen mode Exit fullscreen mode

The Script Breakdown

Hereโ€™s a detailed breakdown of the script:

1. Setup and Configuration

export PGCONNECT_TIMEOUT=15
export DB_HOST="localhost"
export DB_PORT="5432"
Enter fullscreen mode Exit fullscreen mode

The script starts by setting connection parameters for the PostgreSQL instance, including the host and port.

2. SQL Queries for Database Retrieval

the_query_number_of_dbs_to_process="
select count(M.*) 
from 
   (select datname 
      from pg_database 
      where datname not in ('postgres', 'template0', 'template1') 
      order by 1) M
"

the_query_dbs_to_process="
select M.* 
from 
   (select datname 
   from pg_database 
   where datname not in ('postgres', 'template0', 'template1')  
   order by 1) M
"
Enter fullscreen mode Exit fullscreen mode

Two SQL queries are defined:

  • One to count the number of databases to process.
  • The other to list the names of those databases, excluding the default system databases.

3. Counting and Processing Databases

num_of_dbs_to_process=$(psql -h $DB_HOST  -p $DB_PORT -U postgres -t -c "$the_query_number_of_dbs_to_process" postgres)
num_of_dbs_to_process=$(trim "$num_of_dbs_to_process")

echo "The number of DBs to process: $num_of_dbs_to_process"
Enter fullscreen mode Exit fullscreen mode

The script executes the first query to get the number of databases and trims any whitespace from the output.

4. Iterating Through Each Database

for DBs_to_process in $(psql -h $DB_HOST  -p $DB_PORT -U postgres -t -c "$the_query_dbs_to_process" postgres); do
Enter fullscreen mode Exit fullscreen mode

The script iterates over each database and performs the following checks:

5. Checking for Tables Without Primary Keys

nTablesWithoutPKs=$(psql -h $DB_HOST  -p $DB_PORT -U postgres -d $DBs_to_process -t << EOF
select count(1)
from information_schema.tables tab
left join information_schema.table_constraints tco 
          on tab.table_schema = tco.table_schema
          and tab.table_name = tco.table_name 
          and tco.constraint_type = 'PRIMARY KEY'
where tab.table_type = 'BASE TABLE'
      and tab.table_schema not in ('pg_catalog', 'information_schema')
      and tco.constraint_name is null;
EOF
)
Enter fullscreen mode Exit fullscreen mode

This query checks for base tables in the database that lack a primary key. If such tables are found, the script collects their schema and names:

the_output=$(psql -h $DB_HOST  -p $DB_PORT -U postgres -d $DBs_to_process -t << EOF
select tab.table_schema,
       tab.table_name
from information_schema.tables tab
left join information_schema.table_constraints tco
          on tab.table_schema = tco.table_schema
          and tab.table_name = tco.table_name
          and tco.constraint_type = 'PRIMARY KEY'
where tab.table_type = 'BASE TABLE'
      and tab.table_schema not in ('pg_catalog', 'information_schema')
      and tco.constraint_name is null
order by table_schema,
         table_name;
EOF)
Enter fullscreen mode Exit fullscreen mode

6. Outputting the Results

if [ "$nTablesWithoutPKs" -gt "0" ]; then
    echo "DB: $DBs_to_process"
    echo "$the_output"
fi
Enter fullscreen mode Exit fullscreen mode

Conclusion

This script serves as a practical tool for database administrators and developers to quickly identify tables lacking primary keys in their PostgreSQL databases. Regularly auditing your databases for such issues can help maintain data integrity and ensure a robust database design.

Feel free to modify the script according to your specific needs, such as changing database credentials or extending functionality to include other types of constraints.

Top comments (0)