Indexes are essential for optimizing query performance in relational databases, but not all indexes are created equal. Some may remain unused, consuming valuable disk space and potentially slowing down write operations. This article explores a Bash script designed to help PostgreSQL administrators identify and report unused indexes across multiple databases in a PostgreSQL instance.
The Importance of Monitoring Index Usage
Unused indexes can be a silent drain on your database's performance. They take up space, add overhead during data modification operations, and can complicate maintenance tasks. Regularly auditing your indexes can lead to better performance and more efficient use of resources.
Overview of the Script
The script, check_unused_indexes_for_PostgreSQL_db_instance.sh, connects to a PostgreSQL instance, iterates through its databases, and identifies indexes that have never been used. It produces a comprehensive report detailing these unused indexes, including their size.
Key Features:
- Connects to any PostgreSQL instance using specified host and port.
- Iterates through all user-defined databases, excluding system databases.
- Runs a SQL query to check for unused indexes.
- Generates a detailed output report, both in the terminal and as a trace file.
Getting Started
Prerequisites
Before running the script, ensure you have:
- Access to a PostgreSQL instance with appropriate permissions.
- psql command-line tool installed.
Usage
The script can be executed with the following command:
./check_unused_indexes_for_PostgreSQL_db_instance.sh -h <db_host> -p <db_port>
For example:
./check_unused_indexes_for_PostgreSQL_db_instance.sh -h localhost -p 5432 > ./unused_indexes_$(date +'%Y%m%d_%H%M%S').trc
Script
#!/bin/bash
##################################################################################################################################
#
# Name: check_unused_indexes_for_PostgreSQL_db_instance.sh
#
# Description: This script iterates all the DBs on the given PostgreSQL DB instance
# and gets the detailed list of the unused DB indexes.
#
# Author: Dmitry
#
# Date: 02-Nov-2020
#
# Usage Example:
#
# ./check_unused_indexes_for_PostgreSQL_db_instance.sh -h <db_host> -p <db_port>
#
####################################################################################################################################
current_date_time="$(date +'%Y%m%d_%H%M%S')"
helpFunction()
{
echo ""
echo "Usage: $0 -h hostname -p port"
echo -e "\t-h Postgres hostname"
echo -e "\t-p Postgers port"
echo -e " "
echo -e "Example how to run: ./$0 -h localhost -p 5432 > ./unused_indexes_$current_date_time.trc"
echo -e " "
exit 1 # Exit script after printing help
}
while getopts "h:p:" opt
do
case "$opt" in
h ) inpHost="$OPTARG" ;;
p ) inpPort="$OPTARG" ;;
? ) helpFunction ;; # Print helpFunction in case parameter is non-existent
esac
done
# Print helpFunction in case parameters are empty
if [ -z "$inpHost" ] || [ -z "$inpPort" ]
then
echo "Some or all of the parameters are empty";
helpFunction
fi
report_name="Check unused indexes for PostgreSQL DB instance"
output_dir="./output_dir"
if [ ! -d $output_dir ]; then
echo "The directory $output_dir should exist"
exit 1
fi
output_trace_name="check_unused_indexes_for_pg_db_instance_${current_date_time}_port_${inpPort}.trc"
output_full_name="${output_dir}/${output_trace_name}"
touch ${output_full_name}
echo "------------------------------------------------------------------"
echo "Report: ${report_name} "
echo "------------------------------------------------------------------"
echo "Postgres DB Instance: port [${inpPort}]"
echo "Check Unused Indexes for pg DB instance"
echo "Timestamp: [${current_date_time}]"
echo "------------------------------------------------------------------"
echo " " > ${output_full_name}
echo "------------------------------------------------------------------" >> ${output_full_name}
echo "Report: ${report_name} " >> ${output_full_name}
echo "------------------------------------------------------------------" >> ${output_full_name}
echo "Postgres DB Instance: port [${inpPort}]" >> ${output_full_name}
echo "Check Unused Indexes for pg DB instance" >> ${output_full_name}
echo "Timestamp: [${current_date_time}]" >> ${output_full_name}
echo "------------------------------------------------------------------" >> ${output_full_name}
echo " "
echo "Input parameters:"
echo "---------------- "
echo "inpHost=$inpHost"
echo "inpPort=$inpPort"
echo "---------------- "
echo " " >> ${output_full_name}
echo "Input parameters:" >> ${output_full_name}
echo "---------------- " >> ${output_full_name}
echo "inpHost=$inpHost" >> ${output_full_name}
echo "inpPort=$inpPort" >> ${output_full_name}
echo "---------------- " >> ${output_full_name}
query_check_unused_DB_indexes="
SELECT s.schemaname,
s.relname AS tablename,
s.indexrelname AS indexname,
pg_relation_size(s.indexrelid) AS index_size_bytes,
round(pg_relation_size(s.indexrelid)/1024) AS index_size_Kbytes,
round(pg_relation_size(s.indexrelid)/1024/1024) AS index_size_Mbytes
FROM pg_catalog.pg_stat_user_indexes s
JOIN pg_catalog.pg_index i ON s.indexrelid = i.indexrelid
WHERE s.idx_scan = 0 -- has never been scanned
AND 0 <>ALL (i.indkey) -- no index column is an expression
AND NOT i.indisunique -- is not a UNIQUE index
AND NOT EXISTS -- does not enforce a constraint
(SELECT 1 FROM pg_catalog.pg_constraint c
WHERE c.conindid = s.indexrelid)
-- and s.relname='my_table' -- in case you want examine the unused indexes for the specific table
ORDER BY pg_relation_size(s.indexrelid) DESC;
"
query_DBs="
select M.*
from
(select datname
from pg_database
where datname not in ('postgres', 'template0', 'template1')
order by 1) M
"
query_num_of_DBs="
select count(1)
from ($query_DBs) M
"
num_of_dbs_to_process=$(psql -h $inpHost -p $inpPort -t -c "$query_num_of_DBs" postgres)
echo " "
echo "------------------------------------------------------------------"
echo " Number of databases: ${num_of_dbs_to_process}"
echo "------------------------------------------------------------------"
echo " " >> ${output_full_name}
echo "------------------------------------------------------------------" >> ${output_full_name}
echo " Number of databases: ${num_of_dbs_to_process}" >> ${output_full_name}
echo "------------------------------------------------------------------" >> ${output_full_name}
idx=$(($idx + 1))
for DBs_to_process in $(psql -h $inpHost -p $inpPort -t -c "$query_DBs" postgres); do
echo " "
echo "#${idx}:"
echo "${DBs_to_process}"
echo " " >> ${output_full_name}
echo "#${idx}:" >> ${output_full_name}
echo "${DBs_to_process}" >> ${output_full_name}
echo " "
echo " " >> ${output_full_name}
psql -h $inpHost -p $inpPort -U postgres -d $DBs_to_process -qtX << EOF >> ${output_full_name}
\x
$query_check_unused_DB_indexes
EOF
#### for terminal output
psql -h $inpHost -p $inpPort -U postgres -d $DBs_to_process -qtX << EOF
\x
$query_check_unused_DB_indexes
EOF
echo "................................................................."
echo "................................................................." >> ${output_full_name}
idx=$(($idx + 1))
done
echo " "
echo " " >> ${output_full_name}
echo " --- The End --- "
echo " --- The End --- " >> ${output_full_name}
echo " "
echo " " >> ${output_full_name}
echo "Trace File: ${output_full_name}"
echo " "
Script Breakdown
Let’s dive deeper into the key sections of the script.
Parameter Handling
The script uses getopts to handle command-line arguments for the database host and port. It provides a help function to guide users on proper usage:
helpFunction()
{
echo "Usage: $0 -h hostname -p port"
exit 1
}
Database Querying
The script constructs a SQL query to find unused indexes:
SELECT s.schemaname,
s.relname AS tablename,
s.indexrelname AS indexname,
pg_relation_size(s.indexrelid) AS index_size_bytes
FROM pg_catalog.pg_stat_user_indexes s
JOIN pg_catalog.pg_index i ON s.indexrelid = i.indexrelid
WHERE s.idx_scan = 0
AND NOT i.indisunique
AND NOT EXISTS
(SELECT 1 FROM pg_catalog.pg_constraint c
WHERE c.conindid = s.indexrelid)
ORDER BY pg_relation_size(s.indexrelid) DESC;
This query retrieves details about indexes that have never been scanned, are not unique, and do not enforce constraints, making them candidates for removal.
Generating Reports
The script produces both console output and a trace file that records the findings:
touch ${output_full_name}
echo "------------------------------------------------------------------" > ${output_full_name}
echo "Report: ${report_name}" >> ${output_full_name}
Running the Script
Once executed, the script outputs a summary of the databases processed and lists any unused indexes it finds. The final report helps database administrators make informed decisions about which indexes can be safely dropped.
Conclusion
Regularly auditing indexes in your PostgreSQL databases is crucial for maintaining optimal performance. The check_unused_indexes_for_PostgreSQL_db_instance.sh script provides an automated way to identify unused indexes, allowing you to streamline your database environment effectively.
For further enhancement, consider scheduling this script to run periodically or integrating it into your database maintenance routine. By keeping your indexes in check, you’ll ensure your PostgreSQL databases run smoothly and efficiently.
Whether you're managing a single database or multiple instances, keeping track of your indexes will lead to better performance and resource management.
Top comments (0)