In the world of database management, ensuring the health and performance of your databases is crucial. One effective way to achieve this is by automating your database health checks and generating reports. In this article, we’ll explore a Python script that performs these tasks using psycopg2, yaml, and jinja2 libraries. By the end, you'll have a clear understanding of how to automate database checks and produce HTML reports for easy visualization.
Overview of the Script
Our script performs the following tasks:
- Load Configuration: Reads the database check configuration from a YAML file.
- Connect to the Database: Establishes a connection to the PostgreSQL database.
- Execute Queries: Runs queries specified in the configuration file and collects results.
- Generate HTML Report: Creates an HTML report of the check results.
Let’s break down each part of the script.
1. Loading Configuration
The configuration for our database checks is stored in a YAML file. We use the pyyaml library to load this configuration.
import yaml
def load_config(config_file):
with open(config_file, 'r') as file:
return yaml.safe_load(file)
This function reads the YAML file and loads it into a Python dictionary, which allows us to access our database checks configuration.
2. Connecting to the Database
We use psycopg2, a popular PostgreSQL adapter for Python, to connect to the database.
import psycopg2
def connect_db(hostname, port, dbname, username, password):
try:
conn = psycopg2.connect(
host=hostname,
port=port,
dbname=dbname,
user=username,
password=password
)
return conn
except Exception as e:
print(f"Error connecting to database: {e}")
raise
This function attempts to connect to the database with the provided credentials and handles any connection errors that may arise.
3. Executing Queries
Once connected, we execute queries defined in the configuration file and gather the results.
def execute_queries(conn, queries):
results = []
try:
with conn.cursor() as cursor:
for check in queries:
description = list(check.keys())[0]
query = check[description]['query']
cursor.execute(query)
result = cursor.fetchall()
columns = [desc[0] for desc in cursor.description] # Get column names
results.append({
'description': check[description]['description'],
'query': query,
'columns': columns,
'result': result
})
except Exception as e:
print(f"Error executing queries: {e}")
raise
return results
In this function, we:
- Iterate over each query in the configuration.
- Execute the query and fetch the results.
- Extract column names and store everything in a results list.
4. Generating HTML Report
The final step is to generate an HTML report using the jinja2 templating engine.
from jinja2 import Template
def generate_html_report(results, output_file):
template = Template("""
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Database Check Results</title>
<style>
table {
width: 100%;
border-collapse: collapse;
}
table, th, td {
border: 1px solid black;
}
th, td {
padding: 8px;
text-align: left;
}
th {
background-color: #f2f2f2;
}
pre {
white-space: pre-wrap; /* Allows wrapping of long queries */
}
</style>
</head>
<body>
<h1>Database Check Results</h1>
{% for check in results %}
<h2>{{ check.description }}</h2>
<pre>Query: {{ check.query }}</pre>
<h3>Result:</h3>
<table>
<thead>
<tr>
{% for column in check.columns %}
<th>{{ column }}</th>
{% endfor %}
</tr>
</thead>
<tbody>
{% for row in check.result %}
<tr>
{% for value in row %}
<td>{{ value }}</td>
{% endfor %}
</tr>
{% endfor %}
</tbody>
</table>
{% endfor %}
</body>
</html>
""")
html_content = template.render(results=results)
with open(output_file, 'w') as file:
file.write(html_content)
This function uses a Jinja2 template to create an HTML file. It includes:
- A header section with the title and styles.
- A loop that iterates over each database check result and formats it into an HTML table.
Main Function
Finally, we tie everything together in the main function.
def main():
# Database parameters
db_hostname = 'localhost'
db_port = 5439
db_name = 'my_database'
db_username = 'my_user'
db_pwd = 'my_password'
# File paths
config_file = 'dbcheck_config.yaml'
output_file = 'result.html'
# Load configuration
config = load_config(config_file)
dbchecks = config['dbchecks']
# Connect to the database
conn = connect_db(db_hostname, db_port, db_name, db_username, db_pwd)
try:
# Execute queries and get results
results = execute_queries(conn, dbchecks)
# Generate HTML report
generate_html_report(results, output_file)
finally:
# Close the database connection
conn.close()
if __name__ == "__main__":
main()
In the main function, we:
- Define database connection parameters and file paths.
- Load the configuration and connect to the database.
- Execute the queries and generate the HTML report.
- Ensure the database connection is closed properly.
dbcheck_config.yaml
dbchecks:
- dbcheck_1:
description: "Check the PostgreSQL DB version"
query:
"select version()"
- dbcheck_2:
description: "Check DB connections"
query:
"select
A.total_connections,
A.active_connections,
B.max_connections,
round((100 * A.total_connections::numeric / B.max_connections::numeric), 2) connections_utilization_pctg
from
(select count(1) as total_connections, sum(case when state='active' then 1 else 0 end) as active_connections from pg_stat_activity) A,
(select setting as max_connections from pg_settings where name='max_connections') B"
- dbcheck_3:
description: "Distribution of active connections per DB"
query:
"select
datname as db_name,
count(1) as num_of_active_connections
from pg_stat_activity
where state='active'
group by 1
order by 2 desc"
- dbcheck_4:
description: "Distribution of active connections per database and per query"
query:
"select
datname as db_name,
substr(query, 1, 200) short_query,
count(1) as num_active_connections
from pg_stat_activity
where state='active'
group by 1, 2
order by 3 desc"
- dbcheck_5:
description: "Active sessions detailed running more than 5 seconds"
query:
"select
now()-query_start as runtime,
pid as process_id,
datname as db_name,
client_addr,
client_hostname,
substr(query, 1, 200) the_query
from pg_stat_activity
where state='active'
and now() - query_start > '5 seconds'::interval
order by 1 desc"
- dbcheck_6:
description: "Running frequent PostgreSQL queries"
query:
"WITH a AS (
SELECT dbid, queryid, query, calls AS s
FROM pg_stat_statements
),
b AS (
SELECT dbid, queryid, query, calls AS s
FROM pg_stat_statements, pg_sleep(1)
)
SELECT
pd.datname AS db_name,
substr(a.query, 1, 400) AS the_query,
SUM(b.s - a.s) AS runs_per_second
FROM a
JOIN b ON a.dbid = b.dbid AND a.queryid = b.queryid
JOIN pg_database pd ON pd.oid = a.dbid
GROUP BY 1, 2
HAVING SUM(b.s - a.s) > 10
ORDER BY runs_per_second DESC"
- dbcheck_7:
description: "PostgreSQL Database CPU distribution per database, and per query"
query:
"SELECT
pss.userid,
pss.dbid,
pd.datname as db_name,
round((pss.total_exec_time + pss.total_plan_time)::numeric, 2) as total_time,
pss.calls,
round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 2) as mean,
round((100 * (pss.total_exec_time + pss.total_plan_time) / sum((pss.total_exec_time + pss.total_plan_time)::numeric) OVER ())::numeric, 2) as cpu_portion_pctg,
pss.query
FROM pg_stat_statements pss, pg_database pd
WHERE pd.oid=pss.dbid
ORDER BY (pss.total_exec_time + pss.total_plan_time)
DESC LIMIT 30"
- dbcheck_8:
description: "List Databases and their sizes"
query:
"SELECT
pg_database.datname AS database_name,
pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM
pg_database
ORDER BY
pg_database_size(pg_database.datname) DESC"
This Python script automates database health checks and generates a well-formatted HTML report.
db_check.py
import psycopg2
import yaml
from jinja2 import Template
# Define your function to load configuration
def load_config(config_file):
with open(config_file, 'r') as file:
return yaml.safe_load(file)
# Define your function to connect to the database
def connect_db(hostname, port, dbname, username, password):
try:
conn = psycopg2.connect(
host=hostname,
port=port,
dbname=dbname,
user=username,
password=password
)
return conn
except Exception as e:
print(f"Error connecting to database: {e}")
raise
# Define your function to execute queries and get results
def execute_queries(conn, queries):
results = []
try:
with conn.cursor() as cursor:
for check in queries:
description = list(check.keys())[0]
query = check[description]['query']
cursor.execute(query)
result = cursor.fetchall()
columns = [desc[0] for desc in cursor.description] # Get column names
results.append({
'description': check[description]['description'],
'query': query,
'columns': columns,
'result': result
})
except Exception as e:
print(f"Error executing queries: {e}")
raise
return results
# Define your function to generate HTML report
def generate_html_report(results, output_file):
template = Template("""
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Database Check Results</title>
<style>
table {
width: 100%;
border-collapse: collapse;
}
table, th, td {
border: 1px solid black;
}
th, td {
padding: 8px;
text-align: left;
}
th {
background-color: #f2f2f2;
}
pre {
white-space: pre-wrap; /* Allows wrapping of long queries */
}
</style>
</head>
<body>
<h1>Database Check Results</h1>
{% for check in results %}
<h2>{{ check.description }}</h2>
<pre>Query: {{ check.query }}</pre>
<h3>Result:</h3>
<table>
<thead>
<tr>
{% for column in check.columns %}
<th>{{ column }}</th>
{% endfor %}
</tr>
</thead>
<tbody>
{% for row in check.result %}
<tr>
{% for value in row %}
<td>{{ value }}</td>
{% endfor %}
</tr>
{% endfor %}
</tbody>
</table>
{% endfor %}
</body>
</html>
""")
html_content = template.render(results=results)
with open(output_file, 'w') as file:
file.write(html_content)
def main():
# Database parameters
db_hostname = 'localhost'
db_port = 5439
db_name = 'my_database'
db_username = 'my_user'
db_pwd = 'my_password'
# File paths
config_file = 'dbcheck_config.yaml'
output_file = 'result.html'
# Load configuration
config = load_config(config_file)
dbchecks = config['dbchecks']
# Connect to the database
conn = connect_db(db_hostname, db_port, db_name, db_username, db_pwd)
try:
# Execute queries and get results
results = execute_queries(conn, dbchecks)
# Generate HTML report
generate_html_report(results, output_file)
finally:
# Close the database connection
conn.close()
if __name__ == "__main__":
main()
Conclusion
By leveraging libraries psycopg2 for database interaction, yaml configuration management, and jinja2 for templating, you can efficiently monitor and document your database performance. This approach can be customized and extended to fit various database environments and reporting requirements.
Top comments (0)