In this blog I will demonstrate how to parse YAML file with Python.
Assume we have a YAML file:
cat my_yaml_file.yaml
probes:
-
id: 1
description: Check the databases having more than 5 active connections.
issue: It were found databases with the high number of active connections.
recommendation: Check why Customers open so many active connections. It may be
wrong configuration or unusual application pattern.
sql_query_o: no
sql_query: select datname, count(1) num_of_active_connections, 'wvw' chk from
pg_stat_activity where datname!='' and state!='idle' group by datname
having count(1)>5 order by 2 desc
sql_query_extra: select datname, state, client_addr, client_hostname,
substr(query, 1, 2048) query from pg_stat_activity where state!='idle' and
datname in ( select datname from ( select datname, count(1)
num_of_active_sessions from pg_stat_activity where state!='idle' and
datname!='' group by 1 having count(1)>0 ) M ) order by 1, 5
-
id: 2
description: Check DB queries that take more than 30 seconds.
issue: Long-running queries.
recommendation: Check why the query/queries take so much time. It maybe it's
heavy non-optimized query. Maybe it's unusual application pattern.
sql_query_o: no
sql_query: select now()-query_start as runtime, pid as process_id, datname as
db_name, client_addr, client_hostname, substr(query, 1, 2048) query, 'wvw'
chk from pg_stat_activity where state!='idle' and datname!='' and now() -
query_start > '30 seconds'::interval order by 1 desc;
sql_query_extra: no
-
id: 3
description: Check in the pg_stat_statements DB queries that take more than 3000 ms
issue: Long-running queries.
recommendation: Check why the query/queries take so much time. It may be it is a
heavy non-optimized query. Maybe it's an unusual application pattern.
sql_query_o: SELECT pss.userid, pss.dbid, pd.datname as db_name,
round(pss.total_time::numeric, 2) as total_time, pss.calls,
round(pss.mean_time::numeric, 0) as mean, substr(pss.query, 1, 2048)
query, 'wvw' chk FROM pg_stat_statements pss, pg_database pd WHERE
pd.oid=pss.dbid and round(pss.mean_time::numeric, 0) > 3000 ORDER BY
round(pss.mean_time::numeric, 0) desc LIMIT 30;
sql_query: SELECT pss.userid, pss.dbid, pd.datname as db_name,
round((pss.total_exec_time + pss.total_plan_time)::numeric, 2) as
total_time, pss.calls,
round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 0) as mean,
substr(pss.query, 1, 2048) query, 'wvw' chk FROM pg_stat_statements pss,
pg_database pd WHERE pd.oid=pss.dbid and
round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 0) > 3000 ORDER BY
round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 0) desc LIMIT 30;
sql_query_extra: no
To parse it and populate an array with its data I will use the following Python code:
ptn_yaml_parser.py
import yaml
with open("/Users/dmitryr/LAB/YAML_parser/my_yaml_file.yaml", 'r') as f:
valuesYaml = yaml.load(f, Loader=yaml.FullLoader)
n_elements=len(valuesYaml['probes'])
print('n_elements:', n_elements)
for idx in range(0, n_elements):
print(" ",idx,":")
print("issue:",valuesYaml['probes'][idx]['issue'])
print("description:",valuesYaml['probes'][idx]['description'])
print("recommendation:",valuesYaml['probes'][idx]['recommendation'])
print("sql_query:",valuesYaml['probes'][idx]['sql_query'])
print("sql_query_o:",valuesYaml['probes'][idx]['sql_query_o'])
print("sql_query_extra:",valuesYaml['probes'][idx]['sql_query_extra'])
Example how it works:
n_elements: 3
0 :
issue: It were found databases with the high number of active connections.
description: Check the databases having more than 5 active connections.
recommendation: Check why Customers open so many active connections. It may be wrong configuration or unusual application pattern.
sql_query: select datname, count(1) num_of_active_connections, 'wvw' chk from pg_stat_activity where datname!='' and state!='idle' group by datname having count(1)>5 order by 2 desc
sql_query_o: False
sql_query_extra: select datname, state, client_addr, client_hostname, substr(query, 1, 2048) query from pg_stat_activity where state!='idle' and datname in ( select datname from ( select datname, count(1) num_of_active_sessions from pg_stat_activity where state!='idle' and datname!='' group by 1 having count(1)>0 ) M ) order by 1, 5
1 :
issue: Long-running queries.
description: Check DB queries that take more than 30 seconds.
recommendation: Check why the query/queries take so much time. It maybe it's heavy non-optimized query. Maybe it's unusual application pattern.
sql_query: select now()-query_start as runtime, pid as process_id, datname as db_name, client_addr, client_hostname, substr(query, 1, 2048) query, 'wvw' chk from pg_stat_activity where state!='idle' and datname!='' and now() - query_start > '30 seconds'::interval order by 1 desc;
sql_query_o: False
sql_query_extra: False
2 :
issue: Long-running queries.
description: Check in the pg_stat_statements DB queries that take more than 3000 ms
recommendation: Check why the query/queries take so much time. It may be it is a heavy non-optimized query. Maybe it's an unusual application pattern.
sql_query: SELECT pss.userid, pss.dbid, pd.datname as db_name, round((pss.total_exec_time + pss.total_plan_time)::numeric, 2) as total_time, pss.calls, round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 0) as mean, substr(pss.query, 1, 2048) query, 'wvw' chk FROM pg_stat_statements pss, pg_database pd WHERE pd.oid=pss.dbid and round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 0) > 3000 ORDER BY round((pss.mean_exec_time+pss.mean_plan_time)::numeric, 0) desc LIMIT 30;
sql_query_o: SELECT pss.userid, pss.dbid, pd.datname as db_name, round(pss.total_time::numeric, 2) as total_time, pss.calls, round(pss.mean_time::numeric, 0) as mean, substr(pss.query, 1, 2048) query, 'wvw' chk FROM pg_stat_statements pss, pg_database pd WHERE pd.oid=pss.dbid and round(pss.mean_time::numeric, 0) > 3000 ORDER BY round(pss.mean_time::numeric, 0) desc LIMIT 30;
sql_query_extra: False
Conclusion:
In this blog I demonstrated how to parse YAML file using Python and populate the data into array.
Top comments (0)