Most open-source stacks for monitoring a PostgreSQL server are built on Grafana with Prometheus.
Getting PostgreSQL metrics into Prometheus is an interesting task, and there are tools available for it.
These exporters are useful both for monitoring and for writing alert rules in Prometheus:
- postgres_exporter (https://github.com/prometheus-community/postgres_exporter)
- coroot-pg-agent (https://github.com/coroot/coroot-pg-agent)
Today we are going to look more closely at coroot-pg-agent.
coroot-pg-agent can be run with Docker; more information can be found here: https://github.com/coroot/coroot-pg-agent
The announcement on the official PostgreSQL site: https://www.postgresql.org/about/news/coroot-pg-agent-an-open-source-postgres-exporter-for-prometheus-2488/
When running coroot-pg-agent with Prometheus, there are a few things to keep in mind.
- Under Docker, coroot-pg-agent listens on port 80 by default. We can run it on a custom port with the following command:
docker run --name coroot-pg-agent \
  --env DSN="postgresql://<USER>:<PASSWORD>@<HOST>:5432/postgres?connect_timeout=1&statement_timeout=30000" \
  --env LISTEN="0.0.0.0:<custom_port_for_pg_agent>" \
  -p <custom_port_for_pg_agent>:<custom_port_for_pg_agent> \
  ghcr.io/coroot/coroot-pg-agent
We can also set the agent's scrape interval with --env PG_SCRAPE_INTERVAL.
After running the command above we see output like the following; custom_port_for_pg_agent is 3000 here.
I0823 21:00:58.259629 1 main.go:35] static labels: map[]
I0823 21:00:58.273610 1 main.go:41] listening on: 0.0.0.0:3000
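Before pointing Prometheus at the agent, it can help to confirm that the agent is really serving metrics. The sketch below is a minimal check, assuming the agent exposes the Prometheus text format at /metrics on the port chosen above (3000 in this run). The helper names `metric_names` and `fetch_metric_names` are my own and are not part of coroot-pg-agent.

```python
import urllib.request


def metric_names(exposition_text):
    """Return the sorted, unique metric names in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like: name{label="value"} 1.0   or   name 1.0
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return sorted(names)


def fetch_metric_names(url="http://localhost:3000/metrics", timeout=5):
    """Download the agent's metrics page and list the metric names on it.

    The default URL is an assumption based on the example port above.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return metric_names(response.read().decode("utf-8"))
```

Calling `fetch_metric_names()` while the container is running should return a non-empty list. A connection error usually points to the port mapping or the LISTEN value.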
- prometheus.yml holds the Prometheus configuration:
global:
  scrape_interval: 5m
  scrape_timeout: 3m
  evaluation_interval: 15s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: coroot-pg-agent
    static_configs:
      - targets: ["<localhost-ip>:<custom_port_for_pg_agent>"]
We can always change the scrape interval and timeout to suit our needs; I was testing locally, hence these values.
Keep in mind while editing the YAML above that scrape_timeout must not be greater than scrape_interval (the two may be equal).
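The interval/timeout rule is easy to check before starting Prometheus. Here is a small sketch that converts Prometheus-style durations to seconds and compares them. It is a simplified illustration, not Prometheus's own parser: it handles values such as "5m" or "1h30m" but not a bare "0", and the function names are hypothetical.

```python
import re

# Seconds per unit for Prometheus-style durations (a year counts as 365 days).
_UNIT_SECONDS = {
    "ms": 0.001, "s": 1, "m": 60, "h": 3600,
    "d": 86400, "w": 604800, "y": 31536000,
}
_PART = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")


def duration_seconds(text):
    """Convert a duration such as "5m" or "1h30m" to seconds."""
    pos, total = 0, 0.0
    for match in _PART.finditer(text):
        # Every part must start exactly where the previous one ended.
        if match.start() != pos:
            break
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"not a valid duration: {text!r}")
    return total


def timeout_fits_interval(scrape_interval, scrape_timeout):
    """Return True when the timeout is no longer than the interval."""
    return duration_seconds(scrape_timeout) <= duration_seconds(scrape_interval)
```

With the values from the config above, `timeout_fits_interval("5m", "3m")` is true, while a 15s interval with a 1m timeout would fail the check.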
To run Prometheus with Docker we can use the following command, which uses the official prom/prometheus image:
docker run \
  -p 9090:9090 \
  -v ~/pro/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus
After running the command above you will see output like the following:
ts=2022-08-23T21:02:26.203Z caller=main.go:495 level=info msg="No time or size retention was set so using the default time retention" duration=15d
ts=2022-08-23T21:02:26.203Z caller=main.go:539 level=info msg="Starting Prometheus Server" mode=server version="(version=2.38.0, branch=HEAD, revision=818d6e60888b2a3ea363aee8a9828c7bafd73699)"
ts=2022-08-23T21:02:26.203Z caller=main.go:544 level=info build_context="(go=go1.18.5, user=root@e6b781f65453, date=20220816-13:29:14)"
ts=2022-08-23T21:02:26.204Z caller=main.go:545 level=info host_details="(Linux 5.10.47-linuxkit #1 SMP PREEMPT Sat Jul 3 21:50:16 UTC 2021 aarch64 87decec12cad (none))"
ts=2022-08-23T21:02:26.204Z caller=main.go:546 level=info fd_limits="(soft=1048576, hard=1048576)"
ts=2022-08-23T21:02:26.204Z caller=main.go:547 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2022-08-23T21:02:26.205Z caller=web.go:553 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090
ts=2022-08-23T21:02:26.205Z caller=main.go:976 level=info msg="Starting TSDB ..."
ts=2022-08-23T21:02:26.206Z caller=tls_config.go:195 level=info component=web msg="TLS is disabled." http2=false
ts=2022-08-23T21:02:26.207Z caller=head.go:495 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
ts=2022-08-23T21:02:26.207Z caller=head.go:538 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=10.125µs
ts=2022-08-23T21:02:26.207Z caller=head.go:544 level=info component=tsdb msg="Replaying WAL, this may take a while"
ts=2022-08-23T21:02:26.207Z caller=head.go:615 level=info component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
ts=2022-08-23T21:02:26.207Z caller=head.go:621 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=21.416µs wal_replay_duration=117.958µs total_replay_duration=159.167µs
ts=2022-08-23T21:02:26.208Z caller=main.go:997 level=info fs_type=EXT4_SUPER_MAGIC
ts=2022-08-23T21:02:26.208Z caller=main.go:1000 level=info msg="TSDB started"
ts=2022-08-23T21:02:26.208Z caller=main.go:1181 level=info msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
ts=2022-08-23T21:02:26.210Z caller=main.go:1218 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml totalDuration=2.047292ms db_storage=750ns remote_storage=1.709µs web_handler=292ns query_engine=583ns scrape=341.75µs scrape_sd=16.708µs notify=542ns notify_sd=792ns rules=1µs tracing=9.625µs
ts=2022-08-23T21:02:26.210Z caller=main.go:961 level=info msg="Server is ready to receive web requests."
ts=2022-08-23T21:02:26.210Z caller=manager.go:941 level=info component="rule manager" msg="Starting rule manager..."
We can open localhost:9090 in a browser to see the Prometheus web UI.
Now, to see the targets, we can visit the Targets page (under Status).
Once the targets are up, their state on that page changes to UP.
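The same target status is available without the browser, through the Prometheus HTTP API at /api/v1/targets. The sketch below summarizes that response, assuming Prometheus is reachable at localhost:9090 as in the docker command above. The helper names are hypothetical.

```python
import json
import urllib.request


def target_health(targets_payload):
    """Map "job/instance" to the health reported by /api/v1/targets."""
    health = {}
    for target in targets_payload["data"]["activeTargets"]:
        labels = target.get("labels", {})
        key = f'{labels.get("job", "?")}/{labels.get("instance", "?")}'
        # Prometheus reports "up", "down", or "unknown" (not scraped yet).
        health[key] = target.get("health", "unknown")
    return health


def fetch_target_health(base_url="http://localhost:9090", timeout=5):
    """Query a running Prometheus (assumed at base_url) for target health."""
    url = f"{base_url}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return target_health(json.load(response))
```

With a long scrape_interval such as 5m, a target can stay "unknown" for a while after startup, so a freshly started Prometheus may not report "up" immediately.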
We can then visit the Graph page; I've kept autocomplete on, so clicking into the search bar lists the available metrics as suggestions.
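Anything typed into the Graph search bar can also be run through the HTTP API at /api/v1/query. As a minimal sketch, the helpers below build the request URL and unpack an instant-vector response. They assume Prometheus is at localhost:9090 and use the built-in `up` metric with the coroot-pg-agent job name from the config above. The function names are my own.

```python
import json
import urllib.parse
import urllib.request


def query_url(expression, base_url="http://localhost:9090"):
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expression})


def vector_samples(query_payload):
    """Return (labels, value) pairs from an instant-vector query response."""
    data = query_payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f'expected a vector, got {data["resultType"]!r}')
    # Each result carries [timestamp, "value-as-string"]; keep only the value.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def run_query(expression, base_url="http://localhost:9090", timeout=5):
    """Run an instant query against a running Prometheus (assumed at base_url)."""
    with urllib.request.urlopen(query_url(expression, base_url), timeout=timeout) as response:
        return vector_samples(json.load(response))
```

For example, `run_query('up{job="coroot-pg-agent"}')` should yield a value of 1.0 once the agent has been scraped successfully, and 0.0 if the last scrape failed.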
The metrics that coroot-pg-agent supports are listed at coroot.com/docs/metrics/pg-agent