I run databases, and knowing whether the pressure is on CPU, RAM, or I/O is crucial. That is not easy to infer from the metrics provided by CloudWatch or the usual OS monitoring tools. Recent Linux kernels provide PSI (Pressure Stall Information) for that, so let's enable it.
I have EC2 instances provisioned from aws-marketplace/CentOS Linux 7 images, but CentOS is not moving fast and ships an old kernel:
[yugabyte]$ cat /etc/system-release
CentOS Linux release 7.4.1708 (Core)
[yugabyte]$ uname --kernel-release
3.10.0-693.5.2.el7.x86_64
I need a more recent one, which I'll install from ELRepo:
[yugabyte]$ sudo yum update -y
No packages marked for update
[yugabyte]$ sudo yum install -y https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
Now listing the mainline kernel ('ml', as opposed to the long-term 'lt'):
[yugabyte]$ yum list available --disablerepo='*' --enablerepo=elrepo-kernel kernel-ml.x86_64
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* elrepo-kernel: mirrors.coreix.net
Available Packages
kernel-ml.x86_64 5.13.12-1.el7.elrepo elrepo-kernel
And installing it:
[yugabyte]$ sudo yum --enablerepo=elrepo-kernel install -y kernel-ml
Here it is as the first menu entry for grub:
[centos]$ sudo grep ^menuentry /boot/grub2/grub.cfg
menuentry 'CentOS Linux (5.13.12-1.el7.elrepo.x86_64) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-693.5.2.el7.x86_64-advanced-6f15c206-f516-4ee8-a4b7-89ad880647db' {
menuentry 'CentOS Linux (3.10.0-1160.36.2.el7.x86_64) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-693.5.2.el7.x86_64-advanced-6f15c206-f516-4ee8-a4b7-89ad880647db' {
menuentry 'CentOS Linux (3.10.0-693.5.2.el7.x86_64) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-693.5.2.el7.x86_64-advanced-6f15c206-f516-4ee8-a4b7-89ad880647db' {
menuentry 'CentOS Linux (0-rescue-f073c429a7456b53ec3e2c53460c5c8f) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-0-rescue-f073c429a7456b53ec3e2c53460c5c8f-advanced-6f15c206-f516-4ee8-a4b7-89ad880647db' {
I make it the default one (menuentry 0):
[yugabyte]$ sudo sed -e '/^GRUB_DEFAULT=saved/s/=.*/=0/' -i /etc/default/grub
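To see what that sed expression does without touching the real file, here is a small dry run against a scratch copy. The sample contents are made up for illustration; only the GRUB_DEFAULT=saved line matters.

```shell
# Scratch file standing in for /etc/default/grub (contents are illustrative)
scratch=$(mktemp)
printf 'GRUB_TIMEOUT=1\nGRUB_DEFAULT=saved\nGRUB_CMDLINE_LINUX="console=tty0"\n' > "$scratch"

# Same expression as above: only on the line starting with GRUB_DEFAULT=saved,
# replace everything from the first "=" to the end of the line with "=0"
sed -e '/^GRUB_DEFAULT=saved/s/=.*/=0/' -i "$scratch"

# Show the resulting default entry
grep '^GRUB_DEFAULT' "$scratch"
```

The other lines are left alone because the address restricts the substitution to the GRUB_DEFAULT=saved line. If the file holds a different GRUB_DEFAULT value, the expression changes nothing, so it is worth checking the result with grep.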
I need to add psi=1 to the kernel command line, which I do through a tuned profile:
[yugabyte]$ sudo mkdir -p /etc/tuned/psi && sudo tee /etc/tuned/psi/tuned.conf <<'TAC'
[main]
summary=Enable Pressure Stall Information as in https://dev.to/aws-heroes/pressure-stall-information-on-ec2-centos-7-2nbb-temp-slug-5559720
[bootloader]
cmdline=psi=1
TAC
Checking the current profile:
[yugabyte]$ tuned-adm profile
Current active profile: virtual-guest
Adding the new one on top of it:
[yugabyte]$ sudo tuned-adm profile virtual-guest psi
Enabling all these GRUB changes:
[yugabyte]$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
Now ready to reboot the node. This is where I appreciate running a distributed database (YugabyteDB 🚀) as I can rolling-restart the nodes without application interruption.
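Once the node is back, it may be worth confirming that it booted the new kernel and that psi=1 actually reached the kernel command line. The following is a minimal sketch; `has_kernel_param` is a hypothetical helper name introduced here, not part of any tool.

```shell
# has_kernel_param PARAM CMDLINE
# Succeeds if PARAM appears as a whole space-separated word in CMDLINE.
has_kernel_param() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Running kernel version, then the psi=1 check against the live command line
uname --kernel-release
if has_kernel_param psi=1 "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "psi=1 is on the kernel command line"
else
    echo "psi=1 is missing from the kernel command line"
fi
```

Matching on whole words avoids a false positive on a value such as psi=10.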
Checking it:
[yugabyte]$ tail /proc/pressure/*
==> /proc/pressure/cpu <==
some avg10=27.80 avg60=25.88 avg300=16.13 total=77572758
full avg10=0.98 avg60=0.94 avg300=0.55 total=4422080
==> /proc/pressure/io <==
some avg10=12.03 avg60=13.02 avg300=7.36 total=32530366
full avg10=4.73 avg60=5.33 avg300=3.08 total=15034660
==> /proc/pressure/memory <==
some avg10=0.12 avg60=0.02 avg300=0.00 total=309168
full avg10=0.12 avg60=0.02 avg300=0.00 total=307455
Now it remains to interpret it. I explained a bit in a past blog post, and the full description is on www.kernel.org. Basically, the "some" line shows the percentage of time during which at least one task was stalled on the resource, and the "full" line the percentage of time during which all non-idle tasks were stalled at the same time, averaged over the last 10 seconds, 1 minute, and 5 minutes. The total field is the cumulative stall time in microseconds. So if you feel something is slow and should be faster, don't scale blindly: PSI tells you which resource is responsible for the response time.
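For scripting or alerting, a single value can be pulled out of these files. This is a minimal sketch using awk; `psi_field` is a hypothetical helper name, and the field names (avg10, avg60, avg300, total) are the ones shown in the output above.

```shell
# psi_field FILE KIND FIELD
# Prints the value of FIELD (avg10|avg60|avg300|total) from the
# KIND (some|full) line of a PSI file such as /proc/pressure/io.
psi_field() {
    awk -v kind="$2" -v field="$3" '
        $1 == kind {
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == field) print kv[2]
            }
        }' "$1"
}

# Print the 10-second "some" average for each resource, skipping files
# that are not readable (for example when PSI is not enabled)
for resource in cpu io memory; do
    file=/proc/pressure/$resource
    [ -r "$file" ] || continue
    printf '%-6s some avg10=%s%%\n' "$resource" "$(psi_field "$file" some avg10)"
done
```

The avg values are percentages already computed by the kernel. To measure over a custom window instead, one option is to sample total twice and divide the difference by the elapsed time in microseconds.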