Pressure Stall Information on CentOS7

#linux #performance #psi

I'm running databases, and knowing whether the pressure is on CPU, RAM, or I/O is crucial. That is not easy to infer from the metrics provided by CloudWatch or the usual OS monitoring tools. Recent Linux kernels provide PSI (Pressure Stall Information) for that, so let's enable it.

I have EC2 instances provisioned from aws-marketplace/CentOS Linux 7 images, but CentOS does not move fast and ships an old kernel:

[yugabyte]$ cat /etc/system-release

CentOS Linux release 7.4.1708 (Core)

[yugabyte]$ uname --kernel-release

3.10.0-693.5.2.el7.x86_64
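PSI was merged in mainline Linux 4.20, so a 3.10 kernel cannot expose it. As a rough sketch, the version check can be scripted as below. The helper name `kernel_has_psi` is mine, and the check is only a heuristic: it looks at the release string and assumes the kernel was built with PSI support, which a version number alone cannot prove.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 when a kernel release string is 4.20 or later,
# the first mainline release that includes PSI. Defaults to the running kernel.
kernel_has_psi() {
  release=${1:-$(uname -r)}
  major=${release%%.*}          # text before the first dot, e.g. "3"
  rest=${release#*.}            # text after the first dot, e.g. "10.0-693..."
  minor=${rest%%[!0-9]*}        # leading digits of the remainder, e.g. "10"
  [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "${minor:-0}" -ge 20 ]; }
}

if kernel_has_psi; then
  echo "kernel $(uname -r) is recent enough for PSI"
else
  echo "kernel $(uname -r) predates PSI (4.20 or later is needed)"
fi
```

Passing a release string explicitly, such as `kernel_has_psi 3.10.0-693.5.2.el7.x86_64`, lets you test a version other than the running one.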

I need a more recent one, which I'll install from ELRepo:

[yugabyte]$ sudo yum update -y

No packages marked for update

[yugabyte]$ sudo yum install -y https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm

Now listing the mainline kernel ('ml', as opposed to the long-term 'lt'):

[yugabyte]$ yum list available --disablerepo='*' --enablerepo=elrepo-kernel kernel-ml.x86_64

Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * elrepo-kernel: mirrors.coreix.net
Available Packages
kernel-ml.x86_64                                                5.13.12-1.el7.elrepo                                                 elrepo-kernel

And installing it:

[yugabyte]$ sudo yum --enablerepo=elrepo-kernel install -y kernel-ml

Here it is as the first menu entry for grub:

[centos]$ sudo grep ^menuentry /boot/grub2/grub.cfg

menuentry 'CentOS Linux (5.13.12-1.el7.elrepo.x86_64) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-693.5.2.el7.x86_64-advanced-6f15c206-f516-4ee8-a4b7-89ad880647db' {
menuentry 'CentOS Linux (3.10.0-1160.36.2.el7.x86_64) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-693.5.2.el7.x86_64-advanced-6f15c206-f516-4ee8-a4b7-89ad880647db' {
menuentry 'CentOS Linux (3.10.0-693.5.2.el7.x86_64) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-693.5.2.el7.x86_64-advanced-6f15c206-f516-4ee8-a4b7-89ad880647db' {
menuentry 'CentOS Linux (0-rescue-f073c429a7456b53ec3e2c53460c5c8f) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-0-rescue-f073c429a7456b53ec3e2c53460c5c8f-advanced-6f15c206-f516-4ee8-a4b7-89ad880647db' {
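GRUB numbers the menu entries from zero in the order they appear, so the new kernel listed first is entry 0. A small sketch to print the entries with their index, assuming the same `/boot/grub2/grub.cfg` path as above (the function name `list_menuentries` is just for illustration):

```shell
#!/bin/sh
# Hypothetical helper: prints each top-level menuentry title with its zero-based index.
# The title is the text between the first pair of single quotes on the line.
list_menuentries() {
  awk -F"'" '/^menuentry / { printf "%d: %s\n", n++, $2 }' "${1:-/boot/grub2/grub.cfg}"
}

# grub.cfg is usually readable by root only, so run this with sudo or as root.
if [ -r /boot/grub2/grub.cfg ]; then
  list_menuentries
fi
```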

I make it the default one (menuentry 0):

[yugabyte]$ sudo sed -e '/^GRUB_DEFAULT=saved/s/=.*/=0/' -i /etc/default/grub

I need to add psi=1 to the kernel command line, which I do through a tuned profile:

[yugabyte]$ sudo mkdir -p /etc/tuned/psi && sudo tee /etc/tuned/psi/tuned.conf <<'TAC'

[main]
  summary=Enable Pressure Stall Information as in https://dev.to/aws-heroes/pressure-stall-information-on-ec2-centos-7-2nbb-temp-slug-5559720
[bootloader]
cmdline=psi=1

TAC

Checking current profile:

[yugabyte]$ tuned-adm profile
Current active profile: virtual-guest

Adding the new one alongside it:

[yugabyte]$ sudo tuned-adm profile virtual-guest psi

Enabling all these GRUB changes:

[yugabyte]$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg

Now ready to reboot the node. This is where I appreciate running a distributed database (YugabyteDB 🚀) as I can rolling-restart the nodes without application interruption.
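After the reboot, a quick sanity check is to confirm that the parameter made it to the running kernel's command line. This is a minimal sketch; the function name `psi_on_cmdline` is hypothetical, and it only looks for the exact `psi=1` token in `/proc/cmdline`.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 when the kernel command line contains the exact token psi=1.
# The file argument defaults to /proc/cmdline, the command line of the running kernel.
psi_on_cmdline() {
  tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -qx 'psi=1'
}

if [ -r /proc/cmdline ] && psi_on_cmdline; then
  echo "psi=1 is on the kernel command line"
else
  echo "psi=1 not found on the kernel command line"
fi
```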

Checking it:

[yugabyte]$ tail /proc/pressure/*

==> /proc/pressure/cpu <==
some avg10=27.80 avg60=25.88 avg300=16.13 total=77572758
full avg10=0.98 avg60=0.94 avg300=0.55 total=4422080

==> /proc/pressure/io <==
some avg10=12.03 avg60=13.02 avg300=7.36 total=32530366
full avg10=4.73 avg60=5.33 avg300=3.08 total=15034660

==> /proc/pressure/memory <==
some avg10=0.12 avg60=0.02 avg300=0.00 total=309168
full avg10=0.12 avg60=0.02 avg300=0.00 total=307455

Now it remains to interpret it. I explained a bit in a past blog post, and the full description is on www.kernel.org. Basically, the "some" line shows the percentage of time during which at least one task was stalled on the resource, and the "full" line the percentage during which all non-idle tasks were stalled at the same time, averaged over the last 10 seconds, 1 minute, and 5 minutes. The total field is the cumulative stall time in microseconds. So if you feel something is slow and should be faster, don't scale blindly: PSI tells you which resource is responsible for the response time.
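To follow only the most recent window, the 10-second averages can be pulled out of the three files with a few lines of awk. This is a sketch rather than a monitoring tool; the function name `psi_avg10` is mine, and it assumes the `key=value` layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: prints "<resource> <some|full> <avg10>" for each line of the given PSI files.
# Files that do not exist or are not readable are skipped silently.
psi_avg10() {
  for f in "$@"; do
    [ -r "$f" ] || continue
    awk -v res="$(basename "$f")" '{
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")                       # kv[1] is the key, kv[2] the value
        if (kv[1] == "avg10") print res, $1, kv[2]
      }
    }' "$f"
  done
}

psi_avg10 /proc/pressure/cpu /proc/pressure/io /proc/pressure/memory
```

With the io sample shown earlier, this prints `io some 12.03` and `io full 4.73`.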
