Applications in many programming languages perform data compression. They commonly rely on zlib for easy handling of gzip files. This article explains how to improve performance of applications using zlib on AWS Graviton processors.
Most Linux distributions ship zlib without architecture-specific optimizations. On the Arm architecture, this means the CRC (cyclic redundancy check) instructions are not used, leaving performance on the table. Installing and using an optimized zlib may improve the performance of applications that do data compression. Let's see how to do it with an example Python application.
Cloudflare zlib is one version that includes such optimizations. Other optimized zlib forks exist, and the process to use them should be similar.
This can be done on any Graviton-based instance. I did it with Ubuntu 22.04.
Confirm crc32 is included in the processor flags
All AWS Graviton processors and most Armv8.0-A and above processors have support for CRC instructions.
To check if a Linux system has support, use the lscpu command and look for crc32 in the listed flags.
lscpu | grep crc32
If the machine includes crc32 in its flags, it may benefit from zlib-cloudflare.
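The same check can be scripted. Below is a small Python sketch (a convenience helper, not part of the original workflow) that looks for crc32 on the Arm "Features" line of /proc/cpuinfo, which is where lscpu gets its flag list:

```python
def has_crc32(cpuinfo_text):
    """Return True if any Arm 'Features' line in /proc/cpuinfo lists crc32."""
    for line in cpuinfo_text.splitlines():
        if line.startswith('Features') and 'crc32' in line.split():
            return True
    return False

# On an Arm Linux system this prints whether the CRC extension is present.
try:
    with open('/proc/cpuinfo') as f:
        print('crc32 supported' if has_crc32(f.read()) else 'crc32 not found')
except FileNotFoundError:
    pass  # not running on Linux
```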
Check if the default zlib includes crc32 instructions
Some Linux systems may already make use of crc32 in the default library. If the default zlib is already optimized, then using zlib-cloudflare may not have any impact on performance.
Ubuntu and Debian Linux distributions put zlib in /usr/lib/aarch64-linux-gnu.
Other software tools are needed to build zlib, so install them now.
sudo apt install -y build-essential
To check if there are any CRC instructions in a library, use objdump to disassemble and look for crc32 instructions.
objdump -d /usr/lib/aarch64-linux-gnu/libz.so.1 | awk -F" " '{print $3}' | grep crc32 | wc -l
If the result is 0 then there are no crc32 instructions used in the library.
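If you prefer to script this check, the parsing step can be sketched in Python. The count_crc32 helper below is hypothetical (not from the article); it counts disassembly lines whose mnemonic field starts with crc32, which is what the awk pipeline above does:

```python
def count_crc32(disassembly_text):
    """Count instructions whose mnemonic starts with crc32 (crc32b, crc32cx, ...)
    in `objdump -d` output. Each instruction line has the form:
    address, opcode bytes, mnemonic, operands."""
    count = 0
    for line in disassembly_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2].startswith('crc32'):
            count += 1
    return count
```

You could feed it the stdout of subprocess.run(['objdump', '-d', '/usr/lib/aarch64-linux-gnu/libz.so.1'], capture_output=True, text=True); as with the shell pipeline, a result of 0 means no CRC instructions.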
Install Cloudflare zlib
If there are no crc32 instructions in zlib then zlib-cloudflare may help application performance.
To build and install zlib-cloudflare, navigate to an empty directory and use these commands.
mkdir tmp ; pushd tmp
git clone https://github.com/cloudflare/zlib.git
cd zlib && ./configure
make && sudo make install
popd
rm -rf tmp
If successful, zlib-cloudflare is installed in /usr/local/lib.
Confirm the new zlib has crc32 instructions. The objdump command should return a non-zero number now.
objdump -d /usr/local/lib/libz.so | awk -F" " '{print $3}' | grep crc32 | wc -l
To install zlib somewhere else, use the prefix argument to select another location.
./configure --prefix=$HOME/zlib
This results in zlib being installed in $HOME/zlib instead.
Confirm which zlib an application uses
Below is a simple C program to demonstrate zlib usage.
#include <stdio.h>
#include <stdlib.h>
#include "zlib.h"

int main()
{
    gzFile myfile;

    printf("%s\n", zlibVersion());

    myfile = gzopen("testfile.gz", "wb");
    if (myfile == NULL) {
        fprintf(stderr, "gzopen failed\n");
        exit(1);
    }
    gzprintf(myfile, "Hello gzipped file!\n");
    gzclose(myfile);

    exit(0);
}
Save the text above as a file named test.c and compile the example.
gcc test.c -o test -lz
Run the program and see the version.
./test
The printed version will be a number such as:
1.2.11
Use ldd to see the location of the shared library.
ldd ./test
The output shows the shared libraries used by test.
linux-vdso.so.1 (0x0000ffff91026000)
libz.so.1 => /lib/aarch64-linux-gnu/libz.so.1 (0x0000ffff90fa0000)
libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffff90df0000)
/lib/ld-linux-aarch64.so.1 (0x0000ffff90fed000)
Set LD_PRELOAD to use zlib-cloudflare
To run test with zlib-cloudflare instead of the default zlib:
LD_PRELOAD=/usr/local/lib/libz.so ./test
The LD_PRELOAD variable tells the dynamic loader to load the listed libraries before the default ones.
The version printed will now be the zlib-cloudflare version. It may be older than the default, but we are interested in the crc32 optimization, not the latest version number.
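Python offers a convenient way to confirm which zlib it actually loaded: the zlib module reports both the version it was compiled against and the version of the shared library picked up at runtime, and only the runtime value changes under LD_PRELOAD. A minimal check:

```python
import zlib

# ZLIB_VERSION: the version the Python zlib module was built against.
# ZLIB_RUNTIME_VERSION: the version of libz actually loaded at runtime,
# which is what LD_PRELOAD replaces.
print('compiled against:', zlib.ZLIB_VERSION)
print('loaded at runtime:', zlib.ZLIB_RUNTIME_VERSION)
```

Running this snippet with LD_PRELOAD=/usr/local/lib/libz.so should show the zlib-cloudflare version as the runtime value.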
Next, let's see how to use zlib-cloudflare in an application doing data compression. We can use a Python example and measure the performance difference with zlib-cloudflare.
Copy and save the file below as zip.py
import gzip

size = 16384
with open('largefile', 'rb') as f_in:
    with gzip.open('largefile.gz', 'wb') as f_out:
        while (data := f_in.read(size)):
            f_out.write(data)
For Ubuntu 22.04, configure python to be python3.
sudo apt install python-is-python3 -y
Create a large file to compress
The above Python code will read a file named largefile and write a compressed version as largefile.gz
To create the 1 GiB input file, use the dd command.
dd if=/dev/zero of=largefile count=1M bs=1024
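If dd isn't convenient, the same input file can be generated in Python. This helper is a sketch (the name make_zero_file is mine, not from the article); calling it with the defaults matches the dd command above, producing a 1 GiB file of zero bytes:

```python
def make_zero_file(path, mib=1024):
    """Write `mib` MiB of zero bytes to `path` (1024 MiB = 1 GiB by default)."""
    chunk = b'\0' * (1024 * 1024)  # one mebibyte per write
    with open(path, 'wb') as f:
        for _ in range(mib):
            f.write(chunk)
```

make_zero_file('largefile') recreates the input. Note that a file of zeros compresses extremely well; real-world data will take longer to compress, but still benefits from the optimized CRC code.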
Run the example using the default zlib
Run with the default zlib and time the execution.
time python ./zip.py
Make a note of the runtime.
Run the example again with zlib-cloudflare
This time, use LD_PRELOAD to change to zlib-cloudflare and check the performance difference.
Adjust the path to libz.so as needed.
time LD_PRELOAD=/usr/local/lib/libz.so python ./zip.py
Notice the shorter runtime when zlib-cloudflare is used.
Using a c6g.large EC2 instance, the time with the original zlib is about 7.25 seconds and with zlib-cloudflare the time is about 2.66 seconds.
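To measure from inside Python rather than with the shell's time builtin, the compression loop can be wrapped in a small timing helper (a sketch; time_compress is not from the article). Run it once normally and once under LD_PRELOAD to compare:

```python
import gzip
import time

def time_compress(src, dst, chunk=16384):
    """Compress `src` into gzip file `dst` and return elapsed seconds."""
    start = time.perf_counter()
    with open(src, 'rb') as f_in, gzip.open(dst, 'wb') as f_out:
        while data := f_in.read(chunk):
            f_out.write(data)
    return time.perf_counter() - start
```

For example, print(time_compress('largefile', 'largefile.gz')) reports the elapsed seconds for whichever libz is currently loaded.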
Summary
If you have applications that use zlib, check alternative versions of the library. Cloudflare zlib is a good one, and others are available. See the AWS Graviton Getting Started guide for the latest information.