The most significant new feature of Armv9-compatible CPUs that will be directly observable to developers and users is the baselining of SVE2 as the successor to NEON.
The Armv9 Scalable Vector Extension version 2 (SVE2) provides variable-width SIMD capabilities for AArch64 systems.
The first iteration, the Scalable Vector Extension (SVE), was announced back in 2016 and first implemented in Fujitsu's A64FX CPU cores, which power Fugaku in Japan, the world's #1 supercomputer at the time of writing. The problem with SVE was that this first version of the variable-vector-length SIMD instruction set was limited in scope and aimed mostly at HPC workloads, missing many of the more versatile instructions covered by NEON.
SVE2, announced in April 2019, set out to solve this by complementing the scalable SIMD instruction set with the instructions needed to serve the more varied DSP-like workloads that currently use NEON.
Beyond adding various modern SIMD capabilities, the main benefit of SVE and SVE2 is their variable vector size: an implementation can use a vector length from 128 to 2048 bits in 128-bit increments, and the same code runs irrespective of the length the actual hardware uses. Purely from a vector-processing and programming point of view, a software developer only ever has to compile their code once. If a future CPU ships with, say, native 512-bit SIMD execution pipelines, the existing code can already take advantage of the full width of those units. Likewise, the same binary runs on more conservative designs with narrower execution hardware, which is essential for Arm since it designs CPUs for everything from IoT to mobile to datacentres. SVE does all this while staying within the 32-bit instruction encoding space of the Arm architecture. By contrast, x86 has had to add new extensions and instructions for each vector size (SSE, AVX, AVX-512).
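To make the "compile once" idea concrete, here is a plain-C model (no SVE hardware or intrinsics needed) of how a vector-length-agnostic loop is structured: the vector length is only known at run time, each iteration handles one vector's worth of lanes, and a predicate switches off lanes past the end of the array. The function name and the `vl_bits` parameter are hypothetical stand-ins for what the hardware provides; this is a sketch of the technique, not real SVE code.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a vector-length-agnostic (VLA) reduction.
 * vl_bits is a hypothetical stand-in for the hardware vector length.
 * Precondition: vl_bits is a multiple of 128 in the range 128..2048.
 * Real SVE code queries the length at run time instead of taking it as input. */
int64_t vla_sum_i32(const int32_t *a, size_t n, unsigned vl_bits)
{
    size_t lanes = vl_bits / 32;   /* number of 32-bit lanes per "vector" */
    int64_t acc[64] = {0};         /* per-lane accumulators; 2048 / 32 = 64 max */

    for (size_t i = 0; i < n; i += lanes) {
        for (size_t lane = 0; lane < lanes; lane++) {
            /* Predicate, like svwhilelt: a lane is active only while i + lane < n,
             * so the tail of the array needs no separate cleanup loop. */
            if (i + lane < n)
                acc[lane] += a[i + lane];
        }
    }

    /* Horizontal reduction across lanes (the job svaddv does on real hardware). */
    int64_t total = 0;
    for (size_t lane = 0; lane < lanes; lane++)
        total += acc[lane];
    return total;
}
```

The result is identical for every legal vector width; only the number of loop iterations changes. That is the property SVE's predication is designed to give real binaries.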
Building SVE2 Code
To build code that includes SVE2 instructions, you need to tell the compiler or assembler to emit code for an Armv8-A processor that also understands the SVE2 instructions. This is done with the -march= option (read as "machine architecture"). The architecture specification for this target is currently "armv8-a+sve2":
gcc -march=armv8-a+sve2 ...
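One quick way to confirm that the -march flag actually took effect is to check the ACLE feature macros: when SVE2 code generation is enabled, GCC and Clang define `__ARM_FEATURE_SVE2`, and also `__ARM_FEATURE_SVE`, since SVE2 is a superset of SVE. The function name below is hypothetical; it simply reports which level the file was compiled for.

```c
/* Reports the SVE level this translation unit was compiled for.
 * Uses the ACLE predefined macros; nothing is checked at run time. */
const char *sve_build_level(void)
{
#if defined(__ARM_FEATURE_SVE2)
    return "sve2";   /* e.g. built with -march=armv8-a+sve2 */
#elif defined(__ARM_FEATURE_SVE)
    return "sve";    /* e.g. built with -march=armv8-a+sve */
#else
    return "none";   /* non-Arm target, or SVE not enabled */
#endif
}
```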
Remember that in GCC 11 the auto-vectorizer is enabled by -O3 (it is not on by default at -O2), so add that flag as well:
gcc -O3 -march=armv8-a+sve2 ...
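As an illustration, a simple saxpy-style loop like the one below is the kind of code GCC's auto-vectorizer can turn into predicated SVE/SVE2 loops at -O3. Whether it is actually vectorized depends on the compiler version and flags; the function name is just for this example.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i].
 * The restrict qualifiers promise that x and y do not overlap,
 * which removes an aliasing check and makes vectorization easier. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Adding GCC's -fopt-info-vec option (for example `gcc -O3 -march=armv8-a+sve2 -fopt-info-vec -c saxpy.c`) asks the compiler to report which loops it vectorized.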
Using SVE2 Intrinsics Header Files
To use SVE2 intrinsics in a C program, include the header file arm_sve.h:
#include <arm_sve.h>
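A minimal sketch of the intrinsics in use, assuming an ACLE-compliant compiler: it adds two int32 arrays with a predicated, vector-length-agnostic loop. The intrinsics used here (`svcntw`, `svwhilelt_b32_u64`, `svld1_s32`, `svadd_s32_x`, `svst1_s32`) are base SVE ones, which remain available when targeting SVE2. The function name `add_i32` and the scalar fallback branch are additions for illustration, so the same file also builds on machines without SVE.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* dst[i] = a[i] + b[i] for i in [0, n). */
void add_i32(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
#if defined(__ARM_FEATURE_SVE)
    /* svcntw() = number of 32-bit lanes in one hardware vector,
     * so the stride adapts to whatever vector length the CPU has. */
    for (uint64_t i = 0; i < (uint64_t)n; i += svcntw()) {
        /* Active lanes are those with i + lane < n; this covers the tail. */
        svbool_t pg = svwhilelt_b32_u64(i, (uint64_t)n);
        svint32_t va = svld1_s32(pg, a + i);
        svint32_t vb = svld1_s32(pg, b + i);
        svst1_s32(pg, dst + i, svadd_s32_x(pg, va, vb));
    }
#else
    /* Scalar fallback for targets compiled without SVE. */
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
#endif
}
```

Note that there is no separate loop for leftover elements: the predicate from `svwhilelt_b32_u64` disables the lanes past `n` on the final iteration.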
Running SVE2 Code
To run SVE2 code on an Armv8 system without SVE2 hardware, you can use QEMU's user-mode emulation. QEMU translates the program's instructions in software, so it can execute SVE2 instructions that the host CPU does not support:
qemu-aarch64 ./binary
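Before running SVE2 code it can help to check whether the (real or emulated) CPU actually reports SVE2. The sketch below assumes AArch64 Linux, where the kernel (or qemu-aarch64, for the emulated CPU) exposes the feature bit through `getauxval(AT_HWCAP2)`. The `HWCAP2_SVE2` fallback value is taken from the Linux arm64 `<asm/hwcap.h>` header, and the function name is hypothetical.

```c
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP2 */
#endif

#ifndef HWCAP2_SVE2
#define HWCAP2_SVE2 (1UL << 1)  /* bit value from the Linux arm64 uapi hwcap.h */
#endif

/* Returns 1 if the CPU the program runs on (including QEMU's emulated CPU)
 * advertises SVE2, 0 otherwise or on non-AArch64/non-Linux targets. */
int cpu_has_sve2(void)
{
#if defined(__aarch64__) && defined(__linux__)
    return (getauxval(AT_HWCAP2) & HWCAP2_SVE2) != 0;
#else
    return 0;
#endif
}
```

A program can call this at start-up and fall back to a NEON or scalar path when it returns 0, instead of crashing with an illegal-instruction signal.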
This blog post is a work in progress.
Resources
https://www.arm.com/campaigns/arm-vision
https://www.anandtech.com/show/16584/arm-announces-armv9-architecture
https://developer.arm.com/documentation/ddi0602/2021-12/
Conclusion
⚠️ Computer Architecture Blog Post: Link
Links
🖇 Follow me on GitHub
🖇 Follow me on Twitter
P.S. This post was made for my Software Portability and Optimization class.