Introduction
Welcome back to part 2 of the SVE2 (Scalable Vector Extension version 2) series. If you are not sure what this post is about, see part 1 for the background.
Source code (vol1.c) to be converted to use SVE2
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
#include "vol.h"
int16_t scale_sample(int16_t sample, int volume) {
    return ((((int32_t) sample) * ((int32_t) (32767 * volume / 100) <<1) ) >> 16);
}
int main() {
    int x;
    int ttl=0;
    // ---- Create in[] and out[] arrays
    int16_t* in;
    int16_t* out;
    in=(int16_t*) calloc(SAMPLES, sizeof(int16_t));
    out=(int16_t*) calloc(SAMPLES, sizeof(int16_t));
    // ---- Create dummy samples in in[]
    vol_createsample(in, SAMPLES);
    // ---- This is the part we're interested in!
    // ---- Scale the samples from in[], placing results in out[]
    for (x = 0; x < SAMPLES; x++) {
        out[x]=scale_sample(in[x], VOLUME);
    }
    // ---- This part sums the samples.
    for (x = 0; x < SAMPLES; x++) {
        ttl=(ttl+out[x])%1000;
    }
    // ---- Print the sum of the samples.
    printf("Result: %d\n", ttl);
    return 0;
}
As you can tell, this is vol1 from the previous post about algorithm selection.
Note that vol1 uses a fixed-point calculation. This avoids the cost of repeatedly converting between integer and floating-point values.
Converting
C Compiler Options
Most compilers do not have a specific target for Armv9 systems. To build code that includes SVE2 instructions, we therefore need to instruct the compiler to emit code for an Armv8-A processor that also understands the SVE2 instructions; with GCC, this is done using the -march= option.
We also need to invoke the autovectorizer. In GCC version 11, that means using -O3 or the appropriate feature options:
gcc -O3 -march=armv8-a+sve2
In our case, we will be working with vol1
gcc -O3 -march=armv8-a+sve2 vol1.c vol_createsample.c -o vol1
Then, we can execute the program by emulating with the QEMU usermode system. This will trap SVE2 instructions and emulate them in software, while executing Armv8a instructions directly on the hardware:
qemu-aarch64 ./vol1
Result:
Converted code
.arch armv8-a+sve2
.file "vol1.c"
.text
.align 2
.p2align 4,,11
.global scale_sample
.type scale_sample, %function
scale_sample:
.LFB24:
.cfi_startproc
lsl w2, w1, 15
mov w3, 34079
sub w1, w2, w1
movk w3, 0x51eb, lsl 16
sxth w0, w0
smull x3, w1, w3
asr x3, x3, 37
sub w1, w3, w1, asr 31
lsl w1, w1, 1
mul w0, w1, w0
lsr w0, w0, 16
ret
.cfi_endproc
.LFE24:
.size scale_sample, .-scale_sample
.section .rodata.str1.8,"aMS",@progbits,1
.align 3
.LC0:
.string "Total Time: %2.9f\n"
Understanding converted code
Generated instructions for scale_sample
.cfi_startproc
lsl w2, w1, 15
mov w3, 34079
sub w1, w2, w1
movk w3, 0x51eb, lsl 16
sxth w0, w0
smull x3, w1, w3
asr x3, x3, 37
sub w1, w3, w1, asr 31
lsl w1, w1, 1
mul w0, w1, w0
lsr w0, w0, 16
ret
.cfi_endproc
corresponding C code
return ((((int32_t) sample) * ((int32_t) (32767 * volume / 100) <<1) ) >> 16);
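Note that this excerpt of scale_sample consists of ordinary scalar AArch64 instructions working on w and x registers; no SVE2 vector (z) or predicate (p) registers appear in it.
- ‘lsl w2, w1, 15’ followed by ‘sub w1, w2, w1’ computes 32767 * volume as (volume << 15) - volume.
- ‘mov w3, 34079’ and ‘movk w3, 0x51eb, lsl 16’ together build the constant 0x51EB851F in w3. The ‘lsl 16’ is part of the movk operand, not a separate instruction: it places 0x51eb in bits 16 to 31 of w3 and leaves the low 16 bits (34079, or 0x851f) unchanged.
- ‘sxth w0, w0’ sign-extends the low 16 bits of w0 to 32 bits. This is the (int32_t) sample cast.
- ‘smull x3, w1, w3’ is a signed 32-bit by 32-bit multiply with a 64-bit result. It multiplies 32767 * volume by the constant in w3; together with ‘asr x3, x3, 37’ and ‘sub w1, w3, w1, asr 31’ it performs the division by 100 without a divide instruction.
- ‘lsl w1, w1, 1’ is the shift left by one bit (the << 1 in the C code).
- ‘mul w0, w1, w0’ multiplies the sign-extended sample by the volume factor, giving a 32-bit result.
- ‘lsr w0, w0, 16’ shifts the result right by 16 bits. It is a logical shift, but since only the low 16 bits are returned as an int16_t, the outcome matches the arithmetic >> 16 in the C code.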
Conclusion
We have now experimented with applying SVE2 instructions to the volume-adjusting algorithm (vol1). SVE2 is very new at the moment and practically no systems implementing it are available, so we must use an emulator to run the program. As a result, I wasn't able to find a way to test the SVE2 performance of the assembly code.
The most challenging part of the lab was relating the generated instructions back to the code in the original C file after the conversion.
Source: SVE2