Background of this lab
- Digital sound is typically represented, uncompressed, as signed 16-bit integer signal samples. There are two streams of samples, one each for the left and right stereo channels, at typical sample rates of 44.1 or 48 thousand samples per second per channel, for a total of 88.2 or 96 thousand samples per second (kHz). Since there are 16 bits (2 bytes) per sample, the data rate is 88.2 * 1000 * 2 = 176,400 bytes/second (~172 KiB/sec) or 96 * 1000 * 2 = 192,000 bytes/second (~187.5 KiB/sec).
- To change the volume of sound, each sample can be scaled (multiplied) by a volume factor, in the range of 0.00 (silence) to 1.00 (full volume).
- On a mobile device, the amount of processing required to scale sound will affect battery life.
Multiple Approaches
Six programs are provided, each with a different approach to the problem, named vol0.c through vol5.c. A header file, vol.h, defines how much data (in number of sample) will be processed by each program, as well as the volume level to be used for scaling (50%).
These are the six programs:
-
vol0.c
is the basic or naive algorithm. This approach multiplies each sound sample by the volume scaling factor, casting from signed 16-bit integer to floating point and back again. Casting between integer and floating point can be expensive operations. -
vol1.c
does the math using fixed-point calculations. This avoids the overhead of casting between integer and floating point and back again. -
vol2.c
pre-calculates all 65536 different results, and then looks up the answer for each input value. -
vol3.c
is a dummy program - it doesn't scale the volume at all. It can be used to determine some of the overhead of the rest of the processing (besides scaling the volume) done by the other programs. -
vol4.c
uses Single Instruction, Multiple Data (SIMD) instructions accessed through inline assembley (assembly language code inserted into a C program). This program is specific to the AArch64 architecture and will not build for x86_64. -
vol5.c
uses SIMD instructions accessed through Complier Intrinsics. This program is also specific to AArch64. Note that vol4.c and vol5.c will build only on AArch64 systems because they use architecture-specific SIMD instructions.
Benchmarking
AArch64
- My first step was to copy files from the root directory into my.
- After, I unpacked the archive /public/spo600-volume-examples.tgz.
- Here you can see the tree
- I build all the programs with
make
- I tested all of them from ./vol0-5 on aarch64 and 0-3 on x86_64 with vol.h
/* This is the number of samples to be processed */
#define SAMPLES 16
/* This is the volume scaling factor to be used */
#define VOLUME 50.0 // Percent of original volume
/* Function prototype to fill an array sample of
* length sample_count with random int16_t numbers
* to simulate an audio buffer */
void vol_createsample(int16_t* sample, int32_t sample_count);
- I changed number of samples and my results was different: I was kind of experimenting with a code. 5 000 000 Samples.
- Now lets check each algorithm three times:
vol0 | time ex |
---|---|
1 attempt | 0m1.488s |
2 attempt | 0m1.469s |
3 attempt | 0m1.399s |
AVG: 0m1.452s
vol1 | time ex |
---|---|
1 attempt | 0m1.479s |
2 attempt | 0m1.259s |
3 attempt | 0m1.329s |
AVG: 0m1.355s
vol2 | time ex |
---|---|
1 attempt | 0m1.549s |
2 attempt | 0m1.379s |
3 attempt | 0m1.219s |
AVG: 0m1.382s
vol3 | time ex |
---|---|
1 attempt | 0m1.279s |
2 attempt | 0m1.319s |
3 attempt | 0m1.349s |
AVG: 0m1.315s
vol4 | time ex |
---|---|
1 attempt | 0m1.379s |
2 attempt | 0m1.359s |
3 attempt | 0m1.479s |
AVG: 0m1.405s
vol5 | time ex |
---|---|
1 attempt | 0m1.338s |
2 attempt | 0m1.439s |
3 attempt | 0m1.349s |
AVG: 0m1.375
- I also used a time function with #include
My prediction
I thought that vol4 or vol5 will be the best because of specific SIMD instructions, but...
Memory Usage
x86_64
The same steps I did for x86_64
vol0 | time ex |
---|---|
1 attempt | 0m0.811s |
2 attempt | 0m0.820s |
3 attempt | 0m0.844s |
AVG: 0m0.825s
vol1 | time ex |
---|---|
1 attempt | 0m0.839s |
2 attempt | 0m0.803s |
3 attempt | 0m0.817s |
AVG: 0m0.819s
vol2 | time ex |
---|---|
1 attempt | 0m0.826s |
2 attempt | 0m0.837s |
3 attempt | 0m0.828s |
AVG: 0m0.830s
vol3 | time ex |
---|---|
1 attempt | 0m0.811s |
2 attempt | 0m0.824s |
3 attempt | 0m0.846s |
AVG: 0m0.827s
Memory Usage:
Questions
Q: This part sums the samples. (Why is this needed?)
for (x = 0; x < SAMPLES; x++) {
ttl=(ttl+out[x])%1000;
}
We need this part because it is used to go through out of the SAMPLES that we defined in vol. h. Then assigned all of the results from out[] array to ttl property. It used to print out on as a result.
Q: Print the sum of the samples. (Why is this needed?)
printf("Result: %d\n", ttl);
We need this part because it is used to print the result that we went through the SAMPLE then assigned from the out[] array.
Q: should we use 32767 or 32768 in next line? why?
vol_int = (int16_t)(VOLUME/100.0 * 32767.0);
We should operate 32767 in the subsequent line because we include the limitation of the sample so that the model will begin at the lowest value of a 16-bit signed integer to the highest value of the 16-bit signed integer.
Q: what is the purpose of these next two lines?
in_cursor = in;
out_cursor = out;
limit = in + SAMPLES;
The goal of the first line is to set the input cursor to the in[] collection, and the second line proceeds to provide the result cursor to the out[] collection.
Q: what does it mean to "duplicate" values in the next line?
asm ("dup v1.8h,%w0"::"r"(vol_int)); // duplicate vol_int into v1.8h
A copy of the value is kept in a vector that acts as a similar-sized array. The deal to copy is %w0, and the duplicate value will be sent to the dupv1.8h.
Q: What do these next three lines do?
: [in_cursor]"+r"(in_cursor), [out_cursor]"+r"(out_cursor)
: "r"(in_cursor),"r"(out_cursor)
: "memory"
);
}
These three lines are used to acquire the input cursor's value, and the output cursor then stores them into the system memory.
// Q: Why is the increment below 8 instead of 16 or some other value?
// Q: Why is this line not needed in the inline assembler version
// of this program?
in_cursor += 8;
out_cursor += 8;
}
Conclusion
⚠️ Computer Architecture Blog Post: Link
Links
🖇 Follow me on GitHub
🖇 Follow me on Twitter
_p.s This post was made for my Software Portability and Optimization class. Lab 5.
Top comments (0)