1. Background
Tencent Cloud recently launched the ARM-based SR1 server. Is it worth it? How does it stack up against other models? Let's find out.
We compared two typical models, the ARM-based SR1 and the x86-based S5, to show how to measure CPU performance, mainly computing power, so that you can quickly see what to look for.
2. ARM-based server environment and evaluation preparations
Tencent Cloud SR1 is Tencent Cloud's first ARM-based server. It uses the latest Ampere Altra processor, which is built on ARM Neoverse N1 cores, with a clock rate of up to 2.8 GHz and 64 KiB of L1 cache. The Neoverse N1 CPU has the following architecture:
The other test object is the mainstream x86-based Standard S5, which uses an Intel Xeon Platinum processor based on the latest Cooper Lake microarchitecture and runs at 2.5 GHz. It is quite popular for general use cases. Both test instances have 4 cores and 8 GiB of memory.
In terms of cost, SR1 is approximately 20% cheaper than S5 according to the prices listed on the official website. Although its price is not as competitive as Lighthouse, it is still well worth considering.
Figure: S5 and SR1 price comparison (ARM-based server activation)
SR1 is comparable to S5 in overall performance and costs less, which can mean substantial savings for both individuals and enterprises.
Tips: Screen splitting
Use tmux to split the screen (press Ctrl+b, then % for a left/right split or " for a top/bottom split) and log in to one server in each pane. Then press Ctrl+b, type :setw synchronize-panes, and press Enter. Every command you type is now sent to both terminals at the same time, as shown below:
Figure: System preparations and CPU viewing. Enter commands in different windows of tmux.
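The screen-splitting tip and the CPU check can also be scripted. The sketch below is only an illustration: the function names and the two host arguments are hypothetical placeholders, and the article does not show which commands it used to view the CPU, so `uname`, `nproc`, `lscpu`, and `free` are an assumption.

```shell
# Sketch only: start_sync_session and show_cpu_summary are illustrative names.
# $1 and $2 are placeholder SSH targets for the ARM and x86 servers.
start_sync_session() {
  local arm_host="$1"
  local x86_host="$2"
  local session="${3:-bench}"
  # One detached session, two side-by-side panes, one SSH login per pane.
  tmux new-session -d -s "$session" "ssh $arm_host"
  tmux split-window -h -t "$session" "ssh $x86_host"
  # Mirror every keystroke to both panes.
  tmux set-window-option -t "$session" synchronize-panes on
  tmux attach-session -t "$session"
}

# Print the architecture, core count, CPU model and cache lines, and memory.
# Tools that are not installed are skipped.
show_cpu_summary() {
  uname -m
  if command -v nproc >/dev/null 2>&1; then nproc; fi
  if command -v lscpu >/dev/null 2>&1; then
    lscpu | grep -E 'Model name|MHz|L1d|L2|L3' || true
  fi
  if command -v free >/dev/null 2>&1; then free -h || true; fi
  return 0
}
```

Running `start_sync_session user@arm-host user@x86-host` (placeholder addresses) and then typing `show_cpu_summary` once, after defining it on both machines, should print one summary per pane.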
With the preparations done, let's start the evaluation.
3. 7-Zip compression evaluation
7-Zip includes a built-in LZMA benchmark that can quickly evaluate the CPU computing performance of a server.
Run the following command to evaluate the performance:
Figure: 7-Zip evaluation, LZMA compression (ARM-based SR1/x86-based S5)
The 7-Zip benchmark command reports the compression and decompression performance of a server, measured in million instructions per second (MIPS). The higher the value, the stronger the performance. You can also cross-check the results against metrics such as compression rate and execution time. The benchmark rarely uses 64-bit instructions, let alone advanced instruction set extensions, so it mostly reflects the CPU's "fundamentals". LZMA compression performance depends on memory access latency, data cache (D-cache) capacity, TLB performance, and out-of-order execution efficiency, while decompression performance reveals more about branch prediction and instruction latency, both of which are tied to the multi-stage pipeline design.
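As a minimal sketch of the benchmark described above: `7z b` starts the built-in benchmark, and `-mmt` sets the number of threads. The function names are illustrative, and the parser assumes the summary line starts with `Tot:` and ends with the overall rating in MIPS, as in p7zip 16.02 output; check this against your 7-Zip version.

```shell
# Run the built-in LZMA benchmark on all available cores.
# Assumes the 7z binary is installed (p7zip-full on Debian/Ubuntu).
run_7z_bench() {
  7z b "-mmt$(nproc)"
}

# Read saved benchmark output on stdin and print the overall rating (MIPS).
# Assumption: the summary line starts with "Tot:" and its last field is the rating.
sevenzip_total_rating() {
  awk '/^Tot:/ { print $NF }'
}
```

One possible usage is `run_7z_bench | tee 7z.log | sevenzip_total_rating`, which keeps the full log and prints a single number to compare between the two servers.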
Evaluation results:
Figure: 7-Zip evaluation of S5 and SR1 (LZMA compression)
As you can see, ARM-based SR1 delivers 60% higher performance than x86-based S5 in LZMA compression and decompression scenarios.
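A statement such as "60% higher" means the SR1 score is about 1.6 times the S5 score. The helper below is a hypothetical convenience function, not part of any tool used here, that derives this kind of percentage from two scores. The numbers in the usage note are placeholders, not measured results.

```shell
# pct_higher NEW BASE: print how much higher NEW is than BASE, in percent.
pct_higher() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / b * 100 }'
}
```

For example, `pct_higher 160 100` prints `60.0`.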
4. LUKS block device encryption and decryption evaluation
LUKS is a specification for block device encryption supported by the Linux kernel. Simply put, it encrypts disks.
Like file compression and decompression, block device encryption and decryption are typical workloads that consume a lot of computing resources. Unlike generic computing scenarios, encryption and decryption are usually accelerated by dedicated hardware instructions exposed as CPU extension sets. x86 provides the AES-NI extension, while ARM provides separate extensions for different cryptographic operations (for example, AES and SHA instructions).
There is no need to install any other software. The cryptsetup tool that ships with Linux can evaluate CPU performance through its encryption and decryption benchmark.
By default, the benchmark covers both ciphers and key derivation functions (KDFs).
Run the following command to evaluate the performance:
Figure: LUKS evaluation process (ARM-based SR1/x86-based S5)
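The command itself is not preserved in the text, so the following is a sketch under assumptions. `cryptsetup benchmark` with no options runs the default set of KDF and cipher tests, and `--cipher` with `--key-size` narrows it to one cipher. The function names are illustrative, the benchmark may need root privileges, and the parser assumes cipher rows of the form `aes-xts 512b 1650.2 MiB/s 1650.9 MiB/s`.

```shell
# Default run: KDFs plus a standard list of ciphers.
run_luks_bench() {
  cryptsetup benchmark
}

# Benchmark a single cipher; defaults to AES-XTS with a 512-bit key.
run_luks_cipher_bench() {
  local cipher="${1:-aes-xts-plain64}"
  local key_bits="${2:-512}"
  cryptsetup benchmark --cipher "$cipher" --key-size "$key_bits"
}

# luks_cipher_speed ALG KEY: read saved output on stdin and print the
# encryption and decryption speeds (MiB/s) for one row, e.g. "aes-xts 512b".
# Assumption: fields are algorithm, key size, enc speed, unit, dec speed, unit.
luks_cipher_speed() {
  awk -v alg="$1" -v key="$2" '$1 == alg && $2 == key { print $3, $5 }'
}
```

Saving the output with `run_luks_bench > luks.log` on each server and then running `luks_cipher_speed aes-xts 512b < luks.log` gives two numbers per server that are easy to compare side by side.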
Evaluation results (KDFs):
Figure: LUKS evaluation of S5 and SR1 in terms of KDFs
Evaluation results (ciphers):
Figure: LUKS evaluation of S5 and SR1 in terms of encryption algorithms
As you can see, the ARM-based server outperforms its x86-based counterpart in the common SHA algorithms (SHA-256 and SHA-512) and in AES-CBC encryption, whereas the x86-based server, with its AES-NI extension instructions, does a better job in decryption and in AES-XTS encryption, the mode that offers the highest security of those tested.
5. OpenSSL network encryption and decryption evaluation
Block device encryption protects data at rest, while network encryption protects data in transit. As OpenSSL is one of the most popular network encryption libraries, it's necessary to conduct an OpenSSL performance evaluation.
OpenSSL's speed sub-command can benchmark every supported algorithm, but that takes a long time, so you normally pass parameters to specify the algorithms to test. Commonly used algorithms include Hash-based Message Authentication Code (HMAC) for message integrity and identity verification, the SHA-256 secure hash for message digests and digital signatures, and AES-256, the standard encryption algorithm widely adopted by cloud service providers.
Run the following command to evaluate the performance:
Figure: OpenSSL encryption process through speed (ARM-based SR1/x86-based S5)
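The exact command line used in the article is not preserved, so this is one plausible sketch rather than the original invocation. It names the three algorithm families discussed above and uses `-multi` to run several benchmarks in parallel. The choice of `-evp` for AES (which goes through OpenSSL's EVP interface) is an assumption, and the function name is illustrative.

```shell
# run_openssl_bench [PROCS]: benchmark HMAC (MD5), SHA-256, and AES-256-CBC.
# PROCS defaults to the number of available cores.
run_openssl_bench() {
  local procs="${1:-$(nproc)}"
  # Hash-based algorithms named directly on the command line.
  openssl speed -multi "$procs" hmac sha256
  # AES-256-CBC through the EVP interface (assumption: not necessarily
  # the form used for the results in this article).
  openssl speed -multi "$procs" -evp aes-256-cbc
}
```

Calling `run_openssl_bench 4` would match the 4-core instances used in this comparison. Results from the EVP and non-EVP forms are not directly comparable, so use the same form on both servers.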
Evaluation results:
Figure: OpenSSL encryption results of S5 and SR1
As you can see, the ARM-based server slightly lags behind the x86-based server in MD5 HMAC, but it outperforms the latter in SHA-256 and AES-256, especially in SHA-256.
6. Redis database throughput rate evaluation
Now let's move on to the Redis performance evaluation. As one of the most popular in-memory databases, Redis is often used for key-value storage, data caching, and message queues, all of which demand high throughput. Redis also ships with a built-in evaluation utility called redis-benchmark, which measures the number of requests per second.
The redis-benchmark program measures the throughput of a single server for GET, SET, LPUSH, and other common Redis commands, which exercises the CPU and its memory access capabilities (such as memory access bandwidth and performance).
Run the following command to evaluate the performance:
Figure: Redis evaluation command execution (ARM-based SR1/x86-based S5)
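As a hedged sketch of such a run: the request count, client count, and command list below are assumptions, not the article's actual parameters, and the function names are illustrative. A Redis server must already be running on the target host. The parser assumes quiet-mode (`-q`) lines of the form `SET: 123456.79 requests per second`.

```shell
# run_redis_bench [HOST] [PORT]: quiet-mode benchmark of GET, SET, and LPUSH.
run_redis_bench() {
  local host="${1:-127.0.0.1}"
  local port="${2:-6379}"
  # -n: total requests, -c: parallel clients, -t: tests to run, -q: summary only.
  redis-benchmark -h "$host" -p "$port" -q -n 100000 -c 50 -t get,set,lpush
}

# Read -q output on stdin and print "COMMAND requests-per-second" pairs.
# Carriage returns from progress updates are converted to newlines first.
redis_rps() {
  tr '\r' '\n' | awk '/requests per second/ { sub(/:$/, "", $1); print $1, $2 }'
}
```

Running `run_redis_bench | redis_rps` on each server produces one line per command, which makes the two servers easy to compare.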
Evaluation results:
Figure: Redis throughput rate evaluation of S5 and SR1
According to the Redis evaluation results, ARM-based SR1 has 30% to 40% higher performance on average than x86-based S5.
7. Conclusion
Now it's time to get some hands-on experience and see what your own cloud server performance tests reveal.
ARM-based servers are more than just cost-effective. As ARM-based virtualization technologies become more widespread in the cloud, ARM-based servers are bound to gain more momentum in IoT, cloud phone and cloud gaming, the Android ecosystem, and many more use cases.
Let's look forward to more diversified experiences available at our fingertips.