A recent blog post titled "Yes, PHP Is Faster Than C#" has sparked quite a conversation. I decided to run the tests mentioned in the post and found some interesting results, which I think are worth sharing.
The benchmark used here reads a file from the file system in 4 KiB chunks and counts the number of bytes equal to the character '1' in the file. First off, I should say that I don't find this "benchmark" very meaningful, especially since reading files from disk is involved. Many things can affect file-system performance (caches, the state of the disk drive, how busy the kernel is at that time), none of which is addressed in the test itself.
Nonetheless, the results do indicate some interesting performance characteristics that we can talk about.
Source code for the test can be found here: https://github.com/dhhoang/csharp-php-file-read
Small files
I generated the test file like this:
# for this test, we will use file_size of 4 MiB as specified in the original post
base64 /dev/urandom | head -c [file_size] > test.txt
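For reference, here is a slightly more concrete version of that command. It is only a sketch: the 4 MiB size and the test.txt name come from above, while the size check and the tr-based reference count are additions that are not part of the benchmark itself. The reference count gives a number that both programs should report for the same file.

```shell
#!/bin/sh
# Assumption: 4 MiB means 4 * 1024 * 1024 bytes.
file_size=$((4 * 1024 * 1024))

# base64 turns random bytes into printable text; head cuts the stream at file_size bytes.
base64 /dev/urandom | head -c "$file_size" > test.txt

# Confirm the size on disk (tr strips the padding some wc versions print).
actual_size=$(wc -c < test.txt | tr -d ' ')
echo "size: $actual_size bytes"

# Reference count: delete every byte except '1', then count what is left.
ones=$(tr -cd '1' < test.txt | wc -c | tr -d ' ')
echo "reference count of 1s: $ones"
```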
The code for the PHP (8.0) program looks something like this:
<?php
function test()
{
    $file = fopen("/path/to/test.txt", 'r');
    $counter = 0;
    $timer = microtime(true);
    while ( ! feof($file)) {
        // fgets() returns at most 4095 bytes and stops early at a newline
        $buffer = fgets($file, 4096);
        // built-in substring counter: occurrences of '1' in this chunk
        $counter += substr_count($buffer, '1');
    }
    $timer = microtime(true) - $timer;
    fclose($file);
    printf("counted %s 1s in %s milliseconds\n", number_format($counter), number_format($timer * 1000, 4));
}
test();
And for C#:
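using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

private static void Test()
{
    using var file = File.OpenRead("/path/to/test.txt");
    var counter = 0;
    var buffer = new byte[4096]; // 4 KiB read buffer
    var numRead = 0;
    var sw = Stopwatch.StartNew();
    // Read() returns 0 once the end of the file is reached
    while ((numRead = file.Read(buffer, 0, buffer.Length)) != 0)
    {
        // LINQ: count the '1' bytes among the numRead bytes just read
        counter += buffer.Take(numRead).Count((x) => x == '1');
    }
    sw.Stop();
    Console.WriteLine($"Counted {counter} 1s in {sw.ElapsedMilliseconds} milliseconds");
}
Test();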
The results on a t3.xlarge EC2 instance are as follows (note: each program was run 10 times and the runtimes were averaged after removing anomalies caused by a cold file cache):
Test-C# 53.2ms
Test-PHP 11.1ms
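The averaging mentioned in the note could be scripted along these lines. This is a sketch under assumptions, not the exact procedure behind the numbers above: the helper name bench_mean is made up, it relies on the "... in <ms> milliseconds" line that both programs print, and it treats only the single slowest run as the cold-cache anomaly.

```shell
# bench_mean RUNS CMD [ARGS...]: run CMD RUNS times, pull the in-process
# timing out of each "... in <ms> milliseconds" line, discard the slowest
# run (assumed to be the cold-cache one), and print the mean of the rest.
bench_mean() {
    runs=$1
    shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@"
        i=$((i + 1))
    done |
    # The timing is the second-to-last field; strip thousands separators.
    awk '{ ms = $(NF - 1); gsub(/,/, "", ms); print ms }' |
    sort -n |
    awk '{ v[NR] = $1 }
         END {
             if (NR < 2) exit 1                      # nothing left to average
             for (i = 1; i < NR; i++) sum += v[i]    # v[NR] is the slowest run
             printf "%.1f\n", sum / (NR - 1)
         }'
}
```

Usage would be something like bench_mean 10 php test.php, where the script name is hypothetical.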
So the PHP code is about 5 times faster than the C# code! So it looks like PHP really is faster than C#?
Something is definitely off here. Is .NET that slow when reading a file? Probably not. I did a simple test where I removed the "counting" part in both programs, and their performance became very similar. The blog's author claimed that the test has "very little user-land code" and mainly tests the file-reading performance. I found this to be incorrect.
Now if you look closer at the two programs, they are very similar except for the part where the '1' bytes are counted. PHP uses the built-in substr_count function, which is highly optimized, while the C# code uses LINQ. LINQ is a very convenient way to work with collections in C#, but it is also quite slow here: Take and Count walk the buffer through an enumerator and invoke a delegate for every byte. What if we just count the bytes the old-fashioned way?
private static void Test_FileStream_NoLinq()
{
    ...
    while ((numRead = file.Read(buffer, 0, buffer.Length)) != 0)
    {
        for (var c = 0; c < numRead; c++)
        {
            if (buffer[c] == '1')
            {
                counter++;
            }
        }
    }
    ...
}
Our result now is (see Test-C#-NoLinq):
Test-C# 53.2ms
Test-PHP 11.1ms
Test-C#-NoLinq 6.5ms
At this point the C# program is already about 8 times faster than before (53.2 ms down to 6.5 ms) and close to twice as fast as the PHP program (11.1 ms). This shows that the byte-counting step contributes significantly to the total run time.
So the next question is: can we do even better? When working with a byte buffer, iterating through individual bytes is a pretty naive implementation. A more optimized one would use vectorization techniques such as SIMD. In fact, I would be very surprised if the substr_count function were not using vectorization. To test this, I created another PHP test function that iterates through the string instead of using substr_count, which makes it comparable to our C# Test_FileStream_NoLinq function:
function test_manual_count()
{
    ...
    while ( ! feof($file)) {
        $buffer = fgets($file, 4096);
        $length = strlen($buffer);
        for ($i = 0; $i < $length; $i++) {
            if ($buffer[$i] == '1') {
                $counter += 1;
            }
        }
    }
    ...
}
And the result (see Test-PHP-Manual-Count):
Test-C#-NoLinq 6.5ms
Test-PHP 11.1ms
Test-PHP-Manual-Count 135ms
That is painfully slow, which is why it's always a good idea to use substr_count when you need to count occurrences in a string. Unfortunately, C# does not provide a built-in method with the same functionality; however, it does offer a lot of primitives for implementing vectorization. I found an implementation of a SIMD-based equivalent on StackOverflow: VectorExtensions.OccurrencesOf(ReadOnlySpan<byte>, byte). With this we can rewrite our counter:
private static void Test_FileStream_Vectorized()
{
    ...
    while ((numRead = file.Read(buffer, 0, buffer.Length)) != 0)
    {
        counter += buffer.AsSpan().Slice(0, numRead).OccurrencesOf((byte)'1');
    }
    ...
}
And the result (see Test-C#-Vectorization):
Test-C#-NoLinq 6.5ms
Test-C#-Vectorization 1.0ms
That is about 6 times faster than the manual loop and about 10 times faster than PHP.
Large file
For this test, I'm using a 3.2 GB Ubuntu ISO image. The results look like this:
Test-PHP 3228.4ms
Test-PHP-Manual-Count 103966.7ms
Test-C#-NoLinq 5175.3ms
Test-C#-Vectorization 1104.7ms
Here we can clearly see how using vectorization makes things a lot faster for both languages.
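To put numbers on that, the speedups implied by the table can be computed directly. The timings are the ones reported above; only the speedup helper is made up for illustration.

```shell
# speedup SLOW FAST: print how many times faster FAST is than SLOW.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1fx\n", slow / fast }'
}

speedup 103966.7 3228.4   # PHP: manual loop vs substr_count
speedup 5175.3 1104.7     # C#: plain loop vs vectorized
speedup 3228.4 1104.7     # PHP substr_count vs C# vectorized
```

This prints 32.2x, 4.7x and 2.9x, so the gap between a per-byte loop and an optimized counter is far larger in PHP than in C#.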
Top comments (8)
And memory usage?
Haven't looked into this. From my experience, the .NET GC tends to be pretty generous with memory allocation, especially with ServerGC, so C# programs usually have a larger memory footprint than, say, Go or NodeJS. I don't have enough experience with PHP, but it would definitely be interesting to look into.
You could use memory_get_peak_usage or memory_get_usage to check how much memory is used in PHP. Just add the first one at the end of script.
This performance test is not accurate. It does the time capture from within the same process -- the same function even, so it skips over the time spent loading the runtime, running through linked library symbols, time to load the binary, time to execute those symbols -- and pretty much every other piece besides in-runtime translation.
This article should either be retracted or rewritten to use external timer processes (such as time; please see man time for more details).
Agree. Note that I said in the beginning that I don't find the test to be meaningful. This was done just to respond to the same test in the original post.
My main point is not as much to "counter" as to point out how easy it is to misunderstand performance characteristics of programs. In fact I believe the PHP code could be further optimized to be much faster as well.
That's true, one thing is while and for loops in PHP are painfully slow compared to foreach. If you use foreach for checking characters you can also forget about strlen too.
One other minor thing you could do is to import all builtin functions you are using or prefix them with namespace (in case of builtins it's "\").
Overall I agree, benchmarks like this make no sense if you don't go extremely in depth to make sure you are really testing the same thing.
This was really interesting, and I like how you split it out in reasonable chunks. Your code was very clear too.