John Hawthorn wrote a nice post discussing a recent tool to incorporate Crystal into your Ruby app. While JH raises an important point, his post overlooks certain aspects that are worth considering. I'll discuss Crystal's real performance and benefits, highlighting why such a Ruby/Crystal integration is an indispensable tool to have on the bench.
This is also a structured presentation of some comments made on the Hacker News post.
tl;dr
- JH makes the case that Ruby has a just-in-time compiler, and that optimizing the Ruby version of the code yields a large performance improvement.
- Crystal code doesn't need wrestling to be optimal.
- The comparison is performed within Ruby, so it includes the cost of calling Crystal from Ruby.
- Pure Crystal shows something radically different!
Yes, Ruby is fast
The first point I want to make is that JH is right: we need to be fair to Ruby's JIT compiler (--yjit), and only consider benchmarks that include it. And, indeed, with it, Ruby gets very nice performance.
And let me be blunt here: I love Ruby! Ruby is one of my top 5 languages of choice. A great community, including many big companies, agrees, so I expect Ruby's performance will only increase with time as more improvements get incorporated into it.
🔴 First point: Ruby's YJIT is fast!
The real performance of JITs and Crystal
Let's compare the execution of Ruby's YJIT, Python PyPy (another JIT compiler), and pure Crystal (that is, without the integration).
Ruby: On my computer, the numbers for Ruby's YJIT are on par with those in the post. Each line corresponds to one of the proposed optimizations:
> ruby --yjit fib.rb
user system total real
3.464166 0.022979 3.487145 ( 3.491493)
1.705869 0.002169 1.708038 ( 1.710117)
0.187083 0.000318 0.187401 ( 0.187578)
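For readers who don't want to open the gist, the Ruby side of such a benchmark could look roughly like the sketch below. This is not the gist's code: the three method names (`fib_times`, `fib_temp`, `fib_while`) and what each variant does are assumptions made for illustration, and `RUNS` is kept well below the million runs used in the post so the sketch finishes quickly.

```ruby
require "benchmark"

# Hypothetical variant 1: block-based loop with parallel assignment.
def fib_times(n)
  a = 0
  b = 1
  n.times { a, b = b, a + b }
  a
end

# Hypothetical variant 2: block-based loop with an explicit temporary.
def fib_temp(n)
  a = 0
  b = 1
  n.times do
    t = a + b
    a = b
    b = t
  end
  a
end

# Hypothetical variant 3: a plain while loop, no block involved.
def fib_while(n)
  a = 0
  b = 1
  i = 0
  while i < n
    t = a + b
    a = b
    b = t
    i += 1
  end
  a
end

# Assumption: far fewer runs than the post's million, to keep this quick.
RUNS = 10_000

# Benchmark.bm prints the "user system total real" header seen above,
# followed by one timing line per report.
Benchmark.bm do |x|
  x.report { RUNS.times { fib_times(45) } }
  x.report { RUNS.times { fib_temp(45) } }
  x.report { RUNS.times { fib_while(45) } }
end
```

Saved as fib.rb, it runs with the same command as above (ruby --yjit fib.rb). Absolute timings will differ from the ones listed, since both the code and the run count are only approximations of the real benchmark.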
Python: My Python-fu is limited, so I only ported the last version (a simple while loop) and ran it with PyPy. It takes a bit less time:
> pypy fib.py
0.12447810173
Crystal: When we compile the code with --release, the numbers are negligible! Not only that, I've added some extra code to make sure the optimizations weren't throwing away important code. So not only do I calculate the Fibonacci number of 45 (using a UInt128, to stretch this even further), but I also print the sum of the million runs!
> crystal build --release fib.cr; ./fib
user system total real
1134903170000000
0.000002 0.000004 0.000006 ( 0.000004)
1134903170000000
0.000001 0.000002 0.000003 ( 0.000003)
1134903170000000
0.000002 0.000002 0.000004 ( 0.000003)
⚫ Second point: Pure Crystal is really, really fast in this benchmark!
Reference: The code I'm using for the benchmarks is listed in this gist.
Note: As mentioned, the Crystal version uses a primitive number type (UInt128). That explains a lot of the performance difference.
Crystal compilation optimizes your code
In the timings of the Crystal programs, the first one takes a couple more microseconds. However, if we swap the order in which the examples are run, the output is identical: the first one, whichever that is, takes a few microseconds more.
In conclusion, none of the proposed changes to the Ruby version of the code makes a dent in the Crystal version. This is not entirely Crystal's doing: it uses the LLVM backend, which generates highly optimized binaries.
Quite frankly, I'm puzzled as to why Ruby's YJIT doesn't optimize this as well. Perhaps it will get there with time (I tested Ruby 3.3.1).
⚫ Third point: Crystal code is fast, even without tweaks
Maybe it's the plumbing that's slow?
Doesn't seem so. But to understand why, we need to discuss an important point: by default, the integration compiles the Crystal code without the --release flag. This makes sense: during development, you don't want the compilation to take a lot of time. Compiling in release mode makes efficient binaries, but at the cost of significantly increasing the compilation time.
When I tested the Prime Counting example from the README file of the crystalruby page using release mode, the Crystal code took the same time to run as it does in pure Crystal. To enable release mode, one needs to add the following code:
CrystalRuby.configure do |config|
  config.debug = false
end
So perhaps the timings from the Fibonacci example would look the same as with pure Crystal. I say perhaps because I stumbled across an issue that made the integration unusable on that particular example.
🔴⚫ Fourth point: The integration doesn't produce efficient Crystal code by default.
Crystal/Ruby integration revisited
Crystal and Ruby are two wonderful languages, each with their pros and cons. Crystal's performance and low memory footprint are hardly contested, and can be studied further in the benchmarks of languages and compilers (but be critical about benchmarks!).
Performance is not the only advantage of Crystal: its typechecker is another benefit that teams might want to use for safety-critical parts of an application. Or maybe there is an interesting shard to call from a gem… Whatever the reason, integrating Crystal code into Ruby is a very appealing tool to have in the dev toolbox.
It is common to call C functions from Ruby or Crystal. It's interesting to know that there are alternatives to bridge these two languages that share the same goal of writing beautiful programs, using a similar syntax. The mentioned crystalruby gem allows interfacing Ruby programs with Crystal, and the shard anyolite allows calling Ruby programs from Crystal.
🔴⚫ Fifth point: Ruby + Crystal FTW! ❤️
EDIT: I was asked a very good question twice: how do we know LLVM isn't optimizing so aggressively that it just replaces the call to the Fibonacci function with the result? After all, the argument is fixed, so it can calculate the result at compile time and just place it there.
I missed this point in the post, although I originally thought about it. At the time of writing, I tried passing 45 + rand(1) as the argument. This ensures the argument is not a literal number. It certainly impacts the overall performance, and now it takes 1ms. Still very good, because it also counts the calls to rand! This is why I didn't see a problem and forgot to add this to the article.
However, on further inspection of the LLVM-generated code, I found more! It optimizes the code nevertheless! It produces a sum of 1134903170 (the result of fib(45)) alongside the million calls to rand(1)! My mind was totally blown by this. In any case, point to LLVM, and to Crystal for using it!
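Why the compiler can get away with this: rand(1) returns an integer in 0...1, so it is always 0 and the argument is always 45. The arithmetic can be checked in plain Ruby. This is only a sketch of the sum being computed, and says nothing about how any compiler arrives at it; the `fib` helper and the reduced `RUNS` value are assumptions for illustration.

```ruby
# Iterative Fibonacci, used only to check the arithmetic.
def fib(n)
  a = 0
  b = 1
  n.times { a, b = b, a + b }
  a
end

# Assumption: fewer runs than the post's million, to keep this quick.
RUNS = 1_000

total = 0
RUNS.times { total += fib(45 + rand(1)) } # rand(1) is always 0

# Summing RUNS calls gives the same value as multiplying the constant.
puts total == RUNS * fib(45)

# The value that a million runs add up to.
puts 1_000_000 * fib(45)
```

The second line printed is 1134903170000000, the same sum that appears in the Crystal output earlier in the post.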
EDIT 2: GitHub user @petr-fischer suggested taking the argument from the command line, to keep LLVM from optimizing that much. With that change, the times change significantly; in particular, we can see a difference between the second and the third version:
user system total real
0.034982 0.000266 0.035248 ( 0.035400)
0.034268 0.000134 0.034402 ( 0.034522)
0.023234 0.000140 0.023374 ( 0.023607)
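The idea of the change, sketched in Ruby: the argument is read at run time, so its value is unknown when the program is compiled. This is not the Crystal code behind the timings above; `read_n` is a hypothetical helper, and the default of 45 and the reduced run count are assumptions for illustration.

```ruby
require "benchmark"

# Plain while-loop Fibonacci.
def fib_while(n)
  a = 0
  b = 1
  i = 0
  while i < n
    t = a + b
    a = b
    b = t
    i += 1
  end
  a
end

# Hypothetical helper: parse the first command-line argument as an integer,
# falling back to `default` when it is missing or not a number.
def read_n(argv, default = 45)
  Integer(argv.first, exception: false) || default
end

n = read_n(ARGV)
runs = 10_000 # assumption: fewer runs than the post's million
total = 0

# Accumulating and printing the sum keeps the computed values in use.
timing = Benchmark.measure { runs.times { total += fib_while(n) } }
puts total
puts timing
```

It would be run as, for example, ruby --yjit fib.rb 45.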
I don't think the takeaways are any different: we're still talking about a significant reduction compared to the Ruby or Python versions. And as mentioned already, let me stress that a big part of this comes from using a primitive type (check this post by Ary that George Dietrich recommended in the forum).
Top comments (10)
Thanks for this insightful post. These are the posts I expect to see here on dev.to
As I understand it Ruby has massive performance problems in practice.
Big products like GitLab and GitHub were initially written in Ruby, and their initial versions were slow as hell.
Entire teams at big companies found it easier to migrate existing Ruby code to a completely new language than to make it fast. I think therein lies a hidden reason why no one seems to be able to write fast Ruby programs.
I positively read this as:
Ruby and Rails allowed these companies to become big enough to pay entire teams to work on improving Ruby.
Adding a bit more context to this:
I know some of them are not using Ruby, or maybe they are not exclusively using Ruby.
But what I see here is that it doesn't seem to be an issue to have a multi-billion dollar company built with Ruby, so it seems to me that speed is not that important in most cases.
There is no hidden secret for Ruby: you might pay a performance price for being flexible with Ruby, fast in feature delivery, and having many things come out of the box working with Rails. In a lot of cases, the product will never reach this scale, and when and if it reaches that scale, they can afford to pay people to make Ruby faster.
Exactly! And the key of this gem, and this article, is you can do that with a language that is close to what you've started with.
Another interesting point is that Crystal is developed with orders of magnitude less funding than Ruby or other languages, and still manages to stay atop the performance benchmarks. If we had 1/4 of the same funding, we'd be even better!
I do not know about you, but I think that 1-20 Requests per second are very very low and completely unacceptable
x.com/nateberkopec/status/17919275...
My current setup with Typescript/Bun on a single 4GB ram VPS serves about 10k Requests per second -> and I would not call it the most optimized.
Or do you not use Rails, and/or do you have other performance numbers from your real-world apps?
I don't know what you're calling “request”, but those numbers are awfully low. In this benchmark Rails can handle ~7k, and the last time I dove into the code it was logging in full debug mode, so the real number in production should be significantly higher.
That said, I don't think it's possible for a Rails app to get close to the numbers of Crystal's alternatives, which can handle an order of magnitude more requests (146k for toro, 145k router.cr, 143k spider-gazelle).
From what I understand, this information might be out of date.
I use Ruby and I like it, especially the way I add dependencies and modules. Using it with a JavaScript-enabled framework is cool 😎
Interesting. How did you ensure that, in the case of Crystal, LLVM optimizations do not replace the entire calculation of, for example, fib1(45), with an already calculated value (1134903170) during compilation? What did I miss?
I was the one who failed to clarify this. Check the EDIT at the end. Thanks for bringing up this point!