I've been playing with mruby recently, after seeing several interesting talks at Rubyconf 2019. I found a few projects ( mruby-esp32, M5Stack-mruby) that were setup for building mruby on the device I had bought (an M5StickC), so after some messing around with toolchains & drivers I was up and running.
All these projects were using mruby 2.0.1 (released April 2019), but I wanted to use mruby 2.1 (Nov 2019) which has a bunch of ruby 2.6 features, some nice enhancements to the C apis (including keyword arguments) and the usual collection of bugfixes. In theory this was easy to fix: the setup for these projects was a git submodule pointing at mruby, all I needed to do was update that submodule to the tag for 2.1, recompile, and I'd be on my way.
However, when I did that, my device started crashing as soon as it loaded mruby (in the call to mrb_open
) with a message like this:
Guru Meditation Error: Core 0 panic'ed (InstrFetchProhibited). Exception was unhandled.
Core 0 register dump:
PC : 0x000ef970 PS : 0x00060730 A0 : 0x800e791d A1 : 0x3ffb8c40
A2 : 0x3ffb8d30 A3 : 0x3ffb8c60 A4 : 0x0000000e A5 : 0x000ef970
A6 : 0x00000001 A7 : 0x3ffb4a68 A8 : 0x800e7cf0 A9 : 0x00000000
A10 : 0x3ffb4a68 A11 : 0x00000010 A12 : 0x3ffb5298 A13 : 0x3ffb4a68
A14 : 0x00000012 A15 : 0x00000002 SAR : 0x0000000f EXCCAUSE: 0x00000014
EXCVADDR: 0x000ef970 LBEG : 0x4000c46c LEND : 0x4000c477 LCOUNT : 0x00000000
ELF file SHA256: 5da51cd84380674eb796b51bd0d21ea3ee3c4f275043c281291ffd060f6d7e00
Backtrace: 0x000ef970:0x3ffb8c40 0x400e791a:0x3ffb8d70 0x400e7e33:0x3ffb8ea0 0x400d6ca1:0x3ffb8f30 0x400efb3d:0x3ffb8fb0 0x400f0ad9:0x3ffb9000 0x400da57f:0x3ffb9060 0x400da32b:0x3ffb9080 0x400da341:0x3ffb90a0 0x400da361:0x3ffb90c0 0x400d390d:0x3ffb90e0 0x40086ee9:0x3ffb9150
0x400e791a: mrb_funcall_with_block at /Users/fred/iot/M5Stick-mruby/components/mruby_component/mruby/src/vm.c:808 (discriminator 8)
0x400e7e33: mrb_funcall_argv at /Users/fred/iot/M5Stick-mruby/components/mruby_component/mruby/src/vm.c:808 (discriminator 8)
0x400d6ca1: mrb_obj_new at /Users/fred/iot/M5Stick-mruby/components/mruby_component/mruby/src/class.c:504
0x400efb3d: mrb_exc_new_str at /Users/fred/iot/M5Stick-mruby/components/mruby_component/mruby/src/error.c:563
0x400f0ad9: mrb_init_exception at /Users/fred/iot/M5Stick-mruby/components/mruby_component/mruby/src/error.c:610
0x400da57f: mrb_init_core at /Users/fred/iot/M5Stick-mruby/components/mruby_component/mruby/src/init.c:42
0x400da32b: mrb_open_core at /Users/fred/iot/M5Stick-mruby/components/mruby_component/mruby/src/state.c:43
0x400da341: mrb_open_allocf at /Users/fred/iot/M5Stick-mruby/components/mruby_component/mruby/src/state.c:71
0x400da361: mrb_open at /Users/fred/iot/M5Stick-mruby/components/mruby_component/mruby/src/state.c:63
0x400d390d: mrubyTask(void*) at /Users/fred/iot/M5Stick-mruby/main/main.cpp:38
0x40086ee9: vPortTaskWrapper at /Users/fred/iot/esp-idf/components/freertos/port.c:403
Which was more than a little intimidating, seeing as I'm totally unfamiliar with the mrb or the mrb source and my google searches turned up nothing. It's crashing inside mrb_funcall_with_block
, which is the c api function for calling a ruby method, but that's about all I could glean from the stack trace.
Bisect to the rescue
That's when I remembered a great talk from Brighton Ruby 2019 about tracking down a bug in MRI. Different codebase, same idea: just git bisect on the mruby itself, with the 2.0.1 tag as the initial "good" commit and 2.1.0 as the bad one.
For those unfamiliar with git bisect, it's an awesome tool, well worth adding to your arsenal. Given an initial pair of good and bad commits it will do a binary search through the intervening history to find the commit that caused the problem. Since it's doing a binary search, instead of having to try all 900 or so commits between the 2 mruby versions, I would only need to try 10 or so (which was a good thing, since you're recompiling mruby at each step). I couldn't quite do the automated setup Alyssa did in her talk: I had to manually build my code & flash to the device for each commit git bisect asked me to check. I also got compile errors on some commits and had to include the fix from this commit.
Still it wasn't too long before it pointed the finger of blame at this commit. This was a change relating to an optimisation relating to how methods are represented.
Digging deeper
On mruby, a ruby method is represented by two things: a pointer to a function or RProc structure and some flags. Those flags tell ruby whether the pointer is a function pointer or not and whether the method takes arguments. There are only 2 flags (i.e. 2 bits are needed) and on most platforms function pointers are 4 byte aligned: the low 2 bits are never used. Therefore mruby can use those 2 bits to store the flags. As a result it doesn't need to allocate a separate data structure, reduces memory accesses required to call methods etc. (A similar trick is used to encode integers into fixnum objects).
When mruby gets a mrb_method_t
(the name of this data type) it can mask off the low 2 bits to get that pointer or the high bits to get the flag information.
If however function pointers are not 4 byte aligned though things will go very wrong:
- When jumping to the function, mruby ignores the last 2 bits, so will use the wrong pointer value
- The low bits look like flags so mruby may use the pointer incorrectly
Apparently that's the case on my platform (you can see this on the stack trace above - the addresses are not all divisible by zero). Prior to the linked commit you had to opt into this optimisation, but afterwards it became the default.
Fixing it
Luckily the commit message had all the information I needed: you can disable the optimization by adding MRB_METHOD_T_STRUCT
to the list of defines in the mruby build config file (compiling with falign-functions=4 didn't seem to do anything). With that, I was up and running on mruby 2.1.
This option is also documented in the mrbconf docs, however I doubt I would have realised its significance.
So there you have it: mruby is cool and so git bisect is a great tool even if you don't understand the codebase. (and go watch Alyssa's talk if you have a spare 25 minutes or so - I saw it in person and it is well worth the watch)
Top comments (0)