Bryan Hughes for Microsoft Azure

Passing strings from C++ to JavaScript in Web Assembly

I'm moving right along with my experiments in getting the messaging stack of my wireless LED control system running in Node.js via Web Assembly (WASM for short). I'm now ready to start integrating the stack into a Node.js library.

The very first thing I decided to wire up was some logging functionality. This meant passing strings from C++ to JavaScript. Sounds straightforward, right? I thought so until I spent a day and a half struggling to get it to work 😅.

The Scenario

You can only pass numbers between JavaScript and WASM. This is just how the runtime is designed. So how do you pass more complex data?

There is some emscripten documentation on interacting with code across languages that discusses how to do just that. If you're fully immersed in the emscripten world, then you can use the functions ccall and cwrap to neatly and tidily pass strings from one language to another. There's a catch though: you must be running a full C++ application to make use of these functions, not just a library.

I tried to hack the output so I could tie into these functions without making it a full application, similar to how I hacked the output to tie into emscripten's WASM bootstrap code. It didn't work this time though. emscripten is set up so that these functions are only available once int main() {} has been run in C++ land. I don't have a main function though, since this is a library. Even adding an empty main function didn't work for some reason. emscripten threw an error stating that ccall is not available until the app has been initialized and main has been run.

So back to the drawing board. I searched high and low for other sorts of emscripten tricks, but no such luck. Then it hit me! I was way over-complicating the problem.

WASM creates a chunk of memory within JavaScript for its use. This memory chunk is created by calling const memory = new WebAssembly.Memory({ initial: 256, maximum: 256 }), where both sizes are given in 64 KiB pages. A WebAssembly.Memory instance is a wrapper around an ArrayBuffer, and exposes this buffer to us via the buffer property on the memory instance. This is a long-winded way of saying that WASM memory is just an ArrayBuffer that we can read from JavaScript through a typed array!
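To make that concrete, here is a minimal sketch that pokes at a WebAssembly.Memory directly from Node.js, with no compiled module involved. The page counts and the pointer value 8 are arbitrary choices for illustration, not values from the project. It also shows one gotcha worth knowing: growing the memory detaches the old ArrayBuffer, so it is safest to read memory.buffer fresh each time rather than caching it.

```javascript
// One page is 64 KiB; these sizes are illustrative assumptions.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 2 });

// The buffer property is a plain ArrayBuffer.
console.log(memory.buffer instanceof ArrayBuffer); // true
console.log(memory.buffer.byteLength); // 65536

// Write the bytes of "Hi!" plus a null terminator at a made-up address.
const pointer = 8;
new Uint8Array(memory.buffer).set([72, 105, 33, 0], pointer);

// Read them back through a view that starts at the pointer.
const readBack = Array.from(new Uint8Array(memory.buffer, pointer, 4));
console.log(readBack); // [ 72, 105, 33, 0 ]

// Growing the memory detaches the previous ArrayBuffer.
const oldBuffer = memory.buffer;
memory.grow(1);
console.log(oldBuffer.byteLength); // 0
console.log(memory.buffer.byteLength); // 131072

// The data itself survives the grow; only the old buffer object is dead.
const afterGrow = Array.from(new Uint8Array(memory.buffer, pointer, 4));
console.log(afterGrow); // [ 72, 105, 33, 0 ]
```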

The Solution

You might have read that last paragraph and thought "ok, cool, but what does that have to do with strings?" In C, strings are typically represented as a character array, e.g. const char* myString. const char* is a pointer to the first element of an array of characters, and each character is really just an 8-bit integer. A pointer is, at a high level, an index into a block of memory. This means we can find where the string is stored in the buffer mentioned above, and interpret the contiguous bytes starting at the string's pointer as its characters. We can represent a string in memory with a pointer called str as such:

Address   str   str + 1   str + 2   str + 3
Value     72    105       33        0

This block of memory forms the string "Hi!". See how there's a "fourth character" with a value of 0? This is what we call a "null terminator," which signifies the end of the string in memory. It's often easier to work explicitly with string length instead of looping through memory looking for a 0. We can get the length of any string in C/C++ with the strlen function. With a pointer and string length in hand, we can iterate over the memory and reconstruct the string with the following code:

const view = new Uint8Array(memory.buffer, pointer, length);
let string = ''; // must be let, since we append to it in the loop
for (let i = 0; i < length; i++) {
  string += String.fromCharCode(view[i]);
}
console.log(string);
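The loop above treats every byte as one character, which is fine for ASCII text. As a hedged alternative, the sketch below uses the standard TextDecoder to handle UTF-8, and also shows the "scan for the null terminator" approach for cases where no length is passed along. The helper names readString and readCString are hypothetical, and a plain ArrayBuffer stands in for memory.buffer.

```javascript
const decoder = new TextDecoder('utf-8');

// Decode `length` bytes starting at `pointer` (length known up front).
function readString(buffer, pointer, length) {
  return decoder.decode(new Uint8Array(buffer, pointer, length));
}

// Decode bytes from `pointer` up to (not including) the first 0 byte.
function readCString(buffer, pointer) {
  const bytes = new Uint8Array(buffer);
  let end = pointer;
  while (end < bytes.length && bytes[end] !== 0) {
    end++;
  }
  return decoder.decode(bytes.subarray(pointer, end));
}

// Stand-in for memory.buffer, filled with zeros by default.
const buffer = new ArrayBuffer(32);

// "Hi!" followed by a null terminator, at a made-up address.
new Uint8Array(buffer).set([72, 105, 33, 0], 4);
console.log(readCString(buffer, 4)); // Hi!

// A non-ASCII string: "é" takes two bytes in UTF-8.
const encoded = new TextEncoder().encode('héllo');
new Uint8Array(buffer).set(encoded, 16);
console.log(readString(buffer, 16, encoded.length)); // héllo
```

Whether this is worth it depends on the strings being logged. For plain ASCII log messages, the original loop does the job.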

Now we're ready to write the code to bind them together! First, let's write the following C++ to use a JavaScript function:

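#include <cstdint>  // uint16_t
#include <cstring>  // strlen

// Implemented in JavaScript and supplied through the env import object.
extern "C" void jsPrintString(const char *s, uint16_t len);

// extern "C" keeps the exported name unmangled so JavaScript can find it.
extern "C" void print() {
  const char* str = "Hello from C++!";
  jsPrintString(str, static_cast<uint16_t>(strlen(str)));
}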

Note the first extern "C" line. This does two things: 1) declares the signature of a function named jsPrintString that we will implement in JavaScript, and 2) tells the compiler to use C name mangling instead of C++ name mangling. Compilers change the names of functions in their output so that overloaded versions can be told apart. This is trivial in C because it doesn't support overloading: the name just gets a _ prepended. C++ is a lot more complicated though, and you can end up with names like _Z16RVLMessagingLoopv for a function called RVLMessagingLoop in code. We'll see why this is important in a minute.

Note: Make sure to add -s ERROR_ON_UNDEFINED_SYMBOLS=0 to your em++ build command. This prevents the build from failing when a declared function's implementation cannot be found in C++, which is expected here since the function is implemented in JavaScript, not C++. Be careful with this option, though, as it can hide real issues in your C++ code. Always compare the list of symbols it didn't find with the ones you expect not to be found.
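One way to catch a forgotten implementation on the JavaScript side is to ask the compiled module what it expects to import and compare that with the import object before instantiating. The sketch below uses the standard WebAssembly.Module.imports function. The helper findMissingImports and the import name _notImplemented are hypothetical, and the tiny hand-assembled module is only a stand-in for real em++ output.

```javascript
// Helpers for hand-assembling a tiny stand-in module.
// Every length here is below 128, so each fits in a single byte.
const ascii = (s) => [s.length, ...new TextEncoder().encode(s)];
const section = (id, body) => [id, body.length, ...body];

// A module that imports two functions from "env" and does nothing else.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm", version 1
  ...section(1, [1, 0x60, 0, 0]),                 // one type: () -> ()
  ...section(2, [2,
    ...ascii('env'), ...ascii('_jsPrintString'), 0x00, 0,
    ...ascii('env'), ...ascii('_notImplemented'), 0x00, 0,
  ]),
]);

// List every import the module declares that the import object lacks.
function findMissingImports(module, importObject) {
  return WebAssembly.Module.imports(module)
    .filter(({ module: ns, name }) => !(importObject[ns] && name in importObject[ns]))
    .map(({ module: ns, name }) => `${ns}.${name}`);
}

const env = { _jsPrintString: () => {} };
const missing = findMissingImports(new WebAssembly.Module(bytes), { env });
console.log(missing); // [ 'env._notImplemented' ]
```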

Then we have our print function, which will invoke the JavaScript function. We define a character array with const char* and assign it a string value. str is now a pointer to the string in memory. Pointers are also numbers! This means we can pass the pointer straight from C++ to JavaScript without having to do anything special.

Now it's time to modify the JavaScript code. We're going to wrap our string reconstruction code in a function called handlePrintString. Then, we inject it into C++ code by modifying the env object we pass to the WASM instantiation. We assign this function to the env object with the key _jsPrintString (note the leading underscore). This name in env is the mangled name of the function in C/C++. This is why we want to use C mangling instead of C++ mangling. Finally, we can invoke the print function in C++ from JavaScript, which calls back into JavaScript to log the string.

function handlePrintString(ptr: number, len: number) {
  const view = new Uint8Array(memory.buffer, ptr, len);
  let string = '';
  for (let i = 0; i < len; i++) {
    string += String.fromCharCode(view[i]);
  }
  console.log(string);
}

const env = {
  // ...
  _jsPrintString: handlePrintString,
  // ...
};
WebAssembly.instantiate(bytes, { env }).then((result) => {
  result.instance.exports._print();
});
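For anyone who wants to try the round trip without an emscripten toolchain, here is a self-contained sketch in plain JavaScript. It hand-assembles a tiny WASM module that plays the role of the compiled C++: it imports env._jsPrintString and env.memory, stores the message in memory, and exports _print. The byte layout, the address 16, and the single page of memory are assumptions made for illustration; real em++ output is much larger and imports more than this.

```javascript
// Helpers for hand-assembling the module.
// Every length here is below 128, so each fits in a single byte.
const ascii = (s) => [s.length, ...new TextEncoder().encode(s)];
const section = (id, body) => [id, body.length, ...body];

const message = 'Hello from C++!';
const MESSAGE_PTR = 16; // made-up address where the string lives

const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm", version 1
  // Types: 0 = (i32, i32) -> (), 1 = () -> ()
  ...section(1, [2, 0x60, 2, 0x7f, 0x7f, 0, 0x60, 0, 0]),
  // Imports: env._jsPrintString (function, type 0), env.memory (min 1 page)
  ...section(2, [2,
    ...ascii('env'), ...ascii('_jsPrintString'), 0x00, 0,
    ...ascii('env'), ...ascii('memory'), 0x02, 0x00, 1,
  ]),
  ...section(3, [1, 1]),                           // one function of type 1
  ...section(7, [1, ...ascii('_print'), 0x00, 1]), // export function 1 as _print
  // Body of _print: push pointer, push length, call import 0, end
  ...section(10, [1, 8, 0, 0x41, MESSAGE_PTR, 0x41, message.length, 0x10, 0, 0x0b]),
  // Data: copy the message bytes into memory at MESSAGE_PTR
  ...section(11, [1, 0, 0x41, MESSAGE_PTR, 0x0b, ...ascii(message)]),
]);

const memory = new WebAssembly.Memory({ initial: 1 });
const printed = []; // kept so the result can be inspected afterwards

function handlePrintString(ptr, len) {
  const view = new Uint8Array(memory.buffer, ptr, len);
  let string = '';
  for (let i = 0; i < len; i++) {
    string += String.fromCharCode(view[i]);
  }
  printed.push(string);
  console.log(string);
}

const env = { memory, _jsPrintString: handlePrintString };

// Synchronous instantiation is fine for a module this small.
const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes), { env });
instance.exports._print(); // logs "Hello from C++!"
```

Note that only two numbers cross the boundary in the call to _jsPrintString: the pointer and the length. Everything else happens by reading shared memory.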

And there we have it, we can now pass strings from C++ into JavaScript! It may sound small, but this marks a big step forward towards integrating this system with Azure IoT Edge via Node.js.

Top comments (6)

Benjamin Golinvaux

I would be careful not to use "-s ERROR_ON_UNDEFINED_SYMBOLS=0" for too long, for this may turn build-time errors into run-time ones! The --js-library flag can be used for the JS glue code. You are certainly aware of this, but I think beginners should understand the risk :)

Bryan Hughes

Good point about the dangers of -s ERROR_ON_UNDEFINED_SYMBOLS=0, thanks!

I'll confess, I haven't explored using the --js-library flag yet. A quick read suggested it wouldn't do what I needed it to do, but I also may just not have read deeply into it enough.

Benjamin Golinvaux

The thing is:
if, in your C++ code, you define something as:

extern int GetFoo();

Then, if you build the code, you'll get an undefined symbol error, UNLESS this function can be "seen" by emscripten. It can only see it if you put it in a js file that it knows of. That is, a file that is passed to the --js-library flag.

What I usually do is something like:

C++: extern "C" void SomeFunction();

JS (in library.js) : SomeFunction : function() { _SomeFunction() }

And _SomeFunction is declared in the JavaScript project where I load the wasm module in.

You can find more details here:

emscripten.org/docs/porting/connec...

HTH

Bryan Hughes

I added a note in the text itself warning folks to be careful about undefined symbols. Thanks again!

Richard Williams

Great article. Minor typo - should be strlen(str) rather than strlen(s).

Bryan Hughes

Nice catch, and thanks for letting me know. It's fixed now.