#3 WebAssembly and C++: Passing strings between C++ and JavaScript

This post is part of a WebAssembly series focused on WASM and C++. The goal is to gain a thorough understanding of how WebAssembly works, how to use it as a compilation target for C++ code and hopefully have fun along the way. So, stick with me for this exciting journey.

Wherever mentioned, working WASM examples will be embedded directly on the page. If your browser supports it, you should be able to see them running.

Interoperability

So far, the data types I’ve passed between the WASM module and JavaScript have been extremely simple. In fact, I could count them on the fingers of one hand. To be specific, the types used were:

  • unsigned
  • float
  • unsigned*

That’s it! If your application does most of its work on the WASM side and exposes a simple API, this might be sufficient, but in reality it rarely will be. We need to learn how to exchange strings and structured data.

Memory

It’s worth remembering that we’re dealing with two distinct and separate memory systems here. WASM has its own memory area, separate from JavaScript’s. You can’t use JavaScript’s memory directly in WASM. Similarly, WASM memory is not directly useful in JavaScript.

The same principles apply to memory management. WASM memory must be managed exclusively by the WASM module and, by the same token, the WASM module cannot manage JavaScript’s memory in any shape or form.
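
To make the separation concrete, here is a minimal sketch that runs without any compiled module. It creates a standalone `WebAssembly.Memory` (an assumption of this sketch, standing in for the `memory` export of a real module) and shows that a WASM "pointer" is nothing more than a byte offset into that buffer.

```javascript
// Stand-in for a module's exported memory: one 64 KiB page.
const demoMemory = new WebAssembly.Memory({ initial: 1 });

// JavaScript sees the linear memory as a plain ArrayBuffer.
const demoBytes = new Uint8Array(demoMemory.buffer);

// Pretend the module stored the bytes of "Hi" at address 16.
const demoPtr = 16;
demoBytes[demoPtr] = 0x48;     // 'H'
demoBytes[demoPtr + 1] = 0x69; // 'i'

// A "pointer" coming out of WASM is just a number: an offset into the buffer.
const firstChar = String.fromCharCode(demoBytes[demoPtr]);
console.log(typeof demoPtr, demoMemory.buffer.byteLength, firstChar);
```

Nothing in JavaScript knows that these two bytes form a string; that interpretation has to be supplied explicitly, which is what the rest of this post is about.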

Passing strings between C++ and JavaScript

Decoding WASM strings

The first step is to create a very simple WASM module that returns a static string:

const char *str_ret() { return "string returned from C++"; }

unsigned str_len(const char *str) {
  const char *ptr = str;
  while (ptr != 0 && *ptr != '\0') {
    ptr++;
  }
  return ptr - str;
}

If I call str_ret from within JavaScript, it’s just gonna return a pointer to WASM module memory.

const importObject = {};

WebAssembly.instantiateStreaming(fetch("strs.wasm"), importObject)
    .then((wasm) => {
      console.log(wasm.instance.exports);
      const {str_ret} = wasm.instance.exports;
      const mem = wasm.instance.exports.memory;
      console.log(str_ret());
    });

Here’s the console:

/wasm/wasm_cpp_03/wasm_str_ptr.png

Using this pointer and a handle to the WASM module’s memory (wasm.instance.exports.memory), the string has to be recreated on the JavaScript side. To do that, the string’s length has to be known as well. This is the reason for the str_len function, which I also implemented in C++ (I can’t use strlen since I’m still operating in standalone mode, without the C/C++ standard library).

I need an instance of TextDecoder to perform the conversion. The decode method needs a buffer; the easiest way is to provide an instance of DataView. Here’s how to do all of that:

function importWasmStr(wasmMem, strPtr, strLen) {
  let view = new DataView(wasmMem.buffer, strPtr, strLen);
  let dec = new TextDecoder();
  return dec.decode(view);
}

WebAssembly.instantiateStreaming(fetch("strs.wasm"), importObject)
    .then((wasm) => {
      console.log(wasm.instance.exports);
      const {str_ret, str_len} = wasm.instance.exports;
      const mem = wasm.instance.exports.memory;

      const wasmStr = str_ret();
      const wasmStrLen = str_len(wasmStr);

      console.log(importWasmStr(mem, wasmStr, wasmStrLen));
    });

Here’s the console screenshot:

/wasm/wasm_cpp_03/wasm_str_converted.png
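
As an alternative to calling str_len, the terminating zero byte can also be located from JavaScript. The following is a minimal sketch, not the approach used in this post: `readCString` is a hypothetical helper name, and the string is placed into a standalone `WebAssembly.Memory` by hand so that the snippet runs without strs.wasm.

```javascript
// Decode a NUL-terminated UTF-8 string starting at byte offset `ptr`.
function readCString(memory, ptr) {
  const bytes = new Uint8Array(memory.buffer);
  let end = ptr;
  // Walk forward until the terminating zero byte (or the end of memory).
  while (end < bytes.length && bytes[end] !== 0) {
    end++;
  }
  return new TextDecoder().decode(bytes.subarray(ptr, end));
}

// Stand-in for a real module: place a C string at address 32 by hand.
// Fresh WASM memory is zero-filled, so the terminator is already in place.
const cstrMemory = new WebAssembly.Memory({ initial: 1 });
const cstrText = "string returned from C++";
new Uint8Array(cstrMemory.buffer).set(new TextEncoder().encode(cstrText), 32);

const decoded = readCString(cstrMemory, 32);
console.log(decoded);
```

This saves one call across the WASM boundary, at the price of duplicating the string-length logic on the JavaScript side.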

Encoding JavaScript strings

Passing strings from JavaScript back to WASM works in much the same way. For the purpose of this example, I’ll implement a simple function in C++ that counts the digits in a string:

unsigned count_digits(const char *str) {
  unsigned digits = 0;
  while (str != 0 && *str != '\0') {
    if (*str >= '0' && *str <= '9') {
      digits++;
    }
    str++;
  }
  return digits;
}

I’ll need an instance of TextEncoder to encode a JavaScript string into an array of UTF-8 bytes.

But… there’s a problem. I can’t just write data wherever I want in WASM memory. If I had access to malloc and some heap management facilities, this would be simple: I could just ask the WASM module to allocate memory for me to use. In standalone mode, it’s not that easy.

The workaround, good enough for this contrived example, is to explicitly allocate more memory for the WASM module:

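const pageSize = 64 * 1024;
// grow() returns the previous size in pages, so this is the byte offset of the new page.
const wasmExtrMemPtr = wasm.instance.exports.memory.grow(1) * pageSize;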

This is described on the Memory.grow page.
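
Two details of `Memory.grow` matter for the code in this post: it returns the previous size in pages rather than the new one, and it detaches the ArrayBuffer that `memory.buffer` referred to before the call. Here is a minimal sketch, using a standalone `WebAssembly.Memory` as an assumed stand-in for the module's exported memory:

```javascript
const wasmPageSize = 64 * 1024;
const growMemory = new WebAssembly.Memory({ initial: 1 });

const oldBuffer = growMemory.buffer;
const previousPages = growMemory.grow(1); // size BEFORE growing, in pages
const newRegionPtr = previousPages * wasmPageSize; // first byte of the fresh page

// Growing replaces memory.buffer; the old ArrayBuffer is detached (length 0).
console.log(previousPages, newRegionPtr);
console.log(oldBuffer.byteLength, growMemory.buffer.byteLength);
```

This is why the helper functions in this post read `wasmMem.buffer` on every call instead of caching a view across calls.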

It’s becoming quite apparent, I hope, that in the long run this approach won’t scale and is applicable only to a narrow set of use cases. Despite that, let’s continue.

I’ve got the memory, so it’s time to write data to it. Here’s what the updated JavaScript code looks like:

const importObject = {};

function importWasmStr(wasmMem, strPtr, strLen) {
  let view = new DataView(wasmMem.buffer, strPtr, strLen);
  let dec = new TextDecoder();
  return dec.decode(view);
}

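function exportJsStr(wasmMem, wasmStrPtr, jsStr) {
  // Encode first: the UTF-8 byte count can exceed jsStr.length for non-ASCII text.
  const bytes = new TextEncoder().encode(jsStr);
  const strArr = new Uint8Array(wasmMem.buffer, wasmStrPtr, bytes.length + 1);
  strArr.set(bytes);
  strArr[bytes.length] = 0; // explicit NUL terminator for the C++ side
}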

WebAssembly.instantiateStreaming(fetch("strs.wasm"), importObject)
    .then((wasm) => {
      console.log(wasm.instance.exports);
      const {str_ret, str_len, count_digits} = wasm.instance.exports;
      const mem = wasm.instance.exports.memory;

      const wasmStr = str_ret();
      const wasmStrLen = str_len(wasmStr);

      console.log(importWasmStr(mem, wasmStr, wasmStrLen));

      let jsStr = "This string contains some 123 digits";
      const pageSize = 64 * 1024;
      let wasmExtrMemPtr = mem.grow(1) * pageSize;
      exportJsStr(mem, wasmExtrMemPtr, jsStr);

      console.log(count_digits(wasmExtrMemPtr));
    });

Console output:

/wasm/wasm_cpp_03/wasm_str_arg.png
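
A detail worth keeping in mind when sizing the destination: a JavaScript string's `length` counts UTF-16 code units, while the C++ side sees UTF-8 bytes. The two agree for ASCII input such as the example string, but not in general. The sketch below (the sample strings are made up for illustration) shows the difference and how the result of `encodeInto` reveals truncation:

```javascript
const encoder = new TextEncoder();

const ascii = "123 digits";
const accented = "caf\u00e9 123"; // "café 123": the é takes two bytes in UTF-8

const asciiBytes = encoder.encode(ascii).length;
const accentedBytes = encoder.encode(accented).length;
console.log(ascii.length, asciiBytes);
console.log(accented.length, accentedBytes);

// encodeInto reports how much was consumed and written, so a destination
// that is too small can be detected instead of silently losing data.
const tooSmall = new Uint8Array(4);
const result = encoder.encodeInto(accented, tooSmall);
console.log(result.read, result.written, result.read < accented.length);
```

Note that `encodeInto` never writes a partial character: the two-byte é does not fit into the single remaining byte, so encoding stops after "caf".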

Structured data

What about structured data, like classes or structs passed between C++ and JS? In short, the same principles apply as for strings. Whatever comes out of WASM is an opaque handle to JavaScript and has to be converted to JavaScript objects somehow. Consider the following code:

struct Pair {
    int x, y;
};

Pair makePair(int x, int y) { ... };

Invoking makePair from JavaScript will not give you an object; all you get to work with is a … pointer. Yep, it doesn’t matter whether you return by value or by pointer explicitly (with the usual wasm32 C ABI, a by-value struct return is typically lowered to a pointer to caller-provided memory, passed as a hidden first argument). Either way, JavaScript only ever sees the address of a fragment of WASM memory representing a Pair. JavaScript knows nothing about this data structure, and there’s no safe way to assume its internal layout. To convert it to a JavaScript object, we need functions in C++ that give access to the data since Pair itself, in JavaScript, is just an opaque handle. For example:

int pairGetX(const Pair* p) {
    return p->x;
}

int pairGetY(const Pair* p) {
    return p->y;
}

These can later be used from JavaScript:

function importPair(pairPtr) {
    return { x: pairGetX(pairPtr), y: pairGetY(pairPtr) };
}
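
To try the accessor pattern without a compiled module, the exports can be faked. In the sketch below, `fakeExports` is a hypothetical stand-in for `wasm.instance.exports`: its pairGetX and pairGetY read a hand-filled memory, playing the role that the C++ accessors would play in a real build. The layout knowledge lives only inside the stand-in; `importPair` itself never touches memory.

```javascript
// Stand-in for the module's memory and exports (assumption for this sketch).
const pairMemory = new WebAssembly.Memory({ initial: 1 });
const fakeExports = {
  // WASM is little-endian, hence the `true` flag.
  pairGetX: (p) => new DataView(pairMemory.buffer).getInt32(p, true),
  pairGetY: (p) => new DataView(pairMemory.buffer).getInt32(p + 4, true),
};

// Convert an opaque Pair handle into a plain JavaScript object.
function importPair(exports, pairPtr) {
  const { pairGetX, pairGetY } = exports;
  return { x: pairGetX(pairPtr), y: pairGetY(pairPtr) };
}

// Pretend C++ created Pair{3, -7} at address 64.
const pairPtr = 64;
const pairView = new DataView(pairMemory.buffer);
pairView.setInt32(pairPtr, 3, true);
pairView.setInt32(pairPtr + 4, -7, true);

const pair = importPair(fakeExports, pairPtr);
console.log(pair);
```

Passing the exports object in explicitly, rather than relying on globals, is a design choice of this sketch; it makes the function easy to exercise against a fake.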

Code examples

You can find the example code discussed here in a GitHub repository created for the purpose of this post.

Conclusion

Passing structured data between JS and C++ requires serialisation, using something like Protocol Buffers, JSON or MessagePack. For that to work, the facilities of the C++ standard library are really a must. Therefore, in future instalments of this series, I’m gonna focus on how to use it and how to implement an integration layer between the two environments.
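
As a taste of what serialisation across the boundary might look like, here is a minimal sketch of the JavaScript half only. `writeJson` and `readJson` are hypothetical helpers, the scratch address is arbitrary, and both ends run in JavaScript against a standalone `WebAssembly.Memory`; in a real setup the C++ side would produce or parse these bytes with a JSON library.

```javascript
const jsonMemory = new WebAssembly.Memory({ initial: 1 });

// Serialise a value to UTF-8 JSON and copy it into linear memory at `ptr`.
function writeJson(memory, ptr, value) {
  const bytes = new TextEncoder().encode(JSON.stringify(value));
  new Uint8Array(memory.buffer, ptr, bytes.length).set(bytes);
  return bytes.length; // the receiver needs the length as well as the pointer
}

// Read `len` bytes at `ptr` back and parse them as JSON.
function readJson(memory, ptr, len) {
  const bytes = new Uint8Array(memory.buffer, ptr, len);
  return JSON.parse(new TextDecoder().decode(bytes));
}

const scratchPtr = 128; // arbitrary scratch address chosen for this sketch
const jsonLen = writeJson(jsonMemory, scratchPtr, { x: 1, y: 2 });
const roundTripped = readJson(jsonMemory, scratchPtr, jsonLen);
console.log(jsonLen, roundTripped);
```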