
#1 WebAssembly and C++: Baby steps

This post is part of a WebAssembly series focused on WASM and C++. The goal is to gain a thorough understanding of how WebAssembly works, how to use it as a compilation target for C++ code and hopefully have fun along the way. So, stick with me for this exciting journey.

Wherever mentioned, working WASM examples will be embedded directly on the page. If your browser supports it, you should be able to see them running.

Introduction

Let’s start by answering some basic questions regarding WebAssembly to understand what we’re dealing with.

What is WebAssembly?

The name is a bit misleading. WebAssembly (WASM) is a byte code executed by a WASM runtime, which is effectively a virtual machine (just like the JVM). Many runtimes are available; among the most popular are:

Yes, I’ve mentioned a web browser and node.js. WASM can run within JavaScript engines and there’s a set of Web APIs created to support that which can be found here.

Why do we need the runtime?

It’s byte code, not native code, so it needs a runtime environment to execute it.
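To make the "byte code" point concrete, here is a minimal sketch that runs in node.js or a browser console. It relies only on the standard WebAssembly JavaScript API; the variable names are mine. The eight bytes below are the smallest valid module: the magic number "\0asm" followed by version 1, the same 0x1 that wasm-objdump reports further down in this post.

```javascript
// Smallest valid WASM binary: 4-byte magic "\0asm" + 4-byte version (1, little endian).
const emptyModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // "\0asm"
  0x01, 0x00, 0x00, 0x00, // version 1
]);

// The runtime can check whether a buffer is well-formed byte code...
const isValid = WebAssembly.validate(emptyModuleBytes);
console.log('valid module:', isValid);

// ...and it rejects anything else, e.g. the same bytes with a corrupted magic number.
const brokenBytes = Uint8Array.from(emptyModuleBytes);
brokenBytes[1] = 0x00;
const isBrokenValid = WebAssembly.validate(brokenBytes);
console.log('valid after corrupting the header:', isBrokenValid);
```

Nothing here is executed natively; the runtime only decides whether the buffer is something it would be able to load.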

WebAssembly assembly?

If WebAssembly is the byte code, is there an assembly language for WebAssembly? Yes, there is. It’s called WebAssembly Text (WAT). You can read more about it here. It’s not my primary focus, so I’m not gonna go into detail, but in general you don’t need Rust, C/C++ or any other high-level language to create WASM binaries. In theory, you can write your code in WAT and compile that down to WASM. The complexity of this approach and the challenges that come with it are probably a topic for a different story, but it is possible nonetheless.

It’s good to familiarise yourself with WAT as it’s useful when debugging but, in general, it’s not a strict requirement.

How would you compile WAT to WASM?

The WebAssembly Binary Toolkit (WABT) contains the set of tools that make up the most fundamental toolchain; wat2wasm is the one that compiles WAT to WASM.

It’s useful to have it installed even when working with clang or emscripten or Rust technologies. Again, mainly for debugging purposes. If you’re using Arch (like I am) or Ubuntu, then it’s even easier as it’s available in official repositories, just:

pacman -S wabt

or

apt install wabt

Getting started

We need to start with the most basic examples to understand the calling convention and get a general feel for how to work with this technology.

Prerequisites

Initially, I’m gonna use clang which supports WASM targets:

$ clang -print-targets
  Registered Targets:
    ...
    wasm32     - WebAssembly 32-bit
    wasm64     - WebAssembly 64-bit
    ...

Which one to use? As with native targets, the main difference between the two is the size of pointers and pointer-sized types such as size_t and long, which are 32 bits wide on wasm32 and 64 bits wide on wasm64; int is 32 bits on both. It doesn’t really matter at this point. I’m gonna use wasm32 as the runtime support is a bit better.

Hello world… kind of

I’m gonna start with something bare bones simple:

extern "C" {
    void foo() {}
    void bar(int) {}
    int baz() { return 123; }
}

This can be compiled into WASM binary the following way:

clang++ \
    -target wasm32 \
    --no-standard-libraries \
    -Wl,--no-entry \
    -Wl,--export-all \
    foo.cpp -o foo.wasm

Some quick clarification on the above:

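  • I use extern "C" to avoid C++ name mangling
  • --no-standard-libraries means that the resulting binary is compiled in standalone mode, meaning it doesn’t link against libstdc++, libc or anything provided by the OS
  • -Wl,--no-entry just means that I’m not defining main and the linker shouldn’t complain about it
  • -Wl,--export-all tells the linker to export all symbols, which is why foo, bar and baz (plus a few linker-generated symbols) end up visible to the runtime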

Okay, so now what?

foo.wasm

Let’s inspect the binary to learn what it contains. Here’s the disassembly:

$ wasm-objdump -d foo.wasm

foo.wasm:       file format wasm 0x1

Code Disassembly:

0000f3 func[0] <__wasm_call_ctors>:
 0000f4: 0b                         | end
0000f6 func[1] <foo>:
 0000f7: 0f                         | return
 0000f8: 0b                         | end
0000fa func[2] <bar>:
 0000fb: 03 7f                      | local[0..2] type=i32
 0000fd: 23 80 80 80 80 00          | global.get 0 <__stack_pointer>
 000103: 21 01                      | local.set 1
 000105: 41 10                      | i32.const 16
 000107: 21 02                      | local.set 2
 000109: 20 01                      | local.get 1
 00010b: 20 02                      | local.get 2
 00010d: 6b                         | i32.sub
 00010e: 21 03                      | local.set 3
 000110: 20 03                      | local.get 3
 000112: 20 00                      | local.get 0
 000114: 36 02 0c                   | i32.store 2 12
 000117: 0f                         | return
 000118: 0b                         | end
00011a func[3] <baz>:
 00011b: 01 7f                      | local[0] type=i32
 00011d: 41 fb 00                   | i32.const 123
 000120: 21 00                      | local.set 0
 000122: 20 00                      | local.get 0
 000124: 0f                         | return
 000125: 0b                         | end

As expected, there are the functions which I defined and one more:

  • __wasm_call_ctors, which is mainly used as an initialiser for static data (and global, initialised variables, if there are any)
  • foo, the first C function, which just returns
  • bar is slightly different: stack manipulation code is visible, storing the argument on the stack
  • baz, in which the constant 123 is visible as the return value
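The byte column of the baz listing (41 fb 00 for i32.const 123, 0b for end) is all a runtime really needs. As a rough sketch, and not what clang emits, the module below is assembled by hand in JavaScript with the same instruction bytes; it is simplified on purpose (no local variable, no explicit return) and the variable names are my own.

```javascript
// Hand-assembled module exporting one function, baz, that returns 123.
// Simplified compared to the clang output: no local variable, no explicit return.
const bazModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic "\0asm" + version 1
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,             // type section: one type, () -> i32
  0x03, 0x02, 0x01, 0x00,                               // function section: func 0 uses type 0
  0x07, 0x07, 0x01, 0x03, 0x62, 0x61, 0x7a, 0x00, 0x00, // export section: "baz" = func 0
  0x0a, 0x07, 0x01, 0x05, 0x00,                         // code section: 1 body, 5 bytes, no locals
  0x41, 0xfb, 0x00,                                     // i32.const 123 (signed LEB128)
  0x0b,                                                 // end
]);

// Synchronous compilation is fine for a module this small.
const bazModule = new WebAssembly.Module(bazModuleBytes);
const bazInstance = new WebAssembly.Instance(bazModule, {});

const bazResult = bazInstance.exports.baz();
console.log('baz() =', bazResult);
```

The value left on the stack when the function ends becomes its result, which is why the trailing return in the compiler output is not strictly needed here.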

Detailed content can be inspected by converting to WAT format:

$ wasm2wat foo.wasm

(module
  (type (;0;) (func))
  (type (;1;) (func (param i32)))
  (type (;2;) (func (result i32)))
  (func $__wasm_call_ctors (type 0))
  (func $foo (type 0)
    return)
  (func $bar (type 1) (param i32)
    (local i32 i32 i32)
    global.get $__stack_pointer
    local.set 1
    i32.const 16
    local.set 2
    local.get 1
    local.get 2
    i32.sub
    local.set 3
    local.get 3
    local.get 0
    i32.store offset=12
    return)
  (func $baz (type 2) (result i32)
    (local i32)
    i32.const 123
    local.set 0
    local.get 0
    return)
  (memory (;0;) 2)
  (global $__stack_pointer (mut i32) (i32.const 66560))
  (global (;1;) i32 (i32.const 1024))
  (global (;2;) i32 (i32.const 1024))
  (global (;3;) i32 (i32.const 1024))
  (global (;4;) i32 (i32.const 66560))
  (global (;5;) i32 (i32.const 131072))
  (global (;6;) i32 (i32.const 0))
  (global (;7;) i32 (i32.const 1))
  (export "memory" (memory 0))
  (export "__wasm_call_ctors" (func $__wasm_call_ctors))
  (export "foo" (func $foo))
  (export "bar" (func $bar))
  (export "baz" (func $baz))
  (export "__dso_handle" (global 1))
  (export "__data_end" (global 2))
  (export "__global_base" (global 3))
  (export "__heap_base" (global 4))
  (export "__heap_end" (global 5))
  (export "__memory_base" (global 6))
  (export "__table_base" (global 7)))

Now, that’s a lot of code. All of that is simple stuff though. I just want to explain the basics. The general format of the binary is documented on the WebAssembly specification page. This document tells us that a module is composed of sections. In this case, there is:

You might be confused about the meaning of “(;0;)” within the WAT code. Well, those are just block comments, inserted by the tool to increase readability.

To summarise, we’ve got three function types declared at the very top, followed by four function definitions (the three I wrote plus __wasm_call_ctors), then a memory declaration with an initial size of two pages (64 KiB each; this is where bar stores its argument), eight globals (the mutable __stack_pointer and seven constants) and a list of exports at the very end. Exports are the symbols which will be made public to the runtime environment.
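The memory and export sections can also be inspected from the JavaScript side. The sketch below is an illustration under my own assumptions: it hand-assembles a tiny stand-in module (one empty function exported as foo and a two-page memory exported as memory) so that it runs without foo.wasm on disk, and all variable names are mine. WebAssembly.Module.exports and the buffer property of an exported memory are part of the standard JavaScript API.

```javascript
// Stand-in module: (memory 2), an empty function, and two exports.
const layoutBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic "\0asm" + version 1
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,             // type section: one type, () -> ()
  0x03, 0x02, 0x01, 0x00,                         // function section: func 0 uses type 0
  0x05, 0x03, 0x01, 0x00, 0x02,                   // memory section: one memory, min 2 pages
  0x07, 0x10, 0x02,                               // export section: 2 exports
  0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79, 0x02, 0x00, //   "memory" = memory 0
  0x03, 0x66, 0x6f, 0x6f, 0x00, 0x00,                   //   "foo"    = func 0
  0x0a, 0x04, 0x01, 0x02, 0x00, 0x0b,             // code section: one empty body
]);

const layoutModule = new WebAssembly.Module(layoutBytes);

// List what the module makes public, in export-section order.
const exportList = WebAssembly.Module.exports(layoutModule)
  .map(entry => `${entry.kind} ${entry.name}`);
console.log(exportList);

// Memory is sized in pages of 64 KiB (65536 bytes), so two pages give 131072 bytes.
const layoutInstance = new WebAssembly.Instance(layoutModule, {});
const memoryBytes = layoutInstance.exports.memory.buffer.byteLength;
console.log('memory size in bytes:', memoryBytes, 'pages:', memoryBytes / 65536);
```

The 131072 figure is the same number that shows up as __heap_end in the WAT listing, i.e. the end of the two-page memory.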

Seeing how the code has been compiled allows us to understand what the calling convention will be for a variety of function types. So, even though I’m not gonna write WAT code manually, it’s good to know what’s going on and be able to read it.

How to run it?

Let’s focus first on the two most basic runtimes.

wasm-interp

The WebAssembly binary toolkit provides us with the most fundamental runtime, called wasm-interp. This can be used to run the binary directly:

$ wasm-interp --run-all-exports -t foo.wasm
>>> running export "__wasm_call_ctors":
#0.   96: V:0  | return
__wasm_call_ctors() =>
>>> running export "foo":
#0.  100: V:0  | return
foo() =>
>>> running export "baz":
#0.  236: V:0  | alloca 1
#0.  244: V:1  | i32.const 123
#0.  252: V:2  | local.set $2, 123
#0.  260: V:1  | local.get $1
#0.  268: V:2  | drop_keep $1 $1
#0.  280: V:1  | return
baz() => i32:123

Just like that, it executed all exported functions within the binary. Additionally, it can produce a useful trace allowing us to inspect what’s happening.

node.js

Node.js is a JavaScript runtime. As such, it can’t run a .wasm file directly; JavaScript APIs are needed to load and instantiate the WASM module. To do that, a simple wrapper is required:

const fs = require('fs');

const wasmBuffer = fs.readFileSync('foo.wasm');

let wasmImports = {};

WebAssembly.instantiate(wasmBuffer, wasmImports).then(wasmModule => {
  console.log('wasm module loaded successfully');

  // bind exported WASM symbols to local variables
  const {foo, bar, baz} = wasmModule.instance.exports;

  console.log("running baz: ", baz());
});

After running it, the result is visible:

$ node foo.js
wasm module loaded successfully
running baz:  123
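When WebAssembly.instantiate is given a buffer, the object it resolves to has two fields, module and instance, which is why the wrapper reads wasmModule.instance.exports. Here is the same flow sketched with async/await. To keep it runnable without any file on disk, I’m assuming an eight-byte empty module in place of foo.wasm; instantiateBytes is a hypothetical helper name, not part of any API.

```javascript
// Hypothetical helper: instantiate a buffer and collect the exported names.
async function instantiateBytes(bytes, imports = {}) {
  // For a buffer argument the result has the shape { module, instance }.
  const { module: compiledModule, instance } =
    await WebAssembly.instantiate(bytes, imports);
  return { compiledModule, instance, exportNames: Object.keys(instance.exports) };
}

// Stand-in for fs.readFileSync('foo.wasm'): an empty module with no exports.
const standInBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // "\0asm"
  0x01, 0x00, 0x00, 0x00, // version 1
]);

const instantiated = instantiateBytes(standInBytes).then(result => {
  console.log('exports:', result.exportNames);
  return result;
});
```

With the real foo.wasm buffer passed in instead, exportNames would be expected to contain foo, bar and baz alongside the linker-generated symbols from the WAT listing.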

Congratulations! You should now understand the basics of what WebAssembly is, how it works and how to load & execute it! That’s good progress. I’m gonna build on top of that to extend the knowledge about WebAssembly in the next post. See you there!