# #4 WebAssembly and C++: What's WASI and why do we need it?

This post is part of a [WebAssembly series](/tags/wasmcpp) focused on WASM and C++. The goal is to gain a thorough understanding of how WebAssembly works, how to use it as a compilation target for C++ code and hopefully have fun along the way. So, stick with me for this exciting journey.

## Recap

So far, in this series, we've learned what WASM is, how to execute it in JavaScript engines (like browsers or node.js) and how to perform basic interoperability between WASM code and JavaScript. The C++ code used to generate WASM modules was independent of any libraries (that includes the standard C/C++ libraries). Now, it's finally time to discuss the usage of the standard library.

## What's WASI?

In short, WASI (WebAssembly System Interface) defines the ABI between a WASM module and its host in order to standardise the integration with programming languages like C++ or Rust. This is described in detail on [WASI's page](https://wasi.dev/). [WASI documents](https://github.com/bytecodealliance/wasmtime/blob/main/docs/WASI-documents.md) is a good starting point.

This may sound a bit vague to begin with but bear with me, it's gonna get a lot clearer once we get to some examples.

## Prerequisites

Before we continue, some tools are required.

### WASI SDK

*clang* (at least at the time this post is being written) supports the WASM target but it knows nothing about WASI. Thankfully WASI provides an SDK which integrates *clang* (and some other basic development tools) that can be used to build WASM code with WASI support.

There are 3 options described in the [wasi-sdk readme](https://github.com/WebAssembly/wasi-sdk/). You can build the SDK yourself, use a release package or use the provided docker image. I like to use the release packages.

```bash
curl -s -L \
    https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-20/wasi-sdk-20.0-linux.tar.gz | \
    tar zvxf - -C /opt
```

And just like that, the SDK should be installed in `/opt`.
```bash
$ ls -ld /opt/wasi-sdk-20.0/
drwxr-xr-x 5 root root 4096 Mar 30 2023 /opt/wasi-sdk-20.0/
```

Additionally, it's a good idea to create an environment file containing the following entries.

```bash
export WASI_VERSION=20
export WASI_VERSION_FULL=${WASI_VERSION}.0
export WASI_SDK_PATH=/opt/wasi-sdk-${WASI_VERSION_FULL}
export CC="${WASI_SDK_PATH}/bin/clang --sysroot=${WASI_SDK_PATH}/share/wasi-sysroot"
export CXX="${WASI_SDK_PATH}/bin/clang++ --sysroot=${WASI_SDK_PATH}/share/wasi-sysroot"
```

Save that as `wasmenv.sh`. This could be placed in your `.bashrc` as well but I like to keep my environment clean and only extend it with stuff that is required for a given project.

Using the docker image has a lot of advantages as well.

```bash
$ docker run -it ghcr.io/webassembly/wasi-sdk env
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=ae4bc8a52d3d
TERM=xterm
LLVM_VERSION=16
CMAKE_TOOLCHAIN_FILE=/usr/share/cmake/wasi-sdk.cmake
CC=clang-16
CXX=clang++-16
LD=wasm-ld-16
AR=llvm-ar-16
RANLIB=llvm-ranlib-16
CFLAGS=--target=wasm32-wasi --sysroot=/wasi-sysroot
CXXFLAGS=--target=wasm32-wasi --sysroot=/wasi-sysroot
LDFLAGS=--target=wasm32-wasi --sysroot=/wasi-sysroot
```

The container comes with a complete environment preconfigured. This is very convenient. Additionally, updating the SDK is as easy as pulling a newer image.

It doesn't really matter how you plan to use the SDK as long as you have it available and working.

## WASM runtime

For test purposes we'll need a runtime. Something like [wasmtime](https://github.com/bytecodealliance/wasmtime) is a good choice. I'm using my distro's (Arch) package manager (pacman) to install it.

## WASI hello world

Some code is needed to start our journey. Here's the basic "hello world" in C++ with no surprises.
```c++
#include <iostream>

int main(int argc, const char* argv[]) {
    std::cout << "hello world" << std::endl;
    return 0;
}
```

Let's source the environment file which I've prepared earlier and try to build the code.

```bash
source wasmenv.sh
$CXX --target=wasm32-wasi hw.cpp -o hw.wasm
```

Just like that, we've built our first WASM module from C++ code that is using C++'s standard library. Running this in a WASM runtime is trivial.

```bash
$ wasmtime hw.wasm
hello world
```

It works! Now, how to run it in the browser?

## WASI in browsers

Let's start with the same boilerplate code I've already used several times.

```html
$ cat index.html
```

```javascript
$ cat index.js
const importObject = {};

WebAssembly.instantiateStreaming(fetch("hw.wasm"), importObject).then((obj) => {
    let wasm_exp = obj.instance.exports;
    // ... what export should I call?
});
```

### WASI ABI

Here's where stuff starts to get interesting. What is the entry point to our program? Is it `main`? Can this be customised? Thankfully, this is all documented within WASI documents, specifically the [WASI Application ABI](https://github.com/WebAssembly/WASI/blob/main/legacy/application-abi.md).

This document defines two types of modules:

- command modules
- reactors

Executables definitely fall under the `command modules` category and things like static libraries would be `reactors` in my understanding (spoiler alert, WASM does not support dynamic shared libraries).

Since our example code is a `command module` it must export the `_start` function and after inspecting the binary, it definitely does.

```bash
$ wasm2wat hw.wasm | grep export
  (export "memory" (memory 0))
  (export "_start" (func $_start))
```

Good! Let's call it from JavaScript and see what'll happen.

```javascript
const importObject = {};

WebAssembly.instantiateStreaming(fetch("hw.wasm"), importObject).then((obj) => {
    let wasm_exp = obj.instance.exports;
    wasm_exp._start();
});
```

So, after running python's simple http server and opening `index.html`...
it kind of... doesn't work. The browser complains about the `wasi_snapshot_preview1` object.

![wasi_snapshot_preview1 object](/wasm/wasm_cpp_04/console_01.png)

That rings a bell, most likely the module needs some functions in the environment. How to determine which? Let's examine the module once again.

```bash
$ wasm2wat hw.wasm | grep import
  (import "wasi_snapshot_preview1" "args_get" (func $__imported_wasi_snapshot_preview1_args_get (type 2)))
  (import "wasi_snapshot_preview1" "args_sizes_get" (func $__imported_wasi_snapshot_preview1_args_sizes_get (type 2)))
  (import "wasi_snapshot_preview1" "environ_get" (func $__imported_wasi_snapshot_preview1_environ_get (type 2)))
  (import "wasi_snapshot_preview1" "environ_sizes_get" (func $__imported_wasi_snapshot_preview1_environ_sizes_get (type 2)))
  (import "wasi_snapshot_preview1" "fd_close" (func $__imported_wasi_snapshot_preview1_fd_close (type 0)))
  (import "wasi_snapshot_preview1" "fd_fdstat_get" (func $__imported_wasi_snapshot_preview1_fd_fdstat_get (type 2)))
  (import "wasi_snapshot_preview1" "fd_read" (func $__imported_wasi_snapshot_preview1_fd_read (type 9)))
  (import "wasi_snapshot_preview1" "fd_seek" (func $__imported_wasi_snapshot_preview1_fd_seek (type 15)))
  (import "wasi_snapshot_preview1" "fd_write" (func $__imported_wasi_snapshot_preview1_fd_write (type 9)))
  (import "wasi_snapshot_preview1" "proc_exit" (func $__imported_wasi_snapshot_preview1_proc_exit (type 7)))
```

That's a lot of functions for such simple code but nothing out of the ordinary. It needs functions to determine the `environment` and the `argc, argv` passed to the program. Additionally it needs simple file descriptor IO - again, expected as it writes to STDOUT (or its equivalent). The last thing is `proc_exit`, which terminates the program with a given exit status.

Yeah, but do we have to implement all of that just to run any basic code? Short answer is no! There are polyfills.
Those are JavaScript libraries implementing all of that for us but for the sake of learning it's good to know what has to be done to bootstrap all of that yourself!

Once again, WASI documentation to the rescue. The code complains about the `wasi_snapshot_preview1` object and this is in line with the current, still unstable, snapshot of the WASI ABI, which is confirmed [here](https://github.com/WebAssembly/WASI/blob/main/README.md) and [here](https://github.com/WebAssembly/WASI/blob/main/legacy/README.md).

After defining `wasi_snapshot_preview1` within the import object:

```javascript
const importObject = {
    wasi_snapshot_preview1 : {},
};
```

We see a new complaint:

![missing functions](/wasm/wasm_cpp_04/console_02.png)

It looks for all the missing imports starting with `args_get` so, we're on the right track. The signatures for all of the required functions are well described in the [WASI ABI documentation for preview](https://github.com/WebAssembly/WASI/blob/main/legacy/preview1/docs.md).

`args_get` is documented [here](https://github.com/WebAssembly/WASI/blob/main/legacy/preview1/docs.md#functions). It's all cool but this seems to be a Rust-style signature and I'm not that familiar with Rust so, is there a better way to figure out what has to be implemented?

We can deduce all the details using JavaScript itself. Normal JavaScript functions (non-arrow functions) have an `arguments` object that can be inspected. Let's do that.
```javascript
function args_get() {
    console.log("args_get args:", arguments);
}

function args_sizes_get() {
    console.log("args_sizes_get args:", arguments);
}

function environ_get() {
    console.log("environ_get args:", arguments);
}

function environ_sizes_get() {
    console.log("environ_sizes_get args:", arguments);
}

function fd_close() {
    console.log("fd_close args:", arguments);
}

function fd_fdstat_get() {
    console.log("fd_fdstat_get args:", arguments);
}

function fd_read() {
    console.log("fd_read args:", arguments);
}

function fd_seek() {
    console.log("fd_seek args:", arguments);
}

function fd_write() {
    console.log("fd_write args:", arguments);
    throw "abort";
}

function proc_exit() {
    console.log("proc_exit args:", arguments);
}

const importObject = {
    wasi_snapshot_preview1 : {
        args_get : args_get,
        args_sizes_get : args_sizes_get,
        environ_get : environ_get,
        environ_sizes_get : environ_sizes_get,
        fd_close : fd_close,
        fd_fdstat_get : fd_fdstat_get,
        fd_read : fd_read,
        fd_seek : fd_seek,
        fd_write : fd_write,
        proc_exit : proc_exit,
    },
};
```

That's a complete set of all required functions. You might have noticed that I've placed `throw "abort"` in `fd_write` - this is just a temporary measure. Since the code is incomplete and composed of stubs only, the stub never reports any data as written, so the throw is needed to break out of what would otherwise be an endless series of `fd_write` calls. Here's the call history:

![call history](/wasm/wasm_cpp_04/console_03.png)

### `args_sizes_get`

`args_sizes_get` is called first with two arguments whose values look suspiciously like pointers into WASM memory.

![call history](/wasm/wasm_cpp_04/console_04.png)

This would match the [WASI documentation](https://github.com/WebAssembly/WASI/blob/main/legacy/preview1/docs.md#functions):

> **args_sizes_get()** -> Result<(size, size), errno>
> **error:** Result<(size, size), errno>
> Returns the number of arguments and the size of the argument string data, or an error.

On success, the result is a pair of values: the number of arguments and the total length of the argument string data. The two pointers passed to the function are where these two values have to be written.
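To make these two numbers concrete, here's a minimal sketch that computes them for a sample argument list outside of any WASM module. The helper name `computeArgsSizes` and the sample list are made up for illustration, and counting UTF-8 bytes plus one `\0` terminator per argument is my reading of "size of the argument string data", not something the quoted docs spell out.

```javascript
// Hypothetical helper (not part of WASI): computes the two values that
// args_sizes_get is expected to report for a given list of arguments.
function computeArgsSizes(argList) {
    const encoder = new TextEncoder();
    let bufSize = 0;
    for (const arg of argList) {
        // UTF-8 byte length of the argument plus one byte for the \0 terminator
        bufSize += encoder.encode(arg).length + 1;
    }
    return { argc: argList.length, bufSize: bufSize };
}

// Sample argument list, assumed for illustration only.
const sizes = computeArgsSizes([ "hw", "some", "silly", "fake", "args" ]);
console.log(sizes.argc, sizes.bufSize); // 5 24
```

For plain ASCII arguments the byte count equals the string length, but the two diverge as soon as an argument contains non-ASCII characters, which is why the sketch encodes first and counts bytes.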
To implement that, let's first create a fake `args`. This will simulate what you'd normally provide in the shell command line when calling an executable. The zeroth argument is always the executable itself so let's do something similar:

```javascript
let args = [ "hw", "some", "silly", "fake", "args" ];
```

Okay. So now within `args_sizes_get` I need to write `args.length` into the WASM memory address provided in the first argument, and the total byte length of all `args` (each one counted together with its `\0` terminator) into the memory address provided in the second argument. The implementation should look something like so.

```javascript
let wasm_memory = null; // set to the module's memory buffer after instantiation
let args = [ "hw", "some", "silly", "fake", "args" ];

...

function args_sizes_get(args_p, args_len_p) {
    console.log("args_sizes_get args:", arguments);
    let m = new DataView(wasm_memory);
    m.setUint32(args_p, args.length, true);

    const encoder = new TextEncoder();
    let args_total_len = 0;
    for (const arg of args) {
        // UTF-8 byte length plus the \0 terminator
        args_total_len += encoder.encode(arg).length + 1;
    }
    m.setUint32(args_len_p, args_total_len, true);
    return 0; // errno: success
}

...

WebAssembly.instantiateStreaming(fetch("hw.wasm"), importObject).then((obj) => {
    let wasm_exp = obj.instance.exports;
    wasm_memory = obj.instance.exports.memory.buffer;
    wasm_exp._start();
});
```

### `args_get`

So far so good. Let's proceed with `args_get`. It too takes two pointers as arguments.

![args_get arguments](/wasm/wasm_cpp_04/console_05.png)

The documentation is a bit cryptic:

> **`args_get(argv: Pointer<Pointer<u8>>, argv_buf: Pointer<u8>) -> Result<(), errno>`**
> Read command-line argument data. The size of the array should match that returned by args_sizes_get. Each argument is expected to be \0 terminated.

Honestly, I had to experiment a bit to understand what is expected to happen here. This description in my opinion leaves a lot to be desired. Not knowing Rust and attempting to apply C++ logic here, it seems that `argv`, being `**argv`, would be an array of pointers and `argv_buf` is a pointer to a buffer holding all the args concatenated, each one `\0` terminated. Every pointer in `argv` points at the start of its argument within `argv_buf`.
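As a worked example of that layout, here's a small sketch that lays out two arguments by hand in a scratch `ArrayBuffer` (standing in for WASM memory) and reads them back. The `readArgs` helper, the offsets and the sample strings are all made up for illustration; the 4-byte little-endian pointers assume the `wasm32` target used throughout this post.

```javascript
// Scratch buffer standing in for WASM memory (an assumption for this sketch).
const scratch = new ArrayBuffer(32);
const view = new DataView(scratch);
const bytes = new Uint8Array(scratch);

// argv_buf at offset 8: "hw\0hi\0" (each argument is \0 terminated)
bytes.set(new TextEncoder().encode("hw\0hi\0"), 8);
// argv at offset 0: an array of two 4-byte little-endian pointers
view.setUint32(0, 8, true);  // argv[0] -> "hw" at offset 8
view.setUint32(4, 11, true); // argv[1] -> "hi" at offset 11

// Hypothetical helper: follows each pointer in argv and decodes the
// \0-terminated string it points to.
function readArgs(buffer, argvPtr, argc) {
    const v = new DataView(buffer);
    const b = new Uint8Array(buffer);
    const decoder = new TextDecoder();
    const out = [];
    for (let i = 0; i < argc; i++) {
        const start = v.getUint32(argvPtr + i * 4, true);
        let end = start;
        while (b[end] !== 0) {
            end++;
        }
        out.push(decoder.decode(b.subarray(start, end)));
    }
    return out;
}

console.log(readArgs(scratch, 0, 2)); // logs the two strings "hw" and "hi"
```

This is roughly what the C runtime inside the module does with the data once `args_get` returns, so a reader like this can double as a sanity check for a hand-written `args_get`.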
This layout can be implemented in JavaScript the following way:

```javascript
function args_get(argv_pp, argv_buf_p) {
    console.log("args_get args:", arguments);
    let m = new DataView(wasm_memory);
    let a = new Uint8Array(wasm_memory);
    const encoder = new TextEncoder();
    const ptr_size = 4; // wasm32 pointers are 4 bytes wide

    for (const arg of args) {
        // argv[i] points at the start of this argument within argv_buf
        m.setUint32(argv_pp, argv_buf_p, true);
        argv_pp += ptr_size;

        const argEncd = encoder.encode(arg);
        a.set(argEncd, argv_buf_p);
        // append zero terminator (use the encoded byte length, not arg.length)
        m.setUint8(argv_buf_p + argEncd.length, 0);
        argv_buf_p += argEncd.length + 1;
    }
    return 0; // errno: success
}
```

### `fd_fdstat_get`

`fd_fdstat_get` accepts two arguments as well.

![call history](/wasm/wasm_cpp_04/console_06.png)

The first one is the file descriptor and the second one is a pointer to the return value, which is an `fdstat` record. Now bear in mind that I just want to run my example code so the majority of this code is meant to serve only that purpose. With that in mind, I'm gonna implement `fd_fdstat_get` to support only the STDOUT and STDERR fds. Therefore I'm gonna guard the code with this initial contract:

```javascript
function fd_fdstat_get(fd, fdstat_p) {
    console.log("fd_fdstat_get args:", arguments);
    if (fd < 1 || fd > 2) {
        throw "Unsupported file descriptor";
    }
}
```

The `fdstat` record is 24 bytes long and composed of four fields. Assembling all of this by hand is a bit tedious but can be done.

The first field is `fs_filetype` (a single byte at offset 0). Since I'm supporting output streams only, I'm gonna hard code the `character_device` value.

The second field is `fs_flags` (of type `fdflags`, 16 bits at offset 2). This is a bit field. I'm just gonna write 0x1 to that field indicating that data written to the `fd` is always appended. All other flags are set to false.

The third one, `fs_rights_base` (64 bits at offset 8), is another bit field with ... a lot of fields. I'm gonna set only the `fd_sync` (bit 4) and `fd_write` (bit 6) bits, which gives 0x50, indicating that only `sync` and `write` operations are allowed on the `fd`. I'm gonna write the same value to the last field, `fs_rights_inheriting` (64 bits at offset 16), as well.

Below is the complete code for `fd_fdstat_get`.
```javascript
function fd_fdstat_get(fd, fdstat_p) {
    console.log("fd_fdstat_get args:", arguments);
    if (fd < 1 || fd > 2) {
        throw "Unsupported file descriptor";
    }

    let m = new DataView(wasm_memory);

    // All offsets below are relative to the fdstat record at fdstat_p.
    const filetype_character_device = 2;
    const filetype_offs = 0;
    const filetype = filetype_character_device;
    m.setUint8(fdstat_p + filetype_offs, filetype); // fs_filetype is a u8

    const fdflags = 0x1; // append
    const fdflags_offs = 2;
    m.setUint16(fdstat_p + fdflags_offs, fdflags, true);

    const rights = 0x50n; // fd_sync (bit 4) | fd_write (bit 6)
    const rights_offs = 8;
    m.setBigUint64(fdstat_p + rights_offs, rights, true); // rights are u64

    const rights_inh = 0x50n;
    const rights_inh_offs = 16;
    m.setBigUint64(fdstat_p + rights_inh_offs, rights_inh, true);

    return 0; // errno: success
}
```

### `fd_write`

That leaves us with `fd_write` only. `fd_write` takes four arguments.

![call history](/wasm/wasm_cpp_04/console_07.png)

I'm gonna have to fall back to the [WASI docs](https://github.com/WebAssembly/WASI/blob/main/legacy/preview1/docs.md#fd_write) again since it's a bit difficult to explain.

> **`fd_write(fd: fd, iovs: ciovec_array) -> Result<size, errno>`**
> Write to a file descriptor. Note: This is similar to writev in POSIX.
> Params: **fd: fd**,
> **iovs: ciovec_array** List of scatter/gather vectors from which to retrieve data.
> Results: **`error: Result<size, errno>`**

First of all, Jesus... scatter/gather... but never mind. Right so, that's 2 arguments but the code shows 4, why? Well, `ciovec_array` is a `List` and a `List` is passed as two arguments: a pointer and its length. The last argument is a pointer for the return value, which is `size` - the total amount of data written.
So, the signature will be:

```javascript
function fd_write(fd, ciovec_arr_p, ciovec_arr_len, size_p) {
    console.log("fd_write args:", arguments);
    if (fd < 1 || fd > 2) {
        throw "Unsupported file descriptor";
    }
}
```

So, we've got an array of `ciovec` and a single [`ciovec` is a record](https://github.com/WebAssembly/WASI/blob/main/legacy/preview1/docs.md#ciovec) containing:

- a pointer to a buffer of bytes
- buffer length

In other words, in order to implement `fd_write` I have to iterate over all `ciovecs` and copy the data from the buffers they point to. I'm gonna copy all that data to a string since I'm only supporting `STDOUT` and `STDERR` so, my assumption is that I'm always operating on strings.

Additionally, the total number of bytes written has to be stored at the memory location provided in `size_p`.

Here's the code for `fd_write`:

```javascript
function fd_write(fd, ciovec_arr_p, ciovec_arr_len, size_p) {
    console.log("fd_write args:", arguments);
    if (fd < 1 || fd > 2) {
        throw "Unsupported file descriptor";
    }

    let m = new DataView(wasm_memory);
    let s = 0;
    let str = "";

    for (let i = 0; i < ciovec_arr_len; i++) {
        // each ciovec is a pair of 32-bit values: buffer pointer, buffer length
        let buf_p = m.getUint32(ciovec_arr_p, true);
        ciovec_arr_p += 4;
        let buf_len = m.getUint32(ciovec_arr_p, true);
        ciovec_arr_p += 4;

        let sv = new DataView(wasm_memory, buf_p, buf_len);
        let d = new TextDecoder().decode(sv);
        str += d;
        s += buf_len;
    }

    if (str.length > 0) {
        if (fd == 1) {
            console.log(str);
        } else {
            console.error(str);
        }
    }

    // report the number of bytes consumed
    m.setUint32(size_p, s, true);
    return 0; // errno: success
}
```

Finally, with all of the above, we've got our "hello world".
![call history](/wasm/wasm_cpp_04/console_08.png)

### Testing args

We can test if `args_get` and `args_sizes_get` work correctly by extending the WASM module implementation:

```c++
#include <iostream>

int main(int argc, const char *argv[]) {
    std::cout << "argc: " << argc << std::endl;
    for (int i = 0; i < argc; ++i) {
        std::cout << "argv[" << i << "] -> " << argv[i] << std::endl;
    }
    std::cout << "hello world" << std::endl;
    return 0;
}
```

This produces the following result:

![call history](/wasm/wasm_cpp_04/console_09.png)

This is a bit obscured by all the debugging messages in the polyfill code but still proves that it works correctly.

## WASI browser polyfills

All of that work has already been done. There's a [WASI polyfills repo](https://github.com/bjorn3/browser_wasi_shim) that implements the basic native interface as specified by WASI. Let's initialise an empty npm project to experiment with it.

```bash
npm init -y
npm install @bjorn3/browser_wasi_shim --save
```

I'm gonna copy the example from the [WASI polyfills repo](https://github.com/bjorn3/browser_wasi_shim) with some small adjustments.
```javascript
import wasm_module from 'url:./hw.wasm';
import { File, OpenFile, PreopenDirectory, WASI } from "../node_modules/@bjorn3/browser_wasi_shim/dist";

let stdin = new OpenFile(new File([]));
let stdout = new OpenFile(new File([]));
let stderr = new OpenFile(new File([]));

let args = [ "hw", "arg1", "arg2" ];
let env = [ "FOO=bar" ];
let fds = [ stdin, stdout, stderr ];
let wasi = new WASI(args, env, fds);

WebAssembly
    .instantiateStreaming(fetch(wasm_module), {
        "wasi_snapshot_preview1" : wasi.wasiImport,
    })
    .then((obj) => {
        wasi.start(obj.instance);
        let d = new TextDecoder().decode(stdout.file.data);
        console.log(d);
    });
```

I like to use [parcel](https://parceljs.org/) as the bundler so, I'm gonna install it as a development dependency:

```bash
npm install --save-dev parcel
```

I've deliberately imported the module explicitly so that parcel picks it up:

```javascript
import wasm_module from 'url:./hw.wasm';
```

Preparing the application with parcel is very simple:

```bash
npx parcel src/index.html
```

This will run an http server on port 1234.

After running the WASM module with `wasi.start` I'm just inspecting the raw contents of the File acting as STDOUT and printing that to the console. This works as expected:

![WASI polyfills output](/wasm/wasm_cpp_04/console_10.png)

## Conclusion

Today's post walks through the internals of running WASI-based WASM modules in the browser: first by implementing the required imports by hand and then by using WASI polyfills. Thanks to that knowledge we should have far fewer problems interfacing native code with JavaScript!

As always all discussed code can be found in my gitlab repositories:

- [wasi-sdk](https://gitlab.com/twdev_projects/wasi-sdk)
- [wasi-polyfills](https://gitlab.com/twdev_projects/wasi-polyfills)

In the next instalment I'm gonna attempt to port a real piece of software to WASM and run it in the browser.