#6 WebAssembly and C++: Bridging native code and asynchronous JavaScript

This post is part of a WebAssembly series focused on WASM and C++. The goal is to gain a thorough understanding of how WebAssembly works, how to use it as a compilation target for C++ code and hopefully have fun along the way. So, stick with me for this exciting journey.

Recap

In the previous post, I compiled the Lua interpreter to WASM using emscripten and successfully ran it using node. There was a problem running the same code in the browser, as blocking IO is not possible there. Today I’m gonna try to address this issue and run the Lua interpreter in the browser.

There are links to compiled WASM demos in this post, so you can test the code yourself.

Problem definition

The Lua interpreter’s REPL runs in a tight while loop that spends most of its time blocked on a call to fgets, synchronously waiting for input.

Looking at the source code, there’s a pushline function:

static int pushline (lua_State *L, int firstline) {
  char buffer[LUA_MAXINPUT];
  char *b = buffer;
  size_t l;
  const char *prmt = get_prompt(L, firstline);
  int readstatus = lua_readline(L, b, prmt);
  if (readstatus == 0)
    return 0;  /* no input (prompt will be popped by caller) */
  ...
  return 1;
}

This function is called in loadline and the latter is called in a tight while loop:

static void doREPL (lua_State *L) {
  int status;
  const char *oldprogname = progname;
  progname = NULL;  /* no 'progname' on errors in interactive mode */
  lua_initreadline(L);
  while ((status = loadline(L)) != -1) {
    if (status == LUA_OK)
      status = docall(L, 0, LUA_MULTRET);
    if (status == LUA_OK) l_print(L);
    else report(L, status);
  }
  lua_settop(L, 0);  /* clear stack */
  lua_writeline();
  progname = oldprogname;
}

lua_readline is a macro which results in a call to fgets or readline - depending on the platform.

To port Lua to WASM and be able to run it in the browser, this synchronous wait has to be replaced with either polling or an asynchronous approach.

Limitations

When working with WASM and JavaScript, the most fundamental rule is that we can’t block JavaScript’s thread. A WASM function that spins in a busy loop starves the browser’s event loop, so no input events are delivered and the page freezes. The WASM module has to cooperate instead: whenever it needs to wait for something, it has to hand control back to JavaScript.
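
To make the idea of handing control back a bit more concrete, here’s a minimal sketch in plain JavaScript, with no WASM involved. The names (echoRepl, createDriver, feed) are made up for this illustration, and it’s only a conceptual model of cooperative scheduling, not how emscripten does it internally.

```javascript
// A REPL written as a generator: every `yield` hands control back to the
// caller instead of blocking the thread while waiting for input.
function* echoRepl(output) {
  output.push("I'm an echo");
  while (true) {
    const line = yield;            // pause here until the driver resumes us
    output.push('Echo: ' + line);
  }
}

// Hypothetical driver: starts the REPL and resumes it whenever a line arrives.
function createDriver(output) {
  const repl = echoRepl(output);
  repl.next();                     // run up to the first `yield`
  return {
    feed(line) {
      repl.next(line);             // resume; runs until the next `yield`
    },
  };
}

const output = [];
const driver = createDriver(output);
// Between these calls the event loop is free to handle other events.
driver.feed('hello');
driver.feed('world');
console.log(output.join('\n'));
```

The REPL body still reads like a plain while loop, but it never holds the thread while waiting; it only runs when somebody feeds it a line.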

Experiments

Let’s put Lua aside for a moment and focus on the mechanics and interoperability between WASM and JavaScript to better formulate the approach.

I’m gonna start with a toy project containing a model of Lua’s REPL. Once I’m able to run it in the browser, the approach to porting Lua (or, in fact, anything else) will be obvious.

#include <cstdio>
#include <cstring>

int main(int argc, const char *argv[]) {
  printf("I'm an echo\n");

  constexpr std::size_t MAXBUF = 1024;
  char buffer[MAXBUF];

  while (true) {
    fgets(buffer, MAXBUF, stdin);
    printf("Echo: %s\n", buffer);
  }

  return 0;
}

The above is a simplified model of a REPL: a blocking wait on fgets for a new line of input, followed by the “processing”, which in this case just prints the input back. I’m gonna define a simple Makefile to help build this code:

.PHONY: clean

all: repl.js

repl.js: repl.cpp
	$(CXX) $(CXXFLAGS) -o $@ $< -sWASM=1

clean:
	rm -fv repl.js repl.wasm

With the Makefile in hand, it’s possible to compile repl.cpp to WASM with just:

$ emmake make

Let’s create a simple index.html file to load the module.

<!doctype html>
<html>
    <head></head>
    <body>
        <h1>REPL</h1>
        <div id="repl">
            <p>Output:</p>
            <textarea 
                id="output" 
                readonly 
                style="width: 80%; height: 10em"></textarea>

            <p>Input:</p>
            <textarea 
                id="input" 
                placeholder="Enter your input here" 
                style="width: 80%; height: 10em"></textarea>
        </div>
        <script src="repl.js"></script>
    </body>
</html>

Don’t mind the rudimentary styling; it’s just there to make the text fields bigger.

/wasm/wasm_cpp_06/app_model.png

Pointing the browser to this document (remember to serve it over HTTP) greets us with an input prompt. That’s the default stdin implementation emscripten provides in the browser, which is what fgets ends up reading from. After hitting cancel, the page becomes unresponsive, and the JavaScript console shows that we’re just spinning indefinitely in the while loop.

/wasm/wasm_cpp_06/input_prompt.png

/wasm/wasm_cpp_06/busy_loop.png

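emscripten provides a way to break busy loops like this: emscripten_set_main_loop. Instead of looping ourselves, we register a callback and the browser calls it repeatedly from its event loop, either through requestAnimationFrame (when fps is 0 or negative) or through a timer (when a positive fps is given). Let’s use that in the WASM module.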

#include <cstdio>
#include <cstring>

#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#endif

void doRepl() {
  constexpr std::size_t MAXBUF = 1024;
  char buffer[MAXBUF];

  fgets(buffer, MAXBUF, stdin);
  printf("Echo: %s\n", buffer);
}

int main(int argc, const char *argv[]) {
  printf("I'm an echo\n");

#ifdef __EMSCRIPTEN__
  int fps = 30;
  int simulate_infinite_loop = 1;
  emscripten_set_main_loop(doRepl, fps, simulate_infinite_loop);
#else
  while (true) {
    doRepl();
  }
#endif

  return 0;
}

No more busy loops for the WASM target! After reloading the page, the browser no longer hangs. Now comes the implementation of fgets.

Polling

First, let’s add a callback to the input textarea.

const input_element = document.getElementById('input');

input_element.addEventListener('keypress', (e) => {
    if (e.key === 'Enter') {
        e.preventDefault();
        Module.pending_input.push('\n'.charCodeAt(0));
        input_element.value = '';
    } else {
        Module.pending_input.push(e.key.charCodeAt(0));
    }
});

I’m gonna save the above as index.js. I’m referring to Module here. This is a global which I’m gonna create in module.js:

var Module = {
  pending_input : [],
};

This extends the environment for the WASM module, since the JavaScript wrapper that emscripten generates picks up Module if it already exists:

...
var Module = typeof Module != 'undefined' ? Module : {};
...

The last thing is to include all these scripts in the HTML file.

...
            <textarea 
                id="input" 
                placeholder="Enter your input here" 
                style="width: 80%; height: 10em"></textarea>
        </div>
        <script src="module.js"></script>
        <script src="index.js"></script>
        <script src="repl.js"></script>
...

A quick test reveals that the character collection works as visible below.

/wasm/wasm_cpp_06/pending_input.png

Now, it’s time for the fgets implementation in C++.

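#ifdef __EMSCRIPTEN__

// Pops one character code from the JavaScript-side FIFO; 0 means "no input yet".
EM_JS(int, js_getchar, (), {
  if (Module.pending_input.length == 0) {
    return 0;
  }
  return Module.pending_input.shift();
});

// Simplified fgets: assembles one line (without the trailing newline),
// polling the FIFO and yielding to the browser while it is empty.
char *em_fgets(char *buf, std::size_t size, FILE *stream) {
  if (size == 0) {
    return NULL;  // no room even for the terminator
  }
  char *const start = buf;  // remember the beginning; buf advances below
  while (true) {
    if (int c = js_getchar()) {
      if (c == '\n' || size == 1) {
        *buf = '\0';
        return start;  // return the start of the string, not its end
      }
      *buf++ = static_cast<char>(c);
      size--;
      continue;
    }
    emscripten_sleep(100);  // nothing to read: give control back for 100 ms
  }
}

#endif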

First, a cool feature that emscripten offers: the EM_JS macro generates an extern symbol in C++ and automatically adds the JavaScript function to the WASM environment. I’m using it to create the other end of the input FIFO: js_getchar drains the pending_input array if there are any characters available.

em_fgets is just a simple loop that glues the characters together to assemble the string. The important bit is the call to emscripten_sleep: it yields control back to JavaScript, so the loop is not a tight busy loop; it’s interrupted every time the FIFO runs dry. This is cool because to C++ it looks like a synchronous call, while it’s actually a form of coroutine. Keep in mind that emscripten_sleep only works when the module is built with -sASYNCIFY. Small modifications to doRepl are required as well:

void doRepl() {
  ...

#ifdef __EMSCRIPTEN__
  p = em_fgets(buffer, MAXBUF, stdin);
#else
  p = fgets(buffer, MAXBUF, stdin);
#endif
  ...
}

There’s one more thing. Instead of printing to JavaScript console, it would be nice to append output to the textarea that I specifically created for that purpose. To do that, I’m gonna implement another override in the Module object.

var Module = {
  ...
  print : (text) => {
    const output_element = document.getElementById('output');
    output_element.textContent += text + '\n';
  },
};

It’s time to test the code. After recompiling and reloading everything, it’s visible that the REPL is working. WASM is polling for input every 100ms.

/wasm/wasm_cpp_06/polling.png

Async

Polling is a viable option but it’s not preferred. With a 100 ms sleep, it adds up to 100 ms of input lag and wakes the module up ten times a second even when there’s nothing to read. It’s better to use a fully asynchronous approach instead.

I’m gonna reimplement fgets one more time but now, it’s gonna be an asynchronous JavaScript function. This is possible thanks to ASYNCIFY. Within repl.cpp I’m gonna define em_fgets the following way.

EM_ASYNC_JS(char *, em_fgets, (const char* buf, size_t bufsize), {
  return await new Promise((resolve, reject) => {
      if (Module.pending_lines.length > 0) {
        resolve(Module.pending_lines.shift());
      } else {
        Module.pending_fgets.push(resolve);
      }
  }).then((s) => {
      // convert JS string to WASM string
      let l = s.length + 1;
      if (l >= bufsize) {
        // truncate
        l = bufsize - 1;
      }
      Module.stringToUTF8(s.slice(0, l), buf, l);
      return buf;
  });
});

From the C++ side, em_fgets looks like a blocking call; under the hood it’s waiting for a Promise. The promise is resolved only once a full line of text has been collected from the input. The line might be available straight away in Module.pending_lines, or we might have to wait for it. In the latter case, the function that resolves the promise is pushed to the Module.pending_fgets array. Additionally, there’s a continuation on the string value: the JavaScript string has to be copied to WASM memory. This is something I’ve already discussed in part #3 of this series; thankfully, emscripten provides a function to perform that conversion for us.
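
One detail worth keeping in mind when sizing the copy: s.length counts UTF-16 code units, while the WASM buffer holds UTF-8 bytes, so the two only match for plain ASCII input. The sketch below uses the standard TextEncoder to show the difference. encodeIntoBuffer is a hypothetical helper written for this illustration; it mimics a "copy with a byte limit and a terminating zero" and is not emscripten’s implementation.

```javascript
// Hypothetical helper: encode `s` as UTF-8 into at most `bufsize` bytes,
// always keeping one byte for the terminating zero and never splitting a
// multi-byte character. Returns the bytes that would land in the buffer.
function encodeIntoBuffer(s, bufsize) {
  if (bufsize <= 0) {
    return new Uint8Array(0);           // no room for anything
  }
  const encoder = new TextEncoder();
  const out = [];
  for (const ch of s) {                 // iterate by code point
    const bytes = encoder.encode(ch);
    if (out.length + bytes.length > bufsize - 1) {
      break;                            // the next character would not fit
    }
    out.push(...bytes);
  }
  out.push(0);                          // C-style terminator
  return Uint8Array.from(out);
}

const sample = 'h\u00e9llo';            // "héllo"
console.log(sample.length);                           // UTF-16 code units
console.log(new TextEncoder().encode(sample).length); // UTF-8 bytes
console.log(encodeIntoBuffer(sample, 4));             // truncated copy
```

For the ASCII-only input typed into this REPL the distinction doesn’t matter, but it’s the reason a buffer sized from s.length can end up too small for other text.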

You might’ve noticed that I’ve dropped the FILE* parameter from the function signature. That’s just to simplify the code as for the purpose of this use case it’s completely superfluous.

To make em_fgets work, the input collection code has to be modified as well:

const input_element = document.getElementById('input');

input_element.addEventListener('keypress', function(e) {
  const isEnter = e.key === 'Enter';

  if (isEnter) {
    e.preventDefault();
    Module.pending_lines.push(Module.pending_chars.join(''));
    Module.pending_chars = [];
    input_element.value = '';
  } else {
    Module.pending_chars.push(e.key);
  }

  if (Module.pending_fgets.length > 0 && Module.pending_lines.length > 0) {
    let resolver = Module.pending_fgets.shift();
    resolver(Module.pending_lines.shift());
  }
});

This code just collects the characters in pending_chars. Once it sees a newline it flushes the pending_chars array as a string to pending_lines. If there’s data in pending_lines and there is at least one Promise resolver in pending_fgets, it will be called with the collected input. Of course, the additional arrays (pending_lines, pending_fgets) have to be added to the global Module definition.
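
Stripped of the DOM and of emscripten, the hand-off between the keypress handler and em_fgets boils down to a small queue of lines plus a queue of waiting resolvers. Here’s a minimal sketch of that pattern; createLineQueue, readLine, pushLine and pending are illustrative names only, and unlike the handler above this version hands a line straight to a waiting reader instead of always going through the array first.

```javascript
// Minimal line queue: readers get a Promise, writers deliver lines.
function createLineQueue() {
  const lines = [];    // lines typed before anyone asked for them
  const waiters = [];  // resolve functions of readers still waiting

  return {
    // Reader side: resolves immediately if a line is buffered,
    // otherwise parks the resolver until a line is pushed.
    readLine() {
      return new Promise((resolve) => {
        if (lines.length > 0) {
          resolve(lines.shift());
        } else {
          waiters.push(resolve);
        }
      });
    },
    // Writer side: hand the line to the oldest waiting reader, or buffer it.
    pushLine(line) {
      if (waiters.length > 0) {
        waiters.shift()(line);
      } else {
        lines.push(line);
      }
    },
    // Introspection helper, only here to make the state observable.
    pending() {
      return { lines: lines.length, waiters: waiters.length };
    },
  };
}

const queue = createLineQueue();
queue.readLine().then((line) => console.log('got:', line));
queue.pushLine('print("hello")');  // resolves the reader parked above
```

Either side can arrive first: a line typed early waits in the buffer, and a reader that arrives early waits in the resolver list.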

There’s a small change to the Makefile required as well. ASYNCIFY has to be enabled, and stringToUTF8 has to be explicitly exported so that it’s available as Module.stringToUTF8:

.PHONY: clean

all: repl.js

repl.js: repl.cpp
	$(CXX) $(CXXFLAGS) -sWASM=1 -sASYNCIFY -sEXPORTED_RUNTIME_METHODS=stringToUTF8 -o $@ $<

That’s it!

Discussed example code can be found here.

Additionally, live demo is available as well.

Back to Lua

Right! To make things a bit more convenient I’m gonna switch to Lua’s git repo.

I’m gonna work on the tw/wasm branch. The plan is to replace readline with code implemented in JavaScript, as previously with fgets.

Changes in the makefile are minimal. I’ve removed the hardcoded gcc compiler and basically just added the required emscripten flags, nothing more than that.

--- a/makefile
+++ b/makefile
@@ -72,13 +72,12 @@ LOCAL = $(TESTS) $(CWARNS)
 # enable Linux goodies
-MYLIBS= -ldl -lreadline
+MYLIBS= -ldl -sASYNCIFY -sEXPORTED_RUNTIME_METHODS=stringToUTF8
 
 
-CC= gcc
-CFLAGS= -Wall -O2 $(MYCFLAGS) -fno-stack-protector -fno-common -march=native
-AR= ar rc
-RANLIB= ranlib
+CFLAGS= -Wall -O2 $(MYCFLAGS) -fno-stack-protector -fno-common
+AR= emar rc
+RANLIB= emranlib
 RM= rm -f
 
 
@@ -96,7 +95,7 @@ AUX_O=        lauxlib.o
 LIB_O= lbaselib.o ldblib.o liolib.o lmathlib.o loslib.o ltablib.o lstrlib.o \
        lutf8lib.o loadlib.o lcorolib.o linit.o
 
-LUA_T= lua
+LUA_T= lua.js

Changes in lua.c are limited as well:

+#ifdef __EMSCRIPTEN__
+#include <emscripten.h>
+
+EM_ASYNC_JS(char *, em_fgets, (const char* buf, size_t bufsize), {
+  return await new Promise((resolve, reject) => {
+      if (Module.pending_lines.length > 0) {
+        resolve(Module.pending_lines.shift());
+      } else {
+        Module.pending_fgets.push(resolve);
+      }
+  }).then((s) => {
+      // convert JS string to WASM string
+      let l = s.length + 1;
+      if (l >= bufsize) {
+        // truncate
+        l = bufsize - 1;
+      }
+      Module.stringToUTF8(s.slice(0, l), buf, l);
+      return buf;
+  });
+});
+
+static char* readline(const char* prompt) {
+    char* buf = malloc(LUA_MAXINPUT);
+    em_fgets(buf, LUA_MAXINPUT);
+    return buf;
+}
+
+#define lua_initreadline(L) ((void)L)
+#define lua_readline(L,b,p)  ((void)L, ((b)=readline(p)) != NULL)
+#define lua_saveline(L,line) ((void)L)
+#define lua_freeline(L,b) ((void)L, free(b))
+
+#else
 #include <readline/readline.h>
 #include <readline/history.h>
+
 #define lua_initreadline(L)    ((void)L, rl_readline_name="lua")
 #define lua_readline(L,b,p)    ((void)L, ((b)=readline(p)) != NULL)
 #define lua_saveline(L,line)   ((void)L, add_history(line))
 #define lua_freeline(L,b)      ((void)L, free(b))
 
+#endif
+

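em_fgets is a one-to-one copy of the function I’ve already implemented in the toy project. The only new piece is the small readline wrapper, which allocates a buffer and fills it using em_fgets.

With all of that in place, it’s possible to just: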

emmake make

That’s it! The supporting files that I’ve used in the toy project can be used without any additional changes to run the application.

/wasm/wasm_cpp_06/lua_repl.png

A fork of the Lua repo is available here.

Working demo is available here.

Integration with xterm.js

These simple HTML text fields are cool as a starter but I really wanted to integrate the REPL with xterm.js. Long story short, I had to slightly customise the build configuration to force emscripten to spew out ES6 compatible modules. The repository is available here.

Below is a working integration for you to enjoy.