Using binfmt_misc and docker for multi-platform builds
In this post I’m going to discuss how to use binfmt_misc to build and run non-native docker images: images you build on your PC and then deploy to a target machine such as a raspberry pi.
What’s binfmt_misc?
binfmt_misc stands for “miscellaneous binary formats”. In short, it allows you to run non-native binaries on the host system just as if they were native, with the help of an interpreter associated with each format.
Support
Support in the kernel has been available since version 2.1.43, so practically all modern systems should support it. On Debian, binfmt_misc is enabled as a module.
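As a quick check (assuming a Debian-style kernel config under /boot), you can confirm it like this:

```shell
# CONFIG_BINFMT_MISC=m means binfmt_misc is available as a kernel module
grep BINFMT_MISC "/boot/config-$(uname -r)"
```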
Additionally, you need the binfmt_misc filesystem mounted.
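It normally lives under /proc/sys/fs/binfmt_misc, which you can verify:

```shell
# expect a binfmt_misc entry mounted at /proc/sys/fs/binfmt_misc
mount | grep binfmt_misc
```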
On Debian, this is handled by systemd.
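On my system the relevant units look like the ones below (unit names may differ across systemd versions, so treat this as an assumption worth verifying locally):

```shell
systemctl status proc-sys-fs-binfmt_misc.automount proc-sys-fs-binfmt_misc.mount
```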
Examples
Let’s get a better gist of what binfmt_misc
really is, with a couple of examples.
Python bytecode
Let’s start with a basic script compiled to python bytecode.
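A minimal sketch (the script name and contents are mine and don’t matter here):

```shell
# hello.py - a trivial script to compile to bytecode
printf 'print("hello from bytecode")\n' > hello.py
python3 -m py_compile hello.py   # writes the bytecode into __pycache__/
```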
This produces a compiled .pyc file under __pycache__, which can be executed using the python interpreter.
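For example (the exact .pyc filename depends on your CPython version, hence the glob):

```shell
# recompile here so the snippet is self-contained
printf 'print("hello from bytecode")\n' > hello.py
python3 -m py_compile hello.py
python3 __pycache__/hello.*.pyc
```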
No surprises at all. Of course, direct execution is impossible as the kernel simply doesn’t know what to do with this format.
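Attempting direct execution fails with an exec format error, something along these lines:

```shell
printf 'print("hello from bytecode")\n' > hello.py
python3 -m py_compile hello.py
chmod +x __pycache__/hello.*.pyc
./__pycache__/hello.*.pyc || echo "direct execution failed, as expected"
```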
But python can be registered as an interpreter for .pyc files to allow direct execution. Following the documentation for binfmt_misc, I’m just going to register all files with the .pyc extension to be interpreted with python. To do this, I need a simple extension-matching rule.
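The kernel docs describe the registration string as :name:type:offset:magic:mask:interpreter:flags; an extension rule built from that format looks like this (requires root):

```shell
# E = match by extension; files ending in .pyc get /usr/bin/python3
echo ':pyc:E::pyc::/usr/bin/python3:' | sudo tee /proc/sys/fs/binfmt_misc/register
```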
With the above rule, the kernel will invoke /usr/bin/python3 when attempting to directly execute any .pyc file.
Lua bytecode
Just for fun, I’m going to register a “magic” matcher for Lua bytecode. As with python, I’m going to create a trivial test program:
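A sketch, assuming lua5.2 and its luac compiler are installed (binary names as packaged on Debian):

```shell
printf 'print("hello from lua bytecode")\n' > hello.lua
luac5.2 -o hello.luac hello.lua   # compile to Lua bytecode
```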
With no binfmt_misc
rules, the execution is impossible as expected:
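Something like:

```shell
chmod +x hello.luac
./hello.luac   # fails with an exec format error
```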
Magic matching will search for a byte pattern within the binary to choose an interpreter, so I need a reliable pattern. I’ve found a Lua 5.2 VM bytecode description document, which gives the “Lua signature” as “1b 4c 75 61”. This matches the contents of the binary as inspected with hexdump, so I’ll use that. Here’s the rule:
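The magic bytes go into the registration string \x-escaped; the interpreter path is my assumption, so adjust it to your Lua binary. Again this requires root:

```shell
# M = match by magic at offset 0; \x1bLua is the Lua bytecode signature
echo ':lua:M::\x1bLua::/usr/bin/lua5.2:' | sudo tee /proc/sys/fs/binfmt_misc/register
```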
Non-native binaries
binfmt_misc really shines when combined with qemu and docker, allowing execution of non-native code built for other platforms such as ARM.
First, let’s start with the hello-world
docker image we all know.
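Running it natively works as expected:

```shell
docker run --rm hello-world
```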
No problems at all. Now let’s try to run the same image for linux/arm/v7 (the platform of a raspberry pi 3).
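Without emulation support in place, this fails:

```shell
docker run --rm --platform linux/arm/v7 hello-world
# fails with an exec format error without qemu + binfmt_misc
```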
No surprises here: it fails. Let’s install qemu and some setup tools first.
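On Debian the packages are (names may vary on other distros):

```shell
sudo apt install qemu-user-static binfmt-support
```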
binfmt-support provides tools to make format registration and management easier. Installation also sets up systemd services; they are enabled by default and on start-up register qemu as the interpreter for all supported platforms (including ARM, which is what we’re interested in here).
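After installation, the registered formats can be inspected directly:

```shell
ls /proc/sys/fs/binfmt_misc/
# expect entries such as qemu-arm, qemu-aarch64, ...
cat /proc/sys/fs/binfmt_misc/qemu-arm
```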
binfmt-support
adds a systemd unit that makes the configuration persistent
across reboots. The config files are stored under /usr/lib/binfmt.d/
and
read on boot.
With the above, it’s now possible to run armv7 binaries directly on our system (through qemu emulation).
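For instance, the ARM variant of hello-world now runs on an amd64 host:

```shell
docker run --rm --platform linux/arm/v7 hello-world   # executed through qemu-arm
```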
How is that useful?
In some situations it can unblock development completely, as the target platform may simply be inadequate for building its own code (not enough RAM, too slow, etc.).
Here’s a real-life example.
Pydantic project
I’m currently working on a project that uses pydantic. This package requires pydantic-core, which in turn needs some native libraries to be built. My target platform is raspberry pi 3 (armv7l), and it takes ages to build pydantic-core on the rpi3; often the build just fails due to lack of memory. I’ve been mounting swap files to make the build work, but this is really clunky. A docker multi-platform setup solves that problem completely.
Why not use pre-built binaries?
I’ve run into a situation where some of my dependencies simply don’t provide binaries at all. One of them is markupsafe. As you can see here, there’s no pre-built armv7 package.
This is a problem. When using pip with a custom platform, you have to explicitly pass --only-binary=:all: or --only-binary=:none:. Here’s a simple requirements file to prove that:
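A sketch of the experiment; the exact platform tag is my assumption for illustration:

```shell
cat > requirements.txt <<'EOF'
pydantic
markupsafe
EOF
# forcing binaries only for a foreign platform fails for markupsafe,
# which publishes no armv7 wheel:
pip download -r requirements.txt --platform manylinux2014_armv7l \
    --only-binary=:all: -d /tmp/wheels
```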
Using pip with a custom platform is therefore not an option, as sooner or later you’ll run into a dependency problem like the one demonstrated above (unless you want to run your own pip repository and host pre-built packages for all your dependencies).
How to use docker multi platform build?
Let’s start with an example Dockerfile
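A minimal sketch (the base image and file layout are my assumptions):

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY hello.py .
CMD ["python", "hello.py"]
```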
I’m using the requirements.txt which I already discussed, containing pydantic and markupsafe as dependencies. The hello.py script is trivial, as it’s not really important here:
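Something like this is enough:

```python
# hello.py - a trivial placeholder; its only job is to prove the image runs
print("hello from an emulated container")
```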
After installing qemu and setting up binfmt_misc, the default docker builder should support emulated platforms as well. This can be checked with the following command.
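For example:

```shell
docker buildx inspect --bootstrap | grep -i platforms
# e.g.: Platforms: linux/amd64, linux/arm64, linux/arm/v7, ...
```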
In case your platform is missing, there’s a convenient setup image that will
configure all binfmt_misc
rules:
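The one I know of is tonistiigi/binfmt, which docker’s own documentation references:

```shell
docker run --privileged --rm tonistiigi/binfmt --install all
```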
I can now build an image for rpi3 on my development machine.
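Assuming the Dockerfile above lives in the current directory (the image name is arbitrary):

```shell
docker build --platform linux/arm/v7 -t hello-rpi3 .
```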
I can dump this image and transfer it to my rpi like so:
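A sketch; the host name is hypothetical:

```shell
docker save hello-rpi3 | gzip > hello-rpi3.tar.gz
scp hello-rpi3.tar.gz pi@raspberrypi:
ssh pi@raspberrypi docker load -i hello-rpi3.tar.gz   # docker load handles gzip
```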
It runs without problems:
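On the pi:

```shell
docker run --rm hello-rpi3
```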
Platform specific stages in Dockerfile
Additionally, stages within a Dockerfile can be configured to run on a specific platform. Here’s a slightly modified Dockerfile:
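A sketch; note that the builder stage only performs platform-independent preparation (building native code in it would produce amd64 artifacts):

```dockerfile
# builder runs natively on the build host
FROM --platform=linux/amd64 python:3.11-slim AS builder
WORKDIR /app
COPY hello.py .
# ...platform-independent preparation steps...

# runner executes (through qemu) on the target platform
FROM --platform=linux/arm/v7 python:3.11-slim AS runner
WORKDIR /app
COPY --from=builder /app/hello.py .
CMD ["python", "hello.py"]
```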
builder
will run on linux/amd64
. Commands in runner
will execute via
qemu emulation on linux/arm/v7
.
Platforms don’t have to be hard-coded like that. There are two special variables available, $BUILDPLATFORM and $TARGETPLATFORM, which are populated accordingly when invoking docker build with a custom platform. The above example Dockerfile can be modified to take advantage of that:
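Using the automatic build arguments instead of hard-coded platforms (same caveats as above):

```dockerfile
FROM --platform=$BUILDPLATFORM python:3.11-slim AS builder
WORKDIR /app
COPY hello.py .

FROM --platform=$TARGETPLATFORM python:3.11-slim AS runner
WORKDIR /app
COPY --from=builder /app/hello.py .
CMD ["python", "hello.py"]
```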
… and build:
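For example:

```shell
docker build --platform linux/arm/v7 -t hello-rpi3 .
# BUILDPLATFORM resolves to the host (e.g. linux/amd64),
# TARGETPLATFORM to linux/arm/v7
```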
More information is available in the docker documentation for multi-platform builds.
Docker + Qemu + binfmt_misc disadvantages
There’s mainly one: it’s slow, as it’s effectively like running a virtual machine. Each command runs within its own emulated context.