Manipulating /proc files as structured data
/proc provides essential data about the operating system on which the program
is running. Often there’s also a need to alter the system’s configuration using
/proc files as an interface. Now, you may ask yourself, why should you care?
After all, there are already well-established solutions such as procfs or
containerd that make dealing with proc files, cgroups and the like
convenient… and I agree. Sometimes though you just don’t want to pull in a
huge dependency like containerd, and it’s much easier to write something
smaller that is better tailored to the specifics of the problem at hand. This
is what I’m trying to achieve in this post.
Is /proc serialisable?
In other words, can the information in /proc be treated in a well-defined,
structured way? Kind of but, in general… no. Data in /proc doesn’t follow
any uniform schema definition and each /proc file’s format is defined in an
ad-hoc manner. Let’s consider a couple of examples:
|
|
|
|
|
|
|
|
It’s clear that the information in these files doesn’t strictly follow any
common schema. The only emerging pattern is that, in general, a single line
holds a single “record”, so to speak. Some of these files are noisy: there are
headers and empty lines. To add insult to injury, some even differ between
kernel versions. Another problem is that in some cases the format in which the
information has to be written differs from the one in which it is read from a
given /proc file. One example is:
|
|
In this case, when reading, we get a space separated list of controllers but in order to add or remove a controller, its name has to be preceded with either a ‘+’ or a ‘-’ sign like so:
|
|
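As a minimal sketch of that asymmetry, assuming the file behaves like cgroup v2’s cgroup.subtree_control, reading and writing need two different helpers. The names parseControllers and formatControllerChanges are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// parseControllers splits the space-separated list obtained when reading.
func parseControllers(s string) []string {
	return strings.Fields(s)
}

// formatControllerChanges builds the text expected when writing:
// "+name" enables a controller, "-name" disables it.
func formatControllerChanges(enable, disable []string) string {
	parts := make([]string, 0, len(enable)+len(disable))
	for _, c := range enable {
		parts = append(parts, "+"+c)
	}
	for _, c := range disable {
		parts = append(parts, "-"+c)
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Println(parseControllers("cpu io memory\n"))
	fmt.Println(formatControllerChanges([]string{"pids"}, []string{"io"}))
}
```

The point is only that a single struct-to-text mapping is not enough here: the read form and the write form have to be described separately.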
Solution(s)?
Ultimately my goal is to be able to represent the contents of a given /proc
file as a native golang data structure that can be marshalled and unmarshalled
back and forth to its /proc filesystem counterpart. There are a couple of
solutions that come to mind. I’ll try to name a few first.
Regular expressions
The structure of a file can be described with a regular expression. That way
it’s easy to parse a file into memory. It’s difficult to write the information
back though. I could come up with a universal representation of any /proc
file, e.g.:
|
|
Having a regexp like:
|
|
it’d be possible to match /proc/loadavg and build the contents of the Values
map without any problem at all. What about the other way around? There are
things like goregen that can generate strings from a regular expression,
however in this case it just doesn’t feel right. This solution doesn’t seem
adequate, so the search continues.
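To make the parsing direction concrete, here is a sketch under my own assumptions: the pattern, the group names and the matchValues helper are illustrative only, and the sample line stands in for the real /proc/loadavg.

```go
package main

import (
	"fmt"
	"regexp"
)

// loadavgRe is a hypothetical pattern describing one line of /proc/loadavg.
var loadavgRe = regexp.MustCompile(
	`^(?P<load1>\d+\.\d+) (?P<load5>\d+\.\d+) (?P<load15>\d+\.\d+) ` +
		`(?P<running>\d+)/(?P<total>\d+) (?P<lastpid>\d+)\s*$`)

// matchValues maps every named group to the text it matched,
// or returns nil when the input does not match at all.
func matchValues(re *regexp.Regexp, s string) map[string]string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return nil
	}
	values := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" {
			values[name] = m[i]
		}
	}
	return values
}

func main() {
	values := matchValues(loadavgRe, "0.20 0.18 0.12 1/80 11206\n")
	fmt.Println(values["load1"], values["total"], values["lastpid"])
}
```

Going from the map back to text is where this falls short: the expression says what to accept, not how to print it.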
Templates
In essence, the contents of a /proc file can be described with a template.
Sticking with /proc/loadavg as an example, the template that would summarise
its structure could look like:
|
|
With such a template, it’s trivial to render an in-memory representation back
into the text format of /proc/loadavg. It’s impossible to parse the file into
memory with it in the first place though. I could complement the template with
a regular expression to have the conversion work both ways, but that would
imply that the file structure is defined twice in two independent ways. Doesn’t
look very nice. A different solution is required.
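For illustration, the rendering direction could be sketched with text/template as below. The LoadAvg struct, its field names and the render helper are assumptions of mine, and since /proc/loadavg is read-only the output only goes to a string.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// LoadAvg is a hypothetical in-memory form of /proc/loadavg.
type LoadAvg struct {
	Load1, Load5, Load15 float64
	Running, Total       int
	LastPID              int
}

// loadavgTmpl describes the text layout of a single loadavg line.
var loadavgTmpl = template.Must(template.New("loadavg").Parse(
	`{{printf "%.2f %.2f %.2f" .Load1 .Load5 .Load15}} {{.Running}}/{{.Total}} {{.LastPID}}` + "\n"))

// render turns the struct back into its text form.
func render(l LoadAvg) (string, error) {
	var sb strings.Builder
	if err := loadavgTmpl.Execute(&sb, l); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := render(LoadAvg{Load1: 0.20, Load5: 0.18, Load15: 0.12, Running: 1, Total: 80, LastPID: 11206})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Print(out)
}
```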
DSL
A domain-specific language seems like a perfect solution for this problem. A custom description would define a schema allowing for both seamless serialisation and deserialisation. This is a bit of a dead end though. The initial premise was to have something simple, and having to write a dedicated parser seems a bit over the top. The word “schema” gave me an idea though.
Reflection
What if the struct representing a /proc file defined its schema at the same
time? Seems like a perfect solution, doesn’t it? In such a case, any given
/proc file could be serialised/deserialised in a similar manner to JSON, XML
or YAML, using a Decoder/Encoder interface. This sounds pretty interesting.
I’m gonna explore that further.
Schema definition
I’ll treat any given proc file as a collection of records. A single line
comprises a record. Each record comprises fields. Now that I’ve defined the
parlance, it has to be mapped to actual golang data types. To begin with, I’m
gonna represent records as composites (structs) and the fields as fundamental
types (int, float64, string etc).
Going back to /proc/mounts as an example:
|
|
This file can be represented as:
|
|
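Assuming the usual six columns of /proc/mounts (device, mount point, filesystem type, options, dump and pass), a record could be sketched as below. The field names are only illustrative, and parseMountLine is a hand-written stand-in that shows the record-to-fields mapping, not the reflection-based decoder.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// MountInfo models one line (record) of /proc/mounts.
// The field names are illustrative assumptions.
type MountInfo struct {
	Device     string
	MountPoint string
	FSType     string
	Options    string
	Dump       int
	Pass       int
}

// parseMountLine maps the six whitespace-separated words of a line
// onto the struct fields, converting the numeric ones.
func parseMountLine(line string) (MountInfo, error) {
	f := strings.Fields(line)
	if len(f) != 6 {
		return MountInfo{}, fmt.Errorf("expected 6 fields, got %d", len(f))
	}
	dump, err := strconv.Atoi(f[4])
	if err != nil {
		return MountInfo{}, fmt.Errorf("dump: %w", err)
	}
	pass, err := strconv.Atoi(f[5])
	if err != nil {
		return MountInfo{}, fmt.Errorf("pass: %w", err)
	}
	return MountInfo{
		Device: f[0], MountPoint: f[1], FSType: f[2],
		Options: f[3], Dump: dump, Pass: pass,
	}, nil
}

func main() {
	m, err := parseMountLine("proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0")
	fmt.Println(m, err)
}
```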
I’ve devised the following interface to act as a parser/decoder:
|
|
I will tokenise the file contents into words and check every value conversion. Ultimately, it’s the golang structure that represents the schema; the input just has to adhere to it. My decoder can be used in the following way:
|
|
Now, mounts will contain the contents of /proc/mounts.
This looks pretty nice, but the interesting part is actually the way reflection comes into play. To fully understand how reflection works, some primer on the topic is needed. I recommend going through The Laws of Reflection on golang’s blog.
The first thing to remember is that the values have to be settable, which means that the decoder’s input argument has to be a pointer. I’ll check that first:
|
|
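A minimal sketch of such a check, assuming a hypothetical helper called settableTarget (the name and the error value are mine):

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

var errNotPointer = errors.New("decode target must be a non-nil pointer")

// settableTarget returns the value v points to. Only values reached
// through a pointer are settable via reflection.
func settableTarget(v interface{}) (reflect.Value, error) {
	rv := reflect.ValueOf(v)
	if rv.Kind() != reflect.Ptr || rv.IsNil() {
		return reflect.Value{}, errNotPointer
	}
	return rv.Elem(), nil
}

func main() {
	var n int

	// Passing the value itself is rejected.
	_, err := settableTarget(n)
	fmt.Println(err)

	// Passing a pointer gives back a settable value.
	target, err := settableTarget(&n)
	fmt.Println(target.CanSet(), err)
}
```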
As a reminder, golang’s interface is just a tuple of type and value. These
are really two pointers: the first one stores the details about the type, the
second points to the concrete value of that type. Russ Cox goes through all
the details in his blog post. Reflection provides APIs to operate on both the
interface’s type and its value, that’s it!
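Both halves of that tuple can be inspected with reflect.TypeOf and reflect.ValueOf. The typeAndKind helper and the celsius type below are made up for this sketch.

```go
package main

import (
	"fmt"
	"reflect"
)

// celsius is a named type whose underlying kind is float64.
type celsius float64

// typeAndKind returns the dynamic type name and the underlying kind
// stored in the interface value v.
func typeAndKind(v interface{}) (string, string) {
	if v == nil {
		return "<nil>", reflect.Invalid.String()
	}
	return reflect.TypeOf(v).String(), reflect.ValueOf(v).Kind().String()
}

func main() {
	fmt.Println(typeAndKind(42))
	fmt.Println(typeAndKind(celsius(36.6)))
	fmt.Println(typeAndKind("proc"))
}
```

The kind is what a decoder would switch on: a named type like celsius still reports float64 as its kind.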
Going back to my MountInfo structure, I’ll have to iterate through its fields
and assign them values one by one. To begin with, let’s assume we’re dealing
with a single record only, and the decoder is invoked like so:
|
|
Step by step, here’s what has to be done:
|
|
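A sketch of just the iteration over fields, assuming the pointer check has already been done. The record type and the fieldKinds helper are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
)

// record is a stand-in for a struct describing one line of a file.
type record struct {
	Device string
	Dump   int
	Load   float64
}

// fieldKinds walks the fields of the struct ptr points to and reports
// each field's name and kind. It panics if ptr is not a pointer to a struct.
func fieldKinds(ptr interface{}) []string {
	rv := reflect.ValueOf(ptr).Elem()
	rt := rv.Type()
	out := make([]string, 0, rv.NumField())
	for i := 0; i < rv.NumField(); i++ {
		out = append(out, rt.Field(i).Name+":"+rv.Field(i).Kind().String())
	}
	return out
}

func main() {
	fmt.Println(fieldKinds(&record{}))
}
```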
To actually assign any value to the field, I’d need to know its type to call an
appropriate function like SetInt, SetFloat etc. This looks very similar to
what I already have: the problem is recursive. The switch statement just has
to be extended with more types that I wish to support.
|
|
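The recursive idea can be sketched as follows. This is a reduced illustration under my own assumptions, not the full decoder: setValue, decodeLine, Load and Sample are hypothetical names, only strings, ints and float64 are handled, and all struct fields have to be exported to be settable.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
	"strconv"
	"strings"
)

// setValue fills v from the front of tokens and returns what is left.
// Structs recurse into their fields; basic kinds consume one token.
func setValue(v reflect.Value, tokens []string) ([]string, error) {
	if v.Kind() == reflect.Struct {
		for i := 0; i < v.NumField(); i++ {
			rest, err := setValue(v.Field(i), tokens)
			if err != nil {
				return nil, err
			}
			tokens = rest
		}
		return tokens, nil
	}
	if len(tokens) == 0 {
		return nil, errors.New("ran out of tokens")
	}
	switch v.Kind() {
	case reflect.String:
		v.SetString(tokens[0])
	case reflect.Int, reflect.Int64:
		n, err := strconv.ParseInt(tokens[0], 10, 64)
		if err != nil {
			return nil, err
		}
		v.SetInt(n)
	case reflect.Float64:
		f, err := strconv.ParseFloat(tokens[0], 64)
		if err != nil {
			return nil, err
		}
		v.SetFloat(f)
	default:
		return nil, fmt.Errorf("unsupported kind %s", v.Kind())
	}
	return tokens[1:], nil
}

// decodeLine tokenises line into words and decodes them into *ptr.
func decodeLine(ptr interface{}, line string) error {
	rv := reflect.ValueOf(ptr)
	if rv.Kind() != reflect.Ptr || rv.IsNil() {
		return errors.New("decode target must be a non-nil pointer")
	}
	rest, err := setValue(rv.Elem(), strings.Fields(line))
	if err != nil {
		return err
	}
	if len(rest) != 0 {
		return fmt.Errorf("%d unused tokens", len(rest))
	}
	return nil
}

// Load and Sample show that nesting needs no extra code.
type Load struct{ One, Five float64 }

type Sample struct {
	Name string
	Load Load
	PID  int
}

func main() {
	var s Sample
	err := decodeLine(&s, "host 0.20 0.18 42")
	fmt.Println(s, err)
}
```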
Recursion greatly simplifies the code and avoids duplication. This simple decoder is capable of assigning values to any type of structure, regardless of its layout (provided of course that it’s composed of fields that the Decoder supports). Here’s an example:
|
|
This will produce:
|
|
At this stage, it’s just a matter of extending this skeleton implementation to handle the rest of the required types (specifically slices). Here’s the complete code:
|
|
Encoding
The process of encoding back to the text representation is quite similar to decoding. Using reflection, I’ll iterate over the struct’s fields and push their string representation to a provided writer. Since the encoding and decoding processes are substantially similar, I won’t go into the details of this implementation aside from presenting it in its full glory:
|
|
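As a reduced sketch of the same idea (not the full implementation): encodeRecord, encodeToString and Record are hypothetical names, only flat structs with exported fields are handled, and fields are joined with single spaces.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"reflect"
	"strings"
)

// Record is a stand-in for a struct describing one line of a file.
type Record struct {
	Name  string
	Value int
	Ratio float64
}

// encodeRecord writes the fields of a struct (or pointer to a struct)
// to w as one space-separated line.
func encodeRecord(w io.Writer, record interface{}) error {
	v := reflect.Indirect(reflect.ValueOf(record))
	if v.Kind() != reflect.Struct {
		return fmt.Errorf("expected a struct, got %s", v.Kind())
	}
	fields := make([]string, 0, v.NumField())
	for i := 0; i < v.NumField(); i++ {
		fields = append(fields, fmt.Sprint(v.Field(i).Interface()))
	}
	_, err := fmt.Fprintln(w, strings.Join(fields, " "))
	return err
}

// encodeToString is a convenience wrapper collecting the output in memory.
func encodeToString(record interface{}) (string, error) {
	var sb strings.Builder
	err := encodeRecord(&sb, record)
	return sb.String(), err
}

func main() {
	if err := encodeRecord(os.Stdout, &Record{Name: "cpu", Value: 2, Ratio: 0.5}); err != nil {
		fmt.Println("encode failed:", err)
	}
}
```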
Practicality
Encoding the native representation back to its text form is rarely useful in
practice. In most cases the /proc files are read-only, so most of the value
lies in the ability to quickly decode them. Encoding pays off with writable
files such as cgroups controllers, since these are both readable and writable
most of the time. For these, the presented encoder/decoder pair allows for
clean code composition, minimising the number of manual IO operations done on
the /proc filesystem.
Code repository
A more complete module implementation can be found on my gitlab. Although, at the time of writing, the presented module is in its infancy, I plan to use it myself in personal projects, so further development can be expected.