Manipulating /proc files as structured data

2022-06-05

/proc provides essential data about the operating system on which a program is running. Often there’s also a need to alter the system’s configuration using /proc files as an interface. Now, you may ask yourself, why should you care? After all, there are already well-established solutions, e.g. procfs or containerd, that make dealing with proc files, cgroups and so on convenient, and I agree. Sometimes, though, you just don’t want to pull in a huge dependency like containerd, and it’s much easier to write something smaller that is better tailored to the specifics of the problem at hand. This is what I’m trying to achieve in this post.

Is `/proc` serialisable?

In other words, can the information in /proc be treated in a well-defined, structured way? Kind of, but in general… no. Data in /proc doesn’t follow any uniform schema definition, and each /proc file’s format is defined in an ad-hoc manner. Let’s consider a couple of examples:

$ cat /proc/loadavg
0.11 0.19 0.17 1/381 2081

$ cat /proc/net/dev
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:   57837     572    0    0    0     0          0         0    57837     572    0    0    0     0       0          0
enp0s31f6:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
wlp4s0: 6852122    7074    0   97    0     0          0         0   813994    4503    0    0    0     0       0          0

$ cat /proc/mounts
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
sys /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
dev /dev devtmpfs rw,nosuid,relatime,size=7874268k,nr_inodes=1968567,mode=755,inode64 0 0
run /run tmpfs rw,nosuid,nodev,relatime,mode=755,inode64 0 0
/dev/mapper/hermes--vg-root / ext4 rw,relatime 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,inode64 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0

$ cat /proc/partitions
major minor  #blocks  name

   8        0  175825944 sda
   8        1     248832 sda1
   8        2          1 sda2
   8        5  175574016 sda5

The information in these files clearly doesn’t follow any common schema. The only emerging pattern is that, in general, a single line holds a single “record”, so to speak. Some of these files are noisy: they contain headers and empty lines. To add insult to injury, some formats even differ between kernel versions. Another problem is that, in some cases, the format in which information has to be written differs from the one in which it is read. One example (strictly speaking a cgroup file under /sys rather than /proc, but with the same file-based interface) is:

$ cat /sys/fs/cgroup/cgroup.subtree_control
memory pids

In this case, reading returns a space-separated list of controllers, but to add or remove a controller, its name has to be prefixed with either a ‘+’ or a ‘-’ sign, like so:

echo "+cpu +cpuset -memory" >/sys/fs/cgroup/cgroup.subtree_control
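Because the read and write formats differ, any writer has to build the ‘+’/‘-’ prefixed form itself. Here’s a minimal sketch of that conversion; `formatSubtreeControl` is a hypothetical helper of my own, not part of any library, and it only builds the string (it doesn’t touch /sys).

```go
package main

import (
	"fmt"
	"strings"
)

// formatSubtreeControl is a hypothetical helper that builds the write format
// expected by cgroup.subtree_control: controllers to enable get a '+' prefix,
// controllers to disable get a '-' prefix.
func formatSubtreeControl(enable, disable []string) string {
	tokens := make([]string, 0, len(enable)+len(disable))
	for _, c := range enable {
		tokens = append(tokens, "+"+c)
	}
	for _, c := range disable {
		tokens = append(tokens, "-"+c)
	}
	return strings.Join(tokens, " ")
}

func main() {
	// The read format is a plain space-separated list...
	current := strings.Fields("memory pids")
	fmt.Println(current) // [memory pids]

	// ...while a write needs explicit prefixes.
	fmt.Println(formatSubtreeControl([]string{"cpu", "cpuset"}, []string{"memory"}))
	// +cpu +cpuset -memory
}
```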

Solution(s)?

Ultimately, my goal is to represent the contents of a given /proc file as a native golang data structure that can be unmarshalled from, and marshalled back to, its /proc filesystem counterpart. A couple of solutions come to mind; I’ll go through a few of them first.

Regular expressions

The structure of a file can be described with a regular expression. That way it’s easy to unmarshal (parse) a file into memory. It’s difficult to marshal the information back to text, though. I could come up with a universal representation of any /proc file, e.g.:

type Procfile struct {
    Values map[string]string
}

Having a regexp like:

pattern := `^(?P<onemin>\S+)\s(?P<fivemin>\S+)\s(?P<fifteenmin>\S+)\s(?P<runnable>\S+)\s(?P<recentpid>\S+)$`

it’d be possible to match /proc/loadavg and build the contents of the Values map without any problem at all. What about the other way around? There are libraries like goregen that generate strings matching a regular expression, but in this case that just doesn’t feel right. This solution doesn’t seem adequate, so the search continues.
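For reference, the parsing direction really is easy with the standard regexp package and named groups. A minimal sketch, assuming the `Procfile` type above; `parseWithPattern` is a hypothetical name. Note that this pattern keeps `1/381` as a single `runnable` value, and that the trailing newline of the real file has to be stripped (e.g. with strings.TrimSpace) before matching, since `$` only matches at the very end of the text.

```go
package main

import (
	"fmt"
	"regexp"
)

// Procfile mirrors the generic representation suggested above.
type Procfile struct {
	Values map[string]string
}

// parseWithPattern (hypothetical) matches one line against a pattern with
// named capture groups and stores every named group in Values.
func parseWithPattern(pattern, line string) (Procfile, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return Procfile{}, err
	}
	m := re.FindStringSubmatch(line)
	if m == nil {
		return Procfile{}, fmt.Errorf("line %q does not match pattern", line)
	}
	pf := Procfile{Values: make(map[string]string)}
	for i, name := range re.SubexpNames() {
		// index 0 is the whole match and has an empty name; skip unnamed groups
		if name != "" {
			pf.Values[name] = m[i]
		}
	}
	return pf, nil
}

func main() {
	pattern := `^(?P<onemin>\S+)\s(?P<fivemin>\S+)\s(?P<fifteenmin>\S+)\s(?P<runnable>\S+)\s(?P<recentpid>\S+)$`
	pf, err := parseWithPattern(pattern, "0.11 0.19 0.17 1/381 2081")
	if err != nil {
		panic(err)
	}
	fmt.Println(pf.Values["onemin"], pf.Values["runnable"], pf.Values["recentpid"])
	// 0.11 1/381 2081
}
```

Going the other way would mean inverting the pattern, which is exactly what the standard library doesn’t offer.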

Templates

In essence, the contents of a /proc file can be described with a template. Sticking with /proc/loadavg as an example, a template summarising its structure could look like:

{{.OneMin}} {{.FiveMin}} {{.FifteenMin}} {{.Runnable}}/{{.Entities}} {{.MostRecentPid}}

With such a template, it’s trivial to marshal an in-memory representation back to the /proc/loadavg format. It’s impossible to unmarshal the file into memory in the first place, though. I could complement the template with a regular expression to convert in both directions, but that would mean the file structure is defined twice, in two independent ways. That doesn’t look very nice; a different solution is required.
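To make the one-way nature concrete, here’s a sketch rendering the template above with the standard text/template package. The `LoadAvg` struct and `renderLoadAvg` are illustrative names I’m assuming here; text/template can only execute a template against data, it has no way to parse text back into the struct.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// LoadAvg is an illustrative in-memory form of /proc/loadavg whose field
// names match the template placeholders.
type LoadAvg struct {
	OneMin, FiveMin, FifteenMin float64
	Runnable, Entities          int
	MostRecentPid               int
}

const loadavgTmpl = "{{.OneMin}} {{.FiveMin}} {{.FifteenMin}} {{.Runnable}}/{{.Entities}} {{.MostRecentPid}}\n"

// renderLoadAvg (hypothetical) produces the text form of l.
func renderLoadAvg(l LoadAvg) (string, error) {
	t, err := template.New("loadavg").Parse(loadavgTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, l); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	s, err := renderLoadAvg(LoadAvg{0.11, 0.19, 0.17, 1, 381, 2081})
	if err != nil {
		panic(err)
	}
	fmt.Print(s) // 0.11 0.19 0.17 1/381 2081
}
```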

DSL

A domain-specific language seems like a perfect solution for this problem. A custom description language would define a schema allowing for both seamless serialisation and deserialisation. This is a bit of a dead end, though. The initial premise was to have something simple, and having to write a dedicated parser seems a bit over the top. The word “schema” gave me an idea, though.

Reflections

What if the struct representing a /proc file defined its schema at the same time? Seems like a perfect solution, doesn’t it? In that case, any given /proc file could be serialised and deserialised much like JSON, XML or YAML, using a Decoder/Encoder interface. This sounds pretty interesting. I’m gonna explore it further.

Schema definition

I’ll treat any given proc file as a collection of records. A single line comprises a record, and each record comprises fields. Now that I’ve defined the parlance, it has to be mapped to actual golang data types. To begin with, I’m gonna represent records as composites (structs) and fields as fundamental types (int, float, string etc).

Going back to /proc/mounts as an example:

$ cat /proc/mounts
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
sys /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
dev /dev devtmpfs rw,nosuid,relatime,size=7874268k,nr_inodes=1968567,mode=755,inode64 0 0
...

This file can be represented as:

// Represents a single record (line) in /proc/mounts
type MountInfo struct {
	Device     string
	MountPoint string
	Filesystem string
	Options    string
	Dumpable   int
	FsckOrder  int
}

// Represents the contents of /proc/mounts
type Mounts []MountInfo

I’ve devised the following type to act as a parser/decoder:

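package goproc

import (
	"bufio"
	"io"
)

// Decoder reads whitespace-separated words from an input stream.
type Decoder struct {
	wordScanner *bufio.Scanner
}

// Decode fills the value pointed to by v (implementation follows below).
func (d *Decoder) Decode(v interface{}) error {
	// ...
}

// NewDecoder returns a Decoder that tokenises r into words.
func NewDecoder(r io.Reader) *Decoder {
	scanner := bufio.NewScanner(r)
	scanner.Split(bufio.ScanWords)
	return &Decoder{
		wordScanner: scanner,
	}
}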

The decoder tokenises the file contents into words, and each word is checked for conversion to the type of the field it’s assigned to. Ultimately, it’s the golang structure that represents the schema; the input just has to adhere to it. My decoder can be used the following way:

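func readMounts() (goproc.Mounts, error) {
	fh, err := os.Open("/proc/mounts")
	if err != nil {
		return nil, err
	}
	defer fh.Close()

	mounts := goproc.Mounts{}
	if err := goproc.NewDecoder(fh).Decode(&mounts); err != nil {
		return nil, err
	}
	return mounts, nil
}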

Now, mounts will contain the contents of /proc/mounts.
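One consequence of splitting with bufio.ScanWords is that newlines are just another separator: record boundaries come only from the number of fields in the struct, so a line with a missing or extra field would shift every following record. (Fields of /proc/mounts themselves are safe, since the kernel escapes spaces in paths as `\040`.) A small sketch of the tokenisation on its own; `words` is a hypothetical helper:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// words (hypothetical) splits input exactly like the decoder's scanner:
// on any whitespace, newlines included.
func words(input string) ([]string, error) {
	s := bufio.NewScanner(strings.NewReader(input))
	s.Split(bufio.ScanWords)
	var out []string
	for s.Scan() {
		out = append(out, s.Text())
	}
	return out, s.Err()
}

func main() {
	input := "proc /proc proc rw,nosuid 0 0\nsys /sys sysfs rw,nosuid 0 0\n"
	w, err := words(input)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(w)) // 12: the line break leaves no trace
	fmt.Println(w[6])   // sys: first word of the second record
}
```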

This looks pretty nice, but the interesting part is the way reflection comes into play. To fully understand how reflection works, some primer on the topic is needed. I recommend going through The Laws of Reflection on golang’s blog.

The first thing to remember is that values have to be settable, which means that the decoder’s input argument has to be a pointer. I’ll check that first:

func (d *Decoder) Decode(v interface{}) error {
	t := reflect.TypeOf(v)
	// reflect.TypeOf(nil) returns nil, so guard against it before calling Kind
	if t == nil || t.Kind() != reflect.Ptr {
		return errors.New("expected a pointer type")
	}
	// ...

As a reminder, a golang interface value is just a pair of type and value. These are really two pointers: the first points to details about the type, the second to the concrete value of that type. Russ Cox goes through all the details in his blog post. The reflect package provides APIs to operate on both the interface’s type and its value, and that’s it!

Going back to my MountInfo structure, I’ll have to iterate through its fields and assign them values one by one. To begin with, let’s assume we’re dealing with a single record only, and the decoder is invoked like so:

mountInfo := goproc.MountInfo{}
goproc.NewDecoder(fh).Decode(&mountInfo)

Step by step, here’s what has to be done:

func (d *Decoder) Decode(v interface{}) error {
	// Create a reflect.Value representing the value held by
	// the interface (in my case *goproc.MountInfo)
	pv := reflect.ValueOf(v)

	// dereferenced value (pve now refers to goproc.MountInfo)
	pve := pv.Elem()

	// determine the value's kind
	switch pvek := pve.Kind(); pvek {
	// if it's a struct, then iterate through its fields
	case reflect.Struct:
		for i := 0; i < pve.NumField(); i++ {
			// reflection of the struct's i-th field
			fv := pve.Field(i)

			// ...
			// assign a value to the field, e.g.
			// fv.SetInt(123)
			// fv.SetString("hello world")
		}
	}
	// ...

To actually assign a value to the field, I’d need to know its type in order to call an appropriate function like SetInt, SetFloat etc. This looks very similar to what I already have: the problem is recursive. The switch statement just has to be extended with the additional types I wish to support.

func (d *Decoder) Decode(v interface{}) error {
	// Create a reflect.Value representing the value held by
	// the interface (in my case *goproc.MountInfo)
	pv := reflect.ValueOf(v)

	// dereferenced value (pve now refers to goproc.MountInfo)
	pve := pv.Elem()

	// determine the value's kind
	switch pve.Kind() {
	case reflect.Int:
		pve.SetInt(123)

	case reflect.String:
		pve.SetString("hello world")

	case reflect.Float64:
		pve.SetFloat(3.14)

	// if it's a struct, then iterate through its fields
	case reflect.Struct:
		for i := 0; i < pve.NumField(); i++ {
			// reflection of the struct's i-th field
			fv := pve.Field(i)

			// Take the field's address and convert it to an interface,
			// so the recursive call receives a settable pointer.
			if err := d.Decode(fv.Addr().Interface()); err != nil {
				return err
			}
		}
	}

	return nil
}

Recursion greatly simplifies the code and avoids duplication. This simple decoder is capable of assigning values to any structure, regardless of its layout (provided, of course, that it’s composed of fields the Decoder supports). Here’s an example:

	a := struct {
		X string
		Y string
		Z int
	}{}

	b := struct {
		X int
		Y float64
	}{}

	d.Decode(&a)
	d.Decode(&b)

	fmt.Println(a)
	fmt.Println(b)

This will produce:

{hello world hello world 123}
{123 3.14}

At this stage, it’s just a matter of extending this skeleton implementation to handle the rest of the required types (specifically slices). Here’s the complete code:

func (d *Decoder) Decode(v interface{}) error {
	t := reflect.TypeOf(v)
	// reflect.TypeOf(nil) returns nil, so guard against it before calling Kind
	if t == nil || t.Kind() != reflect.Ptr {
		return errors.New("expected a pointer type")
	}

	// dereferenced type
	tv := t.Elem()

	// pointer to value
	pv := reflect.ValueOf(v)

	// dereferenced value
	pve := pv.Elem()

	// fetch next field value if needed
	fieldValue, err := d.scanNext(pve.Kind())
	if err != nil {
		return err
	}

	// determine the dereferenced value's kind
	switch pvek := pve.Kind(); pvek {
	case reflect.Slice:
		elemType := tv.Elem()
		for {
			elem := reflect.New(elemType)
			if err := d.Decode(elem.Interface()); err != nil {
				if err == io.EOF {
					break
				}
				return err
			}
			pve.Set(reflect.Append(pve, elem.Elem()))
		}

	case reflect.Float32:
		fallthrough

	case reflect.Float64:
		if err := assignFloat(pve, fieldValue, pvek); err != nil {
			return err
		}

	case reflect.Int:
		intVal, err := strconv.ParseInt(fieldValue, 10, 64)
		if err != nil {
			return err
		}
		pve.SetInt(intVal)

	case reflect.String:
		pve.SetString(fieldValue)

	case reflect.Struct:
	FieldLoop:
		for i := 0; i < pve.NumField(); i++ {
			fv := pve.Field(i)
			ft := tv.Field(i)

			sf := reflect.StructField(ft)
			tagValue, ok := sf.Tag.Lookup("goproc")
			if ok {
				skipField, err := d.handleOmitTag(tagValue)
				if err != nil {
					return err
				}

				if skipField {
					// proceed to next field
					continue FieldLoop
				}
			}

			// recursively descend and assign value to field
			if err := d.Decode(fv.Addr().Interface()); err != nil {
				return err
			}
		}

	default:
		return ErrUnsupportedFieldType
	} // switch

	return nil
}

Encoding

The process of encoding back to the text representation is quite similar to decoding. Using reflection, I’ll iterate over the struct’s fields and write their string representations to the provided writer. Since, as I mentioned, encoding and decoding are substantially similar and encoding brings nothing new, I won’t go into the details of this implementation beyond presenting it in its full glory:

func (e *Encoder) Encode(v interface{}) error {
	// pointer to value
	pv := reflect.ValueOf(v)
	if pv.Kind() != reflect.Ptr {
		return errors.New("expected a pointer type")
	}

	// dereferenced value
	pve := pv.Elem()

	switch pve.Kind() {
	case reflect.Slice:
		for i := 0; i < pve.Len(); i++ {
			fv := pve.Index(i)
			if err := e.Encode(fv.Addr().Interface()); err != nil {
				return err
			}
		}

	case reflect.Int:
		e.w.WriteString(strconv.FormatInt(pve.Int(), 10))

	case reflect.Float32, reflect.Float64:
		// precision -1 yields the shortest representation that round-trips,
		// e.g. 0.11 rather than 0.1100000000
		e.w.WriteString(strconv.FormatFloat(pve.Float(), 'f', -1, pve.Type().Bits()))

	case reflect.String:
		e.w.WriteString(pve.String())

	case reflect.Struct:
		for i := 0; i < pve.NumField(); i++ {
			fv := pve.Field(i)

			if err := e.Encode(fv.Addr().Interface()); err != nil {
				return err
			}

			if i < pve.NumField()-1 {
				e.w.WriteRune(' ')
			}
		}
		e.w.WriteRune('\n')

	default:
		return ErrUnsupportedFieldType
	}

	return e.w.Flush()
}

Practicality

In practice, encoding the native representation back to its text form is rarely useful. Most /proc files are read-only, so most of the value lies in the ability to decode them quickly. Encoding pays off for writable files such as cgroup controllers, since these are usually both readable and writable. For these, the presented encoder/decoder pair allows for clean code composition, minimising the number of manual IO operations on the filesystem.

Code repository

A more complete module implementation can be found on my gitlab. Although, at the time of writing, the module is in its infancy, I plan to use it myself in personal projects, so further development can be expected.