Hey,
I recently bought a Raspberry Pi 3B+, and it seemed like a great target to run a Concourse worker on.
It turns out that just the process of building and adapting the Concourse binary itself was already an interesting exercise, so here I share the lessons learned and what that process looked like.
To provide ARM-compiled binaries for node_extra_exporter (a Rust-based Prometheus exporter for exposing some metrics that the traditional node_exporter doesn’t expose), it felt like having an ARM-based machine in my Concourse cluster would be a great idea - add an extra job for building the ARM binary and there you go!
If you’d like to know what it takes to get a Concourse worker ready to take workloads on ARM, make sure you stick around to the end.
- Concourse workers
- Compiling the Concourse binary
- Cross compiling Go code
- Compiling CGO code
- Cross compiling CGO code
- Building Guardian, the containerizer
- Building Baggageclaim, the volumizer
- Running the Concourse worker
- Generating the modified registry-image-resource
- Automating the process of building an ARM-based Concourse distribution
- Summarizing
- Some surprises
- Closing Thoughts
Concourse workers
Regardless of your knowledge of Concourse, the tl;dr is that, being a “continuous thing-doer”, it needs some form of compute nodes to do the things you want - these nodes are the workers.
To perform the job of “doing things”, on any platform, these workers are made of three components:
- a “containerizer”, that manages containers
- a “volumizer”, that manages volumes
- “beacon”, the piece that registers the worker against the cluster and manages its lifecycle
In the case of Linux, two1 of those three components (the “volumizer” and the “beacon”) are already part of the concourse binary. This means that we not only need to build concourse, but we also need to build the “containerizer”.
1 there’s an exception, but that won’t be covered here.
Compiling the Concourse binary
As Concourse is a project written entirely in Go (well, there’s also the UI, which is in Elm), all that we need at this stage is the Go toolchain.
As almost all of the Go code necessary for building Concourse now lives under a single repository (github.com/concourse/concourse), and Go modules are used to handle dependencies there, this should be the most straightforward part.
Given that installing Go is quite straightforward, I decided to do that right on my Raspberry Pi (why not?).
This means that just the following two steps are required:
# clone the main concourse repository
#
git clone https://github.com/concourse/concourse .
# build the main Concourse binary and put the result of
# the compilation into `$GOPATH/bin/`.
#
go install -v ./cmd/concourse
The problem, though, is that the Raspberry Pi is not as powerful as one would imagine: despite the 4-core SoC, the build takes no less than seven minutes once all of the dependencies are already in place. Yeah, SEVEN MINUTES with all dependencies already fetched. Without the dependencies: 20 minutes.
(For a full picture of the dashboard, click here to check out the interactive snapshot.)
As a side note, it was quite interesting for me to see how there are clearly very different phases that the compiler toolchain goes through when performing the build (not including the dependency fetching, which is not in this panel).
Cross compiling Go code
Knowing that if I’d need to recompile this multiple times, it’d be crazy slow and, thus, super time consuming, I decided to go with cross compiling - this way we could just use all of the speed we have available and, in the end, just ship the binaries.
Another benefit is that it’d be easy for anyone to build it too! No need for an actual Raspberry PI (or any ARMv7) to build Concourse.
The good news is that Go makes that whole job easy for us, making cross compilation dependent on just a few flags:
- GOOS: the name of the target operating system
- GOARCH: the name of the target architecture
- GOARM: the version of ARM that we want to target
For Go code that is free of any calls to C code (via CGO), this “Just Works™” by the magic of the default Go toolchain - the Go compilation infrastructure has all the necessary bits to perform the right translation for each of the architectures and OSes that it supports.
If you’re curious about the intermediate representation that the Go compiler generates and how that becomes machine-dependent code, check out the following example views of Go’s SSA:
ps.: to reproduce: GOSSAFUNC=main go build -gcflags "-S" main.go
However, being free of calls to C code is not a property of all projects - in the case of Concourse itself, dex, one of our dependencies, depends on go-sqlite3, which has bindings to C, which means that we now depend on CGO.
├ github.com/concourse/concourse/skymarshal/storage
├ github.com/concourse/dex/storage
├ github.com/concourse/dex/storage/sql
├ database/sql
├ database/sql/driver
...
└ github.com/mattn/go-sqlite3
├ database/sql
├ database/sql/driver
..
├ unsafe
└ C << CGO!!
If you’re curious about an example in go-sqlite3 where CGO gets used, check out sqlite3.go.
To see what I mean by “CGO doesn’t ‘just work’”, let’s try doing cross compilation with just those flags:
CGO_ENABLED=1 \
GOARCH=arm \
GOARM=7 \
GOOS=linux \
go build ./cmd/concourse/
# runtime/cgo
gcc: error: unrecognized command line option '-marm';
did you mean '-mabm'?
And the reason why it doesn’t work out of the box makes sense: when it comes to CGO, you’re not only in the realm of the Go toolchain - you’re also relying on the infrastructure that builds the C code, and there, things are slightly different. Let’s expand on that.
Compiling CGO code
As an example of what cross compilation with CGO looks like, let’s assume that we have a super extra very efficient library in C that is optimized for printing strings to stdout.
First, starting with the declaration of the function we want to consume from Go (in the printer.h file):
#include <stdio.h>
void super_optimized_print(char* str);
Then, the definition (printer.c):
#include "printer.h"

void super_optimized_print(char* str)
{
	printf("%s\n", str);
	return;
}
And now, finally, our Go code that uses CGO (main.go):
package main

// #include "printer.h"
// #include <stdlib.h>
import "C"

import "unsafe"

func main() {
	str := C.CString("hello world")
	defer C.free(unsafe.Pointer(str))
	C.super_optimized_print(str)
}
We can then build all of that code for our own OS and architecture without changing anything: give the package to the go compiler and let it do its job:
go build -v .
As, in this case, our C code is definitely portable, and we’re targeting our own machine architecture, everything works. Let’s now try to build for a different platform, though.
Cross compiling CGO code
However, if, again, we try the cross compilation, it’ll fail with the very same error we saw before:
CGO_ENABLED=1 GOOS=linux GOARCH=arm GOARM=7 go build -v .
gcc: error: unrecognized command line
option '-marm'; did you mean '-mabm'?
The good news is that, without even looking at the Go code, we can see how that separation of “who compiles what” happens by tracing all of the execves that take place:
PCOMM PID PPID ARGS
--------------------------------------------------------
go 29352 26142 go build .
cgo 29361 29352 cgo -V=full
compile 29362 29352 compile -V=full
compile 29365 29352 compile -V=full
compile 29367 29352 compile -V=full
compile 29376 29352 compile -V=full
asm 29381 29352 asm -V=full
asm 29382 29352 asm -V=full
cgo 29394 29352 cgo -objdir /tmp/go-build123931463/b003/...
gcc 29399 29394 gcc -E -dM -marm -I /tmp/go-build1239314...
For the output above, execsnoop from iovisor/bcc was used.
And that seems perfectly reasonable - to build C code, one needs a C compiler, something totally out of the realm of what the Go team should be focusing on, which is why the toolchain calls out to an external C compiler.
The failure we see there, though, comes from the fact that to have GCC perform the cross compilation, we first need to install separate cross-toolchain packages, and then point the build at the right compiler.
Luckily, letting Go know which compiler to use when performing the build comes down to setting the CC environment variable (see golang/go#gccBaseCmd()).
CGO_ENABLED=1 \
GOOS=linux \
GOARCH=arm \
GOARM=7 \
CC=arm-linux-gnueabihf-gcc \
go build -v
# works!!
PCOMM PID PPID ARGS
go 30825 26142 go build -v .
cgo 30834 30825 cgo -V=full
compile 30835 30825 compile -V=full
compile 30840 30825 compile -V=full
compile 30841 30825 compile -V=full
compile 30842 30825 compile -V=full
asm 30857 30825 asm -V=full
asm 30862 30825 asm -V=full
asm 30863 30825 asm -V=full
asm 30872 30825 asm -V=full
cgo 30877 30825 cgo -objdir /tmp/go-build130487924/b003/ ...
arm-linux-30882 30877 /usr/bin/arm-linux-gnueabihf-gcc -E -dM ...
cc1 30883 30882 /usr/lib/gcc-cross/arm-linux-gnueabihf/7/cc1 -E ...
...
Now, with the Concourse binary built, we can move to its dependencies.
Building Guardian, the containerizer
Concourse steps and resource checks run in containers1, and there’s a piece of the Concourse worker that is responsible for that. As the Concourse team is not necessarily in the business of creating container runtimes, Concourse uses a separate component for doing so: Guardian (gdn), an implementation of the Garden interface for container management.
With the process of creating Linux containers now standardized (see opencontainers/runtime-spec), Guardian takes the approach of leveraging what’s already there: it wraps runc, the de facto implementation of the Runtime Spec, allowing consumers of the Garden interface to have containers created by runc without leaking the implementation details through the Garden interface.
The detail here, though, is that while most of gdn is pure Go, there are many bits of runc (gdn’s dependency) that are C-based, and Guardian itself depends on other binaries (one being a C program).
Another detail in the process of building gdn is that, by default, gdn is not suitable for multiple architectures due to the way that it interacts with runc when asking runc to block specific syscalls.
// Seccomp represents syscall restrictions
//
// By default, only the native architecture of the kernel is allowed to be used
// for syscalls. Additional architectures can be added by specifying them in
// Architectures.
//
type Seccomp struct {
DefaultAction Action `json:"default_action"`
Architectures []string `json:"architectures"`
Syscalls []*Syscall `json:"syscalls"`
}
That means that, in gdn itself, we needed to have ARM included in the Architectures slice:
var seccomp = &specs.LinuxSeccomp{
DefaultAction: specs.ActErrno,
Architectures: []specs.Arch{
specs.ArchX86_64,
specs.ArchX86,
specs.ArchX32,
+ specs.ArchARM,
},
Syscalls: []specs.LinuxSyscall{
As we already know how to build all of those in a cross-platform way, we just need to follow the same recipe: set the right compiler, and there you go.
1: in platforms that support containers.
Building Baggageclaim, the volumizer
As mentioned before, the “volumizer” is already part of the worker; thus, by building concourse (the binary), we already have baggageclaim built.
The only detail for baggageclaim is that if its backing filesystem is set to btrfs, then the machine that runs the worker needs to have the btrfs CLI on it (built for the right platform).
Running the Concourse worker
At this point, our Concourse worker is in a state where it could run - it has all of its dependencies - even though it has no base resource types, meaning that it wouldn’t be able to run any steps or even checks.
The reason for that is that the root filesystem that Concourse fetches to run a container needs to come from somewhere - the resource type that is configured to retrieve those bits. As there are none to do so, nothing can fetch the base image, thus, nothing can run.
To break out of that, we have to create a resource type to ship with our cross-platform build that is able to retrieve root filesystems built for the specific platform that we target.
As we’re targeting something other than Linux amd64, that meant going through the cross compilation dance again, now for the registry-image resource.
The problem, though, is that compiling alone wouldn’t be sufficient - when a registry client asks for a container image that lives in a registry, it has to specify which platform that image was created for.
By default, the underlying library that registry-image-resource uses assumes the linux amd64 tuple, thus, I created a pull request (PR) to address that: https://github.com/concourse/registry-image-resource/pull/36.
Generating the modified registry-image-resource
As a base resource type is defined by having a rootfs and a resource_metadata.json (see https://concourse-ci.org/tasks.html#task-image-resource), the easiest way of getting to a rootfs would be to have a container image generated from a Dockerfile, then extracting the final root filesystem and placing it into a tarball.
Having that rootfs.tgz
that contains the root filesystem for the modified registry-image-resource
, that would mean that I could then distribute this resource in the tarball that contains all of the necessary bits for Concourse, effectively bootstrapping the whole thing!
While that sounds great, we have to remember that registry-image-resource makes requests to external systems.
The problem here is that to have the rootfs properly created, we need to execute some commands within the container that creates the rootfs - for instance, installing ca-certificates so that we can make requests to HTTPS endpoints.
# the final representation of the registry-image
# Concourse resource type.
#
FROM rootfs-${arch} AS registry-image-resource
COPY --from=registry-image-resource-build \
/assets/ \
/opt/resource/
RUN apt update -y && \
apt install -y ca-certificates && \
rm -rf /var/lib/apt/lists/*
As the base image of that container must be an ARM-based image, any binaries that we try executing there will be executing instructions that only ARM machines can run. Damn!
To overcome that, we have at least two options:
- create a bootstrapping container image once in the target architecture, or
- emulate the target architecture.
While 1 sounds like something that could work, 2 is now quite simple to achieve if you’re using a macOS or Windows 10 machine.
Since not too long ago, Docker for Desktop has been shipping their internal VM with the right hooks to be able to emulate other architectures in a very transparent way for the developer (see the recent announcement: Building Multi-Arch Image for Arm and X86 with Docker Desktop).
Automating the process of building an ARM-based Concourse distribution
As manually building all of those binaries is not fun at all, I made the whole process buildable by creating a multi-stage Dockerfile (see cirocosta/concourse-arm#Dockerfile).
Given that building such a Dockerfile requires a task that is able to do so, the Dockerfile builds a version of the builder-task, which wraps genuinetools/img, which is able to build container images from Dockerfiles (using buildkit).
Yeah, that’s a lot of names in a single article!
Summarizing
In the end, we built a bunch of stuff!
For instance, consider the building of the binaries:
And, for the registry-image-resource rootfs:
The good news is that it’s all declared in that very same Dockerfile I mentioned, making the build mostly reproducible.
Some surprises
- While trying to figure out what was going wrong with Guardian, I wanted so much to use dlv to troubleshoot what was going on, but, unfortunately, it doesn’t support any 32-bit systems at the moment.
- I didn’t know that in some Linux distros you have to run modprobe configs to have the /proc/config.gz file accessible for checking the configuration used to build that kernel - interesting to know!
Closing Thoughts
In the end, it turns out that it’s not super complicated to build a Go project (even one with some C code in its dependency tree) for other architectures - set the right variables here and there, make use of some emulation if needed, and there you go.
It’s quite cool what you can do using cross compilation, and how a combination of Go and container images with a well defined set of build steps can make the whole process of building for multi platforms work great even when you don’t have access to those platforms.
I’m very curious about the whole movement of supporting other architectures other than amd64, so it’s nice to start having a foot in this space.
Even though we made use of cross compilation, there were steps where we still needed to run some things on the target architecture itself, which makes me think that it might be worth investing in having Concourse workers running smoothly on these other platforms too.
Please let me know what you think! I’m @cirowrc on Twitter.
See you!