Hey,
This is a “quick” intro to get everyone on the concourse team up to speed on
the beginning of our work on containerd (you can check out the progress in
#4783), along with some of what I’ve learned so far.
Keep in mind that I’m no containerd expert, so I might be stating things that
are not 100% true or accurate.
overview
As you probably know, at some point, concourse gets to run definitions of “work
to be done” in the form of processes on a machine.
WEB
"I gotta run this thing somewhere"
- grabs a worker
--> hey, create this container following this spec
--> btw, run this process in it
Given that the garden interface is high-level enough, as long as the backend
that implements it is able to do what it’s supposed to, web doesn’t need to
worry about who’s implementing it.
WEB -----container action-----> Garden implementor (backend)
|
(how it's called) <--'
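To make “who’s implementing it” a bit more concrete, here’s a trimmed-down
sketch of the garden client interface - just a handful of methods from
code.cloudfoundry.org/garden, for illustration (the real interface has more
methods and richer specs):

// a trimmed-down illustration of the garden interface - the real one
// (code.cloudfoundry.org/garden) has more methods and richer specs.
type Client interface {
	// Create asks the backend to build a container from a spec.
	Create(spec ContainerSpec) (Container, error)
	// Lookup finds a previously created container by its handle.
	Lookup(handle string) (Container, error)
	// Destroy gets rid of the container and its resources.
	Destroy(handle string) error
}

type Container interface {
	// Run spawns a process inside the container.
	Run(spec ProcessSpec, io ProcessIO) (Process, error)
	// Stop signals the processes in the container to terminate.
	Stop(kill bool) error
}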
For instance, if you look at the workers that are currently registered against
https://ci.concourse-ci.org, you’ll see that we have two different garden
implementors: houdini
, for windows
and darwin
, and guardian
for the
linux ones.
$ fly -t ci workers
name                                              containers  platform  tags        team
9561ac54-vm-f742202a-e8d9-45b6-5bbc-7fada8c2c958  4           windows   none        none
c184f9d5-a073-4de7-8ded-e2b78055cc82              4           linux     bosh        main
ci-monitoring-worker-0                            16          linux     none        monitoring-hush-house
ci-pr-worker-0                                    3           linux     pr          none
ci-topgun-worker-0                                13          linux     k8s-topgun  none
ci-worker-0                                       53          linux     none        none
darwin-worker                                     4           darwin    none        main
As an example of what this looks like in practice, container creation looks
like this (from worker.createGardenContainer):
func (w workerHelper) createGardenContainer(
	containerSpec ContainerSpec,
	fetchedImage FetchedImage,
	handleToCreate string,
	bindMounts []garden.BindMount,
) (gclient.Container, error) {
	// do some setup ...

	env := append(fetchedImage.Metadata.Env, containerSpec.Env...)

	return w.gardenClient.Create(
		garden.ContainerSpec{
			Handle:     handleToCreate,
			RootFSPath: fetchedImage.URL,
			Privileged: fetchedImage.Privileged,
			BindMounts: bindMounts,
			Limits:     containerSpec.Limits.ToGardenLimits(),
			Env:        env,
			Properties: gardenProperties,
		})
}
And running a process in a container, like this (from
worker.gardenWorkerContainer.RunScript
, with some code deleted / modified
for readability):
func (container *gardenWorkerContainer) RunScript(ctx context.Context,
	path string, args []string,
	input []byte, output interface{},
	logDest io.Writer, recoverable bool,
) error {
	stdout := new(bytes.Buffer)
	stderr := new(bytes.Buffer)

	processIO := garden.ProcessIO{
		Stdin:  bytes.NewBuffer(input),
		Stdout: stdout,
		Stderr: stderr,
	}

	process, err := container.Run(ctx, garden.ProcessSpec{
		Path: path,
		Args: args,
	}, processIO)
	if err != nil {
		return err
	}

	var (
		processStatus int
		processErr    error
	)

	processExited := make(chan struct{})
	go func() {
		processStatus, processErr = process.Wait()
		close(processExited)
	}()

	select {
	case <-processExited: // execution finished
		if processErr != nil {
			return processErr
		}

		if processStatus != 0 {
			return runtime.ErrResourceScriptFailed{
				Path:       path,
				Args:       args,
				ExitStatus: processStatus,
				Stderr:     stderr.String(),
			}
		}

		return nil
	case <-ctx.Done(): // cancelled
		container.Stop(false)
		<-processExited
		return ctx.Err()
	}
}
That’s all to say that as long as we implement the Garden interface, we can swap
the container runtimes as we wish, and that’s exactly the first step that we’re
taking with containerd
: writing a replacement for guardian
in the Linux
stack.
web ...........................................
.
.    concourse web ------.
.                        |
.........................|.....................
                         |
worker ..................+.....................
.                        |
.    concourse worker    |
.      garden backend <--'   :7777
.            |
.            |
.      containerd
.            /run/containerd/containerd.sock
.
...............................................
Again, for web
, it’s still communicating with a garden
server.
containerd
containerd
is a container runtime that’s able to manage the complete
container lifecycle - from fetching images from a registry, to setting up
storage, to running and destroying containers. In terms of usage, it’s
currently the engine under the hood of moby, buildkit, kubernetes (when using
containerd-cri), pouch, and, recently, openfaas.
The way that we’re aiming at running it is as a separate process that gets
spawned by an ifrit runner, which brings it up from the binaries located under
the usual Concourse assets path (/usr/local/concourse/bin).
worker
containerd runner
--> /usr/local/concourse/bin/containerd (separate process)
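Just to give an idea of what that looks like, here’s a minimal sketch of a
runner satisfying ifrit’s Runner interface that execs the containerd binary -
the type name, fields, and flags here are illustrative, not the actual
implementation:

import (
	"fmt"
	"os"
	"os/exec"
)

// ContainerdRunner is a sketch of an ifrit.Runner that spawns containerd
// as a child process of the worker.
type ContainerdRunner struct {
	Bin    string // e.g. /usr/local/concourse/bin/containerd
	Config string // path to a containerd config file
}

func (r ContainerdRunner) Run(signals <-chan os.Signal, ready chan<- struct{}) error {
	cmd := exec.Command(r.Bin, "--config", r.Config)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	if err := cmd.Start(); err != nil {
		return fmt.Errorf("start containerd: %w", err)
	}

	// in practice we'd only signal readiness once the socket is dialable
	close(ready)

	waitErr := make(chan error, 1)
	go func() { waitErr <- cmd.Wait() }()

	select {
	case sig := <-signals: // worker shutting down: pass the signal along
		cmd.Process.Signal(sig)
		return <-waitErr
	case err := <-waitErr: // containerd exited on its own
		return err
	}
}

From there, ifrit.Invoke gives us back a process that we can monitor and
signal like any other member of the worker’s process tree.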
Once it’s up, the interaction with it takes place through a client that gives us all we need to touch a running containerd instance on our machine (the worker):
client, err := containerd.New("/run/containerd/containerd.sock")
if err != nil {
	err = fmt.Errorf("containerd client conn: %w", err)
	return
}

defer client.Close()
Under the hood, this takes care of instantiating the grpc client and
“dialing” the unix socket, so that further interactions with containerd can
take place through remote procedure calls that we perform against it.
// New returns a new containerd client that is connected to the containerd
// instance provided by address
//
func New(address string, opts ...ClientOpt) (*Client, error) {
	gopts := []grpc.DialOption{
		grpc.WithBlock(),
		grpc.WithInsecure(),
		grpc.FailOnNonTempDialError(true),
		grpc.WithBackoffMaxDelay(3 * time.Second),
		grpc.WithContextDialer(dialer.ContextDialer),
		grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(defaults.DefaultMaxRecvMsgSize)),
		grpc.WithDefaultCallOptions(grpc.MaxCallSendMsgSize(defaults.DefaultMaxSendMsgSize)),
	}

	connector := func() (*grpc.ClientConn, error) {
		ctx, cancel := context.WithTimeout(context.Background(), copts.timeout)
		defer cancel()

		conn, err := grpc.DialContext(ctx, dialer.DialAddress(address), gopts...)
		if err != nil {
			return nil, errors.Wrapf(err, "failed to dial %q", address)
		}

		return conn, nil
	}

	conn, err := connector()
	if err != nil {
		return nil, err
	}

	// ...
}
That’s because, differently from garden, where the interface is a REST-like
HTTP-based API, containerd exposes its functionality through grpc services
whose specs are defined in the form of protocol buffers in its interface
definition language.
worker
backend
containerd client
------grpc rpc ----> containerd
For instance, looking at the “containers” protobuf spec, we can see the definition of how one interacts with the provider of the “containers service”.
syntax = "proto3";

service Containers {
	rpc Get(GetContainerRequest) returns (GetContainerResponse);
	rpc List(ListContainersRequest) returns (ListContainersResponse);
	rpc ListStream(ListContainersRequest) returns (stream ListContainerMessage);
	rpc Create(CreateContainerRequest) returns (CreateContainerResponse);
	rpc Update(UpdateContainerRequest) returns (UpdateContainerResponse);
	rpc Delete(DeleteContainerRequest) returns (google.protobuf.Empty);
}

message Container {
	string id = 1;
	map<string, string> labels = 2;

	// ...

	google.protobuf.Timestamp created_at = 8 [(gogoproto.stdtime) = true, (gogoproto.nullable) = false];
}

message GetContainerRequest {
	string id = 1;
}

message GetContainerResponse {
	Container container = 1 [(gogoproto.nullable) = false];
}
(from containers.proto
)
ps.: for a simple “hello world” example of grpc-go
, check out
cirocosta/hello-grpc
.
Using the gRPC toolchain, the containerd maintainers take that definition and
turn it into both the client- and server-side code that implements it; we can
then consume the client side from the github.com/containerd/containerd
package.
containers, err := client.ContainerService().List(context.Background())
if err != nil {
	err = fmt.Errorf("list containers: %w", err)
	return
}

fmt.Printf("%-16s %s\n", "ID", "CREATED-AT")
for _, container := range containers {
	fmt.Printf("%-16s %s\n",
		container.ID,
		container.CreatedAt.String(),
	)
}
That said, as long as we’re matching compatible versions of the server
(containerd
) and the client (the github.com/containerd/containerd
Go
package), that interface is “guaranteed” to be honored.
module github.com/concourse/concourse

require (
	github.com/containerd/containerd v1.3.2
	// ...
)

go 1.13
ARG CONTAINERD_VERSION=1.3.2
RUN curl -sSL $URL/releases/download/v$CONTAINERD_VERSION/containerd-$CONTAINERD_VERSION.linux-amd64.tar.gz \
      | tar -zvxf - -C /usr/local/concourse/bin --strip-components=1
namespaces
If we tried to run that code above though, it wouldn’t work - we’re missing something: namespaces.
Whenever interacting with the containerd API, we must specify which “tenant”
we’re interacting with, allowing containerd to be targeted by multiple
consumers without conflicts between the objects maintained for each of them.
client
  |
  |   `` what are the containers
  |      in NS1? ,,
  |
  '------> containerd

             .----------------.--------------------.
             |      NS1       |        NS2         |
             |   containers   |  other containers  |
             |   some imgs    |  some other imgs   |
             |      ...       |        ..          |
             '----------------'--------------------'
To fix that example above then, we could either get the namespace information
into the context
used for that call, or configure the client with a default
namespace for all calls to containerd
made by that client.
ctx := namespaces.WithNamespace(context.Background(), "ns1")

// or

client, err := containerd.New(
	"/run/containerd/containerd.sock",
	containerd.WithDefaultNamespace("ns1"),
)
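With either of those in place, the listing example from before becomes scoped
to a namespace - e.g., sticking with the illustrative “ns1” namespace and the
context-based approach:

// the call now targets only the objects kept under the "ns1" tenant
ctx := namespaces.WithNamespace(context.Background(), "ns1")

containers, err := client.ContainerService().List(ctx)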
services
Internally, containerd
is composed of multiple components, whose interfaces
are exposed as gRPC services, which are presented in a nice high-level form
through the containerd
API that the client interacts with (through gRPC).
concourse containerd backend   |
----------------------------   |  client
      containerd client        |
              |
              |  grpc
              |
        containerd API         |
----------------------------   |  server
   low-level grpc services     |
Despite these being very loosely coupled, they’re meant to work together,
driven by what the higher-level containerd API needs from them.
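A good number of those components show up directly on the Go client as
accessors to the corresponding services - a few of them below, for
illustration (the snapshotter name here is just containerd’s default):

// each accessor maps to one of those lower-level grpc services
var (
	content     = client.ContentStore()               // content-addressable blob storage
	images      = client.ImageService()               // image metadata (name -> manifest)
	snapshotter = client.SnapshotService("overlayfs") // layered filesystems
	tasks       = client.TaskService()                // running containers / processes
)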
For instance, in the process of going from having an image in a registry to actually running a container, quite a few of those need to interact:
pull
--> fetch
--> content
--> images
--> unpack
<-- content
<-- images
--> snapshots
- first, once layers are pulled, their content is put in the content store, which provides access to content-addressable storage
- those layers are then referenced through the metadata store, which is all about keeping track of references & relationships, as well as namespacing things - “images” are then just pointers to those content-addressable blobs
- consuming that content store, layers from the image are then unpacked into the snapshotter component, which is then capable of mounting those layers in the right way
- using the image manifest and configuration, the execution configuration can be prepared, so that the executor - whose job is to implement the container runtime - can effectively run the container
ps.: given this decoupled nature, some of these components are actually swappable via custom plugins - e.g., you can bring your own Snapshotter (storage), or Task (runtime).
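Just to make that flow concrete - and this is only a hedged sketch, since (as
explained right below) we’re not planning on using containerd’s fetcher -
driving the whole thing through the Go client boils down to a single Pull
call, with containerd coordinating those services under the hood:

// sketch: pulling an image and unpacking it into the snapshotter - the
// image reference here is just illustrative.
image, err := client.Pull(ctx,
	"docker.io/library/busybox:latest",
	containerd.WithPullUnpack, // fetch -> content -> images -> unpack -> snapshots
)
if err != nil {
	return fmt.Errorf("pull: %w", err)
}

fmt.Println("pulled", image.Name())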
In this first iteration of our Garden backend though, to have the least amount
of divergence from what we currently run on top of (guardian), what I proposed
was that we don’t use containerd’s fetcher, and instead continue leveraging
baggageclaim for now, which already gives us a root filesystem that we can use
in our containers.
task to run
--> baggageclaim prepares a volume for it
--> garden references that volume
--> containerd uses that rootfs volume
By doing this, we can learn from all of the pieces that we’ll definitely get wrong, and have a more “oranges to oranges” comparison when it comes to its day-to-day operations.
container
For running a container, a few services are usually involved, just like for images.
Assuming that we’re using containerd’s image fetcher, the flow would look like the following:
run
initialize
<-- images
--> snapshot
setup
<-- snapshot
--> containers (metadata)
start
<-- containers
--> tasks *actual container
First, it starts by reading the image’s configuration and creating an OCI spec that describes the container that we want to run, then creates a copy-on-write layer to serve as the rootfs.
Once that’s done, it moves on to setting up the Linux namespaces, mounts, etc., and then actually starting the process itself.
In our case though, if we assume that baggageclaim is there to give us the volumes, and that we don’t run containers “to completion”, but rather, we use them as “places to execute stuff”, it looks more like this:
1. garden.Create(containerSpec)
setup
--> containers (creates metadata that specifies
that we want a container)
start
<-- containers
--> task * the container
2. garden.Run(processSpec, processIO)
exec
<-- task
--> process * our process in the task sandbox
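As a rough sketch of how those two garden calls could map onto the containerd
Go client - the handle, rootfs path, process args, and exec id below are all
illustrative, not the actual backend code:

// uses github.com/containerd/containerd, its oci and cio packages, and
// github.com/opencontainers/runtime-spec/specs-go.

// 1. garden.Create: record the container metadata and create its task,
//    pointing the OCI spec at the rootfs volume that baggageclaim prepared.
container, err := client.NewContainer(ctx, handle,
	containerd.WithNewSpec(
		oci.WithRootFSPath(rootfsPath),           // rootfs volume from baggageclaim
		oci.WithProcessArgs("sleep", "infinity"), // illustrative long-lived init
	),
)
if err != nil {
	return fmt.Errorf("new container: %w", err)
}

task, err := container.NewTask(ctx, cio.NullIO)
if err != nil {
	return fmt.Errorf("new task: %w", err)
}

if err := task.Start(ctx); err != nil {
	return fmt.Errorf("start task: %w", err)
}

// 2. garden.Run: exec a process inside the task's sandbox.
proc, err := task.Exec(ctx, "some-exec-id", &specs.Process{
	Args: []string{"/opt/resource/check"},
	Cwd:  "/",
}, cio.NewCreator(cio.WithStdio))
if err != nil {
	return fmt.Errorf("exec: %w", err)
}

if err := proc.Start(ctx); err != nil {
	return fmt.Errorf("start process: %w", err)
}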
Despite containerd
being the thing that provides the API for dealing with all
of the lifecycle of a container, the containers themselves are not tied to the
lifecycle of containerd
- there’s a strict separation between those.
When the time comes to run a container, it forks off a shim that’s responsible
for calling out to runc, with both of them getting re-parented to the
system’s init (they decouple themselves from containerd).
systemd───containerd-shim───executable───5*[{executable}]
As the shim is tailored to a specific version of runc, that’s another
dependency whose version we must also get right.