Hey,

While working on getting concourse to have its containers created by containerd, one of the steps I took was disabling the cri plugin in the containerd configuration:

    disabled_plugins = ["cri"]      # <<< (!)

    [grpc]
      address = "/run/containerd/containerd.sock"

    [debug]
      address = "/run/containerd/debug.sock"
      level = "debug"

    [plugins]
      [plugins."io.containerd.runtime.v1.linux"]
        runtime = "runc"
        shim = "containerd-shim"

When doing so, we were essentially configuring containerd to not serve the interface that lets kubelet communicate with a runtime that should provide the primitives to materialize pods into something “tangible”.

In other words, I was taking the “cri plugin” piece of the diagram below away:

    kubelet
      |
      |    .---------------------------.
      |    |.------------.             |
      *----++->cri plugin| containerd -+---> containers
           |'------------'             |
           '---------------------------'

But what if, instead, we put concourse aside and tried to get containerd to materialize the pods that a kubelet sees?

ps.: this whole article assumes you’re running Linux - in my case, kernel 5.3 from Ubuntu Eoan (19.10).

building kubelet from source

I thought this would be one of the hardest parts (a bunch of dependencies to figure out), but it turned out to be the easiest.

Having Go 1.13.5 already set up, I pretty much followed the k8s development guide that lives under kubernetes/community:

    git clone https://github.com/kubernetes/kubernetes
    pushd $_
            make WHAT=cmd/kubelet
    popd

As a result of that, _output/local/bin/linux/amd64/kubelet got populated with the kubelet binary.

To make it possible to call kubelet from anywhere, I linked /usr/local/bin/kubelet to that destination:

    ln -s \
            $(realpath ./_output/local/bin/linux/amd64/kubelet) \
            /usr/local/bin/kubelet
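
Just to sanity check that the link works, asking the binary for its version should be enough (the exact version string will depend on the commit you built from):

    # should print something like `Kubernetes v1.x.y-...`,
    # reflecting the commit the binary was built from
    kubelet --version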

setting the kernel up

We’re dealing with container tech, so we need to customize some kernel parameters and ensure that some kernel modules are loaded.

Under the hood, containerd will use overlayfs (overlay filesystem) to manage container images (at least), and br_netfilter (bridge netfilter) to work on packets that go through a bridge device that it sets up.

To ensure that those are loaded, we can use modprobe:

    modprobe overlay 
    modprobe br_netfilter
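
To verify that both modules are indeed loaded, lsmod does the job:

    # list the currently loaded kernel modules, filtering for
    # the two that we care about
    lsmod | grep -E 'overlay|br_netfilter'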

Naturally, this keeps them loaded only until a reboot. The systemd-modules-load service (from systemd itself) can take care of loading them for us during initialization, so that we don’t have to run modprobe every time the system boots.

    cat > /etc/modules-load.d/containerd.conf <<EOF
    overlay
    br_netfilter
    EOF

As the networking functionality requires packets traversing the bridge to be sent to iptables for processing, we need to enable the kernel parameter that allows that to occur: net.bridge.bridge-nf-call-iptables (and its IPv6 equivalent).

With IP packets needing to be routed between multiple network interfaces (e.g., a container - with an internal virtual ethernet device - trying to “ping” an external service needs its packets forwarded through a default gateway on another ethernet device, making the machine act as a router), we need to explicitly allow that in the kernel (through net.ipv4.ip_forward).

    cat > /etc/sysctl.d/99-kubernetes-cri.conf <<EOF
    net.bridge.bridge-nf-call-ip6tables = 1
    net.bridge.bridge-nf-call-iptables  = 1
    net.ipv4.ip_forward                 = 1
    EOF

    sysctl --system
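
We can confirm that the parameters took effect by querying them back:

    # each of these should now report `= 1`
    sysctl net.bridge.bridge-nf-call-iptables
    sysctl net.bridge.bridge-nf-call-ip6tables
    sysctl net.ipv4.ip_forward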

With that done, we’re good to go from the kernel perspective.

installing containerd

Installing containerd requires setting three things up:

  1. its own binaries
  2. runc
  3. cni

I summarized the whole process in a Makefile under a repo (cirocosta/containerd-install), but here’s how it goes.

In order to facilitate cleaning things up, I put all binaries and configurations under the same directory tree (/usr/local/kubelet-sample/{bin,conf}).

runc

runc is a single binary that can be retrieved right from the GitHub releases page: https://github.com/opencontainers/runc/releases

It’s essentially a dependency of containerd-shim-runc-v2, which implements containerd’s runtime v2 interface, giving containerd the ability to create containers using runc.

    containerd
       '--- tasks
              '---- runtime v2
                      '---- containerd-shim-runc-v2
                               '--- runc

Given that runc is a binary that gets called with a very specific set of arguments by whoever consumes it, the version here is very important - get one that’s not compatible with containerd and things might just suddenly fail.

For this reason, containerd makes that version explicit in its vendor file:

    github.com/opencontainers/runc d736ef14f0288d6993a1845745d6756cfc9ddd5a # v1.0.0-rc9 
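
As a sketch of what fetching it looks like (assuming the v1.0.0-rc9 release pinned above and an amd64 machine - adjust the version and architecture to your case):

    # download the statically linked runc binary from the GitHub
    # releases page into our installation directory
    curl -SL -o /usr/local/kubelet-sample/bin/runc \
            https://github.com/opencontainers/runc/releases/download/v1.0.0-rc9/runc.amd64
    chmod +x /usr/local/kubelet-sample/bin/runc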

containerd

As containerd is composed of more than just the containerd daemon (which provides the capabilities of creating containers, managing container images, and more), what we get from its release is a set of binaries that provide the bulk of the functionality:

    ctr                             - cli to interact w/ containerd
    containerd-stress               - stress testing
    containerd                      - the daemon
    containerd-shim                 - acts as a parent to the containers
    containerd-shim-runc-v2         - implements the runtime iface for runc
    containerd-shim-runc-v1         - same as v2, but older (I guess?)
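
Fetching those is a matter of grabbing the release tarball and extracting it - here’s a sketch, assuming the 1.3.2 release and amd64 (the tarball ships the binaries under a bin/ directory, so extracting it under our installation root puts them right where we want them):

    # download the official release tarball and extract its `bin/`
    # directory under our installation root
    curl -SL -o /tmp/containerd.tar.gz \
            https://github.com/containerd/containerd/releases/download/v1.3.2/containerd-1.3.2.linux-amd64.tar.gz
    tar -C /usr/local/kubelet-sample -xzf /tmp/containerd.tar.gz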

As containerd is supposed to be running in the background as a daemon, I preferred to go with a systemd service, but anything would do the job here.

    [Unit]
    After=network.target
    Description=an open and reliable container runtime
    Documentation=https://containerd.io

    [Service]
    Delegate=yes
    Environment=PATH=/usr/local/kubelet-sample/bin:/usr/sbin
    ExecStart=/usr/local/kubelet-sample/bin/containerd --config=/usr/local/kubelet-sample/conf/containerd.toml
    KillMode=process
    LimitCORE=infinity
    LimitNOFILE=1048576
    LimitNPROC=infinity
    Restart=always
    TasksMax=infinity

    [Install]
    WantedBy=multi-user.target
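
Assuming the unit above is saved as containerd.service, wiring it into systemd is the usual drill:

    # register the unit, then start it now and on every boot
    cp containerd.service /etc/systemd/system/containerd.service
    systemctl daemon-reload
    systemctl enable --now containerd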

With regard to the containerd configuration, not much is really needed aside from letting it know where the cni binaries are and where the network configurations can be found:

    diff --git a/default.toml b/containerd.toml
    index 2e72de9..0ddc311 100644
    --- a/default.toml
    +++ b/containerd.toml
    @@ -84,8 +84,8 @@ oom_score = 0
               runtime_root = ""
               privileged_without_host_devices = false
         [plugins."io.containerd.grpc.v1.cri".cni]
    -      bin_dir = "/opt/cni/bin"
    -      conf_dir = "/etc/cni/net.d"
    +      bin_dir = "/usr/local/kubelet-sample/bin"
    +      conf_dir = "/usr/local/kubelet-sample/conf/cni"
           max_conf_num = 1
           conf_template = ""
         [plugins."io.containerd.grpc.v1.cri".registry]
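
With the daemon running with that configuration, we can double check that the cri plugin (the very one we disabled in the concourse case) is now being served, using ctr against the socket:

    # list the plugins that the daemon loaded - `io.containerd.grpc.v1.cri`
    # should show up with an `ok` status
    ctr --address /run/containerd/containerd.sock plugins ls | grep cri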

cni

As cni is just an interface that gets implemented by plugins, what we actually need to download in this case is the set of plugins that we plan to use when setting our pods up.

The reference ones maintained by the CNI team can be found under containernetworking/plugins, which releases all the binaries in the form of a compressed tarball.
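
As a sketch of that setup (the v0.8.5 tarball and the 10.88.0.0/16 subnet below are arbitrary choices of mine), we can drop the plugins into the bin_dir configured above and write a basic bridge + host-local configuration into the conf_dir:

    # fetch the reference plugins (bridge, host-local, loopback, etc.)
    # into the `bin_dir` that containerd was configured with
    curl -SL -o /tmp/cni-plugins.tgz \
            https://github.com/containernetworking/plugins/releases/download/v0.8.5/cni-plugins-linux-amd64-v0.8.5.tgz
    tar -C /usr/local/kubelet-sample/bin -xzf /tmp/cni-plugins.tgz

    # a minimal network configuration: a bridge with host-local IPAM
    cat > /usr/local/kubelet-sample/conf/cni/10-bridge.conf <<EOF
    {
      "cniVersion": "0.3.1",
      "name": "bridge",
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "subnet": "10.88.0.0/16",
        "routes": [{ "dst": "0.0.0.0/0" }]
      }
    }
    EOF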

getting kubelet targeting containerd

Having all of those installed in known locations, it was now a matter of letting kubelet know where containerd lives so that it can target it whenever it realizes it needs to instantiate the pods whose definitions were assigned to it.

To do so, we tweak just two parameters:

    # specify that we'd like to connect to something that implements the CRI
    #
    --container-runtime remote 

    # specify where the CRI implementation lives
    #
    --container-runtime-endpoint unix:///run/containerd/containerd.sock

running a pod

Lastly, to let kubelet know which pods we want to run, we tell it where to find their definitions:

    # where kubelet should look for pod definitions
    #
    --pod-manifest-path $(realpath ./pods)

That’s because it’s capable of discovering pod definitions from the filesystem, making the whole thing possible without an apiserver at all.
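
Putting those flags together, a minimal sketch of the invocation (run as root) looks something like this - the --fail-swap-on=false bit is only needed if your machine has swap enabled, as kubelet refuses to start with swap on by default:

    # run kubelet against containerd, picking pod definitions
    # up from the local `./pods` directory
    kubelet \
            --container-runtime remote \
            --container-runtime-endpoint unix:///run/containerd/containerd.sock \
            --pod-manifest-path $(realpath ./pods) \
            --fail-swap-on=false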

Put a pod definition there, and kubelet will run it for you.
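
For instance, a plain nginx pod (the file name and image here being arbitrary) does the trick:

    # drop a static pod manifest where `--pod-manifest-path` points to
    cat > ./pods/nginx.yaml <<EOF
    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
    EOF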


If you want to see all of this done from a single Makefile, check out https://github.com/cirocosta/containerd-install/tree/kubelet-sample