Hey,

While figuring out how to run a standalone kubelet with containerd, I had quite a bit of trouble getting the networking part set up, all because I had misconfigured the PATH environment variable in the systemd service that I was using to run containerd.

From my point of view, everything made sense: I did have iptables in my PATH (I had confirmed that its location matched one of the directories in my PATH). So why was it failing like this?

    failed to locate iptables: exec: "iptables": executable file not found in $PATH

Having spent quite a bit of time just to track down my own mistake (a wrong PATH in the systemd service), here’s how I got to it.

who’s trying to find iptables?

That was the first question I had. Perhaps I could trace who was failing to execve, and that’d be all.

But, as anyone more familiar with execve[1] knows, that’s just not how it works: execve expects a path that is either absolute or relative to the current working directory (execveat additionally accepts AT_FDCWD or a directory file descriptor), and it never consults PATH.

Regardless, someone is going through the directories listed in their PATH and checking whether iptables exists in each of them. Checking for existence means issuing some kind of stat syscall.
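To make that search concrete, here is a minimal sketch of what a userspace PATH lookup does, assuming the usual semantics (try each entry in order, stop at the first executable regular file). `lookup_in_path` is a hypothetical helper written for illustration; it is not something bridge or containerd ships.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): search a PATH-style list for an
# executable, the way a userspace lookup would before calling execve.
# Usage: lookup_in_path NAME "DIR1:DIR2:..."
lookup_in_path() {
    name=$1
    rest="$2:"                      # trailing ':' so the last entry is handled too
    while [ -n "$rest" ]; do
        dir=${rest%%:*}             # first entry of the remaining list
        rest=${rest#*:}             # drop that entry
        [ -n "$dir" ] || dir=.      # an empty entry means the current directory
        candidate="$dir/$name"
        echo "checking $candidate" >&2
        # Each of these tests is a stat-like syscall on the candidate path.
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "$name: executable file not found in \$PATH" >&2
    return 1
}
```

Calling `lookup_in_path iptables /opt/containerd/bin` on a machine where iptables lives elsewhere would check a single candidate and fail, no matter what PATH the interactive shell has.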

All we need now is to figure out which syscalls to trace:

    ls /sys/kernel/debug/tracing/events/syscalls/ | grep enter | grep stat
    sys_enter_fstatfs       // filesystem statistics
    sys_enter_newfstat
    sys_enter_newfstatat
    sys_enter_newlstat
    sys_enter_newstat
    sys_enter_statfs        // filesystem statistics
    sys_enter_statx
    sys_enter_ustat         // filesystem statistics

The statfs family (fstatfs, statfs, ustat) reports filesystem statistics, so it can be ignored. From the rest, newfstat is the only one that’d require more than a one-liner to trace, as it takes an open file descriptor rather than a path (in which case an open would have to occur first, which is unlikely for a simple existence check).

Thus, let’s trace the remaining ones and figure out who’s issuing them:

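    #!/snap/bin/bpftrace

    // For every path-based stat call, print the calling command and the
    // path it asked about. Lookups made by iptables itself are skipped.
    tracepoint:syscalls:sys_enter_newfstatat,
    tracepoint:syscalls:sys_enter_newlstat,
    tracepoint:syscalls:sys_enter_newstat,
    tracepoint:syscalls:sys_enter_statx
    / comm != "iptables" /
    {
            printf("%-16s %s\n", comm, str(args->filename));
    }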

And then, there I found it:

    bridge           /opt/containerd/bin/iptables

The bridge command (a CNI plugin, part of containernetworking/plugins) was only searching for iptables under /opt/containerd/bin, which matched exactly the PATH I had set up for containerd (the process that spawned bridge, and from which bridge inherited its environment).
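In hindsight, a quicker check would have been to look at the environment the running process actually had, instead of the one in my shell. A minimal sketch, assuming a Linux /proc filesystem; `path_from_environ` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper (illustration only): print the PATH value from a file in
# /proc/<pid>/environ format, i.e. NUL-separated KEY=value entries.
# Usage: path_from_environ FILE
path_from_environ() {
    # Turn the NUL separators into newlines, then keep only the PATH entry.
    tr '\000' '\n' < "$1" | sed -n 's/^PATH=//p'
}
```

With a single containerd process running, something like `path_from_environ "/proc/$(pidof containerd)/environ"` (typically as root) would print the PATH the process was started with, which is also what the plugins it spawns inherit unless it changes their environment.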

what else could’ve been done?

To further improve that script, we could have traced the exit side of those syscalls too, and then filtered for the ones that were failing (i.e., the searches that resulted in a “not found”):

    cat /sys/kernel/debug/tracing/events/syscalls/sys_exit_newfstatat/format
    name: sys_exit_newfstatat
    ID: 691
    format:
            field:unsigned short common_type;       offset:0;       size:2; signed:0;
            field:unsigned char common_flags;       offset:2;       size:1; signed:0;
            field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
            field:int common_pid;   offset:4;       size:4; signed:1;

            field:int __syscall_nr; offset:8;       size:4; signed:1;
            field:long ret; offset:16;      size:8; signed:1;

    print fmt: "0x%lx", REC->ret

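This way, the filter would look like / args->ret != 0 /. Note that the exit tracepoint only carries ret, so the filename would have to be saved on the enter side (for example, in a map keyed by thread ID) and printed on exit.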


  1. Not to be confused with glibc’s execvp, which does take PATH into consideration, but that’s just userspace behavior from a library anyway.