Hey,
While figuring out how to run a standalone kubelet with containerd, I had quite
a bit of trouble getting the networking part set up, because I had misconfigured
the PATH environment variable of the systemd service that I was using to run
containerd.
Everything seemed to make sense to me: I did indeed have iptables in my PATH (I
had confirmed that its location matched what I got for PATH), so why was it
failing like this?
failed to locate iptables: exec: "iptables": executable file not found in $PATH
It took me quite a bit of time to figure out my stupid mistake of setting a
wrong PATH in the systemd service, so here’s how I got to it.
who’s trying to find iptables?
That was the first question I had - perhaps I could trace who was failing to
execve, and then that’d be all:
- I’d be able to tell which process did the execve and, thus, be able to
  inspect its environment (through /proc/pid/environ)

    proc1:
      |
      | execve(iptables)   <--- ERR
      | printf("failed to locate ...")
      |
      '-------> with a misconfigured PATH
                '--> just gotta figure out who that is
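But, as anyone more familiar with execve [1] would know, that’s just not how it
works - execve never searches PATH. It expects either an absolute path or a path
relative to the current working directory (execveat additionally accepts
AT_FDCWD or a directory that you provide via a file descriptor). The PATH search
is done in userspace, by the caller or its runtime library.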
Regardless, someone is going through what is set for their PATH, and then
trying to see if iptables exists there.
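To make that "going through PATH" step concrete, here is a minimal sketch of such a lookup in plain POSIX shell. The helper name `lookup_in_path` is made up for illustration, and this is not the actual code of any plugin or runtime: it just checks each PATH directory, in order, for an executable regular file with the given name.

```shell
#!/bin/sh
# lookup_in_path NAME PATHVALUE   (hypothetical helper, for illustration only)
# Prints the first DIR/NAME that is an executable regular file.
# Returns non-zero when no directory in PATHVALUE contains one.
lookup_in_path() {
    name=$1
    found=
    old_ifs=$IFS
    IFS=:                       # split PATHVALUE on ':' for the loop below
    for dir in $2; do
        # An empty PATH entry conventionally means the current directory.
        candidate="${dir:-.}/$name"
        if [ -z "$found" ] && [ -f "$candidate" ] && [ -x "$candidate" ]; then
            found=$candidate
        fi
    done
    IFS=$old_ifs                # restore the caller's field separator
    [ -n "$found" ] && printf '%s\n' "$found"
}
```

Calling `lookup_in_path iptables "$PATH"` from an interactive shell and then again with the PATH value given to the service shows how the same binary can be found in one case and "not found in $PATH" in the other.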
All we needed now was to figure out which syscall to trace:
ls /sys/kernel/debug/tracing/events/syscalls/ | grep enter | grep stat
sys_enter_fstatfs // filesystem statistics
sys_enter_newfstat
sys_enter_newfstatat
sys_enter_newlstat
sys_enter_newstat
sys_enter_statfs // filesystem statistics
sys_enter_statx
sys_enter_ustat // filesystem statistics
Leaving aside the filesystem-statistics ones, newfstat is the only one that’d
require more than a one-liner to trace, as it takes an open file descriptor
rather than a path (in which case an open would have already occurred first,
which is not very likely to be something done under the hood for a PATH lookup).
Thus, let’s trace those and figure out who’s issuing them:
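#!/snap/bin/bpftrace
// Print every path that any process (other than iptables itself) tries to
// stat, together with the name of the process issuing the syscall.
tracepoint:syscalls:sys_enter_newfstatat,
tracepoint:syscalls:sys_enter_newlstat,
tracepoint:syscalls:sys_enter_newstat,
tracepoint:syscalls:sys_enter_statx
/ comm != "iptables" /
{
        printf("%-16s %s\n", comm, str(args->filename));
}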
And then, there I found it:
bridge /opt/containerd/bin/iptables
The bridge command (a CNI plugin, part of containernetworking/plugins) was only
searching for iptables under /opt/containerd/bin, which matched exactly the PATH
I had set up for containerd - the process that spawned bridge.
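With the culprit identified, the PATH that a long-running process such as containerd actually received can be read from /proc/pid/environ, as mentioned earlier. A small sketch, assuming a Linux /proc filesystem; `path_from_environ` is a hypothetical helper name:

```shell
#!/bin/sh
# path_from_environ FILE   (hypothetical helper, for illustration only)
# FILE is expected to be in the /proc/<pid>/environ format: KEY=VALUE entries
# separated by NUL bytes. Prints the PATH=... entry, if there is one.
path_from_environ() {
    # Turn the NUL separators into newlines, then keep only the PATH entry.
    tr '\0' '\n' < "$1" | grep '^PATH='
}
```

For example, `path_from_environ /proc/<pid>/environ` with the pid of containerd (reading another user's process usually requires root). Note that this file reflects the environment the process was started with, which is exactly what matters for a systemd unit.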
what else could’ve been done?
To further improve that code, we could have traced the exiting side of that syscall, and then filtered by those that were failing (i.e., the searches that resulted in a “not found”):
cat /sys/kernel/debug/tracing/events/syscalls/sys_exit_newfstatat/format
name: sys_exit_newfstatat
ID: 691
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:int __syscall_nr; offset:8; size:4; signed:1;
field:long ret; offset:16; size:8; signed:1;
print fmt: "0x%lx", REC->ret
This way, the filter would look like / args->ret != 0 /.
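One catch is that the exit tracepoint only carries ret, not the filename, so the path has to be remembered on entry and printed on exit. Below is a rough sketch of how that could look for newfstatat only. The shell function `failed_stat_program` is a made-up wrapper that just prints the bpftrace program, and the map name `@filename` is my own choice. Since these syscalls return 0 on success and a negative errno on failure, `args->ret < 0` selects the same calls as `args->ret != 0`.

```shell
#!/bin/sh
# failed_stat_program   (hypothetical helper, for illustration only)
# Prints a bpftrace program that reports only newfstatat calls that failed.
failed_stat_program() {
    cat <<'EOF'
// Remember the path on entry: the exit tracepoint only exposes `ret`.
tracepoint:syscalls:sys_enter_newfstatat
/ comm != "iptables" /
{
        @filename[tid] = args->filename;
}

// On exit, print the saved path only when the call failed (negative errno).
tracepoint:syscalls:sys_exit_newfstatat
/ @filename[tid] /
{
        if (args->ret < 0) {
                printf("%-16s %d %s\n", comm, args->ret, str(@filename[tid]));
        }
        delete(@filename[tid]);
}
EOF
}
```

To try it, something like `failed_stat_program > failed-stat.bt` followed by `sudo bpftrace failed-stat.bt` should do. The other stat variants traced earlier could be added as extra probes in the same way.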