Retrieving the full path of a process on MacOS (and exploring procfs)

Hey,

The other day I was trying to make sure that a process I was running was using a specific binary that I had built, but I couldn’t figure out how: ps would only show me a non-absolute path.

# How could I know what is the absolute path of the
# `hugo` binary, assuming that I could have multiple
# `hugo` binaries in `$PATH`?
ps
  PID TTY           TIME CMD
 4153 ttys000    5:14.98 hugo serve     <<<
 9035 ttys001    0:00.04 /Applications/iTerm.app/Content...
 9037 ttys001    0:00.10 -bash
 9086 ttys001    0:02.27 /usr/local/Cellar/macvim/8.1-15...
 9236 ttys002    0:00.04 /Applications/iTerm.app/Content...
 9238 ttys002    0:00.10 -bash

If I were using Linux though, I thought, that’d be easy: head to /proc, search for the pid of the process and then check what exe links to; done.

# (on a Linux machine ...)
#
# Note that `hugo` still doesn't show up with its absolute
# path, just like on MacOS.
ps aux | grep hugo
ubuntu    2275  0.0  0.0 101852   748 pts/0    Sl+  00:26   0:00 hugo serve

# Given that the proc filesystem can provide us with some
# more information about the process, check out the `exe`
# link (which should provide a link to the actual executable).
stat /proc/2275/exe
  File: /proc/2275/exe -> /usr/local/bin/hugo
  Size: 0         	Blocks: 0          IO Block: 1024   symbolic link
Device: 4h/4d	Inode: 140106      Links: 1
Access: (0777/lrwxrwxrwx)  Uid: ( 1001/  ubuntu)   Gid: ( 1001/  ubuntu)
Access: 2018-09-24 00:26:37.167004005 +0000
Modify: 2018-09-24 00:26:27.391004005 +0000
Change: 2018-09-24 00:26:27.391004005 +0000
 Birth: -

In this post, I go through how we can gather such information on MacOS, and what procfs on Linux is all about.

tl;dr: /proc on Linux is dope; on MacOS, compile a small program that uses proc_pidpath from libproc, or install pidpath.

The /proc filesystem in Linux
procfs under the hood
The libproc library in MacOS
A Golang binary that suits Linux and MacOS
Closing thoughts

The /proc filesystem in Linux

In “Linux land”, there’s this thing called “procfs”.

It’s a virtual filesystem (in the sense that there are no regular files on your disk backing what it exposes) that lets a program in userspace introspect its own running process, as well as other processes.

From the kernel docs:

The proc file system acts as an interface to internal data structures in the kernel.

It can be used to obtain information about the system and to change certain kernel parameters at runtime (sysctl).
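The sysctl part maps onto files too: each sysctl key corresponds to a file under /proc/sys, with the dots turned into slashes (kernel.ostype lives at /proc/sys/kernel/ostype). Here’s a minimal Go sketch of reading such keys, assuming Linux and Go 1.16+ (for os.ReadFile); readSysctl is a hypothetical helper, not something from the pidpath project.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// readSysctl is a hypothetical helper: it maps a sysctl key such as
// "kernel.ostype" to its file under /proc/sys and returns the value
// with surrounding whitespace (the trailing newline) trimmed.
//
// The dot-to-slash mapping is naive: it breaks for key components
// that themselves contain dots (e.g. VLAN interface names).
func readSysctl(key string) (string, error) {
	path := "/proc/sys/" + strings.ReplaceAll(key, ".", "/")
	data, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(data)), nil
}

func main() {
	for _, key := range []string{"kernel.ostype", "kernel.pid_max"} {
		value, err := readSysctl(key)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("%s = %s\n", key, value)
	}
}
```

Changing a parameter at runtime works the same way in reverse: writing a new value to that file, which generally requires root.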

The way the interaction with it is set up is pretty nifty:

each process gets its own path under /proc (e.g., /proc/<pid>), and then
under that path, various other files and subdirectories allow deeper introspection of that specific pid.

# Display the files and directories present at the
# very root of `/proc`.
#
# Here we can find the list of PIDs that we can access,
# as well as some more system-wide information and 
# settings that we can tweak.
ls -lah /proc
total 4.0K
dr-xr-xr-x 124 root     root       0 Sep 24 23:56 .
drwxr-xr-x  24 root     root    4.0K Sep 24 23:57 ..
dr-xr-xr-x   9 root     root       0 Sep 24 23:56 1
dr-xr-xr-x   9 root     root       0 Sep 25 00:54 2016
dr-xr-xr-x   9 root     root       0 Sep 24 23:57 417
...
-r--r--r--   1 root     root       0 Sep 25 01:25 sched_debug
-r--r--r--   1 root     root       0 Sep 25 01:25 schedstat
dr-xr-xr-x   4 root     root       0 Sep 25 01:25 scsi
lrwxrwxrwx   1 root     root       0 Sep 24 23:56 self -> 2574
...


# Getting into a specific pid path, we're able to
# gather more information about the specifics of
# a given process.
ls -lah /proc/472
total 0
dr-xr-xr-x   9 root root 0 Sep 24 23:57 .
dr-xr-xr-x 123 root root 0 Sep 24 23:56 ..
...
lrwxrwxrwx   1 root root 0 Sep 25 01:30 cwd -> /
-r--------   1 root root 0 Sep 25 01:30 environ
lrwxrwxrwx   1 root root 0 Sep 24 23:57 exe -> /lib/systemd/systemd-udevd
dr-x------   2 root root 0 Sep 24 23:57 fd
lrwxrwxrwx   1 root root 0 Sep 25 01:30 root -> /
-rw-r--r--   1 root root 0 Sep 25 01:30 sched
...

Not only that, procfs is very helpful when you’re not sure whether a process is blocked on something you didn’t expect (like a write(2) to an NFS mount point that hangs because its servers aren’t responding), or on something as simple as your process sleeping when you didn’t want it to:

# Sleep for 33 days in the background
sleep 33d &
[1] 2786


# Check the state of the process
cat /proc/2786/stat
2786 (sleep) S ...
 |     |     |
 |     |     `-> state (interruptible sleep)
 |     `-> command being run (sleep command)
 `-> process id (the pid we used before)


# Check the stack trace (from the kernel's
# perspective) that led the process into this
# sleep state
cat /proc/2786/stack
[<0>] hrtimer_nanosleep+0xd8/0x1d0
[<0>] SyS_nanosleep+0x72/0xa0
[<0>] do_syscall_64+0x73/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff
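The stat line shown above is mostly space-separated, but field 2 (the command name) is wrapped in parentheses and may itself contain spaces or parentheses, so splitting on whitespace alone can go wrong. A minimal Go sketch of parsing the first three fields, assuming Linux and Go 1.16+; procStat and parseStat are hypothetical names, and the field order is the one documented in proc(5).

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// procStat holds the first three fields of /proc/<pid>/stat:
// the pid, the command name and the state letter.
type procStat struct {
	Pid   int
	Comm  string
	State string
}

// parseStat is a hypothetical helper. The command name is wrapped
// in parentheses and may itself contain spaces or ')', so we split
// on the first '(' and the *last* ')' instead of on whitespace.
func parseStat(line string) (procStat, error) {
	open := strings.IndexByte(line, '(')
	closing := strings.LastIndexByte(line, ')')
	if open < 0 || closing < open {
		return procStat{}, fmt.Errorf("malformed stat line: %q", line)
	}

	pid, err := strconv.Atoi(strings.TrimSpace(line[:open]))
	if err != nil {
		return procStat{}, err
	}

	rest := strings.Fields(line[closing+1:])
	if len(rest) == 0 {
		return procStat{}, fmt.Errorf("missing state in: %q", line)
	}

	return procStat{Pid: pid, Comm: line[open+1 : closing], State: rest[0]}, nil
}

func main() {
	data, err := os.ReadFile("/proc/" + strconv.Itoa(os.Getpid()) + "/stat")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	st, err := parseStat(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// State letters include R (running), S (interruptible sleep),
	// D (uninterruptible sleep, e.g. stuck on I/O) and Z (zombie).
	fmt.Printf("pid=%d comm=%s state=%s\n", st.Pid, st.Comm, st.State)
}
```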

procfs under the hood

What’s interesting about being virtual is that procfs can generate its contents on the fly: whenever you issue an I/O call like read(2), Linux answers with what you asked for, be it the list of file descriptors opened by a given process or the list of environment variables that were set at process startup time.

For instance, tracing the execution of cat /proc/meminfo (a system-wide file rather than a per-pid one) down into the kernel, we can find the path that the read(2) syscall takes:

# stack trace of `cat /proc/meminfo`
        meminfo_proc_show
        proc_reg_read
        __vfs_read
        vfs_read
        sys_read
        do_syscall_64
        entry_SYSCALL_64_after_hwframe


# stack trace of `cat /file.txt` 
# on an ext4 mount point
        ext4_file_read_iter
        __vfs_read
        vfs_read
        sys_read
        do_syscall_64
        entry_SYSCALL_64_after_hwframe

Unlike a regular read (the second stack trace, where ext4_file_read_iter reads from the filesystem), no real file on disk is accessed here: meminfo_proc_show just produces the content the user asked for, in this case memory usage statistics.
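This “generated on read” behavior is easy to observe from userspace: stat(2) reports a size of 0 for most procfs files (the 0 in the size column of the ls -lah /proc output earlier), yet reading them returns data. A minimal Go sketch, assuming Linux and Go 1.16+; virtualSizes is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
)

// virtualSizes returns the size that stat(2) reports for path and
// the number of bytes that reading the whole file actually yields.
// For procfs files the two typically differ: the content doesn't
// exist until a read(2) asks the kernel to generate it.
func virtualSizes(path string) (statSize int64, readSize int, err error) {
	info, err := os.Stat(path)
	if err != nil {
		return 0, 0, err
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, 0, err
	}
	return info.Size(), len(data), nil
}

func main() {
	statSize, readSize, err := virtualSizes("/proc/meminfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("stat says %d bytes, read returned %d bytes\n", statSize, readSize)
}
```

os.ReadFile copes with this because it treats the reported size only as a hint and keeps reading until EOF; code that sizes its buffer strictly from stat would read nothing.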

By the way, if you’re interested in knowing more about related subjects, a great reference for this type of knowledge is The Linux Programming Interface: A Linux and UNIX System Programming Handbook.

Now to MacOS.

The libproc library in MacOS

Unlike on Linux, it feels like we can’t learn all that much about how things work on MacOS.

After searching a bit for a way to gather information about a process, I came across libproc.

As mentioned in libproc.h:

/*
 * This header file contains private interfaces 
 * to obtain process information.
 *
 * These interfaces are subject to change in future releases.
 */

One thing to note:

the interfaces are private, so there’s no guaranteed compatibility with future releases.

An Apple staff member made this clear in a post on Apple’s developer forums about gathering process information:

[…] Apple has not put a lot of effort into providing APIs for getting this sort of information.

What APIs that do exist were either inherited from OS X’s predecessor OSs or were added primarily to meet our internal requirements rather than the needs for third-party developers.

Thus, you will find a lot of places where these APIs are: incomplete; incorrect; poorly documented and aren’t as binary compatible as they should be.

Still, we can make use of it. More specifically, we can use proc_pidpath, a function that takes a pid (the pid of the process we want to know more about), a buffer that the path gets written to, and the buffer size.

int proc_pidpath(
  int pid,              // pid of the process to know more about
  void * buffer,        // buffer to fill with the abs path
  uint32_t  buffersize  // size of the buffer
);

With that in mind, we can go ahead and create a Go binary that handles both Linux and MacOS by specifying two different compilation targets.

A Golang binary that suits Linux and MacOS

Given that libproc will not be a thing under Linux, we can start by creating a pidpath_linux.go file that is meant to be compiled only on Linux, and another file, pidpath_darwin.go, aimed at MacOS machines.

The Linux one is rather simple: it follows the /proc/<pid>/exe symlink, and that’s it:

//go:build linux
// +build linux

package main

import (
	"os"
	"strconv"
)

func GetExePathFromPid(pid int) (path string, err error) {
	path, err = os.Readlink("/proc/" + strconv.Itoa(pid) + "/exe")
	return
}

The MacOS version, though, needs a little bit more.

Given that libproc is accessed via C, we can leverage cgo.

//go:build darwin
// +build darwin

package main

// #include <libproc.h>
// #include <stdlib.h>
import "C"

import (
	"fmt"
	"unsafe"
)

// bufSize references the constant that the implementation
// of proc_pidpath uses under the hood to make sure that
// no overflows happen.
//
// See https://opensource.apple.com/source/xnu/xnu-2782.40.9/libsyscall/wrappers/libproc/libproc.c
const bufSize = C.PROC_PIDPATHINFO_MAXSIZE

func GetExePathFromPid(pid int) (path string, err error) {
	// Allocate a buffer of `bufSize` bytes in the C heap
	// (proc_pidpath writes a `\0`-terminated string into it)
	// and make sure that it gets freed (see the `defer` below).
	buf := (*C.char)(C.malloc(C.size_t(bufSize)))
	defer C.free(unsafe.Pointer(buf))

	// Call the C function `proc_pidpath` from the included
	// header file (libproc.h). The two-value form also hands
	// us `errno` as a Go error; proc_pidpath returns 0 on failure.
	ret, cerr := C.proc_pidpath(C.int(pid), unsafe.Pointer(buf), C.uint32_t(bufSize))
	if ret <= 0 {
		return "", fmt.Errorf("failed to retrieve pid path: %v", cerr)
	}

	// Convert the C string back to a Go string. errno is
	// ignored on success, as it may hold a stale value.
	return C.GoString(buf), nil
}

With that done, we can now consume GetExePathFromPid in our application.

To see that in place, check out cirocosta/pidpath.

Closing thoughts

It was interesting to me to check out how different things are in MacOS land.

Although I use a Macbook Pro as a personal computer (and a Mac at work), I’ve not really paid attention to these little details.

Also, /proc is just so valuable! It’s definitely worth learning more about the other functionality it offers. Make sure you check out The Linux Programming Interface: A Linux and UNIX System Programming Handbook.

If you have any questions, or suggestions to improve this blog post, please let me know! I’m cirowrc, and I’d love to chat.

Have a good one!