The mechanics of moving volumes between workers in Concourse involve two steps:

  1. streaming a volume out of a machine
  2. streaming a volume into a machine

In case 1, baggageclaim creates an archive of the volume's directory (using tar), compresses it (using either zstd or gzip), and then sends the contents over to whoever is trying to consume it [1].

In case 2, baggageclaim does the opposite: it takes a stream of bytes, decompresses it, and then lets tar convert the result into a directory tree in an empty volume (thus filling the volume with the contents as they were on the other machine).
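The two steps above can be sketched as a single local pipeline. This is only an illustration, not baggageclaim's actual code: `src` and `dst` are hypothetical temporary directories standing in for the volumes on each machine, gzip is used because it is available almost everywhere, and the network hop between the two halves is left out.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for the volume on machine-a and the empty
# volume on machine-b.
src=$(mktemp -d)
dst=$(mktemp -d)
echo "hello" > "$src/file.txt"
mkdir "$src/nested"
echo "world" > "$src/nested/other.txt"

# stream out: archive the directory and compress the byte stream
# stream in:  decompress the stream and unpack it into the empty directory
tar -cf - -C "$src" . | gzip | gzip -d | tar -xf - -C "$dst"

# the destination should now mirror the source
diff -r "$src" "$dst" && echo "volumes match"
```

In the real setup the middle of the pipe would be a network connection between two workers rather than two adjacent processes.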

Given that a user (let's say, cirocosta with UID 1004) might exist on machine-a, but not on machine-b, how does tar deal with that?

It turns out that tar does keep the UIDs around in the archive when you create it:

    # create a compressed tarball w/ files that are owned by `1234`
    touch {a,b,c}
    sudo chown 1234 {a,b,c}
    tar czvf ./unpriv-files.tgz ./a ./b ./c

    # check who the owner is
    tar --numeric-owner -tzvf ./unpriv-files.tgz

            -rw-r--r-- 1234/0   ./a
            -rw-r--r-- 1234/0   ./b
            -rw-r--r-- 1234/0   ./c

And we can see that the same holds for UID 0:

    # make those files privileged (owned by root)
    sudo chown 0 {a,b,c}
    tar czvf ./priv-files.tgz ./a ./b ./c

    # check the uid
    tar --numeric-owner -tzvf ./priv-files.tgz

            -rw-r--r-- 0/0   ./a
            -rw-r--r-- 0/0   ./b
            -rw-r--r-- 0/0   ./c
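If you want to reproduce the listings above without `sudo`, one option is to let tar override the recorded ownership at creation time. The sketch below assumes GNU tar (for `--owner`, `--group`, and `--numeric-owner`); the directory and the archive name `fake-owner.tgz` are made up for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
cd "$work"
touch a b c

# --owner/--group set the ownership recorded in the archive (GNU tar),
# regardless of who owns the files on disk.
tar --owner=1234 --group=0 --numeric-owner -czf ./fake-owner.tgz ./a ./b ./c

# list the archive, printing numeric IDs rather than user/group names
listing=$(tar --numeric-owner -tzvf ./fake-owner.tgz)
echo "$listing"
```

Each entry in the listing should carry `1234/0` as its owner, just as if the files had been chowned before archiving.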

When it comes to extracting that, though, things change quite a bit.

extracting as an unprivileged user

Regardless of how the UIDs are set up inside the archive, the files get extracted with the current user’s UID (and GID).

For instance:

    # extracting the unprivileged payload (files w/ uid 1234) ends up w/
    # files owned by myself (cirocosta uid=1004)
    tar xvzf ./unpriv-files.tgz
    ls -ln a b c
    -rw-r--r-- 1 1004 1006 0 Nov 27 13:55 a
    -rw-r--r-- 1 1004 1006 0 Nov 27 13:55 b
    -rw-r--r-- 1 1004 1006 0 Nov 27 13:55 c
                  |    |
                  UID  GID

    # extracting the privileged payload (files w/ uid 0) ends up w/
    # files owned by myself (cirocosta uid=1004)
    tar xvzf ./priv-files.tgz
    ls -ln a b c
    -rw-r--r-- 1 1004 1006 0 Nov 27 14:00 a
    -rw-r--r-- 1 1004 1006 0 Nov 27 14:00 b
    -rw-r--r-- 1 1004 1006 0 Nov 27 14:00 c
                  |    |
                  UID  GID
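The same outcome can be requested explicitly with `--no-same-owner`, which is what GNU tar uses by default for non-root users. The following is a minimal sketch assuming GNU tar and GNU coreutils `stat`; the paths are temporary directories created just for the example, and `--owner=1234` is only there to get a foreign UID into the archive without needing `sudo`.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
mkdir "$work/src" "$work/dst"
touch "$work/src/a"

# record a foreign owner (uid 1234) in the archive
tar --owner=1234 --group=0 -czf "$work/payload.tgz" -C "$work/src" ./a

# --no-same-owner: extract files as the invoking user, ignoring the
# ownership stored in the archive
tar --no-same-owner -xzf "$work/payload.tgz" -C "$work/dst"

owner=$(stat -c %u "$work/dst/a")
echo "extracted owner: $owner (current uid: $(id -u))"
```

Because the flag is passed explicitly, the extracted file ends up owned by whoever ran the command, even when that is root.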

extracting as a privileged user

Given that a privileged user is free to change a file's owner to any UID (see chown(2)), tar in this case restores the ownership recorded in the archive (GNU tar defaults to --same-owner when run as root):

    # extracting the unprivileged payload (files w/ uid 1234) ends up w/
    # files owned by 1234 (not ourselves, but what was set in the archive).
    tar xvzf ./unpriv-files.tgz
    ls -ln a b c
    -rw-r--r-- 1 1234 0 0 Nov 27 13:55 a
    -rw-r--r-- 1 1234 0 0 Nov 27 13:55 b
    -rw-r--r-- 1 1234 0 0 Nov 27 13:55 c
                  |   |
                  UID GID

    # extracting the privileged payload (files w/ uid 0) ends up w/
    # files owned by 0 (just like in the archive).
    tar xvzf ./priv-files.tgz
    ls -ln a b c
    -rw-r--r-- 1 0 0 0 Nov 27 14:00 a
    -rw-r--r-- 1 0 0 0 Nov 27 14:00 b
    -rw-r--r-- 1 0 0 0 Nov 27 14:00 c
                 | |
               UID GID

1: that’s more of a pipeline actually - tar | zstd | ...