Hey,

Aside from the discretionary access control (DAC) permissions associated with files (e.g., “users with UID X can read”), there is an extra permission bit that can be stored in a file’s inode: the setuid bit.

Once set on an executable, it lets whoever runs that binary do so with the effective UID of the file's owner.

A setuid program is a program that allows a process to gain privileges it would not normally have, by setting the process's effective user ID to the same value as the user ID (owner) of the executable file.
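To make the "bit stored in the inode" part concrete, here is a minimal sketch of how a program could check for it, assuming a POSIX system: stat(2) returns the inode's mode, and the S_ISUID mask from <sys/stat.h> selects the setuid bit. The helper names mode_has_setuid and file_has_setuid are made up for this illustration.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Returns 1 if the setuid bit is present in `mode`, 0 otherwise.
 * (hypothetical helper, named for this example) */
int
mode_has_setuid(mode_t mode)
{
        return (mode & S_ISUID) != 0;
}

/* Returns 1 if the file at `path` has the setuid bit set, 0 if it
 * does not, and -1 if stat(2) fails (errno is left set by stat). */
int
file_has_setuid(const char* path)
{
        struct stat st;

        if (stat(path, &st) == -1)
                return -1;

        return mode_has_setuid(st.st_mode);
}
```

Calling file_has_setuid on an executable should return 1 once `chmod u+s` has been applied to it, and 0 before that.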

As an example, consider the “run-as-root” program below, which lets you run an executable (initially, without any privilege escalation):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int
    main(int argc, char** argv, char** envp)
    {
            if (argc < 2) {
                    printf("Usage: %s <executable> <args ...>\n", argv[0]);
                    return 1;
            }

            execve(argv[1], argv + 1, envp);
            perror("execve");

            return 1;
    }

For instance, with it, we can execute /usr/bin/id:

    ./run-as-root /usr/bin/id -u
    1001

Clearly not root.

Now, if we change the owner of that file to be UID 0 (root), and set the setuid bit:

    # as root, change the ownership of the
    # file to `root`
    #
    sudo chown 0 ./run-as-root


    # as root, set the `setuid` bit
    #
    sudo chmod u+s ./run-as-root


    # run `run-as-root` again
    #
    ./run-as-root /usr/bin/id -u
    0

Note how in the last run of run-as-root we elevated our privileges, going from 1001 to 0 without the use of sudo - our run-as-root was able to do that for us.
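What changed between the two runs is the effective UID, while the real UID is still that of the user who started the program. A small sketch of how a program can tell the two apart, using only getuid(2) and geteuid(2) (the function names euid_differs_from_ruid and print_uids are made up for this example):

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 when the effective UID differs from the real UID, which is
 * what happens when a setuid executable owned by another user is run. */
int
euid_differs_from_ruid(void)
{
        return geteuid() != getuid();
}

/* Prints both IDs; the casts to long avoid depending on how uid_t
 * is defined on the platform. */
void
print_uids(void)
{
        printf("real=%ld effective=%ld\n",
               (long)getuid(), (long)geteuid());
}
```

Called from a setuid-root binary started by UID 1001, print_uids would be expected to report real=1001 and effective=0.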

It’s important to realize that setuid makes a program take on the UID of the file's owner only in the case of native Linux binaries - setting it on an interpreted piece of code (a script started through a #! line) won’t work.

For instance, consider a second version of run-as-root: run-as-root.sh.

    #!/bin/bash
    exec "$@"

If we go again through the process of getting the setuid bit set and the file owned by UID 0, we can see no effect:

    sudo chown 0 ./run-as-root.sh
    sudo chmod u+s ./run-as-root.sh

    ./run-as-root.sh /usr/bin/id -u
    1001

ps.: UID 0 is not the only one able to set the setuid bit. A file's owner can set it on their own files, and for files owned by someone else, what matters in practice is having the CAP_FOWNER capability (and, for changing the owner of the file, CAP_CHOWN).

pps.: this behavior does not take effect if the calling thread has the no_new_privs attribute set via prctl(2), if the process is being ptraced, or if the underlying filesystem is mounted with nosuid (MS_NOSUID). See execve(2).
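If a setuid binary seems to be ignored, the nosuid case is easy to check from userspace. A minimal sketch, assuming a system that provides statvfs(3) and reports the mount flag as ST_NOSUID in f_flag (the function name mounted_nosuid is made up for this example):

```c
#include <sys/statvfs.h>

/* Returns 1 if the filesystem holding `path` is mounted nosuid, 0 if
 * it is not, and -1 if statvfs(3) fails. */
int
mounted_nosuid(const char* path)
{
        struct statvfs sv;

        if (statvfs(path, &sv) == -1)
                return -1;

        return (sv.f_flag & ST_NOSUID) != 0;
}
```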

under the hood

inheriting the effective uid from a file

When getting prepared to execute a Linux binary (during __do_execve_file), the kernel gets to fill the “binary parameter” structure (struct linux_binprm), a data structure that holds the arguments that are used when loading binaries.

While this process is interesting in itself (e.g., see Using Go as a scripting language in Linux), what matters for us here is the moment when the kernel fills that struct with a UID.

static void
bprm_fill_uid(struct linux_binprm* bprm)
{
        struct inode* inode;
        unsigned int mode;
        kuid_t uid;
        kgid_t gid;

        /*
         * Since this can be called multiple times (via prepare_binprm),
         * we must clear any previous work done when setting set[ug]id
         * bits from any earlier bprm->file uses (for example when run
         * first for a setuid script then again for its interpreter).
         */
        bprm->cred->euid = current_euid();
        bprm->cred->egid = current_egid();

        if (!mnt_may_suid(bprm->file->f_path.mnt))
                return;

        if (task_no_new_privs(current))
                return;

        inode = bprm->file->f_path.dentry->d_inode;
        mode = READ_ONCE(inode->i_mode);
        if (!(mode & (S_ISUID | S_ISGID)))
                return;

        /* Be careful if suid/sgid is set */
        inode_lock(inode);

        /* reload atomically mode/uid/gid now that lock held */
        mode = inode->i_mode;
        uid = inode->i_uid;
        gid = inode->i_gid;
        inode_unlock(inode);

        /* We ignore suid/sgid if there are no mappings for them in the ns */
        if (!kuid_has_mapping(bprm->cred->user_ns, uid) ||
            !kgid_has_mapping(bprm->cred->user_ns, gid))
                return;

        if (mode & S_ISUID) {
                bprm->per_clear |= PER_CLEAR_ON_SETID;
                bprm->cred->euid = uid;
        }

        if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
                bprm->per_clear |= PER_CLEAR_ON_SETID;
                bprm->cred->egid = gid;
        }
}

What we can see above is essentially that if the file being executed has the setuid bit in its mode (the mode & S_ISUID check), the kernel uses that file's owner UID as the effective UID (euid) of the credentials being prepared for the new program.
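The decision made by bprm_fill_uid can be modelled in plain userspace C. This is only a simplified sketch: the struct and function names (exec_inputs, exec_ids, model_fill_uid) are made up, the two early-return checks are reduced to boolean flags, and the locking and user-namespace mapping checks from the kernel code are left out.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Simplified stand-ins for the inputs the kernel consults. */
struct exec_inputs {
        mode_t mode;         /* inode->i_mode of the executable       */
        uid_t  file_uid;     /* inode->i_uid                          */
        gid_t  file_gid;     /* inode->i_gid                          */
        uid_t  caller_euid;  /* current_euid()                        */
        gid_t  caller_egid;  /* current_egid()                        */
        int    nosuid_mount; /* non-zero: mnt_may_suid() would fail   */
        int    no_new_privs; /* non-zero: task_no_new_privs() is true */
};

struct exec_ids {
        uid_t euid;
        gid_t egid;
};

/* Returns the effective IDs the new program would end up with. */
struct exec_ids
model_fill_uid(const struct exec_inputs* in)
{
        /* start from the caller's IDs, as the kernel code does */
        struct exec_ids out = { in->caller_euid, in->caller_egid };

        if (in->nosuid_mount || in->no_new_privs)
                return out;

        if (in->mode & S_ISUID)
                out.euid = in->file_uid;

        /* setgid only counts when group-execute is set as well */
        if ((in->mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
                out.egid = in->file_gid;

        return out;
}
```

Feeding it a mode of S_ISUID | 0755, a file owned by UID 0 and a caller with euid 1001 gives an euid of 0, which matches the run-as-root experiment from before.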

Another thing worth noting there is the pair of early-return checks, on mnt_may_suid and task_no_new_privs, that run before the inode's mode is even inspected:

        if (!mnt_may_suid(bprm->file->f_path.mnt))
                return;

        if (task_no_new_privs(current))
                return;

The first ensures that setuid is not taken into consideration when the file comes from a filesystem mounted with the MS_NOSUID bit set; the second does the same when the current task has the no_new_privs bit set (see https://www.kernel.org/doc/Documentation/prctl/no_new_privs.txt).
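From userspace, the no_new_privs bit is controlled through prctl(2). A minimal sketch, assuming Linux 3.5 or newer (where PR_SET_NO_NEW_PRIVS and PR_GET_NO_NEW_PRIVS exist); the two wrapper names are made up for this example:

```c
#include <sys/prctl.h>

/* Sets no_new_privs on the calling thread. Returns 0 on success and
 * -1 on failure. Once set, the bit cannot be cleared; it is inherited
 * across fork(2)/clone(2) and preserved across execve(2). */
int
enable_no_new_privs(void)
{
        return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/* Returns 1 if no_new_privs is set, 0 if it is not, -1 on error. */
int
no_new_privs_enabled(void)
{
        return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

A process that calls enable_no_new_privs before executing a setuid-root binary should see that binary run with the caller's own effective UID, since the task_no_new_privs check returns early.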

uid inheritance (in the non setuid case)

Under regular circumstances, a process created from another through clone(2) inherits the security context of its parent.

For instance, let’s consider the following example:

    #include <stdio.h>
    #include <unistd.h>

    int
    main(int argc, char** argv)
    {
            if (fork() == -1) {
                    perror("fork");
                    return 1;
            }

            printf("pid=%d  uid=%d\n", getpid(), getuid());

            return 0;
    }

Compiling that code and running it, we can see how the child inherits the parent's real UID:

    # compile the code
    #
    gcc -O2 -static -o fork main.c


    # run it
    #
    ./fork
    pid=30044  uid=1001
    pid=30045  uid=1001
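The same check can be done programmatically rather than by eye, and extended to the effective UID. A small sketch (the function name child_inherits_uids is made up): the parent records its IDs, forks, and the child reports through its exit status whether it sees the same values.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child and has it compare its own real and effective UIDs
 * with the values the parent saw before forking. Returns 1 if both
 * match, 0 if they differ, and -1 if fork(2) or waitpid(2) fails. */
int
child_inherits_uids(void)
{
        uid_t ruid = getuid();
        uid_t euid = geteuid();
        int status;
        pid_t pid = fork();

        if (pid == -1)
                return -1;

        if (pid == 0) {
                /* child: report the comparison through the exit status */
                _exit(getuid() == ruid && geteuid() == euid ? 0 : 1);
        }

        if (waitpid(pid, &status, 0) == -1)
                return -1;

        if (!WIFEXITED(status))
                return -1;

        return WEXITSTATUS(status) == 0;
}
```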

At the kernel level, that inheritance happens at the moment the kernel copies the process, as the following kernel stack leading to prepare_creds shows:

    prepare_creds+1
    copy_creds+1
    copy_process.part.38+1085
    _do_fork+248
    __x64_sys_clone+39
    do_syscall_64+90
    entry_SYSCALL_64_after_hwframe+68

Given that the struct that represents a runnable thread (struct task_struct) also holds its security context (the process credentials, in struct cred), the kernel performs a copy of those too, and then mutates them accordingly.

struct cred*
prepare_creds(void)
{
        struct task_struct* task = current;
        const struct cred* old;
        struct cred* new;

        validate_process_creds();
        new = kmem_cache_alloc(cred_jar, GFP_KERNEL);
        if (!new)
                return NULL;


        old = task->cred;
        memcpy(new, old, sizeof(struct cred));

        // ...
}

To truly observe the credentials being copied, we can place a kretprobe on prepare_creds (here, with a bpftrace script) and see what the new struct cred looks like after the copy (during copy_process):

    #include <linux/sched.h>
    #include <linux/cred.h>

    BEGIN
    {
            printf("%-8s %-8s %-8s %-8s %-8s\n",
                    "REAL", "SAVED", "EFFEC", "VFS", "TYPE");
    }

    kretprobe:prepare_creds
    / comm == "bash" /
    {
            $old_creds = (struct cred *) curtask->cred;
            $new_creds = (struct cred *) retval;

            printf("%-8d %-8d %-8d %-8d %-8s\n",
                    $old_creds->uid.val,
                    $old_creds->suid.val,
                    $old_creds->euid.val,
                    $old_creds->fsuid.val,
                    "old");

            printf("%-8d %-8d %-8d %-8d %-8s\n",
                    $new_creds->uid.val,
                    $new_creds->suid.val,
                    $new_creds->euid.val,
                    $new_creds->fsuid.val,
                    "new");

            printf("\n");
    }

Now, running ./fork again, we can verify how new and old compare:

    REAL     SAVED    EFFEC    VFS      TYPE
    1001     1001     1001     1001     old
    1001     1001     1001     1001     new

    1001     1001     1001     1001     old
    1001     1001     1001     1001     new

mixing setuid and real uid inheritance

Now, what happens if you have a process that gets started from a setuid program (whose effective UID gets set to 0)?

Exactly the mix of both!

By the time the process gets copied (during the execution of clone(2)), the struct cred gets copied too (as seen above), and then at the moment of executing the binary (through execve(2)), the credential switch takes place, modifying the effective UID and the saved set-user-ID.

If that new process calls clone(2), once again, the same first step would then occur - the credentials would be copied, and then passed along to the new process.

    my_process
            clone(2)        // process copying goes on,
                               making uids be inherited

            execve(2)       // new effective & saved set


            clone(2)        // process copy once again
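The first two steps of that sequence can be sketched in C as a plain fork followed by an exec. This is only an illustration: the function name euid_seen_after_exec is made up, /usr/bin/id is assumed to exist at that path and to not be setuid itself, so the UID it reports is simply the one copied over during the fork.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks, runs `/usr/bin/id -u` in the child, and returns the UID that
 * the exec'd program printed, or -1 on any failure. */
long
euid_seen_after_exec(void)
{
        int fds[2];
        char buf[32];
        ssize_t n;
        pid_t pid;

        if (pipe(fds) == -1)
                return -1;

        pid = fork();                   /* clone: credentials are copied */
        if (pid == -1) {
                close(fds[0]);
                close(fds[1]);
                return -1;
        }

        if (pid == 0) {
                char* const args[] = { "/usr/bin/id", "-u", NULL };

                /* child: send the program's stdout into the pipe */
                close(fds[0]);
                dup2(fds[1], STDOUT_FILENO);
                close(fds[1]);
                execv(args[0], args);   /* execve: setuid would apply here */
                _exit(127);             /* only reached if execv failed */
        }

        close(fds[1]);
        n = read(fds[0], buf, sizeof(buf) - 1);
        close(fds[0]);
        waitpid(pid, NULL, 0);

        if (n <= 0)
                return -1;

        buf[n] = '\0';
        return strtol(buf, NULL, 10);
}
```

If the exec'd file were setuid and owned by another user, the value read back would be expected to be that owner's UID instead of the caller's.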