On checking set*id() return values

TL;DR: If the return values of set*id() syscalls are not checked, security issues such as privilege escalation can follow. The conditions that make these syscalls fail are less likely on modern kernels than in the past, but return values should always be checked.

setuid() system call

As man 2 setuid states:

setuid() sets the effective user ID of the calling process. If the calling process is privileged (more precisely: if the process has the CAP_SETUID capability in its user namespace), the real UID and saved set-user-ID are also set.

setuid() is widely used in SUID/SGID binaries to drop privileges, like so:

setuid(getuid())

If the above setuid() call succeeds, the effective UID of the process takes the value of the real UID, hence “cancelling” the SUID bit given to the binary.

The RETURN VALUE section of the man page also states that not checking the result is a security issue:

RETURN VALUE

  On success, zero is returned.  On error, -1 is returned, and
  _errno_ is set to indicate the error.
  
  _Note_: there are cases where **setuid**() can fail even when the
  caller is UID 0; it is a grave security error to omit checking
  for a failure return from **setuid**().

Imagine the previous setuid() call fails and its return value is not checked. A root-SUID binary doing such a call would then end up not dropping its privileges while believing it did, allowing for potential privilege escalations in the rest of the execution.

The goal of an attacker is thus to provoke such an error in programs that do not enforce return value checks.
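
As a contrast to the unchecked setuid(getuid()) call, here is a minimal sketch of a checked privilege drop. The helper name drop_privileges() is an assumption made for illustration and does not come from the man page. Dropping the group before the user and re-reading the effective IDs afterwards is one common, conservative way to do it, not the only one.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the man page): drop SUID/SGID privileges
 * and report failure instead of silently carrying on.
 * Returns 0 on success, -1 if privileges may still be held.
 */
int drop_privileges(void)
{
        uid_t ruid = getuid();
        gid_t rgid = getgid();

        /* Drop the group first: after the UID drop, setgid() may no longer be allowed. */
        if (setgid(rgid) == -1) {
                fprintf(stderr, "setgid() failed: %s\n", strerror(errno));
                return -1;
        }

        if (setuid(ruid) == -1) {
                fprintf(stderr, "setuid() failed: %s\n", strerror(errno));
                return -1;
        }

        /* Belt and braces: confirm the effective IDs now match the real ones. */
        if (geteuid() != ruid || getegid() != rgid) {
                fprintf(stderr, "privileges were not dropped\n");
                return -1;
        }

        return 0;
}
```

A caller would typically treat a non-zero return as fatal and exit immediately rather than continue with elevated privileges.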

The RLIMIT_NPROC case

There are different cases where setuid() fails but one is of particular interest for us:

ERRORS

  **EAGAIN** _uid_ does not match the real user ID of the caller and this
         call would bring the number of processes belonging to the
         real user ID _uid_ over the caller's **RLIMIT_NPROC** resource
         limit.  Since Linux 3.1, this error case no longer occurs
         (but robust applications should check for this error); see
         the description of **EAGAIN** in execve(2)

For now, let’s ignore the Linux 3.1 fix described above. If an attacker could provoke an EAGAIN error by reaching RLIMIT_NPROC, the setuid() call would fail, and a hypothetical SUID binary that does not check the return value would keep its privileges. Great.

The kernel 2.6 modification

The previous behaviour is due to the fact that an additional check on RLIMIT_NPROC was added to the setuid() syscall back in Linux 2.6. It ensured that if RLIMIT_NPROC was reached, then the function failed with EAGAIN.

It was added because programs could bypass the per-user process limit by starting a daemon as root and then dropping privileges to run as another user, whose limit was never checked. Unfortunately, it opened the door to the security issue described above.

The kernel 3.1 fix

The issue introduced by the 2.6 patch was addressed in Linux 3.1, after a discussion on the kernel mailing list (see the thread “RLIMIT_NPROC check in set_user()”). The basic idea is to stop enforcing RLIMIT_NPROC in the set*id() syscalls and to leave the check to the subsequent fork() or execve() calls, in effect “delegating” it to them.
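
The change in behaviour can be pictured with a toy user-space model. This is only a sketch: it is not kernel code, every name in it (toy_task, toy_setuid_before_3_1() and so on) is hypothetical, and the exact comparison the kernel performs is simplified to a single "at or over the limit" test.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy user-space model, NOT kernel code: all names here are hypothetical. */
struct toy_task {
        unsigned long procs_of_new_uid; /* processes already owned by the target UID */
        unsigned long rlimit_nproc;     /* the caller's RLIMIT_NPROC */
        bool nproc_exceeded;            /* deferred "limit exceeded" marker */
};

/* Simplified limit test; the real kernel comparison is more involved. */
bool toy_over_limit(const struct toy_task *t)
{
        return t->procs_of_new_uid >= t->rlimit_nproc;
}

/* Linux 2.6.0 to 3.0: the UID change itself is refused. */
int toy_setuid_before_3_1(struct toy_task *t)
{
        return toy_over_limit(t) ? -EAGAIN : 0;
}

/* Since Linux 3.1: the UID change succeeds and the excess is only recorded. */
int toy_setuid_since_3_1(struct toy_task *t)
{
        t->nproc_exceeded = toy_over_limit(t);
        return 0;
}

/* Since Linux 3.1: the deferred check happens when a program is executed. */
int toy_execve_since_3_1(struct toy_task *t)
{
        if (t->nproc_exceeded && toy_over_limit(t))
                return -EAGAIN;
        t->nproc_exceeded = false;
        return 0;
}
```

In this model a program that ignores the return value of toy_setuid_before_3_1() keeps running with its old identity, while with the newer functions the identity always changes and only the later execution step can fail.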

A full explanation can be found in man 2 execve:

execve() and EAGAIN

  A more detailed explanation of the EAGAIN error that can occur
  (since Linux 3.1) when calling execve() is as follows.

  The **EAGAIN** error can occur when a _preceding_ call to setuid(2),
  setreuid(2), or setresuid(2) caused the real user ID of the
  process to change, and that change caused the process to exceed
  its **RLIMIT_NPROC** resource limit (i.e., the number of processes
  belonging to the new real UID exceeds the resource limit).  From
  Linux 2.6.0 to Linux 3.0, this caused the **set*uid**() call to fail.
  (Before Linux 2.6, the resource limit was not imposed on
  processes that changed their user IDs.)

  Since Linux 3.1, the scenario just described no longer causes the
  **set*uid**() call to fail, because it too often led to security
  holes where buggy applications didn't check the return status and
  assumed that—if the caller had root privileges—the call would
  always succeed.  Instead, the **set*uid**() calls now successfully
  change the real UID, but the kernel sets an internal flag, named
  **PF_NPROC_EXCEEDED**, to note that the **RLIMIT_NPROC** resource limit
  has been exceeded.  If the **PF_NPROC_EXCEEDED** flag is set and the
  resource limit is still exceeded at the time of a subsequent
  **execve**() call, that call fails with the error **EAGAIN**.  This
  kernel logic ensures that the **RLIMIT_NPROC** resource limit is
  still enforced for the common privileged daemon workflow—namely,
  fork(2) + **set*uid**() + **execve**().

The corresponding code in the Linux kernel (kernel/sys.c) says the same:

static void flag_nproc_exceeded(struct cred *new)
{
	if (new->ucounts == current_ucounts())
		return;

	/*
	 * We don't fail in case of NPROC limit excess here because too many
	 * poorly written programs don't check set*uid() return code, assuming
	 * it never fails if called by root.  We may still enforce NPROC limit
	 * for programs doing set*uid()+execve() by harmlessly deferring the
	 * failure to the execve() stage.
	 */
	if (is_rlimit_overlimit(new->ucounts, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) &&
			new->user != INIT_USER)
		current->flags |= PF_NPROC_EXCEEDED;
	else
		current->flags &= ~PF_NPROC_EXCEEDED;
}

This is effective: a program that reaches RLIMIT_NPROC and then calls setuid() still has its privileges dropped, whereas a subsequent fork() or execve() fails.
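
For a program on the receiving end of this behaviour, the practical consequence is that every step of the fork() + set*uid() + execve() workflow has to be checked, execve() included. Below is a minimal sketch under stated assumptions: run_as() is a hypothetical helper, the exit codes 126 and 127 are an arbitrary convention, and supplementary groups are deliberately left untouched.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run `path` as (uid, gid) in a child process.
 * Returns the child's exit status, or -1 if it could not be started or
 * did not exit normally. A real daemon would also reset supplementary
 * groups (setgroups()/initgroups()), which is left out here.
 */
int run_as(uid_t uid, gid_t gid, const char *path, char *const argv[])
{
        char *const envp[] = { NULL };
        int status;
        pid_t pid = fork();

        if (pid == -1) {
                fprintf(stderr, "fork() failed: %s\n", strerror(errno));
                return -1;
        }

        if (pid == 0) {
                /* Child: never continue with the old IDs if a call fails. */
                if (setgid(gid) == -1 || setuid(uid) == -1) {
                        fprintf(stderr, "privilege drop failed: %s\n", strerror(errno));
                        _exit(126);
                }
                execve(path, argv, envp);
                /* Only reached on failure, e.g. EAGAIN from the deferred check. */
                fprintf(stderr, "execve() failed: %s\n", strerror(errno));
                _exit(127);
        }

        /* Parent: collect the child's exit status. */
        if (waitpid(pid, &status, 0) == -1)
                return -1;
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child exits on any failure instead of returning into the caller's code, so a failed drop or a failed execve() can never leave a second copy of the privileged program running.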

Proof of concept

That the fix works, and that setuid() can no longer be made to return EAGAIN with the technique described above, can be demonstrated with the following program:

#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/resource.h>

#define RLIMIT_PROC_LOW 0

void print_uids()
{
        printf("uid: %d euid: %d\n", getuid(), geteuid());
}

void print_nproc_limits()
{
        struct rlimit rlim;

        if (getrlimit(RLIMIT_NPROC, &rlim) == -1)
        {
                printf("getrlimit() failed: %s\n", strerror(errno));
                return;
        }

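        /* rlim_t is not an int: cast explicitly to match the format specifier */
        printf("RLIMIT_NPROC current: %lu max: %lu\n", (unsigned long)rlim.rlim_cur, (unsigned long)rlim.rlim_max);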
}

int main(int argc, char *argv[])
{
        printf("Testing setuid()\n");
        print_uids();
        print_nproc_limits();

        struct rlimit rlim;
        rlim.rlim_cur = RLIMIT_PROC_LOW;
        rlim.rlim_max = RLIMIT_PROC_LOW;

        printf("Setting the limit to a low value: %d\n", RLIMIT_PROC_LOW);

        if (setrlimit(RLIMIT_NPROC, &rlim) == -1)
        {
                printf("setrlimit() failed: %s\n", strerror(errno));
                return 1;
        }

        print_nproc_limits();
        printf("Calling setuid to drop privileges\n");

        if (setuid(getuid()) == -1)
        {
                printf("setuid() failed: %s\n", strerror(errno));
        }

        print_uids();

        char *binary = "/usr/bin/ls";
        printf("executing %s\n", binary);

        int child = fork();

        if (child == -1)
        {
                printf("fork() failed: %s\n", strerror(errno));
        }
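        else if (child == 0)
        {
                // child: pass a real argv/envp, as NULL for either is non-portable
                char *child_argv[] = { binary, NULL };
                char *child_envp[] = { NULL };

                if (execve(binary, child_argv, child_envp) == -1)
                {
                        printf("execve() failed: %s\n", strerror(errno));
                        _exit(1);
                }
        }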

        return 0;
}

This code:

  • Reduces the number of processes the current process can create (RLIMIT_NPROC) to a low value (e.g. 0)
  • Drops privileges by setting the effective user ID to the real one via setuid()
  • Creates a child process executing a given binary (e.g. /usr/bin/ls)

It basically does a setuid() + fork() + execve() chain of system calls.

The program has to be compiled and set as a root-SUID binary:

$ sudo gcc -o setuid_poc setuid_poc.c
$ sudo chmod +s setuid_poc
$ ls -l setuid_poc   
-rwsr-sr-x 1 root root 16512 Jul 20 12:57 setuid_poc

fork() fails with EAGAIN as expected, but the preceding setuid() call succeeds regardless of RLIMIT_NPROC:

$ ./setuid_poc 
Testing setuid()
uid: 1000 euid: 0
RLIMIT_NPROC current: 31507 max: 31507
Setting the limit to a low value: 0
RLIMIT_NPROC current: 0 max: 0
Calling setuid to drop privileges
uid: 1000 euid: 1000
executing /usr/bin/ls
fork() failed: Resource temporarily unavailable

History

Examples of that particular issue have been found in a large variety of binaries throughout the years:

The most recent of these vulnerabilities were found after the Linux 3.1 RLIMIT_NPROC fix (2011).

Detection

Clang’s static analyzer, run through scan-build, detects this behaviour with its security.insecureAPI.UncheckedReturn checker; see the “Available Checkers” page of the Clang documentation (llvm.org).

The pattern can also be searched for in source code with a simple regex that matches C lines consisting only of a set*id() call: /^[ ]*sete?[ug]id[ ]*\([^0]+\);/

This is a common issue.
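
The regex quoted above can be tried on individual lines with the POSIX regex API. This is a sketch only: is_unchecked_setid_line() is a hypothetical helper, and the pattern is the one from the text, transcribed as a POSIX extended regular expression.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if `line` matches the regex quoted above,
 * 0 if it does not, and -1 if the pattern fails to compile.
 */
int is_unchecked_setid_line(const char *line)
{
        /* Same expression as in the text, written as a POSIX extended regex. */
        static const char pattern[] = "^[ ]*sete?[ug]id[ ]*\\([^0]+\\);";
        regex_t re;
        int rc;

        if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
                return -1;

        /* REG_NOSUB: only the match/no-match result is needed. */
        rc = regexec(&re, line, 0, NULL, 0);
        regfree(&re);

        return rc == 0 ? 1 : 0;
}
```

As written, the expression only accepts space-indented lines and the setuid(), seteuid(), setgid() and setegid() family, so tab-indented code and calls such as setreuid() or setresuid() are not reported.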

Conclusion

Not checking the return values of set*id() syscalls is not a good idea, but the RLIMIT_NPROC attack path is largely mitigated by the Linux 3.1 fix: it appears to be a security risk only on kernel versions from 2.6.0 to 3.0. set*id() calls can still fail for other reasons, however, so the return values of these functions should always be checked.
