Tuesday, October 25, 2016

A brief look at the Linux-kernel random generator interfaces

Most modern operating systems provide a cryptographic pseudo-random number generator (CPRNG) as part of their kernel, intended for applications that perform cryptographic operations. Linux is no exception, and in fact it was the first operating system to introduce a CPRNG into the kernel. However, there is much mystery around these interfaces. The manual page is quite unclear in its recommendations, while there is a web-site dedicated to debunking myths about these interfaces, which on a first read contradicts the manual page.

In this post, triggered by my recent attempt to understand the situation and update the Linux manual page, I'll give a brief overview of these interfaces. Note that this post will not get into the internals of a cryptographic pseudo-random generator (CPRNG); for that, consider reading this article. I will go through these interfaces, intentionally staying at a high level without considering internal details, and discuss their usefulness for an application or library that requires access to such a CPRNG.

  • /dev/random: a file which, when read, outputs data from the kernel CPRNG. Reading from this file blocks once the kernel (using a somewhat arbitrary metric) believes that not enough random events have been accumulated since the last use (I know that this is not entirely accurate, but the description is sufficient for this post).
  • /dev/urandom: a file which, when read, provides data from the kernel CPRNG. Reading from /dev/urandom never blocks.
  • getrandom(): a system call which provides random data from the kernel CPRNG. It blocks only when the CPRNG is not yet initialized.

A software engineer who would like to seed a PRNG or generate random encryption keys, and who reads the manual page random(4) carefully, will most likely be tempted to use /dev/random, as it is described as "suitable for uses that need very high quality randomness such as ... key generation". In practice /dev/random cannot be relied on, because it requires a large number of random events to be accumulated in order to provide even a few bytes of random data to running processes. Using it for key generation (e.g., for ssh keys during first boot) is most likely going to turn the first boot into a coin flip: heads and the system is up, tails and the system is left hanging, waiting for random events. This (old) issue, in which a mail service process hung for more than 20 minutes before doing anything, illustrates the impact of this device on real-world applications which need to generate fresh keys on startup.

On the other hand, the device /dev/urandom provides access to the same random generator, but never blocks, and places no requirement on the number of new random events that must be gathered before it provides output. That is quite natural, given that modern random generators, once initially seeded, can provide enormous amounts of output before being considered broken (in an information-theoretic sense). So should we use only /dev/urandom today?

There is a catch. Unfortunately /dev/urandom has a quite serious flaw. If used early in the boot process, when the random number generator of the kernel is not fully initialized, it will still output data. How random the output data is depends on the system, and on modern platforms, which provide specialized CPU instructions for generating random data, that is less of an issue. However, the situation where ssh keys are generated before the kernel pool is initialized can be observed in virtual machines which have not been given access to the host's random generator.

Another, though less significant, issue is the fact that both of these interfaces require a file descriptor to operate. That, at first sight, may not seem like a flaw. However, consider the following scenarios:
  • The application calls chroot() prior to initializing the crypto library; the chroot environment doesn't contain any of /dev/*random.
  • To avoid the issue above, the crypto library opens /dev/urandom in a library constructor and stores the descriptor for later use. The application then closes all open file descriptors on startup.
Both are real-world scenarios observed over the years of developing the GnuTLS library. The latter scenario is of particular concern since, if the application opens a few files, the crypto library may never realize that the /dev/urandom file descriptor has been closed and replaced by another file. That may result in reading from an arbitrary file to obtain randomness. Even though one can introduce checks to detect such a case, it is a particularly hard issue to spot, and requires inefficient and complex code to address.

That's where the system call getrandom() fits. Its operation is very similar to /dev/urandom; that is, once the kernel CPRNG is initialized, it provides non-blocking access to it. In addition, it requires no file descriptor, and it blocks until the kernel random generator has been initialized. Given that it addresses the issues of /dev/urandom identified above, it indeed seems like the interface that should be used by modern libraries and applications. In fact, if you use new versions of libgcrypt and GnuTLS today, they take advantage of this API (though that change wasn't exactly a walk in the park).
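
The "blocks until initialized" behavior can also be observed without blocking. The following is only a sketch, assuming the GRND_NONBLOCK flag as documented in the getrandom(2) manual page; the function name kernel_rng_is_initialized() is hypothetical. It asks for a single byte and interprets EAGAIN as "not initialized yet".

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef GRND_NONBLOCK
#define GRND_NONBLOCK 0x0001 /* value from the kernel's <linux/random.h> */
#endif

/* Returns 1 if the kernel CPRNG is initialized, 0 if it is not yet,
 * and -1 on any other error (e.g. ENOSYS on kernels before 3.17). */
static int kernel_rng_is_initialized(void)
{
    unsigned char byte;
    long ret;

    /* Retry if a signal interrupts the call. */
    do {
        ret = syscall(SYS_getrandom, &byte, sizeof(byte), GRND_NONBLOCK);
    } while (ret == -1 && errno == EINTR);

    if (ret == 1)
        return 1;
    if (ret == -1 && errno == EAGAIN)
        return 0;
    return -1;
}
```

An application that must not stall during early boot could use such a probe to postpone key generation, rather than reading possibly weak data from /dev/urandom.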

On the other hand, getrandom() is still a low-level interface, and may not be suitable for direct use by applications expecting a safe high-level interface. A careful reader of its manual page will notice that the API may return less data than requested (if interrupted by a signal), and today this system call is not even wrapped by glibc. That means that it can be used only via the syscall() interface. An illustration of (safe) usage of this system call is given below.

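#include <sys/syscall.h>
#include <sys/types.h>   /* ssize_t */
#include <unistd.h>      /* syscall() */
#include <stdint.h>      /* uint8_t */
#include <errno.h>
#define getrandom(dst,s,flags) syscall(SYS_getrandom, (void*)dst, (size_t)s, (unsigned int)flags)

/* Fill buf with buflen random bytes, retrying on short reads and EINTR.
 * Returns buflen on success, or -1 (with errno set) on any other error. */
static int safe_getrandom(void *buf, size_t buflen, unsigned int flags)
{
  ssize_t left = buflen;
  ssize_t ret;
  uint8_t *p = buf;
  while (left > 0) {
   ret = getrandom(p, left, flags);
   if (ret == -1) {
    /* EINTR: interrupted by a signal before any data was read; retry */
    if (errno != EINTR)
     return ret;
   }
   if (ret > 0) {
    /* possibly a short read: advance and ask for the remainder */
    left -= ret;
    p += ret;
   }
  }
  return buflen;
}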

The previous example code assumes that the Linux kernel supports this system call. For portable code which may run on kernels without it, a fallback to /dev/urandom should also be included.
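
Such a fallback could look roughly like the sketch below. The helper names random_fill() and urandom_fill() are hypothetical, and I assume that a missing system call is reported as ENOSYS; any other getrandom() error is treated as fatal.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly 'left' bytes from /dev/urandom; 0 on success, -1 on error. */
static int urandom_fill(uint8_t *p, size_t left)
{
    int fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);

    if (fd < 0)
        return -1;
    while (left > 0) {
        ssize_t ret = read(fd, p, left);
        if (ret < 0 && errno == EINTR)
            continue;               /* interrupted by a signal: retry */
        if (ret <= 0) {             /* hard error or unexpected EOF */
            close(fd);
            return -1;
        }
        p += ret;
        left -= (size_t)ret;
    }
    close(fd);
    return 0;
}

/* Fill buf from getrandom(), falling back to /dev/urandom when the
 * running kernel lacks the system call. Returns 0 on success, -1 on error. */
static int random_fill(void *buf, size_t buflen)
{
    uint8_t *p = buf;
    size_t left = buflen;

#ifdef SYS_getrandom
    while (left > 0) {
        long ret = syscall(SYS_getrandom, p, left, 0U);
        if (ret > 0) {
            p += ret;
            left -= (size_t)ret;
        } else if (ret == -1 && errno == EINTR) {
            continue;               /* interrupted by a signal: retry */
        } else if (ret == -1 && errno == ENOSYS) {
            break;                  /* no getrandom(): use the device */
        } else {
            return -1;
        }
    }
#endif
    return left > 0 ? urandom_fill(p, left) : 0;
}
```

Note that the fallback path inherits the weaknesses of /dev/urandom discussed earlier: it needs a file descriptor and does not wait for the kernel CPRNG to be initialized.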

From the above, it is apparent that using the Linux-kernel provided interfaces to access the kernel CPRNG is not easy. The old (/dev/*random) interfaces are difficult to use correctly, and while the getrandom() call eliminates several of their issues, it is not straightforward to use, and is not available in Linux kernels prior to 3.17. Hence, if applications require access to a CPRNG, my recommendation would be to avoid using the kernel interfaces directly, and to use the APIs provided by their crypto library of choice. That way the complexity of system-discovery and any other peculiarities of these interfaces will be hidden. Some hints and tips are shown in the Fedora defensive coding guide (which may be a bit out-of-date but is still a good source of information).

1 comment:

  1. I modified your routine so it works properly; you assume the user will set the correct flags and you don't test for EAGAIN.
    static int safe_getrandom(void *buf, size_t buflen, unsigned int flags)
    {
        ssize_t left = buflen;
        ssize_t ret;
        uint8_t *p = buf;

        while (left > 0) {
            ret = getrandom(p, left, flags | GRND_NONBLOCK); // GRND_NONBLOCK so it won't block
            if (ret == -1) {
                if ((errno != EINTR) && (errno != EAGAIN))
                    return ret;
            }
            if (ret > 0) {
                left -= ret;
                p += ret;
            }
        } // end while

        return buflen;
    }

    From the getrandom man page:
    GRND_NONBLOCK
    By default, when reading from the random source, getrandom() blocks if no random bytes are available, and when reading from the urandom source, it blocks if the entropy pool has not yet been initialized.
    If the GRND_NONBLOCK flag is set, then getrandom() does not block in these cases, but instead immediately returns -1 with errno set to EAGAIN.
