@marcan42
It's a bit different from previous consoles
✓ x86 ✓ FreeBSD ✓ WebKit
✗ Hypervisor
But not completely different
✓ Security processor (that you can just ignore)
Step 1: Write a WebKit exploit
Step 2: Write a FreeBSD exploit
Step 0. Dump the code
Step 1. Write a WebKit exploit
Step 2. Write a FreeBSD exploit
Step 3. ?
Step 4. PROFIT
(fail0verflow got together after 31c3)
PCIe is a reliable switched packet network
Transaction Layer Packets (TLPs):
Except there's an IOMMU...
void load_some_stuff(void)
{
    char buf[32];
    plz2read_from_flash(SOME_ADDRESS, buf, 32);
}

void plz2read_from_flash(uint32_t addr, void *buf, size_t size)
{
    iommu_map(buf, size);
    flash_send_read_command(addr, buf, size);
    iommu_unmap(buf, size);
}
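One plausible reading of the snippet above: IOMMU mappings are usually page-granular, so mapping a 32-byte stack buffer opens the whole surrounding page to the device for the duration of the transfer. A minimal sketch of that arithmetic follows. The 4 KiB page size and the stack address are illustrative assumptions, not values taken from the PS4.

```python
PAGE_SIZE = 4096  # assumed IOMMU page size (illustrative)


def pages_exposed(addr: int, size: int, page_size: int = PAGE_SIZE) -> range:
    """Byte range covered by a page-granular mapping of [addr, addr + size)."""
    start = addr & ~(page_size - 1)                          # round down to a page boundary
    end = (addr + size + page_size - 1) & ~(page_size - 1)   # round up to a page boundary
    return range(start, end)


buf = 0x7FFF_1234_5F40           # hypothetical address of `buf` on the stack
window = pages_exposed(buf, 32)  # what the device can reach while mapped

print(hex(window.start), hex(window.stop), len(window))
```

Under these assumptions the 32-byte request makes 4096 bytes of stack reachable by DMA, which would include whatever else lives on that page.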
✓ Code execution
✓ FreeBSD kernel dump
✓ WebKit and OS libs dump
✓ Step 0. Dump the code
✓ Step 1. Write a WebKit exploit
✓ Step 2. Write a FreeBSD exploit
✓ Step 3. ps4-kexec
✓ Step 4. PROFIT (Linux)
jmp linux
Not so fast... we need to:
jmp linux, right?
Sure, Linux will technically run
For a little bit anyway
And then it stops
No video, no serial output, nothing
A mediocre instruction set architecture
The PS4 is x86 (x86-64)
A horrible, horrible thing built upon piles and piles of legacy nonsense dating back to 1981
The PS4 is NOT a PC
The PS4 has none of these
Implements Intel legacy (1981)
Implements Intel legacy (2002)
00: bus number (8 bits)
01: device number (5 bits)
2: function number (3 bits)
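The 8/5/3-bit split above can be sketched as a small parser for the `bus:device.function` notation that lspci prints. The function names here are made up for illustration.

```python
def parse_bdf(text: str) -> tuple[int, int, int]:
    """Split 'bb:dd.f' (hex, as printed by lspci) into (bus, device, function)."""
    bus_s, rest = text.split(":")
    dev_s, fn_s = rest.split(".")
    bus, dev, fn = int(bus_s, 16), int(dev_s, 16), int(fn_s, 16)
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= fn < 8):
        raise ValueError(f"field out of range in {text!r}")
    return bus, dev, fn


def pack_bdf(bus: int, dev: int, fn: int) -> int:
    """Pack into 16 bits: bus in bits 15:8, device in bits 7:3, function in bits 2:0."""
    return (bus << 8) | (dev << 3) | fn


def unpack_bdf(value: int) -> tuple[int, int, int]:
    """Inverse of pack_bdf."""
    return (value >> 8) & 0xFF, (value >> 3) & 0x1F, value & 0x7


bdf = parse_bdf("00:14.4")
print(bdf, hex(pack_bdf(*bdf)))
```

With only 3 bits for the function number, a single device can expose at most 8 functions (0 to 7).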
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C216 Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C216 Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.1 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 2 (rev c4)
00:1c.2 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 3 (rev c4)
00:1c.3 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 4 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation HM77 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C216 Chipset Family SMBus Controller (rev 04)
[...]
00:12.0 System peripheral: Sony Corporation Aeolia ACPI
[...]
00:13.0 System peripheral: Sony Corporation Aeolia ACPI
[...]
00:14.0 System peripheral: Sony Corporation Aeolia ACPI
00:14.1 System peripheral: Sony Corporation Aeolia Ethernet Controller (Marvell Yukon 2 Family)
00:14.2 System peripheral: Sony Corporation Aeolia SATA AHCI Controller
00:14.3 System peripheral: Sony Corporation Aeolia SD/MMC Host Controller
00:14.4 System peripheral: Sony Corporation Aeolia PCI Express Glue and Miscellaneous Devices
00:14.5 System peripheral: Sony Corporation Aeolia DMA Controller
00:14.6 System peripheral: Sony Corporation Aeolia Memory (DDR3/SPM)
00:14.7 System peripheral: Sony Corporation Aeolia USB 3.0 xHCI Host Controller
00:15.0 System peripheral: Sony Corporation Aeolia ACPI
[...]
00:16.0 System peripheral: Sony Corporation Aeolia ACPI
[...]
00:17.0 System peripheral: Sony Corporation Aeolia ACPI
[...]
It clones itself across all PCI device numbers
00:14.4 “PCI Express Glue”
PS4: no PIT, no PIC, no standard serial
Board has testpoints for an 8250-derived serial port
Linux earlycon: early console for debugging
No IRQs required
console=uart8250,mmio32,0xd0340000,3200n8
Clock is different... 3200 means 115200
This gets us a boot log
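Why asking for 3200 baud yields 115200: an 8250 derives its line rate as clock / (16 × divisor), and the kernel picks the divisor from the clock it believes the UART has. The sketch below assumes the early console falls back to the classic PC clock of 1.8432 MHz. The "36 times faster" clock is only inferred here from the 3200-to-115200 ratio, not taken from any hardware documentation.

```python
PC_UART_CLOCK = 1_843_200  # Hz, classic PC 8250/16550 input clock (assumed kernel default)


def divisor(uart_clock: int, baud: int) -> int:
    """8250 divisor latch value: clock / (16 * baud), rounded to nearest."""
    return (uart_clock + 8 * baud) // (16 * baud)


def line_rate(uart_clock: int, div: int) -> float:
    """Actual baud rate produced by a given input clock and divisor."""
    return uart_clock / (16 * div)


div = divisor(PC_UART_CLOCK, 3200)   # what gets programmed for "3200n8"
scale = 115200 // 3200               # how much faster the real clock must be
fast_clock = PC_UART_CLOCK * scale   # inferred clock, not a documented value

print(div, scale, line_rate(fast_clock, div))
```

The same divisor that would give 3200 baud on a PC gives 115200 on a UART clocked `scale` times faster, which is the trick the command line relies on.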
Newfangled timer, in-CPU
PS4 Liverpool APU supports proper TSC
Linux tries to calibrate it...
... against PIT or PMTIMER
Fail
 enum {
     X86_SUBARCH_PC = 0,
     X86_SUBARCH_LGUEST,
     X86_SUBARCH_XEN,
     X86_SUBARCH_INTEL_MID,
     X86_SUBARCH_CE4100,
+    X86_SUBARCH_PS4,
     X86_NR_SUBARCHS,
 };
Subarch specified by bootloader (ps4-kexec)
Enables custom TSC calibration code
Disables legacy PIC and RTC
Needed for proper PCI config, IOMMU, CPU frequency scaling...
PS4 has broken ACPI tables...
Fix them in ps4-kexec
✓ IRQs (apcie)
✓ Timer (TSC)
✓ Early serial
✓ Late serial with IRQs (apcie-uart)
✓ Initramfs userspace
✗ Serial I/O hangs sometimes :(
FreeBSD masks some IRQ vectors on CPU#0 with nonstandard AMD LAPIC features
Clean them up in ps4-kexec
✓ Serial is stable
This took *ages* to debug
✓ USB xHCI (3 USB controllers in one function...)
✓ SDHCI (Nonstandard PCI config, needs quirks...)
✓ Ethernet (Driver needs hacks; still partially broken...)
Worked fine on Linux 4.4
Failed on 4.9 - DMA broken?
More Linux driver patching...
dce_ihdef_get_info_crtc_linea_liverpool
"LVP A0"
StarshaAsicStateRegInfo
ThJStarsha AGESAThebeJBDK
Nobody (not even Sony/AMD) agrees on the APU codename
We're calling it Liverpool
AMD publishes 3D shader and command queue documentation
They do NOT publish register docs for recent GPUs
That's what we need to hack on kernel drivers :(
“The code is the documentation” - incomplete, magic numbers
http://www.siliconkit.com/pragmatic/bonaire.xml
XML dump of Bonaire register documentation?
<field>
  <fname>
    <token>P_ALWAYS_USE_FAST_TXCLK</token>
  </fname>
  <frange>
    <token>13:13</token>
  </frange>
  <ftype>
    <token>ALPHA</token>
    <token>{</token>
    <fieldtexts>
      <fieldtext>
        <quoted>
          <token>"TXCLK will be either 250MHz, 500MHz, or 1GHz depends on port speeds "</token>
        </quoted>
Broken, incomplete
http://www.siliconkit.com/pragmatic/RAI/rai.grammar4.txt
root ::= sections
sections ::= section sections
section ::= 'SECTION_START' 'CHIP_INFO' statements 'SECTION_END'
section ::= 'SECTION_START' 'CHIP_SPACES' chipspaces 'SECTION_END'
section ::= 'SECTION_START' 'CHIP_STREAMS' '[a-zA-Z0-9_]*' 'SECTION_END'
section ::= 'SECTION_START' 'CHIP_MEMORIES' '[a-zA-Z0-9_]*' 'SECTION_END'
section ::= 'SECTION_START' 'CHIP_PARAMETERS' '[a-zA-Z0-9_]*' 'SECTION_END'
section ::= 'SECTION_START' 'BLOCK_INFO' statements 'SECTION_END'
section ::= 'SECTION_START' 'BLOCK_REGISTERS' register 'SECTION_END'
register ::= title spaces size rattribute '{' fields '}' ';'
register ::= title spaces size '{' fields '}' ';'
title ::= '[a-zA-Z0-9_]*'
spaces ::= space spaces
[...]
AMD internal register description file?
http://www.siliconkit.com/pragmatic/bonaire.xml
http://www.siliconkit.com/pragmatic/RAI/rai.grammar4.txt
Maybe...
http://www.siliconkit.com/pragmatic/bonaire.rai
Nope
//Version 1.0.1.0
//CL# 890079
//Version 1.0.0.0
//CL# 883050
SECTION_START CHIP_INFO
CHIP_NAME = "bonaire";
DESCRIPTION = "R8xx GPU Chip";
RELEASE = "Chip Spec 0.28";
// Edit Vendor ID Here: Default(0xFFFF) means search for all
ASIC_VENDOR_ID = 0x1002;
[...]
So I wrote a *working* parser
$ python showregname.py HDP_NONSURFACE_INFO
HDP_NONSURFACE_INFO (GpuF0Reg:0x2c08,GpuF1Reg:0x2c08) 32bit:
  0 NONSURF_ADDR_TYPE
    - 0: physical address with no translation.
    - 1: virtual address, requires page table translation.
  4:1 NONSURF_ARRAY_MODE
    - 0: ARRAY_LINEAR_GENERAL: Unaligned linear array
    - 1: ARRAY_LINEAR_ALIGNED: Aligned linear array
  [...]
Also does annotated register dumps, diffs, #define generation
4000+ registers documented in GpuF0Reg alone
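Field ranges such as `0` and `4:1` in the HDP_NONSURFACE_INFO listing are bit positions, so an annotated dump boils down to shift-and-mask. The sketch below is not the actual tool, just a minimal illustration using the two fields shown for that register. The raw value `0x3` is a made-up example.

```python
def field(value: int, hi: int, lo: int) -> int:
    """Extract bits hi:lo (inclusive) from a register value."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


# (hi, lo) bit ranges as listed for HDP_NONSURFACE_INFO
FIELDS = {
    "NONSURF_ADDR_TYPE": (0, 0),
    "NONSURF_ARRAY_MODE": (4, 1),
}


def decode(value: int) -> dict[str, int]:
    """Break a raw register value into its named fields."""
    return {name: field(value, hi, lo) for name, (hi, lo) in FIELDS.items()}


raw = 0x3  # hypothetical register contents
print(decode(raw))
```

Here `0x3` decodes to address type 1 (virtual) and array mode 1 (ARRAY_LINEAR_ALIGNED), going by the value descriptions in the listing.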
????
Panasonic I²C DisplayPort → HDMI bridge
Requires configuration to work
Hooked up to the GPU I²C bus?
You wish
Let's build a simple I²C interface?
Nah, let's make a bytecode scripting engine to issue I²C commands
Because ICC is too slow to issue requests one by one
✓ Framebuffer console working
✗ X won't start with radeon driver
PS4 uses a unified memory architecture
Linux legacy driver expects a usable amount of "video" memory
PS4 configures emulated VRAM as 16MiB...
Solution: reconfigure memory controller in ps4-kexec to assign 1GiB of RAM as VRAM
✓ X starts
Commands are sent to the GPU by putting them in rings:
Commands are processed by the GPU Command Processor
It contains multiple sub-units (ME, PFP, CE), each of which is a custom ‘F32’ CPU running microcode firmware
Rings can call out to Indirect Buffers (IBs) with more commands
radeon: ring 0 test failed
The graphics ring isn't working
WREG32(scratch, 0xCAFEDEAD);
radeon_ring_lock(rdev, ring, 3);
radeon_ring_write(ring, PACKET3(PACKET3_SET_UCONFIG_REG, 1));
radeon_ring_write(ring, ((scratch - PACKET3_SET_UCONFIG_REG_START) >> 2));
radeon_ring_write(ring, 0xDEADBEEF);
radeon_ring_unlock_commit(rdev, ring, false);
The ring test writes to a GPU register from the ring, then checks to see if the write happened
Debug registers (thanks bonaire.rai!) show the CP is stuck...
... waiting for data in the ring...
... after a NOP command?
Packet headers have a length field of size - 2
2-word packet: size = 0.
They added a 1-word NOP: size = 0x3fff (-1)
Old microcode... interprets it as a huge packet
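The "size - 2" encoding can be sketched as follows. The bit layout mirrors the `PACKET3()` macro in the Linux radeon driver as I understand it (type in bits 31:30, 14-bit count in bits 29:16, opcode in bits 15:8). Treat the layout and the NOP opcode value as assumptions, and `words_seen_by_old_microcode` as a hypothetical model of the behaviour described on the slide.

```python
PACKET_TYPE3 = 3
PACKET3_NOP = 0x10  # assumed opcode value


def packet3(op: int, count: int) -> int:
    """Build a type-3 packet header with a 14-bit count field."""
    return (PACKET_TYPE3 << 30) | ((op & 0xFF) << 8) | ((count & 0x3FFF) << 16)


def count_field(total_words: int) -> int:
    """Count field for a packet of `total_words` words: size - 2, modulo 2**14."""
    return (total_words - 2) & 0x3FFF


def words_seen_by_old_microcode(header: int) -> int:
    """Model of microcode that does not special-case 0x3FFF: count + 2 words."""
    return ((header >> 16) & 0x3FFF) + 2


nop = packet3(PACKET3_NOP, count_field(1))  # the new one-word NOP
print(hex(count_field(2)), hex(count_field(1)), hex(nop))
print(words_seen_by_old_microcode(nop))
```

A one-word packet wraps around to a count of 0x3FFF, so microcode that takes the field literally waits for 16385 words that never arrive.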
Hawaii has the same issue on old microcode:
if (rdev->family == CHIP_HAWAII) {
    if (rdev->new_fw)
        nop = PACKET3(PACKET3_NOP, 0x3FFF);
    else
        nop = RADEON_CP_PACKET2;
} else {
    nop = PACKET3(PACKET3_NOP, 0x3FFF);
}
radeon: ring 3 test failed
That's the SDMA ring
radeon_ring_write(ring, SDMA_PACKET(SDMA_OPCODE_WRITE,
                                    SDMA_WRITE_SUB_OPCODE_LINEAR, 0));
radeon_ring_write(ring, lower_32_bits(gpu_addr));
radeon_ring_write(ring, upper_32_bits(gpu_addr));
radeon_ring_write(ring, 1); /* number of DWs to follow */
radeon_ring_write(ring, 0xDEADBEEF);
Same idea: write a value to memory, check for it
Debugging, the write happens... but it writes zero?
So I tried queuing two writes instead:
radeon_ring_write(ring, SDMA_PACKET(SDMA_OPCODE_WRITE,
                                    SDMA_WRITE_SUB_OPCODE_LINEAR, 0));
radeon_ring_write(ring, lower_32_bits(gpu_addr));
radeon_ring_write(ring, upper_32_bits(gpu_addr));
radeon_ring_write(ring, 1); /* number of DWs to follow */
radeon_ring_write(ring, 0xDEADBEEF); /* <-- What it *should* write */
radeon_ring_write(ring, SDMA_PACKET(SDMA_OPCODE_WRITE,
                                    SDMA_WRITE_SUB_OPCODE_LINEAR, 0));
radeon_ring_write(ring, lower_32_bits(gpu_addr2));
radeon_ring_write(ring, upper_32_bits(gpu_addr2));
radeon_ring_write(ring, 1); /* number of DWs to follow */ /* <-- What it writes */
radeon_ring_write(ring, 0x0BADF00D);
Now it writes... 1 to the first destination?
Linear writes from the ring start 4 words too late in the ring
IBs work fine, only the ring is broken
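The two observations (zero with one write, 1 with two writes) fit a payload fetch that is skewed by 4 words. Below is a toy model of that, not driver code. The header value, ring size and addresses are placeholders, and the unused part of the ring is assumed to be zero-filled.

```python
WRITE_HDR = 0x00000002  # placeholder for the linear-write packet header
RING_WORDS = 16         # toy ring size
SKEW = 4                # words; payload is fetched this much too late


def linear_write(addr: int, payload: list[int]) -> list[int]:
    """Words of one linear write packet: header, addr lo, addr hi, count, data."""
    return [WRITE_HDR, addr & 0xFFFFFFFF, addr >> 32, len(payload), *payload]


def make_ring(*packets: list[int]) -> list[int]:
    """Lay packets out back to back in a zero-filled ring."""
    words = [w for p in packets for w in p]
    return words + [0] * (RING_WORDS - len(words))


def run_buggy(ring: list[int]) -> dict[int, int]:
    """Execute the writes, reading each payload word SKEW words too late."""
    memory, i = {}, 0
    while i + 4 <= len(ring) and ring[i] == WRITE_HDR:
        addr = ring[i + 1] | (ring[i + 2] << 32)
        count = ring[i + 3]
        for n in range(count):
            memory[addr + 4 * n] = ring[(i + 4 + n + SKEW) % len(ring)]
        i += 4 + count
    return memory


A, B = 0x1000, 0x2000  # hypothetical destinations
one = run_buggy(make_ring(linear_write(A, [0xDEADBEEF])))
two = run_buggy(make_ring(linear_write(A, [0xDEADBEEF]),
                          linear_write(B, [0x0BADF00D])))
print(hex(one[A]), hex(two[A]))
```

In this model the single write lands on empty ring space (zero), while with two packets queued the first write picks up the second packet's DW count (1), matching what was observed.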
Workaround: use FILL opcode instead:
radeon_ring_write(ring, SDMA_PACKET(SDMA_OPCODE_CONSTANT_FILL, 0,
                                    SDMA_CONSTANT_FILL_EXTRA_SIZE(2)));
radeon_ring_write(ring, lower_32_bits(gpu_addr2));
radeon_ring_write(ring, upper_32_bits(gpu_addr2));
radeon_ring_write(ring, 0xDEADBEEF); /* Fill value */
radeon_ring_write(ring, 4); /* number of bytes */
Can't write to pagetable config registers via GPU commands :(
Linux uses this to configure pagetables
Special register firewall in Liverpool? Security?
Workaround by directly writing from CPU, but it sucks
Maybe the register firewall is in the firmware?
The Command Processor blocks require “microcode”
Thus far undocumented
We pull the firmware blobs from FreeBSD in ps4-kexec and pass them in initramfs (avoids redistribution issues)
Let's dig deeper
We can upload custom F32 firmware easily and have it write to scratch regs, then read what it wrote
The basic "write to GPU reg" instruction is easy to find from GPU register offsets, in the microcode blobs
Disassembler for the AMD proprietary ‘F32’ GPU microcode
CLEAR_STATE:
    5e cc800000 | stw r2, [r0, #0x0]
    5f cc400000 | stw r1, [r0, #0x0]
    60 cc000016 | stw r0, [r0, #0x16]
    61 80000672 | b 0x672
INDEX_BUFFER_SIZE:
    62 cc40002d | stw r1, [r0, #0x2d]
    63 7c408001 | mov r2, r1
    64 88000000 | btab
Instruction syntax shamelessly stolen from ARM
Not complete, but disassembles all instructions used in Liverpool and Bonaire firmwares for PFP, ME, CE, MEC, RLC
Register blocking not in the firmware
It seems it is blocked in hardware, when issued from GFX block (debug registers show an access violation)
Haven't found how to turn it off yet
3D does work with the CPU write workaround, though!
github.com/fail0verflow/ps4-kexec
github.com/fail0verflow/ps4-linux
github.com/fail0verflow/ps4-radeon-patches
github.com/fail0verflow/radeon-tools
Well...