Console Hacking 2016

PS4: PC Master Race

 

@marcan42

In memory of Ben ‘bushing’ Byer

It's a bit different from previous consoles

✓ x86 ✓ FreeBSD ✓ WebKit

✗ Hypervisor

But not completely different

✓ Security processor (that you can just ignore)

How to pwn a PS4

 

Step 1: Write a WebKit exploit

Step 2: Write a FreeBSD exploit

 

 

How to pwn a PS4

Step 0. Dump the code

Step 1. Write a WebKit exploit

Step 2. Write a FreeBSD exploit

Step 3. ?

Step 4. PROFIT

Black-box code extraction

The fun way

A long time ago in a hackerspace far,
far away....

 

(fail0verflow got together after 31c3)

PCIe: Fun for the whole family

  • Bus mastering
  • Complicated
  • Robust
  • Delay tolerant
  • Drivers full of fail

PCIe 101

PCIe is a reliable switched packet network

Transaction Layer Packets (TLPs):

  • Memory reads/writes
  • IO reads/writes
  • Configuration reads/writes
  • Message signaled interrupts (MSI) (writes)
  • Legacy interrupts
  • Completions

Go wild with DMA!

Except there's an IOMMU...

void load_some_stuff(void)
{
    char buf[32];

    plz2read_from_flash(SOME_ADDRESS, buf, 32);
}

void plz2read_from_flash(uint32_t addr, void *buf, size_t size)
{
    iommu_map(buf, size);
    flash_send_read_command(addr, buf, size);
    iommu_unmap(buf, size);
}

Code execution

FreeBSD kernel dump

WebKit and OS libs dump

How to pwn a PS4

Step 0. Dump the code

Step 1. Write a WebKit exploit

Step 2. Write a FreeBSD exploit

Step 3. ?ps4-kexec

Step 4. PROFIT (Linux)

From FreeBSD to Linux

PS4-kexec

jmp linux

Not so fast... we need to:

  • Load Linux into contiguous physical RAM
  • Set up Linux boot parameters
  • Shut down FreeBSD cleanly
  • Halt secondary CPUs
  • Make new pagetables and GDT
  • Disable the IOMMU
  • Relocate various things in memory
  • And more...

Okay, now jmp linux, right?

Sure, Linux will technically run

For a little bit anyway

And then it stops

No video, no serial output, nothing

Let's talk about hardware

What is x86?

A mediocre instruction set architecture

The PS4 is x86 (x86-64)

What is a PC?

A horrible, horrible thing built upon piles and piles of legacy nonsense dating back to 1981

The PS4 is NOT a PC

PC 101

  • 8259 Programmable Interrupt Controller (PIC)
  • 8253 Programmable Interval Timer (PIT)
  • 8250 UART at I/O 3f8h
  • 8042 PS/2 Keyboard Controller
  • MC146818 RTC/CMOS
  • ISA bus
  • VGA

The PS4 has none of these

AMD FCH Southbridge

Implements Intel legacy (1981)

 

Marvell Aeolia Southbridge

Implements Intel legacy (2002)

 

????

That's No Southbridge

That's a Marvell ARMADA SoC

  • Descendant from Intel StrongARM/XScale
  • ARM SoC with a bunch of peripherals
  • They stuck a PCIe bridge on it
  • Exposes ARM peripherals to the x86 side
  • Some extra stuff (e.g. HPET, ACPI stuff)
  • 256MB DDR RAM
  • Also runs FreeBSD in standby mode
  • Batshit insane

00:01.2

00: bus number (8 bits)

01: device number (5 bits)

2: function number (3 bits)

Intel “Panther Point” PCH

00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C216 Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C216 Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.1 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 2 (rev c4)
00:1c.2 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 3 (rev c4)
00:1c.3 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 4 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation HM77 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C216 Chipset Family SMBus Controller (rev 04)

Marvell “Aeolia”

[...]
00:12.0 System peripheral: Sony Corporation Aeolia ACPI
[...]
00:13.0 System peripheral: Sony Corporation Aeolia ACPI
[...]
00:14.0 System peripheral: Sony Corporation Aeolia ACPI
00:14.1 System peripheral: Sony Corporation Aeolia Ethernet Controller (Marvell Yukon 2 Family)
00:14.2 System peripheral: Sony Corporation Aeolia SATA AHCI Controller
00:14.3 System peripheral: Sony Corporation Aeolia SD/MMC Host Controller
00:14.4 System peripheral: Sony Corporation Aeolia PCI Express Glue and Miscellaneous Devices
00:14.5 System peripheral: Sony Corporation Aeolia DMA Controller
00:14.6 System peripheral: Sony Corporation Aeolia Memory (DDR3/SPM)
00:14.7 System peripheral: Sony Corporation Aeolia USB 3.0 xHCI Host Controller
00:15.0 System peripheral: Sony Corporation Aeolia ACPI
[...]
00:16.0 System peripheral: Sony Corporation Aeolia ACPI
[...]
00:17.0 System peripheral: Sony Corporation Aeolia ACPI
[...]

It clones itself across all PCI device numbers

8 functions ain't enough for everybody

00:14.4 “PCI Express Glue”

  • PCIe bridge config
  • MSI interrupt controller
  • ICC
  • HPET
  • Flash controller
  • RTC
  • Timers
  • 2 serial ports
  • I²C

Linux Minimum System Requirements

  • A timer (PIT)
  • Interrupts (PIC)
  • Some kind of console

PS4: no PIT, no PIC, no standard serial

Board has testpoints for an 8250-derived serial port

dmesg plz

Linux earlycon: early console for debugging

No IRQs required

console=uart8250,mmio32,0xd0340000,3200n8

Clock is different... 3200 means 115200

This gets us a boot log

Time Stamp Counter (tsc)

Newfangled timer, in-CPU

PS4 Liverpool APU supports proper TSC

Linux tries to calibrate it...

... against PIC or PMTIMER

Fail

Again, It Really Isn't A PC

 enum {
         X86_SUBARCH_PC = 0,
         X86_SUBARCH_LGUEST,
         X86_SUBARCH_XEN,
         X86_SUBARCH_INTEL_MID,
         X86_SUBARCH_CE4100,
+        X86_SUBARCH_PS4,
         X86_NR_SUBARCHS,
 };

Subarch specified by bootloader (ps4-kexec)

Enables custom TSC calibration code

Disables legacy PIC and RTC

ACPI

Not just “power”

Needed for proper PCI config, IOMMU, CPU frequency scaling...

PS4 has broken ACPI tables...

Fix them in ps4-kexec

PCI MSI 101

Message Signaled Interrupts

  • Device configuration registers for address and value
  • To fire an interrupt, devices write a value to an address
  • CPU IRQ controller (LAPIC) receives and fires interrupt vector
  • The message value directly defines the CPU IRQ vector

Aeolia MSI 101

  • Device MSI configuration registers ignored
  • Function 4 ("glue") implements custom MSI controller
  • Each function gets shared addr and top 27 bits of data
  • Each "sub-function" only gets separate bottom 5 bits
  • All MSIs originate from Function 4

 

(ノಠ益ಠ)ノ彡┻━┻

Driver hell

  • Sibling devices are inter-dependent
  • Linux IRQ vector allocation not sequential
  • Need to modify all drivers to use custom IRQ code

Aeolia on Linux

  • Core driver implements IRQ controller interface
  • Linux probe-defer mechanism to fix ordering issue
  • Some drivers (SDHCI, GigE) modified to request Aeolia IRQs
  • Some drivers (serial, USB) instantiated from wrappers
  • Each function uses a single shared IRQ :(

IOMMU to the rescue

  • Allows interrupt remapping
  • Consecutive message numbering
  • Can use unique IRQs per sub-function :)
  • Falls back to shared IRQs if IOMMU off
  • The ACPI table for the IOMMU is missing :(

Checklist

✓ IRQs (apcie)

✓ Timer (TSC)

✓ Early serial

✓ Late serial with IRQs (apcie-uart)

✓ Initramfs userspace

✗ Serial I/O hangs sometimes :(

More cleanup needed

FreeBSD masks some IRQ vectors on CPU#0 with nonstandard AMD LAPIC features

Clean them up in ps4-kexec

✓ Serial is stable

This took *ages* to debug

Just add drivers

✓ USB xHCI (3 USB controllers in one function...)

✓ SDHCI (Nonstandard PCI config, needs quirks...)

✓ Ethernet (Driver needs hacks; still partially broken...)

Worked fine on Linux 4.4

Failed on 4.9 - DMA broken?

Aeolia Strikes Back

31-bit DMA

┻━┻ ︵ヽ(`Д´)ノ︵ ┻━┻

More Linux driver patching...

And now for something completely different

PS4 Radeon Graphics

“Starsha”

dce_ihdef_get_info_crtc_linea_liverpool
"LVP A0" StarshaAsicStateRegInfo
ThJStarsha AGESAThebeJBDK

 

Nobody (not even Sony/AMD) agrees on the APU codename

We're calling it Liverpool

Liverpool Graphics

  • AMD GCN “Sea Islands” (CI) GPU
  • Similar to other chips in the generation
  • Some quirks, customizations, oddities
  • We used Bonaire as a base

Hacking on AMD drivers

AMD publishes 3D shader and command queue documentation

They do NOT publish register docs for recent GPUs

That's what we need to hack on kernel drivers :(

“The code is the documentation” - incomplete, magic numbers

Google Time

http://www.siliconkit.com/pragmatic/bonaire.xml

XML dump of Bonaire register documentation?

    <field>
        <fname>
            <token>P_ALWAYS_USE_FAST_TXCLK</token>
        </fname>
        <frange>
            <token>13:13</token>
        </frange>
        <ftype>
            <token>ALPHA</token>
            <token>{</token>
            <fieldtexts>
                <fieldtext>
                    <quoted>
                        <token>"TXCLK will be either 250MHz, 500MHz, or 1GHz
                               depends on port speeds "</token>
                    </quoted>

Broken, incomplete

Tell me more

What is RAI

http://www.siliconkit.com/pragmatic/RAI/rai.grammar4.txt

root ::= sections
sections ::= section sections
section ::= 'SECTION_START' 'CHIP_INFO' statements 'SECTION_END'
section ::= 'SECTION_START' 'CHIP_SPACES' chipspaces 'SECTION_END'
section ::= 'SECTION_START' 'CHIP_STREAMS' '[a-zA-Z0-9_]*' 'SECTION_END'
section ::= 'SECTION_START' 'CHIP_MEMORIES' '[a-zA-Z0-9_]*' 'SECTION_END'
section ::= 'SECTION_START' 'CHIP_PARAMETERS' '[a-zA-Z0-9_]*' 'SECTION_END'
section ::= 'SECTION_START' 'BLOCK_INFO' statements 'SECTION_END'
section ::= 'SECTION_START' 'BLOCK_REGISTERS' register 'SECTION_END'
register ::= title spaces size rattribute '{' fields '}' ';'
register ::= title spaces size '{' fields '}' ';'
title ::= '[a-zA-Z0-9_]*'
spaces ::= space spaces
[...]

AMD internal register description file?

Hmmmmm...

http://www.siliconkit.com/pragmatic/bonaire.xml

http://www.siliconkit.com/pragmatic/RAI/rai.grammar4.txt

Maybe...

http://www.siliconkit.com/pragmatic/bonaire.rai

Nope

http://www.siliconkit.com/pragmatic/RAI/bonaire.rai

Bingo

//Version 1.0.1.0
//CL# 890079 
//Version 1.0.0.0
//CL# 883050

SECTION_START CHIP_INFO

CHIP_NAME = "bonaire";
DESCRIPTION = "R8xx GPU Chip";
RELEASE = "Chip Spec 0.28";
// Edit Vendor ID Here: Default(0xFFFF) means search for all
ASIC_VENDOR_ID = 0x1002;
[...]

So I wrote a *working* parser

$ python showregname.py HDP_NONSURFACE_INFO
HDP_NONSURFACE_INFO (GpuF0Reg:0x2c08,GpuF1Reg:0x2c08) 32bit:
       0  NONSURF_ADDR_TYPE
          - 0: physical address with no translation. 
          - 1: virtual address, requires page table translation. 
     4:1  NONSURF_ARRAY_MODE
          - 0: ARRAY_LINEAR_GENERAL: Unaligned linear array 
          - 1: ARRAY_LINEAR_ALIGNED: Aligned linear array
[...]

Also does annotated register dumps, diffs, #define generation

4000+ registers documented in GpuF0Reg alone

Road to the framebuffer

HDMI is easy, right?

  • GPU has HDMI, DisplayPort ports
  • HDMI not connected; DP connected

????

External HDMI encoder

We must go deeper

Panasonic I²C DisplayPort → HDMI bridge

Requires configuration to work

Hooked up to the GPU I²C bus?

You wish

ICC

  • RPC protocol used to send commands to system MCU
  • Message box / doorbell protocol
  • Accessed via Aeolia
  • Used for things like power, buttons, LEDs...
  • And the HDMI encoder I²C

ICC I²C

Let's build a simple I²C interface?

Nah, let's make a bytecode scripting engine to issue I²C commands

 

Why?!?

Because ICC is too slow to issue requests one by one

More hacks...

  • HDMI encoder requires all 4 DisplayPort lanes active
  • Scanout memory bandwidth calculation is broken
  • Mouse cursor size is from previous generation (wat?)

✓ Framebuffer console working

✗ X won't start with radeon driver

A tale of two memories

PS4 uses a unified memory architecture

Linux legacy driver expectes a usable amount of "video" memory

PS4 configures emulated VRAM as 16MiB...

Solution: reconfigure memory controller in ps4-kexec to assign 1GiB of RAM as VRAM

✓ X starts

It's 3D time

Radeon GPU 101

Commands are sent to the GPU by putting them in rings:

  • Graphics ring
  • Compute rings
  • DMA rings

Commands are processed by the GPU Command Processor

It contains multiple sub-units (ME, PFP, CE), each of which is a custom ‘F32’ CPU running microcode firmware

Rings can call out to Indirect Buffers (IBs) with more commands

radeon: ring 0 test failed

The graphics ring isn't working

WREG32(scratch, 0xCAFEDEAD);
radeon_ring_lock(rdev, ring, 3);
radeon_ring_write(ring, PACKET3(PACKET3_SET_UCONFIG_REG, 1));
radeon_ring_write(ring, ((scratch - PACKET3_SET_UCONFIG_REG_START) >> 2));
radeon_ring_write(ring, 0xDEADBEEF);
radeon_ring_unlock_commit(rdev, ring, false);

The ring test writes to a GPU register from the ring, then checks to see if the write happened

Debug registers (thanks bonaire.rai!) show the CP is stuck...

... waiting for data in the ring...

... after a NOP command?

NOP is hard, let's go stalling

Packet headers have a length field of size - 2

2-word packet: size = 0.

They added a 1-word NOP: size = 0x3fff (-1)

Old microcode... interprets it as a huge packet

Hawaii has the same issue on old microcode:

if (rdev->family == CHIP_HAWAII) {
    if (rdev->new_fw)
        nop = PACKET3(PACKET3_NOP, 0x3FFF);
    else
        nop = RADEON_CP_PACKET2;
} else {
    nop = PACKET3(PACKET3_NOP, 0x3FFF);
}

radeon: ring 3 test failed

That's the SDMA ring

radeon_ring_write(ring, SDMA_PACKET(SDMA_OPCODE_WRITE,
                        SDMA_WRITE_SUB_OPCODE_LINEAR, 0));
radeon_ring_write(ring, lower_32_bits(gpu_addr));
radeon_ring_write(ring, upper_32_bits(gpu_addr));
radeon_ring_write(ring, 1); /* number of DWs to follow */
radeon_ring_write(ring, 0xDEADBEEF);

Same idea: write a value to memory, check for it

Debugging, the write happens... but it writes zero?

Double it up

So I tried queuing two writes instead:

radeon_ring_write(ring, SDMA_PACKET(SDMA_OPCODE_WRITE,
                        SDMA_WRITE_SUB_OPCODE_LINEAR, 0));
radeon_ring_write(ring, lower_32_bits(gpu_addr));
radeon_ring_write(ring, upper_32_bits(gpu_addr));
radeon_ring_write(ring, 1); /* number of DWs to follow */
radeon_ring_write(ring, 0xDEADBEEF);                       <-- What it *should*
radeon_ring_write(ring, SDMA_PACKET(SDMA_OPCODE_WRITE,         write
                        SDMA_WRITE_SUB_OPCODE_LINEAR, 0));
radeon_ring_write(ring, lower_32_bits(gpu_addr2));
radeon_ring_write(ring, upper_32_bits(gpu_addr2));
radeon_ring_write(ring, 1); /* number of DWs to follow */  <-- What it writes
radeon_ring_write(ring, 0x0BADF00D);

Now it writes... 1 to the first destination?

SDMA: Off-by-four

Linear writes from the ring start 4 words too late in the ring

IBs work fine, only the ring is broken

Workaround: use FILL opcode instead:

radeon_ring_write(ring, SDMA_PACKET(SDMA_OPCODE_CONSTANT_FILL, 0,
                        SDMA_CONSTANT_FILL_EXTRA_SIZE(2)));
radeon_ring_write(ring, lower_32_bits(gpu_addr2));
radeon_ring_write(ring, upper_32_bits(gpu_addr2));
radeon_ring_write(ring, 0xDEADBEEF); /* Fill value */
radeon_ring_write(ring, 4); /* number of bytes */

Still no worky

Can't write to pagetable config registers via GPU commands :(

Linux uses this to configure pagetables

Special register firewall in Liverpool? Security?

Workaround by directly writing from CPU, but it sucks

Maybe the register firewall is in the firmware?

Speaking of firmware

The Command Processor blocks require “microcode”

Thus far undocumented

We pull the firmware blobs from FreeBSD in ps4-kexec and pass them in initramfs (avoids redistribution issues)

Let's dig deeper

Reversing CPU architectures 101

  1. Guess an instruction
  2. Try running it
  3. See what it did
  4. GOTO 1

We can upload custom F32 firmware easily and have it write to scratch regs, then read what it wrote

The basic "write to GPU reg" instruction is easy to find from GPU register offsets, in the microcode blobs

f32dis

Disassembler for the AMD proprietary ‘F32’ GPU microcode

CLEAR_STATE:
  5e  cc800000 | stw r2, [r0, #0x0]
  5f  cc400000 | stw r1, [r0, #0x0]
  60  cc000016 | stw r0, [r0, #0x16]
  61  80000672 | b 0x672  

INDEX_BUFFER_SIZE:
  62  cc40002d | stw r1, [r0, #0x2d]
  63  7c408001 | mov r2, r1
  64  88000000 | btab

Instruction syntax shamelessly stolen from ARM

Not complete, but disassembles all instructions used in Liverpool and Bonaire firmwares for PFP, ME, CE, MEC, RLC

Alas

Register blocking not in the firmware

It seems it is blocked in hardware, when issued from GFX block (debug registers show an access violation)

Haven't found how to turn it off yet

3D does work with the CPU write workaround, though!

Current Checklist

✓ IRQs / Timer
✓ Serial port
✓ Shutdown / reboot
✓ Power button
✓ USB
✓ HDD
✓ Blu-Ray
✓ WiFi
✓ Bluetooth
✓ Ethernet (mostly)
✓ Framebuffer / KMS
✓ HDMI (basic)
✓ 3D (with ugly hack)
✓ S/PDIF audio


✗ HDMI audio
✗ RTC

✓ Blinkenlights

Code

github.com/fail0verflow/ps4-kexec

  • kexec and hardware reconfiguration / "bootloader" code

github.com/fail0verflow/ps4-linux

  • Kernel tree

github.com/fail0verflow/ps4-radeon-patches

  • Userspace library patches

github.com/fail0verflow/radeon-tools

  • f32dis and RAI tools

http://fail0verflow.com · @fail0verflow

Demo time!

Well...