2013-05-06

The future of console homebrew (and a shot of Espresso)

By marcan
Filed under wiiu

It’s been almost 7 years since the Wii was released. Back in 2006, not many owned a living room PC. PCs were still relatively bulky, and the Chinese offerings were limited to horrible media players. At the time, the prospect of having a game console double as a HTPC and being able to browse the web, play games for older platforms with emulation, and run homebrew games on a device which you already had in the living room was rather appealing.

Fast forward to today. Mobile SoCs have made huge advances - you can get a quad-core chip in a phone these days - and have made the jump to the living room. Spend $25 and you can get a Raspberry Pi, which is about on par with the Wii at ¹⁄₁₀ of the launch price and 1/7th of the power consumption (with HD video). Spend $100 and you can get an Ouya, which beats the Wii U’s CPU and doesn’t have too shabby graphics at one third the cost. These mobile-derived devices aren’t quite a replacement for game consoles yet, but they’re catching up fast. They’re cheap enough that they’re almost disposable. The software ecosystem is much larger and wider than any console has ever had. More importantly, they’re open, and the development tools and environments are way better for open development than any game console ever was.

The Wii U isn’t a particularly interesting device. It has the same old Wii CPU, times three. The GPU is a standard, and somewhat outdated Radeon core. The peripheral hardware is standard - SD, USB, SATA, WiFi, etc. The Wii hardware has been either kept as-is or replaced with compatibility shims. The only interesting bit is the controller, but there is already significant work underway to be able to use it with a PC (all you need is a wireless card capable of 5GHz 802.11n AP mode and special software). Even on the Wii U itself, the gamepad is managed by an independent Broadcom SoC that has its own firmware and communicates with the rest of the system via bog-standard USB and one of the video output heads on the Radeon.

The same goes, by the way, for the Xbox Durango and the Playstation Orbis. They’re both glorified PCs. With Valve’s Steam Box coming up, there will be little advantage to either of the first consoles other than potentially new input devices and exclusive games. And the Steam Box will almost certainly be hackable with trivial to no effort.

So where does that leave us? When the Wii U came out, our hacker instincts kicked in and we started looking into ways of breaking into the hardware. A few days before launch we had a firmware update scraper going. Over the next 30 days, we reached most of the milestones required to be able to say that we hacked the device; without going into details, there is basically no security left to break into, other than a mostly unimportant step of the boot process. What would remain is the tedious work of developing the open frameworks required to bootstrap a homebrew community, documenting everything, reverse engineering all of the new hardware, developing a persistent exploit (think tethered vs. untethered iPhone jailbreak, except without any extra hardware or cables), and packaging it all up.

Over the next few months, interest faded. I took a break to work on other projects. There wasn’t much of a reaction from the Wii homebrew community. Is it really worth going through all that effort when we already have open devices that are affordable and widely available? About 31 trustworthy people, most of them well-known people in the homebrew community, have access to what we developed, yet nobody stepped up to start working on a homebrew platform for the Wii U.

At the same time, there is an eternal clash between the homebrew community and those interested in pirating games. Writing homebrew software and frameworks is rather difficult - it requires new code to be written to support the hardware, which must be reverse engineered first. Convincing a game console to load copied games is comparatively simpler, as only the bare minimum amount of code patches required to convince the game/OS to load the game from alternate storage media are required. For example, on the PS3, the kernel payload of the first game loaders was a tiny system call patch, and I wrote an (unreleased) Wii USB loader using existing homebrew frameworks in a couple hundred lines of code, as a proof of concept. Every console after the PS2 was initially broken to run open homebrew code, and only later did piracy show up (excluding disc-drive-based hacks, which I consider a different category).

I think we may have reached the point where homebrew on closed game consoles is no longer appealing. The effort required to develop and maintain an environment for a big, complex modern game console is huge. The cat and mouse game with the manufacturer requires ongoing effort. There is a very real threat of litigation. Game pirates would become not just big users of the result of those efforts, but by far the overwhelming majority (not because there are more pirates, but because there are fewer homebrewers). The fact that the Wii U isn’t selling nearly as well as the Wii did doesn’t help drive enthusiasm either.

I could be wrong, of course. Maybe it’s just that I have a full-time job now and less of a chance to spend all-nighters staring at assembly code. Maybe there are tons of prospective Wii U homebrew developers quietly waiting in the sidelines for a release. Maybe we’ve just gotten lazy.

We could just release everything as-is, of course. However, we tried that with the PS3, and the results were not only disappointing, but we actually ended up in an undeserved legal mess. Homebrew for the PS3 is basically nonexistent, and all anyone cares about is piracy. This is not a situation which we want to see happen again.

So, instead, we can go for a compromise. Our original idea for Wii U homebrew was to “escape” from the Wii mode sandbox and enable the new Wii U hardware. This appeared possible initially, but unfortunately, it turned out that a few critical hardware registers were irreversibly disabled in Wii mode. However, due to the design of the Wii U’s architecture, a few things can be re-enabled. One of those is the multicore support of the Espresso CPU.

On the Wii, the Broadway CPU had no built-in security: games ran on the bare metal and the Starlet handled all security. The Starlet was responsible for kickstarting the Broadway and feeding it code to run. The Wii U extends this architecture by putting some extra security inside the Espresso CPU: now, the Espresso has its own secure boot ROM, like the Starbuck, and will only boot a signed and encrypted code package. This package (which we call an ancast image) is delivered by the Starbuck and verified and decrypted when the Espresso is reset.

This happens in Wii mode too. However, Wii mode software knows nothing of this mechanism. Nintendo worked around this by transparently replacing the NANDloader (which is usually the first code that runs on the Broadway when it boots) with a modified version, encrypted and signed using the new format. The Starbuck (running vWii IOS) loads the new NANDLoader to RAM as it normally would, but when the Espresso is reset, it instead runs its boot ROM. The ROM decrypts the NANDLoader in-place and verifies it against its hardcoded ECDSA public key. If the verification succeeds, it jumps to its entrypoint. The very first thing that the NANDLoader does is turn off the Espresso features and put it into Wii compatibility mode. The new NANDLoader is stored in dedicated vWii mode titles 1-200 and 1-201 (there are two variants, but 1-200 is equivalent to the “normal” NANDLoader).

Incidentally, for vWii mode, we made a minor tweak to HBC so that its binary can both be loaded standalone as was the case on the Wii, and also works together with a NANDloader - this means we can use the same binary for both platforms, and we don’t have to ship a separate NANDloader that would be replaced in vWii mode.

Thankfully, even though the Espresso has ROM security now, unlike the Starlet/Starbuck, it has no memory firewall or similar protection (i.e. AHBPROT). This means that we are free to mess with the contents of memory while the Espresso boots. We can perform a completely trivial and reliable race attack and gain control before the NANDloader has a chance to disable anything. Here’s how. You will have to be able to run code on the Starlet: you can either use Mini or hotpatch IOS using the AHBPROT feature of HBC.

Load the NANDloader to RAM (it is a standard DOL binary and should be loaded as indicated in its header). Note that you’ll have to gain access to 1-200 or 1-201 for this. Don’t bundle it in your app; that would not be redistributable and wrong. Instead, read it from NAND directly (mini), or patch IOS to let you access it, or use the existing title boot functionality in IOS (that already loads the NANDloader from 1-200) and just patch the part where the PPC is reset (see below).
Perform the standard PowerPC reset sequence (this is Mini code, see hollywood.h for the constants):
```
clear32(HW_RESETS, 0x30);
udelay(100);
set32(HW_RESETS, 0x20);
udelay(100);
set32(HW_RESETS, 0x10);
```
Note that it is not necessary to load any EXI bootstrap code, as the Espresso always boots from ROM.
Immediately start watching memory location 0x1330100 (0x81330100). Make sure to either use an uncached mapping or invalidate the cache every time. You may need to perform an ahb_flush_from(AHB_1) every time to make sure the AHB buffers don’t bite you. Look for a change in the data at that address: it is the first instruction of the NANDloader, and the ROM will start decrypting there. Verification happens simultaneously with decryption.
Replace the (now decrypted) instruction at that address with a jump to your own PowerPC code. Make sure to flush if not using an uncached mapping. The timing is pretty lenient here; the ROM is busy decrypting and verifying the rest of the NANDloader at this point, so you have plenty of time to detect and swap the instruction before the ROM jumps to it.
Congratulations, you are now running Espresso code.

To be able to use the extra cores, you will have to initialize them. There are also a number of settings related to the bus and memory coherency. Some of these may not be applicable to Wii mode, so you will have to experiment. These are the important steps (in pseudo-C), reverse engineered from the early initialization code of the Cafe OS (Wii U mode) kernel (this runs initially for core0, and then every other core also runs this code as it is bootstrapped):

hid5 |= 0xc0000000;  // enable HID5 and PIR
// At this point, upir contains the core ID (0, 1, 2) that is currently
// executing.

// Global init
if (upir == 0) {
	scr &= ~0x40000000; // scr, car, and bcr are global SPRs
	scr |= 0x80000000;
	car |= 0xfc100000; // these bit assignments are unknown
	bcr |= 0x08000000;
}

// Per-core init
// these registers and bits already exist in Broadway
hid0 = 0x110024; // enable BHT, BTIC, NHR, DPM
hid2 = 0xf0000; // enable cache and DMA errors
hid4 = 0xb3b00000; // 64B fetch, depth 4, SBE, ST0, LPE, L2MUM, L2CFI
// HID5 is new and unknown, probably mostly controls coherency and the
// new L2 and core interface units
if(pvr & 0xFFFF == 0x101)
	hid5 |= 0x6FBD4300;
else
	hid5 |= 0x6FFDC000;
hid2 |= 0xe0000000; // LSQE, WPE, PSE
msr |= 0x2000; // enable floating point
// boring floating point reg, TB, decrementer, mmu init omitted
hid0 |= 0xc00; // flash invalidate ICache, DCache
hid0 &= ~0x100000; // disable DPM
l2cr = new_l2cr = 0
if (pvr & 0xffff == 0x100)
	new_l2cr |= 0x80000;
hid5 |= 0x01000000;
if (core == 1)
	new_l2cr |= 0x20000000; // probably has something to do with the
							// extra L2 for core1
l2cr = new_l2cr;
new_l2cr |= 0x200000; // L2 global invalidate
l2cr = new_l2cr;
while (l2cr & 1); // wait for global invalidate to finish
l2cr &= ~0x200000; // clear L2 invalidate
l2cr |= 0x80000000; // L2 enable
hid0 |= 0x100000; // enable DPM
hid0 |= 0xc000; // enable DCache+ICache
// optional: enable locked L1 cache as usual
// boring standard GPR init omitted

// Core is now initialized. Check core ID (upir) and jump to wherever
// you want, set a flag for the main core, spin in a loop waiting for
// a vector, or whatever

// To kickstart the other cores (from core 0):
// core 1
scr |= 0x00400000;
// (wait for some flag set from core 1 when initialized)
// core 2
scr |= 0x00200000;
// (wait for some flag set from core 2 when initialized)

// Note: the Cafe OS kernel actually then uses core 1 as the main core
// after starting all three. This is because core 1 has more cache.

Note that cores 1 and 2 will start at the system reset vector, which should jump to code equivalent to the above. It is currently unclear what controls whether the cores end up with MSR[IP]=1 (high vectors) or MSR[IP]=0 (low vectors), but I’m pretty sure that at least in Wii mode you end up booting with IP=0 by default (vector 0x100 in MEM1), although one of the above settings might change that. Experiment. Correction: Cores 1 and 2 boot with MSR[IP]=1, thus at the high vectors. The old Wii PPC EXI bootstub hardware is still present and controls the instructions read from there, so just perform the normal EXI bootstub configuration sequence from the Starbuck. This is not needed for Core 0, which already has IP=0 by the time you gain control. Just flipping the two boot bits in SCR is enough to get the two other cores up without doing anything else, although coherency will probably be broken/disabled. Also, keep in mind that this trick gets you access to all 3 cores but they still run at the old Wii speed (729MHz). Speeding up the Espresso probably requires full access to Wii U mode.

The important SPRs are these:

dec   hex    name
287   0x11f  pvr
920   0x398  hid2
944   0x3b0  hid5
947   0x3b3  scr
948   0x3b4  car
949   0x3b5  bcr
1007  0x3ef  upir
1008  0x3f0  hid0
1009  0x3f1  hid1
1011  0x3f3  hid4
1017  0x3f9  l2cr

So what can you do with this? Well, libogc is probably near impossible to turn into an SMP-capable scheduler (and there are so many other problems with it that trying to keep using it for Wii U mode would be a terrible plan). However, it should be possible to port Linux to have tri-core support on the Wii U. It might also be possible to use the two extra cores in libogc apps purely for number crunching, with carefully designed locking (outside of libogc) and without calling any libogc functions from the other two cores (or any non-reentrant libc functions). Personally, I’d like to see a Linux port taking advantage of this (using Linux was my initial goal in Wii U mode, though I never got around to starting the port). Linux is the ideal choice for Wii U mode, as it has native drivers for most of the remaining hardware, and most importantly, it should be a lot easier to port the existing open source Radeon drivers to work on the Latte than to write one from scratch. The Wii U has tons of RAM, unlike the Wii, and natively runs a multitasking OS with paging and memory protection anyway, so there’s little advantage to not just using Linux.

Getting multicore support into some kind of homebrew platform is one of the many steps required to get to a Wii U homebrew ecosystem, so consider this a test to gauge the interest of the homebrew community. If there ends up being significant interest and progress is made, we will reconsider working towards a Wii U-mode homebrew ecosystem (or perhaps just pass on what we have to those who are more motivated than we are).

One final note: on the Espresso, the exclusive load and store instructions (lwarx and stwcx) are subtly broken and need a workaround. If you are seriously interested in this, and you have started working on it, ping me on IRC and I’ll let you know about the specific workaround that is required (I’m just too lazy to write it all out right now) and also gladly answer any other questions.

Update

To clarify, this only yields access to the extra cores, not the extra RAM and the rest of the hardware. For that, you’ll need a Wii U mode exploit. We do have such an exploit, but right now I believe that, if it were released, there wouldn’t be enough developer interest to kickstart a healthy homebrew ecosystem; if e.g. a Linux port to vWii-mode Espresso is developed, I will gladly stand corrected (and such a Linux port would be directly applicable to Wii U mode, modulo a few minor memory mapping differences, so it’s not wasted effort).