Console Hacking 2013: Omake
As you’re likely aware, our team gave a lecture at the 30th Chaos Communication Congress on hacking the Wii U. This blog post is a follow-up to the talk and contains clarifications, corrections, and material that we couldn’t fit in the one-hour time slot.
If you haven’t yet, please watch the talk before reading the rest of this post:
The hardware
First, two corrections: the second core (core 1) of the Espresso contains 2MB of L2 cache, and the GPU is a member of the R7xx family, but not specifically the RV770. Sorry, I was tired and sleep-deprived when putting together that slide and had two brain farts. The slide has been corrected in the online version above.
There has been a lot of speculation about the hardware, mostly before the talk, and even after it some people continue to be skeptical of our findings. How do we know what makes the Wii U tick? The answer is simple: inspection of the system firmware and runtime logs, and a little bit of common sense.
We used a simplified block diagram of the hardware for the presentation. Here’s a more detailed one which you might find more interesting (there may be a few minor mistakes, but in general it should be pretty close). Hover over to see the equivalent Wii block diagram, for comparison.
Espresso
Let’s look at the Espresso. Why a PPC750? Well, first of all, the Espresso has to be fully compatible with the Broadway to run Wii software. Several people have pointed out that many PowerPC cores are essentially fully backwards compatible for user software. However, Wii games run on the bare metal, without any OS. This means that the CPU needs to be 100% compatible at the system/OS level too, down to the smallest detail of the highly model-specific special-purpose registers. Wii software regularly messes with registers such as HID0-HID4, which are Hardware-Implementation Dependent registers. Additionally, the PPC750 line is the only range of PowerPC processors that implement Paired Singles, an (outdated) SIMD implementation that was introduced with the Gekko on the GameCube and which is not compatible with modern PowerPC SIMD, such as AltiVec. These processors also implement other GameCube/Wii-specific features, such as the Write-Gather Pipe (used to send commands to the GX) and the locked L1 cache and DMA that were discussed in the talk. On top of that, because the Espresso runs at the clock rate of the Wii in vWii mode (and no faster), instruction timings must be identical (or possibly better) in all cases, lest some Wii games run slower on vWii mode than on a real Wii.
Now, it is true that the Espresso implements some new features - most obviously, it is a multi-core, and thus SMP, system. I am not aware of any other SMP PPC750 implementation (although that is possible with vanilla PPC750, but inefficient). However, the Espresso was clearly commissioned by Nintendo specifically for the Wii U, and they had the freedom to make significant changes to the design. Most importantly, although the individual Espresso cores are almost identical to the standard PPC750CL core, the L2 cache subsystem and bus-interface subsystem have been significantly redesigned. The design philosophy seems to have been to keep the 750 core as untouched as possible, but graft in support for efficient SMP around it. As a result, they’ve tacked MERSI support onto the cores and the new L2 subsystem fully supports cache coherency between cores (and cache intervention). Incidentally, the 60x bus seems to have been upgraded as well (they call it 60xe now, and judging by the die shots, it looks wider).
In fact, the SMPization of the 750 in the Espresso is not perfect. There appears to be a bug that affects load-exclusive and store-exclusive instructions (an explicit cache flush is required), which means that using SMP Linux with them will require patching the kernel and libpthread to work around it (and possibly other software that directly uses these instructions to e.g. implement atomics). They would’ve never shipped such a blatant bug on a general-purpose CPU, but I guess they called it good enough for a game console since they can just work around it in the SDK (which they do: the Cafe OS locking primitives have the workaround).
Since I mentioned MERSI and cache coherency, you may be wondering about the point in the talk where I said that the Wii U is not a cache-coherent system. In fact, the Espresso fully supports cache coherency, and there is coherency between cores. However, as far as I can tell, the Latte does not implement this, and memory is not coherent with regards to the Starbuck and other hardware peripherals when performing DMA (although I have not verified that there really is absolutely no support - but it’s at least certainly true of memory during bootrom execution, and since the Wii didn’t implement coherency either, it sounds like the kind of thing Nintendo wouldn’t care about).
Recently, an anonymous contributor e-mailed me what appears to be a screenshot of the introduction page of the Espresso User Manual, which seems to confirm most of what we had deduced. Please don’t ask where it came from; I was only sent the screenshot and the sender used an anonymous e-mail service.
Oh, and that whole out-of-order execution debate? The confusion arose due to the myth that the PPC750 is in-order. It’s a superscalar core: it does dispatch up to 3 instructions per cycle and they can complete independently (and the results are placed in a completion queue). That qualifies as out-of-order. It’s not particularly wide and definitely isn’t nearly as aggressively out-of-order as modern Intel and POWER cores are, though. The Espresso is just as out-of-order as the Broadway and previous members of the 750 family. There’s no upgrade there: it’s a (simple) out-of-order core and it always was. Go read the PPC750CL User’s Manual if you want all the gory details (it also has information on the formerly-Nintendo-proprietary stuff like Paired Singles, DMA, Locked L1, Write-Gather Pipe, etc.).
Latte
What about the GPU? RV770 snafu aside (that was just a brain fart; I meant R7xx), how do we know that it’s of that family? As it turns out, the Cafe OS firmware is littered with references to the R6xx. However, the R6xx and R7xx are almost identical. ATI even published open documentation of their 3D engines together, as a single document, and the only real differences are a few details of the shaders. The register information that we’ve looked up does mostly match the published open docs. R7xx is clearly a minor evolution of the R6xx. Cafe OS also has references to “GPU7”, which seems to be another name that Nintendo uses for the GX2. With the R7xx available (and as, essentially, a mostly compatible step up), there would be no reason for Nintendo not to go with the R7xx as a base instead of the R6xx. Together, all of these strongly imply that the GPU is indeed an R7xx core and not an R6xx core. To those claiming that it supports newer features (e.g. I’ve heard Tesselation and Stream Out support listed as “evidence” of a newer core): go read the 3D engine doc linked above. Those features were already there on R6xx/7xx.
Now, the Latte is a fully custom SoC that has nothing to do with any particular member of the R7xx family, so there is no easy way to draw a parallel to any particular released AMD GPU. Maybe the specs (shader count, etc.) match an existing AMD chip, or maybe they don’t. We’re not particularly interested in the GPU prowess wars, so we haven’t poked at the GPU to try to figure out its specs.
Clock frequencies
This one’s easy: the clock speeds that I tweeted (CPU 1.243125GHz, GPU 549.999755MHz) came straight from the Cafe OS system log. The Espresso doesn’t support dynamic clock scaling (it only has one PLL and it is configured via external pins and only reconfigures itself on hardware reset), and I haven’t seen any evidence of them using power saving / clock scaling on the GPU. Both clocks rates are what you’d expect for a modern relayout of the 750 core on a 45nm process, and a power-savey R7xx at 40nm. The talk about clockspeeds being raised with a software upgrade is nonsense. I have no idea where that silly rumour came from, but there’s no truth to it. While it is possible for the clocks to be reconfigured in software (like on most modern systems - and this is used for vWii mode), it would make no sense for Nintendo to test and ship a living room system at one clock speed and then later raise it (this only makes sense on portable systems, where cores are often underclocked to save battery, but tested at higher clock speeds). It certainly hasn’t happened on the Wii U.
vWii mode
vWii mode is actually pretty interesting, because different parts of the backwards compatibility are implemented in different ways. Some of it is pretty obvious (e.g. the RAM controller is reconfigured and only 64MB of RAM are visible as MEM2), but some parts are much more interesting. Hover over the block diagram above and see if you can spot the interesting ones!
GX
Why is the GX a separate block and not a compatibility mode of the GX2? To be fair, I don’t have hard evidence that they do not share absolutely any hardware, but this goes back to common sense. The GX is completely different from the GX2/R7xx. Adapting the R7xx to look like a GX and be fully compatible at the register level (remember, Wii games run on bare metal) is a humongous task, and completely unrealistic to do and validate. If they were to do it that way, you’d expect some kind of complex configuration, perhaps involving firmware or special shaders, to switch to vWii mode - none of which is present in cafe2wii.
Meanwhile, the process node upgrade means that they can easily fit in the entire GX. On the Hollywood, the 3MB of 1T-SRAM took up half of the die. On the Latte, they’re just two small blocks in the corner (and one of them is even SRAM now). They could easily fit in the rest of the Hollywood in the remaining logic area above the big MEM1 block, and have the entire right hand half of the die left for the R7xx and the new SoC peripherals. Instead of the impractical task of implementing perfect hardware emulation on a completely different GPU architecture, they did the easy thing: throw the old hardware in there, and reuse the easy bits (TMEM and EFB) as handy MEM0 memory in Wii U mode. MEM0 is even mapped at the same address where the direct EFB access used to be on the GC and the Wii, so it probably is the same hardware, with the pixel format mapping turned off and instead presenting the memory as flat RAM. In Wii U mode, it is used mainly for the Cafe OS kernel.
DMCU
You may have noticed that good old VI (the display controller / Video Interface of ye olde GameCube and Wii) is gone. Huh? How does that work? Turns out they’re emulating that in software and translating its configuration to the R7xx’s CRTC registers. But where does this emulation software run? Sneaky: they added a special microcontroller to the Latte just for this purpose. The DMCU is a 68HC11 compatible 8-bit CPU whose sole purpose is to perform VI emulation. It has a frontend hardware shim that looks like VI to the Wii software, and behind the scenes it translates those registers to the Radeon’s, including upscaling configuration. cafe2wii loads the DMCU’s firmware (the DMCU doesn’t seem to be active in Wii U mode). It has its own dedicated RAM and access to both the faux-VI back side and the Radeon’s register area. There is also a little mailbox to talk to it from vWii mode: the System Menu and IOS use this to configure the 4:3 stretch mode for Virtual Console games (which is why The Homebrew Channel accidentally ends up in the wrong mode: after a recent update, the System Menu thinks it’s a Virtual Console title due to its title ID starting with ‘L’ and sets up that special mode). Presumably it also got an update when gamepad support for vWii showed up, though I haven’t looked at the firmware again since then.
Font ROM
The GC had a few bitmap fonts in its firmware ROM, for use by the BIOS / firmware and by games. This ROM was an external chip connected to the EXI (SPI-like) bus (actually the same chip as the RTC). The Wii got rid of the firmware part, but had to keep the fonts for backwards compatibility (and they were still in there in the RTC). The Wii U finally gets rid of the ROM, but Wii games might still use the fonts. What to do? Easy: they have an extra 8MB of MEM1 sitting around doing nothing in vWii mode. cafe2wii actually loads the old fonts into this area, and then presumably there’s a shim in front of the EXI hardware in the Latte that pulls the ROM data from there.
DI
The GC had a proprietary disc drive bus called DI. The Wii kept it, with minor changes. However, the Wii U finally switched to a more modern SATA port. The SATA peripheral is actually bog-standard AHCI to the Wii U software. However, vWii mode needs to talk to good old proprietary DI mode. How is this implemented? This one seems to be pure backwards compatibility hardware. cafe2wii turns on a DI compatibility mode in the AHCI controller. Presumably that enables the old DI registers and either funnels the old commands through SATA (which the drive then supports) or also translates commands (though the drive still needs to implement them - for example, the GC/Wii support reading on arbitrary 4-byte boundaries, not aligned with sectors, so the new drive must support that too).
WLAN
The Wii U has a different WLAN chipset. How could they possibly emulate an older Broadcom chipset in vWii mode (eeew, Broadcom)? They don’t. vWii IOS contains a totally new WLAN driver. This one leaks straight into the vWii sandbox and there’s no backwards compatibility. They can get away with it because (thankfully!) the WLAN driver is entirely contained in IOS, and IOS is also the sole manager of WLAN configuration (well, and the System Menu, but those settings are configured from Wii U mode on a Wii U), so games don’t know or care that it’s all different. On the plus side, that means that vWii mode should support 802.11n.
USB
The Wii U has 4 USB ports. The Wii had 2. How does this work? No, it’s not a built-in hub. Actually, the OHCI/EHCI controllers have registers that indicate the number of available ports, and the old drivers are perfectly happy with 4 ports instead of 2. This also goes for GhettOHCI: I wasn’t sure what to expect, but thankfully I didn’t hardcode the number of ports, and it Just Worked on the Wii U, reporting 4 ports instead of 2.
NAND
Everyone knows that vWii mode has the same old NAND flash storage as always, a dedicated 512MB of NAND just like on the old Wii, on a dedicated NAND Flash chip, containing the same old filesystem (but not boot1 or boot2, since they don’t need those in vWii mode, as cafe2wii boots it straight into IOS). What most people don’t know is that the “dedicated” part is a lie: Cafe OS also boots from and uses SLC NAND flash in addition to the 8GB or 32GB of eMMC storage. But there’s only one NAND flash chip. Is there? Kind of: there’s one NAND flash chip, but it has two NAND flash dies inside: 2x512MB, one for vWii mode, one for Wii U mode. There’s a separate chip enable pin for each. This is actually a pretty common arrangement for a NAND flash chip, though using each bank for a totally different OS is cute. This was confusing to figure out at first because the Samsung NAND flash datasheet for this part also applies to the single-bank 512MB version and others, and it is rather unclear about the specific layout of each particular part number (even iFixit got it wrong and listed it as 512MB, while in fact the specific part used is 1GB - or 8Gbit, hence the “08” in K9K8G08U1D).
SEEPROM
It is not shown in the block diagram above (because I ran out of space and I skipped miscellaneous hardware connected to GPIOs), but the Wii had a small serial EEPROM die built into the Hollywood multi-chip package (for mild security) and connected to some GPIOs. It was used to store the console’s certificate signature (signed by Nintendo, which cryptographically certifies the console as a real Wii), a few miscellaneous things (like the boot2 version number and the random number generator nonce counter), and, on Korean Wiis, the new Korean common key.
The Wii U still has a SEEPROM, but its contents have nothing to do with vWii mode. The old certificate data is actually in a new OTP memory bank on the Wii U, and cafe2wii just reads this and throws it at the end of the Starbuck’s SRAM where there was some free space. Then they hacked up vWii IOS to read the data from there instead of the real SEEPROM.
Keys, keys, keys
The Wii U has lots of keys: its OTP is 8 times the size of the Wii OTP (1KB in 8 banks of 128 bytes, instead of a single bank of 128 bytes). Incidentally, bank 0 is the vWii bank (and all the other banks are disabled in vWii mode, so it only gets to see the keys that it needs, which are the same ones that were present on Wiis). We posted SHA-1 hashes of a few of the important keys in the presentation slides, but here’s a more detailed description of what they are used for. Note that these are still SHA-1 hashes, not the actual keys.
Espresso vWii ancast key (11 days)
Found in the Espresso’s key fuses/OTP. Used to decrypt the vWii System Menu and the new NANDloader binaries (1-512 and 1-513) at load time. Disabled by the boot ROM until reset.
Espresso Wii U ancast key (11 days)
Found in the Espresso’s key fuses/OTP. Used to decrypt the Cafe OS kernel at load time. Disabled by the boot ROM until reset.
Note that the previous two hashes are contained in this file, and it is the SHA-1 hash of that file that we posted on the 11th day.
Wii U common key (30 days)
Found in the Starbuck’s OTP. Used to decrypt the specific title key for every Wii U application (this is done at installation time for system firmware and installable titles, and at load time for disc games). Note that Cafe OS and Starbuck binaries are double-encrypted with their own ancast keys too.
vWii common key (30 days)
Found in the Starbuck’s OTP. Used to decrypt the specific title key for vWii system updates (since the key is only needed at installation time, vWii mode doesn’t actually have access to it). Note that the System Menu and NANDloaders are double-encrypted with the vWii ancast key too.
Wii U ancast key (Clarification)
Found in the Starbuck’s OTP. Used to decrypt Starbuck binaries (Wii U IOS and cafe2wii). Unlike the Espresso keys, this one is enabled forever (except in vWii mode, of course), as the Starbuck boot0 really only runs at boot time, and Starbuck ancast binaries are simply parsed and decrypted by IOS itself when reloading.
Wii U boot1 key (not yet!)
Found in the Starbuck’s OTP. Used by boot0 to decrypt boot1. This key, and it alone, is selectively disabled in a special clear-only OTP mask register by boot0, and is not available after boot. We don’t have it yet, but we’re trying to get it with some cute side-channel attacks.
A few people have claimed that our work doesn’t qualify as having hacked the Wii U since we do not have this key yet. That doesn’t make any sense, though: We have full code execution in kernel mode in both the Espresso and the Starbuck, access to every other key (including the aforementioned ones as well as per-console storage encryption keys and the like), unrestricted access to Wii U mode hardware, etc. Stating that we haven’t hacked the Wii U because we don’t have the boot1 key is like saying that nobody has ever hacked any iPhone because nobody has ever extracted the GID Key. Either way, there’s a good chance we might be able to fish it out soon ;-).
EOF
That’s all for now! If you have any other technical questions, feel free to give us a shout in the comments and we’ll try to answer them if we can.