As you’re likely aware, our team gave a talk
at the 30th Chaos Communication Congress on hacking the Wii U. This blog post
is a follow-up to the talk and contains clarifications, corrections, and
material that we couldn’t fit in the one-hour time slot.
If you haven’t yet, please watch the talk before reading the rest of this post:
Slides: [Online](/media/30c3-slides/) ·
[Download / source code](/media/30c3-console-hacking-2013-slides.tar.gz)
## The hardware
First, two corrections: the second core (core 1) of the Espresso contains 2MB
of L2 cache, and the GPU is a member of the R7xx family, but not specifically
the RV770. Sorry, I was tired and sleep-deprived when putting together that
slide and had two brain farts. The slide has been corrected in the online version.
There has been a lot of speculation about the hardware, mostly before the talk,
and even after it some people continue to be skeptical of our findings. How do
we know what makes the Wii U tick? The answer is simple: inspection of the
system firmware and runtime logs, and a little bit of common sense.
We used a simplified block diagram of the hardware for the presentation. Here's
a more detailed one which you might find more interesting (there may be a few
minor mistakes, but in general it should be pretty close). Hover over to see
the equivalent Wii block diagram, for comparison.
Larger: [Wii U](/media/img/blockdia_wiiu_full.png)
Let's look at the Espresso. Why a PPC750? Well, first of all, the Espresso has
to be fully compatible with the Broadway to run Wii software. Several people
have pointed out that many PowerPC cores are essentially fully backwards
compatible for user software. However, Wii games run on the bare metal, without
any OS. This means that the CPU needs to be 100% compatible at the system/OS
level too, down to the smallest detail of the highly model-specific
special-purpose registers. Wii software regularly messes with registers such as
HID0-HID4, which are Hardware-Implementation Dependent registers. Additionally,
the PPC750 line is the only range of PowerPC processors that implement Paired
Singles, an (outdated) SIMD implementation that was introduced with the Gekko
on the GameCube and which is not compatible with modern PowerPC SIMD, such as
AltiVec. These processors also implement other GameCube/Wii-specific features,
such as the Write-Gather Pipe (used to send commands to the GX) and the locked
L1 cache and DMA that were discussed in the talk. On top of that, because the
Espresso runs at the clock rate of the Wii in vWii mode (and no faster),
instruction timings must be identical (or possibly better) in all cases, lest
some Wii games run slower on vWii mode than on a real Wii.
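For those unfamiliar with Paired Singles: each 64-bit FPR holds two single-precision floats (ps0 and ps1), and the paired-single instructions operate on both halves at once. Here's a rough Python sketch of the semantics of two of them (an illustration of the idea, not an emulator):

```python
import struct

def to_single(x):
    # Round to single precision, as the 32-bit register halves would
    return struct.unpack('<f', struct.pack('<f', x))[0]

def ps_add(a, b):
    """ps_add: element-wise add of two (ps0, ps1) pairs."""
    return (to_single(a[0] + b[0]), to_single(a[1] + b[1]))

def ps_merge00(a, b):
    """ps_merge00: result takes ps0 from each operand -> (a.ps0, b.ps0)."""
    return (a[0], b[0])
```

This is the pre-AltiVec 2-wide SIMD style the Gekko introduced; modern PowerPC SIMD looks nothing like it, which is why compatibility demands a 750-family core.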
Now, it is true that the Espresso implements some new features - most obviously,
it is a multi-core, and thus SMP, system. I am not aware of any other SMP
PPC750 implementation (although that is possible
with vanilla PPC750, but inefficient). However, the Espresso was clearly
commissioned by Nintendo specifically for the Wii U, and they had the freedom
to make significant changes to the design. Most importantly, although the
individual Espresso cores are almost identical to the standard PPC750CL core,
the L2 cache subsystem and bus-interface subsystem have been significantly
redesigned. The design philosophy seems to have been to keep the 750 core as
untouched as possible, but graft in support for efficient SMP around it. As a
result, they've tacked MERSI support onto the cores and the new L2 subsystem
fully supports cache coherency between cores (and cache intervention).
Incidentally, the 60x bus seems to have been upgraded as well (they call it
60xe now, and judging by the die shots, it looks wider).
In fact, the SMPization of the 750 in the Espresso is not perfect. There
appears to be a bug that affects the load-and-reserve and store-conditional
instructions (lwarx/stwcx.; an explicit cache flush is required), which means that using SMP
Linux with them will require patching the kernel and libpthread to work around
it (and possibly other software that directly uses these instructions to e.g.
implement atomics). They would've never shipped such a blatant bug on a
general-purpose CPU, but I guess they called it good enough for a game console
since they can just work around it in the SDK (which they do: the Cafe OS
locking primitives have the workaround).
Since I mentioned MERSI and cache coherency, you may be wondering about the
point in the talk where I said that the Wii U is not a cache-coherent system.
In fact, the Espresso fully supports cache coherency, and there is coherency
between cores. However, as far as I can tell, the Latte does not implement this,
and memory is not coherent with regards to the Starbuck and other hardware
peripherals when performing DMA (although I have not verified that there really
is absolutely no support - but it's at least certainly true of memory during
bootrom execution, and since the Wii didn't implement coherency either, it
sounds like the kind of thing Nintendo wouldn't care about).
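To illustrate why non-coherent DMA matters, here's a deliberately simplified toy model (plain Python, nothing Wii U-specific): a write-back cache sits between the CPU and RAM, while the DMA engine reads RAM directly, so software must flush dirty lines before kicking off a transfer.

```python
class ToyCache:
    """Minimal write-back cache model: CPU writes land here, not in RAM."""
    def __init__(self, ram):
        self.ram = ram      # backing store (a bytearray standing in for RAM)
        self.dirty = {}     # address -> byte value not yet written back

    def cpu_write(self, addr, value):
        self.dirty[addr] = value          # stays in the cache until flushed

    def flush(self):
        for addr, value in self.dirty.items():
            self.ram[addr] = value        # write back to RAM
        self.dirty.clear()

def dma_read(ram, addr):
    # DMA bypasses the cache entirely: it only ever sees RAM
    return ram[addr]

ram = bytearray(16)
cache = ToyCache(ram)
cache.cpu_write(0, 0xAB)
stale = dma_read(ram, 0)   # DMA sees the stale RAM contents (0x00)
cache.flush()              # the explicit flush the hardware won't do for you
fresh = dma_read(ram, 0)   # now DMA sees 0xAB
```

On a coherent system the interconnect would snoop the DMA access and pull the data out of the cache automatically; without that, the flush is the programmer's problem.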
Recently, an anonymous contributor e-mailed me what appears to be a screenshot
of the [introduction page of the Espresso User Manual](https://marcansoft.com/transf/espresso_intro.png),
which seems to confirm most of what we had deduced. Please don't ask where it
came from; I was only sent the screenshot and the sender used an anonymous e-mail address.
Oh, and that whole out-of-order execution debate? The confusion arose due to the
myth that the PPC750 is in-order. It's a superscalar core: it does dispatch up
to 3 instructions per cycle and they can complete independently (and the results
are placed in a completion queue). That qualifies as out-of-order. It's not
particularly wide and definitely isn't nearly as aggressively out-of-order as
modern Intel and POWER cores are, though. The Espresso is just as out-of-order
as the Broadway and previous members of the 750 family. There's no upgrade
there: it's a (simple) out-of-order core and it always was. Go read the
[PPC750CL User's Manual](/media/files/ppc_750cl.pdf) if you want all the gory
details (it also has information on the formerly-Nintendo-proprietary
stuff like Paired Singles, DMA, Locked L1, Write-Gather Pipe, etc.).
What about the GPU? RV770 snafu aside (that was just a brain fart; I meant R7xx),
how do we know that it's of that family? As it turns out, the Cafe OS firmware
is littered with references to the R6xx. However, the R6xx and R7xx are almost
identical. ATI even published open documentation of their 3D engines
[together, as a single document](http://www.x.org/docs/AMD/R6xx_3D_Registers.pdf),
and the only real differences are a few details of the shaders. The register
information that we've looked up does mostly match the published open docs.
R7xx is clearly a minor evolution of the R6xx. Cafe OS also has references to
"GPU7", which seems to be another name that Nintendo uses for the GX2. With
the R7xx available (and as, essentially, a mostly compatible step up),
there would be no reason for Nintendo not to go with the R7xx as a base instead
of the R6xx. Together, all of these strongly imply that the GPU is indeed
an R7xx core and not an R6xx core. To those claiming that it supports newer
features (e.g. I've heard Tessellation and Stream Out support listed as
"evidence" of a newer core): go read the 3D engine doc linked above. Those
features were already there on R6xx/7xx.
Now, the Latte is a fully custom SoC that has nothing to do with any particular
member of the R7xx family, so there is no easy way to draw a parallel to any
particular released AMD GPU. Maybe the specs (shader count, etc.) match an
existing AMD chip, or maybe they don't. We're not particularly interested in
the GPU prowess wars, so we haven't poked at the GPU to try to figure out its exact specs.
### Clock frequencies
This one's easy: the clock speeds that I tweeted (CPU 1.243125GHz, GPU 549.999755MHz)
came straight from the Cafe OS system log. The Espresso doesn't support dynamic
clock scaling (it only has one PLL and it is configured via external pins and
only reconfigures itself on hardware reset), and I haven't seen any evidence of
them using power saving / clock scaling on the GPU. Both clock rates are what
you'd expect for a modern relayout of the 750 core on a 45nm process, and a
power-savey R7xx at 40nm. The talk about clockspeeds being raised with a
software upgrade is nonsense. I have no idea where that silly rumour came
from, but there's no truth to it. While it is possible for the clocks to be
reconfigured in software (like on most modern systems - and this is used for
vWii mode), it would make no sense for Nintendo to test and ship a living
room system at one clock speed and then later raise it (this only makes sense
on portable systems, where cores are often underclocked to save battery, but
tested at higher clock speeds). It certainly hasn't happened on the Wii U.
## vWii mode
vWii mode is actually pretty interesting, because different parts of the
backwards compatibility are implemented in different ways. Some of it is
pretty obvious (e.g. the RAM controller is reconfigured and only 64MB of RAM
are visible as MEM2), but some parts are much more interesting. Hover over the
block diagram above and see if you can spot the interesting ones!
Why is the GX a separate block and not a compatibility mode of the GX2? To be
fair, I don't have hard evidence that they do not share absolutely any hardware,
but this goes back to common sense. The GX is completely different from the
GX2/R7xx. Adapting the R7xx to look like a GX and be fully compatible at the
register level (remember, Wii games run on bare metal) is a humongous task, and
completely unrealistic to do and validate. If they were to do it that way,
you'd expect some kind of complex configuration, perhaps involving firmware or
special shaders, to switch to vWii mode - none of which is present in cafe2wii.
Meanwhile, the process node upgrade means that they can easily fit in the entire
GX. On the [Hollywood](https://marcansoft.com/transf/hollywood_annotated.jpg),
the 3MB of 1T-SRAM took up half of the die. On the
[Latte](https://marcansoft.com/transf/latte_annotated.jpg), they're just two
small blocks in the corner (and one of them is even SRAM now). They could
easily fit in the rest of the Hollywood in the remaining logic area above the
big MEM1 block, and have the entire right hand half of the die left for the
R7xx and the new SoC peripherals. Instead of the impractical task of
implementing perfect hardware emulation on a completely different GPU
architecture, they did the easy thing: throw the old hardware in there, and
reuse the easy bits (TMEM and EFB) as handy MEM0 memory in Wii U mode. MEM0 is
even mapped at the same address where the direct EFB access used to be on the
GC and the Wii, so it probably is the same hardware, with the pixel format
mapping turned off and instead presenting the memory as flat RAM. In Wii U mode,
it is used mainly for the Cafe OS kernel.
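The address reuse is easy to check: on the GC/Wii, the EFB was directly accessible at physical address 0x08000000, and MEM0 shows up at the same base in Wii U mode. A small sketch (the size constant is my approximation for the combined TMEM+EFB area, not a verified figure):

```python
EFB_BASE = 0x08000000    # GC/Wii direct EFB access window
MEM0_BASE = EFB_BASE     # Wii U MEM0 sits at the same physical base
MEM0_SIZE = 0x002E0000   # approx. TMEM+EFB combined; assumed, not verified

def in_mem0(addr):
    """Does a physical address fall inside the MEM0 window?"""
    return MEM0_BASE <= addr < MEM0_BASE + MEM0_SIZE
```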
You may have noticed that good old VI (the display controller / Video Interface
of ye olde GameCube and Wii) is gone. Huh? How does that work? Turns out
they're emulating that in software and translating its configuration to
the R7xx's CRTC registers. But where does this emulation software run? Sneaky:
they added a special microcontroller to the Latte just for this purpose. The
DMCU is a [68HC11](http://en.wikipedia.org/wiki/HC11) compatible 8-bit CPU
whose sole purpose is to perform VI emulation. It has a frontend hardware shim
that looks like VI to the Wii software, and behind the scenes it translates
those registers to the Radeon's, including upscaling configuration. cafe2wii
loads the DMCU's firmware (the DMCU doesn't seem to be active in Wii U mode). It
has its own dedicated RAM and access to both the faux-VI back side and the
Radeon's register area. There is also a little mailbox to talk to it from vWii
mode: the System Menu and IOS use this to configure the 4:3 stretch mode for
Virtual Console games (which is why The Homebrew Channel accidentally ends up
in the wrong mode: after a recent update, the System Menu thinks it's a
Virtual Console title due to its title ID starting with 'L' and sets up that
special mode). Presumably it also got an update when gamepad support for
vWii showed up, though I haven't looked at the firmware again since then.
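The title-ID mixup is easy to sketch: Wii title IDs are four ASCII characters, and the System Menu apparently keys the 4:3 stretch off the first one. A hedged Python sketch (the exact prefix set the System Menu checks is my guess; all the post establishes is that 'L', the Master System prefix, is in it):

```python
# Prefixes I *assume* the System Menu treats as Virtual Console;
# only 'L' is actually established by the HBC behaviour described above.
VC_PREFIXES = {'C', 'E', 'F', 'J', 'L', 'M', 'N', 'P', 'Q', 'X'}

def wants_43_stretch(title_id):
    """Would the System Menu enable the VC 4:3 stretch for this title?"""
    return title_id[0] in VC_PREFIXES

# The Homebrew Channel's post-update title ID starts with 'L',
# so it gets misclassified as a Virtual Console title:
wants_43_stretch('LULZ')
```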
### Font ROM
The GC had a few bitmap fonts in its firmware ROM, for use by the
BIOS / firmware and by games. This ROM was an external chip connected to the
EXI (SPI-like) bus (actually the same chip as the RTC). The Wii got rid of
the firmware part, but had to keep the fonts for backwards compatibility (and
they were still in there in the RTC). The Wii U finally gets rid of the
ROM, but Wii games might still use the fonts. What to do? Easy: they have
an extra 8MB of MEM1 sitting around doing nothing in vWii mode. cafe2wii
actually loads the old fonts into this area, and then presumably there's a shim
in front of the EXI hardware in the Latte that pulls the ROM data from there.
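In other words, something in front of the EXI interface presumably remaps font-ROM reads to the copy in MEM1. A toy sketch of what such a shim would do (both address constants are invented for illustration; the real offsets are unknown to me):

```python
FONT_ROM_OFFSET = 0x1AFF00    # where the fonts sat in the old EXI ROM (made up)
FONT_COPY_BASE = 0x13400000   # where cafe2wii parks the fonts in MEM1 (made up)

def shim_read(memory, rom_offset, length):
    """Redirect an EXI font-ROM read to the MEM1 copy of the fonts.

    memory maps physical address -> byte (a toy stand-in for RAM).
    """
    base = FONT_COPY_BASE + (rom_offset - FONT_ROM_OFFSET)
    return bytes(memory[base + i] for i in range(length))
```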
The GC had a proprietary disc drive bus called DI. The Wii kept it, with minor
changes. However, the Wii U finally switched to a more modern SATA port. The
SATA peripheral is actually bog-standard AHCI to the Wii U software. However,
vWii mode needs to talk to good old proprietary DI mode. How is this
implemented? This one seems to be pure backwards compatibility hardware.
cafe2wii turns on a DI compatibility mode in the AHCI controller. Presumably
that enables the old DI registers and either funnels the old commands through
SATA (which the drive then supports) or also translates commands (though the
drive still needs to implement them - for example, the GC/Wii support reading
on arbitrary 4-byte boundaries, not aligned with sectors, so the new drive
must support that too).
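The alignment point is worth spelling out: a DI read can start at any 4-byte boundary, while sector-based transfers work in whole sectors, so whatever services the old commands has to over-read and trim. A sketch of that conversion (sector size assumed to be 2048 bytes, as on DVD):

```python
SECTOR = 2048  # DVD logical sector size

def di_read(read_sectors, offset, length):
    """Service a 4-byte-aligned DI read using whole-sector transfers.

    read_sectors(lba, count) -> bytes is whatever the drive provides.
    """
    assert offset % 4 == 0 and length % 4 == 0
    first = offset // SECTOR
    last = (offset + length - 1) // SECTOR
    data = read_sectors(first, last - first + 1)   # over-read whole sectors
    skip = offset - first * SECTOR
    return data[skip:skip + length]                # trim to the requested range
```

Whether this trimming happens in the drive or in the Latte-side compatibility hardware is exactly the open question above.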
The Wii U has a different WLAN chipset. How could they possibly emulate an
older Broadcom chipset in vWii mode (eeew, Broadcom)? They don't. vWii IOS
contains a totally new WLAN driver. This one leaks straight into the vWii
sandbox and there's no backwards compatibility. They can get away with it
because (thankfully!) the WLAN driver is entirely contained in IOS, and
IOS is also the sole manager of WLAN configuration (well, and the System Menu,
but those settings are configured from Wii U mode on a Wii U), so games don't
know or care that it's all different. On the plus side, that means that vWii
mode should support 802.11n.
The Wii U has 4 USB ports. The Wii had 2. How does this work? No, it's not a
built-in hub. Actually, the OHCI/EHCI controllers have registers that indicate
the number of available ports, and the old drivers are perfectly happy with
4 ports instead of 2. This also goes for GhettOHCI: I wasn't sure what to
expect, but thankfully I didn't hardcode the number of ports, and it Just Worked
on the Wii U, reporting 4 ports instead of 2.
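Concretely, in OHCI the root-hub port count lives in the NumberDownstreamPorts field (bits 7:0) of the HcRhDescriptorA register, so a driver that reads it instead of hardcoding a count works on both consoles:

```python
def ohci_port_count(hc_rh_descriptor_a):
    """NumberDownstreamPorts is bits 7:0 of HcRhDescriptorA (per the OHCI spec)."""
    return hc_rh_descriptor_a & 0xFF
```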
Everyone knows that vWii mode has the same old NAND flash storage as always,
a dedicated 512MB of NAND just like on the old Wii, on a dedicated NAND Flash
chip, containing the same old filesystem (but not boot1 or boot2, since they
don't need those in vWii mode, as cafe2wii boots it straight into IOS).
What most people don't know is that the "dedicated" part is a lie: Cafe OS
also boots from and uses SLC NAND flash in addition to the 8GB or 32GB of
eMMC storage. But there's only one NAND flash chip. Is there? Kind of: there's
one NAND flash chip, but it has two NAND flash dies inside: 2x512MB, one for
vWii mode, one for Wii U mode. There's a separate chip enable pin for each.
This is actually a pretty common arrangement for a NAND flash chip, though
using each bank for a totally different OS is cute. This was confusing to
figure out at first because the Samsung NAND flash
[datasheet](http://wiiubrew.org/w/images/e/e1/K9k8g08u1d.pdf) for this
part also applies to the single-bank 512MB version and others, and it is
rather unclear about the specific layout of each particular part number (even
iFixit got it wrong and listed it as 512MB, while in fact the specific part
used is 1GB - or 8Gbit, hence the "8G" in K9K8G08U1D).
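The part-number arithmetic, for the avoidance of doubt:

```python
die_bytes = 512 * 1024**2   # each die: 512 MiB
dies = 2                    # K9K8G08U1D: two dies, two chip-enable pins
total_bits = dies * die_bytes * 8
# 2 * 512 MiB * 8 = 8 Gibit, i.e. the part's "8G" density code
```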
It is not shown in the block diagram above (because I ran out of space and I
skipped miscellaneous hardware connected to GPIOs), but the Wii had a small
serial EEPROM die built into the Hollywood multi-chip package (for mild
security) and connected to some GPIOs. It was used to store the console's
certificate signature (signed by Nintendo, which cryptographically certifies
the console as a real Wii), a few miscellaneous things (like the boot2 version
number and the random number generator nonce counter), and, on Korean Wiis, the
new Korean common key.
The Wii U still has a SEEPROM, but its contents have nothing to do with vWii
mode. The old certificate data is actually in a new OTP memory bank on the
Wii U, and cafe2wii just reads this and throws it at the end of the Starbuck's
SRAM where there was some free space. Then they hacked up vWii IOS to read the
data from there instead of the real SEEPROM.
## Keys, keys, keys
The Wii U has lots of keys: its OTP is 8 times the size of the Wii OTP (1KB
in 8 banks of 128 bytes, instead of a single bank of 128 bytes).
Incidentally, bank 0 is the vWii bank (and all the other banks are disabled in
vWii mode, so it only gets to see the keys that it needs, which are the same
ones that were present on Wiis). We posted SHA-1 hashes of a few of the
important keys in the presentation slides, but here's a more detailed
description of what they are used for. Note that these are still SHA-1 hashes,
not the actual keys.
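Publishing hashes instead of keys still lets anyone who independently obtains a candidate key check it against our list. With Python's hashlib:

```python
import hashlib

def matches(candidate_key, published_sha1_hex):
    """Check a candidate key against a published SHA-1 hash."""
    return hashlib.sha1(candidate_key).hexdigest() == published_sha1_hex.lower()
```

The hash reveals nothing useful about the key itself, but makes any claimed key instantly verifiable.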
### Espresso vWii ancast key ([11 days](/blog/2012/11days.html))
Found in the Espresso’s key fuses/OTP. Used to decrypt the vWii System Menu and
the new NANDloader binaries (1-512 and 1-513) at load time. Disabled by the
boot ROM until reset.
### Espresso Wii U ancast key ([11 days](/blog/2012/11days.html))
Found in the Espresso’s key fuses/OTP. Used to decrypt the Cafe OS kernel at
load time. Disabled by the boot ROM until reset.
Note that the previous two hashes are contained in
this file, and it is
the SHA-1 hash of that file that we
posted on the 11th day.
### Wii U common key (30 days)
Found in the Starbuck’s OTP. Used to decrypt the specific title key for every
Wii U application (this is done at installation time for system firmware and
installable titles, and at load time for disc games). Note that Cafe OS and
Starbuck binaries are double-encrypted with their own ancast keys too.
### vWii common key (30 days)
Found in the Starbuck’s OTP. Used to decrypt the specific title key for vWii
system updates (since the key is only needed at installation time, vWii mode
doesn’t actually have access to it). Note that the System Menu and NANDloaders
are double-encrypted with the vWii ancast key too.
### Starbuck ancast key (30 days)
Found in the Starbuck’s OTP. Used to decrypt Starbuck binaries (Wii U IOS and
cafe2wii). Unlike the Espresso keys, this one is enabled forever (except in
vWii mode, of course), as the Starbuck boot0 really only runs at boot time,
and Starbuck ancast binaries are simply parsed and decrypted by IOS itself when loaded.
### Wii U boot1 key (not yet!)
Found in the Starbuck’s OTP. Used by boot0 to decrypt boot1. This key, and it
alone, is selectively disabled in a special clear-only OTP mask register by
boot0, and is not available after boot. We don’t have it yet, but we’re trying
to get it with some cute side-channel attacks.
A few people have claimed that our work doesn’t qualify as having hacked the
Wii U since we do not have this key yet. That doesn’t make any sense, though:
We have full code execution in kernel mode in both the Espresso and the
Starbuck, access to every other key (including the aforementioned ones as well
as per-console storage encryption keys and the like), unrestricted access to Wii
U mode hardware, etc. Stating that we haven’t hacked the Wii U because we don’t
have the boot1 key is like saying that nobody has ever hacked any iPhone because
nobody has ever extracted the GID Key.
Either way, there’s a good chance we might be able to fish it out soon ;-).
That’s all for now! If you have any other technical questions, feel free to
give us a shout in the comments and we’ll try to answer them if we can.