On Friday, the 13th of January 2012, ACM Queue published an article by Poul-Henning Kamp entitled ‘The CRYPTO-CS-SETI challenge: An Un-programming challenge’. In it, Kamp challenged his readers to attempt to disassemble a program for an unknown computer. In what we assume was an attempt at increased dramatic impact, he described a scenario where part of an extra-terrestrial computer is discovered, with only a memory storage device intact.
We first heard of the challenge on the morning of Saturday the 14th, and thought it sounded like fun. Within five days we had completely disassembled the program. In addition, we had accidentally identified the oh-so-terrestrial source of the code.
This is the first in a series of posts in which we’ll describe how we went about reverse-engineering the machine architecture using nothing but the binary blob and our wits.
Why
Before we start tackling the ‘how’, it’s important that we take a moment to discuss why we bothered to attempt the challenge.
Our little group is an eclectic collection of hackers and reverse-engineers from all over the world. While some of us have academic backgrounds, we are mostly just in it for the thrill of understanding. There is a certain cerebral rush in grokking how complex systems work. The knowledge that we have conquered a given system is an excellent reward.
We took up Kamp’s challenge mostly because it sounded like fun and was an excellent opportunity to hone our skills and practice our art form. Some people do crosswords or solve sudoku. We reverse.
Furthermore, most of us spend a good deal of our lives reading assembly for one of a dozen or more different architectures. From the outset, we were therefore more intrigued by the particular challenge of reverse-engineering the instruction set architecture than by the relatively menial task of understanding what the disassembled code does.
The fact that the block of code we’re going to look at is not from a fragment of alien technology is highly significant. We don’t know anything about how any potential aliens think, and we can make absolutely no assumptions about how they might model their computational problems, abstract machine operations or otherwise go about the task of writing code. It is simply impossible for us to imagine how they think.
On the other hand, as experienced reversers of earth-technology, we have a great deal of understanding of the ways that human engineers plan, model, abstract and build software and computing machines. In fact, 90% of reverse-engineering is not, as you might think, trying to understand how a particular device or bit of code works - it’s trying to understand how the person who developed the target thought about what he was building.
For the purposes of this challenge, we assumed:
- That the binary Kamp provided came from terrestrial sources.
- That the people who originally built the code and the machine it runs on were members of the CS community, and exposed to other machines and code.
- That it was from a functioning machine - i.e. that it had to ‘work’.
In a comment on the 14th, Kamp updated the problem description to include the information that the machine has an 8-bit data bus, and a 14-bit address bus. We don’t list this as an assumption, because by the time he posted this, we had already discovered it from the code itself.
These assumptions are important, because throughout, we suggest possible explanations and interpretations for specific byte sequences, which are firmly based on our own knowledge of existing human computers and code.
Our ‘method’, if so formal a word can be applied to the completely informal process we employed while attacking this challenge, is not entirely unlike the standard scientific approach to all things.
In general, we all sat around in an IRC channel, staring at different bits of the code. One of us would posit a theory about the interpretation of a given set of bytes, giving reasons why the theory seemed like a good one, and everyone else would then try to find either reinforcing or contradicting examples. The guiding principle was that any given theory about the encoding of instructions had to produce disassembly which made ‘sense’.
We hacked up a little trivial disassembler, which we could easily modify in order to ‘try out’ various ideas, to see if they felt like a good fit.
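To give a feel for what a ‘trivial, easily modified’ disassembler can look like, here is a minimal sketch in Python. It is not the tool we used, and everything in it is an assumption made for illustration: the opcode values, the mnemonics, the operand counts and the names `Theory`, `THEORY_A`, `THEORY_B` and `disassemble` are all invented, and none of them reflect the real encoding of the challenge binary. The point is only that each theory about the encoding lives in a small table, so trying out a new idea means changing one entry and reading the output again.

```python
from typing import Dict, List, Tuple

# A "theory" maps an opcode byte to (mnemonic, number of operand bytes).
# HYPOTHETICAL values: invented for illustration, not the real encoding.
Theory = Dict[int, Tuple[str, int]]

THEORY_A: Theory = {
    0x00: ("NOP", 0),
    0x01: ("LOAD", 1),
    0x02: ("STORE", 2),
    0x03: ("JUMP", 2),
}


def disassemble(blob: bytes, theory: Theory) -> List[str]:
    """Decode blob linearly under the given theory, one line per instruction."""
    lines = []
    pc = 0
    while pc < len(blob):
        opcode = blob[pc]
        # Bytes the theory cannot explain are shown as '???' so they stand out.
        mnemonic, n_operands = theory.get(opcode, ("???", 0))
        operands = blob[pc + 1 : pc + 1 + n_operands]
        text = " ".join(f"{b:02x}" for b in operands)
        lines.append(f"{pc:04x}: {mnemonic} {text}".rstrip())
        pc += 1 + n_operands
    return lines


# Trying out a revised theory is a one-line change to the table.
THEORY_B: Theory = {**THEORY_A, 0xFF: ("HALT", 0)}

if __name__ == "__main__":
    sample = bytes([0x01, 0x2A, 0x03, 0x00, 0x10, 0xFF])
    for line in disassemble(sample, THEORY_A):
        print(line)
    print("--")
    for line in disassemble(sample, THEORY_B):
        print(line)
```

Running the same bytes through two tables side by side makes it easy to see which theory leaves fewer unexplained bytes and which produces instruction sequences that read more naturally.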
Most importantly, we never stopped questioning our previous theories and assumptions. If something felt wrong, or produced instruction sequences which no longer made sense in light of newer information and assumptions, we examined all the theories involved, and dropped those which seemed to make the least sense.
The concept of code ‘making sense’ is hard to define. As seasoned reversers, we know what code looks like, how compilers work and what kinds of patterns they generate, and how developers approach certain problems. Code makes sense when it appears to accomplish a reasonable goal. For example, it doesn’t make sense to load a register with a value, and then immediately overwrite the register with a different value. Each instruction has to serve a purpose. This ‘code-sense’ is of course based on our understanding of human code, which is why the bit about assumptions before was so important.
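The ‘load a register, then immediately overwrite it’ example can be turned into a crude mechanical check. The sketch below is an illustration only, under stated assumptions: the `Insn` record, the register names and the `dead_writes` function are hypothetical, and we did our own sense-checking by eye rather than with a tool like this. It flags any instruction that writes a register which the very next instruction overwrites without reading.

```python
from typing import List, NamedTuple, Tuple


class Insn(NamedTuple):
    """A decoded instruction (hypothetical representation)."""
    mnemonic: str
    writes: Tuple[str, ...]  # registers this instruction writes
    reads: Tuple[str, ...]   # registers this instruction reads


def dead_writes(code: List[Insn]) -> List[int]:
    """Return indexes of instructions whose result is clobbered immediately.

    An instruction is flagged when the next instruction writes one of the
    same registers without reading it first.
    """
    suspicious = []
    for i in range(len(code) - 1):
        cur, nxt = code[i], code[i + 1]
        for reg in cur.writes:
            if reg in nxt.writes and reg not in nxt.reads:
                suspicious.append(i)
                break
    return suspicious


if __name__ == "__main__":
    listing = [
        Insn("LOAD", ("A",), ()),         # overwritten by the next LOAD
        Insn("LOAD", ("A",), ()),
        Insn("ADD", ("A",), ("A", "B")),  # reads A before writing it: fine
        Insn("STORE", (), ("A",)),
    ]
    print(dead_writes(listing))
```

A check like this is deliberately naive: it ignores branch targets, flags and other side effects, so a hit is a hint that a theory about the encoding deserves a second look, not proof that it is wrong.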
In this series of posts, we’ll sometimes quote (edited) portions of our IRC conversations in order to highlight the way in which we came to various conclusions.