plaidCTF 2014 - paris (re300)
paris
Reversing (300 pts)
-------------------
This binary was found on some of our Windows machines. It's got The
Plague written all over it. What secrets are contained inside?
We are greeted by a Windows executable. Since I hate Windows and I can’t be arsed to pull up a Windows VM and debugger, I decided to solve this one statically. Time to load it into IDA.
After some boring boilerplate in the main function, there’s a call to this main loop:
.text:00402066 mainloop proc near
.text:00402066 xor ecx, ecx
.text:00402068
.text:00402068 loc_402068:
.text:00402068 push offset seh_handler
.text:0040206D push large dword ptr fs:0
.text:00402074 mov large fs:0, esp
.text:0040207B mov eax, 0
.text:00402080 mov [eax], eax
.text:00402082 sub edi, 1111h
.text:00402088 pop large dword ptr fs:0
.text:0040208F add esp, 4
.text:00402092 cmp done, 1
.text:00402099 jz nullsub_1
.text:0040209F jmp short loc_402068
.text:0040209F mainloop endp
.text:0040209F
I’ve had the unfortunate luck of bashing my head against pthreads compabitility
issues in Wine twice before (once for winepulse and twice for gstreamer), so
I knew that fs:0
is supposed to be a pointer to the
Thread Information Block.
The first field is the SEH
frame. Thus, the code does this:
- Sets up a new exception handler pointing to the
seh_handler
function - Dereferences a NULL pointer
- Subtracts
0x1111
fromedi
- Removes the exception handler
- Loops while
done
is zero
The subtraction immediately jumps out as a WTF. We’ll see what it does later on.
The seh_handler
simply iterates once through a table of function pointers
every time it is called: on the first exception it runs the first entry, then
the second entry, etc. There are 20 entries or states.
Now, in my mind, this started to stink of being a VM of some kind, but there’s something odd. Normally, in a VM, you’ll fetch the opcode, then index into a table of instruction handlers. Here, the table is always traversed in order. Let’s continue and see where this takes us.
What do the 20 functions do? First they call one of three functions:
init_onebyte
, init_twobytes
, init_threebytes
.
.text:0040247C state_0 proc far
.text:0040247C call init_onebyte
.text:00402481 sub eax, 0
.text:00402484 setz al
.text:00402487 jmp near ptr state_end
.text:00402487 state_0 endp
After looking up the arguments passed to SEH handlers (turns out the code wants to use the ContextRecord structure), this is what we get (note that some of the names are spoilers and I only added them later).
.text:00401C91 init_common proc near
.text:00401C91
.text:00401C91
.text:00401C91 crap1 = dword ptr 4
.text:00401C91 crap2 = dword ptr 8
.text:00401C91 ExceptionRecord = dword ptr 0Ch
.text:00401C91 EstablisherFrame = dword ptr 10h
.text:00401C91 ContextRecord = dword ptr 14h
.text:00401C91 DispatcherContext = dword ptr 18h
.text:00401C91
.text:00401C91 mov esi, [esp+ContextRecord]
.text:00401C95 mov edi, offset ContextRecordBuffer
.text:00401C9A mov ecx, 34h
.text:00401C9F rep movsd
.text:00401CA1 mov esi, offset ram
.text:00401CA6 mov edi, offset ram_save
.text:00401CAB mov ecx, 100h
.text:00401CB0 rep movsd
.text:00401CB2 mov esi, offset stack
.text:00401CB7 mov edi, offset stack_save
.text:00401CBC mov ecx, 80h
.text:00401CC1 rep movsd
.text:00401CC3 mov al, eq_flag
.text:00401CC8 mov eq_flag_save, al
.text:00401CCD mov al, done
.text:00401CD2 mov done_save, al
.text:00401CD7 mov ax, program_counter
.text:00401CDD mov program_counter_save, ax
.text:00401CE3 mov ax, stack_pointer
.text:00401CE9 mov stack_pointer_save, ax
.text:00401CEF retn
.text:00401CEF init_common endp
.text:00401CF0 init_onebyte proc near
.text:00401CF0
.text:00401CF0 call init_common
.text:00401CF5 movzx eax, program_counter
.text:00401CFC mov ebx, eax
.text:00401CFE inc ebx
.text:00401CFF mov program_counter, bx
.text:00401D06 add eax, offset rom
.text:00401D0B movzx eax, byte ptr [eax]
.text:00401D0E mov ebx, eax
.text:00401D10 shr eax, 3
.text:00401D13 and ebx, 7
.text:00401D16 shl ebx, 1
.text:00401D18 retn
.text:00401D18 init_onebyte endp
.text:00401D19 init_twobytes proc near
.text:00401D19
.text:00401D19 call init_common
.text:00401D1E movzx ebx, program_counter
.text:00401D25 mov eax, ebx
.text:00401D27 add eax, 2
.text:00401D2A mov program_counter, ax
.text:00401D30 add ebx, offset rom
.text:00401D36 mov ah, [ebx]
.text:00401D38 mov al, [ebx+1]
.text:00401D3B mov ebx, eax
.text:00401D3D mov ecx, eax
.text:00401D3F shr eax, 6
.text:00401D42 shr ebx, 3
.text:00401D45 and ebx, 7
.text:00401D48 shl ebx, 1
.text:00401D4A and ecx, 7
.text:00401D4D shl ecx, 1
.text:00401D4F retn
.text:00401D4F init_twobytes endp
.text:00401D50 init_threebytes proc near
.text:00401D50
.text:00401D50 call init_common
.text:00401D55 movzx ebx, program_counter
.text:00401D5C mov eax, ebx
.text:00401D5E add eax, 3
.text:00401D61 mov program_counter, ax
.text:00401D67 add ebx, offset rom
.text:00401D6D mov ah, [ebx]
.text:00401D6F shl eax, 8
.text:00401D72 mov al, [ebx+1]
.text:00401D75 mov ah, [ebx+2]
.text:00401D78 mov ebx, eax
.text:00401D7A mov ecx, eax
.text:00401D7C shr eax, 13h
.text:00401D7F shr ebx, 10h
.text:00401D82 and ebx, 7
.text:00401D85 shl ebx, 1
.text:00401D87 and ecx, 0FFFFh
.text:00401D8D retn
.text:00401D8D init_threebytes endp
init_<n>bytes
sure seem like they’re parsing an instruction encoding, but
running every possible instruction in a loop doesn’t make any sense. Further,
init_common
seems to be saving the state of the world to some temporary
buffers.
Let’s look at state_end
:
.text:00401FCA state_end proc far
.text:00401FCA
.text:00401FCA
.text:00401FCA arg_4 = dword ptr 0Ch
.text:00401FCA
.text:00401FCA mov edx, [esp+arg_4]
.text:00401FCE cmp al, 1
.text:00401FD0 jz loc_402467
.text:00401FD6 mov esi, offset ContextRecordBuffer
.text:00401FDB mov edi, edx
.text:00401FDD mov ecx, 34h
.text:00401FE2 rep movsd
.text:00401FE4 mov esi, offset ram_save
.text:00401FE9 mov edi, offset ram
.text:00401FEE mov ecx, 100h
.text:00401FF3 rep movsd
.text:00401FF5 mov esi, offset stack_save
.text:00401FFA mov edi, offset stack
.text:00401FFF mov ecx, 80h
.text:00402004 rep movsd
.text:00402006 mov al, eq_flag_save
.text:0040200B mov eq_flag, al
.text:00402010 mov al, done_save
.text:00402015 mov done, al
.text:0040201A mov ax, program_counter_save
.text:00402020 mov program_counter, ax
.text:00402026 mov ax, stack_pointer_save
.text:0040202C mov stack_pointer, ax
.text:00402032 jmp loc_402467
.text:00402032 state_end endp
Looks like it optionally restores all the data that was saved by init_common
,
but leaves it alone if al
is 1.
After staring at a couple of other state functions, it clicked: this is a VM indeed, but it’s a rollback-oriented VM. The VM decodes and runs every possible instruction that might be executed - and then checks the opcode after the fact. If the opcode was not the correct one for this instruction, then the state of the VM is rolled back to the state before the execution. Evil.
So what are we dealing with? This is a 16-bit VM with 8 registers that map to 4 32-bit x86 registers in the code outside the SEH. Instructions are one to three bytes long. It has a Harvard architecture (ROM and RAM are separate), and there’s also a dedicated data (not function call!) stack and an EQ flag. The registers map as follows:
r0 edi[15:0]
r1 edi[31:16]
r2 esi[15:0]
r3 esi[31:16]
r4 ebx[15:0]
r5 ebx[31:16]
r6 edx[15:0]
r7 edx[31:16]
Since the VM’s registers are the x86 registers of the faulted code, and edi is r1:r0, r1:r0 are decremented by 0x00001111 every machine cycle (not every instruction!) and the number of machine cycles between instructions depends on their distance in the handler table. Thankfully this only matters at one point in the VM’s code.
After reading through all the instruction handlers, this is the instruction set that I came up with:
___b0___ ___b1___ ___b2___ ID OP WTF
00000xxx 0 NOP
10000000 01sssddd 1 MOV Rd, Rs
10000000 10sssddd 2 STW [Rd], Rs Store Word (little-endian)
10000000 11sssddd 3 LWX Rd, [Rs] Load Word Exchanged (big-endian!)
10011ddd llllllll llllllll 4 MOV Rd, #lit Note that lit is little-endian
00100110 00sssddd 5 ADD Rd, Rs
00100110 01sssddd 6 SUB Rd, Rs Rd -= Rs
00100110 10sssddd 7 XOR Rd, Rs
00100110 11sssddd 8 AND Rd, Rs
01001ddd 9 SRB Rd Rd >>= 8
10101ddd 10 NOT Rd Rd = ~Rd
00010ddd 11 INC Rd
00001111 11aaabbb 12 CMP Ra, Rb EQ = Ra == Rb
11111xxx llllllll llllllll 13 JMP #lit Absolute
11101xxx llllllll llllllll 14 JEQ #lit If EQ == 1
00111sss 15 PSH Rs Dedicated stack
11000ddd 16 POP Rd Dedicated stack
11011ddd 17 SWP Rd Rd = (Rd>>8) | (Rd<<8) (Byte swap)
01010sss 18 WTF Rs ram[0x200..0x3ff] ^= rom[0x6c + Rs]
10100xxx 19 HLT End program
And this is the memory map of the VM:
ROM
0x00..0x6b program
0x6c..0x16b p-box
RAM
0x000..0x0ff password input
0x100..0x1ff expected stack contents
0x200..0x3ff s-box
I wrote a quick and dirty disassembler for the instruction set. This is the disassembled code that runs on the VM:
start:
NOP
NOP
NOP
MOV r2, 0x3133 ; NULL terminator after mangling
MOV r3, 0x0 ; password index
MOV r4, 0xff00 ; masks
MOV r5, 0xff
pw_loop:
; load two bytes of the password
; This is where that ugly edi -= 0x1111 in mainloop matters.
; There are 18 instructions between the slots for LWX and reg MOV,
; so 18*0x1111 = 0x13332 = 0x3332 (for r0)
; r7 = ~(swap16(swap16(ram[r3]) - 0x3332))
LWX r0, [r3]
; !!!! r0 -= 0x3332
MOV r7, r0
SWP r7
NOT r7
CMP r2, r7
JEQ read_done ; end of string
; XOR both (mangled) bytes together
MOV r6, r7
AND r6, r4 ;0xff00
AND r7, r5 ;0x00ff
SRB r6
XOR r7, r6
; r7 = sbox[r7] (16b)
MOV r6, 0x200
ADD r7, r7
ADD r6, r7
LWX r7, [r6]
SWP r7
; r7 ^= peek()
; push(r7)
POP r6
XOR r7, r6
PSH r6
PSH r7
; sbox[0..0x200] ^= pbox[r3]
WTF r3
; inc r3 and loop
INC r3
JMP pw_loop
read_done:
XOR r7, r7
MOV r2, 0x100
MOV r6, 0xaf21
checkloop:
LWX r5, [r2]
SWP r5
INC r2
INC r2
POP r3
CMP r6, r5
JEQ check_last
CMP r3, r5
JEQ checkloop
MOV r3, 0x0
MOV r2, 0x0
HLT
check_last:
MOV r5, 0x5a4d
CMP r5, r3
JEQ success
MOV r3, 0x0
MOV r2, 0x0
HLT
success:
MOV r3, 0xdead
MOV r2, 0xbeef
HLT
The program essentially mangles the passphrase and then compares it with a constant. Running the process backwards recovers the original expected password. I didn’t bother reversing the first part of the mangling process, so instead I just brute force all possible 256 character choices and pick whichever one matches the expected output. This is the code that both implements the disassembler and recovers the flag:
rom = [int(x,16) for x in """
00 00 00 9A 33 31 9B 00 00 9C 00 FF 9D FF 00 80
D8 80 47 DF AF 0F D7 EF 37 00 80 7E 26 E6 26 EF
4E 26 B7 9E 00 02 26 3F 26 3E 80 F7 DF C6 26 B7
3E 3F 53 13 FF 0F 00 26 BF 9A 00 01 9E 21 AF 80
D5 DD 12 12 C3 0F F5 EF 56 00 0F DD EF 3F 00 9B
00 00 9A 00 00 A7 9D 4D 5A 0F EB EF 65 00 9B 00
00 9A 00 00 A7 9B AD DE 9A EF BE A7""".split()]
pbox = [int(x,16) for x in """A9 2D F2 6D
2E 34 AA 55 7A C3 94 CC A2 11 D8 B9 A5 F0 E2 8C
54 CB 5D 18 D8 79 5F 3A 15 9E DA EA FC 77 2B 91
4F 21 29 26 1F 60 8F C4 BE 63 87 D8 81 1E 3F 76
E8 61 EB 94 F6 FA 74 47 FB 52 BA 53 7C 59 6F 51
3E C8 EE 2F 3A 69 80 1A CF 74 60 CD 0F C9 72 C7
F9 45 AD 91 45 95 45 14 CF F5 57 6F 39 5A D8 3C
DF 96 F0 CE 90 BE 29 8E FE 67 D7 7B 8D 4F 22 D9
7A 76 47 98 50 4A F7 47 4C 92 63 44 98 D9 34 2D
F8 C2 95 CA D4 BC 89 C6 98 64 16 BC AD E2 0E FD
D0 58 6D 75 C9 10 D6 5B 0F 2A BB CF 32 3D B4 4A
FF 36 B5 D2 27 4A 91 B8 A6 0C 33 3A 35 F2 66 39
7F 7A FB 4B 35 41 1E C2 50 E1 4F D5 60 B4 1E 7D
E4 35 DC FC 3B A9 78 F5 66 AD A0 5E 93 17 DB 99
59 61 86 2F 6F 63 F8 F6 EF FB 94 47 9B 17 D8 5D
08 26 40 E9 1C 73 F5 1A 4D B4 85 02 E9 CF CF 14
65 CA 74 E7 F9 9D B6 1A C1 A7 F2 94
""".split()]
ram = [int(x,16) for x in """
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0B 2E 02 6D 92 74 0C 87 B9 93 B3 ED 2C 31 07 71
10 7D 07 20 C6 E7 1B 3A D8 BA 17 94 6B FA 6C BE
1D 62 3B 4D AD 47 7A 7A 9D 3E A2 53 2F F2 A9 D1
74 F5 73 81 BC 11 15 AE 79 61 21 AF 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
F2 AE 8B 4F 49 C3 E5 A3 A3 1A 2D A5 90 7A 8E 08
E5 0F 41 61 52 D7 40 C0 78 A9 5D 2B 8B C4 33 C2
42 CB 22 81 77 3E 09 EF CB 6C CF B0 F4 C4 75 10
C1 93 AE D9 1D EC E8 11 16 DC 2D 3B 25 6C B8 C5
C3 C1 1C A9 18 85 E1 AB 1D D0 0C A0 B9 43 97 E3
85 B5 DB 20 48 FC 1E CD 79 5B BB BD 02 E3 32 E4
63 C0 1D 95 CA 48 63 9D 87 22 AA E9 7E 26 26 0A
9B 37 50 54 CC 61 13 7E 5A F8 BF F7 45 45 82 AB
1C 6F AB E8 27 48 07 25 71 AB 42 29 73 C3 55 E4
54 F6 0B D6 17 39 B5 AE E8 5C 85 05 E5 2C DD 3C
3B DA 09 7D C4 42 86 12 8B 2B 72 99 F8 86 9D C3
F1 65 85 2E 05 53 6F 48 ED 13 9C AB 68 44 57 03
6A A6 D7 9E B9 E6 16 A3 27 F6 CC 5A C6 CB EA 06
05 CA 4F 95 90 93 72 42 CE A3 08 1A 47 2A B8 D0
76 29 86 24 62 E5 32 D0 7D B2 0D C7 AD A7 EE 98
26 B9 DD 9D 57 A6 8B E9 13 3E 42 0E 3D AF E4 19
2B 7D 17 90 C4 52 9E 5B D4 96 5F 6D DC E7 08 99
2A 0B 54 E5 A4 E0 A1 7C 81 56 F2 02 76 46 4C B6
DA E3 F6 4A C0 07 1B 6B 16 8A 0F 6A 0D 18 F6 8C
CC B3 C2 6A BD E9 3D 0E F6 15 7C 4C 57 61 F2 A9
08 5B 06 72 97 9B FB F6 B0 7D 89 69 AF 3D FD 93
BD A0 C5 24 D1 E3 20 87 E9 BC 56 8D 2D 7A B3 66
5C E9 DA C9 AE 4B B0 53 15 8F F2 F1 99 A3 3B 28
CB 5B 9C 31 EB 7B C2 F1 5B 8C BF DC C5 66 B9 B2
A6 DC 26 52 39 30 64 55 9B 4B 00 E1 41 E0 B1 02
55 DE C9 EA 27 2D 45 D9 27 D2 17 3E 88 B4 3E BD
B0 E4 25 68 65 9B AB 0D FB A3 2C DC CF 58 98 58
EB EA 71 05 E1 60 95 56 F0 E6 34 3B 7D 28 65 45
70 02 37 AD 2A 70 B0 46 EF 9C F8 C0 56 2D 49 3A
C9 19 F7 B1 46 28 EF 64 01 07 58 BE EC E7 B4 8D
D6 B1 AC 9E F4 12 9E BB 7A 33 39 93 82 38 82 34
8C C3 00 88 8E 12 9C C3 4D 62 2F DC 7C 5A AA A5
""".split()]
INSTR1 = {
0x00: "NOP",
0x09: "SRB r%(b)d",
0x15: "NOT r%(b)d",
0x02: "INC r%(b)d",
0x07: "PSH r%(b)d",
0x18: "POP r%(b)d",
0x1b: "SWP r%(b)d",
0x0a: "WTF r%(b)d",
0x14: "HLT",
}
INSTR2 = {
0x201: "MOV r%(c)d, r%(b)d",
0x202: "STW [r%(c)d], r%(b)d",
0x203: "LWX r%(c)d, [r%(b)d]",
0x098: "ADD r%(c)d, r%(b)d",
0x099: "SUB r%(c)d, r%(b)d",
0x09a: "XOR r%(c)d, r%(b)d",
0x09b: "AND r%(c)d, r%(b)d",
0x03f: "CMP r%(b)d, r%(c)d",
}
INSTR3 = {
0x13: "MOV r%(b)d, #0x%(c)x",
0x1f: "JMP loc_%(c)x",
0x1d: "JEQ loc_%(c)x",
}
ops = []
def disasm(rom):
labels = set()
i = 0
while i < len(rom):
a = rom[i] >> 3
if a in (i>>5 for i in INSTR2):
a = (rom[i] << 2) | (rom[i+1] >> 6)
b = (rom[i+1] >> 3) & 7
c = rom[i+1] & 7
ops.append((i, INSTR2[a] % {"b":b, "c":c}))
i += 2
elif a in INSTR3:
a = rom[i] >> 3
b = rom[i] & 7
c = rom[i+1] | (rom[i+2] << 8)
ops.append((i, INSTR3[a] % {"b":b, "c":c}))
if (a in (0x1f, 0x1d)):
labels.add(c)
i += 3
elif a in INSTR1:
a = rom[i] >> 3
b = rom[i] & 7
ops.append((i, INSTR1[a] % {"b":b}))
i += 1
else:
raise Exception("Undefined minor opcode 0x%02x at 0x%x" % (a, i))
#print ops[-1]
print "start:"
for addr, op in ops:
if addr in labels:
print "loc_%x:" % addr
print " " + op
def load16(b):
l = []
for i in xrange(0, len(b), 2):
l.append(b[i] | (b[i+1] << 8))
return l
# Disassemble the rom...
disasm(rom)
# Work backwards
estack = []
sboxes = []
# Load expected data into estack
for i in xrange(0x100, 0x200, 2):
w = ram[i] | (ram[i+1] << 8)
if w == 0xaf21:
break
estack.append(w)
# Generate as many sboxes as needed, performing the XOR mangling
for i in xrange(len(estack)):
sboxes.append(load16(ram[0x200:0x400]))
for j in xrange(0x200, 0x400):
ram[j] ^= pbox[i]
# Last element in the expected stack
# (this is the element that is already in the stack at program start)
estack.append(0x5a4d)
# Work backwards to obtain flag
lb = 0 # Previous (next) output byte (0 = NULL terminator)
out = ""
while len(estack) > 1:
# Get expected word from stack
x = estack.pop(0)
# XOR with the previous word
x ^= estack[0]
# Pop the last sbox
sbox = sboxes.pop()
# Undo the sbox (this will raise an exception if fail)
x = sbox.index(x)
# Brute force the final mangling instead of inverting the function...
for i in xrange(256):
# Big endian load together w/ last byte
v = (i << 8) | lb
# EDI mangle (r0 -= 0x3332)
v = (v - 0x3332) & 0xffff
# Swap bytes again
v = (v >> 8) | ((v << 8) & 0xff00)
# Invert
v = (~v) & 0xffff
# XOR bytes together
v = (v >> 8) ^ (v & 0xff)
# Hope we can find a match...
if v == x:
lb = i
out += chr(i)
break
# Reverse flag
print out[::-1]
Result: V1rTu4L_M4ch1n3s_4r3_Aw3s0m3!