http://blog.oxff.net/#xwuqt3dgysac6jqbro5q

2011-08-19 16:23

Executing 32bit Code in a 64bit Linux 2.6 Process

Before I start diving into the technical details in this blog post, a few words on fair use:

Recently, the IT security community as a whole has been experiencing a rise in rip-offs without attribution. While this has likely always been going on, I have personally become more aware of it. That is also the reason why I rarely publish any open source code any more. When I recently answered an email from a commercial institution asking whether the libscizzle source code would be available at some point, and told them it would not, I got the best reply ever: «Ok, we will just wait until your bachelor thesis [on this topic] will be published and reimplement it.»

Thanks for the faith in my research; unfortunately this means I just won't make the thesis conveniently accessible (beyond what the thesis regulations require anyway). In an attempt to find the right balance between not nourishing commercial exploiters and keeping Sebastian Krahmer happy (who is doing a great job at sharing code for free), here goes the first post sharing some knowledge about my research without giving away the assembled puzzle, ready to be copied in some country with cheap development costs.

It is a common misconception that the current execution mode is a per-process setting made by the kernel; this is not entirely true. As the processor has no notion of processes (traditionally, the x86 CPU has hardware support for multiple tasks, which is usually not used by modern operating systems), the current execution mode is actually controlled by a bit in the segment descriptor of the current code segment.
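To make the role of that descriptor bit concrete, here is a minimal sketch that classifies a raw 8-byte code segment descriptor by its L (long mode) and D (default operand size) flags. The helper name `codeModeOf` and the two sample descriptor values are my own illustrative choices and do not come from the research code; the bit positions follow the architectural descriptor layout.

```cpp
#include <cassert>
#include <cstdint>

// Execution mode a code segment selects while the CPU runs in long mode.
enum class CodeMode { Bits16, Bits32, Bits64, Reserved };

// Hypothetical helper (not from the post): classify a raw 8-byte code
// segment descriptor. Bit 53 is the L (long mode) flag and bit 54 the
// D (default operand size) flag.
constexpr CodeMode codeModeOf(std::uint64_t descriptor) {
    return ((descriptor >> 53) & 1)
               ? (((descriptor >> 54) & 1) ? CodeMode::Reserved   // L=1, D=1
                                           : CodeMode::Bits64)    // L=1, D=0
               : (((descriptor >> 54) & 1) ? CodeMode::Bits32     // L=0, D=1
                                           : CodeMode::Bits16);   // L=0, D=0
}

// Usage: two conventional flat code descriptors, used only as sample inputs.
inline void checkSampleDescriptors() {
    assert(codeModeOf(0x00AF9A000000FFFFULL) == CodeMode::Bits64);
    assert(codeModeOf(0x00CF9A000000FFFFULL) == CodeMode::Bits32);
}
```

The point of the sketch is only that the mode is a property of whichever descriptor cs currently refers to, not of the process.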

By creating a new segment for 32bit execution in the Local Descriptor Table using the modify_ldt system call and loading this segment's selector into the cs register with a far call instruction, we can actually execute 32bit code. Unfortunately, the direct call ptr16:32 encoding has been deemed invalid in 64bit mode for unknown reasons; only the indirect form, which reads the m16:32 destination from memory, remains available. This leads to the slightly convoluted trampoline generation code shown below, which uses RIP-relative addressing to reach the destination stored right behind the trampoline. The original stack pointer is saved in the r8 register, which is only accessible from 64bit mode. All other 32bit-accessible register state can then be saved on the 32bit stack using pushad and pushf.
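The LDT setup itself could look roughly like the following sketch. It is not the libscizzle code: the helper names `ldtSelector` and `install32BitCodeSegment`, the flat base/limit values and the choice of LDT slot are all assumptions made for illustration. The Linux-specific part is guarded so that the selector arithmetic still compiles on other platforms.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__linux__) && defined(__x86_64__)
#include <asm/ldt.h>      // struct user_desc, MODIFY_LDT_CONTENTS_CODE
#include <sys/syscall.h>  // SYS_modify_ldt
#include <unistd.h>       // syscall()
#include <cstring>        // std::memset
#endif

// Selector layout: descriptor index in bits 3..15, table indicator in
// bit 2 (1 = LDT) and requested privilege level in bits 0..1 (3 = user).
inline std::uint16_t ldtSelector(unsigned index) {
    assert(index < 8192);  // an LDT holds at most 8192 descriptors
    return static_cast<std::uint16_t>((index << 3) | 0x4u | 0x3u);
}

#if defined(__linux__) && defined(__x86_64__)
// Hypothetical helper: install a flat 32bit code segment in LDT slot
// `index`. Returns the selector to load into cs, or 0 on failure.
inline std::uint16_t install32BitCodeSegment(unsigned index) {
    struct user_desc desc;
    std::memset(&desc, 0, sizeof(desc));
    desc.entry_number = index;
    desc.base_addr = 0;                        // assumed: flat segment
    desc.limit = 0xfffff;                      // with limit_in_pages: 4 GiB
    desc.seg_32bit = 1;                        // D flag: 32bit operand size
    desc.contents = MODIFY_LDT_CONTENTS_CODE;  // code segment
    desc.read_exec_only = 0;
    desc.limit_in_pages = 1;
    desc.seg_not_present = 0;
    // func 1 writes one LDT entry; the call returns 0 on success.
    if (syscall(SYS_modify_ldt, 1, &desc, sizeof(desc)) != 0) {
        return 0;
    }
    return ldtSelector(index);
}
#endif
```

A value returned by `install32BitCodeSegment(0)` would play the role of `codeSelector` in the trampoline below; for slot 0 that selector is 0x7.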

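                /* esp can only address the low 4 GiB, so the 32bit code gets its own stack */
                if(!allocateLowStack(m_stack32)) {
                        /* a bare "throw;" outside a catch handler would call
                         * std::terminate; assumes <stdexcept> is included */
                        throw std::runtime_error("could not allocate 32bit stack");
                }
                /* mov r8, rsp (49 89 e0): save the 64bit stack pointer */
                * (uint16_t *) &m_trampoline[0] = 0x8949;
                m_trampoline[2] = 0xe0;
                /* mov rsp, imm64 (48 bc ...): switch to the 32bit stack */
                * (uint16_t *) &m_trampoline[3] = 0xbc48;
                * (uint64_t *) &m_trampoline[5] = (uint64_t) m_stack32;
                /* call far [rip+4] (ff 1d 04 00 00 00): rip points at offset 19
                 * after this instruction, so the operand is read from offset 23 */
                * (uint16_t *) &m_trampoline[13] = 0x1dff;
                * (uint32_t *) &m_trampoline[15] = 4;
                /* mov rsp, r8 (4c 89 c4): restore the 64bit stack pointer */
                * (uint16_t *) &m_trampoline[19] = 0x894c;
                m_trampoline[21] = 0xc4;
                /* ret (c3) */
                m_trampoline[22] = 0xc3;
                /* m16:32 call far destination: 32bit offset, then 16bit selector */
                * (uint32_t *) &m_trampoline[23] =
                        (uint32_t) (uint64_t) m_32bitCode;
                * (uint16_t *) &m_trampoline[27] = codeSelector;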
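The 32bit side of the far call is not shown in this post. As a rough sketch of what the pushad/pushf saving described above could look like, the following wraps an arbitrary 32bit payload in a save/restore frame and ends it with a far return, which is the counterpart of the far call in the trampoline. The function name `wrap32BitPayload` is hypothetical, and the sketch assumes the payload leaves the stack balanced.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: surround 32bit machine code with register and flag
// saving, and return to the 64bit trampoline with a far return.
// Assumption: the payload leaves esp where it found it.
inline std::vector<std::uint8_t> wrap32BitPayload(
        const std::vector<std::uint8_t>& payload) {
    std::vector<std::uint8_t> code;
    code.reserve(payload.size() + 5);
    code.push_back(0x60);  // pushad: save the eight 32bit general registers
    code.push_back(0x9c);  // pushfd: save eflags
    code.insert(code.end(), payload.begin(), payload.end());
    code.push_back(0x9d);  // popfd: restore eflags
    code.push_back(0x61);  // popad: restore the general registers
    code.push_back(0xcb);  // retf: pop eip and cs pushed by the far call
    assert(code.size() == payload.size() + 5);
    return code;
}
```

For example, wrapping a single nop (0x90) yields the bytes 60 9c 90 9d 61 cb. The result would still have to be placed in executable memory below 4 GiB so that its address fits the 32bit offset of the m16:32 operand.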

The Linux kernel's implementation of Local Descriptor Table access has no notion of this special control bit for code segments. However, it just so happens that the code zero-initializes the whole segment descriptor before writing it, and therefore also zeroes this control bit. Although the bit cannot be set, leaving it at zero corresponds to selecting 32bit execution mode. By pure luck, it is therefore possible to execute 32bit code within 64bit processes, whereas the opposite would be impossible by this route.

So how could one approach executing 64bit code inside a 32bit process? While this is not relevant to my shellcode detection research, it might be interesting to packer developers or ptrace sandbox escapists. One possible solution would be to reuse the 64bit code segment selector available from the GDT on 64bit kernels, as this should map the entire address space with the long mode bit set. I didn't test whether this is possible, though, as the kernel might use a separate GDT for 32bit processes. On Windows this trick is definitely possible.

Update: Emitting 9a . . . . 33 00, or in other words call far 0x33:...., does indeed allow 64bit code execution in a 32bit process, without setting anything up via system calls. This probably breaks some ptrace-based sandboxes. GDB gets confused by this as well: the second single step looks as if it skipped two instructions, because the 32bit instructions GDB disassembles are actually one 64bit instruction:

[Image: GDB 32bit vs. 64bit]
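The far call from the update can be assembled by hand as in the following sketch. The helper names `encodeFarCall32` and `splitSelector` are made up for illustration, and the target offset is left as a parameter, since it depends on where the 64bit code lives.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// The three fields packed into a 16bit segment selector.
struct SelectorFields {
    unsigned index;  // descriptor index within the table
    bool ldt;        // table indicator: false = GDT, true = LDT
    unsigned rpl;    // requested privilege level
};

// Hypothetical helper: split a selector into its fields.
inline SelectorFields splitSelector(std::uint16_t selector) {
    return SelectorFields{static_cast<unsigned>(selector >> 3),
                          (selector & 0x4) != 0,
                          static_cast<unsigned>(selector & 0x3)};
}

// Hypothetical helper: encode "call far selector:offset" for 32bit code.
// Layout: opcode 9a, 32bit offset (little endian), 16bit selector.
inline std::array<std::uint8_t, 7> encodeFarCall32(std::uint32_t offset,
                                                   std::uint16_t selector) {
    std::array<std::uint8_t, 7> bytes = {{
        0x9a,
        static_cast<std::uint8_t>(offset & 0xff),
        static_cast<std::uint8_t>((offset >> 8) & 0xff),
        static_cast<std::uint8_t>((offset >> 16) & 0xff),
        static_cast<std::uint8_t>((offset >> 24) & 0xff),
        static_cast<std::uint8_t>(selector & 0xff),
        static_cast<std::uint8_t>((selector >> 8) & 0xff),
    }};
    return bytes;
}

// Usage: 0x33 refers to GDT entry 6 at privilege level 3.
inline void checkSelector33() {
    const SelectorFields f = splitSelector(0x33);
    assert(f.index == 6 && !f.ldt && f.rpl == 3);
}
```

Since 0x33 is a GDT selector, nothing has to be installed beforehand, which matches the observation that no system call is needed.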