Bypass DEP and NX bit | Bypass ASLR | Bypass Stack Canary and Cookie
Buffer overflows are no longer the most popular class of vulnerability. Analysis tools help developers identify buffer overflows (at least the obvious ones) during development, and this has significantly reduced their number. Moreover, protections such as non-executable stacks, Address Space Layout Randomization and stack canaries have made life miserable for exploit writers even when they do find a buffer overflow. Nonetheless, buffer overflows are still a threat: in some situations they remain exploitable and can lead to old-school arbitrary code execution.
Learning how each of these security mechanisms can be bypassed gives you, as an exploit developer, the insight to leverage a buffer overflow when it is exploitable, and it also keeps you from wasting time hunting for buffer overflows when they are not. From a developer’s point of view, it shows you why each security mechanism matters.
Non-Executable Stack (NX bit) | Data Execution Prevention (DEP)
If you haven’t yet read my introduction to buffer overflow exploit development, I strongly encourage you to do so before reading the rest of this article. In that article we placed our shellcode on the stack as part of the buffer. That was possible only because we were working on an old operating system with an executable stack. An executable stack is a prerequisite for that technique: when the stack is non-executable, the CPU refuses to execute instructions located in stack memory.
Non-executable stacks are made possible by the NX bit of the architecture. The bit arrived with the x86-64 architecture (AMD calls it NX, Intel calls it XD) and is also available to 32-bit x86 systems when the processor supports it and PAE page tables are in use; when it is set in a page table entry, code in that page cannot be executed. It took some time before operating systems made use of this bit. For example, Microsoft introduced the Data Execution Prevention (DEP) feature in Windows XP Service Pack 2, which uses the NX bit to mark data pages such as the stack as non-executable. Since x86-64 platforms became common, almost all operating systems ship with a non-executable stack enforced through page table entries.
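To see whether the stack of a process is actually mapped without execute permission, you can look at its memory map. The following is a minimal sketch that assumes a Linux system exposing /proc/self/maps; the helper names are made up for this illustration.

```python
def stack_permissions(maps_text):
    """Return the permission string (e.g. 'rw-p') of the [stack] mapping, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        # Format: address perms offset dev inode pathname
        if len(fields) >= 6 and fields[5] == "[stack]":
            return fields[1]
    return None


def stack_is_executable(maps_text):
    """True only if a [stack] mapping exists and carries the 'x' permission."""
    perms = stack_permissions(maps_text)
    return perms is not None and "x" in perms


if __name__ == "__main__":
    # Assumption: Linux-only path; other systems need a different source.
    with open("/proc/self/maps") as maps:
        print("executable stack:", stack_is_executable(maps.read()))
```

On a system that enforces a non-executable stack the permission field of the stack mapping has no x in it.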
Address Space Layout Randomization (ASLR)
Before ASLR, you could predict the address of your shellcode almost exactly. You saw how we did that with a debugger in my introduction to buffer overflow exploit development. To understand ASLR and the difference it made, you need to know how a process address space is managed. Nearly all operating systems support paging, and paging brings virtual address space isolation. Every process sees the whole address space as its own (apart from the part reserved for the kernel), and no process can reference another process’s memory. This is possible because each process has its own page table, which is loaded into the MMU when the process is scheduled for execution. Address space isolation makes life easy for compilers and linkers: they pick a starting virtual address and use fixed addresses throughout the executable. In our example we relied on exactly this to predict the address of our shellcode. Operating system dependent factors still play a role in the final layout; for example, environment variables shift the addresses of a program’s stack variables. However, a NOP sled easily neutralizes these small variations.
With the arrival of ASLR, a program’s virtual addresses were no longer fixed, and we could no longer hard-code the address of our shellcode into the exploit. If ASLR is active for a program, its addresses change on every run, so you don’t know the address of your shellcode when you inject it into the buffer. ASLR is an operating system feature. Finding out whether your operating system of choice has it is as easy as running a program several times and examining its addresses with a debugger. ASLR operates in two modes, and the mode a program runs in depends on how it was compiled. In the first mode only the stack and data segment addresses are randomized, while the code (text) segment addresses stay fixed. In the second mode, which requires position-independent code and is the default for network libraries and sensitive programs, the addresses of all segments are randomized. Knowing which mode your target runs in is very important, as you will see in the section on bypassing ASLR.
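The "run it several times and compare addresses" check can also be scripted instead of done by hand in a debugger. This is a rough sketch that assumes Linux and its /proc/self/maps format; the function names are hypothetical and the child process is simply another Python interpreter.

```python
import subprocess
import sys


def region_starts(maps_text):
    """Map each named region to its lowest start address in /proc/<pid>/maps text."""
    starts = {}
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # anonymous mapping without a name
        start = int(fields[0].split("-")[0], 16)
        name = fields[5]
        starts[name] = min(start, starts.get(name, start))
    return starts


def randomized_regions(first, second):
    """Names of regions whose start address differs between two runs."""
    return sorted(n for n in first if n in second and first[n] != second[n])


def sample_maps():
    """Assumption: Linux-only. Run a child process and return its memory map."""
    child = "print(open('/proc/self/maps').read())"
    result = subprocess.run([sys.executable, "-c", child],
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    first, second = region_starts(sample_maps()), region_starts(sample_maps())
    print(randomized_regions(first, second))
```

If the stack and libraries appear in the output but the main executable does not, the program is running in the first mode described above; if the executable moves as well, it was built as position-independent code.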
Stack Canaries | Cookies
The stack canary is a compiler feature, not a feature of the operating system or the architecture. As you have seen in my introduction to buffer overflow exploit development, our final goal in leveraging a buffer overflow was to overwrite the return address on the stack. The stack canary provides a way to detect an overwrite of the saved instruction pointer (IP) and stop execution. The logic is very simple: a small value (4 bytes on 32-bit systems) known as the stack canary or stack cookie is placed just before the return address on the stack, and a check is performed before the function returns to see whether it has been overwritten. If it has, the return address has probably been overwritten too, and execution should be stopped before control flow reaches the shellcode.
Compilers place a piece of code, or a call to a helper function, at the beginning of each protected function that sets up the stack frame and places the canary in it. They also add code at the end of the function that checks the current canary value against the original one and stops execution if they differ. Of course this affects performance, so some compilers enable the canary only for sensitive functions (typically those that declare local buffers). Below is an example of a helper that is called at the beginning of a canary-protected function (the code is taken from the book A Guide to Kernel Exploitation):
__SEH_prolog4_GS:
    push offset _except_handler4
    push dword ptr fs:[00000000h]
    mov eax,dword ptr [esp+10h]
    mov dword ptr [esp+10h],ebp
    lea ebp,[esp+10h]
    sub esp,eax
    push ebx
    push esi
    push edi
    mov eax,dword ptr [__security_cookie]
    xor dword ptr [ebp-4],eax
    xor eax,ebp
    mov dword ptr [ebp-1Ch],eax
    push eax
    mov dword ptr [ebp-18h],esp
    push dword ptr [ebp-8]
    mov eax,dword ptr [ebp-4]
    mov dword ptr [ebp-4],0FFFFFFFEh
    mov dword ptr [ebp-8],eax
    lea eax,[ebp-10h]
    mov dword ptr fs:[00000000h],eax
    ret
This code, taken from 32-bit Windows Server 2003, also sets up the exception registration record on the stack.
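The prologue and epilogue logic described above can be modelled without any assembly. The sketch below is a toy simulation, not compiler output: the frame layout (16-byte buffer, 4-byte canary, 4-byte return address), the function name and the return address 0x08048500 are all assumptions chosen for illustration.

```python
import struct

BUF_SIZE = 16                 # hypothetical size of the local buffer
ORIGINAL_RETURN = 0x08048500  # hypothetical saved return address


def call_with_canary(user_input: bytes, canary: bytes) -> str:
    """Simulate one call of a canary-protected function with an unchecked copy."""
    # Prologue: buffer, then the canary, then the saved return address.
    frame = bytearray(BUF_SIZE) + bytearray(canary) + struct.pack("<I", ORIGINAL_RETURN)

    # Function body: a copy with no bounds check against BUF_SIZE
    # (clamped to the frame only so the toy model stays in its own memory).
    n = min(len(user_input), len(frame))
    frame[:n] = user_input[:n]

    # Epilogue: compare the canary slot with the original value before returning.
    if bytes(frame[BUF_SIZE:BUF_SIZE + 4]) != canary:
        return "abort: stack smashing detected"
    (ret,) = struct.unpack("<I", frame[BUF_SIZE + 4:BUF_SIZE + 8])
    return f"return to {ret:#010x}"


if __name__ == "__main__":
    canary = b"\x11\x22\x33\x44"
    print(call_with_canary(b"A" * 16, canary))  # fits in the buffer
    print(call_with_canary(b"A" * 24, canary))  # runs over canary and return address
```

An input that fits the buffer returns normally, while one that runs past it is caught by the epilogue check. An input that happened to reproduce the canary bytes exactly would pass the check, which is why the value has to be unpredictable.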
Bypassing Anti-Exploitation protections
Now that you know what the buffer overflow protections are and where they come from, you are ready to learn how to bypass them. We first introduce the methods specific to each protection and then review how to circumvent them when they are in place together.
Bypass Non-Executable Stack protection | NX bit
Bypass Non-Executable Stack protection in local exploitation
In local kernel exploits the NX bit is easy to bypass. You keep your shellcode inside the executable that triggers the vulnerability and redirect execution to the shellcode’s address. The success of this method again depends on the architecture. On architectures such as x86, where the kernel is mapped above user land in the same address space, kernel code can address user-land memory directly, as long as the kernel is not configured to refuse executing user pages. So when you store the shellcode in your own executable and trigger the kernel vulnerability, the shellcode is reachable from the vulnerable kernel path.
Bypass Non-Executable Stack protection in remote exploitation
Non-executable stacks, as mentioned, prevent you from running shellcode injected into the vulnerable stack buffer. One solution to this problem is the return-to-libc approach. Instead of running our own code to spawn a shell, we redirect execution to a shared library function that does the same. A traditional target is the “system” function in the libc library. Of course the success of this solution depends on the architecture. On x86-32, parameters are passed on the stack, and since you control the stack you can place the parameters for system there and call it by replacing the return address with its address. On x86-64, however, parameters are passed in registers, so this approach does not work as is. An enhanced version of the method, named code borrowing, can be used instead. In code borrowing you redirect execution to some pop instructions that load the parameters from the stack into the registers. Once the registers are ready, you redirect execution to the function. For the redirection to work, the last pop must be followed by a ret.
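To make the x86-32 case concrete, here is a small sketch of which stack words matter and where they sit relative to the start of the buffer. It only describes the layout; the function name, the 76-byte distance and all addresses are hypothetical values picked for the example, and the 4-byte word size assumes the 32-bit cdecl convention.

```python
WORD = 4  # bytes per stack slot on x86-32


def ret2libc_layout(buffer_to_ret, system_addr, after_addr, arg_addr):
    """Describe the stack slots a 32-bit cdecl callee reads after the redirect.

    buffer_to_ret is the distance in bytes from the start of the buffer
    to the saved return address (an assumption you would measure per target).
    """
    slots = [
        ("saved return address, replaced by the address of system", system_addr),
        ("return address that system itself will use", after_addr),
        ("first parameter of system (pointer to the command string)", arg_addr),
    ]
    return [(buffer_to_ret + WORD * i, label, value)
            for i, (label, value) in enumerate(slots)]


if __name__ == "__main__":
    # All numbers below are placeholders, not addresses from a real system.
    for offset, label, value in ret2libc_layout(76, 0xB7E4C000, 0xB7E3F000, 0xB7F6D000):
        print(f"+{offset:3d}  {value:#010x}  {label}")
```

The middle slot is easy to forget: the called function expects its own return address to sit between the overwritten return address and its first parameter.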
Some shared library developers learned to remove or restrict functions like system, so you cannot always redirect execution to a critical libc function because it may not exist. Exploit writers in turn invented a successor to this method: Return Oriented Programming (ROP). In this approach, instead of redirecting the call just once, you arrange the stack so that several redirections take place. Each redirection target ends with a “ret” instruction, and each one executes a small part of the work the shellcode would have done. When execution reaches the “ret”, the next redirection address is popped from the stack, which is the place we control. The redirection addresses point into the text segment of the executable or of its shared libraries. This method works on the x86-32 architecture, and by combining it with code borrowing you can also exploit x86-64 architectures.
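The way control hops from one “ret” to the next is easier to follow on a toy machine than on real hardware. The sketch below is purely a simulation: the gadget addresses, the two-register machine and the function name are invented for this example and do not correspond to any real binary.

```python
# Hypothetical gadget table: address -> (disassembly, effect on the toy machine).
GADGETS = {
    0x1000: ("pop rdi; ret", lambda regs, stack: regs.update(rdi=stack.pop(0))),
    0x2000: ("pop rax; ret", lambda regs, stack: regs.update(rax=stack.pop(0))),
    0x3000: ("add rax, rdi; ret",
             lambda regs, stack: regs.update(rax=regs["rax"] + regs["rdi"])),
}


def run_chain(chain, gadgets=GADGETS):
    """Run a chain on the toy machine; chain[0] is the first word that ret pops."""
    regs = {"rdi": 0, "rax": 0}
    stack = list(chain)
    trace = []
    while stack:
        target = stack.pop(0)      # 'ret' pops the next address from the stack
        if target not in gadgets:
            break                  # not a known gadget: the chain ends here
        name, effect = gadgets[target]
        effect(regs, stack)        # the gadget body may pop further stack words
        trace.append(name)
    return regs, trace


if __name__ == "__main__":
    regs, trace = run_chain([0x1000, 40, 0x2000, 2, 0x3000])
    print(trace)
    print(regs)
```

Note how data words (40 and 2) are interleaved with gadget addresses on the stack: the pop gadgets are the code borrowing step that moves stack values into registers, and every gadget hands control to the next address through its final ret.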
Bypassing Address Space Layout Randomization | ASLR
As mentioned in the Address Space Layout Randomization section, there are two modes of ASLR. In the position-independent code mode, the code, data and stack segments are all randomized. In the other mode only the stack and data segment addresses are randomized. Of course the latter is easier to bypass. All you need is to find a JMP ESP instruction in the code (.text) segment of the executable and redirect the control flow there. JMP ESP may sound like an odd instruction to find in a normal executable, but no worries! Its encoding is the two bytes FF E4, and since x86 instructions do not have to be aligned in memory, you can jump into the middle of another instruction that happens to contain this byte pattern. The .text segment addresses are not randomized, so the address you find is fixed. When the function returns, ESP points just past the return address, so JMP ESP sends control back to the stack, and all you need is to place the shellcode right after the return address. If the overflow is too small for that, you can place just a short jump instruction after the return address to redirect control back to the shellcode in the buffer, which sits at lower addresses than the return address. A short jump takes a relative displacement, so you only need the distance from the jump back to the shellcode, encoded as a negative 8-bit two’s complement value.
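Two small pieces of arithmetic sit behind this paragraph: searching for a byte pattern at every offset rather than only at instruction boundaries, and encoding a backward short jump. The following sketch shows both; the function names are made up, and the short jump encoding (opcode EB followed by a signed 8-bit displacement counted from the end of the 2-byte instruction) is the standard x86 form.

```python
def find_byte_pattern(code: bytes, pattern: bytes):
    """Return every offset where pattern occurs, ignoring instruction boundaries."""
    offsets = []
    start = code.find(pattern)
    while start != -1:
        offsets.append(start)
        start = code.find(pattern, start + 1)
    return offsets


def short_jump_back(distance: int) -> bytes:
    """Encode a short jump to a target `distance` bytes before the jump's first byte."""
    # The displacement is relative to the address just after the 2-byte jump.
    rel = -(distance + 2)
    if not -128 <= rel <= 127:
        raise ValueError("target is out of range for a short jump")
    return bytes([0xEB, rel & 0xFF])  # & 0xFF yields the two's complement byte


if __name__ == "__main__":
    # B8 FF E4 00 00 is 'mov eax, 0x0000E4FF'; FF E4 hides inside its operand.
    print(find_byte_pattern(bytes.fromhex("b8ffe40000"), b"\xff\xe4"))
    print(short_jump_back(30).hex())
```

A distance of 0 yields EB FE, the familiar jump-to-self, which is a quick sanity check for the displacement arithmetic.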
If the code is compiled as position-independent code, you have to search the executable and its shared libraries for anything that still lives at a fixed address. For example, on x86-64 Linux kernels before 3.1 the vsyscall page, which provides fast system calls such as time(), gettimeofday() and getcpu(), was mapped at a static address. On Windows, the Process Environment Block (PEB) data structure also used to be at a fixed address. Using the ROP method discussed in the previous section for bypassing DEP, you can assemble a series of useful gadgets (short instruction sequences ending in a ret instruction) from these fixed addresses and build your shellcode out of them.
There may also be weaknesses in the ASLR implementation that allow you to brute-force the exploit. Brute forcing is possible when the randomization is partly predictable, i.e. when the range of randomized addresses is known and small enough to search. In those cases a single success is enough to compromise the system.
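How feasible brute forcing is comes down to simple probability. The sketch below assumes the randomized address is drawn uniformly from 2^n candidates, where n is the number of entropy bits; that uniformity, and the function names, are assumptions of this example rather than properties of any particular ASLR implementation.

```python
def success_probability(entropy_bits: int, attempts: int) -> float:
    """Chance of at least one hit when the layout is re-randomized on every attempt."""
    p = 1 / 2 ** entropy_bits
    return 1 - (1 - p) ** attempts


def expected_attempts(entropy_bits: int, rerandomized: bool) -> float:
    """Average number of attempts until the first hit.

    rerandomized=True : a fresh layout per attempt (independent guesses).
    rerandomized=False: the layout stays the same across attempts, so each
                        wrong guess can be ruled out.
    """
    candidates = 2 ** entropy_bits
    return float(candidates) if rerandomized else (candidates + 1) / 2


if __name__ == "__main__":
    for bits in (8, 16, 28):
        print(bits, expected_attempts(bits, True), expected_attempts(bits, False))
```

The numbers show why a small randomization range matters so much: every bit of entropy removed halves the expected work.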
Bypassing Stack Canaries | Cookies
I open this discussion with the most obvious solution: reveal the stack canary and forge it in your exploit. In theory this may seem simple, but there are considerations. First, the canary is neither predictable nor a fixed value. In some cases, however, it has been shown to stay the same for a process throughout its lifetime. This means that if you already have some sort of control over the running process, you may be able to reveal the canary value and forge it. This is very useful for a local exploit, and specifically for a local kernel exploit. For remote exploits it doesn’t seem feasible unless you also have an arbitrary read vulnerability that lets you read the process memory and reuse the leaked canary value in a subsequent stack buffer overflow exploit.
Stack canaries used to be easy to bypass, especially on Microsoft operating systems before the NT 6 kernel. For example, on Windows XP or Server 2003 (all service packs) you could often bypass the canary check with the aid of the Structured Exception Handling (SEH) mechanism. SEH is the exception handling mechanism that Windows provides and that Microsoft’s C and C++ compilers expose. Applications compiled for these 32-bit Windows systems place exception registration records on the stack. When an exception is raised, the address of the first exception registration record is fetched and control is passed to the handler it names. If a buffer overflow is big enough, the attacker can overwrite an exception registration record on the stack and continue the overflow until an exception occurs (for example, when the write reaches an address that is not mapped). The exception dispatcher then reads the record (now manipulated by the attacker) from the stack and transfers control to the handler address stored in it. Because the exception fires before the function finishes, the code that checks the canary is never reached, and execution is redirected wherever the attacker wants through the manipulated exception registration record.
With the arrival of NT version 6 kernels, a plain SEH overwrite became far less reliable, because mitigations such as SafeSEH and SEHOP validate the exception registration records before a handler is called, and 64-bit code does not keep those records on the stack at all. In those situations the simplest antidote against the stack canary is not to touch the canary or the return address. In this method you hunt for an important variable on the stack that lies before the canary. You might be able to execute an arbitrary command if there is a function pointer on the stack. If not, you must look for a sensitive variable on the stack whose manipulation gains you something. Another solution is to turn the buffer overflow into an index-based overwrite and overwrite the return address without touching the canary. Of course this method does not always work, since not every overflow can be turned into an index-based overwrite.
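The difference between a linear overflow and an index-based overwrite is easy to see on a toy frame. The sketch below models the frame as a list of words; the slot order, the slot values and the function names are assumptions made for this illustration only.

```python
# Hypothetical frame, lowest address first:
# four buffer words, the canary, the saved frame pointer, the return address.
CANARY_SLOT = 4
RETURN_SLOT = 6


def new_frame(canary):
    return [0, 0, 0, 0, canary, 0xBFFFF000, 0x08048500]


def linear_overflow(frame, values):
    """Model an unchecked copy: words are written one after another from slot 0."""
    updated = list(frame)
    for i, value in enumerate(values[:len(updated)]):
        updated[i] = value
    return updated


def indexed_write(frame, index, value):
    """Model `buf[index] = value` where the program fails to bound `index`."""
    if not 0 <= index < len(frame):
        raise IndexError("index outside the modelled frame")
    updated = list(frame)
    updated[index] = value
    return updated


def canary_intact(frame, canary):
    return frame[CANARY_SLOT] == canary


if __name__ == "__main__":
    canary = 0x5A17C0DE
    linear = linear_overflow(new_frame(canary), [0x41414141] * 7)
    indexed = indexed_write(new_frame(canary), RETURN_SLOT, 0x41414141)
    print(canary_intact(linear, canary), canary_intact(indexed, canary))
```

The linear copy cannot reach the return address without running over the canary first, while the indexed write skips straight to the chosen slot and leaves the canary as it was.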
I close the discussion of stack canaries with a note on symmetric multiprocessor (SMP) systems. As you may know, on these systems several processors execute instructions concurrently. This opens a new exploitation vector for us. Suppose your target is a process with multiple threads, each scheduled to run on a different CPU. In this situation, if you manage to cause an overflow large enough to run past the current page and overwrite the next one, you may have a chance to execute an arbitrary command. The next page may contain data that another thread is using or executing, so you may get your shellcode executed before the first thread reaches its canary check and triggers the fault!
Bypassing multi layered | defense in depth protections
In a defense in depth strategy, a combination of the aforementioned protections is in place. Normally you see both ASLR and DEP (non-executable stack) on current Microsoft, Linux and Mac operating systems, although there are still programs that do not support these features. The latest approach to bypassing these protections is to use ROP. On x86-64 systems you may also need code borrowing, because function parameters are passed in registers on that architecture. If a program is also compiled with the stack canary option, the outcome really depends on your case. If the buffer overflow can be turned into an index-based overwrite, you have a chance to overwrite the return address without touching the stack cookie.
Stack overflows are not as common or as popular as they were several years ago. Prevention methods on one side and protection mechanisms on the other have made exploiting buffer overflows very hard. Now that you’re familiar with the architecture, operating system and compiler barriers that stand in the way of exploitation (and with their antidotes), you can secure your software products with open eyes. On the other hand, if you’re a hacker or penetration tester, you should have learned that hacking is an art that requires creativity; there may be an easy solution around the corner waiting for you to find it! That being said, buffer overflows are still a threat, but do not invest more time than necessary hunting for stack overflows in cases where the bar is too high!