February « 2011 « RCE Endeavors

February 21, 2011

API Hooking Through Near Call Replacement

Filed under: General x86 — admin @ 3:00 AM

This post will focus on an interesting technique that I thought of this past week. I’ve noticed that a lot of API hooking libraries, or techniques in general, rely on replacing the prologue to the function with a jump into the hook (e.g. writing in E9 XXXXXXXX and jumping to a trampoline). This technique is quick, effective, and reliable in most scenarios. However, it is also easily detectable since all someone has to do is just check the first five bytes of a function to see whether there is a jump or not. The idea that I thought of this past weekend (although I’m sure I’m not the only one) was to replace the references to a function to that of the hook. For example, suppose I have a function at .text:00555440

.text:00555440                 push    esi
.text:00555441                 mov     si, [esp+4+arg_0]
.text:00555446                 test    si, si
.text:00555449                 jl      loc_5554FE
.text:0055544F                 mov     eax, [ecx+0A4h]
.text:00555455                 movsx   edx, si
...

The actual content is not important. Now again suppose that this function has five places that it’s being called from: 0045A9D5, 0045AA9E, 0045AAC3, 0045AAE1, 0045AB53.

...
.text:0045AA9C                 mov     ecx, esi
.text:0045AA9E                 call    sub_555440
.text:0045AAA3                 jmp     loc_45AB3A
...
.text:0045AADF                 mov     ecx, esi
.text:0045AAE1                 call    sub_555440
.text:0045AAE6                 jmp     short loc_45AB3A
...

and so on. My idea entails replacing all five call sub_555440 instructions to a call to the hooking function. Visually, it is something like this:

Where the green line denotes the new path of the calling functions. While easy to explain pictorially or in words, actually programming it is somewhat tedious. The main issue that is x86 has eight different ways on how a function can be called (reproduced below from here)

Opcode	Mnemonic	Description
`E8 cw`	`CALL rel16`	Call near, relative, displacement relative to next instruction
`E8 cd`	`CALL rel32`	Call near, relative, displacement relative to next instruction
`FF /2`	`CALL r/m16`	Call near, absolute indirect, address given in r/m16
`FF /2`	`CALL r/m32`	Call near, absolute indirect, address given in r/m32
`9A cd`	`CALL ptr16:16`	Call far, absolute, address given in operand
`9A cp`	`CALL ptr16:32`	Call far, absolute, address given in operand
`FF /3`	`CALL m16:16`	Call far, absolute indirect, address given in m16:16
`FF /3`	`CALL m16:32`	Call far, absolute indirect, address given in m16:32

The ones that I chose to focus on are the two most common ways that 32-bit programs call functions — E8 cw or FF /2, the 32-bit near relative and near absolute calls. Replacing them involves parsing through the entire file in memory, section by section, and replacing references as they are found. For the actual implementation I cheated a bit and parsed a user-supplied section for the references, but it can be extended to do through all without any problem, just more computation time. The actual implementation looks like this

void replace_references(DWORD_PTR image_base, const char* section_name, LPVOID original_addr, LPVOID hook_addr) {
    PIMAGE_DOS_HEADER dos_header = (PIMAGE_DOS_HEADER)image_base;
    PIMAGE_SECTION_HEADER section_header = get_section_addr_by_name(image_base, section_name);
    BYTE* section_start = (BYTE*)(image_base + section_header->VirtualAddress);
    BYTE* section_end = section_start + section_header->SizeOfRawData;
    BYTE call_near_relative = 0xE8;
    BYTE call_near_absolute[] = {0xFF, 0x15};
    DWORD new_protections = PAGE_EXECUTE_READWRITE;
    DWORD old_protections = 0;
    for(section_start; section_start < section_end; ++section_start) {
        VirtualProtect(section_start, PROTECT_SIZE, new_protections, &old_protections);
        if(*section_start == call_near_relative) {
            BYTE offset[] = {
                *(section_start + 0x4),
                *(section_start + 0x3),
                *(section_start + 0x2),
                *(section_start + 0x1)
            };
            DWORD_PTR relative_offset = (((offset[0] & 0xFF) << 24) | ((offset[1] & 0xFF) << 16) |
                ((offset[2] & 0xFF) << 8) | offset[3] & 0xFF) + NEAR_PATCH_SIZE;
            if((section_start + relative_offset) == original_addr) {
                DWORD_PTR hook_offset = (BYTE*)hook_addr - section_start - NEAR_PATCH_SIZE;
                patch_memory((section_start + 0x1), &hook_offset);
            }
        }
        else if(memcmp(section_start, call_near_absolute, sizeof(call_near_absolute)) == 0) {
            BYTE offset[] = {
                *(section_start + 0x5),
                *(section_start + 0x4),
                *(section_start + 0x3),
                *(section_start + 0x2)
            };
            PDWORD_PTR absolute_addr = (PDWORD_PTR)(((offset[0] & 0xFF) << 24) | ((offset[1] & 0xFF) << 16) |
                ((offset[2] & 0xFF) << 8) | offset[3] & 0xFF);
            __try {
                if(*absolute_addr == (DWORD_PTR)original_addr)
                    patch_memory(absolute_addr, &hook_addr);
            }
            __except(translate_exception(GetExceptionCode(), GetExceptionInformation())) {
                //Dereferenced bad pointer
            }
        }
        VirtualProtect(section_start, PROTECT_SIZE, old_protections, NULL);
    }
}

void replace_references(DWORD_PTR image_base, const char* section_name, LPVOID original_addr, LPVOID hook_addr) { PIMAGE_DOS_HEADER dos_header = (PIMAGE_DOS_HEADER)image_base; PIMAGE_SECTION_HEADER section_header = get_section_addr_by_name(image_base, section_name); BYTE* section_start = (BYTE*)(image_base + section_header->VirtualAddress); BYTE* section_end = section_start + section_header->SizeOfRawData; BYTE call_near_relative = 0xE8; BYTE call_near_absolute[] = {0xFF, 0x15}; DWORD new_protections = PAGE_EXECUTE_READWRITE; DWORD old_protections = 0; for(section_start; section_start < section_end; ++section_start) { VirtualProtect(section_start, PROTECT_SIZE, new_protections, &old_protections); if(*section_start == call_near_relative) { BYTE offset[] = { *(section_start + 0x4), *(section_start + 0x3), *(section_start + 0x2), *(section_start + 0x1) }; DWORD_PTR relative_offset = (((offset[0] & 0xFF) << 24) | ((offset[1] & 0xFF) << 16) | ((offset[2] & 0xFF) << 8) | offset[3] & 0xFF) + NEAR_PATCH_SIZE; if((section_start + relative_offset) == original_addr) { DWORD_PTR hook_offset = (BYTE*)hook_addr - section_start - NEAR_PATCH_SIZE; patch_memory((section_start + 0x1), &hook_offset); } } else if(memcmp(section_start, call_near_absolute, sizeof(call_near_absolute)) == 0) { BYTE offset[] = { *(section_start + 0x5), *(section_start + 0x4), *(section_start + 0x3), *(section_start + 0x2) }; PDWORD_PTR absolute_addr = (PDWORD_PTR)(((offset[0] & 0xFF) << 24) | ((offset[1] & 0xFF) << 16) | ((offset[2] & 0xFF) << 8) | offset[3] & 0xFF); __try { if(*absolute_addr == (DWORD_PTR)original_addr) patch_memory(absolute_addr, &hook_addr); } __except(translate_exception(GetExceptionCode(), GetExceptionInformation())) { //Dereferenced bad pointer } } VirtualProtect(section_start, PROTECT_SIZE, old_protections, NULL); } }

There is quite a bit to explain here. First off, the function takes four parameters — the base of the executable (obtained with GetModuleHandle(NULL)), a string to the section name to perform the replacements in, the desired address to be replaced, and lastly, the address of the hooking function that will replace it. The function then goes through the section byte by byte checking for either the E8 or FF 15 opcodes. Once these are found it is time to check and see if the call is to the correct place. For a relative near call this is pretty simple. The offset to the destination is the next four bytes – 0x5. All that needs to be done is to get those four bytes, change their endianness, and see whether the call leads to the address that is to be replaced. For an absolute near call, the process gets a bit tricky. The absolute address of the function to be called is stored in a 32-bit register. This means that the register needs to be read and dereferenced. The contents inside that register then need to be changed to that of the address of the hooking function. For example,

004010F1  |. FF15 C0204000  CALL DWORD PTR DS:[<&USER32.MessageBoxA>>; \MessageBoxA

Here [004020C0] will contain 757CFEAE which is the address of USER32.MessageBoxA. To hook the call, [004020C0] should be replaced with the address of the hook. One issue the absolute address that this function finds may not in fact be an absolute address. Since the function simply reads a byte (or two) at a time for a match, the absolute address may not actually be an address. It may be two different instructions that coincidentally got interpreted as one by this function. Short of writing a disassembler, there is not much that can be done about this since instructions are variable length on the x86 architecture. It will simply suffice to catch the inevitable access violation error that will result from trying to dereference an invalid pointer. However, if a valid address if found (for a relative or absolute call), patching it is very simple

void patch_memory(LPVOID patch_addr, LPVOID replacement_addr) {
    DWORD old_protections = 0;
    VirtualProtect(patch_addr, sizeof(DWORD_PTR), PAGE_EXECUTE_READWRITE, &old_protections);
    memmove(patch_addr, replacement_addr, sizeof(DWORD_PTR));
    VirtualProtect(patch_addr, sizeof(DWORD_PTR), old_protections, NULL);
}

That is about it for how it works. The usage of it is pretty straightforward as well. It is easy to hook a function by an absolute address or by its name. Below is a code snippet that hooks MessageBoxA and also hooks a function at 00401000. The hook at 00401000 is more of a specialized exception to how hooking an unknown function by address should be done. Typically it would be done as an offset from an image or section instead of a full absolute address. The absolute version was shown simply to save lines of code and demonstrate the technique.

       DWORD_PTR image_base = (DWORD_PTR)GetModuleHandle(NULL);
        //Consider a new thread for actual use
        replace_references(image_base, ".text", &MessageBoxA, &MessageBoxA_hook);
        if(image_base == 0x00400000)
            replace_references(image_base, ".text", generate_number, &generate_number_hook);
        else
            MessageBox(NULL, L"Executable did not load at 0x00400000", L"Error", MB_ICONEXCLAMATION);

Here the base of the executable is retrieved and MessageBoxA and an “unknown” function in the executable are hooked. The executable is provided in the zip and should always load at 00400000. Again, this was done to simply demonstrate the technique and is not the way it should be done as programs can have randomized base addresses. The two hooking functions look pretty standard

typedef int (__cdecl *pgenerate_number)(int a, int b);
pgenerate_number generate_number = (pgenerate_number)(0x00401000); //Example compiled to load at a fixed base address
//Otherwise an offset from an image/section base would be needed
 
int __cdecl generate_number_hook(int a, int b) {
    return 666;
}
 
int WINAPI MessageBoxA_hook(HWND hwnd, LPCSTR lpText, LPCSTR lpCaption, UINT uType) {
    __asm pushad
    MessageBoxA(hwnd, "Hooked MessageBoxA called!", "MessageBoxA_hook", uType);
    __asm popad
    return MessageBoxA(hwnd, lpText, lpCaption, uType);
}

The hooks are designed for the sample application, but the technique extends to any application with minimal modifications (none in the case of MessageBoxA). Below is a screenshot of them in action:

The hook for MessageBoxA is called and the one for the “unknown” function is called too to always return “666”.
It should definitely be noted that this technique does have its downsides. The first one was already mentioned previously — the fact that you cannot really tell where you are exactly when you match what could be a call. This is a problem that cannot be fixed easily, it really would require writing a large part of a disassembler to fix. The second major problem is that it may be computationally expensive. There may be a lot of sections or the section sizes may be very large. The best fix would be to do a multithreaded replacement on the desired sections. The application will not freeze during the scan, but the functions may be called before the scanner arrives to replace the addresses. The third problem is that modules may be loaded after yours that call the functions, thus they will not be patched. This is more of a simple fix and can be remedied by hooking LoadLibrary/LoadLibraryEx and patching the addresses before returning to the application. The fourth, and what I think is the largest problem, is that this technique is extremely difficult to defeat. Doing something like

MOV EAX, [004020C0]
NOP
CALL DWORD PTR:[EAX]

completely defeats this method. This can be minimized by heuristically looking at the instructions being scanned, but the obfuscations can get much more complex to the point of it being impractical. Alternatively, any address that is purely calculated at runtime will not be spotted by this. While being an interesting technique, I can understand why I would not find any current existing implementations of it.
In the archive below is the full source code for the hook and sample programs as well as binaries for the DLL and sample EXE that can be hooked.
Download: Archive of source/binaries

A downloadable PDF of this post can be found here.

Comments (0)

February 14, 2011

Running a (32-bit) Process in the Context of Another

Filed under: General x86 — admin @ 12:53 AM

Follow me

I’ve recently finished reading Malware Analyst’s Cookbook, a very thorough and up-to-date book on malware analysis. Among one of the many useful things in the later chapters was a code injection technique in which a process runs in anothers’ context, which they called “process hollowing”. Googling the term yields no useful results, and the closest that I could find was a 2004 paper by Tan Chew Keong called, “Dynamic Forking of Win32 EXE” in which the technique is outlined in a similar manner as the book. I decided to give a shot at implementing this one afternoon, especially since code for the technique was partially provided in the book.

The overall technique is very simple. A malicious process executes, creates a benign looking process (svchost.exe, lsass.exe, …) in a suspended state. The malware then deallocates all of the memory of the benign process and replaces it with its own. Once this is done, the malware resumes the process and now the benign looking process is executing only malicious code. The main benefit that this technique provides over simply naming the malware (svchost.exe, lsass.exe, …) is that the processes’ PEB is untouched; meaning that it will preserve valid looking fields in important structures such as RTL_USER_PROCESS_PARAMETERS, which stores important information such as the path on file, command line parameters, working directories, etc. Therefore, to someone inspecting the process externally or statically (e.g. not attaching a debugger to a running instance), everything will appear normal. Below is a utility that I coded up in a few hours which runs one process in the context of another. The variation and introduction of an external application to do this is intentional, since just writing one process that runs in the context of another by itself is pretty useless (except for malware). The version shown here is about as stripped down as can get — it simply demonstrates the technique. Anyone using the code should definitely have error handling at each step. I also won’t take any undeserved credit for the code provided below except the fact that I physically typed it up and haven’t found code written in C++ showing this technique (excusing the above-mentioned link to the paper, which does it slightly differently). There are, of course, the obvious limitations to this technique as implemented (32-bit to 32-bit replacement only, the two processes must have the same subsystem, etc.)

#include <Windows.h>
#include <assert.h>
 
typedef NTSTATUS (__stdcall* pNtUnmapViewOfSection)(HANDLE ProcessHandle,
    PVOID BaseAddress);
 
typedef struct {
    PIMAGE_DOS_HEADER dos_header;
    PIMAGE_NT_HEADERS nt_headers;
    PIMAGE_SECTION_HEADER section_header;
    LPBYTE file_data;
} NEW_PROCESS_INFO, *PNEW_PROCESS_INFO;
 
void get_replacement_info(const char* full_file_path, PNEW_PROCESS_INFO new_process_info) {
    HANDLE hFile = CreateFileA(full_file_path, GENERIC_READ, FILE_SHARE_READ, NULL,
        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    DWORD file_size = GetFileSize(hFile, NULL); //Note: High DWORD ignored, dangerous with >4GB files :-P
    new_process_info->file_data = (LPBYTE)malloc(file_size * sizeof(LPBYTE));
    DWORD bytes_read;
    ReadFile(hFile, new_process_info->file_data, file_size, &bytes_read, 0);
    assert(bytes_read == file_size);
    new_process_info->dos_header = 
        (PIMAGE_DOS_HEADER)(&new_process_info->file_data[0]);
    new_process_info->nt_headers = 
        (PIMAGE_NT_HEADERS)(&new_process_info->file_data[new_process_info->dos_header->e_lfanew]);
}
 
int main(int argc, char* argv[]) {
    NEW_PROCESS_INFO new_process_info;
    PROCESS_INFORMATION process_info;
    STARTUPINFOA startup_info;
    RtlZeroMemory(&startup_info, sizeof(STARTUPINFOA));
    pNtUnmapViewOfSection NtUnmapViewOfSection = NULL;
    CreateProcessA(NULL, argv[1], NULL, NULL, FALSE, CREATE_SUSPENDED,
        NULL, NULL, &startup_info, &process_info);
    get_replacement_info(argv[2], &new_process_info);
    NtUnmapViewOfSection = (pNtUnmapViewOfSection)(GetProcAddress(
        GetModuleHandleA("ntdll.dll"), "NtUnmapViewOfSection"));
 
    //Remove target memory code
    NtUnmapViewOfSection(process_info.hProcess, (PVOID)new_process_info.nt_headers->OptionalHeader.ImageBase);
 
    //Allocate memory in target process starting at replacements image base
    VirtualAllocEx(process_info.hProcess, (PVOID)new_process_info.nt_headers->OptionalHeader.ImageBase,
        new_process_info.nt_headers->OptionalHeader.SizeOfImage,
        MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
 
    //Copy in PE header of replacement process
    WriteProcessMemory(process_info.hProcess, (PVOID)new_process_info.nt_headers->OptionalHeader.ImageBase,
        &new_process_info.file_data[0], new_process_info.nt_headers->OptionalHeader.SizeOfHeaders, NULL);
 
    //Write in all sections of the replacement process
    for(int i = 0; i < new_process_info.nt_headers->FileHeader.NumberOfSections; i++) {
        //Get offset of section
        int section_offset = new_process_info.dos_header->e_lfanew + sizeof(IMAGE_NT_HEADERS) +
            (sizeof(IMAGE_SECTION_HEADER) * i);
        new_process_info.section_header = (PIMAGE_SECTION_HEADER)(&new_process_info.file_data[section_offset]);
        //Write in section
        WriteProcessMemory(process_info.hProcess, (PVOID)(new_process_info.nt_headers->OptionalHeader.ImageBase +
            new_process_info.section_header->VirtualAddress),
            &new_process_info.file_data[new_process_info.section_header->PointerToRawData],
            new_process_info.section_header->SizeOfRawData, NULL);
    }
 
    //Get CONTEXT of main thread of suspended process, fix up EAX to point to new entry point
    LPCONTEXT thread_context = (LPCONTEXT)_aligned_malloc(sizeof(CONTEXT), sizeof(DWORD));
    thread_context->ContextFlags = CONTEXT_FULL;
    GetThreadContext(process_info.hThread, thread_context);
    thread_context->Eax = new_process_info.nt_headers->OptionalHeader.ImageBase +
        new_process_info.nt_headers->OptionalHeader.AddressOfEntryPoint;
    SetThreadContext(process_info.hThread, thread_context);
 
    //Resume the main thread, now holding the replacement processes code
    ResumeThread(process_info.hThread);
 
    free(new_process_info.file_data);
    _aligned_free(thread_context);
    return 0;
}

#include <Windows.h> #include <assert.h> typedef NTSTATUS (__stdcall* pNtUnmapViewOfSection)(HANDLE ProcessHandle, PVOID BaseAddress); typedef struct { PIMAGE_DOS_HEADER dos_header; PIMAGE_NT_HEADERS nt_headers; PIMAGE_SECTION_HEADER section_header; LPBYTE file_data; } NEW_PROCESS_INFO, *PNEW_PROCESS_INFO; void get_replacement_info(const char* full_file_path, PNEW_PROCESS_INFO new_process_info) { HANDLE hFile = CreateFileA(full_file_path, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL); DWORD file_size = GetFileSize(hFile, NULL); //Note: High DWORD ignored, dangerous with >4GB files :-P new_process_info->file_data = (LPBYTE)malloc(file_size * sizeof(LPBYTE)); DWORD bytes_read; ReadFile(hFile, new_process_info->file_data, file_size, &bytes_read, 0); assert(bytes_read == file_size); new_process_info->dos_header = (PIMAGE_DOS_HEADER)(&new_process_info->file_data[0]); new_process_info->nt_headers = (PIMAGE_NT_HEADERS)(&new_process_info->file_data[new_process_info->dos_header->e_lfanew]); } int main(int argc, char* argv[]) { NEW_PROCESS_INFO new_process_info; PROCESS_INFORMATION process_info; STARTUPINFOA startup_info; RtlZeroMemory(&startup_info, sizeof(STARTUPINFOA)); pNtUnmapViewOfSection NtUnmapViewOfSection = NULL; CreateProcessA(NULL, argv[1], NULL, NULL, FALSE, CREATE_SUSPENDED, NULL, NULL, &startup_info, &process_info); get_replacement_info(argv[2], &new_process_info); NtUnmapViewOfSection = (pNtUnmapViewOfSection)(GetProcAddress( GetModuleHandleA("ntdll.dll"), "NtUnmapViewOfSection")); //Remove target memory code NtUnmapViewOfSection(process_info.hProcess, (PVOID)new_process_info.nt_headers->OptionalHeader.ImageBase); //Allocate memory in target process starting at replacements image base VirtualAllocEx(process_info.hProcess, (PVOID)new_process_info.nt_headers->OptionalHeader.ImageBase, new_process_info.nt_headers->OptionalHeader.SizeOfImage, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE); //Copy in PE header of replacement process WriteProcessMemory(process_info.hProcess, (PVOID)new_process_info.nt_headers->OptionalHeader.ImageBase, &new_process_info.file_data[0], new_process_info.nt_headers->OptionalHeader.SizeOfHeaders, NULL); //Write in all sections of the replacement process for(int i = 0; i < new_process_info.nt_headers->FileHeader.NumberOfSections; i++) { //Get offset of section int section_offset = new_process_info.dos_header->e_lfanew + sizeof(IMAGE_NT_HEADERS) + (sizeof(IMAGE_SECTION_HEADER) * i); new_process_info.section_header = (PIMAGE_SECTION_HEADER)(&new_process_info.file_data[section_offset]); //Write in section WriteProcessMemory(process_info.hProcess, (PVOID)(new_process_info.nt_headers->OptionalHeader.ImageBase + new_process_info.section_header->VirtualAddress), &new_process_info.file_data[new_process_info.section_header->PointerToRawData], new_process_info.section_header->SizeOfRawData, NULL); } //Get CONTEXT of main thread of suspended process, fix up EAX to point to new entry point LPCONTEXT thread_context = (LPCONTEXT)_aligned_malloc(sizeof(CONTEXT), sizeof(DWORD)); thread_context->ContextFlags = CONTEXT_FULL; GetThreadContext(process_info.hThread, thread_context); thread_context->Eax = new_process_info.nt_headers->OptionalHeader.ImageBase + new_process_info.nt_headers->OptionalHeader.AddressOfEntryPoint; SetThreadContext(process_info.hThread, thread_context); //Resume the main thread, now holding the replacement processes code ResumeThread(process_info.hThread); free(new_process_info.file_data); _aligned_free(thread_context); return 0; }

The usage is as follows:

runasprocess [process to replace] [replacement process]

Here is a program that outputs a MessageBox running as an instance of VMWare:

The 64-bit analogue to this isn’t so nice — you can’t simply use PIMAGE_NT_HEADERS64 and a 64-bit _CONTEXT structure and modify RAX since process loading is done differently under 64-bit systems. I haven’t read much about 64-bit loading so it’s definitely something for the future.

A downloadable PDF of this post can be found here.

Comments (0)

February 4, 2011

Game Hacking: Age of Empires II (Part 2/?)

Filed under: Game Hacking,Reverse Engineering — admin @ 11:26 AM

Follow me

This part will focus on how to draw player stats on the game screen. One obvious advantage to this over the previous article is that the player stats are directly in the game, thus there is no need to alt-tab out or play in windowed mode to see what the other players have. Additionally, a toggle feature will be added that allows the player to cycle through various modes to see different types of statistics on the others. The first step is to find a place where either some text, or more specifically, the players scores are drawn. The advantage to finding where the player scores are drawn is that player statistics will be drawn in a natural location and that they can correspond in color to the player. The first step is to think about how the player text is drawn on screen. There is always the player name, a colon followed by a space, and two numbers with a forward slash separating them. Searching for a format similar to this in the games referenced strings leads to what could be the right path.
A string with our desired format is found, and some other interesting strings which hint to drawing. Setting a breakpoint on where the “%s: %d/%d” string is referenced shows that it does in fact relate to the playersÂ scores. The function that uses this is called on a timer continually to update the players scores on the screen. The image below shows what values are loaded in registers on the first and second call respectively.

The general gist of how the function works is that the player whose score is to be retrieved is loaded into the EAX register. The function flow continues until the score is retrieved. The score for the next player is retrieved while the string containing the name and score of the previous player is drawn on the screen. The process then continues for the next player and restarts while the game is active. Modifying the name on a call shows that this does indeed affect how the text is drawn on the screen

This can then be taken advantage of to draw what we want. Tracing exactly where the name gets drawn leads to the call from .text:0052107D.

.text:0052104C                 mov     eax, [esp+350h+var_330]
.text:00521050                 mov     ecx, [esp+350h+var_340]
.text:00521054                 mov     edx, [esi+8]
.text:00521057                 push    eax             ; int
.text:00521058                 mov     eax, [esi+4]
.text:0052105B                 push    ecx             ; int
.text:0052105C                 mov     ecx, [esi]
.text:0052105E                 push    ebx             ; int
.text:0052105F                 push    edi             ; int
.text:00521060                 push    edx             ; int
.text:00521061                 mov     edx, [esp+364h+var_31C]
.text:00521065                 push    eax             ; int
.text:00521066                 mov     eax, [esp+368h+var_318]
.text:0052106A                 push    ecx             ; int
.text:0052106B                 push    edx             ; int
.text:0052106C                 mov     edx, [esp+370h+var_338]
.text:00521070                 lea     ecx, [esp+370h+Str1]
.text:00521077                 push    eax             ; int
.text:00521078                 push    ecx             ; Str1
.text:00521079                 mov     ecx, [edx]
.text:0052107B                 push    5               ; int
.text:0052107D                 call    sub_54A510

Immediately after this routine completes, the text is drawn on the screen. The function has 11 arguments, the second one being the player name, and third being the RGB value of the player. For the purpose of developing this portion of the hack, the other arguments are irrelevant and probably relate to the position on the screen where the text is to be drawn if I had to take a guess. The idea then is to hook .text:0054A510, grab the stats for the players name, modify the resulting string (which will be in “%s: %d/%d” format) to our custom string, and then pass this back to the original function to be drawn on the screen. The resulting code would look like

__declspec(naked) int score_update_hook(int always_five, char *player, int rgb_value, int unk1, int unk2,
    int unk3, int unk4, int unk5, int unk6, int unk7, int unk8) {
    __asm pushad
    char *name; //Placeholder for address of name buffer
    __asm {
        mov ebx, dword ptr[esp+0x28]
        mov name, ebx
    }
    stats = items_find_by_name(&base_pointers, name);
    if(stats != NULL) {
        if(toggle_option == CURRENT_RES)
            _snprintf(name, SCORE_MAX_LENGTH, "W:%1.0f  F:%1.0f  G:%1.0f  S:%1.0f\0",
            stats->player_stat->wood, stats->player_stat->food, stats->player_stat->gold,
            stats->player_stat->stone);
        else if(toggle_option == ALL_RES)
            _snprintf(name, SCORE_MAX_LENGTH, "W:%1.0f  F:%1.0f  G:%1.0f  S:%1.0f\0",
            stats->player_stat->total_wood_gathered, stats->player_stat->total_food_gathered,
            stats->player_stat->total_gold_gathered, stats->player_stat->total_stone_gathered);
        else if(toggle_option == POP_AGE)
            _snprintf(name, SCORE_MAX_LENGTH, "Pop: %1.0f/%1.0f  Vil:%1.0f  Mil:%1.0f  Age:%1.0f\0",
            stats->player_stat->pop_current, (stats->player_stat->pop_current + stats->player_stat->pop_left),
            stats->player_stat->num_villagers, stats->player_stat->num_military, stats->player_stat->current_age);
    }
    __asm {
        popad
        jmp score_update
    }
}

The actual structure of the function is abused a bit here. Since .text:0054A510 has no local variables, we can create one on the stack at [EBP-0x4], since there won’t be anything to overwrite there. This dummy argument will act as our third argument at [ESP+0x28] (this function does not set up any sort of BP-based frame). Then anything we do to this dummy argument will be reflected as a change to the third parameter. Thus, the hook grabs the player name of who is to be updated, gets their stats, and checks what mode the user wants to be displayed. The modes are currently controlled through a regular enum in toggle_options.h

typedef enum TOGGLE_OPTIONS {
    CURRENT_RES = 1,
    ALL_RES,
    POP_AGE
} toggle_options;

Future plans can be to extend this system to allow the user to script their own format to be displayed with what they want. This technique still holds on multiplayer, as shown by the screenshots below.

Multiplayer note: The hooking technique posted below is detectable by Voobly. See the end of part 1 for suggestions on bypasses.

Usage: Enter a game and hit the hotkey to enable (default is F5). Use F6 to disable the hack, F7 to toggle options, and F8 to clear the stat list in case all names were not retrieved. A player is added to the list when they perform any action in game that modifies their resources. Duplicates are not stored in the list.

I’d prefer for the hack to develop through a series of articles instead of opening up a SVN server on here since that will give me motivation to continue its development.

The source for the in-game hack DLL can be found here.

A downloadable PDF of this post can be found here.

Comments (0)

February 2, 2011

Game Hacking: Age of Empires II (Part 1/?)

Filed under: Game Hacking,Reverse Engineering — admin @ 7:06 AM

Follow me

This article will be one of an ongoing series — as I have time to write them. I’ve spent the better part of this past weekend and this week focusing on game hacking. The target was an old classic, Age of Empires II: The Conquerors.

It is an old 2D RTS game where you build up an empire, build armies, make/break alliances, and so on. For a much more in-depth explanation than I care to provide see the Wikipedia article on its predecessor. One of the interesting things that I discovered while messing around with this game is that the statistics (all resource/villager/military/… counts, current age, population, etc) counts are stored in a structure that is indexed into for the appropriate player. This wouldn’t be too interesting as a single player game, but Age of Empires is a multiplayer game, with a semi-big community of around 1200 active players. The same base classes/structures are shared between AI bots and active players, meaning there are no modifications to be made between single and multiplayer, in terms of hack development.

Finding the stat structure started off normally, by using Cheat Engine to search for the type of variable the player’s resources were stored in, and what was writing to it. Doing the usual things like building/destroying buildings, collecting resources, and making units yielded a wide variety of results. In the end, there were multiple candidates for where values are being written from, but they were narrowed down to a very good function at .text:00555470.

This is good candidate because the floating point stack is being utilized, and it was the only one like it that popped up when dropping resources off at the town center. Looking actively into this function yields clues about how it works. The important parts are reproduced below.

.text:0055544F                 mov     eax, [ecx+0A4h]
.text:00555455                 movsx   edx, si
.text:00555458                 cmp     edx, eax
.text:0055545A                 jge     loc_5554FE
.text:00555460                 mov     eax, [ecx+0A8h]
.text:00555466                 fld     [esp+4+arg_4]
.text:0055546A                 fadd    dword ptr [eax+edx*4]
.text:0055546D                 lea     eax, [eax+edx*4]
.text:00555470                 fstp    dword ptr [eax]
.text:00555472                 mov     eax, [ecx+4]
.text:00555475                 test    eax, eax
.text:00555477                 jz      loc_5554FE

This function is very interesting because it is a __thiscall and does not set up any sort of bp-based stack frame. ECX here is used without being initialized (hint for __thiscall) and the arguments are also referenced directly through the stack pointer. This is where the static code analysis ends though, and a debugger needs to be attached to follow exactly how everything behaves. OllyDbg happens to be one of the best for live code analysis.

Breakpoints set, everything can be observed as this function gets called. When I created a unit, the following values were in the registers when the first breakpoint hit.

Here ECX is the “this” pointer. It points to the base of the calling class. When the EIP is 0x0055546A, the application had the following state

1.0f was loaded into ST0. However, when I dequeued a unit, -1.0f was loaded into ST0. The second parameter is therefore some sort of flag for whether a unit is being queued or dequeued. This value is added to [EAX+EDX*4], and the result stored in the address that EAX points to. Looking at what is inside [EAX] then yields what is more or less a holy grail.

While the values may look nonsensical at first, this is actually a layout of a structure that holds a ton of statistics about a player.

The first four members of this structure correlate to the players food, wood, stone, and gold. Then comes the population left before hitting the cap, an unknown value, the current age, and so on (see player_stats.h for my listings). These values can continue to be found by setting memory breakpoints on an individual one or a region, followed by performing an action in the game and observing what breakpoints are triggered. Since the base of this structure is a member of the calling class, it can always be found at (base address of class + 0xA8), as evidenced in the disassembly of the function. Just looking at the base address of the class yields some interesting information.

In addition to having pointers to some ridiculously large structures (possibly covered in the future), the player’s name is also listed at (base address of class + 0xA8). Sidenote: The address pointing to it is different since I started another game instance between taking the two screenshots. This provides identifying information for each class that calls .text:00555470. Then the theory is that if I hook .text:00555470, I can store all of the pointers to every player in the game. This is true because this function is called by every player on the games start, and also throughout the game. Using Microsoft’s Detours library makes this extremely easy. The hook function looks as follows

__declspec(naked) void resources_changed_hook(short int res_type, float usage_type, int unused) {
	__asm {
		pushad
		mov eax, temp_pointer
		mov dword ptr[eax], ecx         //temp_pointer->base_pointer points to calling class
    }
    temp_pointer->player_name = (char*)(*(temp_pointer->base_pointer + (0x98 / sizeof(DWORD_PTR))));
    temp_pointer->player_stat = (player_stats*)(*(temp_pointer->base_pointer + (0xA8 / sizeof(DWORD_PTR))));
    if(insert(&base_pointers, temp_pointer) == true)
        temp_pointer = (item_set*)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, sizeof(item_set));
    __asm {
		popad
		jmp resources_changed
	}
}

The stack is preserved, the address of the calling class is stored, the offsets calculated and set, and the calling function inserted into a set (no repetitions allowed).

The actual structure is a shell of the calling class and stores only the class pointer, the statistics structure, and the players name

typedef struct ITEM_SET {
	int* base_pointer;
        char* player_name;
	player_stats* player_stat;
	ITEM_SET *next;
} item_set, *pitem_set;

This can/will all be expanded and redone as more of the calling class is reverse engineered. The stored items can then be simply traversed like a list and the data for each player printed out.

void print(item_set** head) {
	if(*head == NULL)
		return;
	item_set* node_ptr = *head;
	while(node_ptr != NULL) {
        printf("Player: %s -- Wood: %1.0f - Food %1.0f - Gold: %1.0f - Stone: %1.0f\n", node_ptr->player_name,
            node_ptr->player_stat->wood, node_ptr->player_stat->food, node_ptr->player_stat->gold, node_ptr->player_stat->stone);
		node_ptr = node_ptr->next;
	}
	printf("\n");
}

Here is an example of it in action:

Using this, a player is able to modify their own stats in addition to reading the stats of others. Doing something like

while(node_ptr != NULL) {
    if(strcmp("qwerty", node_ptr->player_name) == 0) {
        node_ptr->player_stat->food = 10000.0f;
        break;
    }
    node_ptr = node_ptr->next;
}

Would work fine on single player. However, on multiplayer this would cause an out of sync error. Since each player keeps a copy of the other players information in their game, any unwarranted change would cause the game to go out of sync. However, simply reading memory will still be fine. The player_stats structure consists of 198 members with roughly half of them documented by me. Additional help is welcome.

A screenshot from a CBA game at Voobly — technique still holds.

Warning: The version posted here will result in a ban from Voobly. Their anti-cheat checks functions for hooks and will ban any user found to be modifying the functions in memory. This is still extremely easy to get around using hardware breakpoints and a different DLL injection technique with no keyboard hook — but that might be another part of this series. The risk takers who are uninitiated with those topics may want to try to attach the hooks prior to an in-game lobby launch and detach immediately upon entering the game to beat the anti-cheat scan. This is definitely not recommended however. Also, the fact that it is in a separate window makes this more of a proof of concept than an actual functional hack until further work/posts are done on the game.

Possible future articles as time permits:

Hooking internal drawing functions or DirectDraw functions to draw text on the screen

Reversing the protocol and packet structure

Bypassing Voobly’s anti-cheat system (large hints given)

The source code to the hooking DLL can be found here.

A downloadable PDF of this post can be found here.

Comments (1)

RCE Endeavors 😅

February 21, 2011

API Hooking Through Near Call Replacement

February 14, 2011

Running a (32-bit) Process in the Context of Another

February 4, 2011

Game Hacking: Age of Empires II (Part 2/?)

February 2, 2011

Game Hacking: Age of Empires II (Part 1/?)