Archive for May, 2014

An Experiment In Performing Remote Calls on x64

May 4th, 2014

Recently I was trying to do something more than just executing code in the context of a remote process: I wanted to call a function remotely, including supplying arguments, and have the program continue execution afterwards. What I will present in this post is what I have quickly come up with to achieve the task. There certainly are edge cases (discussed at the end) where the code will run into issues, but the general logic is as follows:

  • Suspend all threads in the target process. This is achieved in the code with a call to the NtSuspendProcess native API.
  • Allocate space in the process that will contain the x64 assembly code which will set up the parameters and stack to perform the call.
  • Save all registers that will be used in performing the call. The example code does not save flags, but a full implementation will want to do that as well.
  • Write in the parameters following the Windows x64 ABI: the first four parameters go in RCX, RDX, R8, and R9, respectively, with the rest on the stack. The caller will have to know and supply the stack offset to the other parameters.
  • Set up the trampoline to perform the call.
  • Resume the process via NtResumeProcess and let the call happen.
  • Save the result of the call and continue execution.

With that in mind, I present the example code. The code contained within this post has had the error handling taken out of it in order to save space, unlike the code in the attached zip archive at the bottom. The program will take in a process id as a decimal value and perform a remote call on it. The outline looks as follows:

#define DEFAULT_PROCESS_RIGHTS \
    (PROCESS_CREATE_THREAD | PROCESS_DUP_HANDLE | PROCESS_QUERY_INFORMATION | PROCESS_SUSPEND_RESUME \
    | PROCESS_TERMINATE | PROCESS_VM_OPERATION | PROCESS_VM_READ | PROCESS_VM_WRITE)
 
int main(int argc, char *argv[])
{
    if (argc != 2)
    {
        printf("Usage: %s ProcessId", argv[0]);
        return -1;
    }
 
    DWORD dwProcessId = strtoul(argv[1], nullptr, 10);
 
    HANDLE hProcess = OpenProcess(DEFAULT_PROCESS_RIGHTS, FALSE, dwProcessId);
 
    (void)GetNativeFunctions();
 
    (void)PerformRemoteMessageBoxCall(hProcess, dwProcessId);
    //(void)PerformRemoteCreateProcessACall(hProcess, dwProcessId);
 
    (void)CloseHandle(hProcess);
 
    return 0;
}

GetNativeFunctions retrieves pointers to NtSuspendProcess and NtResumeProcess. This saves the work of doing a manual implementation of traversing the thread list and suspending/resuming everything as needed.

const bool GetNativeFunctions(void)
{
    HMODULE hModule = GetModuleHandle(L"ntdll.dll");
 
    NtSuspendProcessFnc = (pNtSuspendProcess)GetProcAddress(hModule, "NtSuspendProcess");
    NtResumeProcessFnc = (pNtResumeProcess)GetProcAddress(hModule, "NtResumeProcess");
 
    return (NtSuspendProcessFnc != nullptr) && (NtResumeProcessFnc != nullptr);
}

Before presenting the function that is responsible for setting up and performing the remote call, there are a few helper functions that need to be mentioned. The way that the call will be performed is by redirecting the instruction pointer (RIP in the case of x64) to the memory region that was allocated and has had the remote call code written into it. I chose the main thread to do this, which required writing a helper function to retrieve the main thread of a process. Since there is no marker for which thread is the main thread in a process, I chose to go by thread creation time and assume that the earliest created thread is the main thread. The list of threads is retrieved through a Toolhelp snapshot and this takes place while the process is suspended, so no threads will be created or die while this snapshot is taken and the earliest thread is found. The code for this is below:

#define DEFAULT_THREAD_RIGHTS \
    (THREAD_GET_CONTEXT | THREAD_SET_CONTEXT \
    | THREAD_QUERY_INFORMATION | THREAD_SET_INFORMATION \
    | THREAD_SUSPEND_RESUME | THREAD_TERMINATE)
 
const DWORD GetMainThreadId(const DWORD dwProcessId)
{
    HANDLE hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, dwProcessId);
 
    THREADENTRY32 threadEntry = { 0 };
    threadEntry.dwSize = sizeof(THREADENTRY32);
    (void)Thread32First(hSnapshot, &threadEntry);
 
    std::vector<DWORD> vecThreads;
    do
    {
        if (threadEntry.th32OwnerProcessID == dwProcessId)
        {
            vecThreads.push_back(threadEntry.th32ThreadID);
        }
    } while (Thread32Next(hSnapshot, &threadEntry));
 
    std::sort(vecThreads.begin(), vecThreads.end(),
        [](const DWORD dwFirstThreadId, const DWORD dwSecondThreadId)
        {
            FILETIME ftCreationTimeFirst = { 0 };
            FILETIME ftCreationTimeSecond = { 0 };
            FILETIME ftUnused = { 0 };
 
            //Assuming these calls will succeed.
            HANDLE hThreadFirst = OpenThread(DEFAULT_THREAD_RIGHTS, FALSE, dwFirstThreadId);
            HANDLE hThreadSecond = OpenThread(DEFAULT_THREAD_RIGHTS, FALSE, dwSecondThreadId);
 
            (void)GetThreadTimes(hThreadFirst, &ftCreationTimeFirst, &ftUnused, &ftUnused, &ftUnused);
            (void)GetThreadTimes(hThreadSecond, &ftCreationTimeSecond, &ftUnused, &ftUnused, &ftUnused);
 
            (void)CloseHandle(hThreadFirst);
            (void)CloseHandle(hThreadSecond);
 
            //A negative result means the first thread was created earlier, so it sorts to the front.
            LONG lResult = CompareFileTime(&ftCreationTimeFirst, &ftCreationTimeSecond);
            return lResult < 0;
        });
 
    (void)CloseHandle(hSnapshot);
 
    return vecThreads.front();
}
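Since only the earliest thread is needed, a full sort is not strictly required. Below is a minimal, portable sketch of the same selection done with std::min_element. It assumes the creation times have already been gathered into a hypothetical ThreadInfo record (the FILETIME flattened into a 64-bit tick count); neither ThreadInfo nor FindEarliestThread is part of the original code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record: a thread id paired with its creation time,
// expressed as a 64-bit tick count.
struct ThreadInfo
{
    std::uint32_t threadId;
    std::uint64_t creationTime;
};

// Returns the id of the thread with the smallest creation time.
std::uint32_t FindEarliestThread(const std::vector<ThreadInfo> &threads)
{
    assert(!threads.empty());
    auto earliest = std::min_element(threads.begin(), threads.end(),
        [](const ThreadInfo &first, const ThreadInfo &second)
        {
            return first.creationTime < second.creationTime;
        });
    return earliest->threadId;
}
```

This also queries each thread's creation time once instead of once per comparison, which matters when every lookup opens and closes a thread handle.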

The next two helper functions are for retrieving the context of a thread, in this case the main thread, and for changing the instruction pointer. They are straightforward and shown here only for completeness.

const CONTEXT GetContext(const DWORD dwThreadId)
{
    CONTEXT ctx = { 0 };
 
    HANDLE hThread = OpenThread(DEFAULT_THREAD_RIGHTS, FALSE, dwThreadId);
 
    ctx.ContextFlags = CONTEXT_ALL;
    (void)GetThreadContext(hThread, &ctx);
 
    (void)CloseHandle(hThread);
 
    return ctx;
}
 
const bool SetInstructionPointer(const DWORD dwThreadId, const DWORD_PTR dwAddress, CONTEXT *pContext)
{
    pContext->Rip = dwAddress;
 
    HANDLE hThread = OpenThread(DEFAULT_THREAD_RIGHTS, FALSE, dwThreadId);
 
    (void)SetThreadContext(hThread, pContext);
 
    (void)CloseHandle(hThread);
 
    return true;
}

With all of these presented, the main PerformRemoteCall function can now be shown:

const bool PerformRemoteCall(const HANDLE hProcess, const DWORD dwProcessId, const DWORD_PTR dwAddress, const DWORD_PTR *pArguments,
    const ULONG ulArgumentCount, DWORD_PTR *dwOutReturnVirtualAddress = nullptr, const DWORD dwX64StackDisplacement = 0)
{
    NTSTATUS status = NtSuspendProcessFnc(hProcess);
    if (!NT_SUCCESS(status))
    {
        printf("Could not suspend process. Status = %X", status);
        return false;
    }
 
    LPVOID lpFunctionBase = VirtualAllocEx(hProcess, nullptr, PAGE_SIZE, MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    if (lpFunctionBase == nullptr)
    {
        printf("Could not allocate memory for function call in process. Last error = %X", GetLastError());
        return false;
    }
 
    DWORD dwMainThreadId = GetMainThreadId(dwProcessId);
    CONTEXT ctx = GetContext(dwMainThreadId);
 
    size_t argumentsBaseIndex = 10;
    unsigned char remoteCallEntryBase[256] =
    {
        0x40, 0x57,                                                 /*push rdi*/
        0x48, 0x83, 0xEC, 0x40,                                     /*sub rsp, 0x40*/
        0x48, 0x8B, 0xFC,                                           /*mov rdi, rsp*/
        0x50,                                                       /*push rax*/
        0x51,                                                       /*push rcx*/
        0x52,                                                       /*push rdx*/
        0x41, 0x50,                                                 /*push r8*/
        0x41, 0x51,                                                 /*push r9*/
    };
    unsigned char remoteCallArgBase1stArg[] =
    {
        0x48, 0xB9, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, /*mov rcx, 0xAAAAAAAAAAAAAAAA*/
    };
    unsigned char remoteCallArgBase2ndArg[] =
    {
        0x48, 0xBA, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, /*mov rdx, 0xBBBBBBBBBBBBBBBB*/
    };
    unsigned char remoteCallArgBase3rdArg[] =
    {
        0x49, 0xB8, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, /*mov r8, 0xCCCCCCCCCCCCCCCC*/
    };
    unsigned char remoteCallArgBase4thArg[] =
    {
        0x49, 0xB9, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, /*mov r9, 0xDDDDDDDDDDDDDDDD*/
    };
    unsigned char remoteCallArgBaseStack[] =
    {
        0x48, 0xB8, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, /*mov rax, 0xBBBBBBBBBBBBBBBB*/
        0x48, 0x89, 0x44, 0x24, 0xFF                                /*mov qword ptr [rsp+0xFF], rax*/
    };
    unsigned char remoteCallExitBase[] =
    {
        0x48, 0xB8, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, /*mov rax, 0xBBBBBBBBBBBBBBBB*/
        0xFF, 0xD0,                                                 /*call rax*/
        0x53,                                                       /*push rbx*/
        0x48, 0xBB, 0xDD, 0xCC, 0xBB, 0xAA, 0xDD, 0xCC, 0xBB, 0xAA, /*mov rbx, 0xAABBCCDDAABBCCDD*/
        0x48, 0x81, 0xC3, 0x00, 0x04, 0x00, 0x00,                   /*add rbx, 0x400*/
        0x48, 0x89, 0x03,                                           /*mov [rbx], rax*/
        0x5B,                                                       /*pop rbx*/
        0x48, 0x83, 0xC4, 0x40,                                     /*add rsp, 0x40*/
        0x41, 0x59,                                                 /*pop r9*/
        0x41, 0x58,                                                 /*pop r8*/
        0x5A,                                                       /*pop rdx*/
        0x59,                                                       /*pop rcx*/
        0x58,                                                       /*pop rax*/
        0x5F,                                                       /*pop rdi*/
        0x68, 0xCC, 0xCC, 0xCC, 0xCC,                               /*push 0xCCCCCCCC*/
        0xC7, 0x44, 0x24, 0x04, 0xDD, 0xDD, 0xDD, 0xDD,             /*mov [rsp+4], 0xDDDDDDDD*/
        0xC3                                                        /*ret*/
    };
    unsigned char *remoteCallRegisterArguments[] =
    {
        remoteCallArgBase1stArg, remoteCallArgBase2ndArg, remoteCallArgBase3rdArg,
        remoteCallArgBase4thArg
    };
    size_t remoteCallRegisterArgumentsSize[] =
    {
        sizeof(remoteCallArgBase1stArg), sizeof(remoteCallArgBase2ndArg),
        sizeof(remoteCallArgBase3rdArg), sizeof(remoteCallArgBase4thArg)
    };
 
    DWORD_PTR dwOriginalAddress = ctx.Rip;
    DWORD_PTR dwAllocationBaseAddress = (DWORD_PTR)lpFunctionBase;
    DWORD dwLowAddress = dwOriginalAddress & 0xFFFFFFFF;
    DWORD dwHighAddress = (dwOriginalAddress == 0) ? 0 : ((dwOriginalAddress >> 32) & 0xFFFFFFFF);
 
    memset(&remoteCallEntryBase[argumentsBaseIndex], 0x90, sizeof(remoteCallEntryBase)-argumentsBaseIndex);
 
    memcpy(&remoteCallExitBase[2], &dwAddress, sizeof(DWORD_PTR));
    memcpy(&remoteCallExitBase[15], &dwAllocationBaseAddress, sizeof(DWORD_PTR));
    memcpy(&remoteCallExitBase[47], &dwLowAddress, sizeof(DWORD));
    memcpy(&remoteCallExitBase[55], &dwHighAddress, sizeof(DWORD));
 
    memcpy(&remoteCallEntryBase[sizeof(remoteCallEntryBase)-sizeof(remoteCallExitBase)],
        remoteCallExitBase, sizeof(remoteCallExitBase));
 
    if (ulArgumentCount >= 1)
    {
        memcpy(&remoteCallArgBase1stArg[2], &pArguments[0], sizeof(DWORD_PTR));
    }
    if (ulArgumentCount >= 2)
    {
        memcpy(&remoteCallArgBase2ndArg[2], &pArguments[1], sizeof(DWORD_PTR));
    }
    if (ulArgumentCount >= 3)
    {
        memcpy(&remoteCallArgBase3rdArg[2], &pArguments[2], sizeof(DWORD_PTR));
    }
    if (ulArgumentCount >= 4)
    {
        memcpy(&remoteCallArgBase4thArg[2], &pArguments[3], sizeof(DWORD_PTR));
    }
    for (unsigned long i = 0; i < min(4, ulArgumentCount); ++i)
    {
        memcpy(&remoteCallEntryBase[argumentsBaseIndex], remoteCallRegisterArguments[i], remoteCallRegisterArgumentsSize[i]);
        argumentsBaseIndex += remoteCallRegisterArgumentsSize[i];
    }
 
    unsigned char ucBaseDisplacement = dwX64StackDisplacement & 0xFF;
    for (unsigned long i = 4; i < ulArgumentCount; ++i)
    {
        memcpy(&remoteCallArgBaseStack[2], &pArguments[i], sizeof(DWORD_PTR));
        memcpy(&remoteCallArgBaseStack[14], &ucBaseDisplacement, sizeof(unsigned char));
        memcpy(&remoteCallEntryBase[argumentsBaseIndex], remoteCallArgBaseStack, sizeof(remoteCallArgBaseStack));
        argumentsBaseIndex += sizeof(remoteCallArgBaseStack);
        ucBaseDisplacement += sizeof(DWORD_PTR);
    }
 
    SIZE_T bytesWritten = 0;
    (void)WriteProcessMemory(hProcess, lpFunctionBase, remoteCallEntryBase, sizeof(remoteCallEntryBase), &bytesWritten);
    if (bytesWritten == 0 || bytesWritten != sizeof(remoteCallEntryBase))
    {
        printf("Could not write remote function code into process. Last error = %X", GetLastError());
        return false;
    }
 
    if (!SetInstructionPointer(dwMainThreadId, (DWORD_PTR)lpFunctionBase, &ctx))
    {
        return false;
    }
 
    if (dwOutReturnVirtualAddress != nullptr)
    {
        *dwOutReturnVirtualAddress = (DWORD_PTR)lpFunctionBase + 0x400;
    }
 
    status = NtResumeProcessFnc(hProcess);
 
    if (!NT_SUCCESS(status))
    {
        printf("Could not resume process. Status = %X", status);
        return false;
    }
 
    return true;
}

The function is rather involved but works as follows:

  • The process is suspended. Memory is then allocated inside of it which will hold the function that will be generated at run-time to call the target function.
  • The thread context is retrieved in order to modify the instruction pointer later.
  • A local stack frame is set up and the registers RAX, RCX, RDX, R8, and R9 are saved. The latter four are saved because they will be used as parameters, and RAX is saved because it will hold the address of the function to remotely call.
  • The values of the first four parameters are moved into their corresponding registers (first = RCX, second = RDX, third = R8, fourth = R9).
  • Additional values are stored on the stack. Depending on the passed-in stack displacement, they will be stored in the following format (0xFF will be replaced by the displacement):
mov rax, 0xBBBBBBBBBBBBBBBB
mov qword ptr [rsp+0xFF], rax
  • At the exit point of this local function stack frame, the target address is moved into the RAX register and called. Its return value is then moved into [RBX], which is the memory location that will store the result of the function call. In the example code, RBX is set to the base address of the allocated memory + 0x400 bytes.
  • The function epilogue runs: the stack is fixed up and the saved registers are restored.
  • A trampoline is set up to return execution to where it was prior to all of this happening.
  • All of this gets written in to the process and the instruction pointer gets set to the start of this region.
  • The process is resumed and the call is allowed to happen.
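The core trick in the steps above is filling in placeholder immediates inside pre-assembled instruction bytes. Here is a minimal sketch of that idea for the "mov rcx, imm64" template. EmitMovRcx is a hypothetical helper, not a function from the original code, and it writes the immediate byte by byte so it does not depend on host endianness (the memcpy in PerformRemoteCall relies on the host being little-endian, which holds on x64).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Template for "mov rcx, imm64": REX.W prefix (0x48), opcode 0xB9,
// then an 8-byte placeholder for the immediate.
static const unsigned char kMovRcxTemplate[] =
{
    0x48, 0xB9, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA
};

// Copies the template and overwrites the placeholder with `value`,
// stored least significant byte first as x64 expects.
std::vector<unsigned char> EmitMovRcx(std::uint64_t value)
{
    std::vector<unsigned char> code(kMovRcxTemplate,
        kMovRcxTemplate + sizeof(kMovRcxTemplate));
    const std::size_t immediateOffset = 2;
    assert(immediateOffset + sizeof(value) <= code.size());
    for (std::size_t i = 0; i < sizeof(value); ++i)
    {
        code[immediateOffset + i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
    }
    return code;
}
```

Calling EmitMovRcx(0x1D0000) yields the bytes 48 B9 00 00 1D 00 00 00 00 00, which is the encoding shown for the first argument in the CreateProcessA listing further down.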

It is simple to set up wrappers around this function and begin performing remote calls. Here are examples for MessageBoxA and CreateProcessA:

const bool PerformRemoteMessageBoxCall(const HANDLE hProcess, const DWORD dwProcessId)
{
    HMODULE hUser32Dll = GetModuleHandle(L"user32.dll");
 
    const DWORD_PTR dwMessageBox = (DWORD_PTR)GetProcAddress(hUser32Dll, "MessageBoxA");
    const char strCaption[] = "Remote Title";
    const char strTitle[] = "Caption for remote MessageBoxA call";
 
    LPVOID lpMemory = VirtualAllocEx(hProcess, nullptr, PAGE_SIZE, MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
 
    SIZE_T bytesWritten = 0;
    (void)WriteProcessMemory(hProcess, lpMemory, strCaption, sizeof(strCaption), &bytesWritten);
 
    DWORD_PTR dwTitleAddress = (DWORD_PTR)lpMemory + bytesWritten;
    (void)WriteProcessMemory(hProcess, (LPVOID)dwTitleAddress, strTitle, sizeof(strTitle), &bytesWritten);
 
    DWORD_PTR dwArguments[] =
    {
        NULL,
        dwTitleAddress,
        (DWORD_PTR)lpMemory,
        MB_ICONEXCLAMATION
    };
 
    return PerformRemoteCall(hProcess, dwProcessId, dwMessageBox, &dwArguments[0], 4);
 
}
 
const bool PerformRemoteCreateProcessACall(const HANDLE hProcess, const DWORD dwProcessId)
{
    HMODULE hKernel32Dll = GetModuleHandle(L"kernel32.dll");
 
    const DWORD_PTR dwCreateProcessA = (DWORD_PTR)GetProcAddress(hKernel32Dll, "CreateProcessA");
    const char strProcessPath[] = "C://Windows//system32//notepad.exe";
 
    LPVOID lpMemory = VirtualAllocEx(hProcess, nullptr, PAGE_SIZE, MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
 
    SIZE_T bytesWritten = 0;
    (void)WriteProcessMemory(hProcess, lpMemory, strProcessPath, sizeof(strProcessPath), &bytesWritten);
 
    STARTUPINFO startupInfo = { 0 };
    startupInfo.cb = sizeof(STARTUPINFO);
    DWORD_PTR dwStartupStructAddress = (DWORD_PTR)lpMemory + bytesWritten;
    (void)WriteProcessMemory(hProcess, (LPVOID)dwStartupStructAddress, &startupInfo, sizeof(STARTUPINFO), &bytesWritten);
 
    DWORD_PTR dwArguments[] =
    {
        (DWORD_PTR)lpMemory,
        NULL,
        NULL,
        NULL,
        0,
        0,
        NULL,
        NULL,
        dwStartupStructAddress,
        dwStartupStructAddress + bytesWritten
    };
 
    return PerformRemoteCall(hProcess, dwProcessId, dwCreateProcessA, &dwArguments[0],
        sizeof(dwArguments) / sizeof(dwArguments[0]), nullptr, 0x20);
}
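The 0x20 passed as the stack displacement corresponds to the 32 bytes of shadow (home) space that the Windows x64 calling convention reserves for the four register arguments, so the fifth argument lands at [rsp+0x20] and each later one sits 8 bytes higher. The sketch below computes those offsets; StackArgumentOffsets is a hypothetical helper written for illustration only. Because the template uses an 8-bit displacement, which the processor sign-extends, the sketch also assumes every offset stays below 0x80.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the [rsp+offset] displacement for each stack-passed argument
// (the fifth argument onward), given the displacement of the first one.
std::vector<std::uint8_t> StackArgumentOffsets(std::size_t argumentCount,
    std::uint8_t baseDisplacement)
{
    std::vector<std::uint8_t> offsets;
    std::uint8_t displacement = baseDisplacement;
    for (std::size_t i = 4; i < argumentCount; ++i)
    {
        // A disp8 of 0x80 or above would be sign-extended to a negative offset.
        assert(displacement < 0x80);
        offsets.push_back(displacement);
        displacement = static_cast<std::uint8_t>(displacement + 8);
    }
    return offsets;
}
```

For the ten arguments of CreateProcessA this produces 0x20, 0x28, 0x30, 0x38, 0x40, and 0x48.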

At run time, here is what the generated assembly code looks like for these functions:

MessageBoxA

00000000001C0000 40 57                push        rdi  
00000000001C0002 48 83 EC 40          sub         rsp,40h  
00000000001C0006 48 8B FC             mov         rdi,rsp  
00000000001C0009 50                   push        rax  
00000000001C000A 48 B9 00 00 00 00 00 00 00 00 mov         rcx,0  
00000000001C0014 48 BA 0D 00 1B 00 00 00 00 00 mov         rdx,1B000Dh  
00000000001C001E 49 B8 00 00 1B 00 00 00 00 00 mov         r8,1B0000h  
00000000001C0028 49 B9 30 00 00 00 00 00 00 00 mov         r9,30h  
00000000001C0032 90                   nop  
00000000001C0033 90                   nop
... tons more NOPs ...
00000000001C00C4 48 B8 38 31 DE 56 F8 7F 00 00 mov         rax,7FF856DE3138h  
00000000001C00CE FF D0                call        rax  
00000000001C00D0 53                   push        rbx  
00000000001C00D1 48 BB 00 00 1C 00 00 00 00 00 mov         rbx,1C0000h  
00000000001C00DB 48 81 C3 00 04 00 00 add         rbx,400h  
00000000001C00E2 48 89 03             mov         qword ptr [rbx],rax  
00000000001C00E5 5B                   pop         rbx  
00000000001C00E6 48 83 C4 40          add         rsp,40h  
00000000001C00EA 41 59                pop         r9  
00000000001C00EC 41 58                pop         r8  
00000000001C00EE 5A                   pop         rdx  
00000000001C00EF 59                   pop         rcx  
00000000001C00F0 58                   pop         rax  
00000000001C00F1 5F                   pop         rdi  
00000000001C00F2 68 AD 39 00 40       push        400039ADh  
00000000001C00F7 C7 44 24 04 01 00 00 00 mov         dword ptr [rsp+4],1  
00000000001C00FF C3                   ret

CreateProcessA

00000000004F0000 40 57                push        rdi  
00000000004F0002 48 83 EC 40          sub         rsp,40h  
00000000004F0006 48 8B FC             mov         rdi,rsp  
00000000004F0009 50                   push        rax  
00000000004F000A 48 B9 00 00 1D 00 00 00 00 00 mov         rcx,1D0000h  
00000000004F0014 48 BA 00 00 00 00 00 00 00 00 mov         rdx,0  
00000000004F001E 49 B8 00 00 00 00 00 00 00 00 mov         r8,0  
00000000004F0028 49 B9 00 00 00 00 00 00 00 00 mov         r9,0  
00000000004F0032 48 B8 00 00 00 00 00 00 00 00 mov         rax,0  
00000000004F003C 48 89 44 24 20       mov         qword ptr [rsp+20h],rax  
00000000004F0041 48 B8 00 00 00 00 00 00 00 00 mov         rax,0  
00000000004F004B 48 89 44 24 28       mov         qword ptr [rsp+28h],rax  
00000000004F0050 48 B8 00 00 00 00 00 00 00 00 mov         rax,0  
00000000004F005A 48 89 44 24 30       mov         qword ptr [rsp+30h],rax  
00000000004F005F 48 B8 00 00 00 00 00 00 00 00 mov         rax,0  
00000000004F0069 48 89 44 24 38       mov         qword ptr [rsp+38h],rax  
00000000004F006E 48 B8 23 00 1D 00 00 00 00 00 mov         rax,1D0023h  
00000000004F0078 48 89 44 24 40       mov         qword ptr [rsp+40h],rax  
00000000004F007D 48 B8 8B 00 1D 00 00 00 00 00 mov         rax,1D008Bh  
00000000004F0087 48 89 44 24 48       mov         qword ptr [rsp+48h],rax  
00000000004F008C 90                   nop  
00000000004F008D 90                   nop  
... tons more NOPs ...
00000000004F00C4 48 B8 A0 8A 61 55 F8 7F 00 00 mov         rax,7FF855618AA0h  
00000000004F00CE FF D0                call        rax  
00000000004F00D0 53                   push        rbx  
00000000004F00D1 48 BB 00 00 4F 00 00 00 00 00 mov         rbx,4F0000h  
00000000004F00DB 48 81 C3 00 04 00 00 add         rbx,400h  
00000000004F00E2 48 89 03             mov         qword ptr [rbx],rax  
00000000004F00E5 5B                   pop         rbx  
00000000004F00E6 48 83 C4 40          add         rsp,40h  
00000000004F00EA 41 59                pop         r9  
00000000004F00EC 41 58                pop         r8  
00000000004F00EE 5A                   pop         rdx  
00000000004F00EF 59                   pop         rcx  
00000000004F00F0 58                   pop         rax  
00000000004F00F1 5F                   pop         rdi  
00000000004F00F2 68 AD 39 00 40       push        400039ADh  
00000000004F00F7 C7 44 24 04 01 00 00 00 mov         dword ptr [rsp+4],1  
00000000004F00FF C3                   ret
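The last three instructions in both listings form the return trampoline: x64 has no push with a 64-bit immediate, so the original instruction pointer is pushed as its low 32 bits and the high 32 bits are then written to [rsp+4] before the ret. The sketch below shows the split and its inverse; SplitAddress, SplitReturnAddress, and JoinReturnAddress are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// The two 32-bit halves written by "push imm32" and "mov dword ptr [rsp+4], imm32".
struct SplitAddress
{
    std::uint32_t low;
    std::uint32_t high;
};

// Splits a 64-bit return address into the halves used by the trampoline.
SplitAddress SplitReturnAddress(std::uint64_t address)
{
    return { static_cast<std::uint32_t>(address & 0xFFFFFFFF),
             static_cast<std::uint32_t>(address >> 32) };
}

// Recombines the halves into the address that "ret" will jump to.
std::uint64_t JoinReturnAddress(const SplitAddress &parts)
{
    return (static_cast<std::uint64_t>(parts.high) << 32) | parts.low;
}
```

In the listings above, the halves 0x400039AD and 0x1 recombine to 0x1400039AD. The second write is needed even when the high half is zero, because push imm32 sign-extends its operand to 64 bits.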

When will this not work?

There are certainly cases where the above code to perform remote calls will not work:

  • The function uses an unusual calling convention, i.e. doesn’t clean up its own stack on x64.
  • The main thread is sleeping, blocked, or in a yielding state.

The full source relating to this can be found here.


Messing with MSN Internet Games (2/2)

May 2nd, 2014

The not-too-long-awaited followup continues. This post will outline some of the internals of how the common network code residing in zgmprxy.dll works. This DLL is shared across Internet Checkers, Internet Backgammon, and Internet Spades to carry out all of the network functionality. Fortunately, or rather unfortunately from a challenge perspective, Microsoft has provided debugging symbols for zgmprxy.dll. This removes some of the challenge in finding interesting functions, but still leaves room for some decent reverse engineering to actually understand how everything is working.

Starting Point

The obvious starting point for this is to load and look through the zgmprxy.pdb file provided through the Microsoft Symbol Server. There are tons of good functions to look through, but for the sake of brevity, I will be focusing on four of them here.

?BeginConnect@CStadiumSocket@@QEAAJQEAGK@Z
?SendData@CStadiumSocket@@QEAAHPEADIHH@Z
?DecryptSocketData@CStadiumSocket@@AEAAJXZ
?Disconnect@CStadiumSocket@@QEAAXXZ

Understanding how name decorations work allows for a recovery of a large amount of information, such as parameter number and types, function name and class membership information, calling convention (__thiscall for this case obviously, although I treat it as __stdcall with the “this” pointer as the first parameter in the example code), etc.
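As a small illustration of how much sits in plain sight, here is a minimal sketch that recovers "Class::Function" from decorated names shaped like the four above (?Function@Class@@...). QualifiedNameFromDecorated is a hypothetical helper that only handles this simple shape; it does not decode parameter types, nested scopes, or special members such as constructors, and returns an empty string for anything else.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Extracts "Class::Function" from a decorated name of the form
// ?Function@Class@@<encoded signature>. Returns "" for other shapes.
std::string QualifiedNameFromDecorated(const std::string &decorated)
{
    if (decorated.empty() || decorated[0] != '?')
        return "";

    // The function name runs from after '?' to the first '@'.
    const std::size_t functionEnd = decorated.find('@', 1);
    if (functionEnd == std::string::npos)
        return "";

    // The class name runs from there to the "@@" that ends the scope list.
    const std::size_t classEnd = decorated.find("@@", functionEnd + 1);
    if (classEnd == std::string::npos)
        return "";

    const std::string functionName = decorated.substr(1, functionEnd - 1);
    const std::string className =
        decorated.substr(functionEnd + 1, classEnd - functionEnd - 1);

    // Reject empty parts and nested scopes, which this sketch does not handle.
    if (functionName.empty() || className.empty()
        || className.find('@') != std::string::npos)
        return "";

    return className + "::" + functionName;
}
```

For full undecoration, including the parameter types, the UnDecorateSymbolName function in dbghelp is the more robust route.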

The Plan

The plan here does not change too much from what happened in the previous post:

  • Get into the address space of the target executable. Nothing here changes from last post.
  • Get the addresses of the above functions. This becomes very simple with the debug/symbol APIs provided by the WinAPI.
  • Install hooks at desired places on the functions.
  • Save off the CStadiumSocket instance so we can call functions in it at our own leisure. As an example for this post, it will be to send custom chat messages instead of the pre-selected ones offered by the games.

DllMain does not change drastically from the last revision.

int APIENTRY DllMain(HMODULE hModule, DWORD dwReason, LPVOID lpReserved)
{
    switch(dwReason)
    {
    case DLL_PROCESS_ATTACH:
        (void)DisableThreadLibraryCalls(hModule);
        if(AllocConsole())
        {
            freopen("CONOUT$", "w", stdout);
            SetConsoleTitle(L"Console");
            SetConsoleTextAttribute(GetStdHandle(STD_OUTPUT_HANDLE), FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_BLUE);
            printf("DLL loaded.\n");
        }
        if(GetFunctions())
        {
            pExceptionHandler = AddVectoredExceptionHandler(TRUE, VectoredHandler);
            if(SetBreakpoints())
            {
                if(CreateThread(NULL, 0, DlgThread, hModule, 0, NULL) == NULL)
                    printf("Could not create dialog thread. Last error = %X\n", GetLastError());
            }
            else
            {
                printf("Could not set initial breakpoints.\n");
            }
            printf("CStadiumSocket::BeginConnect: %p\n"
                "CStadiumSocket::SendData: %p\n"
                "CStadiumSocket::DecryptSocketData: %p\n"
                "CStadiumSocket::Disconnect: %p\n",
                (void *)BeginConnectFnc, (void *)SendDataFnc,
                (void *)DecryptSocketDataFnc, (void *)DisconnectFnc);
        }
        break;
 
    case DLL_PROCESS_DETACH:
        //Clean up here usually
        break;
 
    case DLL_THREAD_ATTACH:
        break;
 
    case DLL_THREAD_DETACH:
        break;
    }
 
    return TRUE;
}

There are four functions now as well as a new thread which will hold a dialog to enter custom chat (discussed later). Memory breakpoints are still used and nothing has changed about how they are added. GetFunctions() has drastically changed in this revision. Instead of finding the target functions through GetProcAddress, the injected DLL can load up symbols at runtime and find the four desired functions through the use of the SymGetSymFromName64 function.

const bool GetFunctions(void)
{
    (void)SymSetOptions(SYMOPT_UNDNAME);
    if(SymInitialize(GetCurrentProcess(), "", TRUE))
    {
        IMAGEHLP_SYMBOL64 imageHelp = { 0 };
        imageHelp.SizeOfStruct = sizeof(IMAGEHLP_SYMBOL64);
 
        (void)SymGetSymFromName64(GetCurrentProcess(), "CStadiumSocket::BeginConnect", &imageHelp);
        BeginConnectFnc = (pBeginConnect)imageHelp.Address;
 
        (void)SymGetSymFromName64(GetCurrentProcess(), "CStadiumSocket::SendData", &imageHelp);
        SendDataFnc = (pSendData)imageHelp.Address;
 
        (void)SymGetSymFromName64(GetCurrentProcess(), "CStadiumSocket::DecryptSocketData", &imageHelp);
        DecryptSocketDataFnc = (pDecryptSocketData)imageHelp.Address;  
 
        (void)SymGetSymFromName64(GetCurrentProcess(), "CStadiumSocket::Disconnect", &imageHelp);
        DisconnectFnc = (pDisconnect)imageHelp.Address;
 
    }
    else
    {
        printf("Could not initialize symbols. Last error = %X", GetLastError());
    }
    return ((BeginConnectFnc != NULL) && (SendDataFnc != NULL)
        && (DecryptSocketDataFnc != NULL) && (DisconnectFnc != NULL));
}

Here symbols will be loaded with undecorated names and the target functions will be retrieved. The zgmprxy.pdb file must reside in one of the directories that SymInitialize checks, namely in one of the following:

  • The current working directory of the application
  • The _NT_SYMBOL_PATH environment variable
  • The _NT_ALTERNATE_SYMBOL_PATH environment variable

That is really all there is in terms of large changes from last post, so it’s time to begin actually reversing these four functions.

?BeginConnect@CStadiumSocket@@QEAAJQEAGK@Z

As the function name implies, this is called to begin a connection with the matchmaking service and game. The control flow graph looks pretty straightforward, as is the functionality of BeginConnect.

From a cursory inspection, the function appears to be a wrapper around QueueUserWorkItem. It takes a URL and port number as input, and is responsible for initializing and formatting them before launching an asynchronous task. My x64 -> C interpretation yields something similar to the following (x64 code in comment form, with my C translation below each block). Allocation sizes were retrieved during a trace and don’t necessarily fully reflect the logic:

int CStadiumSocket::BeginConnect(wchar_t *pUrl, unsigned long ulPortNumber)
{
//.text:000007FF34FB24C7                 mov     rcx, r12        ; size_t
//.text:000007FF34FB24CA                 call    ??_U@YAPEAX_K@Z ; operator new[](unsigned __int64)
//.text:000007FF34FB24CF                 mov     rsi, rax
//.text:000007FF34FB24D2                 cmp     rax, rbx
//.text:000007FF34FB24D5                 jnz     short loc_7FF34FB24E1
    wchar_t *strPortNum = new wchar_t[32];
    if(strPortNum == NULL)
        return 0x800404DB;
 
//.text:000007FF34FB24E1                 mov     r8, r12         ; size_t
//.text:000007FF34FB24E4                 xor     edx, edx        ; int
//.text:000007FF34FB24E6                 mov     rcx, rax        ; void *
//.text:000007FF34FB24E9                 call    memset
    memset(strPortNum, 0, 32 * sizeof(wchar_t));
 
//.text:000007FF34FB24EE                 lea     r12, [rbp+3Ch]
//.text:000007FF34FB24F2                 mov     r11d, 401h
//.text:000007FF34FB24F8                 mov     rax, r12
//.text:000007FF34FB24FB                 sub     rdi, r12
//.text:000007FF34FB24FE
//.text:000007FF34FB24FE loc_7FF34FB24FE:                        ; CODE XREF: CStadiumSocket::BeginConnect(ushort * const,ulong)+77j
//.text:000007FF34FB24FE                 cmp     r11, rbx
//.text:000007FF34FB2501                 jz      short loc_7FF34FB251E
//.text:000007FF34FB2503                 movzx   ecx, word ptr [rdi+rax]
//.text:000007FF34FB2507                 cmp     cx, bx
//.text:000007FF34FB250A                 jz      short loc_7FF34FB2519
//.text:000007FF34FB250C                 mov     [rax], cx
//.text:000007FF34FB250F                 add     rax, 2
//.text:000007FF34FB2513                 sub     r11, 1
//.text:000007FF34FB2517                 jnz     short loc_7FF34FB24FE
//.text:000007FF34FB2519
//.text:000007FF34FB2519 loc_7FF34FB2519:                        ; CODE XREF: CStadiumSocket::BeginConnect(ushort * const,ulong)+6Aj
//.text:000007FF34FB2519                 cmp     r11, rbx
//.text:000007FF34FB251C                 jnz     short loc_7FF34FB2522
//.text:000007FF34FB251E
//.text:000007FF34FB251E loc_7FF34FB251E:                        ; CODE XREF: CStadiumSocket::BeginConnect(ushort * const,ulong)+61j
//.text:000007FF34FB251E                 sub     rax, 2 
    for(unsigned int i = 0; i < 1025; ++i)
    {
        m_pBuffer[i] = pUrl[i];
        if(m_pBuffer[i] == 0)
            break;
    }
 
//.text:000007FF34FB2522                 mov     r9d, 0Ah        ; int
//.text:000007FF34FB2528                 mov     rdx, rsi        ; wchar_t *
//.text:000007FF34FB252B                 mov     ecx, r13d       ; int
//.text:000007FF34FB252E                 lea     r8d, [r9+16h]   ; size_t
//.text:000007FF34FB2532                 mov     [rax], bx
//.text:000007FF34FB2535                 call    _itow_s
    (void)_itow_s(ulPortNumber, strPortNum, 32, 10);
 
//.text:000007FF34FB253A                 mov     [rbp+38h], r13d
//.text:000007FF34FB253E                 mov     r13d, 30h
//.text:000007FF34FB2544                 lea     rcx, [rsp+68h+var_48] ; void *
//.text:000007FF34FB2549                 mov     r8, r13         ; size_t
//.text:000007FF34FB254C                 xor     edx, edx        ; int
//.text:000007FF34FB254E                 mov     [rbp+85Ch], ebx
//.text:000007FF34FB2554                 call    memset
    char partialContextBuffer[48];
    memset(partialContextBuffer, 0, sizeof(partialContextBuffer));
 
//.text:000007FF34FB2559                 lea     ecx, [r13+28h]  ; size_t
//.text:000007FF34FB255D                 mov     [rsp+68h+var_44], ebx
//.text:000007FF34FB2561                 mov     [rsp+68h+var_40], 1
//.text:000007FF34FB2569                 call    ??2@YAPEAX_K@Z  ; operator new(unsigned __int64)
//.text:000007FF34FB256E                 mov     rdi, rax
//.text:000007FF34FB2571                 cmp     rax, rbx
//.text:000007FF34FB2574                 jz      short loc_7FF34FB257E
//.text:000007FF34FB2576                 mov     dword ptr [rax], 1
//.text:000007FF34FB257C                 jmp     short loc_7FF34FB2581
    char *pContextBuffer = new char[88]; 
    if(pContextBuffer == NULL)
        return 0x800404DB;
 
//.text:000007FF34FB2586                 lea     rcx, [rdi+18h]  ; void *
//.text:000007FF34FB258A                 lea     rdx, [rsp+68h+var_48] ; void *
//.text:000007FF34FB258F                 mov     r8, r13         ; size_t
//.text:000007FF34FB2592                 mov     [rdi+8], r12
//.text:000007FF34FB2596                 mov     [rdi+10h], rsi
//.text:000007FF34FB259A                 call    memmove
    *(pContextBuffer) = 1; //At 000007FF34FB2576
    *(pContextBuffer + 8) = &m_pBuffer;
    *(pContextBuffer + 16) = &strPortNum;
    memmove(&pContextBuffer[24], partialContextBuffer, 48);
 
//.text:000007FF34FB259F                 lea     r11, [rbp+0A80h]
//.text:000007FF34FB25A6                 lea     rax, [rbp+18h]
//.text:000007FF34FB25AA                 lea     rcx, ?AsyncGetAddrInfoW@CStadiumSocket@@SAKPEAX@Z ; Function
//.text:000007FF34FB25B1                 xor     r8d, r8d        ; Flags
//.text:000007FF34FB25B4                 mov     rdx, rdi        ; Context
//.text:000007FF34FB25B7                 mov     [rdi+48h], r11
//.text:000007FF34FB25BB                 mov     [rdi+50h], rax
//.text:000007FF34FB25BF                 call    cs:__imp_QueueUserWorkItem
//.text:000007FF34FB25C5                 cmp     eax, ebx
//.text:000007FF34FB25C7                 jnz     short loc_7FF34FB25D5
//.text:000007FF34FB25C9                 mov     ebx, 800404BFh
//.text:000007FF34FB25CE                 jmp     short loc_7FF34FB25D5
    if(QueueUserWorkItem(&AsyncGetAddrInfoW, pContextBuffer, 0) == FALSE)
        return 0x800404BF;
 
//From success case
    return 0;
}

?SendData@CStadiumSocket@@QEAAHPEADIHH@Z

The next function to look at is the SendData function. This function formats the data to send and invokes OnAsyncDataWrite to write it out. The function creates a buffer with a maximum length of 0x4010 (16400) bytes, copies in the message buffer, and appends a few fields to the end. There is some handling code for the case where the message is of a handshake type, or where it is a message that is to be queued up. Below is a mostly complete translation of the assembly.

int CStadiumSocket::SendData(char *pBuffer, unsigned int uiLength, bool bIsHandshake, bool bLastHandshake)
{
//.text : 000007FF34FB350C                 cmp     dword ptr[rcx + 0A88h], 0
//.text : 000007FF34FB3513                 mov     rax, [rcx + 840h]
//.text : 000007FF34FB351A                 mov     r13, rdx
//.text : 000007FF34FB351D                 mov     rax, [rax + 10h]
//.text : 000007FF34FB3521                 lea     rdx, aTrue; "true"
//.text : 000007FF34FB3528                 mov     rdi, rcx
//.text : 000007FF34FB352B                 mov[rsp + 58h + var_20], rax
//.text : 000007FF34FB3530                 lea     r11, aFalse; "false"
//.text : 000007FF34FB3537                 mov     ebp, r8d
//.text : 000007FF34FB353A                 mov     r10, r11
//.text : 000007FF34FB353D                 mov     rcx, r11
//.text : 000007FF34FB3540                 mov     r12d, r9d
//.text : 000007FF34FB3543                 cmovnz  r10, rdx
//.text : 000007FF34FB3547                 cmp[rsp + 58h + arg_20], 0
//.text : 000007FF34FB354F                 cmovnz  rcx, rdx
//.text : 000007FF34FB3553                 test    r9d, r9d
//.text : 000007FF34FB3556                 mov[rsp + 58h + var_28], r10
//.text : 000007FF34FB355B                 mov[rsp + 58h + var_30], rcx
//.text : 000007FF34FB3560                 cmovnz  r11, rdx
//.text : 000007FF34FB3564                 mov     r9d, r8d
//.text : 000007FF34FB3567                 lea     rcx, aCstadiumsoc_15; "CStadiumSocket::SendData:\n    BUFFER:  "...
//.text : 000007FF34FB356E                 mov     r8, r13
//.text : 000007FF34FB3571                 mov     edx, ebp
//.text : 000007FF34FB3573                 mov[rsp + 58h + var_38], r11
//.text : 000007FF34FB3578                 call    ?SafeDbgLog@@YAXPEBGZZ ; SafeDbgLog(ushort const *, ...)
    QueueNode *pQueueNode = m_msgQueue;
 
    char *strIsHandshake = (bIsHandshake != 0) ? "true" : "false";
    char *strPostHandshake = (m_bPostHandshake != 0) ? "true" : "false";
    char *strLastHandshake = (bLastHandshake != 0) ? "true" : "false";
 
    SafeDbgLog("CStadiumSocket::SendData:    BUFFER:    \"%*.S\"    LENGTH:    %u    HANDSHAKE: %s    LAST HS:   %s    POST HS:   %s    Queue:     %u",
        uiLength, pBuffer, uiLength, strIsHandshake, strLastHandshake, strPostHandshake, pQueueNode->Count);
 
//.text : 000007FF34FB357D                 mov     ecx, 4010h; size_t
//.text : 000007FF34FB3582                 call    ??2@YAPEAX_K@Z ; operator new(unsigned __int64)
//.text : 000007FF34FB3587                 mov     rsi, rax
//.text : 000007FF34FB358A                 mov[rsp + 58h + arg_0], rax
//.text : 000007FF34FB358F                 test    rax, rax
//.text : 000007FF34FB3592                 jz      loc_7FF34FB36B3
//.text : 000007FF34FB3598                 mov     ebx, 4000h
//.text : 000007FF34FB359D                 xor     edx, edx; int
//.text : 000007FF34FB359F                 mov     rcx, rax; void *
//.text : 000007FF34FB35A2                 mov     r8, rbx; size_t
//.text : 000007FF34FB35A5                 call    memset
//.text : 000007FF34FB35AA                 cmp     ebp, ebx
//.text : 000007FF34FB35AC                 mov     rdx, r13; void *
//.text : 000007FF34FB35AF                 cmovb   rbx, rbp
//.text : 000007FF34FB35B3                 mov     rcx, rsi; void *
//.text : 000007FF34FB35B6                 mov     r8, rbx; size_t
//.text : 000007FF34FB35B9                 call    memmove
//.text : 000007FF34FB35BE                 and     dword ptr[rsi + 4000h], 0
//.text : 000007FF34FB35C5                 mov[rsi + 4004h], ebp
//.text : 000007FF34FB35CB                 mov[rsi + 4008h], r12d
//.text : 000007FF34FB35D2                 and     dword ptr[rsi + 400Ch], 0
    char *pFullBuffer = new char[0x4010];
    if(pFullBuffer == NULL)
    {
        return 0;
    }
 
    memset(pFullBuffer, 0, 0x4000);
 
    uiLength = (uiLength < 0x4000) ? uiLength : 0x4000;
    memmove(pFullBuffer, pBuffer, uiLength);
 
    pFullBuffer[0x4000] = 0;
    pFullBuffer[0x4004] = uiLength;
    pFullBuffer[0x4008] = bIsHandshake; //r12d holds the handshake argument passed in r9d
    pFullBuffer[0x400C] = 0;
 
//.text : 000007FF34FB35D9                 test    r12d, r12d
//.text : 000007FF34FB35DC                 jz      short loc_7FF34FB3658
//.text : 000007FF34FB35DE                 mov     rax, [rdi + 840h]
//.text : 000007FF34FB35E5                 mov     rbx, [rax]
//.text : 000007FF34FB35E8                 test    rbx, rbx
//.text : 000007FF34FB35EB
//.text : 000007FF34FB35EB loc_7FF34FB35EB : ; CODE XREF : CStadiumSocket::SendData(char *, uint, int, int) + 119j
//.text : 000007FF34FB35EB                 jz      short loc_7FF34FB364F
//...
//.text : 000007FF34FB364F loc_7FF34FB364F : ; CODE XREF : CStadiumSocket::SendData(char *, uint, int, int) : loc_7FF34FB35EBj
//.text : 000007FF34FB364F                 lea     rcx, aCstadiumsoc_18; "CStadiumSocket::SendData: AddTail in se"...
//.text : 000007FF34FB3656                 jmp     short loc_7FF34FB365F
//.text : 000007FF34FB3658; -------------------------------------------------------------------------- -
//.text : 000007FF34FB3658
//.text : 000007FF34FB3658 loc_7FF34FB3658 : ; CODE XREF : CStadiumSocket::SendData(char *, uint, int, int) + E8j
//.text : 000007FF34FB3658                 lea     rcx, aCstadiumsock_9; "CStadiumSocket::SendData: AddTail\n\n"
//.text : 000007FF34FB365F
//.text : 000007FF34FB365F loc_7FF34FB365F : ; CODE XREF : CStadiumSocket::SendData(char *, uint, int, int) + 162j
//.text : 000007FF34FB365F                 call    ?SafeDbgLog@@YAXPEBGZZ ; SafeDbgLog(ushort const *, ...)
    bool bAddTail = (!bIsHandshake || pQueueNode->Prev == NULL);
    if(!bIsHandshake)
    {
        SafeDbgLog("CStadiumSocket::SendData: AddTail\n\n");
    }
    else if(pQueueNode->Prev == NULL)
    {
        SafeDbgLog("CStadiumSocket::SendData: AddTail in search.");
    }
 
//.text : 000007FF34FB3664                 mov     rbx, [rdi + 840h]
//.text : 000007FF34FB366B                 lea     rdx, [rsp + 58h + arg_0]
//.text : 000007FF34FB3670                 mov     r8, [rbx + 8]
//.text : 000007FF34FB3674                 xor     r9d, r9d
//.text : 000007FF34FB3677                 mov     rcx, rbx
//.text : 000007FF34FB367A                 call ? NewNode@ 
//.text : 000007FF34FB367F                 mov     rcx, [rbx + 8]
//.text : 000007FF34FB3683                 test    rcx, rcx
//.text : 000007FF34FB3686                 jz      short loc_7FF34FB368D
//.text : 000007FF34FB3688 loc_7FF34FB3688 : ; CODE XREF : CStadiumSocket::SendData(char *, uint, int, int) + 149j
//.text : 000007FF34FB3688                 mov[rcx], rax
//.text : 000007FF34FB368B                 jmp     short loc_7FF34FB3690
//.text : 000007FF34FB368D; -------------------------------------------------------------------------- -
//.text : 000007FF34FB368D
//.text : 000007FF34FB368D loc_7FF34FB368D : ; CODE XREF : CStadiumSocket::SendData(char *, uint, int, int) + 192j
//.text : 000007FF34FB368D
    if(bAddTail)
    {
        QueueNode *pNewNode = ATL::CAtlList::NewNode(pQueueNode->Top, pQueueNode->Prev, pQueueNode->Next);
        if(pQueueNode->Next == NULL)
        {
            pQueueNode->Next = pNewNode;
        }
        else
        {
            pQueueNode = pNewNode;
        }        
    }
 
//.text : 000007FF34FB3690                 cmp[rsp + 58h + arg_20], 0
//.text : 000007FF34FB3698                 mov[rbx + 8], rax
//.text : 000007FF34FB369C                 mov     ebx, 1
//.text : 000007FF34FB36A1                 jz      short loc_7FF34FB36A9
//.text : 000007FF34FB36A3                 mov[rdi + 0A88h], ebx
//.text : 000007FF34FB36A9
//.text : 000007FF34FB36A9 loc_7FF34FB36A9 : ; CODE XREF : CStadiumSocket::SendData(char *, uint, int, int) + 1ADj
//.text : 000007FF34FB36A9                 mov     rcx, rdi
//.text : 000007FF34FB36AC                 call    ?OnAsyncDataWrite@CStadiumSocket@@AEAAXXZ ; CStadiumSocket::OnAsyncDataWrite(void)
        pQueueNode->Next = pQueueNode;
        m_bPostHandshake = bLastHandshake;
        OnAsyncDataWrite();
    }
 
//.text : 000007FF34FB35EB                 jz      short loc_7FF34FB364F
//.text : 000007FF34FB35ED                 test    rbx, rbx
//.text : 000007FF34FB35F0                 jz      short loc_7FF34FB3644
//.text : 000007FF34FB35F2                 mov     rcx, [rbx + 10h]
//.text : 000007FF34FB35F6                 mov     rax, [rbx]
//.text : 000007FF34FB35F9                 test    rcx, rcx
//.text : 000007FF34FB35FC                 jz      short loc_7FF34FB3607
//.text : 000007FF34FB35FE                 cmp     dword ptr[rcx + 4008h], 0
//.text : 000007FF34FB3605                 jz      short loc_7FF34FB360F
//.text : 000007FF34FB3607
//.text : 000007FF34FB3607 loc_7FF34FB3607 : ; CODE XREF : CStadiumSocket::SendData(char *, uint, int, int) + 108j
//.text : 000007FF34FB3607                 mov     rbx, rax
//.text : 000007FF34FB360A                 test    rax, rax
//.text : 000007FF34FB360D                 jmp     short loc_7FF34FB35EB
//.text : 000007FF34FB360F; -------------------------------------------------------------------------- -
//.text : 000007FF34FB360F
//.text : 000007FF34FB360F loc_7FF34FB360F : ; CODE XREF : CStadiumSocket::SendData(char *, uint, int, int) + 111j
//.text : 000007FF34FB360F                 lea     rcx, aCstadiumsoc_28; "CStadiumSocket::SendData: InsertBefore "...
//.text : 000007FF34FB3616                 call    ?SafeDbgLog@@YAXPEBGZZ ; SafeDbgLog(ushort const *, ...)
    else if(bIsHandshake)
    {
        QueueNode *pNodePtr = pQueueNode;
        while(pNodePtr->Next != NULL)
        {
            pNodePtr = pNodePtr->Next;
            if(pNodePtr->pData[0x4008] == 0)
            {
                break;
            } 
        }
        SafeDbgLog("CStadiumSocket::SendData: InsertBefore in search.");
 
//.text : 000007FF34FB361B                 mov     rsi, [rdi + 840h]
//.text : 000007FF34FB3622                 mov     r8, [rbx + 8]
//.text : 000007FF34FB3626                 lea     rdx, [rsp + 58h + arg_0]
//.text : 000007FF34FB362B                 mov     rcx, rsi
//.text : 000007FF34FB362E                 mov     r9, rbx
//.text : 000007FF34FB3631                 call ? NewNode@ 
//.text : 000007FF34FB3636                 mov     rcx, [rbx + 8]
//.text : 000007FF34FB363A                 test    rcx, rcx
//.text : 000007FF34FB363D                 jnz     short loc_7FF34FB3688
//.text : 000007FF34FB363F                 mov     [rsi], rax
//.text : 000007FF34FB3642                 jmp     short loc_7FF34FB3690
        QueueNode *pNewNode = ATL::CAtlList::NewNode(pQueueNode->Top, pQueueNode->Prev, pQueueNode->Next);
        //Follows same insertion logic, except for ->Prev. Sets handshake flag again.
        OnAsyncDataWrite();
    }
}

The logic looks rather complicated, but the overall picture is that this function is responsible for scheduling outgoing messages and tagging them with their type (handshake or not). It allocates and writes the buffer to send out and inserts it into the message queue, which is read by OnAsyncDataWrite and sent out after adding the encryption layer. Hooking this function allows for filtering messages leaving the client for logging, fuzzing/modification, or other suitable purposes.
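
As a rough sketch, the buffer that SendData builds can be written down as a struct based on the offsets seen in the listing (0x4000, 0x4004, 0x4008, 0x400C). The names OutgoingMessage and BuildMessage, and all of the field names, are hypothetical placeholders rather than symbols from the binary, and the purpose of the two zeroed fields is not known.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical view of the 0x4010-byte allocation made in SendData.
// Only the offsets come from the listing; every name here is a guess.
struct OutgoingMessage
{
    char          payload[0x4000]; // zero-filled, then the message bytes
    std::uint32_t unknown0;        // +0x4000, cleared to 0
    std::uint32_t length;          // +0x4004, length passed by the caller (ebp)
    std::uint32_t isHandshake;     // +0x4008, handshake flag (r12d)
    std::uint32_t unknown1;        // +0x400C, cleared to 0
};

static_assert(offsetof(OutgoingMessage, unknown0) == 0x4000, "offset mismatch");
static_assert(offsetof(OutgoingMessage, length) == 0x4004, "offset mismatch");
static_assert(offsetof(OutgoingMessage, isHandshake) == 0x4008, "offset mismatch");
static_assert(offsetof(OutgoingMessage, unknown1) == 0x400C, "offset mismatch");
static_assert(sizeof(OutgoingMessage) == 0x4010, "size mismatch");

// Mirrors the memset/memmove sequence: at most 0x4000 bytes are copied.
inline void BuildMessage(OutgoingMessage &msg, const char *data,
                         std::uint32_t length, bool isHandshake)
{
    assert(data != nullptr || length == 0);

    std::memset(msg.payload, 0, sizeof(msg.payload));
    const std::size_t copyLength =
        (length < sizeof(msg.payload)) ? length : sizeof(msg.payload);
    if (copyLength > 0)
    {
        std::memcpy(msg.payload, data, copyLength);
    }

    msg.unknown0 = 0;
    msg.length = length;               // the listing stores the unclamped value
    msg.isHandshake = isHandshake ? 1 : 0;
    msg.unknown1 = 0;
}
```

A hook that wants to inspect queued messages could cast the node data pointer to this struct instead of indexing raw offsets, assuming the layout guess holds.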

?DecryptSocketData@CStadiumSocket@@AEAAJXZ

This function is responsible for decrypting socket data after it comes in over the network from the server. When the client sends packets, CStadiumSocket::SendData is called, which in turn calls CStadiumSocket::OnAsyncDataWrite; the reverse happens in the receive case, where CStadiumSocket::OnASyncDataRead calls CStadiumSocket::DecryptSocketData. The inner workings of this function are not necessarily important, and I will omit my x64 -> C conversion notes. The important part is to get a pointer to the buffer that has been decrypted. Doing so allows for monitoring of messages coming from the server and, as in the SendData case, for logging or fuzzing of incoming messages to test client robustness. Doing some runtime tracing of this function, I found a good spot to pull the decrypted data from:

.text:000007FF34FB3D20                 movsxd  rcx, dword ptr [rdi+400Ch]
.text:000007FF34FB3D27                 mov     r8d, [r12]      ; size_t
.text:000007FF34FB3D2B                 mov     rdx, [r12+8]    ; void *
.text:000007FF34FB3D30                 add     rcx, rdi        ; void *
.text:000007FF34FB3D33                 call    memmove

When execution reaches the call to memmove, RDX holds the decrypted buffer and R8 holds its size. Both are volatile registers, so they need to be read before the call runs. This seems like the perfect place to set the hook, at CStadiumSocket::DecryptSocketData + 0x1C3.
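
What the hook does with the buffer is up to us. As a minimal sketch, a logging hook could render the decrypted bytes as a hex dump. The FormatHexDump helper below is hypothetical (it is not part of the attached source) and only assumes it is handed the pointer and size taken from RDX and R8.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: renders a buffer as rows of 16 hex bytes followed by
// their printable ASCII form. Non-printable bytes are shown as '.'.
std::string FormatHexDump(const char *buffer, unsigned int length)
{
    assert(buffer != nullptr || length == 0);

    std::string out;
    char cell[8];
    for (unsigned int row = 0; row < length; row += 16)
    {
        std::string ascii;
        for (unsigned int col = 0; col < 16; ++col)
        {
            if (row + col < length)
            {
                const unsigned char byte = static_cast<unsigned char>(buffer[row + col]);
                std::snprintf(cell, sizeof(cell), "%02X ", static_cast<unsigned int>(byte));
                out += cell;
                ascii += (byte >= 0x20 && byte < 0x7F) ? static_cast<char>(byte) : '.';
            }
            else
            {
                out += "   "; // pad a short final row so the ASCII column lines up
            }
        }
        out += ascii;
        out += '\n';
    }
    return out;
}
```

The returned string can be printed or written to a log file from inside the hook.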

CStadiumSocket::Disconnect

The last function to look at is the one that disconnects the socket (DisconnectFnc in the handler further down). What happens here is also not necessarily important for our needs; looking through the assembly, it sends out a “goodbye” message, which the application internally refers to as a SEC_HANDSHAKE, and shuts down send operations on the socket. Messages are still received and written out to the debug log (if debug logging is enabled), and the socket is fully shut down and cleaned up once nothing is left to receive. This function is only hooked if we plan on doing something across multiple games in the same program instance, e.g. we resign a game and start a new one without restarting the application. Seeing this function called lets us know that the CStadiumSocket instance captured by CStadiumSocket::BeginConnect is no longer valid for use.

Wrapping Up

Having all of this done and analyzed, changing the vectored exception handler to hook these functions (or in the middle of a function in the case of CStadiumSocket::DecryptSocketData) is just as simple as it was in the last post:

LONG CALLBACK VectoredHandler(EXCEPTION_POINTERS *pExceptionInfo)
{
    if(pExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_GUARD_PAGE_VIOLATION)
    {        
        pExceptionInfo->ContextRecord->EFlags |= 0x100;
 
        DWORD_PTR dwExceptionAddress = (DWORD_PTR)pExceptionInfo->ExceptionRecord->ExceptionAddress;
        CONTEXT *pContext = pExceptionInfo->ContextRecord;
 
        if(dwExceptionAddress == (DWORD_PTR)BeginConnectFnc)
        {
            pThisPtr = (void *)pContext->Rcx;
            printf("Starting connection. CStadiumSocket instance is at: %016llX\n", (unsigned long long)pThisPtr);
        }
        else if(dwExceptionAddress == (DWORD_PTR)SendDataFnc)
        {
            DWORD_PTR *pdwParametersBase = (DWORD_PTR *)(pContext->Rsp + 0x28);
            SendDataHook((void *)pContext->Rcx, (char *)pContext->Rdx, (unsigned int)pContext->R8, (int)pContext->R9, (int)(*(pdwParametersBase)));
        }
        else if(dwExceptionAddress == (DWORD_PTR)DecryptSocketDataFnc + 0x1C3)
        {
            DecryptSocketDataHook((char *)pContext->Rdx, (unsigned int)pContext->R8);
        }
        else if(dwExceptionAddress == (DWORD_PTR)DisconnectFnc)
        {
            printf("Closing connection. CStadiumSocket instance is being set to NULL\n");
            pThisPtr = NULL;
        }
 
        return EXCEPTION_CONTINUE_EXECUTION;
    }
 
    if(pExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_SINGLE_STEP)
    {
        (void)SetBreakpoints();
        return EXCEPTION_CONTINUE_EXECUTION;
    }
    return EXCEPTION_CONTINUE_SEARCH;
}
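
The Rsp + 0x28 in the SendData branch follows from the Windows x64 calling convention: at the first instruction of a function, [RSP] holds the return address, the next 0x20 bytes are the shadow space for the four register parameters, and the fifth parameter sits right after that. A small sketch of that arithmetic, using hypothetical StackArgumentAddress and ReadStackArgument helpers that are not part of the attached source:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: address of the stack slot for the Nth (1-based)
// integer parameter, given RSP at function entry, under the Windows x64 ABI.
// Slots 1-4 are shadow space; those values arrive in RCX, RDX, R8 and R9.
inline std::uintptr_t StackArgumentAddress(std::uintptr_t rspAtEntry, unsigned int index)
{
    assert(index >= 1);
    // +8 skips the return address; each slot is 8 bytes wide.
    return rspAtEntry + 8 + (static_cast<std::uintptr_t>(index) - 1) * 8;
}

// Reads the 64-bit value stored in that slot.
inline std::uint64_t ReadStackArgument(std::uintptr_t rspAtEntry, unsigned int index)
{
    return *reinterpret_cast<const std::uint64_t *>(StackArgumentAddress(rspAtEntry, index));
}
```

For index 5 this gives rspAtEntry + 0x28, which matches the offset used in the handler. The calculation is only valid at the very first instruction of the function, before the prologue adjusts RSP.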

To have some fun, the injected DLL can create a dialog box for chat input and send it over to the server. The game server expects a numeric value corresponding to the allowed chat in the scrollbox, but does not do any checking on it. This allows any arbitrary message to be sent to the server, and the player on the other side will see it. The only caveat is that space (0x20) characters must be converted to %20. The code is as follows:

INT_PTR CALLBACK DialogProc(HWND hwndDlg, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
    switch(uMsg)
    {
    case WM_COMMAND:
        switch(LOWORD(wParam))
        {
            case ID_SEND:
            {
                //Possible condition here where Disconnect is called while custom chat message is being sent.
                if(pThisPtr != NULL)
                {
                    char strSendBuffer[512] = { 0 };
                    char strBuffer[256] = { 0 };
                    GetDlgItemTextA(hwndDlg, IDC_CHATTEXT, strBuffer, sizeof(strBuffer) - 1);
 
                    //Extremely unsafe example code: each space grows the string by two
                    //characters in place and can overflow strBuffer, careful...
                    for (unsigned int i = 0; i < strlen(strBuffer); ++i)
                    {
                        if (strBuffer[i] == ' ')
                        {
                            memmove(&strBuffer[i + 3], &strBuffer[i + 1], strlen(&strBuffer[i]));
                            strBuffer[i] = '%';
                            strBuffer[i + 1] = '2';
                            strBuffer[i + 2] = '0';
                        }
                    }
 
                    _snprintf(strSendBuffer, sizeof(strSendBuffer) - 1,
                        "CALL Chat sChatText=%s&sFontFace=MS%%20Shell%%20Dlg&arfFontFlags=0&eFontColor=12345&eFontCharSet=1\r\n",
                        strBuffer);
 
                    SendDataFnc(pThisPtr, strSendBuffer, (unsigned int)strlen(strSendBuffer), 0, 1);
                }
            }
                break;
        }
    default:
        return FALSE;
    }
    return TRUE;
}
 
DWORD WINAPI DlgThread(LPVOID hModule)
{
    return (DWORD)DialogBox((HINSTANCE)hModule, MAKEINTRESOURCE(DLG_MAIN), NULL, DialogProc);
}
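
The in-place replacement loop above can run past the end of strBuffer when the text contains many spaces. A safer variation, sketched here with a hypothetical EncodeSpaces helper that is not part of the attached source, builds the encoded text in a separate std::string:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns a copy of the input with every space (0x20)
// replaced by "%20". All other characters are passed through untouched,
// since spaces are the only caveat described above.
std::string EncodeSpaces(const char *text)
{
    assert(text != nullptr);

    std::string encoded;
    for (const char *p = text; *p != '\0'; ++p)
    {
        if (*p == ' ')
        {
            encoded += "%20";
        }
        else
        {
            encoded += *p;
        }
    }
    return encoded;
}
```

The result of EncodeSpaces(strBuffer).c_str() could then be passed to _snprintf in place of strBuffer, with the caller still responsible for checking that the final message fits in strSendBuffer.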

Here is an example of it at work:
[Image: customchat]

Additional Final Words

Some other fun things to mess with:

  • Logging can be enabled by patching out
.text:000007FF34FAB6FA                 cmp     cs:?m_loggingEnabled@@3_NA, 0 ; bool m_loggingEnabled
.text:000007FF34FAB701                 jz      short loc_7FF34FAB77E

and creating a “LoggingEnabled” expandable string (REG_EXPAND_SZ) registry value under HKEY_CURRENT_USER\Software\Microsoft\zone.com. The logs provide tons of debug output about the internal state changes of the application, e.g.

[Time: 05-01-2014 21:48:59.253]
CStadiumProxyBase::SetInternalState:
    OLD STATE:    0 (IST_NOT_CONNECTED)
    NEW STATE:    2 (IST_JOIN_PENDING)
    NEW STATUS:   1 (STADIUM_CONNECTION_CONNECTING)
    LIGHT STATUS: 0 (STADIUM_CONNECTION_NOT_CONNECTED)
    m_pFullState:    0x00000000
  • The values in the ZS_PublicELO and ZS_PrivateELO tags can be modified to be much higher values. If you do this on two clients you are guaranteed a match against yourself, unless someone else is also doing this.
  • The games have some cases where they do not perform full sanitization of game state, so making impossible moves is sometimes allowed.

The full source code relating to this can be found here.