An Experiment In Performing Remote Calls on x64

Recently I was trying to do something more than just execute code in the context of a remote process: I wanted to call a function remotely, supply its arguments, and have the program continue execution afterwards. This post presents what I quickly came up with to achieve that. There certainly are edge cases (discussed at the end) where the code will run into issues, but the general logic is:

  • Suspend all threads in the target process. This is achieved in the code with a call to the NtSuspendProcess native API.
  • Allocate space in the process that will contain the x64 assembly code which will set up the parameters and stack to perform the call.
  • Save all registers that will be used in performing the call. The example code does not save flags, but a full implementation will want to do that as well.
  • Write in the parameters following the Windows x64 ABI: the first four go in RCX, RDX, R8, and R9 respectively, and the rest go on the stack. The caller has to know and supply the stack offset of the remaining parameters.
  • Set up the trampoline to perform the call.
  • Resume the process via NtResumeProcess and let the call happen.
  • Save the result of the call and continue execution.
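
To make the parameter-passing step concrete, here is a minimal sketch of where each integer or pointer argument lives at the moment of the call under the Windows x64 convention. The helper name ArgumentLocation is hypothetical and not part of the example code; it only encodes the rule that the first four arguments go in registers and the rest sit above the 32-byte shadow space.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: describes where integer/pointer argument `index`
// (0-based) is placed by the caller under the Windows x64 convention,
// measured just before the call instruction executes.
std::string ArgumentLocation(std::size_t index)
{
    static const char *const kRegisters[] = { "RCX", "RDX", "R8", "R9" };
    if (index < 4)
    {
        return kRegisters[index];
    }

    // The caller reserves 0x20 bytes of shadow space for the four register
    // arguments, so the fifth argument lands at [RSP+0x20].
    const std::size_t offset = 0x20 + (index - 4) * 8;
    char buffer[32];
    std::snprintf(buffer, sizeof(buffer), "[RSP+0x%zX]", offset);
    return buffer;
}
```

For a ten-argument function such as CreateProcessA, this puts arguments five through ten at [RSP+0x20] through [RSP+0x48], which is why 0x20 is the natural stack displacement to supply.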

With that in mind, I present the example code. The code contained within this post has had the error handling taken out of it in order to save space, unlike the code in the attached zip archive at the bottom. The program will take in a process id as a decimal value and perform a remote call on it. The outline looks as follows:

#define DEFAULT_PROCESS_RIGHTS \
    (PROCESS_CREATE_THREAD | PROCESS_DUP_HANDLE | PROCESS_QUERY_INFORMATION | PROCESS_SUSPEND_RESUME \
    | PROCESS_TERMINATE | PROCESS_VM_OPERATION | PROCESS_VM_READ | PROCESS_VM_WRITE)
 
int main(int argc, char *argv[])
{
    if (argc != 2)
    {
        printf("Usage: %s ProcessId", argv[0]);
        return -1;
    }
 
    DWORD dwProcessId = strtoul(argv[1], nullptr, 10);
 
    HANDLE hProcess = OpenProcess(DEFAULT_PROCESS_RIGHTS, FALSE, dwProcessId);
 
    (void)GetNativeFunctions();
 
    (void)PerformRemoteMessageBoxCall(hProcess, dwProcessId);
    //(void)PerformRemoteCreateProcessACall(hProcess, dwProcessId);
 
    (void)CloseHandle(hProcess);
 
    return 0;
}

GetNativeFunctions retrieves pointers to NtSuspendProcess and NtResumeProcess. These native APIs save the work of manually traversing the thread list and suspending or resuming each thread as needed.

const bool GetNativeFunctions(void)
{
    HMODULE hModule = GetModuleHandle(L"ntdll.dll");
 
    NtSuspendProcessFnc = (pNtSuspendProcess)GetProcAddress(hModule, "NtSuspendProcess");
    NtResumeProcessFnc = (pNtResumeProcess)GetProcAddress(hModule, "NtResumeProcess");
 
    return (NtSuspendProcessFnc != nullptr) && (NtResumeProcessFnc != nullptr);
}

Before presenting the function that is responsible for setting up and performing the remote call, there are a few helper functions that need to be mentioned. The way that the call will be performed is by redirecting the instruction pointer (RIP in the case of x64) to the memory region that was allocated and has had the remote call code written into it. I chose the main thread to do this, which required writing a helper function to retrieve the main thread of a process. Since there is no marker for which thread is the main thread in a process, I chose to go by thread creation time and assume that the earliest created thread is the main thread. The list of threads is retrieved through a Toolhelp snapshot and this takes place while the process is suspended, so no threads will be created or die while this snapshot is taken and the earliest thread is found. The code for this is below:

#define DEFAULT_THREAD_RIGHTS \
    (THREAD_GET_CONTEXT | THREAD_SET_CONTEXT \
    | THREAD_QUERY_INFORMATION | THREAD_SET_INFORMATION \
    | THREAD_SUSPEND_RESUME | THREAD_TERMINATE)
 
const DWORD GetMainThreadId(const DWORD dwProcessId)
{
    HANDLE hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, dwProcessId);
 
    THREADENTRY32 threadEntry = { 0 };
    threadEntry.dwSize = sizeof(THREADENTRY32);
    (void)Thread32First(hSnapshot, &threadEntry);
 
    std::vector<DWORD> vecThreads;
    do
    {
        if (threadEntry.th32OwnerProcessID == dwProcessId)
        {
            vecThreads.push_back(threadEntry.th32ThreadID);
        }
    } while (Thread32Next(hSnapshot, &threadEntry));
 
    std::sort(vecThreads.begin(), vecThreads.end(),
        [](const DWORD dwFirstThreadId, const DWORD dwSecondThreadId)
        {
            FILETIME ftCreationTimeFirst = { 0 };
            FILETIME ftCreationTimeSecond = { 0 };
            FILETIME ftUnused = { 0 };
 
            //Assuming these calls will succeed.
            HANDLE hThreadFirst = OpenThread(DEFAULT_THREAD_RIGHTS, FALSE, dwFirstThreadId);
            HANDLE hThreadSecond = OpenThread(DEFAULT_THREAD_RIGHTS, FALSE, dwSecondThreadId);
 
            (void)GetThreadTimes(hThreadFirst, &ftCreationTimeFirst, &ftUnused, &ftUnused, &ftUnused);
            (void)GetThreadTimes(hThreadSecond, &ftCreationTimeSecond, &ftUnused, &ftUnused, &ftUnused);
 
            (void)CloseHandle(hThreadFirst);
            (void)CloseHandle(hThreadSecond);
 
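            // CompareFileTime returns -1 when the first time is earlier, so this
            // orders threads from earliest to latest creation time.
            LONG lResult = CompareFileTime(&ftCreationTimeFirst, &ftCreationTimeSecond);
            return lResult < 0;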
        });
 
    (void)CloseHandle(hSnapshot);
 
    return vecThreads.front();
}
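
Finding the earliest-created thread does not require a full sort; a single linear scan is enough. The sketch below shows that idea on plain data, assuming each thread's creation time has already been read once and flattened into a 64-bit value. The names ThreadCreationInfo, ToTimestamp, and FindEarliestThreadId are hypothetical and do not appear in the example code.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical record: a thread id paired with its creation time, e.g. a
// FILETIME flattened to a 64-bit count of 100-nanosecond intervals.
struct ThreadCreationInfo
{
    std::uint32_t threadId;
    std::uint64_t creationTime;
};

// Combines the two 32-bit halves of a FILETIME-style value.
std::uint64_t ToTimestamp(std::uint32_t lowPart, std::uint32_t highPart)
{
    return (static_cast<std::uint64_t>(highPart) << 32) | lowPart;
}

// Returns the id of the earliest-created thread, or 0 if the list is empty.
std::uint32_t FindEarliestThreadId(const std::vector<ThreadCreationInfo> &threads)
{
    const auto earliest = std::min_element(threads.begin(), threads.end(),
        [](const ThreadCreationInfo &first, const ThreadCreationInfo &second)
        {
            return first.creationTime < second.creationTime;
        });

    return (earliest == threads.end()) ? 0 : earliest->threadId;
}
```

Collecting the creation times up front also means each thread handle is opened once, rather than every time the comparator runs.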

The next two helper functions are for retrieving the context of a thread, in this case the main thread, and for changing the instruction pointer. They are straightforward and shown here only for completeness.

const CONTEXT GetContext(const DWORD dwThreadId)
{
    CONTEXT ctx = { 0 };
 
    HANDLE hThread = OpenThread(DEFAULT_THREAD_RIGHTS, FALSE, dwThreadId);
 
    ctx.ContextFlags = CONTEXT_ALL;
    (void)GetThreadContext(hThread, &ctx);
 
    (void)CloseHandle(hThread);
 
    return ctx;
}
 
const bool SetInstructionPointer(const DWORD dwThreadId, const DWORD_PTR dwAddress, CONTEXT *pContext)
{
    pContext->Rip = dwAddress;
 
    HANDLE hThread = OpenThread(DEFAULT_THREAD_RIGHTS, FALSE, dwThreadId);
 
    const BOOL bSuccess = SetThreadContext(hThread, pContext);
 
    (void)CloseHandle(hThread);
 
    return bSuccess != FALSE;
}

With all of these presented, the main PerformRemoteCall function can now be shown:

const bool PerformRemoteCall(const HANDLE hProcess, const DWORD dwProcessId, const DWORD_PTR dwAddress, const DWORD_PTR *pArguments,
    const ULONG ulArgumentCount, DWORD_PTR *dwOutReturnVirtualAddress = nullptr, const DWORD dwX64StackDisplacement = 0)
{
    NTSTATUS status = NtSuspendProcessFnc(hProcess);
    if (!NT_SUCCESS(status))
    {
        printf("Could not suspend process. Status = %lX", status);
        return false;
    }
 
    LPVOID lpFunctionBase = VirtualAllocEx(hProcess, nullptr, PAGE_SIZE, MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    if (lpFunctionBase == nullptr)
    {
        printf("Could not allocate memory for function call in process. Last error = %X", GetLastError());
        //Do not leave the target suspended on failure.
        (void)NtResumeProcessFnc(hProcess);
        return false;
    }
 
    DWORD dwMainThreadId = GetMainThreadId(dwProcessId);
    CONTEXT ctx = GetContext(dwMainThreadId);
 
    size_t argumentsBaseIndex = 10;
    unsigned char remoteCallEntryBase[256] =
    {
        0x40, 0x57,                                                 /*push rdi*/
        0x48, 0x83, 0xEC, 0x40,                                     /*sub rsp, 0x40*/
        0x48, 0x8B, 0xFC,                                           /*mov rdi, rsp*/
        0x50,                                                       /*push rax*/
        0x51,                                                       /*push rcx*/
        0x52,                                                       /*push rdx*/
        0x41, 0x50,                                                 /*push r8*/
        0x41, 0x51,                                                 /*push r9*/
    };
    unsigned char remoteCallArgBase1stArg[] =
    {
        0x48, 0xB9, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, /*mov rcx, 0xAAAAAAAAAAAAAAAA*/
    };
    unsigned char remoteCallArgBase2ndArg[] =
    {
        0x48, 0xBA, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, /*mov rdx, 0xBBBBBBBBBBBBBBBB*/
    };
    unsigned char remoteCallArgBase3rdArg[] =
    {
        0x49, 0xB8, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, /*mov r8, 0xCCCCCCCCCCCCCCCC*/
    };
    unsigned char remoteCallArgBase4thArg[] =
    {
        0x49, 0xB9, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, /*mov r9, 0xDDDDDDDDDDDDDDDD*/
    };
    unsigned char remoteCallArgBaseStack[] =
    {
        0x48, 0xB8, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, /*mov rax, 0xBBBBBBBBBBBBBBBB*/
        0x48, 0x89, 0x44, 0x24, 0xFF                                /*mov qword ptr [rsp+0xFF], rax*/
    };
    unsigned char remoteCallExitBase[] =
    {
        0x48, 0xB8, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, /*mov rax, 0xBBBBBBBBBBBBBBBB*/
        0xFF, 0xD0,                                                 /*call rax*/
        0x53,                                                       /*push rbx*/
        0x48, 0xBB, 0xDD, 0xCC, 0xBB, 0xAA, 0xDD, 0xCC, 0xBB, 0xAA, /*mov rbx, 0xAABBCCDDAABBCCDD*/
        0x48, 0x81, 0xC3, 0x00, 0x04, 0x00, 0x00,                   /*add rbx, 0x400*/
        0x48, 0x89, 0x03,                                           /*mov [rbx], rax*/
        0x5B,                                                       /*pop rbx*/
        0x48, 0x83, 0xC4, 0x40,                                     /*add rsp, 0x40*/
        0x41, 0x59,                                                 /*pop r9*/
        0x41, 0x58,                                                 /*pop r8*/
        0x5A,                                                       /*pop rdx*/
        0x59,                                                       /*pop rcx*/
        0x58,                                                       /*pop rax*/
        0x5F,                                                       /*pop rdi*/
        0x68, 0xCC, 0xCC, 0xCC, 0xCC,                               /*push 0xCCCCCCCC*/
        0xC7, 0x44, 0x24, 0x04, 0xDD, 0xDD, 0xDD, 0xDD,             /*mov [rsp+4], 0xDDDDDDDD*/
        0xC3                                                        /*ret*/
    };
    unsigned char *remoteCallRegisterArguments[] =
    {
        remoteCallArgBase1stArg, remoteCallArgBase2ndArg, remoteCallArgBase3rdArg,
        remoteCallArgBase4thArg
    };
    size_t remoteCallRegisterArgumentsSize[] =
    {
        sizeof(remoteCallArgBase1stArg), sizeof(remoteCallArgBase2ndArg),
        sizeof(remoteCallArgBase3rdArg), sizeof(remoteCallArgBase4thArg)
    };
 
    DWORD_PTR dwOriginalAddress = ctx.Rip;
    DWORD_PTR dwAllocationBaseAddress = (DWORD_PTR)lpFunctionBase;
    DWORD dwLowAddress = dwOriginalAddress & 0xFFFFFFFF;
    DWORD dwHighAddress = (dwOriginalAddress >> 32) & 0xFFFFFFFF;
 
    memset(&remoteCallEntryBase[argumentsBaseIndex], 0x90, sizeof(remoteCallEntryBase)-argumentsBaseIndex);
 
    memcpy(&remoteCallExitBase[2], &dwAddress, sizeof(DWORD_PTR));
    memcpy(&remoteCallExitBase[15], &dwAllocationBaseAddress, sizeof(DWORD_PTR));
    memcpy(&remoteCallExitBase[47], &dwLowAddress, sizeof(DWORD));
    memcpy(&remoteCallExitBase[55], &dwHighAddress, sizeof(DWORD));
 
    memcpy(&remoteCallEntryBase[sizeof(remoteCallEntryBase)-sizeof(remoteCallExitBase)],
        remoteCallExitBase, sizeof(remoteCallExitBase));
 
    if (ulArgumentCount >= 1)
    {
        memcpy(&remoteCallArgBase1stArg[2], &pArguments[0], sizeof(DWORD_PTR));
    }
    if (ulArgumentCount >= 2)
    {
        memcpy(&remoteCallArgBase2ndArg[2], &pArguments[1], sizeof(DWORD_PTR));
    }
    if (ulArgumentCount >= 3)
    {
        memcpy(&remoteCallArgBase3rdArg[2], &pArguments[2], sizeof(DWORD_PTR));
    }
    if (ulArgumentCount >= 4)
    {
        memcpy(&remoteCallArgBase4thArg[2], &pArguments[3], sizeof(DWORD_PTR));
    }
    for (unsigned long i = 0; i < min(4, ulArgumentCount); ++i)
    {
        memcpy(&remoteCallEntryBase[argumentsBaseIndex], remoteCallRegisterArguments[i], remoteCallRegisterArgumentsSize[i]);
        argumentsBaseIndex += remoteCallRegisterArgumentsSize[i];
    }
 
    unsigned char ucBaseDisplacement = dwX64StackDisplacement & 0xFF;
    for (unsigned long i = 4; i < ulArgumentCount; ++i)
    {
        memcpy(&remoteCallArgBaseStack[2], &pArguments[i], sizeof(DWORD_PTR));
        memcpy(&remoteCallArgBaseStack[14], &ucBaseDisplacement, sizeof(unsigned char));
        memcpy(&remoteCallEntryBase[argumentsBaseIndex], remoteCallArgBaseStack, sizeof(remoteCallArgBaseStack));
        argumentsBaseIndex += sizeof(remoteCallArgBaseStack);
        ucBaseDisplacement += sizeof(DWORD_PTR);
    }
 
    SIZE_T bytesWritten = 0;
    (void)WriteProcessMemory(hProcess, lpFunctionBase, remoteCallEntryBase, sizeof(remoteCallEntryBase), &bytesWritten);
    if (bytesWritten != sizeof(remoteCallEntryBase))
    {
        printf("Could not write remote function code into process. Last error = %X", GetLastError());
        //Do not leave the target suspended on failure.
        (void)NtResumeProcessFnc(hProcess);
        return false;
    }
 
    if (!SetInstructionPointer(dwMainThreadId, (DWORD_PTR)lpFunctionBase, &ctx))
    {
        (void)NtResumeProcessFnc(hProcess);
        return false;
    }
 
    if (dwOutReturnVirtualAddress != nullptr)
    {
        *dwOutReturnVirtualAddress = (DWORD_PTR)lpFunctionBase + 0x400;
    }
 
    status = NtResumeProcessFnc(hProcess);
 
    if (!NT_SUCCESS(status))
    {
        printf("Could not resume process. Status = %lX", status);
        return false;
    }
 
    return true;
}
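
The byte templates in the function are patched by copying values into fixed offsets: the 64-bit immediate of each mov starts at byte 2, and the stack store carries a one-byte displacement as its last byte. The sketch below shows the same patching on plain arrays. PatchMovImm64 and PatchStackStore are hypothetical names, not part of the example code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Byte layout of "mov r64, imm64": a REX prefix, an opcode that selects the
// register, and eight immediate bytes in little-endian order.
using MovImm64 = std::array<std::uint8_t, 10>;

// Hypothetical helper: builds e.g. 48 B9 <imm64> (mov rcx, imm64) or
// 49 B8 <imm64> (mov r8, imm64) from the prefix, opcode, and value.
MovImm64 PatchMovImm64(std::uint8_t rexPrefix, std::uint8_t opcode, std::uint64_t value)
{
    MovImm64 instruction{};
    instruction[0] = rexPrefix;
    instruction[1] = opcode;
    for (std::size_t i = 0; i < 8; ++i)
    {
        // Emit the least significant byte first.
        instruction[2 + i] = static_cast<std::uint8_t>(value >> (8 * i));
    }
    return instruction;
}

// Byte layout of "mov qword ptr [rsp+disp8], rax".
using StoreRaxToStack = std::array<std::uint8_t, 5>;

// Hypothetical helper: the displacement is a signed 8-bit field, so only
// offsets 0x00 through 0x7F address memory above RSP.
StoreRaxToStack PatchStackStore(std::uint8_t displacement)
{
    StoreRaxToStack instruction = {{ 0x48, 0x89, 0x44, 0x24, displacement }};
    return instruction;
}
```

Because the displacement is a signed byte, a displacement that grows past 0x7F would be decoded as a negative offset, which limits how many stack arguments this encoding can place.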

The function is rather involved, but works as follows:

  • The process is suspended. Memory is then allocated inside of it which will hold the function that will be generated at run-time to call the target function.
  • The thread context is retrieved in order to modify the instruction pointer later.
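  • A local stack frame is set up and registers are saved. The template pushes RAX, RCX, RDX, R8, and R9: the latter four because they carry parameters, and RAX because it will hold the address of the function to call remotely. Note that argumentsBaseIndex starts at 10, immediately after push rax, so the NOP fill and the argument-loading instructions overwrite the four parameter-register pushes, as the generated listings further down show. The epilogue still pops all five registers, so the pushes and pops do not balance as written and a full implementation needs to reconcile them.
  • The values of the first four parameters are moved into their corresponding registers (first = RCX, second = RDX, third = R8, fourth = R9).
  • Additional values are stored on the stack, one store per argument, at offsets starting from the stack displacement that was passed in. Each store has the following form, where 0xFF is replaced by the displacement: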
mov rax, 0xBBBBBBBBBBBBBBBB
mov qword ptr [rsp+0xFF], rax
  • At the exit point of this local function stack frame, the target address is moved into the RAX register and called. Its return value is then moved into [RBX], which is the memory location that will store the result of the function call. In the example code, RBX is set to the base address of the allocated memory + 0x400 bytes.
  • The function epilogue runs: the stack is fixed up and the saved registers are restored.
  • A trampoline is set up to return execution to where it was prior to all of this happening.
  • All of this gets written into the process and the instruction pointer is set to the start of this region.
  • The process is resumed and the call is allowed to happen.
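
The return trampoline deserves a closer look. x64 has no push with a 64-bit immediate, so the original instruction pointer is split in two: push imm32 writes the low half (sign-extended to eight bytes), a 32-bit mov to [rsp+4] overwrites the upper half, and ret jumps to the assembled address. Here is a minimal sketch of that encoding; BuildReturnTrampoline and TrampolineTarget are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// push imm32 ; mov dword ptr [rsp+4], imm32 ; ret
using ReturnTrampoline = std::array<std::uint8_t, 14>;

// Hypothetical helper: encodes a jump back to `returnAddress`.
ReturnTrampoline BuildReturnTrampoline(std::uint64_t returnAddress)
{
    const std::uint32_t low = static_cast<std::uint32_t>(returnAddress & 0xFFFFFFFFu);
    const std::uint32_t high = static_cast<std::uint32_t>(returnAddress >> 32);

    ReturnTrampoline code = {{
        0x68, 0x00, 0x00, 0x00, 0x00,                   // push low
        0xC7, 0x44, 0x24, 0x04, 0x00, 0x00, 0x00, 0x00, // mov dword ptr [rsp+4], high
        0xC3                                            // ret
    }};

    for (std::size_t i = 0; i < 4; ++i)
    {
        code[1 + i] = static_cast<std::uint8_t>(low >> (8 * i));
        code[9 + i] = static_cast<std::uint8_t>(high >> (8 * i));
    }
    return code;
}

// Recovers the address a trampoline returns to, which is handy when
// reading a disassembly listing of the generated code.
std::uint64_t TrampolineTarget(const ReturnTrampoline &code)
{
    std::uint64_t target = 0;
    for (std::size_t i = 0; i < 4; ++i)
    {
        target |= static_cast<std::uint64_t>(code[1 + i]) << (8 * i);
        target |= static_cast<std::uint64_t>(code[9 + i]) << (32 + 8 * i);
    }
    return target;
}
```

The explicit write of the upper half matters even when it is zero: if the low half has its top bit set, the push sign-extends it and fills the upper four bytes with 0xFF until the mov replaces them.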

It is simple to set up wrappers around this function and begin performing remote calls. Here are examples for MessageBoxA and CreateProcessA:

const bool PerformRemoteMessageBoxCall(const HANDLE hProcess, const DWORD dwProcessId)
{
    //Resolved in the calling process; this relies on user32.dll being loaded
    //here and mapped at the same base address in the target.
    HMODULE hUser32Dll = GetModuleHandle(L"user32.dll");
 
    const DWORD_PTR dwMessageBox = (DWORD_PTR)GetProcAddress(hUser32Dll, "MessageBoxA");
    const char strCaption[] = "Remote Title";
    const char strText[] = "Caption for remote MessageBoxA call";
 
    LPVOID lpMemory = VirtualAllocEx(hProcess, nullptr, PAGE_SIZE, MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
 
    SIZE_T bytesWritten = 0;
    (void)WriteProcessMemory(hProcess, lpMemory, strCaption, sizeof(strCaption), &bytesWritten);
 
    //The message text is stored directly after the caption.
    DWORD_PTR dwTextAddress = (DWORD_PTR)lpMemory + bytesWritten;
    (void)WriteProcessMemory(hProcess, (LPVOID)dwTextAddress, strText, sizeof(strText), &bytesWritten);
 
    DWORD_PTR dwArguments[] =
    {
        NULL,                   //hWnd
        dwTextAddress,          //lpText
        (DWORD_PTR)lpMemory,    //lpCaption
        MB_ICONEXCLAMATION      //uType
    };
 
    return PerformRemoteCall(hProcess, dwProcessId, dwMessageBox, &dwArguments[0], 4);
}
 
const bool PerformRemoteCreateProcessACall(const HANDLE hProcess, const DWORD dwProcessId)
{
    HMODULE hKernel32Dll = GetModuleHandle(L"kernel32.dll");
 
    const DWORD_PTR dwCreateProcessA = (DWORD_PTR)GetProcAddress(hKernel32Dll, "CreateProcessA");
    const char strProcessPath[] = "C://Windows//system32//notepad.exe";
 
    LPVOID lpMemory = VirtualAllocEx(hProcess, nullptr, PAGE_SIZE, MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
 
    SIZE_T bytesWritten = 0;
    (void)WriteProcessMemory(hProcess, lpMemory, strProcessPath, sizeof(strProcessPath), &bytesWritten);
 
    //CreateProcessA takes the ANSI structure, so use STARTUPINFOA explicitly.
    STARTUPINFOA startupInfo = { 0 };
    startupInfo.cb = sizeof(STARTUPINFOA);
    DWORD_PTR dwStartupStructAddress = (DWORD_PTR)lpMemory + bytesWritten;
    (void)WriteProcessMemory(hProcess, (LPVOID)dwStartupStructAddress, &startupInfo, sizeof(STARTUPINFOA), &bytesWritten);
 
    DWORD_PTR dwArguments[] =
    {
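        (DWORD_PTR)lpMemory,                    //lpApplicationName
        NULL,                                   //lpCommandLine
        NULL,                                   //lpProcessAttributes
        NULL,                                   //lpThreadAttributes
        0,                                      //bInheritHandles
        0,                                      //dwCreationFlags
        NULL,                                   //lpEnvironment
        NULL,                                   //lpCurrentDirectory
        dwStartupStructAddress,                 //lpStartupInfo
        dwStartupStructAddress + bytesWritten   //lpProcessInformation, placed after the startup struct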
    };
 
    return PerformRemoteCall(hProcess, dwProcessId, dwCreateProcessA, &dwArguments[0],
        sizeof(dwArguments) / sizeof(dwArguments[0]), nullptr, 0x20);
}

Here is what the generated assembly code looks like at run time for these functions.

MessageBoxA

00000000001C0000 40 57                push        rdi  
00000000001C0002 48 83 EC 40          sub         rsp,40h  
00000000001C0006 48 8B FC             mov         rdi,rsp  
00000000001C0009 50                   push        rax  
00000000001C000A 48 B9 00 00 00 00 00 00 00 00 mov         rcx,0  
00000000001C0014 48 BA 0D 00 1B 00 00 00 00 00 mov         rdx,1B000Dh  
00000000001C001E 49 B8 00 00 1B 00 00 00 00 00 mov         r8,1B0000h  
00000000001C0028 49 B9 30 00 00 00 00 00 00 00 mov         r9,30h  
00000000001C0032 90                   nop  
00000000001C0033 90                   nop
... tons more NOPs ...
00000000001C00C4 48 B8 38 31 DE 56 F8 7F 00 00 mov         rax,7FF856DE3138h  
00000000001C00CE FF D0                call        rax  
00000000001C00D0 53                   push        rbx  
00000000001C00D1 48 BB 00 00 1C 00 00 00 00 00 mov         rbx,1C0000h  
00000000001C00DB 48 81 C3 00 04 00 00 add         rbx,400h  
00000000001C00E2 48 89 03             mov         qword ptr [rbx],rax  
00000000001C00E5 5B                   pop         rbx  
00000000001C00E6 48 83 C4 40          add         rsp,40h  
00000000001C00EA 41 59                pop         r9  
00000000001C00EC 41 58                pop         r8  
00000000001C00EE 5A                   pop         rdx  
00000000001C00EF 59                   pop         rcx  
00000000001C00F0 58                   pop         rax  
00000000001C00F1 5F                   pop         rdi  
00000000001C00F2 68 AD 39 00 40       push        400039ADh  
00000000001C00F7 C7 44 24 04 01 00 00 00 mov         dword ptr [rsp+4],1  
00000000001C00FF C3                   ret

CreateProcessA

00000000004F0000 40 57                push        rdi  
00000000004F0002 48 83 EC 40          sub         rsp,40h  
00000000004F0006 48 8B FC             mov         rdi,rsp  
00000000004F0009 50                   push        rax  
00000000004F000A 48 B9 00 00 1D 00 00 00 00 00 mov         rcx,1D0000h  
00000000004F0014 48 BA 00 00 00 00 00 00 00 00 mov         rdx,0  
00000000004F001E 49 B8 00 00 00 00 00 00 00 00 mov         r8,0  
00000000004F0028 49 B9 00 00 00 00 00 00 00 00 mov         r9,0  
00000000004F0032 48 B8 00 00 00 00 00 00 00 00 mov         rax,0  
00000000004F003C 48 89 44 24 20       mov         qword ptr [rsp+20h],rax  
00000000004F0041 48 B8 00 00 00 00 00 00 00 00 mov         rax,0  
00000000004F004B 48 89 44 24 28       mov         qword ptr [rsp+28h],rax  
00000000004F0050 48 B8 00 00 00 00 00 00 00 00 mov         rax,0  
00000000004F005A 48 89 44 24 30       mov         qword ptr [rsp+30h],rax  
00000000004F005F 48 B8 00 00 00 00 00 00 00 00 mov         rax,0  
00000000004F0069 48 89 44 24 38       mov         qword ptr [rsp+38h],rax  
00000000004F006E 48 B8 23 00 1D 00 00 00 00 00 mov         rax,1D0023h  
00000000004F0078 48 89 44 24 40       mov         qword ptr [rsp+40h],rax  
00000000004F007D 48 B8 8B 00 1D 00 00 00 00 00 mov         rax,1D008Bh  
00000000004F0087 48 89 44 24 48       mov         qword ptr [rsp+48h],rax  
00000000004F008C 90                   nop  
00000000004F008D 90                   nop  
... tons more NOPs ...
00000000004F00C4 48 B8 A0 8A 61 55 F8 7F 00 00 mov         rax,7FF855618AA0h  
00000000004F00CE FF D0                call        rax  
00000000004F00D0 53                   push        rbx  
00000000004F00D1 48 BB 00 00 4F 00 00 00 00 00 mov         rbx,4F0000h  
00000000004F00DB 48 81 C3 00 04 00 00 add         rbx,400h  
00000000004F00E2 48 89 03             mov         qword ptr [rbx],rax  
00000000004F00E5 5B                   pop         rbx  
00000000004F00E6 48 83 C4 40          add         rsp,40h  
00000000004F00EA 41 59                pop         r9  
00000000004F00EC 41 58                pop         r8  
00000000004F00EE 5A                   pop         rdx  
00000000004F00EF 59                   pop         rcx  
00000000004F00F0 58                   pop         rax  
00000000004F00F1 5F                   pop         rdi  
00000000004F00F2 68 AD 39 00 40       push        400039ADh  
00000000004F00F7 C7 44 24 04 01 00 00 00 mov         dword ptr [rsp+4],1  
00000000004F00FF C3                   ret
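
Both listings share the same layout: the entry code and argument loads start at offset 0, the exit code is right-aligned so that it begins at offset 0xC4 of the 256-byte buffer, and everything in between is NOP padding. The sketch below reproduces that layout on plain byte vectors. LayOutStub is a hypothetical name, and the length check is an addition that the example code does not perform.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper mirroring the memset/memcpy sequence: `entryCode` is
// placed at the start, `exitCode` is right-aligned at the end, and the gap
// between them is filled with NOP (0x90) so execution slides into the exit.
std::vector<std::uint8_t> LayOutStub(const std::vector<std::uint8_t> &entryCode,
    const std::vector<std::uint8_t> &exitCode, std::size_t totalSize)
{
    // Added safety check: with enough stack arguments the entry code would
    // otherwise run into the exit code.
    if (entryCode.size() + exitCode.size() > totalSize)
    {
        throw std::length_error("stub does not fit in the buffer");
    }

    std::vector<std::uint8_t> stub(totalSize, 0x90);
    std::copy(entryCode.begin(), entryCode.end(), stub.begin());
    std::copy(exitCode.begin(), exitCode.end(),
        stub.end() - static_cast<std::ptrdiff_t>(exitCode.size()));
    return stub;
}
```

With a 60-byte exit sequence and a 256-byte buffer, the exit code lands at offset 196 (0xC4), matching the addresses 1C00C4 and 4F00C4 in the listings.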

When will this not work?

There are certainly cases where the above code to perform remote calls will not work:

  • The function uses an unusual calling convention, i.e. doesn’t clean up its own stack on x64.
  • The main thread is sleeping, blocked, or in a yielding state.

The full source relating to this can be found here.

  1. Lyr1k
    May 5th, 2014 at 08:54 | #1

    Nice post! A little advice: why use std::sort instead of std::min_element in GetMainThreadId? std::sort clearly does more work than std::min_element. You could even check for the earliest creation time directly in the do {} while loop, without using any STL functions or containers.

  2. admin
    May 5th, 2014 at 19:55 | #2

    std::min_element slipped my mind as I was writing up the example code. You are correct, it would be faster than std::sort. There are certainly improvements that can be made as you mention; I wrote the example code to be as straightforward to read as possible, not necessarily focusing on efficiency. Thanks for the advice, readers using/implementing this functionality should certainly take it.