Table of Contents:
- DLL Injection: Background & DLL Proxying (1/5)
- DLL Injection: Windows Hooks (2/5)
- DLL Injection: Remote Threads (3/5)
- DLL Injection: Thread Context Hijacking (4/5)
- DLL Injection: Manual Mapping (5/5)
Thread context hijacking is a lesser-used technique that makes a tradeoff: it is a stealthier way to perform DLL injection, but it comes at the cost of a more complex loader implementation. Instead of creating a new thread to load the DLL, thread context hijacking changes the state of an existing thread so that it performs the loading. This is done by first suspending all threads in the process, and then retrieving the context of the target thread that will be loading your DLL, for example by calling GetThreadContext (the implementation below reads the same CONTEXT data from a process snapshot). Once the context is obtained, two registers will be changed: the current instruction pointer, stored in the Rip field of the CONTEXT structure, and the current stack pointer, stored in the Rsp field.
The instruction pointer will be changed to point to a stub of assembly instructions that the loader writes into the process. These instructions save the thread state, call LoadLibraryA to load your DLL, restore the thread state, and pass execution back to the address of the original instruction pointer, as if nothing had ever happened. The stack pointer will be changed to point to a newly allocated block of memory that holds the stack space required to store the thread state and gives the LoadLibraryA routine room to carry out its internal functionality. Changing the stack pointer is not mandatory for context hijacking; reusing the thread’s existing stack can work as well, though it is more error-prone and more difficult to debug. The following figures show what happens during thread context hijacking. Initially, a thread will be in a suspended state, with its instruction pointer at some memory location.
The loader process will retrieve the instruction pointer and stack pointer and use these two addresses to create the stub instructions. The loader will then set the instruction pointer to the start address of the stub. The stub loads the library, restores the old stack pointer, and passes control back to the original instruction pointer.
Getting the target process handle
Having seen what will happen, it is now time to implement it. The first step is to get a handle to the target process with the appropriate access rights. The loader will be performing suspend and resume operations on the target process’s threads and writing into the target process’s address space. This requires a different set of permissions than those required for the CreateRemoteThreadEx implementation.
HANDLE GetTargetProcessHandle(const DWORD processId) {
    const auto processHandle{ OpenProcess(
        PROCESS_SUSPEND_RESUME | PROCESS_VM_OPERATION |
        PROCESS_VM_WRITE | PROCESS_QUERY_INFORMATION, false, processId) };
    if (processHandle == nullptr) {
        PrintErrorAndExit("OpenProcess");
    }
    return processHandle;
}
Retrieving a handle to the target process.
The PROCESS_SUSPEND_RESUME permission, as its name implies, is needed to successfully perform the suspend and resume operations on the target process’s threads. As in the CreateRemoteThreadEx implementation, the other three permissions are needed to write into the target process’s address space.
Suspending the process
With a handle to the process, the next step is to suspend all of the running threads. There are a couple of ways to do this: the long way and the short way. The long way involves creating a snapshot of the process’s threads, enumerating each thread in the snapshot, opening a handle to the thread, and then suspending the thread with a call to SuspendThread. Resuming each thread will involve the same series of steps, but calling ResumeThread instead. The short way is to use an undocumented native API to suspend or resume the entire process for you. Inside of ntdll.dll there are two exported functions: NtSuspendProcess and NtResumeProcess. As their names suggest, these functions will suspend or resume all threads in a process. The definitions for these two functions are provided below:
using NtSuspendProcessPtr = int(__stdcall*)(HANDLE processHandle);
using NtResumeProcessPtr = int(__stdcall*)(HANDLE processHandle);
The prototypes for NtSuspendProcess and NtResumeProcess.
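Both prototypes return an int, which carries the NTSTATUS code of the call. The loader in this section does not inspect it, so a failed suspend would go unnoticed. As a minimal sketch, a small helper can decide whether a call succeeded. The name IsNtSuccess and the two constants are hypothetical and only exist for this illustration; the rule they encode (a non-negative status means success) mirrors the NT_SUCCESS macro from the Windows headers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: mirrors the NT_SUCCESS macro, which treats any
// non-negative NTSTATUS value as success (or an informational result).
constexpr bool IsNtSuccess(const std::int32_t status) {
    return status >= 0;
}

// Two well-known status codes, reproduced here so that the sketch is
// self-contained and does not depend on the Windows headers.
constexpr std::int32_t kStatusSuccess{ 0 };
constexpr std::int32_t kStatusAccessDenied{
    static_cast<std::int32_t>(0xC0000022u) };

static_assert(IsNtSuccess(kStatusSuccess), "zero is success");
static_assert(!IsNtSuccess(kStatusAccessDenied), "error codes are negative");
```

With such a helper, the result of a call like NtSuspendProcess(processHandle) could be passed to IsNtSuccess, and the loader could stop with PrintErrorAndExit("NtSuspendProcess") when it returns false.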
The addresses of these native APIs can be retrieved from ntdll.dll via GetProcAddress.
template <typename NativeFunction>
NativeFunction GetNativeFunctionPtr(const std::string& functionName) {
    const auto ntdllHandle{ GetModuleHandleA("ntdll.dll") };
    if (ntdllHandle == nullptr) {
        PrintErrorAndExit("GetModuleHandleA");
    }
    return reinterpret_cast<NativeFunction>(
        GetProcAddress(ntdllHandle, functionName.c_str()));
}
A generic function to obtain a function pointer from ntdll.dll.
Retrieving the thread context
Next, the thread contexts can be retrieved. Since all of the threads are suspended, their contexts can be read and changed safely. The code below retrieves the context for each thread in the target process; the contexts are returned as pairs consisting of the thread ID and the thread’s corresponding CONTEXT structure.
std::vector<std::pair<DWORD, CONTEXT>> GetTargetProcessThreadContexts(
    const HANDLE processHandle) {
    // Free the snapshot, then the heap-allocated handle storage.
    const std::shared_ptr<HPSS> snapshot(new HPSS{}, [&](HPSS* snapshotPtr) {
        PssFreeSnapshot(processHandle, *snapshotPtr);
        delete snapshotPtr;
    });
    auto result{ PssCaptureSnapshot(processHandle,
        PSS_CAPTURE_THREADS | PSS_CAPTURE_THREAD_CONTEXT,
        CONTEXT_ALL, snapshot.get()) };
    if (result != ERROR_SUCCESS) {
        PrintErrorAndExit("PssCaptureSnapshot");
    }
    // Free the walk marker, then the heap-allocated handle storage.
    const std::shared_ptr<HPSSWALK> walker(new HPSSWALK{},
        [&](HPSSWALK* walkerPtr) {
            PssWalkMarkerFree(*walkerPtr);
            delete walkerPtr;
        });
    result = PssWalkMarkerCreate(nullptr, walker.get());
    if (result != ERROR_SUCCESS) {
        PrintErrorAndExit("PssWalkMarkerCreate");
    }
    std::vector<std::pair<DWORD, CONTEXT>> threadIdWithContext{};
    PSS_THREAD_ENTRY thread{};
    while (PssWalkSnapshot(*snapshot, PSS_WALK_THREADS,
        *walker, &thread, sizeof(thread)) == ERROR_SUCCESS) {
        // Skip entries whose context record was not captured.
        if (thread.ContextRecord != nullptr) {
            threadIdWithContext.push_back(std::make_pair(
                thread.ThreadId, *thread.ContextRecord));
        }
    }
    return threadIdWithContext;
}
The GetTargetProcessThreadContexts function retrieves the thread ID and its accompanying CONTEXT structure.
The GetTargetProcessThreadContexts function works by creating a process snapshot using PssCaptureSnapshot. The PSS_CAPTURE_THREADS and PSS_CAPTURE_THREAD_CONTEXT capture flags ensure that each thread and its CONTEXT structure will be captured in this snapshot. Once the snapshot is created, each thread in the snapshot is iterated over, with a pair of the thread ID and CONTEXT structure being saved into a vector. Once the iteration completes, the vector will contain all threads and their contexts and is returned to the caller.
Now that all thread contexts are available, you can choose which thread to hijack to run the injection code. Ideally, this should be an active thread, so that your DLL injection happens immediately. If you choose a thread that is waiting on some condition, the DLL injection may happen at a much later time, if it even happens at all. The main thread of the process is a good choice to use. As mentioned earlier, the instruction pointer (RIP) will be changed in this thread. But before changing the instruction pointer, the assembly stub to hijack the thread needs to be generated.
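The choice of thread can be made explicit with a small selection helper. The following is only a sketch under stated assumptions: ChooseHijackThread and StandInContext are hypothetical names, std::uint32_t stands in for DWORD, and StandInContext models only the Rip and Rsp fields of the real CONTEXT structure so that the example compiles without the Windows headers. The preferred thread ID is assumed to come from elsewhere, for instance the second value of the pair that GetTargetProcessAndThreadId appears to return.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Stand-in for the Windows CONTEXT structure; only the two fields that this
// section uses are modeled.
struct StandInContext {
    std::uint64_t Rip{};
    std::uint64_t Rsp{};
};

// Hypothetical helper: returns the entry whose thread ID matches preferredId,
// falls back to the first entry when there is no match, and returns an empty
// optional when the list of threads is empty.
template <typename Context>
std::optional<std::pair<std::uint32_t, Context>> ChooseHijackThread(
    const std::vector<std::pair<std::uint32_t, Context>>& threads,
    const std::uint32_t preferredId) {
    if (threads.empty()) {
        return std::nullopt;
    }
    for (const auto& entry : threads) {
        if (entry.first == preferredId) {
            return entry;
        }
    }
    return threads.front();
}
```

Returning an optional forces the caller to handle the case where no thread contexts were captured, instead of indexing into an empty vector.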
Generating the hijack stub
The assembly stub needs five pieces of information: the address of the temporary stack pointer that will be used, the address of the LoadLibraryA function, the absolute path of the DLL to inject, and the addresses of the old instruction pointer and stack pointer to restore the thread’s original execution state. The listing below shows the stub generation function.
auto GenerateHijackStub(
    const void* const remoteStackFrameAddress,
    const void* const remoteLoadLibraryAddress,
    const std::string& fullModulePath,
    const DWORD_PTR originalRipAddress,
    const DWORD_PTR originalStackPointer) {
    // The path buffer holds MAX_PATH characters plus a null terminator.
    if (fullModulePath.length() > MAX_PATH) {
        PrintErrorAndExit("GenerateHijackStub");
    }
    std::array<unsigned char, 22> hijackStubPrologue{
        /* mov rsp, remote stack pointer address (64-bit immediate) */
        0x48, 0xBC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC,
        /* push rax */
        0x50,
        /* push rcx */
        0x51,
        /* push rdx */
        0x52,
        /* push r8 */
        0x41, 0x50,
        /* push r9 */
        0x41, 0x51,
        /* push r10 */
        0x41, 0x52,
        /* push r11 */
        0x41, 0x53,
        /* pushfq */
        0x9C
    };
    std::array<unsigned char, 27> hijackStubLoadLibrary{
        /* lea rcx, [rip + module path offset] */
        0x48, 0x8D, 0x0D, 0xCC, 0xCC, 0xCC, 0xCC,
        /* mov rdx, LoadLibraryA address */
        0x48, 0xBA, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC,
        /* sub rsp, 0x40 */
        0x48, 0x83, 0xEC, 0x40,
        /* call rdx */
        0xFF, 0xD2,
        /* add rsp, 0x40 */
        0x48, 0x83, 0xC4, 0x40
    };
    std::array<unsigned char, 36 + MAX_PATH + 1> hijackStubEpilogue{
        /* popfq */
        0x9D,
        /* pop r11 */
        0x41, 0x5B,
        /* pop r10 */
        0x41, 0x5A,
        /* pop r9 */
        0x41, 0x59,
        /* pop r8 */
        0x41, 0x58,
        /* pop rdx */
        0x5A,
        /* pop rcx */
        0x59,
        /* pop rax */
        0x58,
        /* mov rsp, original stack pointer address (64-bit immediate) */
        0x48, 0xBC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC,
        /* push low 32 bits of original address */
        0x68, 0xCC, 0xCC, 0xCC, 0xCC,
        /* mov dword ptr [rsp+4], high 32 bits of original address */
        0xC7, 0x44, 0x24, 0x04, 0xCC, 0xCC, 0xCC, 0xCC,
        /* ret */
        0xC3,
        /* null-terminated space for module path */
        0x00
    };
    const auto stackFrameAddress{ reinterpret_cast<DWORD_PTR>(
        remoteStackFrameAddress) + 0x40000 };
    std::memcpy(&hijackStubPrologue[2], &stackFrameAddress, sizeof(DWORD_PTR));
    const auto loadLibraryAddress{ reinterpret_cast<DWORD_PTR>(
        remoteLoadLibraryAddress) };
    // Distance from the end of the lea instruction to the module path.
    const auto offsetToModuleName{ 56 };
    const auto lowAddress{ static_cast<DWORD>(
        originalRipAddress) & 0xFFFFFFFF };
    const auto highAddress{ static_cast<DWORD>(
        (originalRipAddress >> 32)) & 0xFFFFFFFF };
    std::memcpy(&hijackStubLoadLibrary[3], &offsetToModuleName, sizeof(DWORD));
    std::memcpy(&hijackStubLoadLibrary[9], &loadLibraryAddress, sizeof(DWORD_PTR));
    std::memcpy(&hijackStubEpilogue[14], &originalStackPointer, sizeof(DWORD_PTR));
    std::memcpy(&hijackStubEpilogue[23], &lowAddress, sizeof(DWORD));
    std::memcpy(&hijackStubEpilogue[31], &highAddress, sizeof(DWORD));
    std::memcpy(&hijackStubEpilogue[36], fullModulePath.c_str(),
        fullModulePath.length());
    return concatenate(hijackStubPrologue, hijackStubLoadLibrary,
        hijackStubEpilogue);
}
The assembly stub generator function.
The stub is split into three parts: a prologue that sets the new stack pointer and saves the volatile registers* and flags, the core logic that performs the LoadLibraryA call, and an epilogue that restores the volatile registers and flags, sets the stack pointer to its original value, and then passes control back to the original instruction pointer.
* For brevity, only the volatile general-purpose registers are saved. A full implementation should also save the volatile floating-point registers.
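Two details of the generator are not spelled out in the listing: how the three parts are joined, and where the value 56 for offsetToModuleName comes from. The sketch below illustrates both. ConcatenateArrays is a hypothetical stand-in written for this example, since the concatenate helper itself is not shown in this section, and the constant names are likewise made up. The sizes are taken from the array declarations in the listing, and the displacement follows from the rule that a RIP-relative operand is measured from the end of the instruction that uses it.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the concatenate helper: joins any number of
// byte arrays into a single array whose size is known at compile time.
template <std::size_t... Sizes>
auto ConcatenateArrays(const std::array<unsigned char, Sizes>&... parts) {
    std::array<unsigned char, (Sizes + ...)> joined{};
    auto out{ joined.begin() };
    ((out = std::copy(parts.begin(), parts.end(), out)), ...);
    return joined;
}

constexpr std::size_t kPrologueSize{ 22 };      // mov rsp, 7 pushes, pushfq
constexpr std::size_t kLoadLibrarySize{ 27 };   // lea, mov, sub, call, add
constexpr std::size_t kEpilogueCodeSize{ 36 };  // code bytes before the path
constexpr std::size_t kLeaLength{ 7 };          // lea rcx, [rip + disp32]

// The module path starts right after the last code byte of the epilogue.
constexpr std::size_t kPathOffset{
    kPrologueSize + kLoadLibrarySize + kEpilogueCodeSize };
// The lea is the first instruction after the prologue, so RIP points here
// once the lea has been decoded.
constexpr std::size_t kOffsetAfterLea{ kPrologueSize + kLeaLength };
constexpr std::size_t kLeaDisplacement{ kPathOffset - kOffsetAfterLea };

static_assert(kPathOffset == 85, "path follows 85 bytes of code");
static_assert(kLeaDisplacement == 56, "matches offsetToModuleName");
```

Deriving the displacement from the part sizes, rather than hard-coding it, would keep the stub correct if instructions are later added to any of the three parts.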
You may notice that the new stack pointer is written in with an offset from its base address. Because the stack grows toward lower addresses, starting 0x40000 bytes into the allocated block gives LoadLibraryA, and the internal functions that it calls, room below the stack pointer to build their stack frames. The stack pointer is also decremented by 0x40 prior to the LoadLibraryA call, and then incremented back to its original value afterwards. This is done to prevent the LoadLibraryA call from clobbering the saved general-purpose register values that are currently on the stack: the Windows x64 calling convention allows a callee to write to the 32 bytes of shadow space directly above its return address, which is exactly where the saved registers would otherwise be. The 0x40 bytes that are reserved cover this shadow space and, provided that the temporary stack block itself is 16-byte aligned, keep the stack pointer 16-byte aligned at the call.
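The arithmetic behind these offsets can be checked with a short sketch. The constant and function names are hypothetical; the values come from the stub listing (a 0x40000 starting offset, eight 8-byte pushes, and the 0x40 adjustment), and the 32-byte shadow space is the amount that the Windows x64 calling convention lets a callee use above its return address. The block base address used in any check is an assumed example value.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kInitialOffset{ 0x40000 };  // added to the block base
constexpr std::uint64_t kSavedSlots{ 8 };           // 7 registers + RFLAGS
constexpr std::uint64_t kSlotSize{ 8 };             // each push stores 8 bytes
constexpr std::uint64_t kReserved{ 0x40 };          // sub rsp, 0x40
constexpr std::uint64_t kShadowSpace{ 0x20 };       // callee-writable bytes

// Lowest address that holds a saved register value (the pushfq slot).
constexpr std::uint64_t LowestSavedSlot(const std::uint64_t blockBase) {
    return blockBase + kInitialOffset - kSavedSlots * kSlotSize;
}

// Value of the stack pointer at the moment "call rdx" executes.
constexpr std::uint64_t StackPointerAtCall(const std::uint64_t blockBase) {
    return LowestSavedSlot(blockBase) - kReserved;
}

// True when the callee's shadow space ends at or below the saved registers.
constexpr bool ShadowSpaceIsClear(const std::uint64_t blockBase) {
    return StackPointerAtCall(blockBase) + kShadowSpace
        <= LowestSavedSlot(blockBase);
}
```

For a 16-byte aligned block base, StackPointerAtCall returns a 16-byte aligned value, and ShadowSpaceIsClear holds because the 0x40 bytes that are reserved exceed the 0x20 bytes of shadow space.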
At the end of the stub, control is handed back to the original instruction pointer in a somewhat unintuitive way. The thread’s registers need to be the same when control is handed back, and x64 has no jmp instruction that takes an absolute 64-bit immediate, so a direct jump would mean loading the address into a register and clobbering its value. The alternative is to push the address onto the top of the stack and execute the ret instruction. When the ret instruction executes, it pops the value at the top of the stack and sets the instruction pointer (RIP) to it. However, this approach also has a problem: x64 has no instruction that pushes a 64-bit immediate onto the stack. The push instruction accepts at most a 32-bit immediate, which is sign-extended to 64 bits. Fortunately, this limitation can be overcome by placing the absolute address on the stack in two parts: first pushing the low 32 bits of the address, and then writing the high 32 bits into [Rsp+0x4], which overwrites the sign-extended upper half. Now, once the ret instruction is executed, the full absolute address will be popped off the top of the stack and put into the instruction pointer.
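The push, mov, and ret sequence can be sketched in isolation. BuildAbsoluteReturn and SimulateReturnAddress are hypothetical helpers written for this illustration; the opcode bytes are the same ones that appear at the end of the epilogue in the generator listing. The sketch assumes a little-endian host, which holds for the x64 machine that runs the loader.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: emits "push imm32; mov dword ptr [rsp+4], imm32; ret"
// so that the ret transfers control to the given 64-bit address.
std::array<unsigned char, 14> BuildAbsoluteReturn(const std::uint64_t target) {
    std::array<unsigned char, 14> code{
        0x68, 0xCC, 0xCC, 0xCC, 0xCC,                    // push low 32 bits
        0xC7, 0x44, 0x24, 0x04, 0xCC, 0xCC, 0xCC, 0xCC,  // mov [rsp+4], high
        0xC3                                             // ret
    };
    const auto low{ static_cast<std::uint32_t>(target & 0xFFFFFFFFu) };
    const auto high{ static_cast<std::uint32_t>(target >> 32) };
    std::memcpy(&code[1], &low, sizeof(low));    // immediate of the push
    std::memcpy(&code[9], &high, sizeof(high));  // immediate of the mov
    return code;
}

// Models the 8-byte stack slot that ret will pop: the push sign-extends the
// low half, then the mov replaces the upper four bytes with the high half.
std::uint64_t SimulateReturnAddress(const std::uint64_t target) {
    const auto low{ static_cast<std::uint32_t>(target & 0xFFFFFFFFu) };
    const auto high{ static_cast<std::uint32_t>(target >> 32) };
    auto slot{ static_cast<std::uint64_t>(
        static_cast<std::int64_t>(static_cast<std::int32_t>(low))) };
    slot = (slot & 0xFFFFFFFFull) | (static_cast<std::uint64_t>(high) << 32);
    return slot;
}
```

The simulation shows why the second write is required even when the high half is small: whenever bit 31 of the low half is set, the push alone would leave the upper four bytes filled with ones.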
Setting the target thread’s context
With the assembly stub generated, the last part is to set the target thread’s context to point to the address of the stub.
void SetRemoteThreadContext(const DWORD threadId, const void* const newRip,
    CONTEXT& context) {
    auto threadHandle{ OpenThread(THREAD_SET_CONTEXT,
        false, threadId) };
    if (threadHandle == nullptr) {
        PrintErrorAndExit("OpenThread");
    }
    context.Rip = reinterpret_cast<DWORD_PTR>(newRip);
    auto result{ SetThreadContext(threadHandle, &context) };
    if (!result) {
        PrintErrorAndExit("SetThreadContext");
    }
    CloseHandle(threadHandle);
}
Setting a thread’s context to resume execution at a different instruction pointer.
To change a thread’s context, a handle to the thread must be opened with the THREAD_SET_CONTEXT access right. Once this handle is obtained, changing the context is just a matter of calling the SetThreadContext function with the new context. For context hijacking, the Rip field is changed to point to the address of the assembly stub. With this last bit of functionality defined, the loader can be written.
void InjectWithHijackedThreadContext(const DWORD processId, std::string&
    fullModulePath) {
    const auto processHandle{ GetTargetProcessHandle(processId) };
    const auto NtSuspendProcess{
        GetNativeFunctionPtr<NtSuspendProcessPtr>("NtSuspendProcess") };
    NtSuspendProcess(processHandle);
    const auto threadContexts{
        GetTargetProcessThreadContexts(processHandle) };
    // Indexing an empty vector is undefined behavior, so check first.
    if (threadContexts.empty()) {
        PrintErrorAndExit("GetTargetProcessThreadContexts");
    }
    auto hijackThread{ threadContexts[0] };
    const auto* remoteLoadLibraryAddress{ GetRemoteModuleFunctionAddress(
        "kernel32.dll", "LoadLibraryA",
        processId)};
    const auto* remoteFullModulePathAddress{ WriteBytesToTargetProcess<char>(
        processHandle, fullModulePath) };
    // 512 KB block that serves as the temporary stack in the target.
    std::array<unsigned char, 1024 * 512> remoteStackFrame{ 0xCC };
    const auto* remoteStackFrameAddress{ WriteBytesToTargetProcess<unsigned char>(
        processHandle, remoteStackFrame) };
    auto hijackStub{ GenerateHijackStub(
        remoteStackFrameAddress, remoteLoadLibraryAddress, fullModulePath,
        hijackThread.second.Rip, hijackThread.second.Rsp) };
    const auto* remoteHijackStub{ WriteBytesToTargetProcess<unsigned char>(
        processHandle, hijackStub, true) };
    SetRemoteThreadContext(hijackThread.first, remoteHijackStub,
        hijackThread.second);
    const auto NtResumeProcess{
        GetNativeFunctionPtr<NtResumeProcessPtr>("NtResumeProcess") };
    NtResumeProcess(processHandle);
    CloseHandle(processHandle);
}
int main(int argc, char* argv[]) {
    auto fullModulePath{ GetInjectedDllPath("Ch10_GenericDll.dll") };
    const auto processId{ GetTargetProcessAndThreadId(
        "Untitled - Notepad").first };
    InjectWithHijackedThreadContext(processId, fullModulePath);
    return 0;
}
The full implementation of the loader.
The loader begins by opening a handle to the target process. Once the handle is obtained, the entire process is suspended by calling NtSuspendProcess. After the process is suspended, all thread contexts are retrieved and the first thread’s context is chosen to be modified. The assembly stub generation then begins setting up the required parameters by retrieving the address of LoadLibraryA, and allocating a block of memory in the target process’s address space that will serve as the temporary stack. These two parameters, along with the full module path, original instruction pointer, and original stack pointer, are used to generate the assembly stub. Once the stub is generated by the loader, it is written into the target process’s address space. Lastly, the target thread’s context is changed to point to the address where the assembly stub was written, and NtResumeProcess is called to resume execution.
Running the demo
Note: If you are using the new UWP Notepad that is in the latest Windows version, you will need to downgrade to the classic version for the demo to work.
The ContextHijacking project provides the full implementation that was presented in this section. To test this locally, build both the GenericDll project and the ContextHijacking loader project. After a successful build, launch Notepad and then the loader application. You will see the familiar “DLL Injected!” message box pop up. Do not dismiss this message box yet. Instead, open up Process Hacker, find the notepad.exe process, and navigate to its Threads tab. Unlike the example in the previous section, where a new thread was created by a CreateRemoteThreadEx call, no new thread has been created.
However, looking in the Modules tab will reveal that GenericDll.dll was loaded, as shown below.
This shows that the GenericDll.dll DLL was injected into notepad.exe and the message box is executing in the context of the Notepad process.
If you are curious to see the execution of the context hijack stub, then you can trace through it with a debugger, though the steps are a bit more involved. Relaunch the Notepad application and attach to it with x64dbg by opening up x64dbg and navigating to File -> Attach from the menu bar. In the Attach dialog, enter in notepad.exe and select the Notepad process.
Make sure that x64dbg is not in a broken state and that the Notepad process is running. In Visual Studio, set a breakpoint on the NtResumeProcess call and launch the loader. When your breakpoint in Visual Studio is hit, the Notepad process will still be in a suspended state and you will have a chance to trace through the context hijack stub. Hover your cursor over the remoteHijackStub variable and copy its address, as shown below.
Navigate back to x64dbg. Click in the main window showing the disassembled instructions, and press Ctrl + G to bring up the “Follow Expression” dialog.
Enter the address of remoteHijackStub in this dialog, and press Enter. This will take you to the place in memory where the context hijacking stub was allocated and written. You should see the instructions for the hijack stub. Click on the first instruction and press F2 to set a breakpoint. The address will be highlighted in red once the breakpoint is set.
Navigate back to Visual Studio and continue the execution of the loader. This will complete the NtResumeProcess call and the Notepad application will once again continue to run. Once the loader has finished execution, navigate back to x64dbg. The breakpoint that you set at the start of the context hijack stub should be hit. This will be denoted by the color of the instruction changing, as shown below:
At this point, you can continue to step through the instructions in x64dbg. When the call rdx instruction in the stub is executed, the GenericDll.dll DLL will be loaded and you will see the “DLL Injected!” message box pop up.