Programming « RCE Endeavors

April 4, 2015

Hiding Functionality with Exception Handlers (2/2)

Filed under: General x86,Programming,Reverse Engineering — admin @ 1:49 PM

This post will cover the second part of hiding functionality with exception handlers. Unlike the technique presented in the previous post, which modified the SEH record for the local thread, the aim here is to modify the SEH record for another thread in order to better hide what is actually going on. By the end of the post, there should be enough information to put together a working application capable of modifying the SEH list of any thread (barring some exceptions) and causing it to raise an exception to execute your code. The sample application will be a DLL that is injected into a process and hijacks one of its threads to perform some task.

What is the purpose of doing all of this if you’re injecting into a process anyway? After all, you can simply spawn your own thread or likely use the one created during the injection (if CreateRemoteThread was used) and just begin executing your code. I’d argue that this technique gives more obscurity to what is happening during static analysis and is something out of the norm. Plus, it’s fun!

The overall code is very similar to what the first part showed, but now there need to be a few steps added in order to get the TIB of another thread. There are usually a few different approaches, of varying complexity and reliability.

Do it directly. Suspend the thread and get its context. Change the instruction pointer to point to your code, which changes the SEH list and raises an exception, then resume the thread. Perform your task and restore the original context in your SEH handler.
Do it indirectly. Suspend the thread, use QueueUserAPC to queue an asynchronous procedure call (APC) which changes the SEH list and raises an exception, and resume the thread. The thread must be in an alertable state (waiting on something) for this to work, which is typically the case for most threads in a process.
Take the middle ground. Suspend the thread and get the address of its FS segment directly using GetThreadSelectorEntry. Change the SEH list from within your own thread, queue an APC to raise the exception, and resume the thread.

The easiest approach is to do it indirectly with an APC. The code is really straightforward and looks like the following:

void InstallExceptionHandler(DWORD dwThreadId)
{
    auto handle = ThreadHandleTable[dwThreadId];
 
    DWORD dwError = SuspendThread(handle);
    if (dwError == -1)
    {
        fprintf(stderr, "Could not suspend thread. Error = %X.\n",
            GetLastError());
        return;
    }
 
    CONTEXT ctx = { CONTEXT_ALL };
    GetThreadContext(handle, &ctx);
    LDT_ENTRY ldtEntry = { 0 };
 
    GetThreadSelectorEntry(handle, ctx.SegFs, &ldtEntry);
    const DWORD dwFSAddress =
        (ldtEntry.HighWord.Bits.BaseHi << 24) |
        (ldtEntry.HighWord.Bits.BaseMid << 16) |
        (ldtEntry.BaseLow);
 
    fprintf(stderr, "FS segment address of target thread should be: %X.\n",
        dwFSAddress);
 
    dwError = QueueUserAPC(APCProc, handle, 0);
    if (dwError == 0)
    {
        fprintf(stderr, "Could not queue APC to thread. Error = %X.\n",
            GetLastError());
    }
 
    dwError = ResumeThread(handle);
    if (dwError == -1)
    {
        fprintf(stderr, "Could not resume thread. Error = %X.\n",
            GetLastError());
    }
}

Here the suspend/queue/resume wording is put directly into code (with extra debug output). When the thread resumes, APCProc will be invoked. APCProc will be running in the context of the target thread and is responsible for modifying the SEH list to add in a new handler. Because of this, APCProc can obtain the TIB without any extra code, and the code basically becomes a copy/paste from part one.
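The FS base calculation in InstallExceptionHandler is just three descriptor fields glued back together. The following is a minimal, portable sketch of that arithmetic. DescriptorBaseParts and ComposeSegmentBase are hypothetical names standing in for the BaseLow, BaseMid, and BaseHi fields of LDT_ENTRY, and the field values in the example are made up.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the three base-address fields of LDT_ENTRY
// (BaseLow, HighWord.Bits.BaseMid, HighWord.Bits.BaseHi).
struct DescriptorBaseParts
{
    std::uint16_t baseLow;  // bits 0-15 of the segment base
    std::uint8_t baseMid;   // bits 16-23 of the segment base
    std::uint8_t baseHi;    // bits 24-31 of the segment base
};

// Reassemble the 32-bit linear base address. Each piece is widened to
// 32 bits before shifting so that a high byte >= 0x80 cannot overflow int.
std::uint32_t ComposeSegmentBase(const DescriptorBaseParts &parts)
{
    return (static_cast<std::uint32_t>(parts.baseHi) << 24) |
           (static_cast<std::uint32_t>(parts.baseMid) << 16) |
           static_cast<std::uint32_t>(parts.baseLow);
}

// Usage: made-up field values chosen only to make the bit placement visible.
void ComposeSegmentBaseExample()
{
    const DescriptorBaseParts parts{ 0xE000, 0xFD, 0x7E };
    assert(ComposeSegmentBase(parts) == 0x7EFDE000u);
}
```

In the real code the result is the linear address of the target thread’s TIB, which is what approach three would read and write from another thread.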

void CALLBACK APCProc(ULONG_PTR dwParam)
{
    fprintf(stderr, "APC callback invoked. Raising exception to trigger exception handler.\n");
 
    //FS:[0x18] holds the linear address of the TIB, whose first field is the head of the SEH list.
    EXCEPTION_REGISTRATION *pHandlerBase = (EXCEPTION_REGISTRATION *)__readfsdword(0x18);
 
    fprintf(stderr, "Segment address of target thread: %p.\n", pHandlerBase);
 
    EXCEPTION_REGISTRATION NewHandler = { pHandlerBase->pPrevHandler,
        (EXCEPTION_REGISTRATION::pFncHandler)(MyTestHandler) };
 
    pHandlerBase->pPrevHandler = &NewHandler;
 
    RaiseException(STATUS_ACCESS_VIOLATION, 0, 0, nullptr);
 
    //NewHandler lives on this stack frame, so unlink it before returning.
    pHandlerBase->pPrevHandler = NewHandler.pPrevHandler;
}

The handler, MyTestHandler, being independent of all of this, doesn’t change much either.

EXCEPTION_DISPOSITION __cdecl MyTestHandler(EXCEPTION_RECORD *pExceptionRecord, void *pEstablisherFrame,
    CONTEXT *pContextRecord, void *pDispatcherContext)
{
    if (pExceptionRecord->ExceptionCode == STATUS_ACCESS_VIOLATION)
    {
        MessageBox(0, L"Some hidden functionality can go here.",
            L"Test", 0);
        return ExceptionContinueExecution;
    }
 
    return ExceptionContinueSearch;
}

Below are some screenshots of this at work on a 32-bit Notepad++ instance.

Thread 5504 is chosen here.
The MessageBox in the exception handler successfully pops up. Hitting the “OK” button resumes execution as normal.

The source for the projects (Visual Studio 2013, Update 4) presented in these parts can be found here. Thanks for reading and follow on Twitter for more updates.


April 3, 2015

Hiding Functionality with Exception Handlers (1/2)

Filed under: General x86,Programming,Reverse Engineering — admin @ 9:56 AM

This post will cover the topic of hiding functionality by taking advantage of the OS-supported exception handling provided by Windows. Namely, it will cover Structured Exception Handling (SEH), how it can be utilized to obscure control flow at runtime, and how it can make static analysis of a binary more difficult. Only the relevant parts of SEH will be covered here; the full details can be found on the MSDN page. Due to the differences in exception handling between Windows on x86 and x64, the general technique presented and the accompanying code are relevant on x86 only. The code presented here also shows how to manually add exception records, without the use of the SetUnhandledExceptionFilter API. This technique has been seen in PE protectors, anti-intrusion bypass systems, and malware. As always, the material presented is for educational and research purposes; don’t do anything dumb/criminal with it.

Structured exception handling is best demonstrated through the use of the Microsoft extensions to C++ exception handling, namely the __try and __except statements. For example, the following code uses SEH:

__try
{
    printf("Hello, World!\n");
    int *pNull = nullptr;
    *pNull = 0x0BADC0DE;
}
__except (GetExceptionCode() == EXCEPTION_ACCESS_VIOLATION)
{
    printf("In exception handler.\n");
}

Using SEH, the access violation arising from the null pointer dereference will be caught by the user-defined handler, something not possible in standard C++. This works because, at the beginning of the function, the compiler sets up the exception frame for this code. Viewing the disassembly for the function makes it more apparent how this happens.

00B21003 6A FF                push        0FFFFFFFFh  
00B21005 68 18 25 B2 00       push        0B22518h  
00B2100A 68 AC 10 B2 00       push        0B210ACh  
00B2100F 64 A1 00 00 00 00    mov         eax,dword ptr fs:[00000000h]  
00B21015 50                   push        eax  
00B21016 64 89 25 00 00 00 00 mov         dword ptr fs:[0],esp

This code appears confusing at first, but can be cleared up by reading the crash course explanation page linked above. The code begins by pushing three values onto the stack. The first two, 0FFFFFFFFh and 0B22518h, will be ignored in the explanation and correspond to values in the exception record: the try level and the scope table, respectively. There are some obfuscation tricks that can be done by manipulating the scope table, but they won’t be discussed in this post. The full explanation of these fields and their purpose can be found on the crash course page. The next value, 0x0B210AC, is an important one. Following this through in a debugger leads to the symbol __except_handler4.

This is the topmost handler of the exception chain and begins dispatching the exception. SEH works in such a way that the topmost exception handler in the chain is called and has a chance to handle the exception. If the exception is not handled, then the next entry in the exception chain is called until the exception is either handled or the final exception handler is called and the program aborts with an unhandled exception.
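To make the dispatch order concrete, here is a small portable model of the chain walk described above. It is only a sketch: Registration, Disposition, and Dispatch are simplified, hypothetical stand-ins rather than the real Windows types, and the chain is terminated with nullptr instead of the 0xFFFFFFFF marker the real list uses.

```cpp
#include <cassert>

// Simplified stand-ins for the real Windows types; all names are hypothetical.
enum class Disposition { ContinueExecution, ContinueSearch };

struct Registration
{
    Registration *pPrev;                         // next record further down the chain
    Disposition (*pHandler)(unsigned int code);  // handler for this frame
};

// Walk the chain from the head until a handler accepts the exception.
// Returns true if some handler handled it, false if the chain was exhausted
// (the point where the real OS would report an unhandled exception).
bool Dispatch(const Registration *pHead, unsigned int code)
{
    for (const Registration *pCur = pHead; pCur != nullptr; pCur = pCur->pPrev)
    {
        assert(pCur->pHandler != nullptr);
        if (pCur->pHandler(code) == Disposition::ContinueExecution)
        {
            return true;
        }
    }
    return false;
}

// Handles only access violations (0xC0000005) and passes on everything else.
Disposition OnlyAccessViolation(unsigned int code)
{
    return code == 0xC0000005u ? Disposition::ContinueExecution
                               : Disposition::ContinueSearch;
}

// Never handles anything, so the search always moves to the next record.
Disposition DeclineEverything(unsigned int)
{
    return Disposition::ContinueSearch;
}

// Usage: "inner" is the head of the chain and is consulted first.
void DispatchExample()
{
    Registration outer{ nullptr, OnlyAccessViolation };
    Registration inner{ &outer, DeclineEverything };
    assert(Dispatch(&inner, 0xC0000005u));   // inner declines, outer handles
    assert(!Dispatch(&inner, 0x80000003u));  // nobody handles a breakpoint
}
```

Installing a handler at runtime, in this model, is nothing more than creating a new Registration whose pPrev is the old head and making it the new head.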

Afterwards, the value in FS:[0] is moved into the EAX register. The FS segment register points to a special Windows structure called the Thread Information Block (TIB), so FS:[0] is the first field of that structure. Among other things of interest, the TIB holds a pointer to the current SEH frame at its base (+0x0). This value is then pushed onto the stack and the stack pointer in ESP is moved into FS:[0]. What is happening here is that an exception record structure is getting constructed on the stack and is being stored at the head of the SEH list. This allows for proper stack unwinding and exception handler call order in the event of an exception. The format of the exception record is documented on the crash course page and can be converted to a structure, with the irrelevant fields omitted, as follows:

typedef struct _EXCEPTION_REGISTRATION
{
    using pFncHandler = void (__cdecl*)(EXCEPTION_RECORD *, _EXCEPTION_REGISTRATION *,
        CONTEXT *, EXCEPTION_RECORD *);
 
    struct _EXCEPTION_REGISTRATION *pPrevHandler;
    pFncHandler pHandler;
 
    //Missing fields here:
    //Scope table
    //Try level
    //EBP
} EXCEPTION_REGISTRATION, *PEXCEPTION_REGISTRATION;

Now knowing the layout of these exception records and where to find them in memory, it is rather straightforward to modify the list. The steps are as follows:

Get the address of the TIB through the FS segment
Get a pointer to the current SEH frame from the TIB
Replace the head of the current SEH frame with a custom handler

Put into code, it looks like the following:

#include <cstdio>
#include <Windows.h>
 
typedef struct _EXCEPTION_REGISTRATION
{
    using pFncHandler = void (__cdecl *)(EXCEPTION_RECORD *, _EXCEPTION_REGISTRATION *,
        CONTEXT *, EXCEPTION_RECORD *);
 
    struct _EXCEPTION_REGISTRATION *pPrevHandler;
    pFncHandler pHandler;
 
} EXCEPTION_REGISTRATION, *PEXCEPTION_REGISTRATION;
 
//Base of TIB structure but we only care about exception chain.
EXCEPTION_REGISTRATION *pHandlerBase = (EXCEPTION_REGISTRATION *)__readfsdword(0x18);
 
EXCEPTION_DISPOSITION __cdecl MyTestHandler(EXCEPTION_RECORD *pExceptionRecord, void *pEstablisherFrame,
    CONTEXT *pContextRecord, void *pDispatcherContext)
{
    printf("Hello, World!\n");
 
    return ExceptionContinueExecution;
}
 
int main(int argc, char *argv[])
{
    fprintf(stderr, "TIB Base (Pointer to current SEH Frame): %p.\n", pHandlerBase);
 
    EXCEPTION_REGISTRATION NewHandler = { pHandlerBase->pPrevHandler,
        (EXCEPTION_REGISTRATION::pFncHandler)(MyTestHandler) };
 
    //Actually the pointer to first exception handler
    pHandlerBase->pPrevHandler = &NewHandler;
 
    RaiseException(0, 0, 0, nullptr);
 
    return 0;
}

Here a new handler, MyTestHandler, is added to the SEH chain. It gets invoked on the RaiseException call and tells the program to continue execution after printing out a string. Looking at the disassembly, there were no exception records generated for the code since it doesn’t use SEH, so the RaiseException call will appear to go to the unhandled exception filter and crash the application. However, the installation of the handler at runtime through the TIB prevents this and actually results in a call to somewhere unexpected. In addition to adding a new handler, it is also possible to replace an existing one.

Replacing entries in the SEH chain works on a per-thread basis. If the SEH list is modified on one thread and another thread raises an exception, the new SEH handler will not be called. Replacing SEH handlers for arbitrary threads and dispatching exceptions to run in their context will be the topic of the next post.


March 28, 2015

Thoughts on Modern C++

Filed under: Programming — admin @ 1:49 PM

I’ve recently finished reading Effective Modern C++, which is the continuation of the “Effective C++” series for C++11/14. The book covered most of the newer features of modern C++ along with sufficient code examples to show their usage and applications. Overall, I’m pretty excited to see some of these features getting adopted in current and future C++ code bases. The motivation for a lot of these features naturally stemmed from the reasons of performance and efficiency, as is to be expected with anything C++ related.

Features such as auto, nullptr, constexpr, alias declarations, override, default/delete declarations, lambdas, smart pointers, and other features allow for smaller, cleaner, and in the case of some, less bug-prone code. At the same time, type traits, noexcept, rvalue/universal references and their move/forward semantics, allow for more efficient code generation and run-time performance gains.

However, some of these features are not without their pitfalls. For example, the auto keyword can produce surprising types due to the rules of modern C++ type deduction.

    int x = 123;
    auto y{ 123 };
 
    std::cout
        << "type of x: " << typeid(x).name()
        << std::endl
        << "type of y " << typeid(y).name()
        << std::endl;

The output of the preceding code is

type of x: int
type of y class std::initializer_list<int>

and

    std::vector<bool> boolVec = { true, false, false, true, true, false };
 
    bool bSecondElem = boolVec[1];
    auto bSecondElemAuto = boolVec[1];
 
    std::cout
        << "type of bSecondElem: " << typeid(bSecondElem).name()
        << std::endl
        << "type of bSecondElemAuto " << typeid(bSecondElemAuto).name()
        << std::endl;

outputs

type of bSecondElem: bool
type of bSecondElemAuto class std::_Vb_reference < struct std::_Wrap_alloc  > >

These are explained at the beginning of the book by the different rules of auto type deduction versus template type deduction, and by the pitfalls of auto type deduction when dealing with classes such as std::vector<bool>, which save space by storing a bit per item and return a proxy object when their operator[] is invoked. Note that compilers implementing the later N3922 rule change (adopted for C++17) deduce int for auto y{ 123 }; only the copy form, auto y = { 123 }, still yields std::initializer_list<int>. There are other notable edge cases, such as passing arguments through braced initializers to forwarding templates (Item 30, along with other issues), possible problems of dangling references from using default-capture lambdas (Item 31), and others. Even given these, I’m still excited to use these features (where appropriate) and see the benefits that modern C++ brings.
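Both pitfalls can be checked without relying on typeid output, which differs between compilers. The following is a minimal sketch using type traits; the function names are made up for illustration. It uses the copy form auto values = { 123 }, which deduces std::initializer_list<int> under every standard from C++11 onwards.

```cpp
#include <cassert>
#include <initializer_list>
#include <type_traits>
#include <vector>

// Copy-list-initialization with auto deduces std::initializer_list<int>.
bool BracedCopyInitIsInitializerList()
{
    auto values = { 123 };
    return std::is_same<decltype(values), std::initializer_list<int>>::value;
}

// operator[] on std::vector<bool> returns a proxy object, so auto deduces
// the proxy type rather than bool.
bool AutoDeducesProxy()
{
    std::vector<bool> flags = { true, false, false };
    auto second = flags[1];
    return !std::is_same<decltype(second), bool>::value &&
           std::is_same<decltype(second), std::vector<bool>::reference>::value;
}

// A bool copy is independent of the vector; the proxy writes through to it.
bool ProxyWritesThrough()
{
    std::vector<bool> flags = { true, false, false };

    bool copy = flags[1];
    copy = true;  // changes only the local copy
    const bool vectorUnchanged = !static_cast<bool>(flags[1]);

    auto proxy = flags[1];
    proxy = true; // modifies the element stored in the vector
    return copy && vectorUnchanged && static_cast<bool>(flags[1]);
}

// Usage: all three observations hold.
void AutoDeductionExample()
{
    assert(BracedCopyInitIsInitializerList());
    assert(AutoDeducesProxy());
    assert(ProxyWritesThrough());
}
```

The last function is the practical danger: code that looks like it is working on a local bool can end up modifying the container, or holding a proxy into a vector that no longer exists.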


March 26, 2015

Malware Techniques: Code Streaming

Filed under: General x86,General x86-64,Programming — admin @ 8:59 PM

This quick post will cover the topic of code streaming. For example, take malware. One way for malware to hide and persist on a system is to not contain any malicious code. This is done by getting the malicious payload through an external source, such as a direct request to a web server, a Twitter/social media post, a Pastebin, or any other common mechanism. This code, usually encrypted or obfuscated in some way, is then mapped in to the malicious process and executed. After execution, the memory region is cleaned up and reused or reallocated in order to carry out further malicious functionality. The code for this functionality looks pretty straightforward:

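//m_pMemory is assumed to be declared as std::unique_ptr<char[]>;
//the template arguments were lost from the original listing.
MemoryExecutor::MemoryExecutor(const size_t ulAllocSize)
    : m_ulAllocSize{ ulAllocSize }
{
    m_pMemory = std::unique_ptr<char[]>(new char[ulAllocSize]);
}
 
const bool MemoryExecutor::MapToRegion(const char * const pBytes, const size_t ulSize)
{
    if (ulSize > m_ulAllocSize)
    {
        //The old contents get overwritten below, so allocate a fresh buffer
        //rather than calling realloc on memory obtained from new[].
        //std::nothrow (from <new>) keeps the nullptr check meaningful.
        m_pMemory.reset(new (std::nothrow) char[ulSize]);
        if (m_pMemory.get() == nullptr)
        {
            m_ulAllocSize = 0;
            return false;
        }
        m_ulAllocSize = ulSize;
    }
 
#ifdef DEBUG
    std::memset(m_pMemory.get(), 0xCC, ulSize);
#endif
 
    std::memcpy(m_pMemory.get(), pBytes, ulSize);
 
    //VirtualProtect writes the previous protection through this pointer,
    //so it must be the address of a DWORD rather than a cast of its value.
    DWORD dwOldProtect = 0;
    return BOOLIFY(VirtualProtect(m_pMemory.get(), ulSize, PAGE_EXECUTE_READWRITE, &dwOldProtect));
}
 
void MemoryExecutor::ExecuteRegion()
{
    using pFnc = void (*)();
    pFnc pRuntimeFunction = (pFnc)m_pMemory.get();
    pRuntimeFunction();
 
    //Wipe and free the buffer; release() would leak it.
    memset(m_pMemory.get(), 0, m_ulAllocSize);
    m_pMemory.reset();
    m_ulAllocSize = 0;
}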

with the intention that pBytes in MapToRegion contains the malicious buffer. However, there are a few issues that come up, such as how to make WinAPI calls. The three solutions to this that I’ve seen in the wild are

Map position-independent shellcode that traverses the DLL list and manually implements GetProcAddress. This is done by accessing the PEB structure created for each process. The PEB structure contains a pointer to a PEB_LDR_DATA structure, which in turn contains three lists: load order, memory order, and initialization order. These three lists contain all of the DLLs loaded in to the process via their base address. Once a base address for the desired DLL is obtained by traversing the list, it is possible to find its export section and traverse the export table. For an x86 assembly implementation, see here. This technique, in a mix of x86 and C, was also used by me in demonstrating how to write a file packer here.
Set up registers/arguments and perform the native syscall. For example, the implementation of NtTerminateProcess on x64 looks like:

NtTerminateProcess:
00007FF998AE1040 4C 8B D1             mov         r10,rcx  
00007FF998AE1043 B8 2B 00 00 00       mov         eax,2Bh  
00007FF998AE1048 0F 05                syscall  
00007FF998AE104A C3                   ret

where 2Bh, the value moved into EAX, is the syscall number. This approach is pretty volatile because syscall numbers can change across different Windows versions.

Get the addresses of the DLLs from within the malware, via GetModuleHandle, and fix up the addresses manually when they’re mapped. It’s pretty sloppy, but I’ve seen it before.

As far as stealth goes, something like the code above is pretty easy to detect. The idea of code executing off the heap (after allocating and changing the page permissions) does set off the red flags. Other implementations that I’ve seen have been to

Allocate executable pages upfront with VirtualAlloc. This is basically the same thing as above.
Locate empty blocks of memory in executable pages and map the code there. These empty blocks of memory usually occur due to alignment reasons in the code and can be exploited to store the malicious functionality. This approach is pretty convenient since the memory page(s) will already have the appropriate permissions, and when executed, won’t look as suspicious as when executing off the heap.
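Finding such an empty block comes down to scanning for a long run of padding bytes. Below is a minimal, portable sketch that works on a plain byte buffer rather than on live process memory. Cave and FindLargestCave are hypothetical names, and the choice of padding byte is an assumption (0xCC is used in the example, matching the debug fill value in the code above).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Location and size of a run of padding bytes; hypothetical helper type.
struct Cave
{
    std::size_t offset;
    std::size_t length;
};

// Return the longest run of `padding` in `bytes`. If several runs share the
// maximum length, the first one wins. A length of 0 means no padding was found.
Cave FindLargestCave(const std::vector<std::uint8_t> &bytes, std::uint8_t padding)
{
    Cave best{ 0, 0 };
    std::size_t runStart = 0;
    std::size_t runLength = 0;

    for (std::size_t i = 0; i < bytes.size(); ++i)
    {
        if (bytes[i] == padding)
        {
            if (runLength == 0)
            {
                runStart = i;
            }
            ++runLength;
            if (runLength > best.length)
            {
                best = Cave{ runStart, runLength };
            }
        }
        else
        {
            runLength = 0;
        }
    }
    return best;
}

// Usage: two short functions separated by padding of different lengths.
void FindLargestCaveExample()
{
    const std::vector<std::uint8_t> code = {
        0x55, 0x8B, 0xEC, 0xC3, 0xCC, 0xCC,   // function, then 2 padding bytes
        0x55, 0xC3, 0xCC, 0xCC, 0xCC, 0xCC,   // function, then 4 padding bytes
        0x90
    };
    const Cave cave = FindLargestCave(code, 0xCC);
    assert(cave.offset == 8 && cave.length == 4);
}
```

A real implementation would additionally have to confirm that the run is genuinely unused and that the page is writable at the time of the copy; this sketch only covers the search.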


January 15, 2015

Virtual Method Table (VMT) Hooking

Filed under: Game Hacking,General x86,General x86-64,Programming — admin @ 1:39 PM

This post will cover the topic of hooking a class’s virtual method table. This is a useful technique that has many applications, but is most commonly seen in developing game hacks. For example, employing VMT hooking of objects in a Direct3D/OpenGL graphics engine is how in-game overlays are displayed.

Virtual Method Tables (or vtables)

Usage of VMTs, in the context of C++ for this post, is how polymorphism is implemented at the language level. Internally, the VMT is represented as an array of function pointers, and a pointer to it typically resides at the beginning or end of the memory layout of the object. Whenever a C++ class declares a virtual function, the compiler will add an entry into the VMT for it. If a class inherits from a base class and overrides a base virtual function, then the pointer to the overridden function will be present in the derived object’s VMT. For example, take the following code, compiled with the VS 2013 compiler on an x86 system:

class Base
{
public:
    Base() { printf("-  Base::Base\n"); }
    virtual ~Base() { printf("-  Base::~Base\n"); }
 
    void A() { printf("-  Base::A\n"); }
    virtual void B() { printf("-  Base::B\n"); }
    virtual void C() { printf("-  Base::C\n"); }
};
 
class Derived final : public Base
{
public:
    Derived() { printf("-  Derived::Derived\n"); }
    ~Derived() { printf("-  Derived::~Derived\n"); }
 
    void B() override { printf("-  Derived::B\n"); }
    void C() override { printf("-  Derived::C\n"); }
};

with the instances of Base and Derived created as follows:

Base base;
Derived derived;
Base *pBase = new Derived;

The class Base has three virtual functions: ~Base, B, and C. The class Derived, which inherits from Base, overrides the two virtual functions B and C. In memory, the VMT for Base will contain ~Base, B, and C, as can be inspected with the debugger:

while the VMT for the two Derived instances contain ~Derived, B, and C, but with different addresses for each than the ones in Base (see below).

So how are these actually used? Take, for example, a function that takes a pointer to a Base instance and invokes the functions A, B, and C, on it:

void Invoke(Base * const pBase)
{
    pBase->A();
    pBase->B();
    pBase->C();
}

and is invoked in the following manner:

    Invoke(&base);
    Invoke(&derived);
    Invoke(pBase);

The Invoke function disassembled for x86 is as follows:

    pBase->A();
004012C9 8B 4D 08             mov         ecx,dword ptr [pBase]  
004012CC E8 8F FE FF FF       call        Base::A (0401160h)  
    pBase->B();
004012D1 8B 45 08             mov         eax,dword ptr [pBase]  
004012D4 8B 10                mov         edx,dword ptr [eax]  
004012D6 8B 4D 08             mov         ecx,dword ptr [pBase]  
004012D9 8B 42 04             mov         eax,dword ptr [edx+4]  
004012DC FF D0                call        eax  
    pBase->C();
004012DE 8B 45 08             mov         eax,dword ptr [pBase]  
004012E1 8B 10                mov         edx,dword ptr [eax]  
004012E3 8B 4D 08             mov         ecx,dword ptr [pBase]  
004012E6 8B 42 08             mov         eax,dword ptr [edx+8]  
004012E9 FF D0                call        eax

This disassembly shows exactly what is going on under the hood with relation to polymorphism. For the invocations of B and C, the compiler moves the address of the object into the EAX register. This is then dereferenced to get the base of the VMT, which is stored in the EDX register. The appropriate VMT entry is found by adding the function’s offset to EDX (+4 for B, +8 for C) and loading the address stored there into EAX. This function is then called, with the this pointer passed in ECX. Since Base and Derived have different VMTs, this code will call different functions, the appropriate ones, for the appropriate object type. Seeing how it’s done under the hood also allows us to easily write a function to print the VMT.

void PrintVTable(Base * const pBase)
{
    //The first pointer-sized value in the object is the address of the VMT.
    void **pVTableBase = *(void ***)pBase;
    printf("First: %p\n"
        "Second: %p\n"
        "Third: %p\n",
        pVTableBase[0], pVTableBase[1], pVTableBase[2]);
}

Hooking the VMT

Knowing the layout of the VMT makes it trivial to hook. To accomplish this, all that is needed is to overwrite the entry in the VMT with the address of the desired hook function. This is done by using the VirtualProtect function to set the appropriate memory permissions alongside with memcpy to write in the desired hook address. Note that memcpy is used since everything resides within the same address space, otherwise WriteProcessMemory would have to be used. A hooking routine might look like the following:

void HookVMT(Base * const pBase)
{
    //Index 0 is the destructor, so index 1 is the entry for B.
    void **pVTableBase = *(void ***)pBase;
    void **pVTableFnc = pVTableBase + 1;
    void *pHookFnc = (void *)VMTHookFnc;
 
    //VirtualProtect takes a PDWORD for the old protection value.
    DWORD dwOldProtect = 0;
    (void)VirtualProtect(pVTableFnc, sizeof(void *), PAGE_EXECUTE_READWRITE, &dwOldProtect);
    memcpy(pVTableFnc, &pHookFnc, sizeof(void *));
    (void)VirtualProtect(pVTableFnc, sizeof(void *), dwOldProtect, &dwOldProtect);
}

with VMTHookFnc having a simple definition of

void __fastcall VMTHookFnc(void *pEcx, void *pEdx)
{
    Base *pThisPtr = (Base *)pEcx;
 
    printf("In VMTHookFnc\n");
}

Here the fastcall calling convention is used to easily retrieve the this pointer, which is typically stored in the ECX register.
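The same mechanism can be modelled in portable C++ with a hand-rolled table, which avoids depending on any particular compiler’s object layout. This is only a sketch: Object, Method, CallSlot, and HookSlot are hypothetical names, and because the table here is an ordinary writable array, no VirtualProtect call is needed. The hook also saves the original entry and forwards to it, which is the pattern the Direct3D example later in the post relies on.

```cpp
#include <cassert>
#include <string>

struct Object;
using Method = void (*)(Object *pThis);

// Hand-rolled equivalent of a polymorphic object: the first member points
// at a table of function pointers, just like a compiler-generated vptr.
struct Object
{
    Method *pVTable;
    std::string log;  // records which functions ran, in order
};

void RealB(Object *pThis) { pThis->log += "B;"; }
void RealC(Object *pThis) { pThis->log += "C;"; }

// Saved original entry so that the hook can forward to it.
Method g_pOriginalB = nullptr;

void HookB(Object *pThis)
{
    pThis->log += "HookB;";
    g_pOriginalB(pThis);
}

// Equivalent of "mov edx,[eax]" / "mov eax,[edx+offset]" / "call eax".
void CallSlot(Object *pThis, int slot)
{
    pThis->pVTable[slot](pThis);
}

// Overwrite one table entry and hand back the previous one.
void HookSlot(Method *pVTable, int slot, Method pHook, Method *ppOriginal)
{
    *ppOriginal = pVTable[slot];
    pVTable[slot] = pHook;
}

// Usage: calls go through the table, so they pick up the hook immediately.
void HookSlotExample()
{
    Method table[] = { RealB, RealC };
    Object obj{ table, "" };

    CallSlot(&obj, 0);
    CallSlot(&obj, 1);
    HookSlot(table, 0, HookB, &g_pOriginalB);
    CallSlot(&obj, 0);

    assert(obj.log == "B;C;HookB;B;");
}
```

Note that every object sharing the table sees the hook, which is also true of the real technique: overwriting a VMT entry affects all instances of that class, not only the one whose pointer was used to find the table.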

Applications

The application of this technique will show how to hook IDXGISwapChain::Present and allow for rendering/overlaying of text on a Direct3D10 application. This is not the only way to overlay text, nor necessarily the best, but still provides an adequate example to illustrate the point. The target application will be a Direct3D10 sample provided by the June 2010 DirectX SDK. See /Samples/C++/Direct3D10/Tutorials/Tutorial01 in the SDK. The sample application initializes the Direct3D device and swap chain with a call to D3D10CreateDeviceAndSwapChain then simply sets up a view and renders a blue background on the window (screenshot below).

To overlay text on a Direct3D application, the IDXGISwapChain object must be obtained. Then the Present function of the interface must be hooked, since that is the function responsible for showing the rendered image to the user. This is done here by hooking D3D10CreateDeviceAndSwapChain. Once this function is hooked, the hook will call the real D3D10CreateDeviceAndSwapChain function in order to set up the IDXGISwapChain interface. Then the VMT entry for Present will be replaced with a hooked version that renders text. Put into code it looks like the following:

HRESULT WINAPI D3D10CreateDeviceAndSwapChainHook(IDXGIAdapter *pAdapter, D3D10_DRIVER_TYPE DriverType, HMODULE Software,
    UINT Flags, UINT SDKVersion, DXGI_SWAP_CHAIN_DESC *pSwapChainDesc, IDXGISwapChain **ppSwapChain,
    ID3D10Device **ppDevice)
{
 
    printf("In D3D10CreateDeviceAndSwapChainHook\n");
 
    //Create the device and swap chain
    HRESULT hResult = pD3D10CreateDeviceAndSwapChain(pAdapter, DriverType, Software, Flags, SDKVersion,
        pSwapChainDesc, ppSwapChain, ppDevice);
 
    //Save the device and swap chain interface.
    //These aren't used in this example but are generally nice to have addresses to
    if(ppSwapChain == NULL)
    {
        printf("Swap chain is NULL.\n");
        return hResult;
    }
    else
    {
        pSwapChain = *ppSwapChain;
    }
    if(ppDevice == NULL)
    { 
        printf("Device is NULL.\n");
        return hResult;
    }
    else
    {
        pDevice = *ppDevice;
    }
 
    //Get the vtable address of the swap chain's Present function and modify it with our own.
    //Save it to return to later in our Present hook
    if(pSwapChain != NULL)
    {
        DWORD_PTR *SwapChainVTable = (DWORD_PTR *)pSwapChain;
        SwapChainVTable = (DWORD_PTR *)SwapChainVTable[0];
        printf("Swap chain VTable: %X\n", SwapChainVTable);
        PresentAddress = (pPresent)SwapChainVTable[8];
        printf("Present address: %X\n", PresentAddress);
 
        DWORD OldProtections = 0;
        VirtualProtect(&SwapChainVTable[8], sizeof(DWORD_PTR), PAGE_EXECUTE_READWRITE, &OldProtections);
        SwapChainVTable[8] = (DWORD_PTR)PresentHook;
        VirtualProtect(&SwapChainVTable[8], sizeof(DWORD_PTR), OldProtections, &OldProtections);
    }
 
    //Create the font that we will be drawing with
    CreateDrawingFont();
 
    return hResult;
}

CreateDrawingFont simply sets up a ID3DX10Font to draw with. Now since the VMT entry was replaced, PresentHook will be invoked instead of Present. Here is where the drawing can be done.

HRESULT WINAPI PresentHook(IDXGISwapChain *thisAddr, UINT SyncInterval, UINT Flags)
{
 
    //printf("In Present (%X)\n", PresentAddress);
 
    RECT Rect = { 100, 100, 200, 200 };
    pFont->DrawTextW(NULL, L"Hello, World!", -1, &Rect, DT_CENTER | DT_NOCLIP, RED);
    return PresentAddress(thisAddr, SyncInterval, Flags);
}

I chose a different calling convention here than for the earlier example code, but everything still functions the same. The end result shows the Present hook successfully rendering the text:
A few important caveats about doing it this way:

The hook must be installed prior to the call to D3D10CreateDeviceAndSwapChain. Otherwise handles to the device and swap chain won’t be obtained.
ID3DX10Font::DrawText can mess with the blend states, shaders, rasterizer, etc. Overlaying text on an application that makes use of these requires the hook developer to account for this and save/restore the states properly.

The source code for the VMT hook example, the slightly modified Direct3D10 sample application, and the Direct3D10 hook can be found here. The hook uses Microsoft Detours as a dependency to perform the initial hooking of D3D10CreateDeviceAndSwapChain.
