Writing a Primitive Debugger: Part 3 (Call Stack, Registers, Contexts)

December 5, 2014

Writing a Primitive Debugger: Part 3 (Call Stack, Registers, Contexts)

Filed under: General x86,General x86-64,Programming,Reverse Engineering — admin @ 6:47 PM

Up to now, all of the functionality discussed in writing a debugger has been related to getting a debugger attached to a process and being able to set breakpoints and perform single stepping. While certainly useful, this functionality is more passive debugging: you can break the state of the process at a certain point and instrument it at the instruction level, but you cannot actually modify any behavior, or even view how the process got to that state. The next core functionality that will be covered will detail actually being able to view and change program execution state (in the form of the thread context, namely registers), and being able to view the thread’s call stack upon hitting a breakpoint.

Thread Contexts

A thread context, as defined relevant to Windows, “includes the thread’s set of machine registers, the kernel stack, a thread environment block, and a user stack in the address space of the thread’s process.” For a usermode debugger, which is what is being developed in these posts, the important parts are the machine registers and the user stack. The thread environment block is also accessible from user-mode but won’t be covered here due to its undocumented and very specific nature. When a process starts up, the loader will set up the processes main thread and begin execution at the entry point. This main thread can in turn launch additional threads, which themselves launch threads, and so on. Each of these threads will have their own context containing the items listed above.

The purpose of these contexts is that Windows, being a preemptive multitasking operating system, can have any [usermode] task, such as a thread executing code, interrupted at any point in time. During these interruptions, a context switch will be carried out, which is simply the process of saving the current execution context and setting the new one to execute. Eventually, when the original task is scheduled to resume, a context switch will again occur back to the context of the original thread and it will continue executing as if nothing had happened. What do these contexts look like? The answer is that it is entirely processor-specific, which shouldn’t be too surprising given that they store registers.

In Windows, the part of the thread context that is available to developers comes defined as a CONTEXT structure in winnt.h. For example, below is a snippet from a CONTEXT structure for x86 processors.

typedef struct _CONTEXT {
    DWORD   Dr0;
    DWORD   Dr1;
    DWORD   Dr2;
    ...
    FLOATING_SAVE_AREA FloatSave;
    DWORD   SegGs;
    DWORD   SegFs;
    ...
    DWORD   Edi;
    DWORD   Esi;
    DWORD   Ebx;
    ...
    DWORD   Ebp;
    DWORD   Eip;
    ...

The x64 version looks pretty closely related, with register widths being extended to 64-bits as well as additional registers and extensions added.

typedef struct DECLSPEC_ALIGN(16) _CONTEXT {
    DWORD ContextFlags;
    DWORD MxCsr;
    ...
    WORD   SegGs;
    WORD   SegSs;
    DWORD EFlags;
    ...
    DWORD64 Rax;
    DWORD64 Rcx;
    DWORD64 Rdx;
    ...
    DWORD64 R13;
    DWORD64 R14;
    DWORD64 R15;
    ...

This is the structure that will be the most useful to inspect and modify when debugging. A debugger should be able to print out this structure and allow for modification of any of its fields. Fortunately, there are two very useful APIs for retrieving and modifying this structure: GetThreadContext and SetThreadContext. These have been covered previously when discussing how to enable single-stepping. The context had to be retrieved and the EFlags registered modified. So what modifications are needed to the existing code/logic in order to add this functionality? It’s as simple as opening a handle to the current executing (or in the debuggers case, broken) thread and retrieving/setting the context.

const CONTEXT Debugger::GetExecutingContext()
{
    CONTEXT ctx = { 0 };
    ctx.ContextFlags = CONTEXT_ALL;
    SafeHandle hThread = OpenCurrentThread();
    if (hThread.IsValid())
    {
        bool bSuccess = BOOLIFY(GetThreadContext(hThread(), &ctx));
        if (!bSuccess)
        {
            fprintf(stderr, "Could not get context for thread %X. Error = %X\n", m_dwExecutingThreadId, GetLastError());
        }
    }
 
    memcpy(&m_lastContext, &ctx, sizeof(CONTEXT));
 
    return ctx;
}
 
const bool Debugger::SetExecutingContext(const CONTEXT &ctx)
{
    bool bSuccess = false;
    SafeHandle hThread = OpenCurrentThread();
    if (hThread.IsValid())
    {
        bSuccess = BOOLIFY(SetThreadContext(hThread(), &ctx));
    }
 
    memcpy(&m_lastContext, &ctx, sizeof(CONTEXT));
 
    return bSuccess;
}

For each access or modification, there is a handle opened (and closed) to the current thread — this certainly isn’t the most efficient approach, but serves well enough for demo purposes.Â The state of the context is then stored in m_lastContext. These functions are invoked when the process receives an EXCEPTION_BREAKPOINT and when single stepping the process, i.e. handling the EXCEPTION_SINGLE_STEP exception. Therefore, m_lastContext will always have the appropriate register values in the context structure when a breakpoint is hit or when the user is single stepping. These functions can also be invoked when the user wants to modify a certain register or registers through the debugger interface.Â Printing the context involves nothing more than printing out the values in the structure. I’ve chosen to only print out the more commonly used registers for the example code:

void Debugger::PrintContext()
{
#ifdef _M_IX86
    fprintf(stderr, "EAX: %p EBX: %p ECX: %p EDX: %p\n"
        "ESP: %p EBP: %p ESI: %p EDI: %p\n"
        "EIP: %p FLAGS: %X\n",
        m_lastContext.Eax, m_lastContext.Ebx, m_lastContext.Ecx, m_lastContext.Edx,
        m_lastContext.Esp, m_lastContext.Ebp, m_lastContext.Esi, m_lastContext.Edi,
        m_lastContext.Eip, m_lastContext.EFlags);
#elif defined _M_AMD64
    fprintf(stderr, "RAX: %p RBX: %p RCX: %p RDX: %p\n"
        "RSP: %p RBP: %p RSI: %p RDI: %p\n"
        "R8: %p R9: %p R10: %p R11: %p\n"
        "R12: %p R13: %p R14: %p R15: %p\n"
        "RIP: %p FLAGS: %X\n",
        m_lastContext.Rax, m_lastContext.Rbx, m_lastContext.Rcx, m_lastContext.Rdx,
        m_lastContext.Rsp, m_lastContext.Rbp, m_lastContext.Rsi, m_lastContext.Rdi,
        m_lastContext.R8, m_lastContext.R9, m_lastContext.R10, m_lastContext.R11,
        m_lastContext.R12, m_lastContext.R13, m_lastContext.R14, m_lastContext.R15,
        m_lastContext.Rip, m_lastContext.EFlags);
#else
#error "Unsupported architecture"
#endif
}

Call Stacks

At the lowest level, the scope of a function is defined by its stack frame. This is a compiler and/or ABI defined construct for how the state of the function will be layed out. A stack frame typically includes the return address of the caller, any parameters that were passed to the function from the caller, and space for local variables that exist within the scope of the function. For x86 and x64, among other architectures, these stack frames are preceded with a prologue, which is the code responsible for setting up the stack and frame pointers (ESP/EBP or RSP/RBP) from the caller to the callee. Prior to the callee returning, there is an epilogue, which is responsible for returning the stack and frame pointers to that of the caller. For example, consider the following C function:

void TestFunction(int a, int b, int c)
{
    int d = 4, e = 5, f = 6;
}

which was called in the following way

push        3  
push        2  
push        1  
call        TestFunction

Disassembled as x86, this becomes:


push        ebp  
mov         ebp,esp  
sub         esp,0Ch  
mov         dword ptr [ebp-4],4  
mov         dword ptr [ebp-8],5  
mov         dword ptr [ebp-0Ch],6  
mov         esp,ebp  
pop         ebp  
ret         0Ch

The prologue and epilogue are highlighted in orange. After the execution of the prologue, the stack frame for this function will contain the callers frame pointer in [EBP], the return address at [EBP+4] (because the CALL instruction implicitly pushes the address of the next instruction on the stack before changing execution), and the passed parameters at [EBP+8], [EBP+12], and [EBP+16]. The prologue subtracted 12 from the base of the stack to make room for local variables — the three 32-bit ints declared within the function. These will be at [EBP-4], [EBP-8], and [EBP-12], as can be see in the disassembly.

This setup is pretty convenient because it offers easy distinction between what is a parameter and what is a local variable. Debugging becomes a bit easier since everything is held on the stack and indexed through the frame pointer, rather than scattered around between registers and the stack. This changes a bit as you go from x86 to x64, where x64 will store the first four (or six, depending on your compiler/platform) arguments in registers, and the rest on the stack. This can also change a bit depending on calling conventions and compiler optimizations, especially frame-pointer omission.

Since the stack frame stores the return address of the caller, it is possible to see where the function was called from. That is what the call stack is: a collection of stack frames that represent the call chain in the code leading up to the current stack frame. This information is very useful to have in terms of debugging, because a bug that presented itself in one function may have manifested earlier on in the code. Being able to quickly traverse frames, and see the values within those frames, is an invaluable aid to debugging.

On the Windows platform, there is a convenient function that performs the tedium/annoyance of walking stack frames backwards: StackWalk64. This function is x86 and x64 compatible, but does require some setup prior to being invoked. Given the very machine-specific layout of stack frames, the StackWalk64 function requires filling out a STACKFRAME64 structure, which will be passed to it as an argument. Filling out this structure merely involves setting the instruction, frame, and stack pointers, along with the address modes, which will be flat addressing for the case of modern Windows on x86 and x64. Once this structure is set up, StackWalk64 can be called in a loop to retrieve the frames. Put into code, it looks like the following:

void Debugger::PrintCallStack()
{
    STACKFRAME64 stackFrame = { 0 };
    const DWORD_PTR dwMaxFrames = 50;
    CONTEXT ctx = GetExecutingContext();
 
    stackFrame.AddrPC.Mode = AddrModeFlat;
    stackFrame.AddrFrame.Mode = AddrModeFlat;
    stackFrame.AddrStack.Mode = AddrModeFlat;
 
#ifdef _M_IX86
    DWORD dwMachineType = IMAGE_FILE_MACHINE_I386;
    stackFrame.AddrPC.Offset = ctx.Eip;
    stackFrame.AddrFrame.Offset = ctx.Ebp;
    stackFrame.AddrStack.Offset = ctx.Esp;
#elif defined _M_AMD64
    DWORD dwMachineType = IMAGE_FILE_MACHINE_AMD64;
    stackFrame.AddrPC.Offset = ctx.Rip;
    stackFrame.AddrFrame.Offset = ctx.Rbp;
    stackFrame.AddrStack.Offset = ctx.Rsp;
#else
#error "Unsupported platform"
#endif
 
    SafeHandle hThread = OpenCurrentThread();
    for (int i = 0; i < dwMaxFrames; ++i)
    {
        const bool bSuccess = BOOLIFY(StackWalk64(dwMachineType, m_hProcess(), hThread(), &stackFrame,
            (dwMachineType == IMAGE_FILE_MACHINE_I386 ? nullptr : &ctx), nullptr,
            SymFunctionTableAccess64, SymGetModuleBase64, nullptr));
        if (!bSuccess || stackFrame.AddrPC.Offset == 0)
        {
            fprintf(stderr, "StackWalk64 finished.\n");
            break;
        }
 
        fprintf(stderr, "Frame: %X\n"
            "Execution address: %p\n"
            "Stack address: %p\n"
            "Frame address: %p\n",
            i, stackFrame.AddrPC.Offset,
            stackFrame.AddrStack.Offset, stackFrame.AddrFrame.Offset);
    }
}

Testing the functionality

To test this functionality we can create another demo app that will be used as the debug target. The simple one below is what I used:

#include <cstdio>
 
void d()
{
    printf("d called.\n");
}
 
void c()
{
    printf("c called.\n");
    d();
}
 
void b()
{
    printf("b called.\n");
    c();
}
 
void a()
{
    printf("a called.\n");
    b();
}
 
int main(int argc, char *argv[])
{
    printf("Addresses: \n"
        "a: %p\n"
        "b: %p\n"
        "c: %p\n"
        "d: %p\n",
        a, b, c, d);
 
    getchar();
    while (true)
    {
        a();
        getchar();
    }
 
    return 0;
}

I would recommend disabling incremental linking and ASLR (on the executable, not the system) for convenience sake. Below is the stack trace that Visual Studio produces when a breakpoint is set inside the d function and hit.

Demo.exe!d() Line 5	C++
Demo.exe!c() Line 14	C++
Demo.exe!b() Line 20	C++
Demo.exe!a() Line 26	C++
Demo.exe!main(int argc, char * * argv) Line 41	C++
Demo.exe!__tmainCRTStartup() Line 626	C
Demo.exe!mainCRTStartup() Line 466	C
kernel32.dll!@BaseThreadInitThunk@12()	Unknown
ntdll.dll!___RtlUserThreadStart@8()	Unknown
ntdll.dll!__RtlUserThreadStart@8()	Unknown

Attaching with the debugger also yields 10 frames, as listed below:

a
Target address: 0x4010f0
Received breakpoint at address 004010F0
Press c to continue or s to begin stepping.
l
Frame: 0
Execution address: 004010F0
Stack address: 00000000
Frame address: 0018FBE4
Frame: 1
Execution address: 004010DA
Stack address: 00000000
Frame address: 0018FBE8
Frame: 2
Execution address: 0040108A
Stack address: 00000000
Frame address: 0018FCBC
Frame: 3
Execution address: 0040103A
Stack address: 00000000
Frame address: 0018FD90
Frame: 4
Execution address: 004011C6
Stack address: 00000000
Frame address: 0018FE64
Frame: 5
Execution address: 00401699
Stack address: 00000000
Frame address: 0018FF38
Frame: 6
Execution address: 004017DD
Stack address: 00000000
Frame address: 0018FF88
Frame: 7
Execution address: 75D5338A
Stack address: 00000000
Frame address: 0018FF90
Frame: 8
Execution address: 77339F72
Stack address: 00000000
Frame address: 0018FF9C
Frame: 9
Execution address: 77339F45
Stack address: 00000000
Frame address: 0018FFDC
StackWalk64 finished.

The output is a bit less elegant than the Visual Studio debugger, but it is correct, which is the more important part. It would be nice, however, to put names to some of those addresses. That is where symbol loading and mapping come in, which will be the subject of the next post.

Article Roadmap
Future posts will be related on topics closely following the items below:

Basics
Adding/Removing Breakpoints, Single-stepping
Call Stack, Registers, Contexts
Symbols
Miscellaneous Features

The full source code relating to this can be found here. C++11 features were used, so MSVC 2012/2013 is most likely required.

RCE Endeavors 😅

December 5, 2014

Writing a Primitive Debugger: Part 3 (Call Stack, Registers, Contexts)

No Comments »

Leave a comment