Archive

Archive for December, 2014

Writing a Primitive Debugger: Part 5 (Miscellaneous)

December 20th, 2014 2 comments

Welcome to the final installment of how to write a primitive debugger. This post will cover some miscellaneous topics that were not present in the previous articles in order to add some missing core functionality. The topics covered here will be how to display a disassembly listing , how to step over code, i.e. step past a conditional branch, and how to dump and modify arbitrary memory of a process.

Disassembly

In order to display a disassembly dump on x86 and x64, this debugger will take advantage of the BeaEngine disassembly library. This is a very handy library that supports the 16/32/64-bit Intel instruction sets as well as floating point and vector extensions. The project is open source for those interested in looking at the internals of the disassembler. In the example code, it is distributed as DLLs that the code will load and be used at runtime. This is done as a convenience in order to prevent having to possibly recompile static libraries.

The disassembler code will be pretty straightforward to work with. BeaEngine has a DISASM structure that needs to be initialized with the architecture type and an address. This is then passed along to a Disasm function, which fills the structure with information about the instruction at the address. Since the disassembler is dynamically loaded, and is used for x86/x64 in the same code, the function pointer to Disasm needs to be retrieved. All of this initialization code can be handled in the constructor.

Disassembler::Disassembler(HANDLE hProcess) : m_hProcess{ hProcess }
{
    memset(&m_disassembler, 0, sizeof(DISASM));
#ifdef _M_IX86
    m_disassembler.Archi = 0;
    if (m_hDll == nullptr)
    {
        m_hDll = LoadLibrary(L"BeaEngine_x86.dll");
        m_pDisasm = (pDisasm)GetProcAddress(m_hDll, "_Disasm@4");
    }
#elif defined _M_AMD64
    m_disassembler.Archi = 64;
    if(m_hDll == nullptr)
    {
        m_hDll = LoadLibrary(L"BeaEngine_x64.dll");
        m_pDisasm = (pDisasm)GetProcAddress(m_hDll, "Disasm");
    }
#else
#error "Unsupported architecture"
#endif
}

with m_hDll and m_pDisasm being static, since there’s no need to retrieve these per instance. Since the code is meant to work on x86/x64, there are two separate versions of the DLL provided — one for use in x86 applications, the other for x64.

Now that the disassembly engine is loaded and initialized, it is time to actually begin disassembling code. There is an interesting problem that comes up, however. The debugger is attached to another process, but the disassembler is given an address in the current address space to disassemble at, i.e. the user can request disassembly at address 0x00411000 when prompted. The disassembly at address 0x00411000 in the debugger doesn’t have any relation to the disassembly at address 0x00411000 in the target, due to how virtual memory works. So the solution isn’t as easy as setting the target address to disassemble at to 0x00411000 and calling Disasm.

Instead, the memory at 0x00411000 in the target process must be read and that must be disassembled. Something like this was already done when implementing Interrupt Breakpoints; the original byte at the address was saved before replacing it with an 0xCC opcode. For this, it is still as simple as calling ReadProcessMemory and storing the buffer.

const bool Disassembler::TransferBytes(const DWORD_PTR dwAddress)
{
    SIZE_T ulBytesRead = 0;
    bool bSuccess = BOOLIFY(ReadProcessMemory(m_hProcess, (LPCVOID)dwAddress, m_bytes.data(), m_bytes.size(), &ulBytesRead));
    if (bSuccess && ulBytesRead == m_bytes.size())
    {
        return true;
    }
    else
    {
        fprintf(stderr, "Could not read from %p. Error = %X\n", dwAddress, GetLastError());
    }
 
    return false;
}

Once that is done, the disassembly process is no more difficult than the BeaEngine example. The target disassembly address is set and the Disasm function is called through the function pointer retrieved from the DLL. This function fills the DISASM structure (m_disassembler in the code), and returns the length of the instruction. This can be added to the previous address to get the address of the next instruction, and the process repeats.

const bool Disassembler::BytesAtAddress(DWORD_PTR dwAddress, size_t ulInstructionsToDisassemble /*= 15*/)
{
    if (IsInitialized())
    {
        SetDisassembler(dwAddress);
        bool bFailed = false;
        while (!bFailed && ulInstructionsToDisassemble-- > 0)
        {
            int iDisasmLength = m_pDisasm(&m_disassembler);
            if (iDisasmLength != UNKNOWN_OPCODE)
            {
                fprintf(stderr, "0x%p - %s\n", dwAddress, m_disassembler.CompleteInstr);
                m_disassembler.EIP += iDisasmLength;
                dwAddress += iDisasmLength;
            }
            else
            {
                fprintf(stderr, "Error: Reached unknown opcode in disassembly.\n");
                bFailed = true;
            }
        }
    }
    else
    {
        fprintf(stderr, "Could not show disassembly at address. Disassembler Dll was not loaded properly.\n");
        return false;
    }
 
    return true;
}

The SetDisassembler function is responsible for setting the correct starting address in the debuggers local copy of the target processes memory at the desired address. The debugger keeps a 4096 byte cache (the default Windows page size) and uses that if the target to disassemble exists within that range. Otherwise, a read is performed again and the cache re-initialized

void Disassembler::SetDisassembler(const DWORD_PTR dwAddress)
{
    bool bIsCached = ((dwAddress - m_dwStartAddress) < m_bytes.size());
    bIsCached &= (dwAddress < m_dwStartAddress);
    if (!bIsCached)
    {
        (void)TransferBytes(dwAddress);
        m_disassembler.EIP = (UIntPtr)m_bytes.data();
        m_dwStartAddress = dwAddress;
    }
    else
    {
        m_disassembler.EIP = (UIntPtr)&amp;m_bytes.data()[dwAddress - m_dwStartAddress];
    }
}

And that’s all it takes. The debugger can now print a disassembly listing at any readable address.

Step Over

Step into is the ability to step one instruction at a time as it executes and is something that is supported at the hardware level with the single step flag. Step over is implemented purely in code and is a convenience function that lets the user skip stepping into branches in the code. For example, take the following disassembly listing:

0040108D 81 C4 C0 00 00 00    add esp,    0C0h
00401093 3B EC                cmp         ebp,esp  
00401095 E8 76 03 00 00       call        SomeFunction (0401410h)  
0040109A 8B E5                mov         esp,ebp  
...

Assume that you are at a broken state at address 0x0040108D. You know that SomeFunction is not of any interest to you and you don’t want to single step through it. You’d rather get to the more interesting parts at address 0x0040109A and below. So what you do is when you’re at 0x00401093, you set a breakpoint at 0x0040109A and continue execution. This effectively skips the CALL instruction at 0x00401095 and hits your breakpoint at the instruction immediately following it, so you can continue debugging. Step over effectively wraps these steps in to one convenient function provided by a debugger.

In order to perform a step over, the debugger must know what the next instruction is. This is obviously needed because it is the instruction that the user wishes to break at next. The next instruction can be one of a few types:

  1. Invalid
  2. A non-branching instruction (i.e. add/mov/lea/push/…)
  3. A conditional branching instruction (i.e. jz/jge/jb/…)
  4. A non-conditional branching instruction (i.e. call/jmp/ret)

If it’s an invalid instruction, then it’s up to the debugger implementation to decide what to do next. In the second case, the next instruction is simply the address of the current one plus the length of the current instruction. The third case is interesting and is also partially implementation defined. If the user is broken on a conditional branch and wishes to step over, how should that be treated? For example, assume the user is looking at the following disassembly listing and is broken on 0x00401219:

00401213 8B 45 F8             mov         eax,dword ptr [a]  
00401216 3B 45 EC             cmp         eax,dword ptr [b]  
00401219 7E 05                jle         test+60h (0401220h)  
0040121B E8 50 FF FF FF       call        d (0401170h)  
00401220 8B F4                mov         esi,esp  

Assume [a] is greater than [b], so the jump will not be taken and the next instruction will be 0x0040121B. The user decides that they want to step over, so they will land at 0x0040121B, which is correct. Now assume the opposite: that [a] is less than or equal to [b]. This means that the branch will be taken and the next address will be 0x00401220. If the user is at 0x00401219 and decides to step over, then what happens? Since 0x0040121B will not be reached, that step over point isn’t necessary valid. Should execution continue because the step over will not be reached, or should the debugger “fix” it for the user and break at 0x00401220? Different debuggers do different things here. I would personally go with the latter case just to be safe. Especially since the debugger has access to the EFLAGS register and can tell whether the branch will be taken or not prior to execution of the instruction. This particular scenario is left undefined in the example code.

The last scenario is that of an unconditional branch. The two unconditional branches that affect implementing step over are JMP (unconditional jump) and RET (return). Under both of these, the point of execution is guaranteed to change: either to the jump destination or to the return address on the stack. Stepping over a RET instruction is pretty useless, because it won’t be hit. Likewise, stepping over a JMP instruction, in 95% of cases, will also be useless. The point of return from that JMP will most likely not be the instruction following it. For these cases, the example code converts the step over into a step into and follows execution. Having said all of this, the next instruction retrieval function is implemented as follows:

DWORD_PTR Disassembler::GetNextInstruction(const DWORD_PTR dwAddress, bool &bIsUnconditionalBranch)
{
    DWORD_PTR dwNextAddress = 0;
    if (IsInitialized())
    {
        SetDisassembler(dwAddress);
        int iDisasmLength = m_pDisasm(&m_disassembler);
        if (iDisasmLength != UNKNOWN_OPCODE)
        {
            if (m_disassembler.Instruction.BranchType == RetType || m_disassembler.Instruction.BranchType == JmpType)
            {
                bIsUnconditionalBranch = true;
            }
            else
            {
                dwNextAddress = (dwAddress + iDisasmLength);
            }
        }
        else
        {
            fprintf(stderr, "Could not get next instruction. Unknown opcode at %p.\n");
        }
    }
    else
    {
        fprintf(stderr, "Could not get next instruction. Disassembler Dll was not loaded propertly.\n");
    }
 
    return dwNextAddress;
}

with the full StepOver function being implemented as follows:

const bool Debugger::StepOver()
{
    CONTEXT ctx = GetExecutingContext();
    bool bIsUnconditionalBranch = false;
#ifdef _M_IX86
    DWORD_PTR dwStepOverAddress = m_pDisassembler->GetNextInstruction(ctx.Eip, bIsUnconditionalBranch);
#elif defined _M_AMD64
    DWORD_PTR dwStepOverAddress = m_pDisassembler->GetNextInstruction(ctx.Rip, bIsUnconditionalBranch);
#else
#error "Unsupported platform"
#endif
    if (bIsUnconditionalBranch)
    {
        return StepInto();
    }
    else if (dwStepOverAddress != 0)
    {
        m_pStepPoint->Disable();
        m_pStepPoint->ChangeAddress(dwStepOverAddress);
        (void)m_pStepPoint->Enable();
 
        ctx.EFlags &= ~0x100;
        (void)SetExecutingContext(ctx);
 
        return Continue(true);
    }
 
    return false;
}

with m_pStepPoint being a breakpoint to the step over address.

Dump and modify memory

This last piece of functionality is nothing more than an exercise in calling ReadProcessMemory and WriteProcessMemory.

const bool Debugger::PrintBytesAt(const DWORD_PTR dwAddress, size_t ulNumBytes /*= 40*/)
{
    SIZE_T ulBytesRead = 0;
    std::unique_ptr<unsigned char[]> pBuffer = std::unique_ptr<unsigned char[]>(new unsigned char[ulNumBytes]);
    const bool bSuccess = BOOLIFY(ReadProcessMemory(m_hProcess(), (LPCVOID)dwAddress, pBuffer.get(), ulNumBytes, &ulBytesRead));
    if (bSuccess && ulBytesRead == ulNumBytes)
    {
        for (unsigned int i = 0; i < ulBytesRead; ++i)
        {
            fprintf(stderr, "%02X ", pBuffer.get()[i]);
        }
        fprintf(stderr, "\n");
        return true;
    }
 
    fprintf(stderr, "Could not read memory at %p. Error = %X\n", dwAddress, GetLastError());
    return false;
}
 
const bool Debugger::ChangeByteAt(const DWORD_PTR dwAddress, const unsigned char cNewByte)
{
    SIZE_T ulBytesWritten = 0;
    const bool bSuccess = BOOLIFY(WriteProcessMemory(m_hProcess(), (LPVOID)dwAddress, &cNewByte, sizeof(unsigned char), &ulBytesWritten));
    if (bSuccess && ulBytesWritten == sizeof(unsigned char))
    {
        return true;
    }
 
    fprintf(stderr, "Could not change byte at %p. Error = %X\n", dwAddress, GetLastError());
    return false;
}

Testing the functionality

The same example program as in the previous posts will be used, with minor modifications:

#include 
 
void d()
{
    printf("d called.\n");
}
 
void c()
{
    int i = 0x1234;
    printf("c called.\n");
    printf("i is at %p with value %X.\n", &i, i);
    d();
    printf("i is at %p with value %X.\n", &i, i);
}
 
void b()
{
    printf("b called.\n");
    c();
}
 
void a()
{
    printf("a called.\n");
    b();
}
 
int main(int argc, char *argv[])
{
    printf("Addresses: \n"
        "a: %p\n"
        "b: %p\n"
        "c: %p\n"
        "d: %p\n",
        a, b, c, d);
 
    getchar();
    while (true)
    {
        a();
        getchar();
    }
 
    return 0;
}

To test memory modification, the i variable can be modified while the program is in a broken state in the d function. Entered commands are in red.

a
[A]ddress or [s]ymbol name? s
Name: d
Received breakpoint at address 00401170.
Press c to continue, s to step into, o to step over.
i
Enter address to print bytes at: 0x18fcac
34 12 00 00 CC CC CC CC 0C AD C2 AA 8C FD 18 00 8A 10 40 00 60 FE 18 00 94 FD 18
 00 00 E0 FD 7F CC CC CC CC CC CC CC CC
e
Enter address to change byte at: 0x18fcac
Enter new byte: 0x12
e
Enter address to change byte at: 0x18fcad
Enter new byte: 0x34
c
Received step at address 00401171

Output from the target application:

Addresses:
a: 00401000
b: 00401050
c: 004010A0
d: 00401170

a called.
b called.
c called.
i is at 0018FCAC with value 1234.
d called.
i is at 0018FCAC with value 3412.

Disassembly and step over are pretty straightforward to test when lined up with the Visual Studio debugger. For example, below is the disassembly relevant to the a function:

//printf("a called.\n");
00401009 68 48 21 40 00       push        402148h  
0040100E FF 15 94 20 40 00    call        dword ptr ds:[402094h]  
00401014 83 C4 04             add         esp,4  
//b();
00401017 E8 14 00 00 00       call        b (0401030h)  
0040101C 5F                   pop         edi  
}
...

Setting a breakpoint on 0x00401009 and stepping over shows the following behavior in the debugger:

a
[A]ddress or [s]ymbol name? a
Breakpoint address: 0x401009
Received breakpoint at address 00401009.
Press c to continue, s to step into, o to step over.
o
Could not write back original opcode to address 00000000. Error = 1E7
Received breakpoint at address 0040100E.
Press c to continue, s to step into, o to step over.
o
Received breakpoint at address 00401014.
Press c to continue, s to step into, o to step over.
o
Received breakpoint at address 00401017.
Press c to continue, s to step into, o to step over.
o
Received breakpoint at address 0040101C.
Press c to continue, s to step into, o to step over.

Lastly, a disassembly listing for all of this can be displayed:

d
Enter address to print disassembly at: 0x401009
0x00401009 - push 00402148h
0x0040100E - call dword ptr [00402094h]
0x00401014 - add esp, 04h
0x00401017 - call 0067D3A3h
0x0040101C - pop edi
0x0040101D - pop esi
0x0040101E - pop ebx
0x0040101F - mov esp, ebp
0x00401021 - pop ebp
0x00401022 - ret
0x00401023 - int3
0x00401024 - int3
0x00401025 - int3
0x00401026 - int3
0x00401027 - int3

which lines up with what Visual Studio gives.

Wrap up

Writing a debugger may seem like a daunting task, but it is certainly attainable. Aside from the disassembly engine — which can be a whole long series of posts in itself — everything was written from scratch in about 2,000 lines of code (doing a ‘\n’ regex search on the solution yields 2195 lines). Contained within those lines of code is the ability to

  • Add/Remove breakpoints
  • Step into / Step over instructions
  • Continue execution at a breakpoint or step
  • Print / Modify registers
  • Print a call stack
  • Match symbols to addresses / Dump symbols for a module
  • Print / Modify memory
  • Disassemble at an address

While it’s certainly not WinDbg or the Visual Studio debugger, it is an impressive amount for relatively little work. Hopefully those following these series of posts have gained a bit on insight into how the tools that they may use on a frequent basis work and what it takes to develop them. Thanks for reading.

Article Roadmap

The full source code relating to this can be found here. C++11 features were used, so MSVC 2012/2013 is most likely required.

Writing a Primitive Debugger: Part 4 (Symbols)

December 11th, 2014 No comments

Up to now, we have developed a debugger that can attach and detach from a process, set and remove breakpoints, print registers and a call stack, and modify control flow by changing the executing thread context. These are all pretty essential features of a debugger. The topic of  this post, debug symbols, is more of a “nice-to-have”. An application may or may not ship with debug symbols, but in the event that it does, i.e. it’s your own application, then the process of debugging becomes significantly more simple.

Debug Symbols

At its simplest definition, a debug symbol is a piece of information that shows how specific parts of a compiled program map back to the source level. For example, a debug symbol might tell information about the name of a variable at a memory address, or which line of code, and in which file, a series of assembly instructions map to. They are typically generated during debug builds and are used to provide some clarity to a developer that is debugging (or reverse engineering) a piece of code. There is no universal debug symbol format for a language, and they may vary between compilers. On the modern Windows platform, debug symbols come in the form of Program Database (PDB) files, ending with a .pdb extension.

These files hold a lot of useful information about the compiled executable or DLL. As mentioned above, they can contain information regarding which source file and line number (or which object file) a symbol at a certain address maps to. They can contain the names and types of global, static, and local variables, as well as classes and structs. They can also contain information compiler optimizations that were used when compiling the code. Some of these things may not be present if the code was compiled with stripped symbols. During a debugging session, the debugger will initialize a symbol handler and begin looking for, either recursively in common directories and/or user-specified directories, and parsing* matching PDB files. When a user is debugging, symbol information can be retrieved and names and source line numbers can be displayed to them (if available).
* This is a useful open source parser that can parse the proprietary format of PDB files.

Implementation

Microsoft provides a very rich set of APIs for handling symbols through the DbgHelp API. There are functions to load/enumerate symbols for a module, find a symbol by name or address, enumerate source file and line references found in PDBs, dynamically add or remove entries from the symbol table, interact with symbol stores, and much more. Given the very large API, I’ve only chosen to demonstrate implementation of the more common features. One thing to consider is that all functions in the DbgHelp API set are single threaded. The example code is single threaded, but does not have concurrency synchronization to ensure that it is only called from a single thread, meaning if you’re implementing something off of this code, make sure that you add concurrency synchronization.

Initializing a symbol handler is pretty straightforward: it merely involves calling SymInitialize. The function takes a process handle, which is opened by the debugger when it attaches. There is also a parameter for the user search path to locate PDB files, and a third parameter to specify whether the debugger is to enumerate all of the loaded modules in the process and load their symbols as well. For an attaching debugger, specifying that this behavior is dependent on the situation. There is a case, such as the debugger creating the target process to debug, or with delay-loaded DLLs, that can cause some symbols to not be loaded. Additionally, if this third parameter is set to true and the symbol handler is initialized prior to receiving all of the LOAD_DLL_DEBUG_EVENT events, then some symbols may not be loaded. The implementation sample code has been defaulted to false, and symbols for modules will be loaded in the CREATE_PROCESS_DEBUG_EVENT and LOAD_DLL_DEBUG_EVENT event handlers. This ensures that all symbol files for every module will be properly loaded.

Prior to initializing the symbol handler, the SymSetOptions function should be called, which configures how and what information the symbol handler will load. Simply put into code, the initialization routine looks like the following:

Symbols::Symbols(const HANDLE hProcess, const HANDLE hFile, const bool bLoadAll /*= false*/)
    : m_hProcess{ hProcess }, m_hFile{ hFile }
{
    (void)SymSetOptions(SYMOPT_CASE_INSENSITIVE | SYMOPT_DEFERRED_LOADS |
        SYMOPT_LOAD_LINES | SYMOPT_UNDNAME);
 
    const bool bSuccess = BOOLIFY(SymInitialize(hProcess, nullptr, bLoadAll));
    if (!bSuccess)
    {
        fprintf(stderr, "Could not initialize symbol handler. Error = %X.\n",
            GetLastError());
    }
}

The options here specify that symbol searches will be case insensitive, that symbols won’t be loaded until a reference is made (not to be confused with delay-loading  for DLLs that were mentioned above), that line information will be loaded, and that symbols will be displayed in an undecorated form. Case insensitivity and undecorated names are there for convenience; it would be annoying to search for exact symbol names such as “?f@@YAHD@Z” otherwise.

When the symbol handler is finished, i.e. the debugger is detaching from the process, a simple call to SymCleanup will terminate the symbol handler:

Symbols::~Symbols()
{
    const bool bSuccess = BOOLIFY(SymCleanup(m_hProcess));
    if (!bSuccess)
    {
        fprintf(stderr, "Could not terminate symbol handler. Error = %X.\n",
            GetLastError());
    }
}

That sets up the initialization and termination of the symbol handler. Time for everything in between.

Enumerating Symbols

One useful feature of a debugger might be to internally enumerate all symbols of a module. This can allow for storage and fast lookup at a later time. Or it can allow for a graphic display for the user and easy navigation to the symbol address from its name. Enumerating symbols is a two step process: first SymLoadModuleEx is called to load the symbol table for the module, then SymEnumSymbols can be called with the base address of the module. SymEnumSymbols takes a callback of type PSYM_ENUMERATESYMBOLS_CALLBACK as a parameter. This callback will be called for every symbol found in the modules symbol table and will have a SYMBOL_INFO structure that shows information about the symbol, such as its name, address, whether it is a register, what value it holds if its a constant, etc. Put in to code, this is rather straightforward:

const bool Symbols::EnumerateModuleSymbols(const char * const pModulePath, const DWORD64 dwBaseAddress)
{
    DWORD64 dwBaseOfDll = SymLoadModuleEx(m_hProcess, m_hFile, pModulePath, nullptr,
        dwBaseAddress, 0, nullptr, 0);
    if (dwBaseOfDll == 0)
    {
        fprintf(stderr, "Could not load modules for %s. Error = %X.\n",
            pModulePath, GetLastError());
        return false;
    }
 
    UserContext userContext = { this, pModulePath };
    const bool bSuccess = 
       BOOLIFY(SymEnumSymbols(m_hProcess, dwBaseOfDll, "*!*", SymEnumCallback, &userContext));
    if (!bSuccess)
    {
        fprintf(stderr, "Could not enumerate symbols for %s. Error = %X.\n",
            pModulePath, GetLastError());
    }
 
    return bSuccess;
}

Resolving Symbols

There are several ways to resolve symbols, but the two most common are by name and by address. This can be achieved by calling SymFromName and SymFromAddr respectively. Both of these populate a SYMBOL_INFO structure, just as calling SymEnumSymbols does. Invoking them is also rather straightforward:

const bool Symbols::SymbolFromAddress(const DWORD64 dwAddress, const SymbolInfo **pFullSymbolInfo)
{
    char pBuffer[sizeof(SYMBOL_INFO) + MAX_SYM_NAME * sizeof(char)] = { 0 };
    PSYMBOL_INFO pSymInfo = (PSYMBOL_INFO)pBuffer;
 
    pSymInfo->SizeOfStruct = sizeof(SYMBOL_INFO);
    pSymInfo->MaxNameLen = MAX_SYM_NAME;
 
    DWORD64 dwDisplacement = 0;
    const bool bSuccess = BOOLIFY(SymFromAddr(m_hProcess, dwAddress, &dwDisplacement, pSymInfo));
    if (!bSuccess)
    {
        fprintf(stderr, "Could not retrieve symbol from address %p. Error = %X.\n",
            (DWORD_PTR)dwAddress, GetLastError());
        return false;
    }
 
    fprintf(stderr, "Symbol found at %p. Name: %.*s. Base address of module: %p\n",
        (DWORD_PTR)dwAddress, pSymInfo->NameLen, pSymInfo->Name, (DWORD_PTR)pSymInfo->ModBase);
 
    *pFullSymbolInfo = FindSymbolByName(pSymInfo->Name);
 
    return bSuccess;
}
 
const bool Symbols::SymbolFromName(const char * const pName, const SymbolInfo **pFullSymbolInfo)
{
    char pBuffer[sizeof(SYMBOL_INFO) + MAX_SYM_NAME * sizeof(char)
        + sizeof(ULONG64) - 1 / sizeof(ULONG64)] = { 0 };
    PSYMBOL_INFO pSymInfo = (PSYMBOL_INFO)pBuffer;
 
    pSymInfo->SizeOfStruct = sizeof(SYMBOL_INFO);
    pSymInfo->MaxNameLen = MAX_SYM_NAME;
 
    const bool bSuccess = BOOLIFY(SymFromName(m_hProcess, pName, pSymInfo));
    if (!bSuccess)
    {
        fprintf(stderr, "Could not retrieve symbol for name %s. Error = %X.\n",
            pName, GetLastError());
        return false;
    }
 
    fprintf(stderr, "Symbol found for %s. Name: %.*s. Address: %p. Base address of module: %p\n",
        pName, pSymInfo->NameLen, pSymInfo->Name, (DWORD_PTR)pSymInfo->Address,
        (DWORD_PTR)pSymInfo->ModBase);
 
    *pFullSymbolInfo = FindSymbolByAddress((DWORD_PTR)pSymInfo->Address);
 
    return bSuccess;
}

with the SymbolInfo structure being an extended structure that holds information about source files and line numbers (see example code).

Testing the functionality

To test this functionality, we can take the sample program from the previous post (reproduced below) and see the difference in how call stacks look. The new functionality in this version has added the ability to resolve symbols for the addresses in the callstack. Also, the debugger was augmented to add two new abilities: to dump all symbols from a module, and to set/remove breakpoints on a symbol by name.

#include <cstdio>
 
void d()
{
    printf("d called.\n");
}
 
void c()
{
    printf("c called.\n");
    d();
}
 
void b()
{
    printf("b called.\n");
    c();
}
 
void a()
{
    printf("a called.\n");
    b();
}
 
int main(int argc, char *argv[])
{
    printf("Addresses: \n"
        "a: %p\n"
        "b: %p\n"
        "c: %p\n"
        "d: %p\n",
        a, b, c, d);
 
    getchar();
    while (true)
    {
        a();
        getchar();
    }
 
    return 0;
}

Setting a breakpoint on the d function and printing the call stacks shows the more useful functionality between the previous version of the debugger and this one. Entered commands are shown in red, while new symbol information is shown in orange.

a
[A]ddress or [s]ymbol name? s
Name: d
Received breakpoint at address 00401090.
Press c to continue or s to begin stepping.
l
Frame: 0
Execution address: 00401090
Stack address: 00000000
Frame address: 0018FDE8
Symbol name: d
Symbol address: 00401090
Address displacement: 0
Source file: c:\users\demo\desktop\demoapp\source.cpp
Line number: 4
Frame: 1
Execution address: 0040107C
Stack address: 00000000
Frame address: 0018FDEC
Symbol found at 0040107C. Name: c. Base address of module: 00400000
Symbol name: c
Symbol address: 00401060
Address displacement: 0
Source file: c:\users\demo\desktop\demoapp\source.cpp
Line number: 9
Frame: 2
Execution address: 0040104C
Stack address: 00000000
Frame address: 0018FE40
Symbol found at 0040104C. Name: b. Base address of module: 00400000
Symbol name: b
Symbol address: 00401030
Address displacement: 0
Source file: c:\users\demo\desktop\demoapp\source.cpp
Line number: 15
Frame: 3
Execution address: 0040101C
Stack address: 00000000
Frame address: 0018FE94
Symbol found at 0040101C. Name: a. Base address of module: 00400000
Symbol name: a
Symbol address: 00401000
Address displacement: 0
Source file: c:\users\demo\desktop\demoapp\source.cpp
Line number: 21
Frame: 4
Execution address: 004010EF
Stack address: 00000000
Frame address: 0018FEE8
Symbol found at 004010EF. Name: main. Base address of module: 00400000
Symbol name: main
Symbol address: 004010B0
Address displacement: 0
Source file: c:\users\demo\desktop\demoapp\source.cpp
Line number: 27
Frame: 5
Execution address: 004013A9
Stack address: 00000000
Frame address: 0018FF3C
Symbol found at 004013A9. Name: __tmainCRTStartup. Base address of module: 00400000
Symbol name: __tmainCRTStartup
Symbol address: 00401210
Address displacement: 0
Source file: f:\dd\vctools\crt\crtw32\dllstuff\crtexe.c
Line number: 473
Frame: 6
Execution address: 004014ED
Stack address: 00000000
Frame address: 0018FF8C
Symbol found at 004014ED. Name: mainCRTStartup. Base address of module: 00400000

Symbol name: mainCRTStartup
Symbol address: 004014E0
Address displacement: 0
Source file: f:\dd\vctools\crt\crtw32\dllstuff\crtexe.c
Line number: 456
Frame: 7
Execution address: 76AE919F
Stack address: 00000000
Frame address: 0018FF94
Symbol found at 76AE919F. Name: BaseThreadInitThunk. Base address of module: 00000000
Symbol name: BaseThreadInitThunk
Symbol address: 76AE9191
Address displacement: 0
Source file: (null)
Line number: 0
Frame: 8
Execution address: 77430BBB
Stack address: 00000000
Frame address: 0018FFA0
Symbol found at 77430BBB. Name: RtlInitializeExceptionChain. Base address of module: 00000000
Symbol name: RtlInitializeExceptionChain
Symbol address: 77430B37
Address displacement: 0
Source file: (null)
Line number: 0
Frame: 9
Execution address: 77430B91
Stack address: 00000000
Frame address: 0018FFE4
Symbol found at 77430B91. Name: RtlInitializeExceptionChain. Base address of module: 00000000
Symbol name: RtlInitializeExceptionChain
Symbol address: 77430B37
Address displacement: 0
Source file: (null)
Line number: 0
StackWalk64 finished.

This looks much more useful compared to just getting absolute addresses as in the previous version. Here, for some symbols, the source files can be found on the host machine and be presented to the user alongside the raw assembly. Additionally, symbols  can be printed for any module as shown below:

y
Enter in module name to dump symbols for: kernel32.dll
Symbol name: QuirkIsEnabledWorker
Symbol address: 76AE0010
Address displacement: 0
Source file: (null)
Line number: 0
Symbol name: EnumCalendarInfoExEx
Symbol address: 76AE03BD
Address displacement: 0
Source file: (null)
Line number: 0
Symbol name: GetFileMUIPath
Symbol address: 76AE03CE
Address displacement: 0
Source file: (null)
Line number: 0
...

That concludes the topic on symbols. The implementation presented here only scratched the surface of what is available in terms of the DbgHelp API, and I recommend that those interested further explore the MSDN documentation on the topics. The next article will conclude the series with a collection of miscellaneous features that debuggers typically possess. For that piece, it will probably include the ability to step over code (step into is currently implemented), present a disassembly listing to the user for x86 and x64, and allow for modification of arbitrary memory, instead of just registers and/or a thread context.

Article Roadmap
Future posts will be related on topics closely following the items below:

  • Basics
  • Adding/Removing Breakpoints, Single-stepping
  • Call Stack, Registers, Contexts
  • Symbols
  • Miscellaneous Features

The full source code relating to this can be found here. C++11 features were used, so MSVC 2012/2013 is most likely required.

Writing a Primitive Debugger: Part 3 (Call Stack, Registers, Contexts)

December 5th, 2014 No comments

Up to now, all of the functionality discussed in writing a debugger has been related to getting a debugger attached to a process and being able to set breakpoints and perform single stepping. While certainly useful, this functionality is more passive debugging: you can break the state of the process at a certain point and instrument it at the instruction level, but you cannot actually modify any behavior, or even view how the process got to that state. The next core functionality that will be covered will detail actually being able to view and change program execution state (in the form of the thread context, namely registers), and being able to view the thread’s call stack upon hitting a breakpoint.

Thread Contexts

A thread context, as defined relevant to Windows, “includes the thread’s set of machine registers, the kernel stack, a thread environment block, and a user stack in the address space of the thread’s process.” For a usermode debugger, which is what is being developed in these posts, the important parts are the machine registers and the user stack. The thread environment block is also accessible from user-mode but won’t be covered here due to its undocumented and very specific nature. When a process starts up, the loader will set up the processes main thread and begin execution at the entry point. This main thread can in turn launch additional threads, which themselves launch threads, and so on. Each of these threads will have their own context containing the items listed above.

The purpose of these contexts is that Windows, being a preemptive multitasking operating system, can have any [usermode] task, such as a thread executing code, interrupted at any point in time. During these interruptions, a context switch will be carried out, which is simply the process of saving the current execution context and setting the new one to execute. Eventually, when the original task is scheduled to resume, a context switch will again occur back to the context of the original thread and it will continue executing as if nothing had happened. What do these contexts look like? The answer is that it is entirely processor-specific, which shouldn’t be too surprising given that they store registers.

In Windows, the part of the thread context that is available to developers comes defined as a CONTEXT structure in winnt.h. For example, below is a snippet from a CONTEXT structure for x86 processors.

typedef struct _CONTEXT {
    DWORD   Dr0;
    DWORD   Dr1;
    DWORD   Dr2;
    ...
    FLOATING_SAVE_AREA FloatSave;
    DWORD   SegGs;
    DWORD   SegFs;
    ...
    DWORD   Edi;
    DWORD   Esi;
    DWORD   Ebx;
    ...
    DWORD   Ebp;
    DWORD   Eip;
    ...

The x64 version looks pretty closely related, with register widths being extended to 64-bits as well as additional registers and extensions added.

typedef struct DECLSPEC_ALIGN(16) _CONTEXT {
    DWORD ContextFlags;
    DWORD MxCsr;
    ...
    WORD   SegGs;
    WORD   SegSs;
    DWORD EFlags;
    ...
    DWORD64 Rax;
    DWORD64 Rcx;
    DWORD64 Rdx;
    ...
    DWORD64 R13;
    DWORD64 R14;
    DWORD64 R15;
    ...

This is the structure that will be the most useful to inspect and modify when debugging. A debugger should be able to print out this structure and allow for modification of any of its fields. Fortunately, there are two very useful APIs for retrieving and modifying this structure: GetThreadContext and SetThreadContext. These have been covered previously when discussing how to enable single-stepping. The context had to be retrieved and the EFlags registered modified. So what modifications are needed to the existing code/logic in order to add this functionality? It’s as simple as opening a handle to the current executing (or in the debuggers case, broken) thread and retrieving/setting the context.

const CONTEXT Debugger::GetExecutingContext()
{
    CONTEXT ctx = { 0 };
    ctx.ContextFlags = CONTEXT_ALL;
    SafeHandle hThread = OpenCurrentThread();
    if (hThread.IsValid())
    {
        bool bSuccess = BOOLIFY(GetThreadContext(hThread(), &ctx));
        if (!bSuccess)
        {
            fprintf(stderr, "Could not get context for thread %X. Error = %X\n", m_dwExecutingThreadId, GetLastError());
        }
    }
 
    memcpy(&m_lastContext, &ctx, sizeof(CONTEXT));
 
    return ctx;
}
 
const bool Debugger::SetExecutingContext(const CONTEXT &ctx)
{
    bool bSuccess = false;
    SafeHandle hThread = OpenCurrentThread();
    if (hThread.IsValid())
    {
        bSuccess = BOOLIFY(SetThreadContext(hThread(), &ctx));
    }
 
    memcpy(&m_lastContext, &ctx, sizeof(CONTEXT));
 
    return bSuccess;
}

For each access or modification, there is a handle opened (and closed) to the current thread — this certainly isn’t the most efficient approach, but serves well enough for demo purposes.  The state of the context is then stored in m_lastContext. These functions are invoked when the process receives an EXCEPTION_BREAKPOINT and when single stepping the process, i.e. handling the EXCEPTION_SINGLE_STEP exception. Therefore, m_lastContext will always have the appropriate register values in the context structure when a breakpoint is hit or when the user is single stepping. These functions can also be invoked when the user wants to modify a certain register or registers through the debugger interface.  Printing the context involves nothing more than printing out the values in the structure. I’ve chosen to only print out the more commonly used registers for the example code:

void Debugger::PrintContext()
{
#ifdef _M_IX86
    fprintf(stderr, "EAX: %p EBX: %p ECX: %p EDX: %p\n"
        "ESP: %p EBP: %p ESI: %p EDI: %p\n"
        "EIP: %p FLAGS: %X\n",
        m_lastContext.Eax, m_lastContext.Ebx, m_lastContext.Ecx, m_lastContext.Edx,
        m_lastContext.Esp, m_lastContext.Ebp, m_lastContext.Esi, m_lastContext.Edi,
        m_lastContext.Eip, m_lastContext.EFlags);
#elif defined _M_AMD64
    fprintf(stderr, "RAX: %p RBX: %p RCX: %p RDX: %p\n"
        "RSP: %p RBP: %p RSI: %p RDI: %p\n"
        "R8: %p R9: %p R10: %p R11: %p\n"
        "R12: %p R13: %p R14: %p R15: %p\n"
        "RIP: %p FLAGS: %X\n",
        m_lastContext.Rax, m_lastContext.Rbx, m_lastContext.Rcx, m_lastContext.Rdx,
        m_lastContext.Rsp, m_lastContext.Rbp, m_lastContext.Rsi, m_lastContext.Rdi,
        m_lastContext.R8, m_lastContext.R9, m_lastContext.R10, m_lastContext.R11,
        m_lastContext.R12, m_lastContext.R13, m_lastContext.R14, m_lastContext.R15,
        m_lastContext.Rip, m_lastContext.EFlags);
#else
#error "Unsupported architecture"
#endif
}

Call Stacks

At the lowest level, the scope of a function is defined by its stack frame. This is a compiler and/or ABI defined construct for how the state of the function will be layed out. A stack frame typically includes the return address of the caller, any parameters that were passed to the function from the caller, and space for local variables that exist within the scope of the function. For x86 and x64, among other architectures, these stack frames are preceded with a prologue, which is the code responsible for setting up the stack and frame pointers (ESP/EBP or RSP/RBP) from the caller to the callee. Prior to the callee returning, there is an epilogue, which is responsible for returning the stack and frame pointers to that of the caller. For example, consider the following C function:

void TestFunction(int a, int b, int c)
{
    int d = 4, e = 5, f = 6;
}

which was called in the following way

push        3  
push        2  
push        1  
call        TestFunction

Disassembled as x86, this becomes:


push        ebp  
mov         ebp,esp  
sub         esp,0Ch  
mov         dword ptr [ebp-4],4  
mov         dword ptr [ebp-8],5  
mov         dword ptr [ebp-0Ch],6  
mov         esp,ebp  
pop         ebp  
ret         0Ch  

The prologue and epilogue are highlighted in orange. After the execution of the prologue, the stack frame for this function will contain the callers frame pointer in [EBP], the return address at [EBP+4] (because the CALL instruction implicitly pushes the address of the next instruction on the stack before changing execution), and the passed parameters at [EBP+8], [EBP+12], and [EBP+16]. The prologue subtracted 12 from the base of the stack to make room for local variables — the three 32-bit ints declared within the function. These will be at [EBP-4], [EBP-8], and [EBP-12], as can be see in the disassembly.

This setup is pretty convenient because it offers easy distinction between what is a parameter and what is a local variable. Debugging becomes a bit easier since everything is held on the stack and indexed through the frame pointer, rather than scattered around between registers and the stack. This changes a bit as you go from x86 to x64, where x64 will store the first four (or six, depending on your compiler/platform) arguments in registers, and the rest on the stack. This can also change a bit depending on calling conventions and compiler optimizations, especially frame-pointer omission.

Since the stack frame stores the return address of the caller, it is possible to see where the function was called from. That is what the call stack is: a collection of stack frames that represent the call chain in the code leading up to the current stack frame. This information is very useful to have in terms of debugging, because a bug that presented itself in one function may have manifested earlier on in the code. Being able to quickly traverse frames, and see the values within those frames, is an invaluable aid to debugging.

On the Windows platform, there is a convenient function that performs the tedium/annoyance of walking stack frames backwards: StackWalk64. This function is x86 and x64 compatible, but does require some setup prior to being invoked. Given the very machine-specific layout of stack frames, the StackWalk64 function requires filling out a STACKFRAME64 structure, which will be passed to it as an argument. Filling out this structure merely involves setting the instruction, frame, and stack pointers, along with the address modes, which will be flat addressing for the case of modern Windows on x86 and x64. Once this structure is set up, StackWalk64 can be called in a loop to retrieve the frames. Put into code, it looks like the following:

void Debugger::PrintCallStack()
{
    STACKFRAME64 stackFrame = { 0 };
    const DWORD_PTR dwMaxFrames = 50;
    CONTEXT ctx = GetExecutingContext();
 
    stackFrame.AddrPC.Mode = AddrModeFlat;
    stackFrame.AddrFrame.Mode = AddrModeFlat;
    stackFrame.AddrStack.Mode = AddrModeFlat;
 
#ifdef _M_IX86
    DWORD dwMachineType = IMAGE_FILE_MACHINE_I386;
    stackFrame.AddrPC.Offset = ctx.Eip;
    stackFrame.AddrFrame.Offset = ctx.Ebp;
    stackFrame.AddrStack.Offset = ctx.Esp;
#elif defined _M_AMD64
    DWORD dwMachineType = IMAGE_FILE_MACHINE_AMD64;
    stackFrame.AddrPC.Offset = ctx.Rip;
    stackFrame.AddrFrame.Offset = ctx.Rbp;
    stackFrame.AddrStack.Offset = ctx.Rsp;
#else
#error "Unsupported platform"
#endif
 
    SafeHandle hThread = OpenCurrentThread();
    for (int i = 0; i < dwMaxFrames; ++i)
    {
        const bool bSuccess = BOOLIFY(StackWalk64(dwMachineType, m_hProcess(), hThread(), &stackFrame,
            (dwMachineType == IMAGE_FILE_MACHINE_I386 ? nullptr : &ctx), nullptr,
            SymFunctionTableAccess64, SymGetModuleBase64, nullptr));
        if (!bSuccess || stackFrame.AddrPC.Offset == 0)
        {
            fprintf(stderr, "StackWalk64 finished.\n");
            break;
        }
 
        fprintf(stderr, "Frame: %X\n"
            "Execution address: %p\n"
            "Stack address: %p\n"
            "Frame address: %p\n",
            i, stackFrame.AddrPC.Offset,
            stackFrame.AddrStack.Offset, stackFrame.AddrFrame.Offset);
    }
}

Testing the functionality

To test this functionality we can create another demo app that will be used as the debug target. The simple one below is what I used:

#include <cstdio>
 
void d()
{
    printf("d called.\n");
}
 
void c()
{
    printf("c called.\n");
    d();
}
 
void b()
{
    printf("b called.\n");
    c();
}
 
void a()
{
    printf("a called.\n");
    b();
}
 
int main(int argc, char *argv[])
{
    printf("Addresses: \n"
        "a: %p\n"
        "b: %p\n"
        "c: %p\n"
        "d: %p\n",
        a, b, c, d);
 
    getchar();
    while (true)
    {
        a();
        getchar();
    }
 
    return 0;
}

I would recommend disabling incremental linking and ASLR (on the executable, not the system) for convenience sake. Below is the stack trace that Visual Studio produces when a breakpoint is set inside the d function and hit.

Demo.exe!d() Line 5	C++
Demo.exe!c() Line 14	C++
Demo.exe!b() Line 20	C++
Demo.exe!a() Line 26	C++
Demo.exe!main(int argc, char * * argv) Line 41	C++
Demo.exe!__tmainCRTStartup() Line 626	C
Demo.exe!mainCRTStartup() Line 466	C
kernel32.dll!@BaseThreadInitThunk@12()	Unknown
ntdll.dll!___RtlUserThreadStart@8()	Unknown
ntdll.dll!__RtlUserThreadStart@8()	Unknown

Attaching with the debugger also yields 10 frames, as listed below:

a
Target address: 0x4010f0
Received breakpoint at address 004010F0
Press c to continue or s to begin stepping.
l
Frame: 0
Execution address: 004010F0
Stack address: 00000000
Frame address: 0018FBE4
Frame: 1
Execution address: 004010DA
Stack address: 00000000
Frame address: 0018FBE8
Frame: 2
Execution address: 0040108A
Stack address: 00000000
Frame address: 0018FCBC
Frame: 3
Execution address: 0040103A
Stack address: 00000000
Frame address: 0018FD90
Frame: 4
Execution address: 004011C6
Stack address: 00000000
Frame address: 0018FE64
Frame: 5
Execution address: 00401699
Stack address: 00000000
Frame address: 0018FF38
Frame: 6
Execution address: 004017DD
Stack address: 00000000
Frame address: 0018FF88
Frame: 7
Execution address: 75D5338A
Stack address: 00000000
Frame address: 0018FF90
Frame: 8
Execution address: 77339F72
Stack address: 00000000
Frame address: 0018FF9C
Frame: 9
Execution address: 77339F45
Stack address: 00000000
Frame address: 0018FFDC
StackWalk64 finished.

The output is a bit less elegant than the Visual Studio debugger, but it is correct, which is the more important part. It would be nice, however, to put names to some of those addresses. That is where symbol loading and mapping come in, which will be the subject of the next post.

Article Roadmap
Future posts will be related on topics closely following the items below:

  • Basics
  • Adding/Removing Breakpoints, Single-stepping
  • Call Stack, Registers, Contexts
  • Symbols
  • Miscellaneous Features

The full source code relating to this can be found here. C++11 features were used, so MSVC 2012/2013 is most likely required.

Twitter Time

December 4th, 2014 No comments

Because why not… Follow here at your choosing: https://twitter.com/codereversing

Categories: Uncategorized Tags:

Writing a Primitive Debugger: Part 2 (Breakpoints/Stepping)

December 2nd, 2014 No comments

For a debugger to be of any practical use, it needs the ability to pause, inspect, and resume a target process. This post will cover these topics by discussing what breakpoints are, how they are implemented by most debuggers on the x86 and x64 Windows platforms, and how to perform instruction level stepping.

Breakpoints
A breakpoint can simply be defined as a place in a executing code that causes an intentional halt of execution. In the context of debuggers, this is useful because it allows the debugger to inspect the process at that moment in time when nothing is going on — when the process is effectively “broken”. Typical functionality that is seen in debuggers when the target is in a broken state is the ability to view and modify registers/memory, print a disassembly listing of the area surrounding the breakpoint, step into or step over assembly instructions or lines of source code, and other related diagnostic features. Breakpoints themselves can come in a few different varieties.

Hardware Interrupt Breakpoints
Hardware interrupt breakpoints are probably the most common and simple breakpoints to understand and to implement in a debugger. These are specific to the x86 and x64 architectures and are implemented by using a hardware-supported instruction specifically made to generate a software interrupt that is meant for debuggers. The opcode for this instruction is 0xCC and it matches to an INT 3 instruction. The way that most debuggers implement this is to replace the opcode at the desired address — that is, the breakpoint address — with the 0xCC opcode. When the code is called, the interrupt (EXCEPTION_BREAKPOINT) will be raised, which the debugger will handle and give the user the option to perform the debugging functionality mentioned above. When the user is finished inspecting the program state at that address and wishes to continue, the debugger will replace the original instruction back, make sure that the executing address (EIP or RIP registers, depending on the architecture) point to that original address, and continue execution.

However, an interesting problem comes up. When the original instruction is replaced back, the breakpoint will be lost. This may be alright if the breakpoint is meant to be hit only once, but that is very rarely the case. There needs to be a way to re-enable that breakpoint immediately afterwards. This is done by setting the EFlags (or RFlags for x64) register to set the processor into single-step mode. Fortunately, that is pretty easily accomplished by enabling the 8th bit, i.e. performing an OR with 0x100. When the EXCEPTION_BREAKPOINT exception is handled and execution resumes, there will be another exception, EXCEPTION_SINGLE_STEP, thrown on the next instruction executed. At this point, the debugger can re-enable the breakpoint on the previous instruction and resume execution.

On the Windows platform, it is easy to see all of this at work by inspecting the DebugBreak function. This function is meant to trigger a local (from within the same process) breakpoint. Below is the disassembly of the function, which shows it simply being what is described above. Those wondering about the mov edi, edi instruction can read more about hot-patchable images, specifically Raymond Chen’s post about the explanation.

_DebugBreak@0:
752C3C5D 8B FF                mov         edi,edi  
752C3C5F CC                   int         3  
752C3C60 C3                   ret  

Hardware Debug Registers
The second breakpoint implementation technique is also specific to the x86 and x64 architectures and takes advantage of specific debug registers provided by the instruction set. These are the debug registers DR0, DR1, DR2, DR3, and DR7. The first four registers are used to hold the addresses that a hardware breakpoint will break on. Using these means that for the entire program, there can be at most four hardware breakpoints active. The DR7 register is used to control enablement of these registers, with bits 0, 2, 4, 6 corresponding to on/off for DR0, … , DR3. Additionally, bits 16-17, 20-21, 24-25, and 28-29 function as a bitmask for DR0, … , DR3 for when these breakpoints will trigger, with 00 being on execution, 01 on read, and 11 on write.

Setting these breakpoints on the Windows platform is a bit tricky. They must be set on the processes main thread. This involves getting the main thread, opening a handle to it with at least THREAD_GET_CONTEXT and THREAD_SET_CONTEXT privileges, and getting/setting the threads context with GetThreadContext/SetThreadContext with the newly added debug registers to reflect the changes. Take note that no executable code in memory is modified to set these breakpoints. It is not like the previous case where an opcode had to be replaced. These are breakpoints that set and unset by changing the contents of hardware registers. What will happen when these are set is that the process will raise an EXCEPTION_SINGLE_STEP exception upon hitting the instruction at the address, which the debugger will then process in a fashion nearly identical to the way it would in the previous section. Due to the small number limitation, these won’t be presented in the sample breakpoint code in this post, but may eventually be written about in the future for the sake of completeness. I have written about their usage for API hooking in a previous post (excuse the dead links in the beginning). The implementation for a debugger is very close to how it is presented there.

Software Breakpoints
This last class of breakpoints is performed entirely in software and is tied strongly to how the operating system functions. Another name for them is memory breakpoints. They combine some of the best features of interrupt breakpoints, namely the ability to have as many as you’d like, with the best features of hardware breakpoints, which is that nothing in the executing code needs to be overwritten. However, there is a major drawback: they add a significant slowdown to the execution of the code due to their implementation.

Instead of being implemented at the address level, these are implemented at the page level. How these work is that the address where a breakpoint will be set will have its page permissions changed to that of a guard page using VirtualProtectEx. When any instruction on the page will be accessed, there will be an EXCEPTION_GUARD_PAGE exception thrown. The debugger will handle this exception and check if the offending address is that of the breakpoint address. If so, the debugger can perform the usual handling/user prompt as with any other breakpoint. If not then the debugger must perform some extra steps.

According to the documentation, the guard page protection will be removed from the page after it is raised. This means that once the exception is handled and execution continues, any access afterwards will not generate an EXCEPTION_GUARD_PAGE exception. So in the case that the instruction accessed is not the desired breakpoint address, the breakpoint will be lost. To remedy this, the technique similar to the one presented in the hardware interrupt breakpoint section will be used. The processor will enter in to single-step mode and continue execution. On the next instruction, there will be an EXCEPTION_SINGLE_STEP exception raised. This will be handled by the debugger and the guard page property will be re-enabled on the page. This implementation also will not be covered in this post, but may be covered in the future. I have written about it before, also in the context of API hooking, here.

Implementations

As mentioned above, enabling and disabling hardware interrupt breakpoints is simply a matter of overwriting opcodes with 0xCC (INT 3). In Windows, this is accomplished in a pretty straightforward manner with the usage of ReadProcessMemory and WriteProcessMemory.

const bool InterruptBreakpoint::EnableBreakpoint()
{
    SIZE_T ulBytes = 0;
    bool bSuccess = BOOLIFY(ReadProcessMemory(m_hProcess, (LPCVOID)m_dwAddress, &m_originalByte, sizeof(unsigned char), &ulBytes));
    if (bSuccess && ulBytes == sizeof(unsigned char))
    {
        bSuccess = BOOLIFY(WriteProcessMemory(m_hProcess, (LPVOID)m_dwAddress, &m_breakpointOpcode, sizeof(unsigned char), &ulBytes));
        return bSuccess && (ulBytes == sizeof(unsigned char));
    }
    else
    {
        fprintf(stderr, "Could not read from address %p. Error = %X\n", m_dwAddress, GetLastError());
    }
 
    return false;
}

The original byte at the target address is read and stored and then overwritten with 0xCC. Nothing too shocking or out of the ordinary. Disabling the breakpoint is simply done by performing the opposite and writing back the original byte.

const bool InterruptBreakpoint::DisableBreakpoint()
{
    SIZE_T ulBytes = 0;
    const bool bSuccess = BOOLIFY(WriteProcessMemory(m_hProcess, (LPVOID)m_dwAddress, &m_originalByte, sizeof(unsigned char), &ulBytes));
    if (bSuccess && ulBytes == sizeof(unsigned char))
    {
        return true;
    }
    fprintf(stderr, "Could not write back original opcode to address %p. Error = %X\n", m_dwAddress, GetLastError());
 
    return false;
}

Now with the ability to enable/disable breakpoints, it is time to look at the handlers. As mentioned, upon accessing the instruction where the breakpoint resides, an EXCEPTION_BREAKPOINT exception will be raised. There are a few steps here that need to be done:

  1. Check if the breakpoint is in the breakpoint list. This is done because when the debugger first attaches, an EXCEPTION_BREAKPOINT will be raised from the debuggers attaching thread (see previous post). We don’t care about this exception, so just skip it and continue execution
  2. If it is in the breakpoint list, and therefore a breakpoint that the user explicitly set, it needs to be disabled. As mentionted before, this is to allow the original instruction to be executed.
  3. Open a handle to the thread and get the thread context. Change the execution pointer (EIP or RIP) to point to the breakpoint address and also enable single-step mode. Set the thread context to this newly modified context
  4. Save this breakpoint. This is to re-enable it during the EXCEPTION_SINGLE_STEP exception that will be raised immediately after continuing execution.
  5. Prompt the user to continue or perform single-steps. Wait for their response and continue execution afterwards depending on the choice.

Put into code, it looks like the following:

Register(DebugExceptions::eBreakpoint, [&](const DEBUG_EVENT &dbgEvent)
{
    auto &exceptionRecord = dbgEvent.u.Exception.ExceptionRecord;
    const DWORD_PTR dwExceptionAddress = (DWORD_PTR)exceptionRecord.ExceptionAddress;
    fprintf(stderr, "Received breakpoint at address %p.\n", dwExceptionAddress);  
 
    Breakpoint *pBreakpoint = m_pDebugger->FindBreakpoint(dwExceptionAddress);
    if (pBreakpoint != nullptr)
    {
        if (pBreakpoint->Disable())
        {
            CONTEXT ctx = { 0 };
            ctx.ContextFlags = CONTEXT_ALL;
            HANDLE hThread = OpenThread(THREAD_GET_CONTEXT | THREAD_SET_CONTEXT, FALSE, dbgEvent.dwThreadId);
            if (hThread != NULL)
            {
                (void)GetThreadContext(hThread, &ctx);
#ifdef _M_IX86
                ctx.Eip = (DWORD_PTR)dwExceptionAddress;
#elif defined _M_AMD64
                ctx.Rip = (DWORD_PTR)dwExceptionAddress;
#else
#error "Unsupported architecture"
#endif
                ctx.EFlags |= 0x100;
                m_pDebugger->m_pLastBreakpoint = pBreakpoint;
                m_pDebugger->m_dwExecutingThreadId = dbgEvent.dwThreadId;
                (void)SetThreadContext(hThread, &ctx);
 
                fprintf(stderr, "Press c to continue or s to begin stepping.\n");
                (void)m_pDebugger->WaitForContinue();
            }
            else
            {
                fprintf(stderr, "Could not open handle to thread %p. Error = %X\n", dbgEvent.dwThreadId, GetLastError());
            }
        }
        else
        {
            fprintf(stderr, "Could not remove breakpoint at address %p.", dwExceptionAddress);
        }
    }
    SetContinueStatus(DBG_CONTINUE);
});

The handler for EXCEPTION_SINGLE_STEP exceptions serves two purposes: it is there to re-enable breakpoints that were just hit, and it is also there to allow the user to continue single-stepping execution of the program. As a result of this, there needs to be a flag declared that is set when the user wishes to enter single-step mode by themselves (instead of it being set through hitting a breakpoint). If the user is in single-step mode then show them a prompt and wait for a response to continue stepping or to continue execution entirely. Again, put into code, it looks like the following:

Register(DebugExceptions::eSingleStep, [&](const DEBUG_EVENT &dbgEvent)
{
    auto &exceptionRecord = dbgEvent.u.Exception.ExceptionRecord;
    const DWORD_PTR dwExceptionAddress = (DWORD_PTR)exceptionRecord.ExceptionAddress;
    fprintf(stderr, "Received single step at address %p\n", dwExceptionAddress);
    if (m_pDebugger->m_bIsStepping)
    {
        fprintf(stderr, "Press s to continue stepping.\n");
        m_pDebugger->m_dwExecutingThreadId = dbgEvent.dwThreadId;
        (void)m_pDebugger->WaitForContinue();
    }
    if (!m_pDebugger->m_pLastBreakpoint->IsEnabled())
    {
        (void)m_pDebugger->m_pLastBreakpoint->Enable();
    }
 
    SetContinueStatus(DBG_CONTINUE);
});

Debugger in action

With everything in place, the debugger is ready to be tested out. The easiest way is to write a sample program that will be attached to:

#include <stdio.h>
#include <Windows.h>
 
void TestFunction()
{
    printf("Hello, World!\n");
}
 
int main(int argc, char *argv[])
{
    printf("Test function address: %p\n", TestFunction);
    while (true)
    {
        getchar();
        TestFunction();
    }
 
    return 0;
}

On my machine, TestFunction resided at 0x13B1000. After attaching and setting a breakpoint on this address, the debugger was able to successfully step execution of the program or continue entirely.

a
Target address: 0x13B1000
Received breakpoint at address 13B1000.
Press c to continue or s to begin stepping.
s
Received single step at address 13B1001
Press s to continue stepping.
s
Received single step at address 13B1003
Press s to continue stepping.
s
Received single step at address 13B1009
Press s to continue stepping.
s
Received single step at address 13B100A
Press s to continue stepping.
c

The stepped addresses successfully matched up with the disassembly.

013B1000 55                   push        ebp  
013B1001 8B EC                mov         ebp,esp  
013B1003 81 EC C0 00 00 00    sub         esp,0C0h  
013B1009 53                   push        ebx  
013B100A 56                   push        esi  

This was also tested on an x64 version (with a x64 build of the debugger) with equal success.

Article Roadmap
Future posts will be related on topics closely following the items below:

  • Basics
  • Adding/Removing Breakpoints, Single-stepping
  • Call Stack, Registers, Contexts
  • Symbols
  • Miscellaneous Features

The full source code relating to this can be found here. C++11 features were used, so MSVC 2012/2013 is most likely required.