RCE Endeavors 😅

May 17, 2015

Nop Hopping: Hiding Functionality in Alignment

Filed under: General x86,General x86-64,Programming — admin @ 1:02 PM

This post will cover the topic of hiding code functionality by taking advantage of compiler alignment. To speed up instruction fetching, optimizers can try to align loops, function entries, jump destinations, etc., on a fixed boundary (often 16 bytes), padding the gaps with filler bytes. One example of this in actual executable code is a large series of NOP bytes after the end of a function. For example, the following was taken from an x64 library:

...
00007FFC676D7207 48 23 4C 24 38       and         rcx,qword ptr [rsp+38h]  
00007FFC676D720C 48 89 4C 24 38       mov         qword ptr [rsp+38h],rcx  
00007FFC676D7211 0F 85 91 A3 02 00    jne         00007FFC677015A8  
00007FFC676D7217 48 8B 5C 24 30       mov         rbx,qword ptr [rsp+30h]  
00007FFC676D721C 48 83 C4 20          add         rsp,20h  
00007FFC676D7220 5F                   pop         rdi  
00007FFC676D7221 C3                   ret  
00007FFC676D7222 90                   nop  
00007FFC676D7223 90                   nop  
00007FFC676D7224 90                   nop  
00007FFC676D7225 90                   nop  
00007FFC676D7226 90                   nop  
...

The NOPs are shown after the RET instruction. The size of these NOP blocks, if present, varies throughout programs. During my experimentation, I found that a majority of them (>95%) were 20 bytes or less. This leaves plenty of room for hiding functionality. One advantage of doing this is that the pages that these NOP blocks are on are already allocated, and they have executable privileges on them since they’re right next to actual executable code. This can enhance stealth since no extra allocations need to be made inside the program. Additionally, since these blocks are all over the program, it is possible to randomly select blocks to write your code in, which hinders things such as signature scanning. Overall, it’s a rather nice technique, and one that I used to use to bypass anti-cheat detection systems.
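As a rough illustration of where these blocks come from, the padding after a function is simply the distance from the end of its code to the next alignment boundary. The sketch below assumes a 16-byte boundary, which is common but is not something confirmed for the library above; `PaddingToBoundary` is a hypothetical helper name used only for this illustration.

```cpp
#include <cstdint>

// Number of filler bytes needed to advance `address` to the next multiple of
// `alignment`. `alignment` is assumed to be a power of two (16 here).
constexpr std::uint64_t PaddingToBoundary(std::uint64_t address, std::uint64_t alignment)
{
    return (alignment - (address & (alignment - 1))) & (alignment - 1);
}
```

In the listing above the first padding byte sits at 0x00007FFC676D7222, so under the 16-byte assumption `PaddingToBoundary(0x00007FFC676D7222, 16)` comes out to 14 bytes of filler.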

Finding the Regions

These NOP blocks are all over the place; they’re inside the main executable, and in each loaded library. This gives a very large search space. To begin, it is easiest to find and store the base address of the image and every library and its size. These will be the starting points for searching for these NOP blocks. This is done in a straightforward manner with the help of the CreateToolhelp32Snapshot API along with Module32First/Module32Next. These will return the base address of the image and its libraries as well as their sizes in memory.

const ModuleMap GetModules(const DWORD dwProcessId)
{
    ModuleMap mapModules;
 
    const HANDLE hToolhelp32 = CreateToolhelp32Snapshot(TH32CS_SNAPMODULE, dwProcessId);
    MODULEENTRY32 moduleEntry = { 0 };
    moduleEntry.dwSize = sizeof(MODULEENTRY32);
 
    const BOOL bSuccess = Module32First(hToolhelp32, &moduleEntry);
    if (!bSuccess)
    {
        fprintf(stderr, "Could not enumerate modules. Error = %X.\n",
            GetLastError());
        exit(-1);
    }
 
    do
    {
        const DWORD_PTR dwBase = (DWORD_PTR)moduleEntry.modBaseAddr;
        const DWORD_PTR dwEnd = dwBase + moduleEntry.modBaseSize;
 
        mapModules[std::wstring(moduleEntry.szModule)] = std::make_pair(dwBase, dwEnd);
 
    } while (Module32Next(hToolhelp32, &moduleEntry));
 
    CloseHandle(hToolhelp32);
 
    return mapModules;
}

Now that all modules and their sizes are stored, the next step involves enumerating them for the proper pages. These will be committed pages which have executable privileges in combination with either read/write or just read. This involves nothing more than enumerating through every module and checking its address range with VirtualQueryEx, which will return regions of pages with the same permissions. This permission flag is masked for what is desired.

const ExecutableMap GetExecutableRegions(const HANDLE hProcess, const ModuleMap &mapModules)
{
    ExecutableMap mapExecutableRegions;
    ExecutableRegionsList lstExecutableRegions;
 
    for (auto &module : mapModules)
    {
        MEMORY_BASIC_INFORMATION memBasicInfo = { 0 };
        DWORD_PTR dwBaseAddress = module.second.first;
        const DWORD_PTR dwEndAddress = module.second.second;
 
        while (dwBaseAddress < dwEndAddress)
        {
            const SIZE_T ulReadSize = VirtualQueryEx(hProcess, (LPCVOID)dwBaseAddress, &memBasicInfo, sizeof(MEMORY_BASIC_INFORMATION));
            if (ulReadSize == 0)
            {
                //Stop on failure instead of querying the same address forever
                break;
            }

            if ((memBasicInfo.State & MEM_COMMIT) &&
                ((memBasicInfo.Protect & PAGE_EXECUTE_READWRITE) || (memBasicInfo.Protect & PAGE_EXECUTE_READ)))
            {
                //BaseAddress is the start of this region; AllocationBase is the start of the whole allocation
                const DWORD_PTR dwRegionStart = (DWORD_PTR)memBasicInfo.BaseAddress;
                const DWORD_PTR dwRegionEnd = dwRegionStart + (DWORD_PTR)memBasicInfo.RegionSize;
                lstExecutableRegions.emplace_back(std::make_pair(dwRegionStart, dwRegionEnd));
            }
            dwBaseAddress += memBasicInfo.RegionSize;
        }
 
        if (lstExecutableRegions.size() > 0)
        {
            mapExecutableRegions[module.first] = lstExecutableRegions;
            lstExecutableRegions.clear();
        }
    }
 
    if (mapExecutableRegions.size() == 0)
    {
        fprintf(stderr, "Could not find any executable regions.\n");
        exit(-1);
    }
 
    return mapExecutableRegions;
}

This filters the original module ranges down and only leaves ranges of pages that are committed and have the PAGE_EXECUTE_READWRITE or PAGE_EXECUTE_READ permission. These will be the ranges that are searched for NOP blocks. Now that the collection is filtered down even further, it is time to find the NOP blocks.
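The filter itself is just a pair of mask checks. The sketch below isolates it so it can be tried outside of Windows; the constants mirror the Windows SDK values, while `IsSearchableRegion` and the `k`-prefixed names are hypothetical and not part of the original code.

```cpp
#include <cstdint>

// Values mirrored from the Windows SDK headers so the sketch builds anywhere.
constexpr std::uint32_t kMemCommit = 0x1000;          // MEM_COMMIT
constexpr std::uint32_t kPageExecuteRead = 0x20;      // PAGE_EXECUTE_READ
constexpr std::uint32_t kPageExecuteReadWrite = 0x40; // PAGE_EXECUTE_READWRITE

// True when a region is committed and is executable with read or read/write access.
// `state` and `protect` correspond to MEMORY_BASIC_INFORMATION::State and ::Protect.
constexpr bool IsSearchableRegion(std::uint32_t state, std::uint32_t protect)
{
    const std::uint32_t wanted = kPageExecuteRead | kPageExecuteReadWrite;
    return (state & kMemCommit) != 0 && (protect & wanted) != 0;
}
```

Masking rather than comparing for equality means a region still qualifies when a modifier bit such as PAGE_GUARD is set alongside the base protection, which matches how the loop above behaves.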

Finding NOP Blocks

Finding the NOP blocks is achieved in multiple steps. Since this code is a process that writes into another process, the first step involves copying over the bytes from the target process. The bytes copied over will be the executable range for the module found in the previous section. This range will then be scanned for NOP bytes, and these ranges stored. The code for this looks like the following:

const NopRangeList FindNopRanges(const HANDLE hProcess, const ExecutableMap &executableRegions, const size_t ulSize)
{
    NopRangeList nopRangeList;
 
    for (auto &executableRegion : executableRegions)
    {
        for (auto &executableAddressRange : executableRegion.second)
        {
            const DWORD_PTR dwLowerAddress = executableAddressRange.first;
            const DWORD_PTR dwHigherAddress = executableAddressRange.second;
            const DWORD_PTR dwRangeSize = dwHigherAddress - dwLowerAddress;
 
            if (dwRangeSize > ulSize)
            {
                std::unique_ptr<unsigned char[]> pLocalBytes(new unsigned char[dwRangeSize]);
                SIZE_T ulBytesRead = 0;
                const bool bSuccess = BOOLIFY(ReadProcessMemory(hProcess, (LPCVOID)dwLowerAddress,
                    pLocalBytes.get(), dwRangeSize, &ulBytesRead));
                if (bSuccess && ulBytesRead == dwRangeSize)
                {
                    const DWORD_PTR dwOffset = dwLowerAddress - (DWORD_PTR)pLocalBytes.get();
 
                    NopRange nopRange = FindNops(pLocalBytes.get(), dwRangeSize, dwOffset);
                    if (nopRange.size() > 0)
                    {
                        nopRangeList.emplace_back(nopRange);
                    }
                }
                else
                {
                    fprintf(stderr, "Could not read from 0x%X. Error = %X\n",
                        executableAddressRange.first, GetLastError());
                }
            }
        }
    }
 
    return nopRangeList;
}

Here the bytes are copied into a local array with ReadProcessMemory. The offset between the address of this local array and the address that was read is calculated. This is needed because the instructions are read into this local array and the addresses are different. When these instructions are later interpreted and checked against NOP (0x90), the address of that NOP will correspond to the local array and not to the target process. Calculating the difference between these two and adding it back later will fix that problem up. At the end of this loop, nopRangeList will contain the NOP ranges for every module in the executable, as shown below.

[Image: nop1]

The topmost index will be a module and the inner index will hold an address range of NOPs within that module. For example, in the image above, nopRangeList[4][0] = [0x7FFC6701185D – 0x7FFC670118FF] is a range of NOPs in the target process found within kernel32.dll. This function also calls FindNops to do the work; the definition for FindNops is below:

const NopRange FindNops(const unsigned char * const pBytes, const size_t ulSize, const DWORD_PTR dwOffset)
{
    //Find all NOPs in the code
    const InstructionList nopList = GetNopList(pBytes, ulSize, dwOffset);
 
    //Merge continuous NOPs into an address range
    NopRange nopListMerged;
    if (nopList.size() > 1)
    {
        auto firstElem = nopList.begin();
        auto nextElem = ++firstElem;
        --firstElem;
        nopListMerged.push_back(std::make_pair(*firstElem, *firstElem));
 
        while (nextElem != nopList.end())
        {
            if (*nextElem == ((*firstElem) + 1))
            {
                auto elem = nopListMerged.back();
                const DWORD_PTR dwRangeStart = elem.first;
                const DWORD_PTR dwRangeEnd = *nextElem;
                nopListMerged.pop_back();
                nopListMerged.push_back(std::make_pair(dwRangeStart, dwRangeEnd));
            }
            else
            {
                nopListMerged.push_back(std::make_pair(*nextElem, *nextElem));
            }
 
            ++firstElem;
            ++nextElem;
        }
    }
 
    //Toss out address ranges that are too small
    NopRange nopListTrimmed;
    const int iMinNops = 20;
    for (auto &nopRange : nopListMerged)
    {
        const DWORD_PTR dwRangeStart = nopRange.first;
        const DWORD_PTR dwRangeEnd = nopRange.second;
 
        if ((dwRangeEnd - dwRangeStart) > iMinNops)
        {
            nopListTrimmed.push_back(std::make_pair(dwRangeStart, dwRangeEnd));
        }
    }
 
    return nopListTrimmed;
}

This function is responsible for finding the NOP ranges via a call to GetNopList, which returns every instruction that was a NOP in the given range. These returned NOPs will be unmerged, as shown below:

[Image: nop2]

Here you can see continuous addresses (0x…1000, 0x…1001, 0x…1002, …) that contain NOPs. The next loop is responsible for merging these entries into a std::pair range, containing the starting address and ending address of the range. The last loop then filters this even further to only include NOP ranges that are 20 bytes or greater.
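The scan, merge, and filter steps can also be sketched in one portable pass. Note that this is a simplification and not the method used in this post: it scans raw bytes instead of disassembling, so a 0x90 that is really part of another instruction's operand would be counted. `FindNopRuns` and `AddressRange` are hypothetical names, and `remoteBase` stands for the address the buffer was read from in the target process.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using AddressRange = std::pair<std::uint64_t, std::uint64_t>; // inclusive [first, last]

// Scans a local copy of remote memory for runs of 0x90 bytes and reports each
// run as a range of remote addresses. Raw byte scan, not a disassembly.
std::vector<AddressRange> FindNopRuns(const std::vector<unsigned char> &localBytes,
    std::uint64_t remoteBase, std::size_t minLength)
{
    std::vector<AddressRange> runs;
    std::size_t i = 0;
    while (i < localBytes.size())
    {
        if (localBytes[i] != 0x90)
        {
            ++i;
            continue;
        }

        // Extend the run for as long as the following bytes are also 0x90
        std::size_t runEnd = i;
        while (runEnd + 1 < localBytes.size() && localBytes[runEnd + 1] == 0x90)
        {
            ++runEnd;
        }

        // Keep runs of at least minLength bytes, translated from buffer
        // indices to addresses in the target process
        if (runEnd - i + 1 >= minLength)
        {
            runs.emplace_back(remoteBase + i, remoteBase + runEnd);
        }
        i = runEnd + 1;
    }
    return runs;
}
```

Working with buffer indices and adding `remoteBase` at the end plays the same role as the dwOffset fix-up described earlier: the scan runs on local memory, but the results are expressed as target-process addresses.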

GetNopList is implemented with the help of the BeaEngine disassembler.

const InstructionList GetInstructionList(const unsigned char * const pBytes, const size_t ulSize, const DWORD_PTR dwOffset,
    const bool bNopsOnly = false)
{
    InstructionList instructionList;
 
    DISASM disasm = { 0 };
#ifdef _M_IX86
    //Do nothing
#elif defined(_M_AMD64)
    disasm.Archi = 64;
#else
#error "Unsupported architecture"
#endif
 
    disasm.EIP = (UIntPtr)pBytes;
    int iLength = 0;
    int iLengthTotal = 0;
    do
    {
        iLength = DisasmFnc(&disasm);
        if (iLength != UNKNOWN_OPCODE)
        {
            const DWORD_PTR dwInstructionStart = (DWORD_PTR)(disasm.EIP);
            if (bNopsOnly)
            {
                if (disasm.Instruction.Opcode == NOP)
                {
                    instructionList.push_back(dwInstructionStart + dwOffset);
                }
            }
            else
            {
                instructionList.push_back(dwInstructionStart + dwOffset);
            }
 
            iLengthTotal += iLength;
            disasm.EIP += iLength;
        }
        else
        {
            ++iLengthTotal;
            ++disasm.EIP;
        }
    } while (iLengthTotal < ulSize);
 
    return instructionList;
}
 
const InstructionList GetNopList(const unsigned char * const pBytes, const size_t ulSize, const DWORD_PTR dwOffset)
{
    return GetInstructionList(pBytes, ulSize, dwOffset, true);
}

Putting Everything Together

Now the collection has been filtered down even further to only desirable NOP ranges (those >20 bytes). These will be the ones used to write in our instructions. The algorithm for doing this will be as follows:

  1. Select a module at random
  2. Select a NOP range from that module to write into at random that hasn’t been chosen already
  3. Write an instruction to the region
  4. Write an unconditional jump to the next NOP range, which will contain the next instruction and an unconditional jump.
  5. Continue doing steps 3-4 while there are instructions left to write

The code for steps 1-2 is shown below:

InstructionList SelectRegions(const HANDLE hProcess, const NopRangeList &nopRangeList, InstructionList &writeInstructions)
{
    InstructionList writtenList;
 
    auto firstElem = writeInstructions.begin();
    auto nextElem = ++firstElem;
    --firstElem;
    while(nextElem != writeInstructions.end())
    {
        bool bContinueSearching = true;
        do
        {
            const size_t ulCurrentIndexModule = std::rand() % nopRangeList.size();
            const size_t ulCurrentIndexAddressRange = std::rand() % nopRangeList[ulCurrentIndexModule].size();
 
            const DWORD_PTR dwBaseWriteAddress = nopRangeList[ulCurrentIndexModule][ulCurrentIndexAddressRange].first;
            if(std::find(writtenList.begin(), writtenList.end(), dwBaseWriteAddress) == writtenList.end())
            {
                writtenList.push_back(dwBaseWriteAddress);
                bContinueSearching = false;
            }
        } while (bContinueSearching);
 
        ++firstElem;
        ++nextElem;
    }
 
    return writtenList;
}

Here writtenList will contain the addresses in the target process to write instructions to.

[Image: nop3]

The rest of the algorithm involves writing in an instruction and a jump for each instruction that should be written. This is implemented in the WriteJumps function shown below:

const bool WriteJumps(const HANDLE hProcess, const InstructionList &writeInstructions, const InstructionList &selectedRegions)
{
#ifdef _M_IX86
#elif defined (_M_AMD64)
    unsigned char jmpBytes[] =
    {
        0x48, 0xB8, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, /*mov rax, 0xBBBBBBBBBBBBBBBB*/
        0xFF, 0xE0                                                  /*jmp rax*/
    };
#else
#error "Unsupported architecture"
#endif
 
    auto firstElem = selectedRegions.begin();
    auto nextElem = ++firstElem;
    --firstElem;
 
    int i = 0;
    while (nextElem != selectedRegions.end())
    {
        const DWORD_PTR dwInstructionSize = writeInstructions[i + 1] - writeInstructions[i];
        //The instruction and the jump that follows it are both written, so change permissions over the whole patch
        const SIZE_T ulPatchSize = dwInstructionSize + sizeof(jmpBytes);
        DWORD dwOldProtect = 0;
        bool bSuccess = BOOLIFY(VirtualProtectEx(hProcess, (LPVOID)*firstElem, ulPatchSize, PAGE_EXECUTE_READWRITE, &dwOldProtect));
        if (bSuccess)
        {
            SIZE_T ulBytesWritten = 0;
            bSuccess = BOOLIFY(WriteProcessMemory(hProcess, (LPVOID)*firstElem, (LPCVOID)writeInstructions[i++], dwInstructionSize,
                &ulBytesWritten));

            DWORD_PTR dwNextAddress = *nextElem;
            memcpy(&jmpBytes[2], &dwNextAddress, sizeof(DWORD_PTR));

            bSuccess = BOOLIFY(WriteProcessMemory(hProcess, (LPVOID)(*firstElem + dwInstructionSize), jmpBytes, sizeof(jmpBytes),
                &ulBytesWritten));

            bSuccess = BOOLIFY(VirtualProtectEx(hProcess, (LPVOID)*firstElem, ulPatchSize, dwOldProtect, &dwOldProtect));
            if (!bSuccess)
            {
                fprintf(stderr, "Could not put permissions back on address 0x%X. Error = %X\n",
                    *firstElem, GetLastError());
                return false;
            }
 
        }
        else
        {
            fprintf(stderr, "Could not change permissions on address 0x%X. Error = %X\n",
                *firstElem, GetLastError());
            return false;
        }
 
        ++firstElem;
        ++nextElem;
    }
 
    return true;
}

For each region to be written to, this function will begin by changing the page permissions to PAGE_EXECUTE_READWRITE. Then the first WriteProcessMemory call will write the first instruction to the region. Following that, it will write an unconditional jump in the form of mov rax, <address> -> jmp rax. The page permissions will be changed back to what they were and the loop continues until there are no more instructions to write. To begin execution of these bytes, a remote thread can be created with CreateRemoteThread with the base of these instructions as the entry point.
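The jump stub that links one NOP range to the next can be looked at in isolation. The following is a minimal sketch of building the same 12 bytes that jmpBytes holds above; `BuildJumpStub` is a hypothetical helper, and the memcpy assumes a little-endian host, which holds for x86-64.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Builds "mov rax, target; jmp rax" (12 bytes). The stub overwrites RAX.
std::array<unsigned char, 12> BuildJumpStub(std::uint64_t target)
{
    std::array<unsigned char, 12> stub =
    { {
        0x48, 0xB8, 0, 0, 0, 0, 0, 0, 0, 0, /*mov rax, imm64*/
        0xFF, 0xE0                          /*jmp rax*/
    } };

    // The 64-bit immediate starts at offset 2 and is stored little-endian
    std::memcpy(&stub[2], &target, sizeof(target));
    return stub;
}
```

Two consequences follow from this layout. Every chosen NOP range needs room for the instruction plus these 12 bytes, and because each hop overwrites RAX, the instructions being chained should not rely on a value in RAX surviving from one range to the next.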

An Example

Here is an example of what writing MessageBoxA(0, 0, 0, 0) into another process looks like. The code to be written looks like the following:

    HMODULE hModule = LoadLibrary(L"user32.dll");
    DWORD_PTR dwTargetAddress = (DWORD_PTR)GetProcAddress(hModule, "MessageBoxA");
 
#ifdef _M_IX86
#elif defined(_M_AMD64)
    DWORD dwHigh = (dwTargetAddress >> 32) & 0xFFFFFFFF;
    DWORD dwLow = (dwTargetAddress) & 0xFFFFFFFF;
 
    unsigned char pBytes[] =
    {
        0x45, 0x33, 0xC9,                               /*xor r9d, r9d*/
        0x45, 0x33, 0xC0,                               /*xor r8d, r8d*/
        0x33, 0xD2,                                     /*xor edx, edx*/
        0x33, 0xC9,                                     /*xor ecx, ecx*/
        0x68, 0x11, 0x11, 0x11, 0x11,                   /*push 0x11111111*/
        0xC7, 0x44, 0x24, 0x04, 0xDD, 0xCC, 0xBB, 0xAA, /*mov [rsp+4], 0AABBCCDD*/
        0xC3,                                           /*ret*/
        0xC3, 0xC3, 0xC3                                /*dummy*/
    };
 
    memcpy(&pBytes[11], &dwLow, sizeof(DWORD));
    memcpy(&pBytes[19], &dwHigh, sizeof(DWORD));
#endif

At the end of this code segment, pBytes will contain the bytes of a call to MessageBoxA(0, 0, 0, 0). The target process that I chose for this example was the 64-bit version of Dependency Walker (depends.exe). Here is what it looks like in action. The start address was 0x00007ffc67dcc396.

[Image: nop4]

The first instruction was written with a jump to the next NOP range at 0x7FFC6701185D. Then, at 0x7FFC6701185D, the next instruction is written:

[Image: nop5]

This continues on until the call.

[Images: nop6, nop7, nop8, nop9, nop10]

Eventually, when the remote thread runs, the following should appear:

[Image: nop11]

Closing the MessageBox will return execution to normal.
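The address patching in the example payload can be checked on its own. This is a sketch under the same byte layout as pBytes above, minus the trailing dummy bytes; `BuildCallPayload` is a hypothetical name and the target address used when trying it out can be arbitrary.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Returns the example payload with `target` patched into the push/mov pair.
std::vector<unsigned char> BuildCallPayload(std::uint64_t target)
{
    std::vector<unsigned char> bytes =
    {
        0x45, 0x33, 0xC9,                   /*xor r9d, r9d*/
        0x45, 0x33, 0xC0,                   /*xor r8d, r8d*/
        0x33, 0xD2,                         /*xor edx, edx*/
        0x33, 0xC9,                         /*xor ecx, ecx*/
        0x68, 0, 0, 0, 0,                   /*push imm32 (low dword of target)*/
        0xC7, 0x44, 0x24, 0x04, 0, 0, 0, 0, /*mov dword ptr [rsp+4], imm32 (high dword)*/
        0xC3                                /*ret*/
    };

    const std::uint32_t low = static_cast<std::uint32_t>(target & 0xFFFFFFFF);
    const std::uint32_t high = static_cast<std::uint32_t>(target >> 32);

    // Offsets 11 and 19 are the immediates of the push and the mov
    std::memcpy(&bytes[11], &low, sizeof(low));
    std::memcpy(&bytes[19], &high, sizeof(high));
    return bytes;
}
```

The push/mov pair exists because x64 has no push with a 64-bit immediate. The push places the sign-extended low dword on the stack, the mov then overwrites the upper four bytes of that slot, and the ret pops the assembled 64-bit address and transfers control to it.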

Issues

The example code works on x64 only, but can be very easily ported to work on x86. This technique also doesn’t seem to work universally on all executables. For example, trying this on a 64-bit Notepad instance will crash it with the following error: “RangeChecks instrumentation code detected an out of range array access.” This is something that I am currently investigating and hope to update soon. Edit: As a commenter pointed out (and I have confirmed), this is caused by Control Flow Guard being used for executables on Windows 8.1 and higher. The example code works without issues for x64 on Windows 7.

Get the code

The Visual Studio 2015 RC project for this example can be found here. The source code is viewable on Github here.

Follow on Twitter for more updates.

May 10, 2015

Debugging Injected DLLs

Filed under: General x86,General x86-64,NoCode,Programming — admin @ 11:58 AM

A quick post on how to debug injected DLLs through Visual Studio. This is rather straightforward, but it seems like a fair number of people are unaware that this can be done. It might be because programs typically don’t have DLLs injected into them at runtime, so perhaps people think that debugging them can’t be done in a straightforward way. Fortunately, if you attach to the target process beforehand and inject a DLL, the Visual Studio debugger will detect the loaded DLL and allow for an ordinary debugging experience. The steps are rather simple:

1. Choose to attach to a process through the “Debug” menu in Visual Studio.

[Image: dbg1]

2. Select the target process from the list and attach.

[Image: dbg2]

3. Inject the DLL into the process and verify that breakpoints can get hit.

[Image: dbg3]

And that’s all there is to it. All of the useful features of the Visual Studio debugger are now available for debugging the injected DLLs.

April 24, 2015

Code Snippets: FindWindowLike

Filed under: General x86,General x86-64,Programming — admin @ 6:06 PM

Like many developers who write code in their free time, I have an overwhelming backlog of side projects. A lot of these projects don’t get finished for the usual variety of reasons: lack of time, loss of interest, another side project comes up, and so on. As I searched through the folders of these projects, I realized that while I might not be able to complete them now or in the future, there are certain portions of them that I can carve out and post about. These snippets will typically be short code blocks that I had written to solve a rather particular problem at the time.

The code snippet covered here will be a function I wrote called FindWindowLike. Interestingly enough, while Googling this, there appears to be an MSDN article from 1997 which lists a VB6 function that does the same thing. The one posted here is implemented much differently than that horrible mess though. The purpose of this function is to get a handle to a window while knowing only part of its title. This was useful when I was trying to get a window handle which changed on every instance of the program — for example the process id was part of the title. The code is very straightforward and uses EnumWindows to enumerate all windows on the desktop and perform a substring match.

typedef struct
{
    const TCHAR * const pPartialTitle;
    const bool bCaseSensitive;
    HWND hFoundHandle;
} WindowInformation;
 
BOOL CALLBACK EnumWindowsProc(HWND hWnd, LPARAM lParam)
{
    //Read up to 254 characters of window title, plus the null terminator
    TCHAR strWindowTitle[255] = { 0 };
 
    //GetWindowText takes the buffer size in characters, not bytes
    auto success = GetWindowText(hWnd, strWindowTitle, sizeof(strWindowTitle) / sizeof(strWindowTitle[0]));
    if (success > 0)
    {
        WindowInformation *pWindowInfo = (WindowInformation *)lParam;
        auto isFound = pWindowInfo->bCaseSensitive ?
            StrStr(strWindowTitle, pWindowInfo->pPartialTitle) :
            StrStrI(strWindowTitle, pWindowInfo->pPartialTitle);
        if (isFound)
        {
            pWindowInfo->hFoundHandle = hWnd;
            return FALSE;
        }
    }
 
    return TRUE;
}
 
const HWND FindWindowLike(const TCHAR * const pPartialTitle, const bool bCaseSensitive = true)
{
    WindowInformation windowInfo = { pPartialTitle, bCaseSensitive, nullptr };
    (void)EnumWindows(EnumWindowsProc, (LPARAM)&windowInfo);
 
    if (windowInfo.hFoundHandle == nullptr)
    {
        fprintf(stderr, "Could not find window.\n");
    }
 
    return windowInfo.hFoundHandle;
}

The sample code is hosted on GitHub here.
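For experimenting with the matching rule away from the Windows API, the StrStr / StrStrI choice can be approximated with the standard library. This is only a stand-in sketch: `TitleContains` is a hypothetical name, and the case folding uses std::towlower, which is not guaranteed to fold every character the same way StrStrI does.

```cpp
#include <algorithm>
#include <cwctype>
#include <string>

// Returns true when `partial` occurs somewhere inside `title`.
bool TitleContains(const std::wstring &title, const std::wstring &partial, bool caseSensitive = true)
{
    if (partial.empty())
    {
        return true; // an empty search string matches any title
    }

    if (caseSensitive)
    {
        return title.find(partial) != std::wstring::npos;
    }

    // Case-insensitive search: compare characters after lowercasing both sides
    const auto match = std::search(title.begin(), title.end(), partial.begin(), partial.end(),
        [](wchar_t a, wchar_t b) { return std::towlower(a) == std::towlower(b); });
    return match != title.end();
}
```

With a title such as L"Game Client - PID 4242", searching for L"game client" only matches when caseSensitive is false, which is the situation the bCaseSensitive flag is there to handle.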

April 13, 2015

Reverse Engineering Vectored Exception Handlers: Implementation (3/3)

Filed under: General x86,General x86-64,Programming,Reverse Engineering — admin @ 8:33 AM

Here is an implementation of AddVectoredExceptionHandler as it was reverse engineered.

PVOID RtlAddVectoredExceptionHandler(ULONG FirstHandler, PVECTORED_EXCEPTION_HANDLER VectoredHandler, int Unknown)
{
    PPEB pPeb = GetPEB();
 
    VECTORED_HANDLER_ENTRY *pVecNewEntry =
        (VECTORED_HANDLER_ENTRY *)HeapAlloc((HANDLE)pPeb->ProcessHeap, 0, sizeof(VECTORED_HANDLER_ENTRY));
    if(pVecNewEntry == nullptr)
    {
        return nullptr;
    }
    pVecNewEntry->dwAlwaysOne = 1;
 
    PVOID pEncodedHandler = EncodePointer(VectoredHandler);
    VECTORED_HANDLER_LIST *pVecHandlerBase = (VECTORED_HANDLER_LIST *)(VectorHandlerListBase);
 
    AcquireSRWLockExclusive(&pVecHandlerBase->srwLock);
 
    pVecNewEntry->pVectoredHandler = (PVECTORED_EXCEPTION_HANDLER)pEncodedHandler;
 
    //If the list is empty then set the CrossProcessFlags fields
    if(pVecHandlerBase->pFirstHandler == (VECTORED_HANDLER_ENTRY *)&pVecHandlerBase->pFirstHandler)
    {
        InterlockedBitTestAndSet((LONG *)&pPeb->CrossProcessFlags, 2);
    }
 
    if(FirstHandler)
    {
        //Insert new node at the head of the VEH list
        pVecNewEntry->pNext = pVecHandlerBase->pFirstHandler;
        pVecNewEntry->pPrev = (VECTORED_HANDLER_ENTRY *)&pVecHandlerBase->pFirstHandler;
        pVecHandlerBase->pFirstHandler->pPrev = pVecNewEntry;
        pVecHandlerBase->pFirstHandler = pVecNewEntry;
    }
    else
    {
        //Insert new node at the end of the VEH list
        pVecNewEntry->pNext = (VECTORED_HANDLER_ENTRY *)&pVecHandlerBase->pFirstHandler;
        pVecNewEntry->pPrev = pVecHandlerBase->pLastHandler;
        pVecHandlerBase->pLastHandler->pNext = pVecNewEntry;
        pVecHandlerBase->pLastHandler = pVecNewEntry;
    }
 
    ReleaseSRWLockExclusive(&pVecHandlerBase->srwLock);
 
    return (PVOID)pVecNewEntry;
}

You can download the full Visual Studio 2013 project here. Follow on Twitter for more updates.

April 11, 2015

Reverse Engineering Vectored Exception Handlers: Functionality (2/3)

Filed under: General x86,General x86-64,Programming,Reverse Engineering — admin @ 11:00 AM

This post will continue where the first one left off and explain the operations happening on the doubly linked list of exception handlers. To understand anything in this post, you should read the first one.

Finding the Link Relationships

Given the information from part one, there are two structures at work here: _LdrpVectorHandlerList, which is a non-exported named symbol, and _LdrpVectorHandlerEntry, which is the name given to the struct allocated in _RtlpAddVectoredHandler. Each of these structures has two pointers within them that get moved around.

771E3686 cmp dword ptr [ebp+8],0
771E368A je _RtlpAddVectoredHandler@12+13DF3h (771F7414h)
--------> Jump resolved below
----771F7414 mov eax,dword ptr [edi+4]
----771F7417 mov dword ptr [esi],edi
----771F7419 mov dword ptr [esi+4],eax
----771F741C mov dword ptr [eax],esi
----771F741E mov dword ptr [edi+4],esi
----771F7421 jmp _RtlpAddVectoredHandler@12+7Bh (771E369Ch)
771E3690 mov eax,dword ptr [edi]
771E3692 mov dword ptr [esi],eax
771E3694 mov dword ptr [esi+4],edi
771E3697 mov dword ptr [eax+4],esi
771E369A mov dword ptr [edi],esi

[Image: vec4]

The best way to find out what is happening is to dynamically trace adding exception handlers. For example, what goes on in the code when three exception handlers are added in series? Each one will be added to the head of the list, so that if an exception occurs then the call order will be VectoredHandler3 -> VectoredHandler2 -> VectoredHandler1 -> Unhandled exception. For the case of a handler being inserted at the head of the list, the following instructions will be executed:

771E3690 mov eax,dword ptr [edi]
771E3692 mov dword ptr [esi],eax
771E3694 mov dword ptr [esi+4],edi
771E3697 mov dword ptr [eax+4],esi
771E369A mov dword ptr [edi],esi

The easiest way to see what is going on is to make a table of the runs. Here let X, Y, Z be the different memory addresses of ESI. Let Base be the base address of _LdrpVectorHandlerList, relative to EAX and EDI. I’ve also reproduced the structures and the mappings of registers to fields below.

typedef struct _LdrpVectorHandlerEntry
{
    _LdrpVectorHandlerEntry *pLink1;              // +0x0 [ESI]
    _LdrpVectorHandlerEntry *pLink2;              // +0x4 [ESI+0x4]
    DWORD dwAlwaysOne;                            // +0x8
    PVECTORED_EXCEPTION_HANDLER pVectoredHandler; // +0xC
} VECTORED_HANDLER_ENTRY, *PVECTORED_HANDLER_ENTRY;

typedef struct _LdrpVectorHandlerList
{
    SRWLOCK srwLock;                // +0x0
    VECTORED_HANDLER_ENTRY *pLink1; // +0x4 [EDI]
    VECTORED_HANDLER_ENTRY *pLink2; // +0x8
} VECTORED_HANDLER_LIST, *PVECTORED_HANDLER_LIST; // +0xC

First run

[X]          [X+4]        [*(Base+4)]  [Base]
0x77284728   0x77284728   X            X

Second run

[Y]          [Y+4]        [*(Base+4)]  [Base]
X            0x77284728   Y            Y

Third run

[Z]          [Z+4]        [*(Base+4)]  [Base]
Y            0x77284728   Z            Z

Looking at the results of these three adds, you can begin to see a relationship.

[X] = [ESI] Always holds the address of the previous handler

[X+4] = [ESI+0x4] Always holds the address of the base of the table

[*(Base+4)] = [EAX+0x4] Always holds the address of the new handler

[Base] = [EDI] Always holds the address of the new handler

Given that this operation is to insert at the head of the list, it is possible to draw some conclusions. Since [ESI] always contains the address of the previous topmost handler, it can be assumed to be a pointer to the next handler in the chain. [ESI+0x4] can be assumed to be a pointer to the previous handler in the chain, which in the case of inserting a head node, is set as the base of the exception list. Now the struct definition can be completed.

typedef struct _LdrpVectorHandlerEntry
{
    _LdrpVectorHandlerEntry *pNext;
    _LdrpVectorHandlerEntry *pPrev;
    DWORD dwAlwaysOne;
    PVECTORED_EXCEPTION_HANDLER pVectoredHandler;
} VECTORED_HANDLER_ENTRY, *PVECTORED_HANDLER_ENTRY;

[EAX+0x4] is a bit more difficult to discern. EAX is loaded from [EDI], so it holds the value stored in the second field of _LdrpVectorHandlerList. This is dereferenced and the second item in the dereferenced struct is set to the address of the new handler. What is happening here is that the pPrev field of the current topmost handler prior to inserting a new one is set to the address of the new handler, thus keeping the list chain intact. This may not seem obvious from looking at the assembly but is what is occurring when actually stepping through the instructions with a debugger. Lastly, [EDI], which is the first pointer member of _LdrpVectorHandlerList, is set to hold the address of the new handler.

Now for the other case: inserting at the back of the vectored exception list. In that scenario, the following instructions will be executed:

771F7414 mov eax,dword ptr [edi+4]
771F7417 mov dword ptr [esi],edi
771F7419 mov dword ptr [esi+4],eax
771F741C mov dword ptr [eax],esi
771F741E mov dword ptr [edi+4],esi
771F7421 jmp _RtlpAddVectoredHandler@12+7Bh (771E369Ch)

This is a slight variation on the first case. The best way to see what is going on is to step through the assembly code again. Here X, Y, and Z will map to [ESI] like last time. Here Base will be [EDI+0x4], the third member of _LdrpVectorHandlerList — unlike [EDI] in the previous segment, which was the second member. [Base+0x4] will be [EDI + 0x4].

First run

[X]          [X+4]        [Base]       [*(Base+4)]
0x77284728   0x77284728   X            X

Second run

[Y]          [Y+4]        [Base]       [*(Base+4)]
0x77284728   X            Y            Y

Third run

[Z]          [Z+4]        [Base]       [*(Base+4)]
0x77284728   Y            Z            Z

Again,

[X] = [ESI]  Always holds the address of the base of the table

[X+4] = [ESI+0x4] Holds the address of the previous handler

[Base] = [EAX] Holds the address to the new handler

[*(Base +4)] = [EDI+0x4] Holds the address of the new handler

Here, the mappings that were established for [X] and [X+4] as pNext and pPrev still make sense. For a node inserted at the back of the exception list, pNext will point to the base of the table (end), and pPrev will point to the address of the previous handler. Here [Base] is the third member of _LdrpVectorHandlerList. Given what is known from the previous run and this one, it is possible to draw a conclusion that the two pointers in _LdrpVectorHandlerList are pointers to the first and last exception handlers. The definition of _LdrpVectorHandlerList can now be completed.

typedef struct _LdrpVectorHandlerList
{
    SRWLOCK srwLock;
    VECTORED_HANDLER_ENTRY *pFirstHandler;
    VECTORED_HANDLER_ENTRY *pLastHandler;
} VECTORED_HANDLER_LIST, *PVECTORED_HANDLER_LIST;
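Both insertion paths can be modeled with a small circular doubly linked list. This is a simplified sketch rather than the ntdll layout: `Entry`, `HandlerList`, `InsertHead`, `InsertTail`, and `CallOrder` are hypothetical names, a string stands in for the encoded handler pointer, and an explicit sentinel node plays the part of the pFirstHandler / pLastHandler pair. The comments map each assignment back to the traced instructions.

```cpp
#include <string>
#include <vector>

struct Entry
{
    Entry *pNext;
    Entry *pPrev;
    std::string name; // stands in for the encoded handler pointer
};

// sentinel.pNext / sentinel.pPrev model pFirstHandler / pLastHandler.
// An empty list points back at itself, like the base address in the tables.
struct HandlerList
{
    Entry sentinel;
    HandlerList() { sentinel.pNext = &sentinel; sentinel.pPrev = &sentinel; }
};

void InsertHead(HandlerList &list, Entry &entry)
{
    Entry *base = &list.sentinel;
    entry.pNext = base->pNext;   // mov eax,[edi] ; mov [esi],eax
    entry.pPrev = base;          // mov [esi+4],edi
    base->pNext->pPrev = &entry; // mov [eax+4],esi
    base->pNext = &entry;        // mov [edi],esi
}

void InsertTail(HandlerList &list, Entry &entry)
{
    Entry *base = &list.sentinel;
    entry.pNext = base;          // mov [esi],edi
    entry.pPrev = base->pPrev;   // mov eax,[edi+4] ; mov [esi+4],eax
    base->pPrev->pNext = &entry; // mov [eax],esi
    base->pPrev = &entry;        // mov [edi+4],esi
}

// Walks the list from the first handler until it wraps back to the sentinel.
std::vector<std::string> CallOrder(const HandlerList &list)
{
    std::vector<std::string> order;
    for (const Entry *p = list.sentinel.pNext; p != &list.sentinel; p = p->pNext)
    {
        order.push_back(p->name);
    }
    return order;
}
```

Inserting three entries at the head and walking the list gives the reverse of the insertion order, which agrees with the VectoredHandler3 -> VectoredHandler2 -> VectoredHandler1 call order described earlier. A HandlerList should not be copied, since the sentinel's links point at the object's own address.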

That wraps up the implementation details of vectored exception handlers. The full C implementation will be provided in the next post. Follow on Twitter for more updates.
