Archive

Archive for the ‘General x86’ Category

Manually Enumerating Process Modules

August 20th, 2015 No comments

This post will show how to enumerate a processes loaded modules without the use of any direct Windows API call. It will rely on partially undocumented functionality both from the native API and the undocumented structures provided by them. The implementation discussed is actually reasonably close to how EnumProcessModules works.

Undocumented Functions and Structures

The main undocumented function that will be used here is NtQueryInformationProcess, which is a very general function that can return a large variety of information about a process depending on input parameters. It takes in a PROCINFOCLASS as its second parameter, which determines which type of information it is to return. The value of this parameter are largely undocumented, but a complete definition of it can be found here. The parameter of interest here will be ProcessBasicInformation, which fills out a PROCESS_BASIC_INFORMATION structure prior to returning. In code this looks like the following:

PROCESS_BASIC_INFORMATION procBasicInfo = { 0 };
ULONG ulRetLength = 0;
NTSTATUS ntStatus = NtQueryInformationProcess(hProcess,
    PROCESS_INFORMATION_CLASS_FULL::ProcessBasicInformation, &procBasicInfo,
    sizeof(PROCESS_BASIC_INFORMATION), &ulRetLength);
if (ntStatus != STATUS_SUCCESS)
{
    fprintf(stderr, "Could not get process information. Status = %X\n",
        ntStatus);
    exit(-1);
}

This structure, too, is largely undocumented. Its full definition can be found here. The field of interest is the second one, the pointer to the processes PEB. This is a very large structure that is mapped into every process and contains an enormous amount of information about the process. Among the vast amount of information contained within the PEB are the loaded modules lists. The Ldr member in the PEB is a pointer to a PEB_LDR_DATA structure which contains these three lists. These three lists contain the same modules, but ordered differently; either in load order, memory initialization order, or initialization order as their names describe. The list consists of LDR_DATA_TABLE_ENTRY entries that contain extended information about the loaded module.

Retrieving Module Information

The above definitions are all that is needed in order to implement manual module traversal. The general idea is the following:

  1. Open a handle to the target process and obtain the address of its PEB (via NtQuerySystemInformation).
  2. Read the PEB structure from the process (via ReadProcessMemory).
  3. Read the PEB_LDR_DATA from the PEB (via ReadProcessMemory).
  4. Store off the top node and begin traversing the doubly-linked list, reading each node (via ReadProcessMemory).

Writing it in C++ translates to the following:

void EnumerateProcessDlls(const HANDLE hProcess)
{
    PROCESS_BASIC_INFORMATION procBasicInfo = { 0 };
    ULONG ulRetLength = 0;
    NTSTATUS ntStatus = NtQueryInformationProcess(hProcess,
        PROCESS_INFORMATION_CLASS_FULL::ProcessBasicInformation, &procBasicInfo,
        sizeof(PROCESS_BASIC_INFORMATION), &ulRetLength);
    if (ntStatus != STATUS_SUCCESS)
    {
        fprintf(stderr, "Could not get process information. Status = %X\n",
            ntStatus);
        exit(-1);
    }
 
    PEB procPeb = { 0 };
    SIZE_T ulBytesRead = 0;
    bool bRet = BOOLIFY(ReadProcessMemory(hProcess, (LPCVOID)procBasicInfo.PebBaseAddress, &procPeb,
        sizeof(PEB), &ulBytesRead));
    if (!bRet)
    {
        fprintf(stderr, "Failed to read PEB from process. Error = %X\n",
            GetLastError());
        exit(-1);
    }
 
    PEB_LDR_DATA pebLdrData = { 0 };
    bRet = BOOLIFY(ReadProcessMemory(hProcess, (LPCVOID)procPeb.Ldr, &pebLdrData, sizeof(PEB_LDR_DATA),
        &ulBytesRead));
    if (!bRet)
    {
        fprintf(stderr, "Failed to read module list from process. Error = %X\n",
            GetLastError());
        exit(-1);
    }
 
    LIST_ENTRY *pLdrListHead = (LIST_ENTRY *)pebLdrData.InLoadOrderModuleList.Flink;
    LIST_ENTRY *pLdrCurrentNode = pebLdrData.InLoadOrderModuleList.Flink;
    do
    {
        LDR_DATA_TABLE_ENTRY lstEntry = { 0 };
        bRet = BOOLIFY(ReadProcessMemory(hProcess, (LPCVOID)pLdrCurrentNode, &lstEntry,
            sizeof(LDR_DATA_TABLE_ENTRY), &ulBytesRead));
        if (!bRet)
        {
            fprintf(stderr, "Could not read list entry from LDR list. Error = %X\n",
                GetLastError());
            exit(-1);
        }
 
        pLdrCurrentNode = lstEntry.InLoadOrderLinks.Flink;
 
        WCHAR strFullDllName[MAX_PATH] = { 0 };
        WCHAR strBaseDllName[MAX_PATH] = { 0 };
        if (lstEntry.FullDllName.Length > 0)
        {
            bRet = BOOLIFY(ReadProcessMemory(hProcess, (LPCVOID)lstEntry.FullDllName.Buffer, &strFullDllName,
                lstEntry.FullDllName.Length, &ulBytesRead));
            if (bRet)
            {
                wprintf(L"Full Dll Name: %s\n", strFullDllName);
            }
        }
 
        if (lstEntry.BaseDllName.Length > 0)
        {
            bRet = BOOLIFY(ReadProcessMemory(hProcess, (LPCVOID)lstEntry.BaseDllName.Buffer, &strBaseDllName,
                lstEntry.BaseDllName.Length, &ulBytesRead));
            if (bRet)
            {
                wprintf(L"Base Dll Name: %s\n", strBaseDllName);
            }
        }
 
        if (lstEntry.DllBase != nullptr && lstEntry.SizeOfImage != 0)
        {
            wprintf(
                L"  Dll Base: %p\n"
                L"  Entry point: %p\n"
                L"  Size of Image: %X\n",
                lstEntry.DllBase, lstEntry.EntryPoint, lstEntry.SizeOfImage);
        }
 
    } while (pLdrListHead != pLdrCurrentNode);
}

Code

The Visual Studio 2015 project for this example can be found here. The source code is viewable on Github here.
This code was tested on x64 Windows 7, 8.1, and 10.

Follow on Twitter for more updates

Categories: General x86, General x86-64, Programming Tags:

Stealth Techniques: Hiding Files in the Registry

August 12th, 2015 1 comment

This post will cover the topic of a semi-common malware technique: hiding executable data in the Windows registry. This involves writing either part of, or an entire, executable into the registry and loading it to execute later. This technique aims at stealth by not tying the potential malicious functionality to a binary; instead the functionality can be scatted across the Windows registry under many keys, making it harder to detect. The actual data, the executable code that will be loaded in these keys, can be (re)-encoded an arbitrary amount of times to make signature scanning more difficult. A good detection strategy here would be to look for the process loading the data rather than scan the registry itself.

Storing Files in the Registry

The first part involves getting a file into the registry. For this example, an entire file will be split up and written in parts under a single key. In the next sections, this file will be retrieved, combined, and lastly executed in a hollowed process. There are several ways to go about doing this in terms of how the file will actually be stored in the registry. The registry has different value types that can store a variety of types of data, ranging from raw binary data, 32/64-bit values, and strings. For this example, the file will be Base64 encoded and written as string (REG_SZ) values.

Getting data into the registry is pretty straightforward. It involves opening a handle to the key with RegCreateKeyEx, which opens a handle to an existing key or creates a new one, followed by calling RegGetValue and RegSetValueEx to perform reads and writes. The example code below shows these three operations:

const HKEY OpenRegistryKey(const char * const strKeyName, const bool bCreate = true)
{
    HKEY hKey = nullptr;
    DWORD dwResult = 0;
 
    LONG lRet = RegCreateKeyExA(HKEY_CURRENT_USER, strKeyName, 0,
        nullptr, 0, KEY_READ | KEY_WRITE | KEY_CREATE_SUB_KEY,
        nullptr, &hKey, &dwResult);
    if (lRet != ERROR_SUCCESS)
    {
        fprintf(stderr, "Could not create/open registry key. Error = %X\n",
            lRet);
        exit(-1);
    }
 
    if (bCreate && dwResult == REG_CREATED_NEW_KEY)
    {
        fprintf(stdout, "Created new registry key.\n");
    }
    else
    {
        fprintf(stdout, "Opened existing registry key.\n");
    }
 
    return hKey;
}
 
void WriteRegistryKeyString(const HKEY hKey, const char * const strValueName,
    const BYTE *pBytes, const DWORD dwSize)
{
    std::string strEncodedData = base64_encode(pBytes, dwSize);
 
    LONG lRet = RegSetValueExA(hKey, strValueName, 0, REG_SZ, (const BYTE *)strEncodedData.c_str(), strEncodedData.length());
    if (lRet != ERROR_SUCCESS)
    {
        fprintf(stderr, "Could not write registry value. Error = %X\n",
            lRet);
        exit(-1);
    }
}
 
const std::array<BYTE, READ_WRITE_SIZE> ReadRegistryKeyString(const char * const strKeyName,
    const char * const strValueName, bool &bErrorOccured)
{
    DWORD dwType = 0;
    const DWORD dwMaxReadSize = READ_WRITE_SIZE * 2;
    DWORD dwReadSize = dwMaxReadSize;
 
    char strBytesEncoded[READ_WRITE_SIZE * 2] = { 0 };
 
    LONG lRet = RegGetValueA(HKEY_CURRENT_USER, strKeyName, strValueName,
        RRF_RT_REG_SZ, &dwType, strBytesEncoded, &dwReadSize);
 
    std::array<BYTE, READ_WRITE_SIZE> pBytes = { 0 };
    std::string strDecoded = base64_decode(std::string(strBytesEncoded));
    (void)memcpy(pBytes.data(), strDecoded.c_str(), strDecoded.size());
 
    if (lRet != ERROR_SUCCESS)
    {
        fprintf(stderr, "Could not read registry value. Error = %X\n",
            lRet);
        bErrorOccured = true;
    }
    if (dwType != REG_SZ || (dwReadSize == 0 || dwReadSize > dwMaxReadSize))
    {
        fprintf(stderr, "Did not correctly read back a string from the registry.\n");
        bErrorOccured = true;
    }
 
    return pBytes;
}

This is nearly all that is needed to get a file into the registry. There are some additional details, such as splitting the file up into several keys, which won’t be shown in this post to space save (but is available in the sample code). The code using these functions to split and write the file into the registry is shown below:

void WriteFileToRegistry(const char * const pFilePath)
{
    HKEY hKey = OpenRegistryKey("RegistryTest");
 
    std::string strSubName = "Part";
    std::string strSizeName = "Size";
    size_t ulIndex = 1;
 
    auto splitFile = SplitFile(pFilePath);
    for (size_t i = 0; i < splitFile.size(); ++i)
    {
        std::string strFullName(strSubName + std::to_string(ulIndex));
 
        WriteRegistryKeyString(hKey, strFullName.c_str(), splitFile[i].data(), READ_WRITE_SIZE);
        ++ulIndex;
    }
 
    CloseHandle(hKey);
}

The top-level key for the example code is under HKCU\\RegistryTest. The exectuable file will be split up into 2048 byte chunks, base64 encoded, and then written in as values named “Part1”, “Part2”, … “PartN”. An 8KB file written to the registry is shown below after executing this code:

scr1A Base64 decoder can quickly verify that the contents of the keys look correct. Inputting the “Part1” key shows the following output (trimmed), where the PE header can be seen:

MZ[144][0][3][0][0][0][4][0][0][0][255][255][0][0][184][0][0][0][0][0][0][0]@[0][0][0][0][0][0][0]
[0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][240][0][0][0]
[14][31][186][14][0][180][9][205]![184][1]L[205]!This program cannot be run in DOS mode.[13][13]
[10]$[0][0][0][0][0][0][0][181]!:

The file is now in the registry and can be deleted on disk.

Retrieving Files from the Registry

At this point, the file is split up and in the registry. Retrieving the file is nothing more than performing the opposite of the first section: reading the individual parts, base64 decoding them, and combining the data into a single stream. The code for this is shown below:

NewProcessInfo JoinRegistryToFile(const char * const strKeyName, const char * const strValueName)
{
    NewProcessInfo newProcessInfo = { 0 };
    std::vector<std::array<BYTE, READ_WRITE_SIZE>> splitFile;
 
    size_t ulKeyIndex = 1;
    std::string strFullName(strValueName + std::to_string(ulKeyIndex));
 
    bool bErrorOccured = false;
    auto partFile = ReadRegistryKeyString(strKeyName, strFullName.c_str(), bErrorOccured);
 
    while (!bErrorOccured)
    {
        splitFile.push_back(partFile);
 
        ++ulKeyIndex;
        strFullName = strValueName + std::to_string(ulKeyIndex);
 
        partFile = ReadRegistryKeyString(strKeyName, strFullName.c_str(), bErrorOccured);
    }
 
    newProcessInfo.pFileData = std::unique_ptr<BYTE[]>(new BYTE[splitFile.size() * READ_WRITE_SIZE]);
    memset(newProcessInfo.pFileData.get(), 0, splitFile.size() * READ_WRITE_SIZE);
 
    size_t ulWriteIndex = 0;
    for (auto &split : splitFile)
    {
        (void)memcpy(&newProcessInfo.pFileData.get()[ulWriteIndex * READ_WRITE_SIZE], splitFile[ulWriteIndex].data(),
            READ_WRITE_SIZE);
        ++ulWriteIndex;
    }
 
    newProcessInfo.pDosHeader = (IMAGE_DOS_HEADER *)&(newProcessInfo.pFileData.get()[0]);
    newProcessInfo.pNtHeaders = (IMAGE_NT_HEADERS *)&(newProcessInfo.pFileData.get()[newProcessInfo.pDosHeader->e_lfanew]);
 
    return newProcessInfo;
}

Here the ReadRegistryKeyString function, whose definition is in the previous section, is called to retrieve the parts. These individual parts are then combined afterwards and stored in newProcessInfo.pFileData. There are some additional fields initialized, such as the beginning of the PE DOS and NT headers, which will be useful for the next section.

Loading the Retrieved File

At this point, the file has been retrieved from the registry and is stored in a buffer in memory. Writing the contents to disk and launching the process would defeat the point of storing it in the registry in the first place, since the file is back on disk. Instead process hollowing will be employed. A dummy process will be launched in a suspended state and have its memory unmapped. Afterwards, the bytes that were retrieved from the registry will be mapped into this process and the process will begin executing. At the topmost level, the code looks like the following:

void ExecuteFileFromRegistry(const char * const pValueName)
{
    HKEY hKey = OpenRegistryKey("RegistryTest");
 
    auto newProcessInfo = JoinRegistryToFile("RegistryTest", pValueName);
    auto processInfo = MapTargetProcess(newProcessInfo, "DummyProcess.exe");
    RunTargetProcess(newProcessInfo, processInfo);
 
    CloseHandle(hKey);
}

MapTargetProcess and RunTargetProcess won’t be shown here since they are close copies from the 2011 post on the topic that I wrote. A note that I would like to make is that this technique works if the dummy and replacement processes are both x86, and were compiled with DEP/ASLR disabled. A refinement of this technique to support x64 and DEP/ASLR enabled is something that I hope to post soon. A screenshot of the code in action is shown below:scr2
Here DummyProcess.exe (included in the zip) is the process that has been hollowed out and replaced with another process, ReplacementProcess.exe (also included in the zip). The “Sample” folder provided in the zip provides interactive example. To demonstrate, do the following:

  • Run DummyProcess.exe and observe that it is a Win32 UI application.
  • Run write.bat, which calls FilelessLauncher.exe to write¬† ReplacementProcess.exe under HKCU\\RegistryTest
  • Delete ReplacementProcess.exe
  • Run execute.bat, which calls FilelessLauncher.exe to read HKCU\\RegistryTest and retrieve the bytes of ReplacementProcess.exe. It will then unmap DummyProcess.exe and write the bytes for ReplacementProcess.exe in. The process will then resume and a message box will pop up, which is the code for ReplacementProcess.exe

Make sure to clean up the registry afterwards.

Conclusion and Code

The technique presented in this post covered how to perform fileless storage of an executable by placing it in the Windows registry. In terms of counteracting this technique, there are many options. For example, the written code has to be retrieved somehow, which either means that hardcoded values must be present somewhere or that there is configuration somewhere stating how to get the data back from the registry. These are prime for marking as signatures of a malicious executable. Additionally, since process hollowing was employed, those weaknesses apply as well, such as checking the image in memory versus the image on disk and analyzing the differences, of which there will certainly be many. Dynamic analysis also can provide quick answers as to what is going on by monitoring registry APIs as well as checking if NtUnmapViewOfSection is being called, which in itself is a red flag.

The Visual Studio 2015 project for this example can be found here. The source code is viewable on Github here.
This code was tested on x64 Windows 7, 8.1, and 10.

Follow on Twitter for more updates

Categories: General x86, Programming Tags:

Common Types of Disassemblers

July 23rd, 2015 2 comments

The point of a disassembler is to take an input series of bytes and output an architecture-specific interpretation of those bytes. For example, a typical disassembler targeting the x86 architecture will take the following bytes: 55 8B EC B8 FF 00 00 00 33 DB 93, and produce a readable representation of those bytes similar to below:

55                   push        ebp  
8B EC                mov         ebp, esp  
B8 FF 00 00 00       mov         eax, 0FFh  
33 DB                xor         ebx,ebx  
93                   xchg        eax,ebx  

The process involves looking at the opcode(s), getting the instruction length, parsing out extra information in the instruction such as displacements, relative/absolute destinations, register/memory affected, etc. — basically a large amount of lookups and parsing. Fortunately, there are libraries for this. The disassembly engine used in this example will be BeaEngine due to its simplicity. Capstone Engine is also a great engine that supports many architectures, a clean and thread-safe API, and a permissive license among other things. After all of this is implemented, the actual challenge of parsing executable files comes into play. This issue will be the topic of this post.

There are two common ways of disassembling a file: linearly and recursively. In the case of linear disassembly, the disassembler begins reading the first instruction at an address in the binary and continues reading until some termination condition, a termination condition being a set amount of instructions decoded, the end of a block, or an error condition such as an unknown opcode. The code for linear disassembly is straightforward and is shown below. The termination condition in the example code will stop printing when a RET instruction is hit.

DISASM disasm = { 0 };
disasm.EIP = (UIntPtr)pStartingAddress;
 
int iLength = UNKNOWN_OPCODE;
do
{
    iLength = DisasmFnc(&disasm);
    fprintf(stdout, "0x%X -- %s\n",
        disasm.EIP, disasm.CompleteInstr);
 
    disasm.EIP += iLength;
 
} while (!IsRet(disasm.Instruction) && iLength != UNKNOWN_OPCODE);

The “algorithm” is (very) easy to write, and with knowledge into the format of the file being disassembled proves to be pretty reliable. For example, the Portable Executable (PE) format on Windows provides information on all executable sections and their sizes on disk and in memory with alignment. The ELF format on Linux provides the same relevant information. Using this information, a disassembler knows the exact range to disassemble to produce reliable output. The major drawback with this technique is that there is no reliable way to separate useless code from executing code. Any unused code/data inserted intentionally (or not) into the target area to disassemble will be listed. Looking at this in an assembly dump usually sticks out because the instructions will be nonsensical relative to surrounding code. Also any use of instruction interleaving, i.e. a jump into the middle of an instruction — usually for obfuscation purposes — will be missed by the disassembler.

The second type of way to disassemble a file is to do it recursively, that is to say that the disassembler will (try to) follow the control path of the actual program. The involves analyzing the destinations of any control flow instructions: calls, jumps, and returns. For every CALL instruction encountered, the address of the next instruction must be pushed on a stack, and the disassembly continues on at the CALL address. This continues on, recursively if need be for multiple CALLs, until a RET instruction is hit. Once a RET instruction is hit, the top of the call stack is popped off and disassembly continues on from that point. This is pretty much exactly how execution happens in a program. Also, for every unconditional jump instruction, the disassembly merely continues at the target destination. The sample code is a bit more complex, but not by much

DISASM disasm = { 0 };
disasm.EIP = (UIntPtr)pStartingAddress;
 
int iLength = UNKNOWN_OPCODE;
 
do
{
    iLength = DisasmFnc(&disasm);
    fprintf(stdout, "0x%X -- %s\n",
        disasm.EIP, disasm.CompleteInstr);
    if (IsCall(disasm.Instruction))
    {
        m_retStack.push(disasm.EIP + iLength);
        disasm.EIP = ResolveAddress(disasm);
    }
    else if (IsJump(disasm.Instruction))
    {
        disasm.EIP = ResolveAddress(disasm);
    }
    else if (IsRet(disasm.Instruction))
    {
        if (!m_retStack.empty())
        {
            disasm.EIP = m_retStack.top();
            m_retStack.pop();
        }
        else
        {
            break;
        }
    }
    else
    {
        disasm.EIP += iLength;
    }
 
} while (iLength != UNKNOWN_OPCODE);

This technique has its own benefits and drawbacks. The major benefit is that (theoretically) only exectuable code will be disassembled. This means that only relevant and executing code will be shown to the user. Also, the approximate or exact number of instructions to disassemble does not need to be known like in the linear technique. With recursive disassembly, you provide starting set(s) of instructions and then begin tracing control flow into those. Obfuscation techniques such as instruction interleaving will also be discovered. This technique does have a major drawback, however. CALLs or JMPs made indirectly cannot be deciphered. For example, the destinations of instructions such as JMP [ESI+0x4], CALL EBX, CALL [0xAABBCCDD] where 0xAABBCCDD contains an import fixed up at runtime, and so on, cannot be followed with the disassembler. This means that there are a lot of edge cases to consider when encountering instructions such as these in terms of knowing where to go next and making sure that the call stack is consistent.

The sample code provides a trivial implementation of both of these techniques. To see how it performs, there are also two functions provided. TestFunction1 demonstrates how a recursive disassembler follows control flow. Compare the two outputs:
Linear

0x1146670 -- call dword ptr [0114B008h]
0x1146676 -- ret

Recursive

0x1146670 -- call dword ptr [0114B008h]
0x754218E0 -- mov eax, dword ptr fs:[00000018h]
0x754218E6 -- mov eax, dword ptr [eax+24h]
0x754218E9 -- ret
0x1146676 -- ret

The second example, TestFunction2, shows how the recursive disassembler skips over instructions that are not executed.

0x66680 -- push ebp
0x66681 -- mov ebp, esp
0x66683 -- mov eax, 000000FFh
0x66688 -- call 000666AAh
0x6668D -- xor ebx, ebx
0x6668F -- xchg eax, ebx
0x66690 -- jmp 000666B1h
0x66692 -- cmp ecx, AABBCCDDh
0x66698 -- push 00000000h
0x6669A -- push 00000000h
0x6669C -- push 00000000h
0x6669E -- push 00000000h
0x666A0 -- call dword ptr [0006B0A0h]
0x666A6 -- pop ebp
0x666A7 -- mov esp, ebp
0x666A9 -- ret

Overall, each approach has its benefits and drawbacks. With good knowledge of an executable files format, a linear disassembler works perfectly fine for showing a disassembly listing. Typically, disassemblers with a focus on code analysis, i.e. IDA Pro, will use a recursive approach and have a sophisticated analysis engine to complement it.

The Visual Studio 2015 RC project for this example can be found here. The source code is viewable on Github here.

Follow on Twitter for more updates

Categories: General x86, General x86-64, Programming Tags:

Code Snippet: Safe Objects

July 8th, 2015 No comments

I’ve found that one of the annoying things with using the Windows API is that there is (usually) no automatic cleanup of opened handles. For example, most functions that open a handle, i.e. OpenProcess, CreateFile, LoadLibrary, etc., return back an opaque pointer to you that you are required to close when you’re done using it. This act of closing the handle is usually done with the generic CloseHandle function, or with another specific cleanup function mentioned in the documentation, i.e. FreeLibrary for the handle returned by LoadLibrary.

This is the traditional C way of doing things, but I found that it can be improved a bit by using RAII with C++. The idea is to have a wrapper class that contains the underlying handle type and performs a cleanup when when the lifetime of the object is finished. Thus was born the prototype code for a safe object:

namespace AutoClean
{
 
    namespace SafeObjectCleanupFnc
    {
        bool ClnCloseHandle(const HANDLE &handle) { return BOOLIFY(CloseHandle(handle)); };
        bool ClnFreeLibrary(const HMODULE &handle) { return BOOLIFY(FreeLibrary(handle)); };
        bool ClnLocalFree(const HLOCAL &handle) { return (LocalFree(handle) == nullptr); };
        bool ClnGlobalFree(const HGLOBAL &handle) { return (GlobalFree(handle) == nullptr); };
        bool ClnUnmapViewOfFile(const PVOID &handle) { return BOOLIFY(UnmapViewOfFile(handle)); };
        bool ClnCloseDesktop(const HDESK &handle) { return BOOLIFY(CloseDesktop(handle)); };
        bool ClnCloseWindowStation(const HWINSTA &handle) { return BOOLIFY(CloseWindowStation(handle)); };
        bool ClnCloseServiceHandle(const SC_HANDLE &handle) { return BOOLIFY(CloseServiceHandle(handle)); };
        bool ClnVirtualFree(const PVOID &handle) { return BOOLIFY(VirtualFree(handle, 0, MEM_RELEASE)); };
    }
 
    template <typename T, bool (* Cleanup)(const T &), PVOID InvalidValue>
    class SafeObject final
    {
    public:
        SafeObject() : m_obj{ obj }
        {
        }
 
        SafeObject(const SafeObject &copy) = delete;
 
        SafeObject(const T &obj) : m_obj{ obj }
        {
        }
 
        SafeObject(const SafeObject &&obj)
        {
            *this = std::move(obj);
        }
 
        ~SafeObject()
        {
            if (IsValid())
            {
                (void)Cleanup(m_obj);
            }
        }
 
        const bool IsValid() const
        {
            return m_obj != (T)InvalidValue;
        }
 
        SafeObject &operator=(const SafeObject &copy) = delete;
 
        SafeObject &operator=(SafeObject &&obj)
        {
            if (IsValid())
            {
                (void)Cleanup(m_obj);
            }
 
            m_obj = std::move(obj.m_obj);
            obj.m_obj = InvalidValue;
 
            return *this;
        }
 
        T * const Ptr()
        {
            return &m_obj;
        }
 
        const T operator()() const
        {
            return m_obj;
        }
 
    private:
        T m_obj;
    };
 
    using SafeHandle = SafeObject<HANDLE, SafeObjectCleanupFnc::ClnCloseHandle, INVALID_HANDLE_VALUE>;
    using SafeLibrary = SafeObject<HMODULE, SafeObjectCleanupFnc::ClnFreeLibrary, nullptr>;
    using SafeLocal = SafeObject<HLOCAL, SafeObjectCleanupFnc::ClnLocalFree, nullptr>;
    using SafeGlobal = SafeObject<HGLOBAL, SafeObjectCleanupFnc::ClnGlobalFree, nullptr>;
    using SafeMapView = SafeObject<PVOID, SafeObjectCleanupFnc::ClnUnmapViewOfFile, nullptr>;
    using SafeDesktop = SafeObject<HDESK, SafeObjectCleanupFnc::ClnCloseDesktop, nullptr>;
    using SafeWindowStation = SafeObject<HWINSTA, SafeObjectCleanupFnc::ClnCloseWindowStation, nullptr>;
    using SafeService = SafeObject<SC_HANDLE, SafeObjectCleanupFnc::ClnCloseServiceHandle, nullptr>;
    using SafeVirtual = SafeObject<PVOID, SafeObjectCleanupFnc::ClnVirtualFree, nullptr>;
}

This is a basic object that supports assignment and moves. Copying in this example code has been disabled since it introduces a lot of extra bookkeeping, but can be can with the use of DuplicateHandle. A sample usage of this code is shown below:

int main(int argc, char *argv[])
{
    AutoClean::SafeHandle handle1 = CreateFile(L"testfile1.txt", GENERIC_READ, 0, nullptr,
        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
 
    AutoClean::SafeHandle handle2 = CreateFile(L"testfile2.txt", GENERIC_READ, 0, nullptr,
        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
 
    fprintf(stderr, "Handle 1: %X\n"
        "Handle 2: %X\n",
        handle1(), handle2());
 
    handle1 = std::move(handle2);
 
    fprintf(stderr, "Handle 1: %X\n"
        "Handle 2: %X\n",
        handle1(), handle2());
 
    return 0;
}

The Visual Studio 2015 RC project for this example can be found here. The source code is viewable on Github here.

Edit: Added some additional features to the example code per requests via Twitter.

Follow on Twitter for more updates.

Categories: General x86, General x86-64, Programming Tags:

Syscall Hooking Under WoW64: Implementation (2/2)

June 15th, 2015 No comments

Welcome to the second and final installment of the series on WoW64 syscall hooking. This part builds on the background provided by the first part and shows what is required to build a working implementation of a syscall hook under WoW64 for x64 Windows 8.1. The code presented here should be trivially portable across different versions of Windows by simply changing syscall numbers, which again can be found on this useful table mentioned in the first part. The code provided will be a basic syscall hook that allows for filtering of one particular syscall. There is not much stopping it from being extended to support hooking an arbitrary amount of syscalls.

Getting Started

From the first part in the series, we see that each thread holds a pointer to a block of code responsible for making the switch from x86 to x64 code execution. This is the pathway that WoW64 uses to perform a syscall, and the pointer to this code resides within the thread local storage region allotted to each thread. Since the value (FS:[0xC0]) in all threads points to the same location, the easiest way to go about hooking syscalls is to replace that jump with an inline hook to our filtering code. This code can then check the syscall number, which is stored in EAX, against the desired syscall to hook. If they match, our hook gets called, otherwise execution will continue on as normal.

The process begins by getting this address. With the aid of inline assembly, this is very straightforward:

const DWORD_PTR __declspec(naked) GetWow64Address()
{
    __asm
    {
        mov eax, dword ptr fs:[0xC0]
        ret
    }
}

The return value of this function, stored in EAX, will hold the address of the block of code responsible for performing the jump to x64 bit mode. As previously mentioned, this address is the one that will be replaced with a jump to the filtering code. This is achieved with a simple overwrite of the instruction bytes at the address:

const void EnableWow64Redirect(const DWORD_PTR dwWow64Address, const LPVOID lpNewJumpLocation)
{
    unsigned char trampolineBytes[] =
    {
        0x68, 0xDD, 0xCC, 0xBB, 0xAA,       /*push 0xAABBCCDD*/
        0xC3,                               /*ret*/
        0xCC, 0xCC, 0xCC                    /*padding*/
    };
    memcpy(&trampolineBytes[1], &lpNewJumpLocation, sizeof(DWORD_PTR));
 
    WriteJump(dwWow64Address, trampolineBytes, sizeof(trampolineBytes));
}
 
const void WriteJump(const DWORD_PTR dwWow64Address, const void *pBuffer, size_t ulSize)
{
    DWORD dwOldProtect = 0;
    (void)VirtualProtect((LPVOID)dwWow64Address, PAGE_SIZE, PAGE_EXECUTE_READWRITE, &dwOldProtect);
    (void)memcpy((void *)dwWow64Address, pBuffer, ulSize);
    (void)VirtualProtect((LPVOID)dwWow64Address, PAGE_SIZE, dwOldProtect, &dwOldProtect);
}

You can see the hook being installed in a before/after form:

syscall1

Above are the bytes of the original code. This carries out the inter-segment jump to x64 bit mode as is standard for all WoW64 calls.

This is the code with the hook is shown below. Here a jump to 0x13411B0 is made, which is the location of the filtering code.syscall2

This filtering code is just an implementation of what was described above, and performs a check of the syscall number against the desired target and acts accordingly

void __declspec(naked) Wow64Trampoline()
{
    __asm
    {
        cmp eax, SYSCALL_INTERCEPT
        jz NtWriteVirtualMemoryHook
        jmp lpJmpRealloc
    }
}

In the case that the syscall number is not the one to filter, execution continues as normal by the jump to lpJmpRealloc, which is just an executable region of memory that contains the bytes of the original inter-segment jump (the jmp 0033:77BE1D84 instruction above). If the syscall number does match, then our hooking function, NtWriteVirtualMemoryHook, gets invoked. The example code just prints a message denoting that the call happened then continues on to performing the syscall. There is also a pushad/popad to properly preserve the stack in the event that another syscall is made from the syscall hook.

void __declspec(naked) NtWriteVirtualMemoryHook()
{
    __asm pushad
 
    fprintf(stderr, "NtWriteVirtualMemory called.\n");
 
    __asm
    {
        popad
        jmp lpJmpRealloc
    }
}

And that is all that there is to it as far as the example code goes. It can quickly be verified by seeing if the hook gets invoked for a call to NtWriteVirtualMemory.

    //Test syscall
    int i = 0x123;
    int j = 0x678;
    SIZE_T ulBytesWritten = 0;
 
    fprintf(stderr, "i = 0x%X\n", i);
    NtWriteVirtualMemory(GetCurrentProcess(), &i, &j, sizeof(int), &ulBytesWritten);
    fprintf(stderr, "i = 0x%X\n", i);

The output below shows everything working successfully

i = 0x123
NtWriteVirtualMemory called.
i = 0x678

Get the Code

The Visual Studio 2015 RC project for this example can be found here. The source code is viewable on Github here.

Follow on Twitter for more updates.

Categories: General x86, General x86-64, Programming Tags: