Archive for July, 2015

Common Types of Disassemblers

July 23rd, 2015

The point of a disassembler is to take a series of bytes as input and output an architecture-specific interpretation of those bytes. For example, a typical disassembler targeting the x86 architecture will take the bytes 55 8B EC B8 FF 00 00 00 33 DB 93 and produce a readable representation of them similar to the following:

55                   push        ebp
8B EC                mov         ebp, esp
B8 FF 00 00 00       mov         eax, 0FFh
33 DB                xor         ebx, ebx
93                   xchg        eax, ebx

The process involves looking at the opcode(s), determining the instruction length, and parsing out extra information in the instruction such as displacements, relative/absolute destinations, and the registers or memory affected; basically, a large number of lookups and parsing. Fortunately, there are libraries for this. This example uses BeaEngine because of its simplicity. Capstone is also a great engine; among other things, it supports many architectures and offers a clean, thread-safe API and a permissive license. Once instruction decoding is handled, the actual challenge of parsing executable files comes into play, and that is the topic of this post.
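
To make the lookup-and-parse step concrete, here is a toy decoder (an illustration only, not BeaEngine's or Capstone's API) that handles just the encodings in the example bytes above: push r32 (50+r), the register-to-register forms of mov r32, r/m32 (8B) and xor r32, r/m32 (33), mov r32, imm32 (B8+r), xchg eax, r32 (91+r), and ret (C3). The names DecodeOne, Decoded, kReg32 and kExampleBytes are hypothetical, and immediates are printed C-style (0xFF) rather than in the 0FFh style shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// The example bytes from the listing above.
const std::vector<std::uint8_t> kExampleBytes = {
    0x55, 0x8B, 0xEC, 0xB8, 0xFF, 0x00, 0x00, 0x00, 0x33, 0xDB, 0x93 };

// 32-bit register names indexed by their 3-bit encoding.
static const char *const kReg32[8] = { "eax", "ecx", "edx", "ebx",
                                       "esp", "ebp", "esi", "edi" };

// One decoded instruction; length == 0 stands in for "unknown opcode".
struct Decoded
{
    std::size_t length;
    std::string text;
};

// Toy decoder for a tiny subset of 32-bit x86 (hypothetical helper).
Decoded DecodeOne(const std::vector<std::uint8_t> &code, std::size_t at)
{
    if (at >= code.size())
        return { 0, "" };
    const std::uint8_t op = code[at];
    const std::size_t left = code.size() - at;

    // 50+r: push r32, register number in the low 3 bits of the opcode.
    if (op >= 0x50 && op <= 0x57)
        return { 1, std::string("push ") + kReg32[op & 7] };

    // 91+r: xchg eax, r32 (0x90 itself would be nop).
    if (op >= 0x91 && op <= 0x97)
        return { 1, std::string("xchg eax, ") + kReg32[op & 7] };

    if (op == 0xC3)
        return { 1, "ret" };

    // B8+r: mov r32, imm32; a 4-byte little-endian immediate follows.
    if (op >= 0xB8 && op <= 0xBF && left >= 5)
    {
        std::uint32_t imm = 0;
        for (int i = 3; i >= 0; --i)
            imm = (imm << 8) | code[at + 1 + i];
        char buf[32];
        std::snprintf(buf, sizeof(buf), "mov %s, 0x%X", kReg32[op & 7],
                      static_cast<unsigned>(imm));
        return { 5, buf };
    }

    // 8B /r (mov) and 33 /r (xor): only ModRM mod == 11b (register form).
    if ((op == 0x8B || op == 0x33) && left >= 2 && (code[at + 1] >> 6) == 3)
    {
        const std::uint8_t modrm = code[at + 1];
        const char *mnemonic = (op == 0x8B) ? "mov" : "xor";
        return { 2, std::string(mnemonic) + " " + kReg32[(modrm >> 3) & 7] +
                        ", " + kReg32[modrm & 7] };
    }

    return { 0, "" }; // unknown opcode
}
```

Calling DecodeOne at offsets 0, 1, 3, 8 and 10 of kExampleBytes yields push ebp, mov ebp, esp, mov eax, 0xFF, xor ebx, ebx and xchg eax, ebx. A real engine replaces these few branches with complete opcode, ModRM and SIB tables.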

There are two common ways of disassembling a file: linearly and recursively. In linear disassembly, the disassembler decodes the first instruction at a starting address in the binary and keeps decoding consecutive instructions until a termination condition is met, such as a set number of instructions decoded, the end of a block, or an error like an unknown opcode. The code for linear disassembly is straightforward and is shown below. In the example code, disassembly stops after a RET instruction or an undecodable opcode.

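DISASM disasm = { 0 };
disasm.EIP = (UIntPtr)pStartingAddress;

int iLength = UNKNOWN_OPCODE;
do
{
    iLength = DisasmFnc(&disasm);
    if (iLength == UNKNOWN_OPCODE)
    {
        break; // Undecodable bytes: stop instead of printing garbage
    }

    // UIntPtr is 64 bits wide on x64 builds, so print it as unsigned long long
    fprintf(stdout, "0x%llX -- %s\n",
        (unsigned long long)disasm.EIP, disasm.CompleteInstr);

    disasm.EIP += iLength;

} while (!IsRet(disasm.Instruction));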

The “algorithm” is (very) easy to write and, with knowledge of the format of the file being disassembled, proves to be pretty reliable. For example, the Portable Executable (PE) format on Windows provides information on all executable sections, including their sizes on disk and in memory and their alignment. The ELF format on Linux provides the same relevant information. Using this information, a disassembler knows the exact range to disassemble to produce reliable output. The major drawback of this technique is that there is no reliable way to separate unused code or data from code that actually executes. Any unused code or data inserted into the target range, intentionally or not, will be listed. In an assembly dump this usually sticks out, because the instructions will be nonsensical relative to the surrounding code. Also, any use of instruction interleaving, i.e. a jump into the middle of an instruction (usually for obfuscation purposes), will be missed, because the linear pass has already decoded those bytes as part of a different instruction.
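
Instruction interleaving is easiest to see on a tiny buffer. The sketch below compares a linear sweep with simply following the jump, using hypothetical helper names and a length-only toy decoder that covers just nop, ret, jmp rel8, and the register forms of FF /r and C0 /r ib. The bytes EB FF C0 C3 90 start with a short jump whose target is its own second byte: when executed they run jmp, then FF C0 (inc eax), then C3 (ret), but a linear sweep decodes EB FF followed by C0 C3 90 (rol bl, 90h) and never sees the inc or the ret.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// EB FF = jmp to offset 1; executed: jmp, inc eax (FF C0), ret (C3).
const std::vector<std::uint8_t> kInterleaved = { 0xEB, 0xFF, 0xC0, 0xC3, 0x90 };

// Length-only toy decoder; returns 0 for unknown or truncated instructions.
std::size_t InstrLength(const std::vector<std::uint8_t> &code, std::size_t at)
{
    if (at >= code.size())
        return 0;
    std::size_t len = 0;
    switch (code[at])
    {
    case 0x90: len = 1; break; // nop
    case 0xC3: len = 1; break; // ret
    case 0xEB: len = 2; break; // jmp rel8 (signed 8-bit displacement)
    case 0xFF: len = 2; break; // FF /r, register form (FF C0 = inc eax)
    case 0xC0: len = 3; break; // C0 /r ib, register form (C0 C3 90 = rol bl, 90h)
    default: return 0;
    }
    if (at + len > code.size())
        return 0; // truncated instruction
    if ((code[at] == 0xFF || code[at] == 0xC0) && (code[at + 1] >> 6) != 3)
        return 0; // memory operands are not handled by this toy decoder
    return len;
}

// Linear sweep: decode back to back until an unknown opcode or the end.
std::vector<std::size_t> LinearSweep(const std::vector<std::uint8_t> &code)
{
    std::vector<std::size_t> starts;
    std::size_t at = 0;
    while (std::size_t len = InstrLength(code, at))
    {
        starts.push_back(at);
        at += len;
    }
    return starts;
}

// Follow control flow instead: jmp rel8 continues at its target, ret stops.
std::vector<std::size_t> FollowFlow(const std::vector<std::uint8_t> &code)
{
    std::vector<std::size_t> starts;
    std::set<std::size_t> seen; // stops on jump loops such as EB FE
    std::size_t at = 0;
    while (seen.insert(at).second)
    {
        const std::size_t len = InstrLength(code, at);
        if (len == 0)
            break; // unknown opcode or end of buffer
        starts.push_back(at);
        if (code[at] == 0xC3)
            break; // ret ends this toy trace
        if (code[at] == 0xEB)
        {
            // Target = address of the next instruction + signed displacement.
            const std::ptrdiff_t target = static_cast<std::ptrdiff_t>(at) + 2 +
                                          static_cast<std::int8_t>(code[at + 1]);
            if (target < 0)
                break;
            at = static_cast<std::size_t>(target);
        }
        else
        {
            at += len;
        }
    }
    return starts;
}
```

On kInterleaved, LinearSweep returns the instruction starts {0, 2} while FollowFlow returns {0, 1, 3}; the two views of the same five bytes agree only on the first instruction.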

The second way to disassemble a file is recursively; that is, the disassembler will (try to) follow the control flow of the actual program. This involves analyzing the destinations of control flow instructions: calls, jumps, and returns. For every CALL instruction encountered, the address of the next instruction is pushed on a stack, and disassembly continues at the CALL's destination. This continues, recursively if there are nested CALLs, until a RET instruction is hit. At a RET, the top of the call stack is popped and disassembly continues from that address. This closely mirrors how execution happens in a program. For every unconditional jump instruction, disassembly simply continues at the jump's destination. The sample code is a bit more complex, but not by much:

DISASM disasm = { 0 };
disasm.EIP = (UIntPtr)pStartingAddress;

int iLength = UNKNOWN_OPCODE;

do
{
    iLength = DisasmFnc(&disasm);
    if (iLength == UNKNOWN_OPCODE)
    {
        break; // Undecodable bytes: stop following this path
    }

    fprintf(stdout, "0x%llX -- %s\n",
        (unsigned long long)disasm.EIP, disasm.CompleteInstr);
    if (IsCall(disasm.Instruction))
    {
        // Remember where to resume once the callee returns, then enter it
        m_retStack.push(disasm.EIP + iLength);
        disasm.EIP = ResolveAddress(disasm);
    }
    else if (IsJump(disasm.Instruction))
    {
        // Jump: continue at the destination
        disasm.EIP = ResolveAddress(disasm);
    }
    else if (IsRet(disasm.Instruction))
    {
        // Return: resume after the matching CALL, or stop at the outermost RET
        if (!m_retStack.empty())
        {
            disasm.EIP = m_retStack.top();
            m_retStack.pop();
        }
        else
        {
            break;
        }
    }
    else
    {
        disasm.EIP += iLength;
    }

} while (iLength != UNKNOWN_OPCODE);

This technique has its own benefits and drawbacks. The major benefit is that (theoretically) only executable code will be disassembled, so only relevant, executing code is shown to the user. Also, unlike the linear technique, it does not need to know the approximate or exact number of instructions to disassemble: you provide one or more starting addresses and trace control flow from them. Obfuscation techniques such as instruction interleaving will also be discovered, since the disassembler follows the jump into the middle of an instruction and decodes from there. This technique does have a major drawback, however: the destinations of indirect CALLs and JMPs generally cannot be determined. For example, the destinations of instructions such as JMP [ESI+0x4], CALL EBX, or CALL [0xAABBCCDD], where 0xAABBCCDD holds an import fixed up at runtime, cannot be followed by a disassembler working from the file alone. Instructions like these create many edge cases, both in deciding where to go next and in keeping the call stack consistent.
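
The sample above follows CALLs with an explicit return stack. A common alternative, sketched here as an assumption rather than the author's code, keeps a worklist of addresses still to explore and a set of addresses already decoded. That variant also handles conditional branches (both the target and the fall-through are queued) and loops (an address is never decoded twice), and it makes the indirect-branch problem explicit by recording sites such as CALL EBX that it cannot follow. It assumes every CALL returns to the next instruction. The opcode subset (call rel32, jmp rel8, je rel8, nop, ret, call r32), the byte layout and all names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Example layout (offsets in decimal):
//  0: E8 07 00 00 00  call 12
//  5: 74 01           je 8
//  7: 90              nop
//  8: EB 03           jmp 13
// 10: CC CC           padding that is never executed
// 12: C3              ret (the callee)
// 13: FF D3           call ebx (indirect)
// 15: C3              ret
const std::vector<std::uint8_t> kExample = {
    0xE8, 0x07, 0x00, 0x00, 0x00, 0x74, 0x01, 0x90,
    0xEB, 0x03, 0xCC, 0xCC, 0xC3, 0xFF, 0xD3, 0xC3 };

struct TraceResult
{
    std::set<std::size_t> instructions;  // start offsets of reached instructions
    std::vector<std::size_t> unresolved; // indirect branch sites not followed
};

// Reads a little-endian signed 32-bit displacement.
static std::int32_t ReadRel32(const std::vector<std::uint8_t> &code, std::size_t at)
{
    std::uint32_t v = 0;
    for (int i = 3; i >= 0; --i)
        v = (v << 8) | code[at + i];
    return static_cast<std::int32_t>(v);
}

TraceResult RecursiveTrace(const std::vector<std::uint8_t> &code, std::size_t entry)
{
    TraceResult result;
    std::vector<std::size_t> work{ entry }; // addresses still to explore

    // Queue a branch target if it lies inside the buffer.
    auto enqueue = [&](std::ptrdiff_t target) {
        if (target >= 0 && static_cast<std::size_t>(target) < code.size())
            work.push_back(static_cast<std::size_t>(target));
    };

    while (!work.empty())
    {
        std::size_t at = work.back();
        work.pop_back();

        // Walk straight-line code until this path ends or reaches known code.
        while (at < code.size() && result.instructions.insert(at).second)
        {
            const std::uint8_t op = code[at];
            const std::ptrdiff_t here = static_cast<std::ptrdiff_t>(at);
            const std::size_t left = code.size() - at;

            if (op == 0x90) { at += 1; }                 // nop
            else if (op == 0xC3) { break; }              // ret: path ends
            else if (op == 0xE8 && left >= 5)            // call rel32
            {
                enqueue(here + 5 + ReadRel32(code, at + 1)); // the callee
                at += 5;                                     // assume it returns
            }
            else if (op == 0xEB && left >= 2)            // jmp rel8
            {
                enqueue(here + 2 + static_cast<std::int8_t>(code[at + 1]));
                break;
            }
            else if (op == 0x74 && left >= 2)            // je rel8: both outcomes
            {
                enqueue(here + 2 + static_cast<std::int8_t>(code[at + 1]));
                at += 2;
            }
            else if (op == 0xFF && left >= 2 && code[at + 1] >= 0xD0 && code[at + 1] <= 0xD7)
            {
                // call r32 (FF /2, register form): target only known at run time.
                result.unresolved.push_back(at);
                at += 2;
            }
            else
            {
                result.instructions.erase(at); // unknown opcode: drop this path
                break;
            }
        }
    }
    return result;
}
```

Starting at offset 0, the trace reaches offsets 0, 5, 7, 8, 12, 13 and 15, never touches the padding at 10 and 11 (where a linear sweep with this toy decoder would stop on an unknown opcode), and reports the CALL EBX at offset 13 as unresolved.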

The sample code provides a trivial implementation of both techniques, along with two test functions to show how each performs. TestFunction1 demonstrates how a recursive disassembler follows control flow. Compare the two outputs:
Linear

0x1146670 -- call dword ptr [0114B008h]
0x1146676 -- ret

Recursive

0x1146670 -- call dword ptr [0114B008h]
0x754218E0 -- mov eax, dword ptr fs:[00000018h]
0x754218E6 -- mov eax, dword ptr [eax+24h]
0x754218E9 -- ret
0x1146676 -- ret

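Note that the recursive output follows the indirect CALL through [0114B008h], which, as described above, cannot be resolved in general. It works here because the sample disassembles code already loaded in its own process, where the loader has filled in the import address, so the pointer stored at 0114B008h can simply be read.

The second example, TestFunction2, shows how the recursive disassembler skips over instructions that are not executed. Its linear output is below: everything after the JMP at 0x66690, up to and including the RET at 0x666A9, is never executed, yet the linear pass lists it, while a recursive pass would instead follow the CALL to 0x666AA and the JMP to 0x666B1.

Linear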

0x66680 -- push ebp
0x66681 -- mov ebp, esp
0x66683 -- mov eax, 000000FFh
0x66688 -- call 000666AAh
0x6668D -- xor ebx, ebx
0x6668F -- xchg eax, ebx
0x66690 -- jmp 000666B1h
0x66692 -- cmp ecx, AABBCCDDh
0x66698 -- push 00000000h
0x6669A -- push 00000000h
0x6669C -- push 00000000h
0x6669E -- push 00000000h
0x666A0 -- call dword ptr [0006B0A0h]
0x666A6 -- pop ebp
0x666A7 -- mov esp, ebp
0x666A9 -- ret

Overall, each approach has its benefits and drawbacks. With good knowledge of an executable file's format, a linear disassembler works perfectly fine for producing a disassembly listing. Disassemblers focused on code analysis, e.g. IDA Pro, typically use a recursive approach complemented by a sophisticated analysis engine.

The Visual Studio 2015 RC project for this example can be found here. The source code is viewable on GitHub here.

Follow on Twitter for more updates

Categories: General x86, General x86-64, Programming

Code Snippet: Safe Objects

July 8th, 2015

I’ve found that one of the annoying things about using the Windows API is that there is (usually) no automatic cleanup of opened handles. For example, most functions that open a handle, e.g. OpenProcess, CreateFile, and LoadLibrary, return an opaque handle that you are required to close when you’re done using it. Closing is usually done with the generic CloseHandle function, or with a specific cleanup function named in the documentation, e.g. FreeLibrary for the handle returned by LoadLibrary.

This is the traditional C way of doing things, but I found that it can be improved a bit by using RAII in C++. The idea is to have a wrapper class that holds the underlying handle and performs the cleanup when the object's lifetime ends. Thus was born the prototype code for a safe object:

#include <Windows.h>
#include <utility>

// BOOLIFY is not defined in this snippet; this definition (BOOL -> bool) is an assumption
#define BOOLIFY(x) ((x) != FALSE)

namespace AutoClean
{

    namespace SafeObjectCleanupFnc
    {
        bool ClnCloseHandle(const HANDLE &handle) { return BOOLIFY(CloseHandle(handle)); };
        bool ClnFreeLibrary(const HMODULE &handle) { return BOOLIFY(FreeLibrary(handle)); };
        bool ClnLocalFree(const HLOCAL &handle) { return (LocalFree(handle) == nullptr); };
        bool ClnGlobalFree(const HGLOBAL &handle) { return (GlobalFree(handle) == nullptr); };
        bool ClnUnmapViewOfFile(const PVOID &handle) { return BOOLIFY(UnmapViewOfFile(handle)); };
        bool ClnCloseDesktop(const HDESK &handle) { return BOOLIFY(CloseDesktop(handle)); };
        bool ClnCloseWindowStation(const HWINSTA &handle) { return BOOLIFY(CloseWindowStation(handle)); };
        bool ClnCloseServiceHandle(const SC_HANDLE &handle) { return BOOLIFY(CloseServiceHandle(handle)); };
        bool ClnVirtualFree(const PVOID &handle) { return BOOLIFY(VirtualFree(handle, 0, MEM_RELEASE)); };
    }

    template <typename T, bool (* Cleanup)(const T &), PVOID InvalidValue>
    class SafeObject final
    {
    public:
        // A default-constructed object owns nothing
        SafeObject() : m_obj{ (T)InvalidValue }
        {
        }

        SafeObject(const SafeObject &copy) = delete;

        SafeObject(const T &obj) : m_obj{ obj }
        {
        }

        // Take ownership from another object and leave it empty
        SafeObject(SafeObject &&obj) : m_obj{ obj.m_obj }
        {
            obj.m_obj = (T)InvalidValue;
        }

        ~SafeObject()
        {
            if (IsValid())
            {
                (void)Cleanup(m_obj);
            }
        }

        bool IsValid() const
        {
            return m_obj != (T)InvalidValue;
        }

        SafeObject &operator=(const SafeObject &copy) = delete;

        SafeObject &operator=(SafeObject &&obj)
        {
            // Guard against self-move, which would otherwise close our own handle
            if (this != &obj)
            {
                if (IsValid())
                {
                    (void)Cleanup(m_obj);
                }

                m_obj = obj.m_obj;
                obj.m_obj = (T)InvalidValue;
            }

            return *this;
        }

        T * const Ptr()
        {
            return &m_obj;
        }

        const T operator()() const
        {
            return m_obj;
        }

    private:
        T m_obj;
    };

    using SafeHandle = SafeObject<HANDLE, SafeObjectCleanupFnc::ClnCloseHandle, INVALID_HANDLE_VALUE>;
    using SafeLibrary = SafeObject<HMODULE, SafeObjectCleanupFnc::ClnFreeLibrary, nullptr>;
    using SafeLocal = SafeObject<HLOCAL, SafeObjectCleanupFnc::ClnLocalFree, nullptr>;
    using SafeGlobal = SafeObject<HGLOBAL, SafeObjectCleanupFnc::ClnGlobalFree, nullptr>;
    using SafeMapView = SafeObject<PVOID, SafeObjectCleanupFnc::ClnUnmapViewOfFile, nullptr>;
    using SafeDesktop = SafeObject<HDESK, SafeObjectCleanupFnc::ClnCloseDesktop, nullptr>;
    using SafeWindowStation = SafeObject<HWINSTA, SafeObjectCleanupFnc::ClnCloseWindowStation, nullptr>;
    using SafeService = SafeObject<SC_HANDLE, SafeObjectCleanupFnc::ClnCloseServiceHandle, nullptr>;
    using SafeVirtual = SafeObject<PVOID, SafeObjectCleanupFnc::ClnVirtualFree, nullptr>;
}

This is a basic object that supports move construction and move assignment. Copying has been disabled in this example code since it introduces a lot of extra bookkeeping, but it could be implemented with DuplicateHandle. A sample usage of this code is shown below:

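#include <cstdio>

int main(int argc, char *argv[])
{
    // CreateFileW is called explicitly because the file names are wide strings
    AutoClean::SafeHandle handle1 = CreateFileW(L"testfile1.txt", GENERIC_READ, 0, nullptr,
        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);

    AutoClean::SafeHandle handle2 = CreateFileW(L"testfile2.txt", GENERIC_READ, 0, nullptr,
        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);

    // %p prints pointer-sized handles correctly on both x86 and x64
    fprintf(stderr, "Handle 1: %p\n"
        "Handle 2: %p\n",
        handle1(), handle2());

    // handle1's original file is closed; handle1 now owns handle2's file and handle2 is empty
    handle1 = std::move(handle2);

    fprintf(stderr, "Handle 1: %p\n"
        "Handle 2: %p\n",
        handle1(), handle2());

    return 0;
}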
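
The class above depends on Windows types and functions, so it only builds with the Windows SDK. For readers on other platforms, here is a portable sketch of the same RAII idea using only the C++ standard library; it is not the author's code, and SafeResource, CloseFile, FreeMemory, SafeFile and SafeMemory are hypothetical names. It mirrors SafeObject (move-only, a per-type cleanup function, a sentinel "invalid" value) but takes the invalid value as a template parameter of type T, which avoids the PVOID casts.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <utility>

// T is the raw resource type, Cleanup releases it, and InvalidValue
// marks an object that owns nothing.
template <typename T, bool (*Cleanup)(T), T InvalidValue>
class SafeResource final
{
public:
    SafeResource() : m_obj{ InvalidValue } {}
    explicit SafeResource(T obj) : m_obj{ obj } {}

    SafeResource(const SafeResource &) = delete;
    SafeResource &operator=(const SafeResource &) = delete;

    // Moving transfers ownership and leaves the source empty.
    SafeResource(SafeResource &&other) noexcept : m_obj{ other.m_obj }
    {
        other.m_obj = InvalidValue;
    }

    SafeResource &operator=(SafeResource &&other) noexcept
    {
        if (this != &other) // self-move must not release our own resource
        {
            Reset();
            m_obj = other.m_obj;
            other.m_obj = InvalidValue;
        }
        return *this;
    }

    ~SafeResource() { Reset(); }

    bool IsValid() const { return m_obj != InvalidValue; }
    T Get() const { return m_obj; }

    // Releases the current resource (if any) and returns to the empty state.
    void Reset()
    {
        if (IsValid())
            (void)Cleanup(m_obj);
        m_obj = InvalidValue;
    }

private:
    T m_obj;
};

// Cleanup functions; fclose returns 0 on success, free cannot fail.
inline bool CloseFile(std::FILE *f) { return std::fclose(f) == 0; }
inline bool FreeMemory(void *p) { std::free(p); return true; }

using SafeFile = SafeResource<std::FILE *, CloseFile, nullptr>;
using SafeMemory = SafeResource<void *, FreeMemory, nullptr>;
```

Usage follows the Windows version: for example, `SafeFile f(std::fopen("log.txt", "r"));` closes the stream when `f` goes out of scope, and assigning `std::move(f)` to another SafeFile hands the stream over without closing it.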

The Visual Studio 2015 RC project for this example can be found here. The source code is viewable on GitHub here.

Edit: Added some additional features to the example code per requests via Twitter.