
Bypassing Product Key Authentication (1/2)

July 21st, 2017

This post covers product key authentication in applications and how it can be bypassed. It aims to be a detailed walkthrough of how to locate the functions that perform this check in a target application, and of the ways an application can be modified so that it accepts invalid product keys. The post focuses on a concrete application and involves reverse engineering the code responsible for performing the authentication. For now, only the calling code will be investigated; this is not a post about reverse engineering the key validation algorithm itself, although that may come at a later date.

Tools

Not much is needed here outside of the standard tools. Below is what was used when creating this post:

  • Cheat Engine for memory scanning
  • x64dbg for dynamic analysis
  • The installer executable (SETUP.EXE) that comes on the Age of Mythology CD, SHA1 hash AC9241F632FFB0D845E404FC06C3A204D2EE1B99

The Target

The target for this post is Age of Mythology. The game requires a valid product key as part of the installation process. The verification is performed entirely locally by the installer; no online activation is required, which would greatly complicate the process.

The installation process requires a valid 25-character product key, which is found on the physical CD case. Entering an invalid key produces an error dialog stating that the product key is invalid, and the installation cannot continue.

The goal then is to bypass this process and be able to install the game without having a valid product key. This will involve finding the code responsible for calling the authentication function(s), reverse engineering it to understand how it works, and then finding a way to modify it so that it is possible to proceed in the installation process without having a valid key.

Finding the Function

As mentioned above, the natural starting point is to find where the product key is being verified. This can be accomplished in multiple ways, each one having its own benefits and drawbacks. For this example, the approach I took involved finding the key in memory and seeing where it was accessed. This was done by inputting a key into the box and then searching for the string in the process memory using Cheat Engine.

This search returned a single address. Using Cheat Engine’s “Find out what writes to this address” feature on it then produced a list of instructions that write to the key, which gave further leads to investigate.
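The string search in the first step is conceptually a byte-pattern scan over the target’s memory. The sketch below shows the idea over a single buffer that stands in for a region already copied out of the process; FindAllOccurrences is a hypothetical helper, and this is not how Cheat Engine is implemented internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return the offset of every occurrence of `needle`
// inside `region`, a buffer standing in for memory copied out of the target.
std::vector<std::size_t> FindAllOccurrences(const std::vector<unsigned char> &region,
                                            const std::string &needle)
{
    std::vector<std::size_t> offsets;
    if (needle.empty())
    {
        return offsets;
    }

    // Compare raw bytes so characters above 0x7F match regardless of char signedness.
    const std::vector<unsigned char> pattern(needle.begin(), needle.end());

    auto it = region.begin();
    while (true)
    {
        it = std::search(it, region.end(), pattern.begin(), pattern.end());
        if (it == region.end())
        {
            break;
        }
        offsets.push_back(static_cast<std::size_t>(it - region.begin()));
        ++it; // resume one byte later so overlapping matches are also reported
    }
    return offsets;
}
```

In practice the same search would run over each readable region of the target, adding the region’s base address to every offset it returns.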

At this point, it is time to attach a debugger and begin stepping through some of this code. Starting at the top of the list, the addresses in the 0x757621… range looked interesting. Such a high address suggests that they reside in a core Windows DLL. Navigating to the first address in the debugger reveals that it is part of the lstrcpyA function in kernel32.dll:

757621B0 | 6A 08                    | push 8
757621B2 | 68 B8 F5 7C 75           | push kernel32.757CF5B8
757621B7 | E8 E0 85 00 00           | call kernel32.7576A79C
757621BC | 83 65 FC 00              | and dword ptr ss:[ebp-4], 0
757621C0 | 8B 55 0C                 | mov edx, dword ptr ss:[ebp+C]
757621C3 | 8B 45 08                 | mov eax, dword ptr ss:[ebp+8]
757621C6 | 8B F0                    | mov esi, eax
757621C8 | 2B F2                    | sub esi, edx
757621CA | 8A 0A                    | mov cl, byte ptr ds:[edx]
757621CC | 88 0C 16                 | mov byte ptr ds:[esi+edx], cl
757621CF | 42                       | inc edx
757621D0 | 84 C9                    | test cl, cl
757621D2 | 75 F6                    | jne kernel32.757621CA
757621D4 | C7 45 FC FE FF FF FF     | mov dword ptr ss:[ebp-4], FFFFFFFE
757621DB | E8 01 86 00 00           | call kernel32.7576A7E1
757621E0 | C2 08 00                 | ret 8

There is nothing surprising here: the destination and source arguments are passed in [EBP+0x8] and [EBP+0xC], respectively. The source string is copied into the destination one byte at a time, in a loop that ends once the null terminator has been copied. Setting a breakpoint on this function shows that it is hit multiple times: first five times, once for each of the five parts of the key, and then once more with the entire key. For the first five hits, the call stack shows the calls coming from the following code:
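For reference, the copy loop from 0x757621C6 to 0x757621D2 behaves like the following C++ sketch. The compiled version keeps a single moving pointer (EDX) and reaches the destination through the fixed distance dst - src held in ESI; the sketch uses an index instead, because subtracting pointers into unrelated buffers is not valid C++. CopyStringA is a hypothetical name, and the SEH setup and teardown around the real function are omitted.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical equivalent of the lstrcpyA copy loop: copy bytes from src to
// dst, including the terminating NUL, and return dst as lstrcpyA does.
char *CopyStringA(char *dst, const char *src)
{
    std::size_t i = 0;
    char c;
    do
    {
        c = src[i];          // mov cl, byte ptr ds:[edx]
        dst[i] = c;          // mov byte ptr ds:[esi+edx], cl
        ++i;                 // inc edx
    } while (c != '\0');     // test cl, cl / jne
    return dst;
}
```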

0040CD06 | 50                       | push eax                                         |
0040CD07 | 68 D0 D8 47 00           | push ebud71f.47D8D0                              | 47D8D0:"11111"
0040CD0C | FF D6                    | call esi                                         | esi:lstrcpyA
0040CD0E | 8B 0D A0 CC 47 00        | mov ecx,dword ptr ds:[47CCA0]                    |
0040CD14 | 8B 11                    | mov edx,dword ptr ds:[ecx]                       |
0040CD16 | FF 92 90 00 00 00        | call dword ptr ds:[edx+90]                       |
0040CD1C | 50                       | push eax                                         |
0040CD1D | 68 D4 D9 47 00           | push ebud71f.47D9D4                              | 47D9D4:"22222"
0040CD22 | FF D6                    | call esi                                         | esi:lstrcpyA
0040CD24 | 8B 0D A4 CC 47 00        | mov ecx,dword ptr ds:[47CCA4]                    |
0040CD2A | 8B 01                    | mov eax,dword ptr ds:[ecx]                       |
0040CD2C | FF 90 90 00 00 00        | call dword ptr ds:[eax+90]                       |
0040CD32 | 50                       | push eax                                         |
0040CD33 | 68 D8 DA 47 00           | push ebud71f.47DAD8                              | 47DAD8:"33333"
0040CD38 | FF D6                    | call esi                                         | esi:lstrcpyA
0040CD3A | 8B 0D A8 CC 47 00        | mov ecx,dword ptr ds:[47CCA8]                    |
0040CD40 | 8B 11                    | mov edx,dword ptr ds:[ecx]                       |
0040CD42 | FF 92 90 00 00 00        | call dword ptr ds:[edx+90]                       |
0040CD48 | 50                       | push eax                                         |
0040CD49 | 68 DC DB 47 00           | push ebud71f.47DBDC                              | 47DBDC:"44444"
0040CD4E | FF D6                    | call esi                                         | esi:lstrcpyA
0040CD50 | 8B 0D AC CC 47 00        | mov ecx,dword ptr ds:[47CCAC]                    |
0040CD56 | 8B 01                    | mov eax,dword ptr ds:[ecx]                       |
0040CD58 | FF 90 90 00 00 00        | call dword ptr ds:[eax+90]                       |
0040CD5E | 50                       | push eax                                         |
0040CD5F | 68 E0 DC 47 00           | push ebud71f.47DCE0                              | 47DCE0:"55555"
0040CD64 | FF D6                    | call esi                                         | esi:lstrcpyA

Looking around this code does not show anything immediately useful being done with the results of these calls. The next hit, where the entire key is copied, is more promising: moving down the call stack to the frame that returns to 0x0040D035 leads to a rather large function starting at 0x0040CFD0. This function will be the target of investigation.

Analyzing the Function

Shortly after its prologue, the function executes the following code:

0040CFE0 | 8D 44 24 2C                | lea eax,dword ptr ss:[esp+2C]                    | [esp+2C]:lstrcpyA
0040CFE4 | 6A 06                      | push 6                                           |
0040CFE6 | 50                         | push eax                                         |
0040CFE7 | BE 01 00 00 00             | mov esi,1                                        |
0040CFEC | 33 DB                      | xor ebx,ebx                                      |
0040CFEE | 68 39 01 00 00             | push 139                                         |
0040CFF3 | 51                         | push ecx                                         |
0040CFF4 | 89 74 24 20                | mov dword ptr ss:[esp+20],esi                    | [esp+20]:WaitForSingleObject
0040CFF8 | 89 5C 24 30                | mov dword ptr ss:[esp+30],ebx                    |
0040CFFC | 89 5C 24 2C                | mov dword ptr ss:[esp+2C],ebx                    | [esp+2C]:lstrcpyA
0040D000 | 89 5C 24 38                | mov dword ptr ss:[esp+38],ebx                    |
0040D004 | E8 37 3D 00 00             | call ebu9e7d.410D40                              |
0040D009 | A1 04 D1 47 00             | mov eax,dword ptr ds:[47D104]                    |
0040D00E | 8D 54 24 44                | lea edx,dword ptr ss:[esp+44]                    |
0040D012 | 6A 0A                      | push A                                           |
0040D014 | 52                         | push edx                                         |
0040D015 | 68 83 01 00 00             | push 183                                         |
0040D01A | 50                         | push eax                                         |
0040D01B | E8 20 3D 00 00             | call ebu9e7d.410D40                              |

Stepping into the function at 0x00410D40 shows that it is responsible for loading string resources from the executable:

00410D45 | 55                         | push ebp                                         |
00410D46 | 8B 6C 24 18                | mov ebp,dword ptr ss:[esp+18]                    |
...
00410D4C | 8D 44 2D 00                | lea eax,dword ptr ss:[ebp+ebp]                   |
00410D50 | 33 FF                      | xor edi,edi                                      |
00410D52 | 50                         | push eax                                         |
00410D53 | C6 03 00                   | mov byte ptr ds:[ebx],0                          |
00410D56 | E8 EF 36 04 00             | call ebu9e7d.45444A                              | malloc
00410D5B | 8B F0                      | mov esi,eax                                      |
...
00410D77 | 8B 4C 24 18                | mov ecx,dword ptr ss:[esp+18]                    | [esp+18]:lstrcpyA
00410D7B | 8B 54 24 14                | mov edx,dword ptr ss:[esp+14]                    |
00410D7F | 55                         | push ebp                                         |
00410D80 | 56                         | push esi                                         |
00410D81 | 51                         | push ecx                                         |
00410D82 | 52                         | push edx                                         |
00410D83 | FF 15 5C 64 46 00          | call dword ptr ds:[<&LoadStringA>]               |

The documentation for LoadStringA shows that the function takes the HINSTANCE of the module to load the resource from, a resource identifier, an output buffer to receive the string, and the size of that buffer. Three of the four parameters are passed straight through from the arguments of 0x00410D40; the exception is the output buffer, which is allocated by a call to a malloc wrapper with a size of (2 * nBufferMax) bytes. If the call succeeds, the loaded string is copied into the third parameter of 0x00410D40 and null-terminated:

00410D8D | 8B CD                      | mov ecx, ebp
00410D8F | 8B FB                      | mov edi, ebx
00410D91 | 8B C1                      | mov eax, ecx
00410D93 | C1 E9 02                   | shr ecx, 2
00410D96 | F3 A5                      | rep movsd dword ptr es:[edi], dword ptr ds:[esi]
00410D98 | 8B C8                      | mov ecx, eax
00410D9A | 83 E1 03                   | and ecx, 3
00410D9D | F3 A4                      | rep movsb byte ptr es:[edi], byte ptr ds:[esi]
00410D9F | C6 44 2B FF 00             | mov byte ptr ds:[ebx+ebp-1], 0
00410DA4 | 8A 03                      | mov al, byte ptr ds:[ebx]
00410DA6 | 84 C0                      | test al, al
00410DA8 | 8B F3                      | mov esi, ebx

This is the classic inlined memcpy pattern: the length is shifted right by 2 to copy whole DWORDs with rep movsd, then masked with 3 to copy the remaining 0 to 3 bytes with rep movsb. The last byte of the destination buffer is then set to zero. The function goes on to perform some checks involving code pages, presumably because the resource data can be in an ANSI code page or a DBCS code page. It then calls two other functions, at 0x00410200 and 0x00454361, which at a quick glance appear to verify some information about the resource, and finally returns the length of the loaded resource.
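The copy-and-terminate sequence from 0x00410D8D to 0x00410D9F can be expressed in C++ roughly as follows. This is an illustrative sketch: CopyAndTerminate is a hypothetical name, and it assumes, as the assembly does, that the destination holds at least count bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical equivalent of the inlined copy in 0x00410D40: copy `count`
// bytes in 4-byte chunks, copy the 0-3 leftover bytes one at a time, then
// overwrite the final destination byte with a NUL terminator.
void CopyAndTerminate(unsigned char *dst, const unsigned char *src, std::size_t count)
{
    const std::size_t dwords = count >> 2;   // shr ecx, 2
    const std::size_t tail = count & 3;      // and ecx, 3

    for (std::size_t i = 0; i < dwords; ++i) // rep movsd
    {
        std::uint32_t chunk;
        std::memcpy(&chunk, src, sizeof(chunk)); // memcpy avoids unaligned access
        std::memcpy(dst, &chunk, sizeof(chunk));
        src += 4;
        dst += 4;
    }
    for (std::size_t i = 0; i < tail; ++i)   // rep movsb
    {
        *dst++ = *src++;
    }

    // mov byte ptr ds:[ebx+ebp-1], 0: dst has advanced by count bytes, so
    // dst[-1] is the last byte of the original destination buffer.
    if (count != 0)
    {
        dst[-1] = 0;
    }
}
```

With nBufferMax = 6 and the string “69405”, one DWORD and two single bytes are copied, and the sixth byte is forced to zero.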

Stepping over these calls in the function at 0x0040CFD0 reveals that the loaded resources are:

  • “69405” at resource index 0x139
  • “Z09-00001” at resource index 0x183
  • A missing resource at index 0x185

The function then tries to load a DLL and retrieve the address of one of its exported functions. The executed code is shown below:

0040D07B | 68 A8 D3 46 00             | push ebu9e7d.46D3A8                              | 46D3A8:"%SETUPEXEDIR"
0040D080 | F3 AB                      | rep stosd dword ptr es:[edi],eax                 | edi:"\\PidGen.dll"
0040D082 | 8D 4C 24 64                | lea ecx,dword ptr ss:[esp+64]                    |
0040D086 | 88 5C 24 44                | mov byte ptr ss:[esp+44],bl                      |
0040D08A | 51                         | push ecx                                         |
0040D08B | C7 84 24 70 02 00 00 00 01 | mov dword ptr ss:[esp+270],100                   |
0040D096 | 66 AB                      | stosw word ptr es:[edi],ax                       | edi:"\\PidGen.dll"
0040D098 | FF D5                      | call ebp                                         | ebp:lstrcpyA
0040D09A | 8D 54 24 60                | lea edx,dword ptr ss:[esp+60]                    |
0040D09E | 68 04 01 00 00             | push 104                                         |
0040D0A3 | 52                         | push edx                                         |
0040D0A4 | E8 57 31 00 00             | call ebu9e7d.410200                              |
0040D0A9 | 8D 44 24 68                | lea eax,dword ptr ss:[esp+68]                    |
0040D0AD | 50                         | push eax                                         |
0040D0AE | E8 FD 4E 00 00             | call ebu9e7d.411FB0                              |
0040D0B3 | 8B 35 20 63 46 00          | mov esi,dword ptr ds:[<&lstrcat>]                |
0040D0B9 | 83 C4 0C                   | add esp,C                                        |
0040D0BC | 8D 4C 24 60                | lea ecx,dword ptr ss:[esp+60]                    |
0040D0C0 | 68 2C D8 46 00             | push ebu9e7d.46D82C                              | 46D82C:"PidGen.dll"
0040D0C5 | 51                         | push ecx                                         |
0040D0C6 | FF D6                      | call esi                                         |
0040D0C8 | 8D 54 24 60                | lea edx,dword ptr ss:[esp+60]                    |
0040D0CC | 52                         | push edx                                         |
0040D0CD | E8 AE 25 00 00             | call ebu9e7d.40F680                              |
0040D0D2 | 83 C4 04                   | add esp,4                                        |
0040D0D5 | 85 C0                      | test eax,eax                                     |
0040D0D7 | 75 46                      | jne ebu9e7d.40D11F                               |
0040D0D9 | 8D 44 24 60                | lea eax,dword ptr ss:[esp+60]                    |
0040D0DD | 50                         | push eax                                         |
0040D0DE | 68 04 01 00 00             | push 104                                         |
0040D0E3 | FF 15 00 63 46 00          | call dword ptr ds:[<&GetTempPathA>]              |
0040D0E9 | 8D 4C 24 60                | lea ecx,dword ptr ss:[esp+60]                    |
0040D0ED | 51                         | push ecx                                         |
0040D0EE | E8 BD 4E 00 00             | call ebu9e7d.411FB0                              |
0040D0F3 | 83 C4 04                   | add esp,4                                        |
0040D0F6 | 8D 54 24 60                | lea edx,dword ptr ss:[esp+60]                    |
0040D0FA | 68 2C D8 46 00             | push ebu9e7d.46D82C                              | 46D82C:"PidGen.dll"
0040D0FF | 52                         | push edx                                         |
0040D100 | FF D6                      | call esi                                         |
0040D102 | 8D 44 24 60                | lea eax,dword ptr ss:[esp+60]                    |
0040D106 | 50                         | push eax                                         |
0040D107 | E8 74 25 00 00             | call ebu9e7d.40F680                              |
0040D10C | 83 C4 04                   | add esp,4                                        |
0040D10F | 85 C0                      | test eax,eax                                     |
0040D111 | 75 0C                      | jne ebu9e7d.40D11F                               |
0040D113 | 8D 4C 24 60                | lea ecx,dword ptr ss:[esp+60]                    |
0040D117 | 68 20 BB 47 00             | push ebu9e7d.47BB20                              |
0040D11C | 51                         | push ecx                                         |
0040D11D | FF D5                      | call ebp                                         | ebp:lstrcpyA
0040D11F | 38 5C 24 60                | cmp byte ptr ss:[esp+60],bl                      |
0040D123 | 0F 84 51 01 00 00          | je ebu9e7d.40D27A                                |
0040D129 | 8D 54 24 60                | lea edx,dword ptr ss:[esp+60]                    |
0040D12D | 52                         | push edx                                         |
0040D12E | FF 15 98 61 46 00          | call dword ptr ds:[<&LoadLibraryA>]              |
0040D134 | 8B F0                      | mov esi,eax                                      |
0040D136 | 3B F3                      | cmp esi,ebx                                      |
0040D138 | 89 74 24 24                | mov dword ptr ss:[esp+24],esi                    |
0040D13C | 0F 84 38 01 00 00          | je ebu9e7d.40D27A                                |
0040D142 | 68 20 D8 46 00             | push ebu9e7d.46D820                              | 46D820:"PIDGenSimpA"
0040D147 | 56                         | push esi                                         |
0040D148 | FF 15 E4 62 46 00          | call dword ptr ds:[4662E4]                       |
0040D14E | 3B C3                      | cmp eax,ebx                                      |

The first part, up to 0x0040D11D, builds the path to PidGen.dll. It starts from the %SETUPEXEDIR string, and if the file is not found in that directory, it falls back to the directory returned by GetTempPathA. Helper functions called along the way make sure that the file exists and has the correct file attributes; in particular, the return value of 0x0040F680 decides whether each candidate path is used. Once the path is built, PidGen.dll is loaded with a call to LoadLibraryA, and the address of its PIDGenSimpA export is retrieved with a call to GetProcAddress (the call through ds:[4662E4] at 0x0040D148).
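The search order can be summarized with the C++ sketch below. It is an approximation under stated assumptions: FindPidGen and the fileIsUsable callback are hypothetical stand-ins for the code from 0x0040D07B to 0x0040D11D and the checks made by 0x0040F680, the trailing-backslash handling is assumed, and the final fallback is modelled as an empty path because the code at 0x0040D11F treats an empty string as failure.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of how the installer picks a PidGen.dll path: try each
// candidate directory in order and return the first path that passes the
// file check. An empty result corresponds to the je at 0x0040D123, which
// skips LoadLibraryA entirely.
std::string FindPidGen(const std::vector<std::string> &directories,
                       const std::function<bool(const std::string &)> &fileIsUsable)
{
    for (const std::string &dir : directories)
    {
        std::string path = dir;
        if (!path.empty() && path.back() != '\\')
        {
            path += '\\';          // assumed: directory normalized to end in a backslash
        }
        path += "PidGen.dll";      // lstrcat with "PidGen.dll"
        if (fileIsUsable(path))    // stand-in for the 0x0040F680 check
        {
            return path;
        }
    }
    return std::string();          // assumed empty fallback: treated as "not found"
}
```

The directory list would hold the expanded %SETUPEXEDIR directory followed by the GetTempPathA result.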

The Verification Call

The call to PIDGenSimpA shows that it takes nine parameters:

0040D156 | 8D 4C 24 28                | lea ecx,dword ptr ss:[esp+28]                    |
0040D15A | 8D 54 24 20                | lea edx,dword ptr ss:[esp+20]                    |
0040D15E | 51                         | push ecx                                         |
0040D15F | 8D 8C 24 6C 02 00 00       | lea ecx,dword ptr ss:[esp+26C]                   |
0040D166 | 52                         | push edx                                         |
0040D167 | 51                         | push ecx                                         |
0040D168 | 8B 4C 24 28                | mov ecx,dword ptr ss:[esp+28]                    |
0040D16C | 8D 54 24 4C                | lea edx,dword ptr ss:[esp+4C]                    |
0040D170 | 52                         | push edx                                         |
0040D171 | 51                         | push ecx                                         |
0040D172 | 8D 54 24 28                | lea edx,dword ptr ss:[esp+28]                    |
0040D176 | 8D 4C 24 48                | lea ecx,dword ptr ss:[esp+48]                    |
0040D17A | 52                         | push edx                                         |
0040D17B | 51                         | push ecx                                         |
0040D17C | 8B 8C 24 88 03 00 00       | mov ecx,dword ptr ss:[esp+388]                   |
0040D183 | 8D 54 24 48                | lea edx,dword ptr ss:[esp+48]                    |
0040D187 | 52                         | push edx                                         |
0040D188 | 51                         | push ecx                                         |
0040D189 | FF D0                      | call eax                                         |

At the time of the call, the stack looks like the following:

0496F23C  0496F5E0  "1111122222333334444455555"
0496F240  0496F28C  "69405"
0496F244  0496F294  "Z09-00001"
0496F248  0496F274  
0496F24C  00000000  
0496F250  0496F2A0  
0496F254  0496F4C8  
0496F258  0496F280  
0496F25C  0496F288  

The first three parameters are obvious: they are the input product key and the two resources that were loaded from the executable earlier in the function. The other six are harder to pin down, however. Stepping back through the function and looking at where these addresses come from shows that:

  • 0x0496F274 is the buffer for the third string resource, at index 0x185. Since that resource does not exist, the string remains empty.
  • The value 0 is set at the start of the function. It would be set to 1 if the call to 0x004112D0 returned 1, which it does not in this case.
  • 0x0496F2A0 is not initialized anywhere, so it is likely an optional or output parameter.
  • 0x0496F4C8 is written to at 0x0040D08B, which stores 0x100 (256). Following 0x0496F4C8 in the memory dump shows the bytes 00 01 00 00, the little-endian representation of 0x100.
  • 0x0496F280 is not initialized anywhere, so it is likely an optional or output parameter.
  • 0x0496F288 is not initialized anywhere, so it is likely an optional or output parameter.
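The little-endian reading of the bytes at 0x0496F4C8 can be checked with a short C++ sketch: x86 stores the least significant byte first, so 00 01 00 00 assembles into 0x00000100. ReadLittleEndian32 is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Assemble a 32-bit value from four bytes stored least significant byte
// first, which is how x86 lays out a DWORD in memory.
std::uint32_t ReadLittleEndian32(const unsigned char bytes[4])
{
    return static_cast<std::uint32_t>(bytes[0]) |
           (static_cast<std::uint32_t>(bytes[1]) << 8) |
           (static_cast<std::uint32_t>(bytes[2]) << 16) |
           (static_cast<std::uint32_t>(bytes[3]) << 24);
}
```

The same rule applies to the instruction encoding: the displacement bytes 70 02 00 00 in the instruction at 0x0040D08B decode to 0x270, the [esp+270] shown in its disassembly.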

Since an invalid key is being entered, it will be useful to study the failure case and see what conditions bring up the “Invalid Product Key” popup. The call to PIDGenSimpA returns with a value of 1 when an invalid product key is entered. The following instructions are then executed:

0040D18B | 3B C3                      | cmp eax,ebx                                      |
0040D18D | 75 64                      | jne ebu9e7d.40D1F3                               |
...
0040D1F3 | 8B 0D 04 D1 47 00          | mov ecx,dword ptr ds:[47D104]                    |
0040D1F9 | 8D 84 24 64 01 00 00       | lea eax,dword ptr ss:[esp+164]                   |
0040D200 | 68 04 01 00 00             | push 104                                         |
0040D205 | 50                         | push eax                                         | eax:"Invalid Product Key"
0040D206 | 68 A3 00 00 00             | push A3                                          |
0040D20B | 51                         | push ecx                                         |
0040D20C | E8 2F 3B 00 00             | call ebu9e7d.410D40                              |
0040D211 | A1 0C D1 47 00             | mov eax,dword ptr ds:[47D10C]                    |
0040D216 | 83 C4 10                   | add esp,10                                       |
0040D219 | 8D 94 24 64 01 00 00       | lea edx,dword ptr ss:[esp+164]                   |
0040D220 | 52                         | push edx                                         | edx:"Invalid Product Key"
0040D221 | 6A 30                      | push 30                                          |
0040D223 | 50                         | push eax                                         |
0040D224 | EB 32                      | jmp ebu9e7d.40D258                               |
...
0040D258 | E8 23 26 00 00             | call ebu9e7d.40F880                              |

The “Invalid Product Key” popup appears after stepping over the call at 0x0040D258, which calls 0x0040F880. This concludes the analysis of the error case. The next part will cover what happens in the success case and how to get the application to take the success path with an invalid product key.

Thanks for reading and follow on Twitter for more updates.

Heap Tracking

December 23rd, 2015

This post covers finding and inspecting differences in a process heap over time. It describes two techniques: a non-invasive one that iterates over the heap entries of a separate process and copies them, and an invasive one that uses dynamic binary instrumentation to track all heap writes. Heap tracking is useful for monitoring large-scale changes in an application over time, for example, inspecting the state of the heap, and potentially which data structures were modified, after pressing a button or performing some complex action.

Non-invasive Heap Diffing

The non-invasive technique relies on remotely reading every allocated heap block in a target process and copying the bytes into the inspecting process. Once this iteration is done, the result is a snapshot of the heap that can be diffed against another snapshot taken later to see how the heap state changed. The traversal is done with the Heap32ListFirst/Heap32ListNext and Heap32First/Heap32Next functions from the Toolhelp API. The traversal code is shown below:

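#include <windows.h>
#include <tlhelp32.h>
#include <cstdio>
#include <cstdlib>
#include <map>
#include <memory>

// Assumption: Heap is an address -> byte map, matching how ReadHeapData fills it.
using Heap = std::map<DWORD_PTR, unsigned char>;

// Assumed to be defined elsewhere (not shown in this post):
//   NtSuspendProcess / NtResumeProcess - undocumented ntdll.dll exports resolved at runtime
//   IsReadable - checks that a remote address range is committed and readable
void ReadHeapData(const HANDLE processHandle, const DWORD_PTR heapAddress, const size_t size, Heap &heapInfo,
    std::unique_ptr<unsigned char[]> &heapBuffer, size_t &reserveSize);

Heap EnumerateProcessHeap(const DWORD processId, const HANDLE processHandle)
{
    HANDLE snapshot = CreateToolhelp32Snapshot(TH32CS_SNAPHEAPLIST, processId);
    if (snapshot == INVALID_HANDLE_VALUE)
    {
        fprintf(stderr, "Could not create toolhelp snapshot. "
            "Error = 0x%X\n", GetLastError());
        exit(-1);
    }

    Heap processHeapInfo;

    // Freeze the target so its heap does not change while it is being copied.
    (void)NtSuspendProcess(processHandle);

    // Reusable read buffer; ReadHeapData grows it when a larger block is found.
    size_t reserveSize = 4096;
    std::unique_ptr<unsigned char[]> heapBuffer(new unsigned char[reserveSize]);

    HEAPLIST32 heapList = { 0 };
    heapList.dwSize = sizeof(HEAPLIST32);
    if (Heap32ListFirst(snapshot, &heapList))
    {
        do
        {
            HEAPENTRY32 heapEntry = { 0 };
            heapEntry.dwSize = sizeof(HEAPENTRY32);

            if (Heap32First(&heapEntry, processId, heapList.th32HeapID))
            {
                do
                {
                    // dwBlockSize is the size of the heap block; dwSize is only
                    // the size of the HEAPENTRY32 structure itself.
                    if (IsReadable(processHandle, heapEntry.dwAddress, heapEntry.dwBlockSize))
                    {
                        ReadHeapData(processHandle, heapEntry.dwAddress, heapEntry.dwBlockSize,
                            processHeapInfo, heapBuffer, reserveSize);
                    }

                    heapEntry.dwSize = sizeof(HEAPENTRY32);
                } while (Heap32Next(&heapEntry));
            }

            heapList.dwSize = sizeof(HEAPLIST32);
        } while (Heap32ListNext(snapshot, &heapList));
    }

    (void)NtResumeProcess(processHandle);

    (void)CloseHandle(snapshot);

    return processHeapInfo;
}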

For every heap list and each of its heap entries, the heap block is read and its contents are stored as address -> byte pairs. The remote read is just a wrapper around ReadProcessMemory:

void ReadHeapData(const HANDLE processHandle, const DWORD_PTR heapAddress, const size_t size, Heap &heapInfo,
    std::unique_ptr<unsigned char[]> &heapBuffer, size_t &reserveSize)
{
    if (size > reserveSize)
    {
        heapBuffer = std::unique_ptr<unsigned char[]>(new unsigned char[size]);
        reserveSize = size;
    }
 
    SIZE_T bytesRead = 0;
    const BOOL success = ReadProcessMemory(processHandle, (LPCVOID)heapAddress, heapBuffer.get(), size, &bytesRead);
 
    if (success == 0)
    {
        fprintf(stderr, "Could not read process memory at 0x%p. "
            "Error = 0x%X\n", (void *)heapAddress, GetLastError());
        return;
    }
    if (bytesRead != size)
    {
        fprintf(stderr, "Could not read all process memory at 0x%p. "
            "Error = 0x%X\n", (void *)heapAddress, GetLastError());
        return;
    }
 
    for (size_t i = 0; i < size; ++i)
    {
        heapInfo.emplace_hint(std::end(heapInfo), std::make_pair((heapAddress + i), heapBuffer[i]));
    }
}

At this point, a snapshot of the heap has been created, made up of address -> byte pairs.

The next step is to take another snapshot at a later point in time and diff the two. Diffing the heaps involves three scenarios: a byte at the same address has changed, an entry was removed (present in the first snapshot but not the second), or a new allocation was made (present in the second snapshot but not the first). The code is straightforward: it looks up each entry of the first snapshot in the second and compares the two.

// Assumption: HeapDiff is a std::map<DWORD_PTR, std::pair<unsigned char, unsigned char>>
// of address -> (old byte, new byte). Matched entries are erased from secondHeap,
// so the caller's second snapshot is consumed by this function.
HeapDiff GetHeapDiffs(const Heap &firstHeap, Heap &secondHeap)
{
    HeapDiff heapDiff;

    for (const auto &heapEntry : firstHeap)
    {
        // Heap is keyed by address, so find() replaces a linear std::find_if scan.
        auto secondHeapEntry = secondHeap.find(heapEntry.first);

        if (secondHeapEntry != std::end(secondHeap))
        {
            if (heapEntry.second != secondHeapEntry->second)
            {
                //Entries in both heaps but are different
                heapDiff.emplace_hint(std::end(heapDiff),
                    heapEntry.first, std::make_pair(heapEntry.second, secondHeapEntry->second));
            }
            secondHeap.erase(secondHeapEntry);
        }
        else
        {
            //Entries in first heap and not in second heap
            heapDiff.emplace_hint(std::end(heapDiff),
                heapEntry.first, std::make_pair(heapEntry.second, heapEntry.second));
        }
    }

    for (const auto &newEntry : secondHeap)
    {
        //Entries in second heap and not in first heap
        heapDiff.emplace_hint(std::end(heapDiff),
            newEntry.first, std::make_pair(newEntry.second, newEntry.second));
    }

    return heapDiff;
}
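Because both snapshots are ordered by address (assuming Heap is a std::map, as its usage suggests), the diff can also be computed in one linear pass that walks the two maps side by side, like the merge step of merge sort. The sketch below uses portable stand-in types and, unlike GetHeapDiffs, records which of the three scenarios each address falls into; Snapshot, DiffKind, ByteDiff and DiffSnapshots are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using Snapshot = std::map<std::uintptr_t, unsigned char>; // address -> byte

enum class DiffKind { Changed, Removed, Added };

struct ByteDiff
{
    DiffKind kind;
    unsigned char before; // meaningful for Changed and Removed
    unsigned char after;  // meaningful for Changed and Added
};

// Walk both address-ordered snapshots in lockstep so each address is
// visited once: O(n + m) instead of one lookup per entry.
std::map<std::uintptr_t, ByteDiff> DiffSnapshots(const Snapshot &first, const Snapshot &second)
{
    std::map<std::uintptr_t, ByteDiff> diff;
    auto a = first.begin();
    auto b = second.begin();

    while (a != first.end() || b != second.end())
    {
        if (b == second.end() || (a != first.end() && a->first < b->first))
        {
            // Address only in the first snapshot.
            diff.emplace_hint(diff.end(), a->first, ByteDiff{DiffKind::Removed, a->second, 0});
            ++a;
        }
        else if (a == first.end() || b->first < a->first)
        {
            // Address only in the second snapshot.
            diff.emplace_hint(diff.end(), b->first, ByteDiff{DiffKind::Added, 0, b->second});
            ++b;
        }
        else
        {
            // Same address in both snapshots: record it only if the byte changed.
            if (a->second != b->second)
            {
                diff.emplace_hint(diff.end(), a->first, ByteDiff{DiffKind::Changed, a->second, b->second});
            }
            ++a;
            ++b;
        }
    }
    return diff;
}
```

Keeping the kind explicit avoids an ambiguity in GetHeapDiffs, where removed and added entries are both stored as (value, value) pairs and cannot be told apart from the diff alone.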

After diffing, an example run showed that the byte at heap address 0x003F0200 changed from 0x2B to 0x57, among many other changes. The last step is to merge contiguous changed bytes into blocks to make the result easier to read; that code is omitted here.

The diff can then be inspected for anything interesting, which can aid in reverse engineering an application. For example, to find where a text editor draws its text, write some text in the editor and take a snapshot. Before taking a second snapshot, change some of the text and then inspect the heap differences. In this example, some AAs were changed to BBs. The heap contents beginning at 0x0079201C contained the text and were reported as changing from A to B. Attaching a debugger and setting a breakpoint on write at 0x0079201C showed an access from 0x00402CA5, a rep movs instruction responsible for copying the text to be drawn into the buffer.

The usefulness of this technique obviously depends on the data of interest residing in the process heap.
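The code that merges contiguous changed bytes into blocks is not part of the post. A minimal sketch of that step follows, assuming a diff keyed by address with (old, new) byte pairs, as GetHeapDiffs produces; ByteDiffMap, DiffBlock and MergeContiguous are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Stand-in for the HeapDiff type: address -> (old byte, new byte).
using ByteDiffMap = std::map<std::uintptr_t, std::pair<unsigned char, unsigned char>>;

struct DiffBlock
{
    std::uintptr_t start;
    std::vector<unsigned char> before;
    std::vector<unsigned char> after;
};

// Group runs of consecutive addresses into blocks. std::map iterates in
// address order, so a new block starts whenever an address is not exactly
// one past the end of the current block.
std::vector<DiffBlock> MergeContiguous(const ByteDiffMap &diff)
{
    std::vector<DiffBlock> blocks;
    for (const auto &entry : diff)
    {
        const std::uintptr_t address = entry.first;
        if (blocks.empty() ||
            address != blocks.back().start + blocks.back().before.size())
        {
            blocks.push_back(DiffBlock{address, {}, {}});
        }
        blocks.back().before.push_back(entry.second.first);
        blocks.back().after.push_back(entry.second.second);
    }
    return blocks;
}
```

Applied to the text editor example, the bytes changed from A to B starting at 0x0079201C would collapse into a single block holding the old and new text.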

Invasive Heap Diffing

The technique described above is useful because it does not disturb the process state, aside from suspending and resuming it. The inspecting process does not run inside the target’s address space and performs all of its actions remotely. The next technique uses Intel’s Pin dynamic binary instrumentation platform to instrument a target process and monitor only heap writes. Unlike the previous technique, this means the entire state of the heap does not need to be tracked. Pin can track memory writes in a process, among many other things. A Pin tool is loaded as a DLL into the target process, so all code written in it has direct access to the process address space. Instead of traversing heap lists and heap entries remotely, the tool can therefore call HeapWalk directly to get all valid heap addresses.

In the example, all current heap addresses are kept in a std::set container. These are retrieved when the DLL is loaded in the process and instrumentation begins:

// heapAddresses is a global std::set<DWORD_PTR> declared elsewhere in the tool.
void WalkHeaps(WinApi::HANDLE *heaps, const size_t size)
{
    using namespace WinApi;
 
    fprintf(stderr, "Walking %zu heaps.\n", size);
 
    for(size_t i = 0; i < size; ++i)
    {
        // Lock the heap so other threads cannot modify it while it is walked.
        if(HeapLock(heaps[i]) == FALSE)
        {
            fprintf(stderr, "Could not lock heap 0x%p. "
                "Error = 0x%X\n", heaps[i], GetLastError());
            continue;
        }
 
        // A NULL lpData tells HeapWalk to start from the first entry.
        PROCESS_HEAP_ENTRY heapEntry = { 0 };
        heapEntry.lpData = NULL;
        while(HeapWalk(heaps[i], &heapEntry) != FALSE)
        {
            // Record every byte of the entry. wFlags is not checked, so all
            // entry types are included, not only busy allocations.
            for(size_t j = 0; j < heapEntry.cbData; ++j)
            {
                heapAddresses.insert(std::end(heapAddresses),
                    (DWORD_PTR)heapEntry.lpData + j);
            }
        }
 
        // ERROR_NO_MORE_ITEMS (0x103) means the walk reached the end normally.
        fprintf(stderr, "HeapWalk finished with 0x%X\n", GetLastError());
 
        if(HeapUnlock(heaps[i]) == FALSE)
        {
            fprintf(stderr, "Could not unlock heap 0x%p. "
                "Error = 0x%X\n", heaps[i], GetLastError());
        }
    }
 
    size_t numHeapAddresses = heapAddresses.size();
    fprintf(stderr, "Found %zu (0x%zX) heap addresses.\n",
        numHeapAddresses, numHeapAddresses);
}

An instrumentation routine is then registered. Pin calls it for each instruction when that instruction is first instrumented (just-in-time compiled), not on every execution:

INS_AddInstrumentFunction(OnInstruction, 0);

OnInstruction checks whether the instruction writes to memory. If it does, it inserts a call to an analysis function, OnMemoryWriteBefore, which then runs before each execution of that instruction. This function checks whether the address being written to is in the heap and logs it if so.

VOID OnMemoryWriteBefore(VOID *ip, VOID *addr)
{
    if(IsInHeap(addr))
    {
        fprintf(trace, "Heap entry 0x%p has been modified.\n", addr);
    }
}
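IsInHeap is not shown in the post. The sketch below assumes heapAddresses is the std::set filled in by WalkHeaps. It uses std::uintptr_t in place of DWORD_PTR so that it builds outside Windows and Pin. It is an illustration, not the tool's actual code.

```cpp
#include <cstdint>
#include <set>

// Stand-in for the tool's global set: one entry per heap byte, filled in by WalkHeaps().
std::set<std::uintptr_t> heapAddresses;

// Returns true if `addr` was recorded as a heap byte when the heaps were walked.
bool IsInHeap(const void* addr) {
    return heapAddresses.find(reinterpret_cast<std::uintptr_t>(addr)) !=
           heapAddresses.end();
}
```

The set is populated only once, when the tool loads. Writes to blocks allocated after that point would therefore not be reported unless the set is refreshed.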

Testing this is straightforward: create a small application that allocates some data on the heap and writes to it repeatedly:

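#include <climits>
#include <cstdio>
#include <Windows.h>

int main(int argc, char *argv[])
{
    int *heapData = new int;
    *heapData = 0;
 
    // Print the address so it can be matched against the instrumentation output.
    fprintf(stdout, "Heap address: 0x%p\n", (void *)heapData);
    fflush(stdout);
 
    while(true)
    {
        // Write to the same heap location every 500 ms.
        *heapData = (*heapData + 1) % INT_MAX;
        Sleep(500);
    }
 
    return 0;
}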

Running the instrumentation against a compiled version of the code above produces the following output, showing successful instrumentation and heap tracking:

Heap entry 0x007B4B58 has been modified.
Heap entry 0x007B4B6C has been modified.
Heap entry 0x007B4B70 has been modified.
Heap entry 0x007B4B68 has been modified.
Heap entry 0x007B27C8 has been modified.
Heap entry 0x007B27C8 has been modified.
Heap entry 0x007B27C8 has been modified.
Heap entry 0x007B27C8 has been modified.
Heap entry 0x007B27C8 has been modified.
...

The Pin framework provides much more functionality than the example code uses. The code could be expanded to disassemble and interpret the writing instruction, retrieving the current heap value and the value about to be written, as in the first example.

Final Notes

This post presented a couple of techniques for finding differences in process heaps. The example code shows basic implementations but has some scaling issues. A 100MB heap diff takes about 15 minutes with the current implementation because of the large number of lookups. The code should serve as a good starting point, though it will need optimization before it can handle target applications that allocate a large amount of heap space.
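One way to reduce the lookup cost is to store each heap block as a single address range instead of keeping one set entry per byte. This is an assumption, not the post's approach. With HeapWalk, each PROCESS_HEAP_ENTRY supplies a start (lpData) and a size (cbData) that could be passed to something like the hypothetical AddHeapBlock below. A lookup then scales with the number of blocks rather than the number of bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical range table: block start address -> block size in bytes.
// Blocks are assumed not to overlap.
std::map<std::uintptr_t, std::size_t> heapBlocks;

void AddHeapBlock(std::uintptr_t start, std::size_t size) {
    if (size != 0) {
        heapBlocks[start] = size;
    }
}

// Returns true if `addr` falls inside any recorded block.
bool IsInHeapRange(std::uintptr_t addr) {
    // First block that starts strictly after addr...
    auto it = heapBlocks.upper_bound(addr);
    if (it == heapBlocks.begin()) {
        return false;  // addr is below every recorded block
    }
    // ...so the previous block is the only one that could contain addr.
    --it;
    return addr - it->first < it->second;
}
```

This also cuts memory use. A 100MB heap would otherwise need roughly a hundred million set nodes.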

Code

The Visual Studio 2015 project for this example can be found here. The source code is viewable on Github here. Thanks for reading and follow on Twitter for more updates.

Runtime DirectX Hooking

December 14th, 2015

This post will cover the topic of hooking DirectX in a running application. It focuses on DirectX9 specifically, but the general technique applies to any version. A previous, similar post covered virtual table hooking for DirectX10 and DirectX11 (with minor adjustments). That technique required starting a process in a suspended state and then hooking to get the device pointer. This post instead aims to establish a technique for hooking DirectX applications that are already running, so the hook can be installed at any time.

Motivations

The motivations are similar to those of the previous post. By hooking the DirectX device, we can inspect or change the properties of rendered scenes (e.g. depth testing, object colors), overlay text or images, display visual information more clearly, or do anything else with the scene. However, achieving anything beyond the basics also takes a lot of effort reverse engineering the actual application; simply having access to the rendered scene won’t get you very far.

An example of DirectX hooking used to give certain models a bright color and to allow seeing through objects that obstruct the view.

SC2Console

An example of outputting reverse engineered data from a client and overlaying it as text in the application. This is a pretty awesome project whose description and source code are available here.
Techniques

Typically when hooking DirectX, there are several popular options:

  • Hook IDirect3D9::CreateDevice and store the IDirect3DDevice9 pointer that is initialized when the function returns successfully. This needs to be done when the process is started in a suspended state, otherwise the device will have already been initialized.
  • Perform a byte pattern scan in memory for the signature of IDirect3DDevice9::EndScene, or any other DirectX function.
  • Create a dummy IDirect3DDevice9 instance, read its virtual table, find the address of EndScene, and hook at the target site.
  • Look for the CD3DBase::EndScene symbol in d3d9.dll and get its address.
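The second option, a byte pattern scan with wildcards, can be illustrated with a naive linear scan over a buffer. This is a hypothetical sketch. In practice the buffer would be d3d9.dll's mapped image. The post does not give an actual EndScene signature, which would likely vary between d3d9.dll builds, so the bytes in the test are only a generic x86 function prologue.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One pattern element: a byte value, or a wildcard that matches any byte.
struct PatternByte {
    std::uint8_t value;
    bool wildcard;
};

// Returns the offset of the first match of `pattern` in `data`,
// or `size` if there is no match (or the pattern is empty).
std::size_t FindPattern(const std::uint8_t* data, std::size_t size,
                        const std::vector<PatternByte>& pattern) {
    if (pattern.empty() || pattern.size() > size) {
        return size;
    }
    for (std::size_t i = 0; i + pattern.size() <= size; ++i) {
        bool match = true;
        for (std::size_t j = 0; j < pattern.size(); ++j) {
            // Wildcards match anything; other bytes must be equal.
            if (!pattern[j].wildcard && data[i + j] != pattern[j].value) {
                match = false;
                break;
            }
        }
        if (match) {
            return i;
        }
    }
    return size;
}
```

The fragility of this approach is visible here: any compiler or patch change to the target bytes silently breaks the signature.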

Each one has its drawbacks, but my personal preference is the last option. It’s the one that offers the greatest reliability for the least amount of overhead code. The code for it is pretty straightforward, with the help of the Windows debugging APIs:

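#include <Windows.h>
#include <DbgHelp.h>
#include <cstdio>
#pragma comment(lib, "Dbghelp.lib")

const DWORD_PTR GetAddressFromSymbols()
{
    // fInvadeProcess = TRUE loads symbols for every module already in the
    // process. A nullptr search path uses the default path, which includes
    // _NT_SYMBOL_PATH. CD3DBase::EndScene is not exported by d3d9.dll, so
    // its PDB must be reachable (e.g. through a symbol server).
    BOOL success = SymInitialize(GetCurrentProcess(), nullptr, TRUE);
    if (!success)
    {
        fprintf(stderr, "Could not load symbols for process.\n");
        return 0;
    }
 
    // MaxNameLen stays 0 because only the address is needed.
    SYMBOL_INFO symInfo = { 0 };
    symInfo.SizeOfStruct = sizeof(SYMBOL_INFO);
 
    success = SymFromName(GetCurrentProcess(), "d3d9!CD3DBase::EndScene", &symInfo);
    const DWORD_PTR address = success ? (DWORD_PTR)symInfo.Address : 0;
    if (!success)
    {
        fprintf(stderr, "Could not get symbol address.\n");
    }
 
    // Release the symbol handler's resources for this process.
    (void)SymCleanup(GetCurrentProcess());
    return address;
}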

Once the address is retrieved, it’s simply a matter of installing the hook and writing code in the new hook function. The Hekate engine was used for hook installation/removal, making the code simple:

const bool Hook(const DWORD_PTR address, const DWORD_PTR hookAddress)
{
    pHook = std::unique_ptr<Hekate::Hook::InlineHook>(new Hekate::Hook::InlineHook(address, hookAddress));
 
    if (!pHook->Install())
    {
        fprintf(stderr, "Could not hook address 0x%p -> 0x%p\n", (void *)address, (void *)hookAddress);
    }
 
    return pHook->IsHooked();
}

The EndScene function was chosen specifically because of how DirectX9 applications are structured. For those unfamiliar with DirectX, rendering a scene generally proceeds as follows: BeginScene -> draw the scene -> EndScene -> Present. Some DirectX9 hook implementations hook Present instead of EndScene; the choice is a matter of preference unless the target application does something unusual. In the example application, some text is overlaid on top of the scene:

HRESULT WINAPI EndSceneHook(void *pDevicePtr)
{
    using pFncOriginalEndScene = HRESULT (WINAPI *)(void *pDevicePtr);
    pFncOriginalEndScene EndSceneTrampoline =
        (pFncOriginalEndScene)pHook->TrampolineAddress();
 
    IDirect3DDevice9 *pDevice = (IDirect3DDevice9 *)pDevicePtr;
    ID3DXFont *pFont = nullptr;
 
    HRESULT result = D3DXCreateFont(pDevice, 30, 0, FW_NORMAL, 1, false,
        DEFAULT_CHARSET, OUT_DEFAULT_PRECIS, ANTIALIASED_QUALITY,
        DEFAULT_PITCH | FF_DONTCARE, L"Consolas", &pFont);
    if (FAILED(result))
    {
        fprintf(stderr, "Could not create font. Error = 0x%X\n", result);
    }
    else
    {
        RECT rect = { 0 };
        (void)SetRect(&rect, 0, 0, 300, 100);
        int height = pFont->DrawText(nullptr, L"Hello, World!", -1, &rect,
            DT_LEFT | DT_NOCLIP, -1);
        if (height == 0)
        {
            fprintf(stderr, "Could not draw text.\n");
        }
        (void)pFont->Release();
    }
 
    return EndSceneTrampoline(pDevicePtr);
}

Building as a DLL and injecting into the running application should show the text overlay (below):


Hekate supports clean unhooking, so unloading the DLL should remove the text and let the application continue undisturbed.

Code

The Visual Studio 2015 project for this example can be found here. The source code is viewable on Github here. The Hekate static library dependency is included in a separate download here and goes into the DirectXHook/lib folder. Capstone Engine is used as a runtime dependency, so capstone_x86.dll/capstone_x64.dll in DirectXHook/thirdparty/capstone/lib should be put in the same directory that the target application is running from.

Thanks for reading and follow on Twitter for more updates.

Hekate: x86/x64 Winsock Inspection/Modification (Alpha dev release)

September 9th, 2015

Introduction

This post will cover Hekate, a C++ library for interacting with Winsock traffic occurring in a remote process. The purpose of the library is to provide an easy-to-use interface for inspecting, filtering, and modifying any Winsock traffic entering or leaving a target process. Hekate aims to simplify targeted collection of data, aid in reverse engineering protocols, and potentially provide basic security auditing by letting developers fuzz, modify, or replay data being sent to their process.

What it is

Hekate is provided as a set of components that work together to hook and exfiltrate Winsock data. The final build of the project is a DLL that is injected into the target process. The project includes the following:

  1. A generic thread-safe x86/x64 inline hooking engine powered by Capstone Engine, usable for any function hooking (not just Winsock)
  2. IPC based on named pipes to allow sending data to a remote listening process
  3. Winsock specific hooks responsible for matching parameters against filters and taking appropriate action
  4. Several example projects showing the hook, filter, and modify functionality provided by the library
  5. RAII wrappers around Capstone Engine and Windows API objects that automatically clean up resources once they are no longer needed

These components are combined into the Hekate “app”, which is responsible for handling incoming commands that clients issue and sending captured data out to them.

Architecture

The injected Hekate.dll functions as a server. It listens for a client connection and, once one is established, sends data out to it. Messages exchanged between client and server use Protocol Buffers and are defined in the .proto files contained in the source code. Eight Winsock functions are currently monitored: send/sendto/WSASend/WSASendTo/recv/recvfrom/WSARecv/WSARecvFrom. Each of these functions has a corresponding protobuf message. The hook copies the call's parameters into that message, serializes it, and sends it out to the client. These messages can be found in the HekateServerProto.proto file:

message SendMessage_
{
	required int64 socket = 1;
	required bytes buffer = 2;
	required int32 length = 3;
	required int32 flags = 4;
}
...
message WSARecvFromMessage
{
	required int64 socket = 1;
	repeated int64 buffers = 2;
	repeated int32 buffer_size = 3;
	required int32 count = 4;
	required int64 bytes_received_address = 5;
	required int64 flags_address = 6;
	required int64 from_address = 7;
	required int64 from_length_address = 8;
	required int64 overlapped_address = 9;
	required int64 overlapped_routine_address = 10;
}

A client is responsible for receiving and deserializing these messages, as well as for issuing commands to the server. As of this dev release, the following commands are supported:

  • Add/Remove a hook on a Winsock function
  • Add/Remove a filter for Winsock data
  • Pause/Continue execution on filter hit
  • Replay captured data

The client can send these commands as soon as the connection to the server is established. Commands are processed synchronously, in the order they are received. The client protocol also provides a debug acknowledge flag that the server echoes back upon successful receipt of the message (for testing). Additionally, the code contains copious logging to notify developers of any errors that occur at any stage of usage. The full client-side protocol definition can be found in the HekateClientProto.proto file.

Internals: The Receive & Dispatch Loop

On startup, Hekate initializes two named pipes: \\.\pipe\HekatePipeOutbound and \\.\pipe\HekatePipeInbound. Outgoing messages are sent on HekatePipeOutbound, and incoming commands are received on HekatePipeInbound. The server waits for connections on both of these pipes and then spawns a new thread to listen for messages from the client. The incoming/outgoing message format is currently split into two parts: first a 4-byte size, then the serialized protobuf message of that size. When the server receives a message, it deserializes it and passes it to a callback provided by the app, which is responsible for parsing and dispatching it. The flow of incoming messages is IPCNamedPipe::RecvLoop -> HekateMain::RecvCallback -> IMessageHandler::Parse -> HekateMessageHandler::On{Command}Message -> HekateMain::{Command}.

A client talking to the Hekate server mimics this communication behavior closely. A client must open \\.\pipe\HekatePipeOutbound with generic read access and \\.\pipe\HekatePipeInbound with generic write access. Upon the pipe connection being established, the client is free to begin sending commands and listening for responses using the {size} -> {message} scheme described above.
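The {size} -> {message} scheme can be sketched in platform-independent form as follows. This is a hypothetical illustration with several assumptions:

- The post does not state the byte order of the 4-byte size, so little-endian (the native order on x86/x64) is assumed.
- A std::string stands in for a serialized protobuf message.
- The size and payload are combined into one buffer here, although the post describes them as two separate messages.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Frames a payload as {4-byte size}{payload}, writing the size little-endian.
std::vector<std::uint8_t> FrameMessage(const std::string& payload) {
    const std::uint32_t size = static_cast<std::uint32_t>(payload.size());
    std::vector<std::uint8_t> frame;
    for (int shift = 0; shift < 32; shift += 8) {
        frame.push_back(static_cast<std::uint8_t>((size >> shift) & 0xFF));
    }
    frame.insert(frame.end(), payload.begin(), payload.end());
    return frame;
}

// Parses one frame from the start of `buffer`. Returns false if the buffer
// does not yet hold a complete frame; otherwise stores the payload and the
// number of bytes consumed.
bool ParseFrame(const std::vector<std::uint8_t>& buffer, std::string& payload,
                std::size_t& consumed) {
    if (buffer.size() < 4) {
        return false;
    }
    std::uint32_t size = 0;
    for (int i = 0; i < 4; ++i) {
        size |= static_cast<std::uint32_t>(buffer[i]) << (8 * i);
    }
    if (buffer.size() - 4 < size) {
        return false;  // payload has not fully arrived yet
    }
    payload.assign(buffer.begin() + 4, buffer.begin() + 4 + size);
    consumed = 4 + size;
    return true;
}
```

Because ParseFrame returns false for a partial buffer, a reader can keep accumulating incoming bytes until a whole message is available.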

Dynamically Add and Remove Hooks

As mentioned above, Hekate comes with a generic x86/x64 inline hooking engine. On startup, Hekate locates the target Winsock functions mentioned earlier, after which hooks can be installed dynamically. When installing a hook, Hekate disassembles the start of the target function to determine how many bytes of whole instructions must be overwritten to fit the detour. These instructions are then relocated to a newly allocated region of memory, followed by a jump back to the original function. A comprehensive example is shown below:


The original bytes of a send function.


The appropriate amount of space has been calculated for an x86 hook. The bytes have been replaced with a push <target address> -> ret style detour. Extra bytes are padded with int 3 (breakpoint) instructions.

The hook function at 0x30BA140 is now invoked whenever send is called.


At the end of the hook function, it calls the relocated bytes, located here at 0x620000. This region contains the original relocated instructions, followed by a jump back to the instruction immediately after the hook in the send function.

The same technique is used for x64 code. When a hook is removed, the relocated bytes are written back to the start of the original function and the memory holding the relocated instructions is freed. To help ensure safe installation and removal, Hekate follows these steps:

  1. Suspend all threads (except its own).
  2. Write the instructions to process memory.
  3. Flush the instruction cache.
  4. Resume the threads.

Importantly, no hooking takes place on startup or at any point without an explicit command from the client. If the Hekate server DLL is injected into a target, all it does is listen for connections; nothing in the original target process is modified.
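The x86 detour shown in the screenshots can be sketched as follows. This is a simplified illustration, not Hekate's code. It assumes the lengths of the target's leading instructions come from a disassembler such as Capstone, and it only builds the patch bytes. It does not write process memory, suspend threads, or relocate instructions, and it does not fix up position-dependent instructions such as relative jumps. The function names and the 6-byte constant (5 bytes for push imm32 plus 1 byte for ret) belong to this sketch only. x64 needs a different sequence, which is not covered here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Size of the x86 detour: push imm32 (0x68 + 4 bytes) followed by ret (0xC3).
const std::size_t kDetourSize = 6;

// Given the lengths of the target's leading instructions, returns how many
// bytes must be overwritten so that only whole instructions are replaced,
// or 0 if the listed instructions are too short to hold the detour.
std::size_t BytesToOverwrite(const std::vector<std::size_t>& instructionLengths) {
    std::size_t total = 0;
    for (std::size_t length : instructionLengths) {
        total += length;
        if (total >= kDetourSize) {
            return total;
        }
    }
    return 0;
}

// Builds the patch: push <hookAddress>; ret; then int 3 padding up to patchSize.
std::vector<std::uint8_t> BuildX86Detour(std::uint32_t hookAddress,
                                         std::size_t patchSize) {
    std::vector<std::uint8_t> patch;
    patch.push_back(0x68);  // push imm32
    for (int shift = 0; shift < 32; shift += 8) {
        // imm32 is encoded little-endian
        patch.push_back(static_cast<std::uint8_t>((hookAddress >> shift) & 0xFF));
    }
    patch.push_back(0xC3);  // ret pops the pushed address into EIP
    while (patch.size() < patchSize) {
        patch.push_back(0xCC);  // int 3 padding for leftover bytes
    }
    return patch;
}
```

With the hook address 0x30BA140 from the screenshots, the imm32 operand is stored as the bytes 40 A1 0B 03.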

Internals: Adding a Hook

The Hekate client protocol provides a simple way to add and remove hooks. Clients only need to send the server a message with the name of the function to hook or unhook, e.g. “send”, “WSARecv”, and so on. These messages (and more) are defined in HekateClientProto.proto:

message AddHookMessage
{
	required string name = 1;
}

message RemoveHookMessage
{
	required string name = 1;
}

The flow follows the receive/dispatch loop until it reaches HekateMain::AddHook, which is responsible for installing the hook and reporting success or failure. The full call chain is HekateMain::AddHook -> HookEngine::Add -> HookBase::Install -> InlineHook::InstallImpl -> InlinePlatformHook::Hook. This is followed by a platform-specific call to InlinePlatformHook_x86::HookImpl or InlinePlatformHook_x64::HookImpl, depending on the build. Removing a hook follows a similar path through the same files, calling the corresponding Unhook/Remove functions instead.

Add and Remove Filters

Hekate allows filtering of incoming and outgoing Winsock data. There are currently three supported filter types: byte, length, and substring.

  • Byte filters match against byte(s) found at specific locations in the packet data.
  • Length filters match packets whose length is less than, equal to, or greater than a particular size.
  • Substring filters match a sequential series of bytes at any location in the packet.

Filters also make it possible to manipulate the data matched against them: you can substitute parts of a message or replace it altogether. Filters are currently matched in the order they were added: the first filter set is matched first, the second one next, and so on. There are plans to add filter priorities, but this dev release does not include them. Filters also come with a “break-on-hit” flag that halts the thread calling the target Winsock function when the filter is hit. The client is then responsible for sending a continue message to resume execution.

Internals: Adding a Filter and Matching

Adding a filter is initiated entirely on the client-side. The client specifies the match/substitute/replace parameters and forwards this information to the Hekate server, where a new filter of the appropriate type will be created and stored. As an example, below is some sample code that adds a filter and is found in the test filter project provided with the source code:

    auto firstFilter = Hekate::Protobuf::Message::HekateClientMessageBuilder::CreateSubstringFilterMessage(0x123,
        false, "first", 5);
    int replacementIndices1[] = { 12, 13, 14, 15, 16 };
    Hekate::Protobuf::Message::HekateClientMessageBuilder::AddSubstituteMessage(firstFilter, "QWERT", replacementIndices1, 5);
    WriteMessage(hPipeIn, firstFilter);

Here a substring filter with id 0x123 is created that looks for the substring “first”. If the filter matches, it substitutes “QWERT” into the packet data at indices [12, 16]. With this filter, a packet with data “This is the first buffer” will be matched and modified to read “This is the QWERT buffer”. The replacement indices do not need to match the indices where the matched data was found. Using replacement indices [0, 4] on the original message gives “QWERTis the first buffer”.
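The match-and-substitute behavior described above can be modeled in a few lines. This is a hypothetical sketch, not Hekate's implementation. The post does not say how out-of-range replacement indices are handled, so skipping them is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns true if `needle` occurs anywhere in `packet` (a substring filter hit).
bool MatchesSubstring(const std::string& packet, const std::string& needle) {
    return packet.find(needle) != std::string::npos;
}

// Writes replacement[k] at packet index indices[k]. Indices outside the
// packet are skipped (an assumption made for this sketch).
std::string Substitute(std::string packet, const std::string& replacement,
                       const std::vector<int>& indices) {
    for (std::size_t k = 0; k < replacement.size() && k < indices.size(); ++k) {
        const int index = indices[k];
        if (index >= 0 && static_cast<std::size_t>(index) < packet.size()) {
            packet[static_cast<std::size_t>(index)] = replacement[k];
        }
    }
    return packet;
}
```

Running both index sets from the example reproduces the two results quoted above.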

On the server side, a queue of filters is kept and, as mentioned, processed in the order that filters were created. Filters are created and added in HekateMain::AddFilter. In every hook function, i.e. WinsockHooks::SendHook, WinsockHooks::SendToHook, …, WinsockHooks::WSARecvFromHook, the buffer(s) are matched against the filters in the queue. This happens in WinsockHooks::CheckFilters, which starts the filter chain and reports back whether any filter has been hit. Each filter returns a FilteredInput structure, which records whether the filter was hit, whether there is new or modified data to send out, and the data bytes and length. If a filter was hit and the data changed as a result, the data from this FilteredInput structure is sent out; otherwise the original data is sent.
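A simplified model of this ordered filter chain is sketched below. FilterResult is a hypothetical stand-in for FilteredInput. Stopping at the first filter that hits is an assumption: the post only says that filters are checked in creation order and that CheckFilters reports whether any filter was hit. A length filter is included as an example.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical analogue of FilteredInput.
struct FilterResult {
    bool hit = false;
    bool modified = false;
    std::string data;
};

using Filter = std::function<FilterResult(const std::string&)>;

// Runs filters in insertion order and stops at the first hit. The original
// data is returned when nothing matched or the hit did not change the data.
std::string ApplyFilters(const std::vector<Filter>& filters, const std::string& input) {
    for (const Filter& filter : filters) {
        FilterResult result = filter(input);
        if (result.hit) {
            return result.modified ? result.data : input;
        }
    }
    return input;
}

// Example length filter: replaces any packet longer than `limit` bytes.
Filter MakeLengthGreaterFilter(std::size_t limit, const std::string& replacement) {
    return [limit, replacement](const std::string& packet) {
        FilterResult result;
        if (packet.size() > limit) {
            result.hit = true;
            result.modified = true;
            result.data = replacement;
        }
        return result;
    };
}
```

Under this model, adding a broad filter before a narrow one means the narrow one never fires. That ordering issue is the kind of problem the planned filter priorities would address.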

Replay Data

Hekate also allows outgoing data to be replayed in full. Parameters are re-sent exactly as they were: to the same socket, with the same buffer and lengths, the same flags, and any additional WSA* parameters exactly as received (e.g. the same overlapped completion routine address). By design, replayed data bypasses filters: replays call the relocated original instructions directly, so no hooks or filters are hit.

Dependencies

Hekate relies on Plog for internal logging and on Google’s Protocol Buffers for the messaging format between client and server. The protobuf compiler is not provided as part of this release; its source and release binaries are available on the Protobuf GitHub page. The version used for Hekate was 3.0.0 Alpha 3.

Building the DLL and Examples

Hekate is best built using Visual Studio 2015. Opening the Hekate.sln file shows six projects:

  1. Hekate
  2. HekateMITM
  3. HekateTestFilter
  4. HekateTestListener
  5. HekateTestSender
  6. libprotobuf

Hekate is the main project and builds the DLL that acts as the server. Before building Hekate, libprotobuf needs to be built in both Debug and Release configurations for x86 and x64. These four configurations (Debug x86, Release x86, Debug x64, Release x64) should build successfully and produce four .lib files in the /Hekate/thirdparty/protobuf/lib directory. Make sure that these four files, libprotobufd_x86.lib, libprotobuf_x86.lib, libprotobufd_x64.lib, and libprotobuf_x64.lib, are present in the directory, since the different build configurations need them. Once this is done, the Hekate project can be built. It must be built with the DebugDll/ReleaseDll configurations instead of Debug/Release. Debug/Release have been kept in case developers want to experiment with an executable locally instead of building a DLL that needs to be injected. The DebugDll/ReleaseDll configurations produce Hekate.dll in the DebugDll/ReleaseDll directories.

There are also four sample projects that serve as targets or clients for Hekate. HekateMITM is a sample client/server application that sends and receives data over localhost, with one thread sending data and the other receiving it. It should build immediately under x86/x64 and has no dependencies; it is intended as a target for testing the functionality provided by the other projects. HekateTestFilter and HekateTestListener are two sample Hekate clients. HekateTestFilter sets up three different filters, one of each type. It substitutes bytes in one message, replaces another message entirely, and pauses execution for five seconds on a third. A run of HekateMITM and HekateTestFilter is shown below. You can see the filters at work: the first message type was modified and the second message type was replaced entirely.


HekateTestListener is a passive listener client that prints the values of the parameters passed to the Winsock functions, along with the buffer. HekateTestSender is a simple target application that calls the eight Winsock functions in a loop, which is useful for debugging and testing.

Code

The Visual Studio 2015 project for this example can be found here. The source code is viewable on Github here.
This code was tested on x64 Windows 7, 8.1, and 10.

Issues

I’ve aimed to have very comprehensive logging contained in the code. The log file is currently written out to C:/Temp/log.txt, and is a good starting point if an error has occurred at runtime. Hekate.dll also relies on Capstone, so capstone_x86.dll/capstone_x64.dll must be present in the same directory as the target.

License

Hekate is provided as-is and is released under the GNU General Public License (GPL) v3 for non-commercial use only.

The code base will continue to evolve and new features will be added, so the content covered in this post might eventually become outdated. I aim to have each major release or update act as a changelog against this main post. Future plans for this project include developing a UI wrapper that allows easy interaction with and visualization of data, filters, and other aspects of what is happening to Winsock traffic in a target process. Thanks for reading and be sure to follow on Twitter for more updates.

Categories: General x86, General x86-64, Programming Tags:

Manually Enumerating Process Modules

August 20th, 2015

This post will show how to enumerate a process's loaded modules without using the Windows API functions intended for that purpose, such as EnumProcessModules. It relies on partially undocumented functionality from the native API and on the undocumented structures it returns. The implementation discussed is actually reasonably close to how EnumProcessModules works.

Undocumented Functions and Structures

The main undocumented function used here is NtQueryInformationProcess, a very general function that can return a wide variety of information about a process depending on its input parameters. Its second parameter is a PROCESSINFOCLASS value, which determines the type of information to return. The values of this parameter are largely undocumented, but a complete definition can be found here. The value of interest here is ProcessBasicInformation, which makes the function fill out a PROCESS_BASIC_INFORMATION structure before returning. In code this looks like the following:

PROCESS_BASIC_INFORMATION procBasicInfo = { 0 };
ULONG ulRetLength = 0;
NTSTATUS ntStatus = NtQueryInformationProcess(hProcess,
    PROCESS_INFORMATION_CLASS_FULL::ProcessBasicInformation, &procBasicInfo,
    sizeof(PROCESS_BASIC_INFORMATION), &ulRetLength);
if (ntStatus != STATUS_SUCCESS)
{
    fprintf(stderr, "Could not get process information. Status = %X\n",
        ntStatus);
    exit(-1);
}

This structure, too, is largely undocumented; its full definition can be found here. The field of interest is the second one, the pointer to the process's PEB. The PEB is a very large structure that is mapped into every process and contains an enormous amount of information about it, including the loaded module lists. The Ldr member of the PEB is a pointer to a PEB_LDR_DATA structure, which contains three lists. These lists contain the same modules but ordered differently: in load order, memory order, or initialization order, as their names (InLoadOrderModuleList, InMemoryOrderModuleList, InInitializationOrderModuleList) describe. Each list consists of LDR_DATA_TABLE_ENTRY entries that contain extended information about the loaded module.

Retrieving Module Information

The above definitions are all that is needed in order to implement manual module traversal. The general idea is the following:

  1. Open a handle to the target process and obtain the address of its PEB (via NtQueryInformationProcess).
  2. Read the PEB structure from the process (via ReadProcessMemory).
  3. Read the PEB_LDR_DATA from the PEB (via ReadProcessMemory).
  4. Store off the top node and begin traversing the doubly-linked list, reading each node (via ReadProcessMemory).

Writing it in C++ translates to the following:

void EnumerateProcessDlls(const HANDLE hProcess)
{
    PROCESS_BASIC_INFORMATION procBasicInfo = { 0 };
    ULONG ulRetLength = 0;
    NTSTATUS ntStatus = NtQueryInformationProcess(hProcess,
        PROCESS_INFORMATION_CLASS_FULL::ProcessBasicInformation, &procBasicInfo,
        sizeof(PROCESS_BASIC_INFORMATION), &ulRetLength);
    if (ntStatus != STATUS_SUCCESS)
    {
        fprintf(stderr, "Could not get process information. Status = %X\n",
            ntStatus);
        exit(-1);
    }
 
    PEB procPeb = { 0 };
    SIZE_T ulBytesRead = 0;
    bool bRet = BOOLIFY(ReadProcessMemory(hProcess, (LPCVOID)procBasicInfo.PebBaseAddress, &procPeb,
        sizeof(PEB), &ulBytesRead));
    if (!bRet)
    {
        fprintf(stderr, "Failed to read PEB from process. Error = %X\n",
            GetLastError());
        exit(-1);
    }
 
    PEB_LDR_DATA pebLdrData = { 0 };
    bRet = BOOLIFY(ReadProcessMemory(hProcess, (LPCVOID)procPeb.Ldr, &pebLdrData, sizeof(PEB_LDR_DATA),
        &ulBytesRead));
    if (!bRet)
    {
        fprintf(stderr, "Failed to read module list from process. Error = %X\n",
            GetLastError());
        exit(-1);
    }
 
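    // The list head is a sentinel inside the remote PEB_LDR_DATA, not an
    // LDR_DATA_TABLE_ENTRY, so compute its remote address and stop when the
    // traversal wraps back around to it.
    LIST_ENTRY *pLdrListHead = (LIST_ENTRY *)((BYTE *)procPeb.Ldr +
        FIELD_OFFSET(PEB_LDR_DATA, InLoadOrderModuleList));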
    LIST_ENTRY *pLdrCurrentNode = pebLdrData.InLoadOrderModuleList.Flink;
    do
    {
        LDR_DATA_TABLE_ENTRY lstEntry = { 0 };
        bRet = BOOLIFY(ReadProcessMemory(hProcess, (LPCVOID)pLdrCurrentNode, &lstEntry,
            sizeof(LDR_DATA_TABLE_ENTRY), &ulBytesRead));
        if (!bRet)
        {
            fprintf(stderr, "Could not read list entry from LDR list. Error = %X\n",
                GetLastError());
            exit(-1);
        }
 
        pLdrCurrentNode = lstEntry.InLoadOrderLinks.Flink;
 
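        // UNICODE_STRING lengths are in bytes and the strings are not
        // guaranteed to be null-terminated, so cap each read to leave room
        // for a terminator in the zero-initialized buffers.
        const SIZE_T maxNameBytes = (MAX_PATH - 1) * sizeof(WCHAR);
        WCHAR strFullDllName[MAX_PATH] = { 0 };
        WCHAR strBaseDllName[MAX_PATH] = { 0 };
        if (lstEntry.FullDllName.Length > 0)
        {
            const SIZE_T bytesToRead = (lstEntry.FullDllName.Length < maxNameBytes) ?
                lstEntry.FullDllName.Length : maxNameBytes;
            bRet = BOOLIFY(ReadProcessMemory(hProcess, (LPCVOID)lstEntry.FullDllName.Buffer, &strFullDllName,
                bytesToRead, &ulBytesRead));
            if (bRet)
            {
                wprintf(L"Full Dll Name: %s\n", strFullDllName);
            }
        }
 
        if (lstEntry.BaseDllName.Length > 0)
        {
            const SIZE_T bytesToRead = (lstEntry.BaseDllName.Length < maxNameBytes) ?
                lstEntry.BaseDllName.Length : maxNameBytes;
            bRet = BOOLIFY(ReadProcessMemory(hProcess, (LPCVOID)lstEntry.BaseDllName.Buffer, &strBaseDllName,
                bytesToRead, &ulBytesRead));
            if (bRet)
            {
                wprintf(L"Base Dll Name: %s\n", strBaseDllName);
            }
        }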
 
        if (lstEntry.DllBase != nullptr && lstEntry.SizeOfImage != 0)
        {
            wprintf(
                L"  Dll Base: %p\n"
                L"  Entry point: %p\n"
                L"  Size of Image: %X\n",
                lstEntry.DllBase, lstEntry.EntryPoint, lstEntry.SizeOfImage);
        }
 
    } while (pLdrListHead != pLdrCurrentNode);
}

Code

The Visual Studio 2015 project for this example can be found here. The source code is viewable on Github here.
This code was tested on x64 Windows 7, 8.1, and 10.

Follow on Twitter for more updates.
