BlueBox Android Challenge

Some weeks ago (I don’t even remember the exact time frame) a friend made me aware of this challenge, which is basically about the injection of Dalvik code through native code. I found a few minutes this morning to look into it and, while I’m sure somebody else has already solved it, it’s a nice way of showing a bit of how to reverse engineer Android applications with the Profiler.

The first problem we encounter is that the APK (which is a Zip archive) has been tampered with. Tools incorrectly ask for a decryption key because all file headers have had their GeneralBitFlag field modified, e.g.:

; file header
; offset: 28BB2

Signature             : 02014B50
CreatorVersion        : 0014
ExtractorVersion      : 0014
GeneralBitFlag        : 0809       <-- should be 0
CompressionMethod     : 0008
LastModTime           : 2899
LastModDate           : 4262
Crc32                 : CFEF1C2F
CompressedSize        : 000000F7
UncompressedSize      : 000001D0
FileNameLength        : 001C
ExtraFieldLength      : 0000
FileCommentLength     : 0000
Disk                  : 0000
InternalFileAttributes: 0000
ExternalFileAttributes: 00000000
LocalHeaderOffset     : 00000C15
FileName              : res/menu/activity_action.xml
ExtraField            : 
FileComment           : 

A few lines of code to fix this field for all file entries in the Zip archive:

from Pro.UI import *

obj = proContext().currentScanProvider().getObject()
n = obj.GetEntryCount()
for i in range(n):
    obj.GetEntry(i).Set("GeneralBitFlag", 0)
s = obj.GetStream()
s.save(s.name() + "_fixed")
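
Outside the Profiler, the same repair can be sketched with nothing but the Python standard library. This is only a rough equivalent under a few assumptions of mine: the archive is a plain single-disk, non-Zip64 file, and by default only bit 0 (the "encrypted" bit) is cleared, so the other bits stay as they were. Passing mask=0xFFFF zeroes the whole field like the script above does. The name clear_encrypted_flag is made up for the example.

```python
import struct

EOCD_SIG = b"PK\x05\x06"     # end of central directory record
CENTRAL_SIG = b"PK\x01\x02"  # central directory file header (02014B50)
LOCAL_SIG = b"PK\x03\x04"    # local file header


def clear_encrypted_flag(data, mask=0x0001):
    """Return a copy of a Zip archive with the `mask` bits cleared in
    every general purpose bit flag (bit 0 means "encrypted")."""
    buf = bytearray(data)
    eocd = buf.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("end of central directory record not found")
    # total entry count, central directory size, central directory offset
    count, _cd_size, pos = struct.unpack_from("<HII", buf, eocd + 10)
    keep = ~mask & 0xFFFF
    for _ in range(count):
        if buf[pos:pos + 4] != CENTRAL_SIG:
            raise ValueError("unexpected central directory signature")
        # the flag sits at offset 8 of a central directory header
        flag, = struct.unpack_from("<H", buf, pos + 8)
        struct.pack_into("<H", buf, pos + 8, flag & keep)
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", buf, pos + 28)
        local, = struct.unpack_from("<I", buf, pos + 42)
        # the local file header carries its own copy of the flag at offset 6
        if buf[local:local + 4] == LOCAL_SIG:
            flag, = struct.unpack_from("<H", buf, local + 6)
            struct.pack_into("<H", buf, local + 6, flag & keep)
        pos += 46 + name_len + extra_len + comment_len
    return bytes(buf)
```

Walking the central directory from the end record, rather than searching the whole file for signatures, avoids patching bytes inside compressed data that merely look like a header.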

Now we can explore the contents of the APK with the Profiler. We have the usual 'classes.dex' file plus the native library 'lib/armeabi/libnet.so'. Let's open the library with IDA. You'll notice that it doesn't contain many functions, and just by looking at them we'll stumble upon this one:

void *__fastcall search(unsigned int a1)
{
  unsigned int v1; // r4@1
  __int32 v2; // r7@1
  int v3; // r4@1
  int v4; // r5@2
  signed int v5; // r4@3
  signed int v6; // r8@3
  int v7; // r4@3
  int v8; // r6@3
  int v9; // r0@3
  int v10; // r0@3
  void *v11; // r4@3

  v1 = a1;
  v2 = sysconf(39);
  v3 = v1 - v1 % v2;
  do
  {
    v3 -= v2;
    v4 = v3 + 40;
  }
  while ( !findmagic(v3 + 40) );
  v5 = getStrIdx(v3 + 40, "L-ÿava/lang/String;", 0x12u);
  v6 = getStrIdx(v4, "add", 3u);
  v7 = getTypeIdx(v4, v5);
  v8 = getClassItem(v4, v7);
  v9 = getMethodIdx(v4, v6, v7);
  v10 = getCodeItem(v4, v8, v9);
  v11 = (void *)(v10 + 16);
  mprotect((void *)(v10 - (v10 + 16) % (unsigned int)v2 + 16), v2, 3);
  return memcpy(v11, inject, 0xDEu);
}
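
The first half of the function can be simulated in a few lines of Python. This is just my reading of the decompiled code, not something taken from the crackme's sources: I'm assuming that sysconf(39) returns the page size, that findmagic looks for the "dex\n" magic, and that the constant 40 skips an ODEX header placed in front of the DEX data. The name find_dex_base is invented.

```python
PAGE_SIZE = 4096   # assumption: what sysconf(39) returns on the device
ODEX_HEADER = 40   # assumption: the "+ 40" skips an ODEX header


def find_dex_base(mem, addr, page=PAGE_SIZE):
    """Walk backwards page by page from `addr` until the DEX magic is
    found 40 bytes into a page, and return the offset of the magic."""
    base = addr - addr % page          # align the address down to a page
    while True:
        base -= page                   # like the do/while: step back first
        if base < 0:
            raise LookupError("DEX magic not found")
        candidate = base + ODEX_HEADER
        if mem[candidate:candidate + 4] == b"dex\n":
            return candidate
```

For instance, with the magic planted 40 bytes into the second page of a fake memory image, a call starting from an address in the fourth page walks back two pages and returns 4096 + 40.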

It's clear that at some point during the execution of the Dalvik code this function is triggered and writes the array 'inject' into the memory space of the DEX module. We can verify that the bytes are indeed Dalvik opcodes with the appropriate filter. Select the bytes representing the array in the hex view and then open the filter view:

Dalvik filter

The functions called before the actual injection locate the exact position of the code. They help us as well: back to the Profiler, let's find the method "L-ÿava/lang/String;":"add":

Methods

From here we get the class index and name. Just by looking at the disassembled class we'll notice a method filled with nops:

Method with nops

The code size of the method matches the payload size (111 * 2 = 0xDE):

Classes

Let's write back the instructions to the DEX module:

Write payload
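
For those who prefer to patch the file by hand, here is a minimal sketch of the same write-back. It relies on the layout of a DEX code_item, whose 16-byte header ends with the instruction count in 16-bit units (that's also where the "+ 16" in the native code comes from). The function name and the way the code_item offset is obtained are left to the reader and are my own assumptions.

```python
import struct

CODE_ITEM_HEADER = 16  # registers, ins, outs, tries (2 bytes each), debug_info_off, insns_size


def patch_code_item(dex, code_off, payload):
    """Overwrite the instructions of the code_item at `code_off` in the
    bytearray `dex`, refusing payloads of the wrong size."""
    insns_size, = struct.unpack_from("<I", dex, code_off + 12)
    if len(payload) != insns_size * 2:   # insns_size counts 16-bit code units
        raise ValueError("payload does not match the method's code size")
    start = code_off + CODE_ITEM_HEADER
    dex[start:start + len(payload)] = payload
```

With the method above, insns_size would be 111 and the payload 0xDE bytes long, so the size check passes.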

We could do this with a filter just as well, by the way.

And now we can analyze the injected code:

  public void zoom(java.lang.String)
  {
    /* 0002782C 12 02             */ const/4 v2, #int 0 // #0
    /* 0002782E 6E 10 1F 0C 0C 00 */ invoke-virtual {v12}, int java.lang.String.length()
    /* 00027834 0A 04             */ move-result v4
    /* 00027836 22 05 DB 01       */ new-instance v5, java.util.HashMap
    /* 0002783A 70 10 4D 0C 05 00 */ invoke-direct {v5}, void java.util.HashMap.<init>()
    /* 00027840 1A 00 00 00       */ const-string v0, ""
    /* 00027844 5B B0 FC 02       */ iput-object v0, v11, java.lang.String content
    /* 00027848 6E 10 22 0C 0C 00 */ invoke-virtual {v12}, char[] java.lang.String.toCharArray()
    /* 0002784E 0C 06             */ move-result-object v6
    /* 00027850 21 67             */ array-length v7, v6
    /* 00027852 01 23             */ move v3, v2
loc_40:
    /* 00027854 34 73 03 00       */ if-lt v3, v7, loc_46 // +3
    /* 00027858 0E 00             */ return-void
loc_46:
    /* 0002785A 49 00 06 03       */ aget-char v0, v6, v3
    /* 0002785E D8 00 00 BF       */ add-int/lit8 v0, v0, #int -65 // #bf
    /* 00027862 B4 40             */ rem-int/2addr v0, v4
    /* 00027864 71 10 01 0C 00 00 */ invoke-static {v0}, java.lang.Integer java.lang.Integer.valueOf(int)
    /* 0002786A 0C 08             */ move-result-object v8
    /* 0002786C 71 10 01 0C 02 00 */ invoke-static {v2}, java.lang.Integer java.lang.Integer.valueOf(int)
    /* 00027872 0C 00             */ move-result-object v0
    /* 00027874 6E 20 4E 0C 85 00 */ invoke-virtual {v5, v8}, bool java.util.HashMap.containsKey(java.lang.Object)
    /* 0002787A 0A 01             */ move-result v1
    /* 0002787C 38 01 2C 00       */ if-eqz v1, loc_168 // +44
    /* 00027880 6E 20 4F 0C 85 00 */ invoke-virtual {v5, v8}, java.lang.Object java.util.HashMap.get(java.lang.Object)
    /* 00027886 0C 00             */ move-result-object v0
    /* 00027888 1F 00 C3 01       */ check-cast v0, java.lang.Integer
loc_96:
    /* 0002788C 54 B1 FC 02       */ iget-object v1, v11, java.lang.String content
    /* 00027890 22 08 CE 01       */ new-instance v8, java.lang.StringBuilder
    /* 00027894 71 10 25 0C 01 00 */ invoke-static {v1}, java.lang.String java.lang.String.valueOf(java.lang.Object)
    /* 0002789A 0C 01             */ move-result-object v1
    /* 0002789C 70 20 28 0C 18 00 */ invoke-direct {v8, v1}, void java.lang.StringBuilder.<init>(java.lang.String)
    /* 000278A2 6E 10 FC 0B 00 00 */ invoke-virtual {v0}, byte java.lang.Integer.byteValue()
    /* 000278A8 0A 00             */ move-result v0
    /* 000278AA D8 00 00 41       */ add-int/lit8 v0, v0, #int 65 // #41
    /* 000278AE 8E 00             */ int-to-char v0, v0
    /* 000278B0 71 10 ED 0B 00 00 */ invoke-static {v0}, java.lang.Character java.lang.Character.valueOf(char)
    /* 000278B6 0C 00             */ move-result-object v0
    /* 000278B8 6E 20 2C 0C 08 00 */ invoke-virtual {v8, v0}, java.lang.StringBuilder java.lang.StringBuilder.append(java.lang.Object)
    /* 000278BE 0C 00             */ move-result-object v0
    /* 000278C0 6E 10 31 0C 00 00 */ invoke-virtual {v0}, java.lang.String java.lang.StringBuilder.toString()
    /* 000278C6 0C 00             */ move-result-object v0
    /* 000278C8 5B B0 FC 02       */ iput-object v0, v11, java.lang.String content
    /* 000278CC D8 00 03 01       */ add-int/lit8 v0, v3, #int 1 // #01
    /* 000278D0 01 03             */ move v3, v0
    /* 000278D2 28 C1             */ goto loc_40 // -63
loc_168:
    /* 000278D4 01 21             */ move v1, v2
loc_170:
    /* 000278D6 35 41 DB FF       */ if-ge v1, v4, loc_96 // -37
    /* 000278DA 6E 10 FD 0B 08 00 */ invoke-virtual {v8}, int java.lang.Integer.intValue()
    /* 000278E0 0A 09             */ move-result v9
    /* 000278E2 B2 19             */ mul-int/2addr v9, v1
    /* 000278E4 B4 49             */ rem-int/2addr v9, v4
    /* 000278E6 12 1A             */ const/4 v10, #int 1 // #1
    /* 000278E8 33 A9 0E 00       */ if-ne v9, v10, loc_216 // +14
    /* 000278EC 71 10 01 0C 01 00 */ invoke-static {v1}, java.lang.Integer java.lang.Integer.valueOf(int)
    /* 000278F2 0C 00             */ move-result-object v0
    /* 000278F4 6E 30 50 0C 85 00 */ invoke-virtual {v5, v8, v0}, java.lang.Object java.util.HashMap.put(java.lang.Object, java.lang.Object)
    /* 000278FA 71 10 01 0C 01 00 */ invoke-static {v1}, java.lang.Integer java.lang.Integer.valueOf(int)
    /* 00027900 0C 00             */ move-result-object v0
    /* 00027902 28 C5             */ goto loc_96 // -59
loc_216:
    /* 00027904 D8 01 01 01       */ add-int/lit8 v1, v1, #int 1 // #01
    /* 00027908 28 E7             */ goto loc_170 // -25
  }
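
Read as straight-line logic, the listing seems to replace every character with the modular inverse of its distance from 'A', taken modulo the length of the string. The Python transliteration below is only my interpretation of the disassembly: it returns the result instead of storing it in the 'content' field, and it assumes that len() matches Java's String.length(), which holds for ordinary ASCII input.

```python
def java_rem(a, b):
    """Remainder with Java/Dalvik semantics (the sign follows the dividend)."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def to_byte(v):
    """Mimic Integer.byteValue(): keep the low 8 bits as a signed value."""
    v &= 0xFF
    return v - 256 if v >= 128 else v


def zoom(text):
    n = len(text)
    inverses = {}    # plays the role of the HashMap in v5
    content = ""
    for ch in text:
        key = java_rem(ord(ch) - 65, n)
        if key in inverses:
            value = inverses[key]
        else:
            value = 0    # what the code appends when no inverse exists
            for i in range(n):
                if java_rem(key * i, n) == 1:
                    inverses[key] = i    # cache the inverse, as the put() does
                    value = i
                    break
        content += chr((to_byte(value) + 65) & 0xFFFF)
    return content
```

As a quick check, in a seven-character string a 'C' has distance 2 from 'A', the inverse of 2 modulo 7 is 4, and so it becomes 'E', while 'A' has no inverse and stays 'A'.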

And that's it. Writing the post took much more time than the rest (about 10 minutes, if that). Reverse engineering the crackme to find the correct key is beyond the scope of the post, although I'm sure it's fun as well.

Thanks to BlueBox for the crackme!

News for version 0.9.4

The new version is out with the following news:

– added RTF support including OLE extraction and raw text preview
– added file times support and extraction in Zip archives
– added disasm options to several engines
– added support for Android Binary XML format
– exposed several disasm engines as filters
– introduced metadata strings to SDK
– exposed Zip format class to Python
– fixed module initialization problem in the SDK

Some features planned for this release were postponed to the next version (or perhaps even the version after that), because too many unplanned new features have been introduced in 0.9.4. Some of the items above need further explanation and examples, but I’m afraid posts about them will have to wait. It would be nice to show some of these new features in conjunction with other features which are planned for the near future.

In the meantime we hope you enjoy the release!

Android Binary XML support

The upcoming version 0.9.4 of the Profiler adds support for Android’s binary XML format (such as that used by AndroidManifest.xml).

Android Binary XML
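
To give an idea of what the format looks like on disk: a binary XML file is a sequence of chunks, each starting with an 8-byte header (16-bit type, 16-bit header size, 32-bit total size), wrapped in an outer chunk of type 0x0003. The sketch below only lists the top-level chunks and decodes nothing; it is a simplified illustration written for this post, not the Profiler's parser, and the function name is invented.

```python
import struct

RES_STRING_POOL_TYPE = 0x0001  # the string pool usually comes first
RES_XML_TYPE = 0x0003          # outer chunk of a binary XML file


def list_axml_chunks(data):
    """Return (offset, type, size) for every chunk inside the XML chunk."""
    ctype, header_size, total_size = struct.unpack_from("<HHI", data, 0)
    if ctype != RES_XML_TYPE:
        raise ValueError("not an Android binary XML file")
    chunks = []
    pos = header_size
    end = min(total_size, len(data))
    while pos + 8 <= end:
        t, _hsize, size = struct.unpack_from("<HHI", data, pos)
        if size < 8:       # a malformed size would loop forever
            break
        chunks.append((pos, t, size))
        pos += size
    return chunks
```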

Let’s take the sample output of the aapt tool in the Android SDK:

N: android=http://schemas.android.com/apk/res/android
  E: manifest (line=22)
    A: package="com.example.android.notepad" (Raw: "com.example.android.notepad")
    E: uses-sdk (line=25)
      A: android:minSdkVersion(0x0101020c)=(type 0x10)0xb
    E: application (line=27)
      A: android:label(0x01010001)=@0x7f040000
      A: android:icon(0x01010002)=@0x7f020000
      E: provider (line=30)
        A: android:name(0x01010003)="NotePadProvider" (Raw: "NotePadProvider")
        A: android:exported(0x01010010)=(type 0x12)0x0
        A: android:authorities(0x01010018)="com.google.provider.NotePad" (Raw: "com.google.provider.NotePad")
        E: grant-uri-permission (line=33)
          A: android:pathPattern(0x0101002c)=".*" (Raw: ".*")
      E: activity (line=36)
        A: android:label(0x01010001)=@0x7f040005
        A: android:name(0x01010003)="NotesList" (Raw: "NotesList")
        E: intent-filter (line=37)
          E: action (line=38)
            A: android:name(0x01010003)="android.intent.action.MAIN" (Raw: "android.intent.action.MAIN")
          E: category (line=39)
            A: android:name(0x01010003)="android.intent.category.LAUNCHER" (Raw: "android.intent.category.LAUNCHER")
        E: intent-filter (line=41)
          E: action (line=42)
            A: android:name(0x01010003)="android.intent.action.VIEW" (Raw: "android.intent.action.VIEW")
          E: action (line=43)
            A: android:name(0x01010003)="android.intent.action.EDIT" (Raw: "android.intent.action.EDIT")
          E: action (line=44)
            A: android:name(0x01010003)="android.intent.action.PICK" (Raw: "android.intent.action.PICK")
          E: category (line=45)
            A: android:name(0x01010003)="android.intent.category.DEFAULT" (Raw: "android.intent.category.DEFAULT")
          E: data (line=46)
            A: android:mimeType(0x01010026)="vnd.android.cursor.dir/vnd.google.note" (Raw: "vnd.android.cursor.dir/vnd.google.note")
        E: intent-filter (line=48)
          E: action (line=49)
            A: android:name(0x01010003)="android.intent.action.GET_CONTENT" (Raw: "android.intent.action.GET_CONTENT")
          E: category (line=50)
            A: android:name(0x01010003)="android.intent.category.DEFAULT" (Raw: "android.intent.category.DEFAULT")
          E: data (line=51)
            A: android:mimeType(0x01010026)="vnd.android.cursor.item/vnd.google.note" (Raw: "vnd.android.cursor.item/vnd.google.note")
      E: activity (line=55)
        A: android:theme(0x01010000)=@0x103006e
        A: android:name(0x01010003)="NoteEditor" (Raw: "NoteEditor")
        A: android:screenOrientation(0x0101001e)=(type 0x10)0x4
        A: android:configChanges(0x0101001f)=(type 0x11)0xa0
        E: intent-filter (line=62)
          A: android:label(0x01010001)=@0x7f04000f
          E: action (line=63)
            A: android:name(0x01010003)="android.intent.action.VIEW" (Raw: "android.intent.action.VIEW")
          E: action (line=64)
            A: android:name(0x01010003)="android.intent.action.EDIT" (Raw: "android.intent.action.EDIT")
          E: action (line=65)
            A: android:name(0x01010003)="com.android.notepad.action.EDIT_NOTE" (Raw: "com.android.notepad.action.EDIT_NOTE")
          E: category (line=66)
            A: android:name(0x01010003)="android.intent.category.DEFAULT" (Raw: "android.intent.category.DEFAULT")
          E: data (line=67)
            A: android:mimeType(0x01010026)="vnd.android.cursor.item/vnd.google.note" (Raw: "vnd.android.cursor.item/vnd.google.note")
        E: intent-filter (line=74)
          E: action (line=75)
            A: android:name(0x01010003)="android.intent.action.INSERT" (Raw: "android.intent.action.INSERT")
          E: action (line=76)
            A: android:name(0x01010003)="android.intent.action.PASTE" (Raw: "android.intent.action.PASTE")
          E: category (line=77)
            A: android:name(0x01010003)="android.intent.category.DEFAULT" (Raw: "android.intent.category.DEFAULT")
          E: data (line=78)
            A: android:mimeType(0x01010026)="vnd.android.cursor.dir/vnd.google.note" (Raw: "vnd.android.cursor.dir/vnd.google.note")
      E: activity (line=83)
        A: android:theme(0x01010000)=@0x103006f
        A: android:label(0x01010001)=@0x7f040002
        A: android:icon(0x01010002)=@0x7f020003
        A: android:name(0x01010003)="TitleEditor" (Raw: "TitleEditor")
        A: android:windowSoftInputMode(0x0101022b)=(type 0x11)0x4
        E: intent-filter (line=92)
          A: android:label(0x01010001)=@0x7f040010
          E: action (line=96)
            A: android:name(0x01010003)="com.android.notepad.action.EDIT_TITLE" (Raw: "com.android.notepad.action.EDIT_TITLE")
          E: category (line=98)
            A: android:name(0x01010003)="android.intent.category.DEFAULT" (Raw: "android.intent.category.DEFAULT")
          E: category (line=101)
            A: android:name(0x01010003)="android.intent.category.ALTERNATIVE" (Raw: "android.intent.category.ALTERNATIVE")
          E: category (line=104)
            A: android:name(0x01010003)="android.intent.category.SELECTED_ALTERNATIVE" (Raw: "android.intent.category.SELECTED_ALTERNATIVE")
          E: data (line=106)
            A: android:mimeType(0x01010026)="vnd.android.cursor.item/vnd.google.note" (Raw: "vnd.android.cursor.item/vnd.google.note")
      E: activity (line=110)
        A: android:label(0x01010001)=@0x7f040001
        A: android:icon(0x01010002)=@0x7f020006
        A: android:name(0x01010003)="NotesLiveFolder" (Raw: "NotesLiveFolder")
        E: intent-filter (line=112)
          E: action (line=113)
            A: android:name(0x01010003)="android.intent.action.CREATE_LIVE_FOLDER" (Raw: "android.intent.action.CREATE_LIVE_FOLDER")
          E: category (line=114)
            A: android:name(0x01010003)="android.intent.category.DEFAULT" (Raw: "android.intent.category.DEFAULT")

And now the output of the Profiler:


Of course UI XMLs can be opened as well.

The converter can be used from Python as well, as a filter called 'android/from_axml'.

The new version will be out in a few days. Stay tuned!

Disasm options & filters

The upcoming version 0.9.4 of the Profiler introduces improvements to several disasm engines: ActionScript3, Dalvik, Java, MSIL. In particular it adds options, so that the user can decide whether to include file offsets and opcodes in the output.

Disasm options

The code indentation can be changed as well.

Another important addition is that these engines have been exposed as filters. This is especially noteworthy since byte code can sometimes be injected or stored outside of a method body, so that it is necessary to be able to disassemble raw data.

Disasm filters

Of course these filters can be used from Python too.

from Pro.Core import *
from Pro.UI import *

sp = proContext().currentScanProvider()
c = sp.getObjectStream()
# restrict the stream to the 0x10 bytes starting at offset 0x2570
c.setRange(0x2570, 0x10)

fstr = ""
c = applyFilters(c, fstr)

s = c.read(0, c.size()).decode("utf-8")
print(s)

Output:

/* 00000000 1A 00 8A+ */ const-string v0,  // string@018a (394)
/* 00000004 12 01     */ const/4 v1, #int 0 // #0
/* 00000006 12 22     */ const/4 v2, #int 2 // #2
/* 00000008 70 52 42+ */ invoke-direct {v3, v4, v0, v1, v2},  // method@0042 (66)
/* 0000000E 0E 00     */ return-void
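
To show what disassembling raw data means at the byte level, here is a toy decoder for just two of the instructions in the output above. It follows the Dalvik instruction formats as I understand them (const/4 packs the register in the low nibble and a signed 4-bit literal in the high nibble of its second byte) and is in no way a substitute for the real filter. The function name is made up.

```python
def decode_simple(insn):
    """Decode a const/4 or return-void instruction from raw bytes."""
    op = insn[0]
    if op == 0x12:                 # const/4 vA, #+B
        reg = insn[1] & 0x0F       # low nibble: destination register
        lit = insn[1] >> 4         # high nibble: signed 4-bit literal
        if lit >= 8:
            lit -= 16
        return "const/4 v%d, #int %d" % (reg, lit)
    if op == 0x0E:                 # return-void
        return "return-void"
    raise ValueError("opcode %02X not handled by this sketch" % op)
```

Feeding it the bytes 12 01 and 12 22 gives back the two const/4 lines of the listing.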

In the future it will be possible to output a filter directly to NTTextStream, avoiding the need to read from NTContainer.

Stay tuned!

Rich Text Format support (including OLE extraction)

The work on the upcoming 0.9.4 version of the Profiler has just begun, but there’s already an addition worth mentioning in depth: the support for RTF files. In particular there are two things which are quite useful: the preview of raw text and the extraction of OLE objects.

Let’s start with the first one which is very easy.

Text preview

The same text can be retrieved programmatically with the following code:

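from Pro.Core import *
from Pro.UI import *

sp = proContext().currentScanProvider()
rtf = sp.getObject()
# collect the raw text of the document in a text buffer
out = NTTextBuffer()
rtf.Output(out)
print(out.buffer)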

And now the more interesting part about embedded OLE objects. While RTF is usually regarded as a safer format than its DOC counterpart, it is able to embed foreign objects through the OLE technology. This technique can be, and is, used by malware authors to conceal the real threat.
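
As a rough idea of where these objects live: embedded objects are stored as hexadecimal text after the \objdata control word. The naive sketch below pulls out those blobs with a regular expression. It is a deliberately simple illustration of mine that ignores nested groups and any obfuscation, so it shouldn't be mistaken for how the Profiler parses RTF; the function name is invented.

```python
import binascii
import re

# hex text following \objdata, up to the next brace or control word
OBJDATA_RE = re.compile(rb"\\objdata\b([^{}\\]*)")


def extract_objdata(rtf):
    """Return the decoded bytes of every \\objdata blob found in `rtf`."""
    blobs = []
    for match in OBJDATA_RE.finditer(rtf):
        hex_text = re.sub(rb"\s+", b"", match.group(1))
        if len(hex_text) % 2:          # drop a dangling nibble
            hex_text = hex_text[:-1]
        try:
            blobs.append(binascii.unhexlify(hex_text))
        except binascii.Error:
            continue                   # not plain hex, skip it
    return blobs
```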

Let’s take a look at a file with two embedded objects (a DOC and a PPT).

OLE

What is being viewed in the image above is the metadata of a JPEG contained in a PPT contained in an OLE Stream contained in an RTF file. Nice, isn’t it?

It should be noted that OLE objects in RTF files are stored as OLE Streams, an undocumented format (as far as we know). The Profiler is able to parse it nonetheless, and it turns out that this format can contain some interesting information.

OLE Stream metadata

Apart from the original file name we can observe paths which include the user name.

News for version 0.9.3

The new version is out with the following news:

– subdivided the Python SDK into modules
– exposed many core and file format classes to Python (part 2)
– exposed filters to Python
– introduced Python hooks
– introduced Python key providers
– improved SDK documentation
– added extensions view
– added file formats scan option
– added decryption keys view
– fixed occasional concurrency issue with large files
– fixed embedded files manual addition issue (affected versions: >= 0.9.1)

Most of the items in the list have been demonstrated in previous posts. The only addition left to discuss is the key dialog. When a file is encrypted and gets decrypted with a key either provided by the user or by a script, then this key ends up in a special list of matched keys. This list can now be inspected by the user.

If some files have been decrypted an additional “Decryption keys” button will be shown. Just click on it and you’ll get the list of matched keys.

That’s all. Enjoy!

Detect broken PE manifests

In the previous post we’ve seen a brief introduction of how hooks work. If you haven’t read that post, you’re encouraged to do so in order to understand this one. What we’re going to do in this post is something practical: verifying the XML correctness of PE manifests contained in executables in the Windows directory.

The hook INI entry:

[PE: verify manifests]
file = pe_hooks.py
scanned = detectBrokenManifest
mode = batch
formats = PE

And the Python code:

from Pro.Core import *
from Pro.PE import *

def detectBrokenManifest(sp, ud):
    # exclude the file unless one of its manifests turns out to be broken
    sp.exclude()
    pe = sp.getObject()
    it = pe.ResourceIterator()
    # move to the manifest resource directory, if there is one
    if not it.MoveToRoot(RES_TYPE_CONFIGURATION_FILES):
        return
    while it.Next() and it.RootName() == RES_TYPE_CONFIGURATION_FILES:
        s = it.Data()
        offs = pe.RvaToOffset(s.Num(0)) # same as s.Num("OffsetToData")
        sz = s.Num(1) # same as s.Num("Size")
        if offs == INVALID_OFFSET or sz == 0:
            continue
        data = pe.Read(offs, sz)
        xml = NTXml()
        if xml.parse(data) != NTXml_ErrNone:
            sp.include()
            break

That’s it!

What the code above does is ask the PE object for a resource iterator. This class, as our customers can observe from the SDK documentation, is capable of both iterating and moving to a specific resource directory or item. Thus, first it moves to the RES_TYPE_CONFIGURATION_FILES directory and then goes through all its items. If the XML parsing fails, the file is included in our final report.

So let’s proceed and do the actual scan. First we need to activate the extension from the extensions view:

Then we need to specify the Windows directory as our scan directory and the kind of file format we’re interested in scanning (PE).

Let’s wait for the scan to complete and we’ll get the final results.

So it seems these files have a problem with their manifests. Let’s open one and go to its manifest resources:

(if the XML is missing new-lines, just hit “Run action (Ctrl+R)->XML indenter”)

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- Copyright (c) Microsoft Corporation -->
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
  <assemblyIdentity
    name=""Microsoft.Windows.Shell.DevicePairingFolder""
    processorArchitecture=""x86""
    version=""5.1.0.0""
    type="win32"/>
  <description>Wireless Devices Explorer</description>
  <dependency>
    <dependentAssembly>
      <assemblyIdentity
        type="win32"
        name="Microsoft.Windows.Common-Controls"
        version="6.0.0.0"
        processorArchitecture="*"
        publicKeyToken="6595b64144ccf1df"
        language="*" />
    </dependentAssembly>
  </dependency>
</assembly>

As you can see, some attributes in assemblyIdentity contain doubled double quotes. I don’t know whether this DLL has been created with Visual C++, but I do remember that this could happen when specifying manifest fields in the project configuration dialog.
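
The same kind of check can be reproduced outside of the hook with Python's own XML parser, which shows why the doubled quotes are fatal: after name="" the parser expects whitespace or the end of the tag, not more text. This sketch uses the standard library instead of NTXml, and the two sample strings are shortened versions of the attributes above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


GOOD = '<assemblyIdentity name="Microsoft.Windows.Common-Controls" type="win32"/>'
BAD = '<assemblyIdentity name=""Microsoft.Windows.Shell.DevicePairingFolder"" type="win32"/>'

print(is_well_formed(GOOD), is_well_formed(BAD))
```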

Exposing the Core (part 4, Hooks)

Hooks are an extremely powerful extension to the scanning engine of Cerbero Suite. They allow the user to customize scans and do all sorts of things. Because there’s basically no limit to the applications, I’ll just try to give a brief introduction in this post. In the following post I’ll demonstrate their use with a real-world case.

Just like key providers introduced in the previous post, hooks have their INI configuration file as well (hooks.cfg). This can contain a minimal hook entry:

[Test Hook]
file = test_hooks.py
scanned = scanned

And the Python code:

def scanned(sp, ud):
    print(sp.getObjectFormat())

scanned gets called after every file scan and prints out the format of the object. This function is not being called from the main thread, so it’s not possible to call UI functions. However, print is thread-safe and when doing a batch scan will just output to stdout.

Now let’s open Cerbero Suite and go to the new Extensions view.

Extensions view

You’ll notice that the box next to the name of the hook we just created is unchecked. This means that it’s disabled and hence won’t be called. We can enable it manually or we may even specify from the INI file to enable our extension by default:

[Test Hook]
file = test_hooks.py
scanned = scanned
enable = yes

We also specify the scan mode we are interested in:

; not specifying a mode equals to: mode = single|batch
mode = batch

Now the extension will be notified only when doing batch scans. To be even more selective, it’s possible to specify the file format(s) we are interested in:

; this is an optional field, you can omit it and the hook will be notified for any format
formats = PE|SWF
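
Putting the optional fields together, an entry can be read like any other INI file. The sketch below uses Python's configparser purely to illustrate how the fields combine and how the '|' separated lists split; it says nothing about how Cerbero Suite itself reads hooks.cfg, and the defaults I apply (disabled unless enabled, both modes, any format) are taken from the comments above.

```python
import configparser

SAMPLE = """
[Test Hook]
file = test_hooks.py
scanned = scanned
enable = yes
mode = batch
formats = PE|SWF
"""


def read_hooks(text):
    """Parse hook entries into plain dictionaries."""
    cp = configparser.ConfigParser()
    cp.read_string(text)
    hooks = {}
    for name in cp.sections():
        sec = cp[name]
        hooks[name] = {
            "file": sec.get("file"),
            "scanned": sec.get("scanned"),
            "enable": sec.getboolean("enable", fallback=False),
            # no mode means both single and batch scans
            "modes": sec.get("mode", fallback="single|batch").split("|"),
            # no formats means the hook is notified for any format
            "formats": sec["formats"].split("|") if "formats" in sec else None,
        }
    return hooks


print(read_hooks(SAMPLE)["Test Hook"])
```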

Ok, now let’s create a small sample which actually does something. Let’s say we want to perform a search among the disassembled code of Java Class files and include in the resulting report only those files which contain a particular string.

The configuration entry:

[Search Java Class]
file = test_hooks.py
scanned = searchJavaClass
mode = batch
formats = Class

And the code:

def searchJavaClass(sp, ud):
    from Pro.Core import NTTextBuffer
    cl = sp.getObject()
    out = NTTextBuffer()
    cl.Disassemble(out)
    # search string
    ret = out.buffer.find("HelloWorld") != -1
    sp.include(ret)

Let’s activate the extension by checking its box and then perform a custom scan only on files identified as Java Classes.

Class scan

The result will be:

Class report

The method ScanProvider::include(bool b) is what tells Cerbero Suite which files have to be included in the final report (its counterpart is ScanProvider::exclude(bool b)). Of course, there could be more than one hook active during a scan and a file can be both excluded and included. The logic is that include has priority over exclude and once a file has been included by a hook it can’t be excluded by another one.
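
The priority rule is easier to see in a tiny model. The class below is not the real ScanProvider, just a stand-in written for this post; in particular, treating include(False) and exclude(False) as no-ops and keeping a file that nobody touched are assumptions of mine.

```python
class ReportState:
    """Toy model of the include/exclude decision for a single file."""

    def __init__(self):
        self.included = False
        self.excluded = False

    def include(self, b=True):
        if b:
            self.included = True

    def exclude(self, b=True):
        if b:
            self.excluded = True

    def in_report(self):
        # include wins over exclude, whatever the order of the calls
        return self.included or not self.excluded
```

Whether one hook excludes the file before or after another one includes it, the file ends up in the report.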

Although the few lines above already have a purpose, it’s not quite handy having to change the code in order to perform different searches. Thus, hooks can optionally implement two more callbacks: init and end. Both these callbacks are called from the main UI thread (so that it’s safe to call UI functions). The first one is called before any scan operation is performed, while the latter after all of them have finished.

The syntax for these callbacks is the following:

def init():
    print("init")
    return print  # returns what the other callbacks will get as their 'ud' argument

def end(ud):
    ud("end")

Instead of using ugly global variables, init can optionally return the user data passed on to the other callbacks. end is useful to perform cleanup operations. But in our sample above we don’t really need to clean up anything, we just need an input box to ask the user for a string to be searched. So we just need to add an init callback.

[Search Java Class]
file = test_hooks.py
init = initSearchJavaClass
scanned = searchJavaClass
mode = batch
formats = Class

And add the new logic to the code:

def initSearchJavaClass():
    from Pro.UI import ProInput
    return ProInput.askText("Insert string:")

def searchJavaClass(sp, ud):
    if ud is None:
        return
    from Pro.Core import NTTextBuffer
    cl = sp.getObject()
    out = NTTextBuffer()
    cl.Disassemble(out)
    # search string
    ret = out.buffer.find(ud) != -1
    sp.include(ret)
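
To make the calling order explicit, here is a small simulation that can be run outside of Cerbero Suite. Everything in it is hypothetical: FakeScanProvider only mimics include(), run_batch is my guess at the sequence described above (init once, scanned for every file, end once), and the search string is hard-coded where the real hook would ask the user.

```python
class FakeScanProvider:
    """Stand-in for a scan provider: just enough for this demo."""

    def __init__(self, name, text):
        self.name = name
        self.text = text
        self.included = False

    def include(self, b=True):
        self.included = self.included or bool(b)


def init_search():
    return "HelloWorld"   # becomes the 'ud' argument of the other callbacks


def search(sp, ud):
    if ud is None:
        return
    sp.include(ud in sp.text)


def run_batch(files, init, scanned, end=None):
    """Call init once, scanned for every file and end once, in that order."""
    ud = init() if init else None
    for sp in files:
        scanned(sp, ud)
    if end:
        end(ud)
    return [sp.name for sp in files if sp.included]


files = [FakeScanProvider("A.class", "ldc HelloWorld"),
         FakeScanProvider("B.class", "nop")]
print(run_batch(files, init_search, search))
```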

Of course, this sample could be improved endlessly by adding options, regular expressions, support for more file formats etc. But that is beyond the scope of this post, which was just to briefly introduce hooks.

The upcoming version of Cerbero Suite which includes all the improvements of the previous weeks is almost ready. Stay tuned!

Exposing the Core (part 3, Key Providers)

This post will be about key providers, which are the first kind of extension to the scan engine we’re going to see. Key providers are nothing else than a convenient way to provide keys through scripting to files which require a decryption key (e.g. an encrypted PDF).

Let’s take for instance an encrypted Zip file. If we’re not doing a batch scan, Cerbero Suite will ask the user with its dialog to enter a decryption key. While this dialog already has the ability to accept multiple keys and also remember them, there are things it can’t do. For example it is not suitable for trying out key dictionaries (copy and pasting them is inefficient) or to generate a key based on environmental factors (like the name of the file requiring the decryption).

This may sound all a bit complicated, but don’t worry. One of the main objectives of Cerbero Suite is to allow users to do things in the simplest way possible. Thus, showing a practical sample is the best way to demonstrate how it all works.

You’ll notice that the upcoming version of Cerbero Suite contains a keyp.cfg_sample in its config directory. This file can be used as template to create our first provider, just rename it to keyp.cfg. As all configuration files, this is an INI as well. This is what an entry for a key provider looks like:

[KeyProvider Test]
file = key_provider.py
callback = keyProvider
; this is an optional field, you can omit it and the provider will be used for any format
formats = Zip|PDF

Which is pretty much self-explanatory. It tells Cerbero Suite where our callback is located (the relative path defaults to the plugins/python directory) and it can also optionally specify the formats which may be used in conjunction with this provider. The Python code can be as simple as:

from Pro.Core import *

def keyProvider(obj, index):
    if index != 0:
        return None
    l = NTVariantList()
    l.append("password")
    return l

The provider returns a single key (‘password’). This means that when one of the specified file formats is encrypted, all registered key providers will be asked to provide decryption keys. If one key works, the file is automatically decrypted.

The returned list can contain even thousands of keys; it is up to the user to decide the amount returned. The index argument can be used to decide which batch of keys must be returned: it starts at 0 and is incremented by l.size(). The key provider will be called until a match is found or it doesn’t return any more keys. Thus, be careful not to always return a key without checking the index, otherwise it’ll result in an endless loop.
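
The index protocol can be simulated in plain Python. In the sketch below a regular list stands in for NTVariantList, and try_keys is only my approximation of the calling loop described above, not the actual engine; the word list, the chunk size and both function names are invented for the example.

```python
WORDS = ["123456", "password", "letmein", "secret", "qwerty"]
CHUNK = 2   # how many keys to hand out per call


def dictionary_provider(obj, index):
    """Return the next chunk of the dictionary, or None when exhausted."""
    chunk = WORDS[index:index + CHUNK]
    return chunk or None


def try_keys(provider, obj, is_valid):
    """Keep asking the provider for keys until one matches or it gives up."""
    index = 0
    while True:
        keys = provider(obj, index)
        if not keys:
            return None
        for key in keys:
            if is_valid(key):
                return key
        index += len(keys)   # the index grows by the size of the last list


print(try_keys(dictionary_provider, None, lambda k: k == "secret"))
```

Because the provider slices the dictionary by index, it naturally runs out of keys and returns None, which is what ends the loop when nothing matches.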

When a string is appended to the list, it will be converted internally to bytes by the conversion handlers (this means that a single string could, for instance, first be converted to UTF-8 and then to ASCII in order to obtain a match). Sometimes you want to return the exact bytes to be matched. In that case just append a bytearray object to the list.

The same sample could be transformed into a key generation based on variables:

from Pro.Core import *

def keyProvider(obj, index):
    if index != 0:
        return None
    name = obj.GetStream().name()
    # do some operations involving the file name
    variable_part = ...
    l = NTVariantList()
    l.append("static_part" + variable_part)
    return l

And this comes in handy when we want to avoid typing in passwords for certain Zip archives which have a fixed decryption key schema.

So, to sum up, key providers are powerful and easy-to-use extensions which allow us to test out key dictionaries on various file formats (those for which Cerbero Suite supports decryption) and to avoid the all too frequent hassle of having to type common passwords.

Exposing the Core (part 2)

The release date of the upcoming 0.9.3 version is drawing nearer. Several format classes have already been exposed to Python and in this post I’m going to show you some code snippets. Since it’s impossible to demonstrate all format classes (12 have already been exposed) and all their methods (a single class may contain dozens of methods), the purpose of the snippets below is only to give the reader an idea of what can be achieved.

The SDK organization has changed a bit: because of its increasing size it made sense to subdivide it into modules. Thus, there’s now the Pro.Core module, the Pro.UI one and one module for each format (e.g. Pro.PE).

PDF

This is how we can dump the raw stream of a PDF object as text:

from Pro.Core import *
from Pro.PDF import *

# fname is the path of the file to load (not defined in these snippets)
c = createContainerFromFile(fname)
pdf = PDFObject()
pdf.Load(c)
objtable = pdf.BuildObjectTable()
pdf.SetObjectTable(objtable)
oid = PDFObject.OBJID(3, 0)
ret, dict, content, info = pdf.ParseObject(objtable, oid)
out = NTTextBuffer()
out.printHex(content)
print(out.buffer)

Output:

        0  1  2  3  4  5  6  7   8  9  A  B  C  D  E  F    Ascii

0000   48 89 24 8D CD 0A 83 30  10 84 EF 81 BC C3 1C 93    H.$....0........
0010   8B 4D 52 63 E3 B5 D0 0A  42 A1 D0 DC C4 83 D4 F8    .MRc....B.......
0020   D3 D6 0A 2A F5 F5 BB B6  B0 CC 2E C3 37 3B 1A 2D    ...*........7;.-
0030   67 2A B2 C8 38 D3 C8 A1  F0 80 C6 8A 18 17 14 7B    g*..8..........{
0040   94 0A 35 67 BB EC 66 D0  CE 1B D1 83 34 75 48 92    ..5g..f.....4uH.
0050   04 46 C7 B0 0E 53 E0 EC  48 E3 09 3C 1B 4A FB 86    .F...S..H..<.J..
0060   18 43 AF 14 68 19 7D 88  1C 05 52 05 3F 50 D7 DF    .C..h.}...R.?P..
0070   C7 73 3B FD FD A7 2B 67  85 B8 CA 58 89 6A 5E 02    .s;...+g...X.j^.
0080   96 2E A0 E9 C3 AB 46 F5  AE 31 8C 52 5B F1 91 C6    ......F..1.R[...
0090   8A 20 95 C0 32 A2 0B 53  D8 CC 48 96 3E E7 EC 44    . ..2..S..H.>..D
00A0   CD 5F 01 06 00 88 1E 2A  AA 0D 0A                   ._.....*...    

Streams in PDFs are usually compressed. Here’s how we can decode the same stream:

from Pro.Core import *
from Pro.PDF import *

c = createContainerFromFile(fname)
pdf = PDFObject()
pdf.Load(c)
objtable = pdf.BuildObjectTable()
pdf.SetObjectTable(objtable)
oid = PDFObject.OBJID(3, 0)
ret, dict, content, info = pdf.ParseObject(objtable, oid)
content = pdf.DecodeObjectStream(content, dict, oid)
out = NTTextBuffer()
out.printHex(content)
print(out.buffer)

Output:

        0  1  2  3  4  5  6  7   8  9  A  B  C  D  E  F    Ascii

0000   31 20 67 0D 0A 30 2E 35  20 47 0D 0A 31 20 4A 20    1 g..0.5 G..1 J 
0010   30 20 6A 20 31 20 77 20  34 20 4D 20 5B 33 20 5D    0 j 1 w 4 M [3 ]
0020   30 20 64 0D 0A 2F 47 53  32 20 67 73 0D 0A 31 20    0 d../GS2 gs..1 
0030   69 20 0D 0A 31 39 38 20  36 36 36 20 32 31 34 20    i ..198 666 214 
0040   35 38 20 72 65 0D 0A 42  0D 0A 42 54 0D 0A 2F 46    58 re..B..BT../F
0050   32 20 31 20 54 66 0D 0A  31 32 20 30 20 30 20 31    2 1 Tf..12 0 0 1
0060   32 20 32 31 37 2E 38 38  20 36 39 30 20 54 6D 0D    2 217.88 690 Tm.
0070   0A 30 20 30 20 30 20 31  20 6B 0D 0A 30 20 54 63    .0 0 0 1 k..0 Tc
0080   0D 0A 30 20 54 77 0D 0A  5B 28 50 29 34 30 28 61    ..0 Tw..[(P)40(a
0090   73 74 65 20 74 68 65 20  66 69 65 6C 64 20 61 6E    ste the field an
00A0   64 20 6D 6F 29 31 35 28  76 29 32 35 28 65 29 30    d mo)15(v)25(e)0
00B0   28 20 74 6F 20 68 65 72  65 29 31 35 28 2E 29 5D    ( to here)15(.)]
00C0   54 4A 0D 0A 45 54 0D 0A                             TJ..ET..        
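For comparison, the most common PDF stream filter, FlateDecode, is plain zlib compression (the raw dump above starts with the zlib header bytes 48 89), so the same kind of decoding can be sketched with Python's standard zlib module. This is only a sketch under assumptions: flate_decode is a hypothetical helper and not part of the SDK, and the sample data is compressed on the spot rather than taken from the PDF above.

```python
import zlib

def flate_decode(raw):
    # hypothetical helper: inflate the body of a zlib-compressed stream
    d = zlib.decompressobj()
    # a decompression object tolerates trailing bytes, such as the EOL
    # that precedes "endstream"; they end up in d.unused_data
    return d.decompress(raw) + d.flush()

# sample data compressed on the spot, not the stream from the PDF above
plain = b"BT\r\n/F2 1 Tf\r\n(Hello) Tj\r\nET\r\n"
raw = zlib.compress(plain) + b"\r\n"
print(flate_decode(raw) == plain)
```

A decompression object is used instead of a single zlib.decompress call so that the trailing line ending is set aside rather than fed to the inflater.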

We might also want to iterate through the key/value pairs of a PDF dictionary. For this reason, iterators have been implemented wherever they could be applied. While they don’t yet support the standard Python iteration syntax, they are very easy to use:

from Pro.Core import *
from Pro.PDF import *

c = createContainerFromFile(fname)
pdf = PDFObject()
pdf.Load(c)
objtable = pdf.BuildObjectTable()
pdf.SetObjectTable(objtable)
oid = PDFObject.OBJID(3, 0)
ret, dict, content, info = pdf.ParseObject(objtable, oid)
it = dict.iterator()
while it.hasNext():
    k, v = it.next()
    print(k + " - " + v)

Output:

/Length - 171
/Filter - /FlateDecode

Iterating through the objects of a PDF follows the same logic:

from Pro.Core import *
from Pro.PDF import *

c = createContainerFromFile(fname)
pdf = PDFObject()
pdf.Load(c)
objtable = pdf.BuildObjectTable()
it = objtable.iterator()
while it.hasNext():
    k, v = it.next()
    # print out the object id
    print(str(k >> 32))
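The shift by 32 suggests that each key is a 64-bit value with the object id in its upper half. The snippet below is a plain-Python sketch of such a packing. That the lower 32 bits hold the generation number is purely an assumption (the code above only confirms the upper half), and pack_key and unpack_key are hypothetical helpers.

```python
# Assumption for illustration: object id in the upper 32 bits,
# generation number in the lower 32 bits. Only the upper half is
# confirmed by the iteration code above.
def pack_key(obj_id, gen_num):
    return (obj_id << 32) | (gen_num & 0xFFFFFFFF)

def unpack_key(key):
    return key >> 32, key & 0xFFFFFFFF

k = pack_key(3, 0)
print(unpack_key(k))
```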

CFBF (DOC, XLS, PPT, MSI, etc.)

Iterating through the directories of a CFBF can be as simple as:

from Pro.Core import *
from Pro.CFBF import *

def visitor(obj, ud, dir_id, children):
    name = obj.DirectoryName(dir_id)
    print(name)
    return 0

c = createContainerFromFile(fname)
cfb = CFBObject()
cfb.Load(c)
dirs = cfb.BuildDirectoryTree()
cfb.SetDirectoryTree(dirs)
cfb.VisitDirectories(dirs, visitor, None)

Output:

Root Entry
CompObj
Ole
1Table
SummaryInformation
WordDocument
DocumentSummaryInformation

Retrieving a stream is equally easy:

from Pro.Core import *
from Pro.CFBF import *

c = createContainerFromFile(fname)
cfb = CFBObject()
cfb.Load(c)
dirs = cfb.BuildDirectoryTree()
cfb.SetDirectoryTree(dirs)
s = cfb.Stream(1)
b = s.read(0, s.size()) # read bytes
t = NTTextBuffer()
t.printHex(b)
print(t.buffer)

Output:

        0  1  2  3  4  5  6  7   8  9  A  B  C  D  E  F    Ascii

0000   01 00 FE FF 03 0A 00 00  FF FF FF FF 06 09 02 00    ................
0010   00 00 00 00 C0 00 00 00  00 00 00 46 18 00 00 00    ...........F....
0020   4D 69 63 72 6F 73 6F 66  74 20 57 6F 72 64 2D 44    Microsoft Word-D
0030   6F 6B 75 6D 65 6E 74 00  0A 00 00 00 4D 53 57 6F    okument.....MSWo
0040   72 64 44 6F 63 00 10 00  00 00 57 6F 72 64 2E 44    rdDoc.....Word.D
0050   6F 63 75 6D 65 6E 74 2E  38 00 F4 39 B2 71 00 00    ocument.8..9.q..
0060   00 00 00 00 00 00 00 00  00 00                      ..........      
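The strings visible in the Ascii column can be pulled out of the dumped bytes with a few lines of plain Python. This is a minimal sketch which assumes the layout is a 28-byte header followed by strings prefixed by a 4-byte little-endian length. That layout matches the dump above, but it is an assumption of this example, and read_lp_string is a hypothetical helper.

```python
import struct

# bytes copied from the hex dump above
data = bytes.fromhex(
    "01 00 FE FF 03 0A 00 00 FF FF FF FF 06 09 02 00 "
    "00 00 00 00 C0 00 00 00 00 00 00 46 18 00 00 00 "
    "4D 69 63 72 6F 73 6F 66 74 20 57 6F 72 64 2D 44 "
    "6F 6B 75 6D 65 6E 74 00 0A 00 00 00 4D 53 57 6F "
    "72 64 44 6F 63 00 10 00 00 00 57 6F 72 64 2E 44 "
    "6F 63 75 6D 65 6E 74 2E 38 00 F4 39 B2 71 00 00 "
    "00 00 00 00 00 00 00 00 00 00"
)

def read_lp_string(buf, off):
    # 4-byte little-endian length, then that many bytes (NUL included)
    (n,) = struct.unpack_from("<I", buf, off)
    s = buf[off + 4:off + 4 + n].rstrip(b"\x00").decode("latin-1")
    return s, off + 4 + n

off = 28  # assumed fixed-size header: 4 + 4 + 20 bytes
for _ in range(3):
    s, off = read_lp_string(data, off)
    print(s)
```

Run against the bytes above, this prints Microsoft Word-Dokument, MSWordDoc and Word.Document.8, one per line.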

SWF

Here’s how to output the disassembly of an ActionScript2 Flash file:

from Pro.Core import *
from Pro.SWF import *

c = createContainerFromFile(fname)
swf = SWFObject()
swf.Load(c)
if swf.IsCompressed():
    swf.Decompress()
tl = swf.EnumerateTags()
swf.SetStoredTags(tl)
out = NTTextBuffer()
swf.AS2Disassemble(out)
print(out.buffer)

The same can be done for ActionScript3 using the ABCFileObject class.

Class

This is how to disassemble a Java Class file:

from Pro.Core import *
from Pro.Class import *

c = createContainerFromFile(fname)
cl = ClassObject()
cl.Load(c)
cl.ProcessClass()
out = NTTextBuffer()
cl.Disassemble(out)
print(out.buffer)

DEX

This is how to disassemble a class in an Android DEX file:

from Pro.Core import *
from Pro.DEX import *

c = createContainerFromFile(fname)
dex = DEXObject()
dex.Load(c)
# disassemble the last class
classes = dex.Classes()
token = classes.Count() - 1
out = NTTextBuffer()
dex.Disassemble(out, token)
print(out.buffer)

In the upcoming post(s) I’m going to put it all together and do some very interesting things.
So stay tuned, as the best is yet to come!