News for version 0.9.9

The new 0.9.9 version of the Profiler is out with the following news:

– added support for docked views in the main window
– added scanning and rload (report load) hook notifications
– partially exposed custom views to Python
– exposed addEmbeddedObject method to Python
– exposed NTContainer find methods to Python
– improved importing of anonymous records (C11)
– added recognition of volatile keyword in types
– moved the message box constants to the Pro.Core module
– added tools view
– added quoted-printable decoding filter
– added format quota calculator extension
– added experimental EML attachment detection extension

Improved importing of anonymous records

C11 supports anonymous records like the following:

struct test {
    union {
        struct {
            unsigned int a;
            unsigned int b;
        };
        struct {
            unsigned int c;
            unsigned int d;
        };
        struct {
            unsigned int e;
            unsigned int f;
        };
    };
};

Notice that not only is the union anonymous but even its substructures are. The Header Manager is now capable of correctly importing this code. As usual anonymous types will be renamed (both their type and name).
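
For readers who want to experiment with this layout outside the Profiler, the same nesting can be mirrored with Python's standard ctypes module. This is only a sketch: the helper names (_S1, _U and so on) are made up for the example, and ctypes merely stands in for the Header Manager, which is not involved here.

```python
import ctypes

# Each anonymous struct of the C code becomes a small helper type.
class _S1(ctypes.Structure):
    _fields_ = [("a", ctypes.c_uint), ("b", ctypes.c_uint)]

class _S2(ctypes.Structure):
    _fields_ = [("c", ctypes.c_uint), ("d", ctypes.c_uint)]

class _S3(ctypes.Structure):
    _fields_ = [("e", ctypes.c_uint), ("f", ctypes.c_uint)]

# _anonymous_ promotes the members of the listed fields to the parent type.
class _U(ctypes.Union):
    _anonymous_ = ("s1", "s2", "s3")
    _fields_ = [("s1", _S1), ("s2", _S2), ("s3", _S3)]

class Test(ctypes.Structure):
    _anonymous_ = ("u",)
    _fields_ = [("u", _U)]

t = Test()
t.a = 0x11223344
t.b = 0x55667788
# a/c/e share the same storage, and so do b/d/f
print(hex(t.c), hex(t.f), ctypes.sizeof(Test))
```

Just like in the C code, all six members are reachable directly from the top level structure, even though two levels of anonymous types sit in between.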

Creating undetected malware for OS X

We have discovered a way to defeat current anti-malware solutions. We will publicly disclose the full details of the issue in a few weeks.

In the meantime, we’re more than happy to confidentially disclose the information to interested organizations (either security vendors or known companies which could benefit from it). Just send an email to: info@icerbero.com

-----BEGIN PGP PUBLIC KEY BLOCK-----
mQENBE1j8U0BCACm3tMNVVDb4gIEGdPYq3le5gzBngN43J7SXvGH6nlDnG6s/zS1
lBtecoqvtgXlS9KDonzq4KR0AfcEQh8ziwCgRbkgfQUxyIvJxt+cxW2zblnP37UQ
AuwhqO/Sc1yG3cT8wFSoaiF+tXJca879WEvimTyaoZSlXb3JuKt4UmDYbrOSLfDL
vd1rJ59R7GE5B2ThnSKDU/8pSvYVMKJdq0ArM8Nwg7gmUaBwpsvtEaGFB0gh3kBv
C/clsY8MR+MOBVI2f+95kekLIrUmOKzjXp5GkTgt5a6hqobU8jkVixN/KqgnT4Aj
LGPIyZkKgo/735SCT0JuWJKSHqJ+tw4aYUt7ABEBAAG0HkNlcmJlcm8gVUcgPGlu
Zm9AaWNlcmJlcm8uY29tPokBPgQTAQIAKAUCTWPxTQIbIwUJCWYBgAYLCQgHAwIG
FQgCCQoLBBYCAwECHgECF4AACgkQO3RcR15Gr56YXQf7BIKai3Y2i1xz4KAXfWmP
bsfuzHV4ZSsx8o1rnihtmXfN6qbPR1ySEfP/XgRXLZ2coerMYtx/Ydr4KnXlD1Sa
K7rGdTpqlwZ46p9RqtliFGELeglmHzD5ZCxawmHXf14OvTFgJKYqztFjBOvYMthI
xn5hdx/AqixCky5BMjdh7909cUP5Xzi0oYlpHcmiBsdPx8LPMh+DqGg9W0iahgnd
X+E0dW0dOEpg101esGMsHGaSbxw0+5ybh8XCIAl/50GLCVQWSE/8hxJnNoQy3AI/
P/d2olyGFVYbFVm+MUoRpx0zWgGP90n/Q9toGwl4qhhviyEl6vg6syYTuruhRcx7
D7kBDQRNY/FNAQgAskW9XYOXUd+DnEqGJywltTDpxSwwpCrfnqJG90YimVkK396G
ZG8uI5AnGqJ/+gThvgAMTY826WwlDP3DOyhmv1Iq7hKXDh9w2O5q8a1nsdaiGKws
7RBJ04xgfciifZRueGdEioiAFS5YmDLjdrBh6rX+6UfXTkbv1x1qodn1R9wFPxxS
nadpKwskG4YszNeViJxHMZTmnuKH9AOvCH7qiyWERNejeLRy1yFXVwD2HnCEjCNT
Loa3HvO5aJDT4Lww/w0McLPU0Tso5qQlXKk/I0C/llGD87rzuDffBswPQYfn2FkI
bHT5wdYh8Si+tA0oLI/bjRO254iFHDVgT/Vm3wARAQABiQElBBgBAgAPBQJNY/FN
AhsMBQkJZgGAAAoJEDt0XEdeRq+e+D4H/0W4oPHGv04y6KcuAR7XbgoXQ5fJVghY
XeKuYXD95WMT3W3PyoCirst9dX1MeJJ/wxi7dBCjT0iBbeb7mDERBQLi7L3hJnpg
wz1tokLb0QL+HNKIYZ8PsuuW3yQsbjSu1hCsCqNFe9nY3wkEDa3TWjjk5i1ejnnb
PCvGTOO/siwXGgZq7YWvoafCsdgbAwW8G6pO9BjZrrbDMMgFtQLWHLNBzDHTpWL3
BqjLlYisENQAO63FSAcu1ubhzFtIcVsjW8cgAxHQy4nN2RJHv23il+/PLsHquElP
gG4qSk8PudeEQUhFLLANRCSQ5yYlBhv4hJGGdAvYvYZQC36Nljg5WHI=
=SD9C
-----END PGP PUBLIC KEY BLOCK-----

EML attachment detection and inspection

The upcoming 0.9.9 version of the Profiler includes some very useful SDK additions. Among these are the addEmbeddedObject method (to add embedded objects) and a new hook notification called ‘scanning’. The scanning notification should be used for long operations and/or to add embedded objects. In this post we’ll demonstrate these new features with a little script to detect attachments in EML files.

EML attachments

One of the advantages of using the Profiler is that we are able to inspect the sub-files of the attachments as well. The screenshot above shows a PNG contained in an ODT attachment. Nice, isn’t it?

But the nicest part is how little code is necessary to extend the functionality of the Profiler. These are the lines to add to the user hook configuration file:

[EML: detect attachments]
file = eml.py
scanning = detectEmlAttachments

And this is the Python code:

from Pro.Core import INVALID_STREAM_OFFSET

def detectEmlAttachmentsCb(offset, npattern, sp):
    c = sp.getObjectStream()
    # hdr range
    m = c.findFirst("\n--".encode("ascii"), 0, offset, False)
    hdrstart = 0 if m.offset == INVALID_STREAM_OFFSET else m.offset
    m = c.findFirst("\r\n\r\n".encode("ascii"), offset)
    hdrend = c.size() if m.offset == INVALID_STREAM_OFFSET else m.offset
    # make sure it's an attachment
    m = c.findFirst("Content-Disposition: attachment".encode("ascii") , hdrstart, hdrend - hdrstart)
    if m.offset == INVALID_STREAM_OFFSET:
        return 0
    # data range
    datastart = hdrend + 4
    m = c.findFirst("\r\n\r\n".encode("ascii"), datastart)
    dataend = c.size() if m.offset == INVALID_STREAM_OFFSET else m.offset
    # retrieve file name (if any)
    name = "no_name"
    m = c.findFirst('name='.encode("ascii"), hdrstart, hdrend - hdrstart)
    if m.offset != INVALID_STREAM_OFFSET:
        namestart = m.offset + 5
        namedel = "\r\n"
        if c.read(namestart, 1) == '"'.encode("ascii"):
            namedel = '"'
            namestart = namestart + 1
        m = c.findFirst(namedel.encode("ascii"), namestart)
        if m.offset != INVALID_STREAM_OFFSET:
            namesize = min(m.offset - namestart, 200)
            name = c.read(namestart, namesize).decode("utf-8")
    # add attachment
    sp.addEmbeddedObject(datastart, dataend - datastart, "?", name, "")
    return 0

def detectEmlAttachments(sp, ud):
    sp.getObjectStream().find(detectEmlAttachmentsCb, sp, "Content-Transfer-Encoding: base64".encode("ascii"))

That’s it. Of course, this is just a demonstration. To improve it, we could add support for encodings other than ‘base64’, such as ‘Quoted-Printable’.
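
To follow the logic of the script without the Profiler at hand, here is a rough equivalent in plain Python that works on raw bytes. It is a sketch under assumptions: find_attachments and the sample message are invented for this example, bytes.find replaces the NTContainer find methods, and the base64 decoding is done by hand instead of through a filter.

```python
import base64

MARKER = b"Content-Transfer-Encoding: base64"

def find_attachments(data):
    """Return (name, decoded bytes) for every base64 attachment found."""
    found = []
    pos = data.find(MARKER)
    while pos != -1:
        # header range: from the previous boundary line to the blank line
        hdrstart = data.rfind(b"\n--", 0, pos)
        hdrstart = 0 if hdrstart == -1 else hdrstart
        hdrend = data.find(b"\r\n\r\n", pos)
        hdrend = len(data) if hdrend == -1 else hdrend
        header = data[hdrstart:hdrend]
        if b"Content-Disposition: attachment" in header:
            # data range: from after the blank line to the next blank line
            datastart = hdrend + 4
            dataend = data.find(b"\r\n\r\n", datastart)
            dataend = len(data) if dataend == -1 else dataend
            # retrieve the file name (if any)
            name = "no_name"
            n = header.find(b"name=")
            if n != -1:
                raw = header[n + 5:]
                if raw.startswith(b'"'):
                    raw = raw[1:].split(b'"', 1)[0]
                else:
                    raw = raw.split(b"\r\n", 1)[0]
                name = raw[:200].decode("utf-8", errors="replace")
            found.append((name, base64.b64decode(data[datastart:dataend])))
        pos = data.find(MARKER, pos + len(MARKER))
    return found

sample = (
    b"Subject: test\r\n"
    b"Content-Type: multipart/mixed; boundary=XYZ\r\n"
    b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain; name=\"hello.txt\"\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"Content-Disposition: attachment; filename=\"hello.txt\"\r\n"
    b"\r\n"
    b"aGVsbG8gd29ybGQ=\r\n"
    b"\r\n"
    b"--XYZ--\r\n"
)
print(find_attachments(sample))
```

Like the hook, this relies on simple pattern searches rather than on a real MIME parser, so it shares the same limitations.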

Some email programs like Thunderbird store EML files by appending them in one single file. In fact, as you can see, the screenshot above displays the attachments of an entire Inbox database. 😉

EML attachment types

Also notice that in the code the addEmbeddedObject method is called by specifying a base64 decode filter to load the file. We can, of course, specify multiple filters and Lua ones as well. This makes it extremely easy to load files without having to write code to decode/decrypt/decompress them. The “?” parameter leaves the Profiler to identify the format of the attachment.
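
The idea of stacking filters can be pictured with a few lines of plain Python. This is only an illustration of the concept: the filter names and the apply_filters helper are hypothetical and have nothing to do with the Profiler's own filter names or syntax.

```python
import base64
import quopri
import zlib

# hypothetical filter names mapped to standard library decoders
DECODERS = {
    "base64": base64.b64decode,
    "quoted-printable": quopri.decodestring,
    "zlib": zlib.decompress,
}

def apply_filters(data, names):
    """Run the data through each named decoder, in order."""
    for name in names:
        data = DECODERS[name](data)
    return data

# a payload which was compressed first and then base64 encoded
payload = base64.b64encode(zlib.compress(b"attachment body"))
print(apply_filters(payload, ["base64", "zlib"]))
```

The caller only has to list the filters in the right order, which is the convenience the filter parameter of addEmbeddedObject offers.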

Format quota calculator

In the upcoming 0.9.9 version of the Profiler it will be possible to create docked views even in the context of the main window. This feature combined with custom views is extremely useful if we want to create custom reports at the end of a scan.

Some time ago I needed a little script to calculate the format quotas of files in a specific directory and their sub-files: we’ll use this sample to demonstrate the new features. For example, we could use it to determine which kinds of files the System32 directory on Windows contains, and in what percentage. Or we could use it to determine the quotas of files in a Zip archive. To make it even more useful, the script now asks the user before the scan to enter the nesting range to consider. For example, the value ‘0’ means all levels (starting from 0). If we want to calculate the quotas of top level files only, we must enter ‘0-0’ (start-end). The quotas of the files contained in a Zip archive can be calculated with the value ‘1-1’, and if we want to include their sub-files we must enter ‘1’.
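
The nesting range syntax is easy to try out in isolation. The following sketch mirrors what the script does with the user input; the function names are made up for this example, and an end value of -1 means "no upper limit", as in the script.

```python
def parse_nesting_range(text):
    """Turn 'start' or 'start-end' into a (start, end) pair; end is -1 if open."""
    parts = text.strip().split("-")
    start = int(parts[0]) if parts[0] else 0
    end = int(parts[1]) if len(parts) > 1 and parts[1] else -1
    return start, end

def in_nesting_range(nesting, start, end):
    """True if a file at the given nesting level should be counted."""
    return nesting >= start and (end < 0 or nesting <= end)

for text in ("0", "0-0", "1-1", "1"):
    print(text, parse_nesting_range(text))
```

With ‘1’ a file at nesting level 3 is counted, while with ‘1-1’ it is skipped.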

System32 quotas

We’re probably going to include the script in the upcoming release. But in case we don’t, in order to try it out, add the following lines to the hooks configuration file:

[Format Quota Calculator]
file = quotas.py
init = typeQuotaCalcaulatorInit
end = typeQuotaCalcaulatorEnd
scanned = typeQuotaCalcaulatorScanned

And create a ‘quotas.py’ file in your ‘plugins/python’ user directory with the following content:

from os import path
import random

def generateColor():
    c = ""
    for i in range(3):
        c = c + "%0.2X" % ((random.randint(0, 200) + 300) >> 1)
    return c

def typeQuotaCalcaulatorInit():
    random.seed(0)
    # ask for nesting levels to consider
    from Pro.UI import ProInput
    ns = ProInput.askText("Format Quota Calculator (nesting level: from(-to))", "0")
    lstart = 0
    lend = -1
    if ns is not None:
        ns = ns.split("-")
        if len(ns) > 0:
            lstart = int(ns[0])
        if len(ns) > 1:
            lend = int(ns[1])
    return { "lstart" : lstart, "lend" : lend, "total" : 0, "quotas" : { } }

def typeQuotaCalcaulatorEnd(ud):
    from Pro.UI import proContext, ProView
    from html import escape
    prec = "%.2f"
    mbsize = 1024 * 1024
    u = (ud["total"] / 100) or 1 # avoid a division by zero if all the files are empty
    # prepare content
    s = "Total size: " + (prec % (ud["total"] / mbsize)) + " MBs\n"
    ui = ""
    for k,q in ud["quotas"].items():
        ps = (prec % (q / u))
        ss = (prec % (q / mbsize))
        s = s + "\n" + k + ": " + ps + "% (" + ss + " MBs)"
        ui = ui + ""
    ui = ui + ""
    # display view
    ctx = proContext()
    v = ctx.createView(ProView.Type_Custom, "Format quotas")
    v.setup(ui)
    v.getView(1).setText(s)
    ctx.addView(v)

def typeQuotaCalcaulatorScanned(sp, ud):
    # check nesting
    nesting = sp.scanNesting()
    if ud["lstart"] > nesting or (ud["lend"] >= 0 and ud["lend"] < nesting):
        return
    c = sp.getObjectStream()
    fmt = sp.getObjectFormat()
    # if we didn't recognize the file, use extension as format identifier
    # we could also use an external signature db...
    if fmt == "":
        fmt = path.splitext(c.name())[1]
        if len(fmt) > 0:
            fmt = fmt[1:] # skip dot
    if len(fmt) == 0:
        fmt = "?"
    else:
        fmt = fmt.upper()
    # add to quotas
    size = c.size()
    ud["total"] = ud["total"] + size
    if fmt not in ud["quotas"]:
        ud["quotas"][fmt] = 0
    ud["quotas"][fmt] = ud["quotas"][fmt] + size

Remember to activate the hook from the UI before running a scan.

Of course, the view will be displayed even after an individual file scan in the workspace.
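
The bookkeeping done by the callbacks can also be tried without the Profiler. The sketch below keeps the same "total" and "quotas" keys of the user data, while add_file and quota_report are names invented for this example.

```python
def add_file(ud, fmt, size):
    """Account for one scanned file of the given format and size."""
    ud["total"] += size
    ud["quotas"][fmt] = ud["quotas"].get(fmt, 0) + size

def quota_report(ud):
    """Return one text line per format with percentage and size in MBs."""
    total = ud["total"]
    lines = []
    for fmt, size in sorted(ud["quotas"].items()):
        pct = (size * 100 / total) if total else 0.0
        lines.append("%s: %.2f%% (%.2f MBs)" % (fmt, pct, size / (1024 * 1024)))
    return lines

ud = {"total": 0, "quotas": {}}
add_file(ud, "PE", 3 * 1024 * 1024)
add_file(ud, "PNG", 1024 * 1024)
add_file(ud, "PE", 0)
print("\n".join(quota_report(ud)))
```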

PDF quotas

In order to improve the script, we could use an external signature database for those file formats not recognized automatically.
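
As a rough idea of what such a fallback might look like, here is a tiny lookup based on well-known magic bytes. The table, the format labels and the guess_format helper are all assumptions made for this sketch; a real signature database would be far larger and would not be hard-coded.

```python
# (magic bytes at offset 0, label) pairs for a handful of common formats
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"%PDF-", "PDF"),
    (b"PK\x03\x04", "ZIP"),
    (b"\x7fELF", "ELF"),
    (b"MZ", "MZ"),
]

def guess_format(head, default="?"):
    """Return the label of the first signature matching the start of the data."""
    for magic, fmt in SIGNATURES:
        if head.startswith(magic):
            return fmt
    return default

print(guess_format(b"%PDF-1.4\n"))
print(guess_format(b"plain text"))
```

In the script, a function like this could be tried before falling back to the file extension.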

This is a perfect example of how easily the functionality of the Profiler can be extended. While there’s no estimated release date for the upcoming version yet, stay tuned, as we hope to publish very interesting stuff soon.

Custom Views

The upcoming 0.9.9 version of the Profiler will partially expose the use of custom views. These views are used internally by the Profiler to create complex graphical UIs using short XML strings. While at the moment extensions can use PySide to create complex UIs, it’s better to avoid it if possible, since it involves an extra dependency and also because PySide might not be ported to Qt 5 in the future.

But let’s see a code snippet:

from Pro.UI import *

ctx = proContext()
v = ctx.createView(ProView.Type_Custom, "Debug Directory")
v.setup("<ui><vsplitter><table id='0'/><hex id='1'/></vsplitter></ui>")
ctx.addView(v)

These few lines will display the following view:

Empty custom view

Controls can be organized in layouts (hlayout/vlayout), splitters (hsplitter/vsplitter) and tabs (tab). These elements are called containers. Available controls are: label, pie, plot, table, tree, hex, text and media.
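
Since the UI description is just an XML string, it can be generated instead of being typed by hand. The following sketch uses Python's standard ElementTree module for that; build_ui is a helper invented for this example, and it only assumes the element names listed above and the id attribute seen in the snippet.

```python
import xml.etree.ElementTree as ET

def build_ui(container, controls):
    """Wrap the given controls in a container and number them via 'id'."""
    ui = ET.Element("ui")
    box = ET.SubElement(ui, container)
    for index, control in enumerate(controls):
        ET.SubElement(box, control, id=str(index))
    return ET.tostring(ui, encoding="unicode")

xml = build_ui("vsplitter", ["table", "hex"])
print(xml)
```

ElementTree emits double quotes around attribute values rather than the single quotes used in the snippet; both are valid XML. The resulting string is the kind of value that would be passed to setup.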

More controls will be available in the future, and not all of the current ones can be used as they are. Some controls make sense only in combination with a callback to be notified about changes of the state of the control. The notification system will be made available to Python as well in the future, but it made sense to release a partial solution in the meantime, because many views don’t require notifications and only need a way to display information at the end of an operation.

Let’s see for example how to make use of the UI above to display information.

Custom view

This code replicates the Debug Directory UI in Portable Executables.

from Pro.UI import *

ctx = proContext()
obj = ctx.currentScanProvider().getObject()
dbgdir = obj.DebugDirectory().MakeSingle()
dbgdata = ctx.currentScanProvider().getObjectStream()
dbgdata.setRange(*obj.DebugDirectoryData(dbgdir))

v = ctx.createView(ProView.Type_Custom, "Debug Directory")
v.setup("<ui><vsplitter><table id='0'/><hex id='1'/></vsplitter></ui>")
v.getView(0).setStruct(dbgdir)
v.getView(1).setData(dbgdata)
ctx.addView(v)

Elements in a view can have attributes. We’ve only seen the id attribute used to identify the embedded controls. There are two kinds of attributes: shared attributes and individual ones. Only controls have these shared attributes: width, height, min-width, max-width, fixed-width and fixed-height. If a c is prefixed to the width/height word, then the size can be expressed in characters, e.g.: fixed-cwidth='10'. Additionally, since version 1.3, there’s also wfixed and hfixed. Both are booleans which, if true, set the fixed size policy.

Here’s a list of individual attributes for controls and containers.

  • ui
    • bgcolor (e.g. ffffff)
  • hlayout/vlayout (hl/vl)
    • margin
    • spacing
    • align (hcenter, vcenter, center, top, left, bottom, right)
  • hsplitter/vsplitter (hs/vs)
    • sizes/csizes (separated by -)
  • tab
    • index
    • titles (separated by ;)
  • label
    • bgcolor (e.g. ffffff)
    • select (bool)
    • margin
  • text
    • readonly (bool)
    • linenr (bool, show line number)
    • hline (bool, highlight current line)
    • hword (bool, highlight current word)
    • wrap (bool)
  • combo (since version 1.3)
    • edit (bool)
    • text (string, only if editable)
  • btn (since version 1.3)
    • text (string, only if editable)
  • check (since version 1.3)
    • checked (bool)
    • text (string, only if editable)
  • tline (text-line, since version 2.5)

While this post doesn’t present many usage examples, we’ll try to show additional ones in future posts.

News for version 0.9.8

Since 0.9.7 has been a massive release with lots of changes, we dedicated the new 0.9.8 version of the Profiler to improve things and fix minor bugs. Here’s the change list:

– improved support for Windows 8.1 PEs
– added language options to Header Manager
– improved anonymous types renaming logic
– improved TrueType font disassembler
– many small improvements
– fixed some minor bugs

Since some improvements are PE related, PE Insider has been updated as well.

Enjoy!

PE Insider

It is always nice to give something back to the community. Although this is unfortunately not always possible, we’re happy to announce the release of PE Insider, a free PE viewer. It shares the same codebase for inspection as Cerbero Profiler, and hence supports the entire PE specification and is incredibly fast and stable. We’re always very busy, but I was finally convinced by Ange Albertini to create this utility. 😉

PE Insider

The utility clearly stands light-years away in terms of functionality compared to the Profiler, but it does have some things which go beyond the simple format inspection including MSIL disassembly, navigation, ranges and resource preview.

Of course there’s room for improvement, but in the meantime here’s a first version. Enjoy!

P.S. To keep up-to-date with news regarding this utility either subscribe to our twitter account or follow the blog.

News for version 0.9.7

The new 0.9.7 version of the Profiler is out with the following news:

– introduced C++ class/struct parsing with Clang
– introduced headers, layouts and manual analysis in hex mode
– exposed all the above to the Python SDK
– added capability to turn into a portable application
– added SHA-3 hashes
– updated Qt to 4.8.5
– updated OpenSSL
– behavior change: displaying table flags now requires a double click

Enjoy!

Dissecting an ELF with C++ Types

While there are more interesting targets which could be manually analyzed with the new features provided in the Profiler, I decided to write a small post about ELF, also because official support for ELF will be added sooner or later.

Let’s start by importing the types contained in ‘elf.h’. You’ll probably find this header in ‘/usr/include’. Everything we’re interested in is in this file, so we can avoid importing other stuff. I added some predefines in order to avoid includes:

#define int8_t char
#define uint8_t unsigned char
#define int16_t short
#define uint16_t unsigned short
#define int32_t int
#define uint32_t unsigned int
#define int64_t long long
#define uint64_t unsigned long long

Then I pasted ‘elf.h’ into the Header Manager after the HEADER_START directive and clicked on ‘Import’.

ELF types import

We now have a header (elf) with all the types we need to start the manual analysis.

Since this is just a demonstration I didn’t do a full analysis of the ELF format. I limited the scope to finding the imported symbols and their strings.

ELF analysis

Every ELF starts with a _Elf64_Ehdr header (Elf32_Ehdr for 32-bit files, in this case it’s a 64-bit ELF). The header specifies the offset, number and size of the sections (we’ll just assume the standard 0x40 size here). The ‘name’ field of sections is just an index into a ‘SHT_STRTAB’ section whose index is specified by the header. The contents of a section are specified by its type, so finding the symbol table is pretty straight-forward. In this ELF we have a SHT_DYNSYM section. This section is just an array of _Elf64_Sym structures. Again, their ‘st_name’ field is just an index into another SHT_STRTAB section (the interval in the screenshot named ‘.dynstr’).
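
To make the role of the header fields more concrete, here is a sketch that reads a 64-bit little-endian ELF header with Python's struct module only. It doesn't use the Profiler SDK; parse_ehdr and the synthetic header values are made up for this example, while the field order follows the Elf64_Ehdr definition in ‘elf.h’.

```python
import struct

# Elf64_Ehdr, little-endian and without padding: 64 bytes in total
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
FIELDS = ("e_ident", "e_type", "e_machine", "e_version", "e_entry",
          "e_phoff", "e_shoff", "e_flags", "e_ehsize", "e_phentsize",
          "e_phnum", "e_shentsize", "e_shnum", "e_shstrndx")

def parse_ehdr(data):
    """Return the header fields as a dict, or None if it isn't an ELF."""
    if len(data) < EHDR.size or data[:4] != b"\x7fELF":
        return None
    return dict(zip(FIELDS, EHDR.unpack_from(data)))

# synthetic header: 64-bit class, little-endian data, version 1
ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
raw = EHDR.pack(ident, 3, 62, 1, 0x1000, 0x40, 0x2000, 0, 64, 56, 9, 0x40, 28, 27)

hdr = parse_ehdr(raw)
print(hex(hdr["e_shoff"]), hdr["e_shnum"], hex(hdr["e_shentsize"]), hdr["e_shstrndx"])
```

The section table is then located at e_shoff and contains e_shnum entries of e_shentsize bytes each, with e_shstrndx pointing to the section which holds the section names.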

As already mentioned in the previous post, we can create a layout programmatically as well:

from Pro.Core import *
from Pro.UI import *

def buildElfLayout(obj, l):
    hname = "elf"
    hdr = CFFHeader()
    if not hdr.LoadFromFile(hname):
        return
    sopts = CFFSO_GCC | CFFSO_Pack1
    d = LayoutData()
    d.setTypeOptions(sopts)
    
    # add header
    ehdr = obj.MakeStruct(hdr, "_Elf64_Ehdr", 0, sopts)
    d.setColor(ntRgba(255, 0, 0, 70))
    d.setStruct(hname, "_Elf64_Ehdr")
    l.add(0, ehdr.Size(), d)

    # add sections (we assume that e_shentsize is 0x40)
    e_shoff = ehdr.Num("e_shoff")
    e_shnum = ehdr.Num("e_shnum")
    esects = obj.MakeStructArray(hdr, "_Elf64_Shdr", e_shoff, e_shnum, sopts)
    d.setStruct(hname, "_Elf64_Shdr")
    d.setArraySize(e_shnum)
    l.add(e_shoff, esects.TotalSize(), d)

hv = proContext().getCurrentView()
if hv.isValid() and hv.type() == ProView.Type_Hex:
    c = hv.getData()
    obj = CFFObject()
    obj.Load(c)
    lname = "ELF_ANALYSIS" # we could make the name unique
    l = proContext().getLayout(lname) 
    buildElfLayout(obj, l)
    # apply the layout to the current hex view
    hv.setLayoutName(lname)

Moreover, the imported types can be used to do other operations not related to layouts. For instance, let’s write a few lines of code to print out the symbol names for this ELF:

from Pro.Core import *

obj = proCoreContext().currentScanProvider().getObject()

hdr = CFFHeader()
if hdr.LoadFromFile("elf"):
    syms = obj.MakeStructArray(hdr, "_Elf64_Sym", 0x39A0, 2179, CFFSO_GCC | CFFSO_Pack1)
    it = syms.iterator()
    while it.hasNext():
        s = it.next()
        name_offs = s.Num(0) + 0x105E8 # .dynstr offset
        name = obj.ReadUInt8String(name_offs, 0x1000)[0].decode("utf-8")
        print(name)

The output will be:

endgrent
__ctype_toupper_loc
iswlower
sigprocmask
__snprintf_chk
getservent
wcscmp
putchar
strcasecmp
localtime
mblen
__vfprintf_chk
; etc.

Remember that the advantages of using CFFStructs lie not only in their dynamism and in how easily they can be displayed graphically, but also in security. Contrary to a structure pointer in C, there’s no risk of a crash when accessing members in a CFFStruct.
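
The symbol loop and the idea of bounds-checked access can be reproduced in plain Python as well. This sketch is not how CFFStructs work internally; it only uses the struct module on a synthetic buffer, and symbol_names, read_cstring and the test data are invented for the example.

```python
import struct

# Elf64_Sym: st_name, st_info, st_other, st_shndx, st_value, st_size
SYM = struct.Struct("<IBBHQQ")

def read_cstring(data, offset, limit=0x1000):
    """Read a NUL-terminated string; out of range offsets yield None."""
    if offset < 0 or offset >= len(data):
        return None
    end = data.find(b"\x00", offset, offset + limit)
    if end == -1:
        end = min(len(data), offset + limit)
    return data[offset:end].decode("utf-8", errors="replace")

def symbol_names(data, symoff, count, stroff):
    """Collect the names of 'count' symbols, stopping where the data ends."""
    names = []
    for i in range(count):
        pos = symoff + i * SYM.size
        if pos + SYM.size > len(data):
            break
        st_name = SYM.unpack_from(data, pos)[0]
        names.append(read_cstring(data, stroff + st_name))
    return names

# two synthetic symbols followed by a small string table
dynstr = b"\x00endgrent\x00putchar\x00"
symtab = SYM.pack(1, 0x12, 0, 0, 0, 0) + SYM.pack(10, 0x12, 0, 0, 0, 0)
image = symtab + dynstr

print(symbol_names(image, 0, 2, len(symtab)))
```

Asking for more symbols than the buffer contains, or for a string at an invalid offset, simply yields fewer results or None instead of reading out of bounds.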

Today some final tests will be performed on the new version and if everything goes well, it will be released tomorrow or the day after. So stay tuned!

C++ Types: Under the Hood

In this post we’re going to explore the part of the Profiler SDK associated with imported structures, as well as the C++ internals connected to the layout creation of structures/classes.

At first I thought about subdividing the material into several posts, but in the end it’s probably better to have it all together for future reference.

Layouts

In the SDK a Layout is the class to be used when we need to create a graphical analysis of raw data. While we can create and handle headers from the UI, it is also possible to do it programmatically.

class LayoutInterval

    end
    start

class LayoutData

    arraySize() -> UInt32
    getColor() -> NTRgb
    getDescription() -> NTUTF8String
    getHeader() -> NTUTF8String
    getType() -> NTUTF8String
    setArraySize(UInt32 n)
    setColor(NTRgb rgba)
    setDescription(NTUTF8String const & description)
    setStruct(NTUTF8String const & hdr, NTUTF8String const & type)
    setTypeOptions(UInt32 opt)
    typeOptions() -> UInt32

class LayoutPair

    first
    second

class Layout

    add(MaxUInt offset, MaxUInt size, LayoutData data)
    add(LayoutInterval interval, LayoutData data)
    at(UInt32 i) -> LayoutPair
    at(LayoutPair const & lp) -> UInt32
    at(LayoutInterval interval) -> UInt32
    count() -> UInt32
    fromXml(NTUTF8String const & xstr) -> bool
    getMatches(MaxUInt offset, MaxUInt size) -> LayoutPairList
    getOverlappingWith(MaxUInt offset, MaxUInt size) -> LayoutPairList
    isModified() -> bool
    isNull() -> bool
    isValid() -> bool
    layoutName() -> NTString
    remove(MaxUInt offset, MaxUInt size)
    remove(LayoutInterval interval)
    renameLayout(NTString const & name) -> bool
    saveIfModified()
    setModified(bool b)
    toXml() -> NTUTF8String

Creating a layout is straightforward:

from Pro.Core import *

# create a new layout or retrieve an existing one from the project
layout = proCoreContext().getLayout("LAYOUT_NAME")
# create data
data = LayoutData()
data.setDescription("text")
data.setColor(ntRgba(0xFF, 0, 0, 0x70))
# add interval
layout.add(70, 30, data)

The data can be associated with a structure (or array of structures) as well. Please remember that the name of a header is always relative to the header sub-directory of the user directory. Saving the layout is not necessary: it’s automatically saved in the project.
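
As a rough mental model, a layout can be thought of as a collection of intervals, each carrying some data. The toy class below is not the SDK's implementation: ToyLayout and its methods are invented for this sketch, and the overlap test is an assumption about what a query such as getOverlappingWith might reasonably return.

```python
class ToyLayout:
    """Keeps (offset, size, description) entries, in insertion order."""

    def __init__(self):
        self.entries = []

    def add(self, offset, size, description):
        self.entries.append((offset, size, description))

    def overlapping_with(self, offset, size):
        """Return the entries sharing at least one byte with the range."""
        end = offset + size
        return [e for e in self.entries if e[0] < end and offset < e[0] + e[1]]

layout = ToyLayout()
layout.add(70, 30, "text")
layout.add(0, 64, "header")

print(layout.overlapping_with(60, 20))
print(layout.overlapping_with(64, 6))
```

The first query touches both intervals, while the second falls in the gap between them.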

Attaching a layout to a hex view is also very easy:

from Pro.UI import *

hv = proContext().getCurrentView()
if hv.type() == ProView.Type_Hex:
    hv.setLayoutName("LAYOUT_NAME")

Of course, layouts can be used for operations not related to graphical analysis as well.

Headers

Headers are part of the CFF Core and as such the naming convention of the CFFHeader class isn’t camel-case.

class CFFHeaderAliasData

    category
    name
    type
    value
    vtype

class CFFHeaderStructData

    name
    schema
    type

class CFFHeaderTypeDefData

    name
    type

class CFFHeader

    AC_Define
    AC_Enum
    AC_Last
    AVT_Integer
    AVT_Last
    AVT_Real
    AVT_String

    BeginEdit()
    Close()
    EndEdit()
    Equals(CFFHeader s) -> bool
    static GetACName(int category) -> char const *
    static GetAVTName(int vtype) -> char const *
    GetAliasCount() -> UInt32
    GetAliasData(UInt32 i) -> CFFHeaderAliasData
    GetStructBaseData(UInt32 i) -> CFFHeaderStructData
    GetStructCount() -> UInt32
    GetStructData(UInt32 i) -> CFFHeaderStructData
    GetStructData(char const * name) -> CFFHeaderStructData
    GetTypeDefCount() -> UInt32
    GetTypeDefData(UInt32 i) -> CFFHeaderTypeDefData
    InsertAlias(char const * name, int category, char const * type, int vtype, char const * value)
    InsertStruct(char const * name, char const * type, char const * schema)
    InsertTypeDef(char const * name, char const * type)
    IsModified() -> bool
    IsNull() -> bool
    IsValid() -> bool
    LoadFromFile(NTString const & name) -> bool
    LoadFromXml(NTXml xml) -> bool
    LoadFromXml(NTUTF8String const & xml) -> bool
    SetModified(bool b)

A CFFHeader represents an abstract database in which structures/classes and other things are stored. While we won’t use most of its methods, some of them are very useful for common operations.

Let’s say we want to retrieve a specific structure from a header and use it.

from Pro.Core import *

def output(s):
    out = proTextStream()
    s.Dump(out)
    print(out.buffer)

obj = proCoreContext().currentScanProvider().getObject()
hdr = CFFHeader()
if hdr.LoadFromFile("WinNT"):
    s = obj.MakeStruct(hdr, "_IMAGE_DOS_HEADER", 0, CFFSO_Pack1)
    output(s)

The output of this snippet is:

e_magic   : 5A4D
e_cblp    : 0090
e_cp      : 0003
e_crlc    : 0000
e_cparhdr : 0004
e_minalloc: 0000
e_maxalloc: FFFF
e_ss      : 0000
e_sp      : 00B8
e_csum    : 0000
e_ip      : 0000
e_cs      : 0000
e_lfarlc  : 0040
e_ovno    : 0000
e_res.0   : 0000
e_res.1   : 0000
e_res.2   : 0000
e_res.3   : 0000
e_oemid   : 0000
e_oeminfo : 0000
e_res2.0  : 0000
e_res2.1  : 0000
e_res2.2  : 0000
e_res2.3  : 0000
e_res2.4  : 0000
e_res2.5  : 0000
e_res2.6  : 0000
e_res2.7  : 0000
e_res2.8  : 0000
e_res2.9  : 0000
e_lfanew  : 000000F8

We can specify the following options when retrieving a structure:

CFFSO_EndianDefault
CFFSO_EndianLittle
CFFSO_EndianBig
CFFSO_EndiannessDefault
CFFSO_EndiannessLittle
CFFSO_EndiannessBig

CFFSO_PointerDefault
CFFSO_Pointer16
CFFSO_Pointer32
CFFSO_Pointer64

CFFSO_PackNone
CFFSO_Pack1
CFFSO_Pack2
CFFSO_Pack4
CFFSO_Pack8
CFFSO_Pack16

CFFSO_NoCompiler
CFFSO_VC
CFFSO_GCC
CFFSO_Clang

These are the same options which are available from the UI when adding a structure to a layout.

When options are not specified, they default to the default structure options of the object. It’s possible to specify the default structure options with this method:

SetDefaultStructOptions(UInt32 options)

We’ll see later the implications of the various flags.

When I said that a CFFHeader represents an abstract database, I meant that it is not really bound to a specific format internally. All it cares about is that data is retrieved or set. The standard format used by headers is SQLite and you’ll need to use that format when creating layouts associated with structures. However, when using structures from Python it can be handy to avoid an associated header file. When the number of structures is very limited and you don’t need write support or other complex operations, structures can be stored in an XML string. In fact, the internal format of structures is XML. Let’s take a look at one:


  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  

We can inspect the format of a structure stored in a header from the Header Manager in the Explore tab by double clicking on it. But we can also avoid creating a header altogether and output the schema of parsed structures directly when importing them from C++. Just check ‘Test mode’ and as ‘Output’ select ‘schemas’.

Output schemas

Let’s import a simple structure such as:

struct A
{
    int a;
};

The output will be:


  

To use this structure from Python we can write the following code:

schema = """
"""
hdr = CFFHeader()
if hdr.LoadFromXml(schema):
    s = obj.MakeStruct(hdr, "A", 0)
    output(s)

As you can see it’s very simple. I’ll use this method for the examples in the rest of the post, because they’re just examples and there’s no point in creating a header file for them.

Pointers

CFFSO_Pointer16
CFFSO_Pointer32
CFFSO_Pointer64

As a rule of thumb, if a structure contains a pointer (or a vtable pointer), it is always a good idea to specify the desired size. When the size is omitted both in the explicit options and in the default structure options, it is set to the default pointer size of the object, which, apart from PEObjects and MachObjects, will always be 32 bits.
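
The effect of the pointer size is easy to see on raw bytes. The sketch below is plain Python and doesn't use the SDK: read_pointer is a made-up helper, and the bytes are the beginning of the DOS header used for the outputs in this post, read as little-endian.

```python
# first 16 bytes of the file used for the outputs in this post
data = bytes.fromhex("4D5A90000300000004000000FFFF0000")

POINTER_SIZES = {16: 2, 32: 4, 64: 8}

def read_pointer(buf, offset, bits):
    """Read a little-endian pointer of the given width from the buffer."""
    size = POINTER_SIZES[bits]
    chunk = buf[offset:offset + size]
    if len(chunk) != size:
        raise ValueError("not enough data for a %d-bit pointer" % bits)
    return int.from_bytes(chunk, "little")

for bits in (16, 32, 64):
    print(bits, "%X" % read_pointer(data, 0, bits))
```

The same offset yields three different values, and every field following the pointer shifts accordingly, which is why the size should match the analyzed data.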

Endianness

CFFSO_EndianLittle
CFFSO_EndianBig
# or
CFFSO_EndiannessLittle
CFFSO_EndiannessBig

When endianness is not specified it will be set to the default of the object. While internally it’s already possible to have individual fields with different endianness, an extra XML field attribute to specify it will be added in the future.
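
A quick way to see what the endianness options change is to read the same four bytes both ways in plain Python. This sketch is independent of the SDK and uses the first bytes of the file from the earlier outputs.

```python
raw = bytes.fromhex("4D5A9000")

# the same bytes interpreted as a 32-bit integer in both byte orders
little = int.from_bytes(raw, "little")
big = int.from_bytes(raw, "big")

print("little: %08X" % little)
print("big   : %08X" % big)
```

The little-endian reading corresponds to the 00905A4D value seen in the outputs of this post.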

Arrays

The first thing to say is that there’s a difference between an array of top level structures and an array of fields. Creating a top level array of structures is easy:

s = obj.MakeStructArray(hdr, "A", 0, 10)

The support of arrays is somewhat limited. Multidimensional arrays are only partially supported, in the sense that they will be converted to a single dimension. For instance:

struct A
{
    int a[10][10];
};

Or in XML:


  

Will be converted to:

a.0 : 00905A4D
a.1 : 00000003
a.2 : 00000004
a.3 : 0000FFFF
a.4 : 000000B8
a.5 : 00000000
a.6 : 00000040
a.7 : 00000000
a.8 : 00000000
a.9 : 00000000
a.10: 00000000
a.11: 00000000
a.12: 00000000

; etc.

Also notice that to access an array element in a CFFStruct the syntax to use is not “a[15]” but “a.15”, e.g.:

print(s.Str("a.15"))
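
To find the flattened index of an element of a multidimensional array, the usual row-major formula of C can be applied. That the flattened order matches the C memory layout is an assumption of this sketch, and the helper names are invented for the example.

```python
def flat_index(i, j, cols=10):
    """Row-major index of element [i][j] in an array with 'cols' columns."""
    return i * cols + j

def member_name(field, i, j, cols=10):
    """Name to pass to a CFFStruct accessor for element [i][j]."""
    return "%s.%d" % (field, flat_index(i, j, cols))

print(member_name("a", 1, 5))
```

Under that assumption, the element a[1][5] of the structure above is the one accessed with "a.15".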

Sub-structures

The only thing to mention about sub-structures is that complex sub-types are always dumped separately, e.g.:

struct A
{
    int a;
    struct SUB
    {
        int sub;
    } b;
};

In XML:


  



  
  

In Python:

schema = """
"""
hdr = CFFHeader()
if hdr.LoadFromXml(schema):
    s = obj.MakeStruct(hdr, "A", 0)
    output(s)

The output:

a    : 00905A4D
b.sub: 00000003

Being a separate type, we can also use ‘A::SUB’ without its parent.

A new thing we’ve just seen is the presence of multiple structures in a single XML header. I’ve pasted the whole Python code once again just for clarity. In the next examples I won’t repeat it, since the Python code never changes: only the header string does.

Unions

Unions just like sub-structures are fully supported. The only thing to keep in mind is that when we have a top level union, meaning not contained in another structure, such as:

union A
{
    int a;
    short b;
};

Then to access its members it is necessary to add a ‘u.’ prefix. The reason for this is that CFFStructs support unions only as members, so the union above will result in a CFFStruct with a union member called ‘u’.

u.a: 00905A4D
u.b: 5A4D
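
The two values above are simply the same bytes read with different sizes, which a couple of lines of plain Python can confirm. The sketch uses the struct module rather than the SDK, assumes little-endian data, and stores the members in an ordinary dict named u.

```python
import struct

raw = bytes.fromhex("4D5A9000")

# both members of the union start at offset 0
u = {
    "a": struct.unpack_from("<i", raw, 0)[0],  # int
    "b": struct.unpack_from("<h", raw, 0)[0],  # short
}

print("u.a: %08X" % u["a"])
print("u.b: %04X" % u["b"])
```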

Anonymous types

Anonymous types are only partially supported in the sense that they are given a name when imported. A type such as the following:

struct A
{
    union
    {
        int a;
        int b;
    } u;
};

Results in the following XML:


  
  



  

As you can see, a ‘_Type_’ + number naming convention has been used to rename anonymous types. The first character (‘_’) in the name represents the default anonymous prefix. This prefix is customizable. If a typedef is found for an anonymous type, then the new name for that type will be created by using the anonymous prefix + the typedef name.

Bit-fields

Bit-fields are fully supported.

struct A
{
    int a : 1;
    int b : 4;
};

  
  

Output:

a: 01
b: 06
 : 0482D2

The unnamed field at the end represents the unused bits given the field size, in this case we have an ‘int’ type and we’ve used only 5 bits of it.
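
The values in this output can be reproduced by hand. The following sketch splits a 32-bit value into bit-fields allocated from the least significant bit upwards, which is an assumption that holds for this simple case on little-endian x86 compilers; split_bitfields is a helper invented for the example.

```python
def split_bitfields(value, widths, total_bits=32):
    """Extract consecutive bit-fields starting from the lowest bit."""
    fields = []
    shift = 0
    for width in widths:
        fields.append((value >> shift) & ((1 << width) - 1))
        shift += width
    # whatever is left of the storage unit forms the unnamed field
    rest = (value >> shift) & ((1 << (total_bits - shift)) - 1)
    return fields, rest

fields, rest = split_bitfields(0x00905A4D, [1, 4])
print("a: %02X" % fields[0])
print("b: %02X" % fields[1])
print(" : %06X" % rest)
```

The input is the same 32-bit value seen in the other outputs of this post, and the three results match the dump above.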

There are significant differences in how compilers handle bit-fields. Visual C++ behaves differently than GCC/Clang. Some of the differences are summarized in this message by Richard W.M. Jones.

Another important difference I noticed is how bit fields are coalesced when the type changes, e.g.:

struct A
{
    int a : 1;
    short b : 1;
    int c : 1;
};

Without going now into how they are coalesced, the thing to remember is that the Profiler handles all these cases, but you need to specify the compiler to obtain the correct result.

Namespaces

Namespaces are fully supported.

namespace N
{

struct A
{
    int a;
};

}

Results in:


  

Moreover, just as in C++ we can use namespaces to encapsulate #include directives.

namespace N
{

#include <Something>

}

This will cause all the types declared in ‘Something’ to be prefixed by the namespace (‘N::’). This can be very handy when we want to include types with the same name into the same header file.

Inheritance

Inheritance is fully supported.

struct A
{
    int a;
};

struct B : public A
{
    int b;
};

XML:


  



  
    
  
  

Output:

a: 00905A4D
b: 00000003

Same with multiple inheritance:


  



  



  
    
    
  
  

Output:

a: 00905A4D
b: 00000003
c: 00000004

VTables

The presence of virtual table pointers in structures which require them is fully supported. Let’s take for instance:

struct A
{
    virtual void v() { }
    int a;
};

XML:


  
  
  
  
  

Output:

__vtable_ptr_0: 00905A4D
a             : 00000003

Let’s see an example with multiple inheritance:

struct A
{
    virtual void va() { }
    int a;
};

struct B
{
    virtual void vb() { }
    int b;
};

struct C : public A, public B
{
    int c;
};

Output:

__vtable_ptr_0: 00905A4D
__vtable_ptr_1: 00000003
a             : 00000004
b             : 0000FFFF
c             : 000000B8

When virtual tables are involved it is very important to specify the compiler, because things can vary a great deal between VC++ and GCC/Clang.

Virtual Inheritance

Virtual inheritance is fully supported. Virtual inheritance is a C++ feature to be used in scenarios which involve multiple inheritance with a common base class.

Let’s take the complex case of:

struct A
{
    int a;
    virtual void va() {}
};

struct B : public virtual A
{
    virtual void vb() {}
};

struct B2
{
    virtual void vb2() {}
};

struct C : public virtual A, public B
{
    int b;
    virtual void vc() {}
};

struct TOP
{
    int top;
    C c;
    virtual void vtop() {}
};

Output (Visual C++):

__vtable_ptr_0  : 00905A4D
top             : 00000003
c.__vtable_ptr_0: 00000004
c.__vtable_ptr_1: 0000FFFF
c.__vtable_ptr_2: 000000B8
c.b             : 00000000
c.a             : 00000040

Output (GCC):

__vtable_ptr_0  : 00905A4D
top             : 00000003
c.__vtable_ptr_0: 00000004
c.b             : 0000FFFF
c.a             : 000000B8

As you can see the layout differs from Visual C++ to GCC. Another thing to notice is that members of virtual base classes are appended at the end. There’s a very good presentation by Igor Skochinsky on C++ decompilation you can watch for more information.

Field alignment

Field alignment is an important factor. Structures which are not subject to packing constraints are aligned up to their biggest native member. It’s more complex than this, because sub-structures influence parent structures but not vice versa. Suffice it to say that there are some internal gotchas, but the Profiler should handle all cases correctly.

Packing

CFFSO_Pack1
CFFSO_Pack2
CFFSO_Pack4
CFFSO_Pack8
CFFSO_Pack16

When a packing constraint is applied, fields are aligned to either the field size or the packing, whichever is less. A packing constraint of 1 is essential if we want to read raw data without any kind of padding between fields. For instance, PE structures in WinNT.h are all pragma packed to 1, so we must specify the same packing when using them.
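
The alignment rule can be turned into a small calculator to see how offsets change with packing. This is a simplified sketch for structures made only of native scalar members; compute_layout is a name invented for the example, and sub-structures, bit-fields and vtables are deliberately left out.

```python
def compute_layout(fields, pack=None):
    """fields: list of (name, size) pairs; returns (offsets, total size)."""
    offset = 0
    max_align = 1
    offsets = {}
    for name, size in fields:
        # a field is aligned to its size or to the packing, whichever is less
        align = size if pack is None else min(size, pack)
        offset = (offset + align - 1) // align * align
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    # the whole structure is padded up to its biggest alignment
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total

# struct { char c; int i; short s; }
fields = [("c", 1), ("i", 4), ("s", 2)]
for pack in (None, 1, 2):
    print(pack, compute_layout(fields, pack))
```

Without packing the structure takes 12 bytes, while with a packing of 1 it shrinks to 7 bytes and no padding is left between the fields.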

Templates

And to finish, a little treat: C++ templates. Let’s take for instance:

template <typename T>
struct A
{
    T a;
};

template <typename T>
struct B
{
    T b;
};

XML:


  



  

We can specify template parameters following the C++ syntax:

s = obj.MakeStruct(hdr, "B<A<int>>", 0)

Output:

b.a: 00905A4D

So, even nested templates are supported. 😉