Sunday, February 2, 2014

Page Heap

A large class of vulnerabilities in software is memory corruptions due to buffer overflows and underflows. An example of this type of vulnerability is an attempt to write more data into a buffer than the size of the buffer itself (buffer overflow). Page Heap is a class of Heap Allocation Policies that can be used when diagnosing or fuzzing for new buffer mismanagement vulnerabilities.

The difficulty of dealing with buffer mismanagement vulnerabilities is that the memory corruption is often not observable at the time that it happens. In the best case, memory gets corrupted badly enough at the time of the corruption, such that the program crashes immediately. In the worst case, although the memory corruption occurred, memory doesn’t get corrupted badly enough at the time of corruption (a “silent” corruption) and therefore there is no immediate crash. Rather the effect of the corruption is observable through some indirect misbehavior of the program later on in execution. An example of this worst case scenario is that an unexpected variable might have its value changed, thereby leading to a different and unexpected code execution path later on. Page Heap is useful for forcing a “fail-fast” policy for memory safety. Page Heap aims to trigger the crash immediately when the memory corruption occurs.

How it Works

In the x86 architecture, Virtual Memory gives each process the appearance that it has 4 GB of address space, whether the memory pages are actually committed or not. When an address on a page that is reserved but not committed is dereferenced, a page fault is generated by the hardware. Page Heap exploits this fact by reserving a guard page at the beginning or end of a page to make sure that any adjacent reads or writes to a buffer cross a page boundary onto a page that is not committed, leading to an immediate page fault.

Guard Pages are Reserved rather than Committed. Using the !address command returns MEM_RESERVE for the State, and PAGE_NOACCESS for the Protection, thereby disallowing reading, writing or executing on the page without taking up any physical memory or space in the page file. Due to the fact that guard pages are inaccessible but are taking up Virtual Address space, Virtual Address space is wasted. For this reason, page heap is useful for debugging, but could lead to problems in a production environment.

For the 3 main types of Heap Policies below, the following code (compiled to PageHeap.exe) will be used as an example, and the “vulnerable line of code here” comment will be replaced by one of the multiple lines of code in each section to trigger the crash.


#include "stdafx.h"

class Buffer
{
public:
       Buffer()
       {
              //initialize our buffers
              memcpy(buffer1, "AAAAAAA", 8);
              memcpy(buffer2, "BBBBBBB", 8);
       }

       char buffer1[8];
       char buffer2[8];
};

void main(int argc, char* argv[])
{
       Buffer * myBuffer = new Buffer(); 
       //vulnerable line of code here

       delete myBuffer;
       while (true){}
}


Full Page Heap

Command: “gflags /p /enable PageHeap.exe /full”

Description:

This Heap Allocation Policy allocates buffers at the END of the memory page. A buffer OVERflow would attempt to access memory PAST the END of the page into the guard page, triggering an immediate page fault.

Vulnerable line of code:


memcpy(myBuffer->buffer2 - 9, myBuffer->buffer1, 8); //Buffer Underflow (write)
memcpy(myBuffer->buffer2 + 1, myBuffer->buffer1, 8); //Buffer Overflow (write)
memcpy(myBuffer->buffer2, myBuffer->buffer1 + 9, 8); //Buffer Overflow (read)


Allocation:
Full Page Heap-allocation at end of page with guard page following

Reverse Page Heap

Command: “gflags /p /enable PageHeap.exe /full /backwards”

Description:

This Heap Allocation Policy allocates buffers at the BEGINNING of the memory page. A buffer UNDERflow would attempt to access memory BEFORE the BEGINNING of the page into the guard page, triggering an immediate page fault. Due to the way Full Page Heap handles Buffer Underflow write, one might expect Reverse Page Heap to catch Buffer Overflow writes, but it does not.

Vulnerable line of code:


memcpy(myBuffer->buffer2, myBuffer->buffer1 - 1, 8); //Buffer Underflow (read)
memcpy(myBuffer->buffer2 - 9, myBuffer->buffer1, 8); //Buffer Underflow (write)


Allocation:
Reverse Page Heap-allocation at beginning of page with guard page preceding

Standard Page Heap

Command: “gflags /p /enable PageHeap.exe”

Description:

This Heap Allocation Policy tries to save Virtual Address space by avoiding allocating extra guard pages for each allocation. Rather, it adds a special pattern before and after each allocation. This allows it to detect Buffer Overflow writes and Buffer Underflow writes, but not Over/Under flow reads of any kind. One noteworthy point is that the integrity of the surrounding patterns are not checked until the buffer is freed. The significance of this fact is that it does not provide a perfect “fail-fast” scenario because a corruption only causes failure on free, not at the time of corruption.

Vulnerable line of code:


memcpy(myBuffer->buffer2 - 9, myBuffer->buffer1, 8); //Buffer Underflow (write)
memcpy(myBuffer->buffer2 + 1, myBuffer->buffer1, 8); //Buffer Overflow (write)


Allocation:
Standard Page Heap-allocation with pattern preceding and following


References:

http://msdn.microsoft.com/en-us/library/ms220938(v=vs.90).aspx

Saturday, November 16, 2013

Visual Heap Spray

Prerequisite Reading:
Previous “Low Fragmentation Heap ReAllocation for Use After Free Exploitation” article
Previous “Attacking V-Table Pointers” article
Heap Sprays are a common method attackers use to introduce determinism in a program’s address space. They aim to control a program’s memory layout in such a way that an attacker can reliably predict what will be in memory at a certain address (Address of Interest) at a certain point in execution.

For example, if there is a Use-After-Free bug, on an object with a V-Table, the object can be reallocated and the offset of the V-Table pointer in the object can point to an address that the attacker knows will contain the spray (Address of Interest). This knowledge often comes from trial and error when writing the exploit.

The Address of Interest makes a big difference in the quality of the exploit. For example, a very popular Address of Interest is 0x0c0c0c0c. The reasoning behind this Address of Interest is that the address must be low in the process’s address space (the highest Nibble of this address is 0x0), yet must be at a higher address in memory than the heap being sprayed (the second highest Nibble is 0xc) so that when the heap grows due to the memory pressure of the spray, it will grow into this address. Using high addresses such as 0xc0c0c0c (the highest Nibble is 0xc) would require that the application freezes for a longer period of time before the heap spray is complete. A victim that is being targeted might get bored and close the process (web browser in this case) due to the fact that it has appeared to freeze during the long time taken to spray, thereby precluding any possibility of successful exploitation.
Visualizations:
Below are some memory usage visualizations taken with the vmmap tool from SysInternals before and after the heap spray. The orange color represents the Backend Heap in the process’s address space. Two things to notice are the large growth in the orange segment of the graphs below and the difference in the “Committed” usage before and after the spray (it grows from about 136 MB to about 698 MB).
Before Spray:
 
Memory Usage before Heap Spray
 After Spray:
Memory Usage after Heap Spray
Below are graphical representations of the memory layout before and after the spray. The “After Spray” visualization has the approximate address of 0x0c0c0c0c marked for the reader’s convenience. One might make the argument that since 0x0c0c0c0c is relatively early in the heap spray, the heap spray could have been reduced to minimize the time the victim has to wait for the spray to finish.

Before Spray:
 
Memory Layout before Heap Spray

After Spray:
Memory Layout after Heap Spray

How to Heap Spray in Internet Explorer:

In IE, heap sprays are often done by allocating and assigning large strings from JavaScript. Sprays are often done on the Backend Heap (rather than the Low Fragmentation Heap). In order to get strings allocated on the Backend Heap, the strings must be larger than 16KB. Example JavaScript follows:

for (var z = primeAmount; z < numObjects; z++)
    objectArray[z].title = pattern;

The Heap Spraying technique does not come without some drawbacks, leading to some researchers referring to heap sprays as “For the 99%”. In some cases, exploitation can be made more reliable by finding multiple "good" bugs rather than heap spraying:
  • It might take a long time to spray (user might get impatient and terminate the program).
  • Depending on preexisting memory layout due to external factors (loaded plugins, other webpages visited prior to this one, etc), spraying can be unreliable.
  • Too much spraying might cause the Operating System to swap memory pages out to disk (depending on how much physical memory the victim’s machine has) and JavaScript Exceptions.
  • New IE mitigations might prevent highjacking virtual function calls.
  • There is no guarantee that the Address of Interest will contain the spray-an executable image or something else might be mapped at the Address of Interest, depending on the address space and system configuration unique to the victim. 
Libraries:
The community has done some great work to reduce the barrier of entry into this space. Multiple open source libraries have been written by researchers to abstract away the details of heap mechanics. In the example presented in this article, the heap reallocation/spray was done manually, but libraries such as HeapLib by Alex Sotirov and HeapLib2 by Chris Valasek allow users to just call into them in order to perform reallocation/sprays. Code review of HeapLib2 shows that this article, the prerequisite readings and HeapLib2 all use the same technique to reallocate and spray the heap.

Tuesday, October 8, 2013

Low Fragmentation Heap ReAllocation for Use After Free Exploitation

Use After Free (UAF) vulnerabilities occur when the application frees an object on the heap, and then tries to erroneously use it again (usually due to a stale pointer reference to the freed object). A common exploitation technique to target a UAF vulnerability is to try to reallocate heap memory between the time when the application frees the memory and when it erroneously dereferences the stale pointer to the freed memory. This reallocation would fill the “hole” left by the application when it initially freed the object. The typical timeline of this type of attack is as follows:

1) Application frees object on the heap
2) Attacker reallocates objects on the heap
3) Application erroneously dereferences freed pointer

Note that in normal circumstances, A UAF would crash the program due to an Access Violation. However, when the application is being exploited, the attacker’s reallocation serves 2 main purposes: 1) Make sure that there is data at the location pointed to by the stale pointer so the application doesn’t crash on erroneous dereference and 2) Craft the data in the reallocation in such a way that the dereference would help the attacker gain control over the Instruction Pointer (EIP on x86).
In Internet Explorer, exploits often use JavaScript to reallocate freed objects on a heap with the Windows Low Fragmentation Heap (LFH) policy activated and groom the backend heap in order to eventually gain control over the Instruction Pointer. The connection between JavaScript and freed objects is as follows:

"When string attributes of HTML objects are assigned in JavaScript, the strings are often stored in the same Front End LFH that the freed objects were stored in."

The significance of this statement is that attackers can craft maliciously formatted strings in JavaScript and have the strings fill holes left by the freed objects. In order to reallocate the holes left by the freed objects, the attacker must make sure the strings are of the same size as the freed objects, so that the stings will get allocated in the same LFH bucket as the freed object. Once the object of interest has been freed, and the attacker can assign those strings as attributes to an array of HTML nodes, and those strings are likely to be allocated on the same LFH as the freed object. Eventually, if this process is repeated enough, a reallocation of the “hole” left by the freed object would occur. This means that the next time the application dereferences the stale pointer (ie for a virtual function call), it would get the attacker’s string rather than the object it expects to be there. This would be how the attacker initially takes control of the Instruction Pointer. Below is an example of what this process may look like in JavaScript:

//create an array of HTML elements whose string attributes we will assign later
for (var i = 0 ; i < numObjects; i++)
    objectArray[i] = document.createElement('div');

/*
"prime" LFH by allocating a few strings of the same size as the object of interest to enable the LFH bucket for this allocation size
*/
for (var x = 0; x < primeAmount; x++)
    objectArray[x].title = pattern;

//application allocates object here

//application frees object here

//fill the hole left by the freed object assuming primeAmount < numObjects
for (var z = primeAmount; z < numObjects; z++)
    objectArray[z].title = pattern; //attributes should be allocated on LFH

//application erroneously uses object here

The JavaScript “pattern” string above was carefully crafted to meet the following criteria:

1) It must be the same size as the erroneously freed object, so that it will be reallocated in the same LFH Bucket.
2) Its value must be specifically crafted to help gain code execution. For example, if the freed object was a C++ object with a V-Table pointer, one of the first few DWORDs must point to a location in memory which the attacker controls (usually via a Heap Spray of the backend heap) that contains a malicious V-Table. For more information, see prior article about V-Tables.


For more information about the Low Fragmentation Heap, see:

Sunday, June 30, 2013

Heap Debugging Tricks

On Windows, the LFH (Low Fragmentation Heap) is commonly used to dynamically allocate small chunks of memory (<16KB in size). Many programs use the LFH as it is intended for high performance allocation/free of small objects, even in a multithreaded environment. Debugging memory corruptions on the heap can often be complex, but the following Windbg tricks may help:
  • !heap -p -a MEMORY_ADDRESS
This command is extremely useful when tracking down Use-After-Free vulnerabilities in applications. When run on a memory address, there are 2 relevant scenarios:

1) The memory address is inside a current allocation- Windbg displays the call stack that led to the allocation.


2) The memory address is inside an allocation that was freed- Windbg displays the call stack that led to the free.



Using this command requires the page heap to be enabled. Page heap can be enabled per application using the gflags.exe utility that is distributed in the Windbg package.


  • Caller Based Conditional Breakpoint

This Caller Based Conditional Breakpoint is very useful when one is interested in the place where the object gets allocated/freed, especially when the allocation/free codepath is very “hot” (meaning it is executed very often). The idea behind this breakpoint is to only break when there is a certain function (MemberFunction in the example below) on the call stack (ie only break if a certain function called this function directly or indirectly). If a breakpoint is placed on the constructor/destructor of the object’s class, and the object is allocated/freed very often, this could result in the breakpoint getting hit many more times that a human can reasonably look at. For this reason, placing a Caller Based Conditional Breakpoint in the constructor/destructor usually greatly reduces the number of times the breakpoint is hit, allowing a human to reasonably be able to investigate each time the breakpoint is hit. The Caller Based Conditional Breakpoint is of the following format:
bp Module!MyFunctionWithConditionalBreakpoint "r $t0 = 0;.foreach (v { k }) { .if ($spat(\"v\", \"*Module!ClassA:MemberFunction*\")) { r $t0 = 1;.break } }; .if($t0 = 0) { gc }"


  • Register Watching Breakpoint

The Register Watching Breakpoint is a slight variation on the Caller Based Conditional Breakpoint. In the Register Watching Breakpoint, instead of actually breaking when the appropriate caller is on the call stack, the command in Caller Based Conditional Breakpoint is modified to just print the CPU registers (using the ‘r’ command). Very often, in destructors, the address of the object that is being destroyed is in one of the CPU registers. The end result would be that each time the Register Watching Breakpoint is hit, the CPU registers will be printed rather than breaking execution. Since computers are very deterministic machines, if there is no entropy introduced in the program’s execution, one can count the number of times the breakpoint was hit before the object of interest was freed, and predict it the next time the program is run, and eventually get a live debugging session which is watching the object as it is being freed. The Register Watching Breakpoint is of the following format:

bp Module!MyFunctionWithConditionalBreakpoint "r $t0 = 0;.foreach (v { k }) { .if ($spat(\"v\", \"* Module!ClassA:MemberFunction *\")) { r; r $t0 = 1; gc; } }; .if($t0 = 0) { gc }"


References:


Monday, April 29, 2013

ROP (Return Oriented Programming)


Prerequisite Reading: previous “Stack Pivoting” article
Cyber security seems to be an arms race between attackers and defenders (in addition to the arms race between nations). Every time defenders devise a new mechanism to defend computers and mitigate exploits, attackers seem to find a way around it. Such was the case with DEP (Data Execution Prevention). Defenders used this mechanism to prevent execution from regions of memory that were supposed to contain data only rather than code. This was supposed to prevent attackers from executing shellcode from memory structures such as the program stack or the heap. To bypass DEP, the ROP exploitation technique was devised. It is similar to the idea of Ret2LibC [1].
ROP works by taking advantage of the fact that the attacker can manipulate program execution control data. In this technique, the attacker injects a fake call stack and executes a “stack pivot” (see prerequisite reading) to pivot to it. The call stack can be thought of as recording the causality chain [2] that specifies how execution got to its current position (ie which functions called which functions in order to get to the current function). When returning from the current function, the normal call stack serves the purpose of controlling where the execution will return to. For example with the normal stack, the immediate return address is supposed to be in the function that directly called the currently executing function. Rather than pointing in the functions that are part of the current causality chain, each return address in the fake call stack points to what is known as a “ROP Gadget”.
A ROP Gadget is any “useful instruction(s)” to an attacker followed by a return instruction. The instruction(s) that are considered “useful” depend on the vulnerability the attacker is trying to exploit. The return instruction gets the next value off of the attacker controlled call stack, which in turn points to the next ROP Gadget to be executed. One crucial property of a ROP gadget is that it must be at a predictable address in memory every time the vulnerable program is executed, so the fake call stack can accurately point to the intended ROP gadgets. A stack pivot gadget is a subset of the more general ROP Gadget in that the useful instruction(s) of a stack pivot gadget switches the value of the ESP register from the real stack to the fake stack. Examples of the stack pivot instructions are given in the prerequisite reading. Here are some examples of general ROP gadgets:
  • push EAX
    ret
  • pop ECX
    ret
  • sub EAX, 4
    ret
  • pop EBX
    xor EAX, EAX
    ret
  • add ECX, 8
    ret
ROP exploits work because the attacker tricks the machine into using attacker controlled program control data which is injected into a page that’s not necessarily executable. This technically is allowed even with DEP enabled because the fake call stack bytes injected by the attacker are technically not being executed. Rather, they are just controlling where execution of the CPU will go next. In some sense, the attacker is actually turning the process’s address space against itself by using instructions already present in executable sections, but just executing them in different orders to achieve the attacker’s intentions. Often the end goal of ROP exploits is to make executable a currently non-executable region in memory, so that shellcode that has been injected into that region can be executed. This requires a return address on the fake call stack to contain a pointer to a function (such as VirtualProtect) along with the required parameters.
The execution of a ROP exploit looks similar to the following once the fake call stack is injected in memory:
  1. Execute stack pivot gadget to pass control to the fake call stack
  2. Execute a ROP Gadget at the top of the fake callstack
    1. Execute “useful instruction(s)"
    2. Execute a return instruction
      1. If the next return address is another ROP gadget, goes back to step 2.1
      2. Else if the next return address is a function, executes that function using parameters that are on the fake stack
Above, step 2.2.1 is implicitly “goto step 2.1” if the return address points to another ROP gadget. This forms a repetitive chain, also known as a “ROP Chain”. A function can be executed if the fake call stack contains the function address and the required parameters (step 2.2.2).
An example of a ROP exploit follows. A local variable buffer on the stack has already been overflowed and the return address of the current stack frame has been overwritten with the address of the below stack pivot gadget. That function has already returned to the stack pivot gadget below and the stack pivot instruction below has already been executed.
Stack Pivot Gadget:
5c   pop ESP    //actual stack pivot instruction (already executed)
c3   ret               //EIP points here. This is the next instruction to be executed

Fake Call Stack:


Fake Call stack right before "ret" instruction of the stack pivot gadget is executed
Above, the ROP exploit is ready to be executed. The ROP exploit leverages a stack based buffer overflow vulnerability (with DEP enabled on the target process) to pop a message box saying “You got pwn3d”, which represents arbitrary code execution. As presented above, the fake call stack has the addresses of various functions to execute, along with arguments to those functions. These steps correspond to 2.2.2 of the ROP execution steps outlined above.

Defenders created DEP to stop shellcode execution from data-only regions of memory. Attackers created ROP to bypass DEP. Then, Defenders created ASLR to stop ROP exploits. And so the cyber security arms race goes on and on…


References:

Monday, February 11, 2013

Attacking V-Table Pointers

A common attack vector for software written in C++ is V-table pointer overwrites. When C++ objects are allocated on the heap, such as when the "new" keyword is used, they often get put next to other objects that are also on the heap. If there is an unbounded write to one of the objects on the heap before an object using V-tables, this type of attack is feasible.

Windows has mitigations in its userland heap manager that can make it difficult to guess which objects will be next to each other on the heap. This means that even if an attacker knows that there is an unbounded write to an object on the heap, the attacker would not know what object is right after it on the heap, making it much more difficult to exploit reliably.


The following example code uses Virtual functions, which imply V-table usage when compiled with the Microsoft Visual C++ compiler:


/*
the following class definitions were modified from
http://en.wikipedia.org/wiki/Virtual_function_table
*/

#include <iostream>
using namespace std;

class B1    //base class
{
public:
  virtual void f0() {}
  virtual void f1() {}
};

class B2    //base class
{
public:
  virtual void f2() {}
  virtual void f3() {}
};

class D : public B1, public B2 {    //derived class inherits both base classes
public:
  void d() {}
  void f0() {}  // override B1::f0()
  void f1() {}  // override B1::f1()
  void f2() {}  // override B2::f2()
  void f3() {}  // override B2::f3()
};

int main(int argc, char* argv[])
{
    B2 *b2 = new B2();
    D  *d  = new D();
    d->f0();    //vtable lookup
    d->f1();    //vtable lookup
    d->f2();    //vtable lookup
    d->f3();    //vtable lookup
}


Below is the relevant assembly code of the above C++ code showing how virtual functions are accessed in objects that make use of virtual functions:


    d->f0();    //V-table lookup
mov         eax,dword ptr [ebp-14h]//eax=address of d object
mov         edx,dword ptr [eax]    //edx=first dword in d object(pointer to B1 V-table)
mov         eax,dword ptr [edx]    //eax=first entry in B1 V-Table(address of f0)
call        eax

    d->f1();    //V-table lookup
mov         eax,dword ptr [ebp-14h]//eax=address of d object
mov         edx,dword ptr [eax]    //edx=first dword in d object(pointer to B1 V-table)
mov         eax,dword ptr [edx+4]  //eax=second entry in B1 V-Table(address of f1)
call        eax

    d->f2();    //V-table lookup
mov         eax,dword ptr [ebp-14h]//eax=address of d object
mov         edx,dword ptr [eax+4]  //edx=second dword in d object(pointer to B2 V-table)
mov         eax,dword ptr [edx]    //eax=first entry in B2 V-table(address of f2)
call        eax

    d->f3();    //V-table lookup
mov         eax,dword ptr [ebp-14h]//eax=address of d object
mov         edx,dword ptr [eax+4]  //edx=second dword in d object(pointer to B2 V-table)
mov         eax,dword ptr [edx+4]  //eax=second entry in B2 V-table(address of f3)
call        eax


The common pattern in all of these virtual function lookups is as follows:

  1. Dereference the object pointer which contains the V-table.
  2. Dereference the relevant V-Table pointer within the object from step 1.
  3. Dereference the relevant function pointer inside the V-table from step 2.
  4. Call the function found in step 3.


In Windbg, we can verify that d was indeed allocated on the heap because our local variables are:

0:000> dv
           argc = 0n1
           argv = 0x00574660
              d = 0x00574720
             b2 = 0x005746e0


More info about where our d object is allocated:
0:000> !heap -p -a 0x00574720

    address 00574720 found in
    _HEAP @ 570000

      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        005746f8 0009 0000  [00]   00574700    0002c - (busy)

Below is a graphical depiction of the relationship between our heap objects and the V-tables they reference. 



Normal V-table layout


If we wanted to exploit a V-table pointer overwrite and highjack a call to d->f1(), we have to make sure our "fake V-table" and attacker code is in place before executing the call. For this example let's assume the "fake V-table" is at 0xDEADBEEF and our attacker code is at 0x41414141. This can be achieved by memory spraying, which would ensure that the following predicates would be true:


  1. Address 0xDEADBEEF has already been allocated and is readable.
  2. The DWORD at 4 bytes past 0xDEADBEEF(doing the math, that would just be 0xDEADBEF3) is the address of attacker code that we want to execute.
  3. Our attacker code exists at 0x41414141.

We would need to overwrite the pointer(stored in the heap-allocated object) to the B1 class V-table with the value 0xDEADBEEF. In the following example, we overwrote the pointer(stored at 0x00574720) to the B1 V-table with the value 0xDEADBEEF.


Now, if a call to d->f1() happened, the follow sequence of events would occur:


    d->f1();    //V-table lookup
mov         eax,dword ptr [ebp-14h]//eax=address of d object
mov         edx,dword ptr [eax]    //edx=first dword in d object(our 0xDEADBEEF value)
mov         eax,dword ptr [edx+4]  //eax=0x41414141
call        eax                    //call into our attacker code instead of d->f1()

Overwritten V-table pointers with sprayed fake V-table and attacker code