Monday, June 30, 2014

Destroying ROP gadgets with Inline code

Prerequisite Reading:
Previous ROP (Return Oriented Programming) article
Traditionally in computer science, software developers using higher level languages and abstractions should not need to think about how the lower levels of the system works. For example, when writing a network application, one should ideally not need to worry about how the sequence numbers of the TCP protocol works. Two possible exceptions to this rule could be for security and performance. For security specifically, learning about instruction sequences emitted by compilers might help to avoid writing higher level (C/C++) code that could be used in ROP exploits.
Normal non-inline functions have a binary code layout where multiple callers execute x86 "call" instructions to redirect execution to the address of the single instance of the non-inline function code in memory. However, an inline function in C/C++ is function whose emitted code is inserted by the compiler directly into the possibly multiple call sites of that function throughout the program. An example follows:
#include <Windows.h>

LPVOID notInlined()
       return VirtualAlloc(NULL, 4096, MEM_COMMIT, PAGE_EXECUTE_READWRITE);

__forceinline LPVOID inlined()
       return VirtualAlloc(NULL, 4096, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
void main()
       notInlined();//a call instruction will be placed here
       inlined();//the function’s code itself will be placed here
       //some additional code here
VirtualAlloc is a function that can be abused by ROP exploits to allocate Readable, Writeable and Executable memory. As shown in the C code above, the functions notInlined and inlined both call VirtualAlloc. Except for the __forceinline keyword in inlined, both notInlined and inlined are exactly identical in the C code. However, the binary code layout of each function looks very different.

notInlined disassembly:
push    ebp
mov     ebp, esp
push    40h
push    1000h
push    1000h
push    0
call    dword ptr[inlined!_imp__VirtualAlloc(0121b000)]
pop     ebp
main function disassembly:   
push    ebp
mov    ebp, esp
call    inlined!ILT + 0(_notInlined)(011f1005)
push    40h  //this
push    1000h  //is
push    1000h  //code  
push    0  //of
call    dword ptr[inlined!_imp__VirtualAlloc(0121b000)]  //inlined
//some additional code here
xor     eax, eax
pop     ebp
In the above disassembly, the code for notInlined is in its own function as we would expect, and can be executed and returned from, by an x86 “call” instruction from anywhere in the program. However, the disassembly for the inlined function (in red text) is placed inline in the main function (the call site).
The significance of the differing in-binary layouts of the two functions is that notInlined contains a very useful ROP gadget that ROP exploits can use, whereas the code for inlined does not contain the same ROP gadget. This difference is due to the fact that there is no x86 “ret” instruction in the code of inlined. If a ROP chain tried to execute inlined, if would be much more difficult to return from inlined back to the ROP chain.
In summary, the inline keyword can be used as an architecture, compiler, and OS portable way to destroy ROP gadgets in code where often abused APIs are called. The cost of inlining code however, is that it increases the code size in the binary. The reason for a larger code size in the example above is that if inlined was called from a large number of places in the program, the full code of inlined would be inserted in the binary that many times. As with all exploit mitigation schemes, there still might be ways to bypass this technique such as using jmp instructions rather than ret instructions to chain together gadgets.


Sunday, April 27, 2014


Prerequisite Reading:
Previous “Attacking V-Table Pointers” article
The web browser is a war zone. We continue to see the latest and most cutting edge research, mitigation technologies, and exploitation techniques in popular web browsers such as Internet Explorer. One advanced mitigation technology in particular is VTGuard, a run-time security check introduced in Internet Explorer 10. VTGuard verifies VTable pointers before calling into them in an effort to mitigate Use-After-Free Exploitation.
VTGuard relies on a secret cookie which should not be known by the attacker who is redirecting the VTable call. This secret cookie varies with the ASLR load address of the DLL in which the object’s VTable is implemented. Although the actual check occurs dynamically at runtime, the checking code is emitted by the compiler at compile-time, implying that the original source code needs to be modified to take advantage of this mitigation.
Below is the disassembly of a VTGuard cookie check before a virtual function call:
mov     eax,dword ptr [ebx] //ebx is a pointer to our CElement object. Now eax has a pointer to the VTable.
cmp dword ptr [eax+308h],offset MSHTML!__vtguard (728d76ee) //we check a cookie at the end of the VTable before we trust it to be a true VTable
jne     MSHTML!CElement::fireEvent+0x189 (7284c30f) //if its not, bail
mov     ecx,ebx //else, store this * into ECX as per C++ thiscall convention
call    dword ptr [eax+150h] //call into the VTable pointer
Assuming object d is an instance of a class that inherits from class B1, below is a depiction of how object d would be laid out in memory, with VTGuard in place.
VTable with VTGuard in place

Due to the difficulty of finding and removing all Use-After-Free vulnerabilities, VTGuard has its place as a strategic mitigation for increasing the difficulty of exploiting such vulnerabilities. Even if the attacker is able to reallocate the heap hole in a Use-After-Free vulnerability and craft a fake VTable (see prerequisite reading), VTGuard would still need to be bypassed.

Sunday, February 2, 2014

Page Heap

A large class of vulnerabilities in software is memory corruptions due to buffer overflows and underflows. An example of this type of vulnerability is an attempt to write more data into a buffer than the size of the buffer itself (buffer overflow). Page Heap is a class of Heap Allocation Policies that can be used when diagnosing or fuzzing for new buffer mismanagement vulnerabilities.

The difficulty of dealing with buffer mismanagement vulnerabilities is that the memory corruption is often not observable at the time that it happens. In the best case, memory gets corrupted badly enough at the time of the corruption, such that the program crashes immediately. In the worst case, although the memory corruption occurred, memory doesn’t get corrupted badly enough at the time of corruption (a “silent” corruption) and therefore there is no immediate crash. Rather the effect of the corruption is observable through some indirect misbehavior of the program later on in execution. An example of this worst case scenario is that an unexpected variable might have its value changed, thereby leading to a different and unexpected code execution path later on. Page Heap is useful for forcing a “fail-fast” policy for memory safety. Page Heap aims to trigger the crash immediately when the memory corruption occurs.

How it Works

In the x86 architecture, Virtual Memory gives each process the appearance that it has 4 GB of address space, whether the memory pages are actually committed or not. When an address on a page that is reserved but not committed is dereferenced, a page fault is generated by the hardware. Page Heap exploits this fact by reserving a guard page at the beginning or end of a page to make sure that any adjacent reads or writes to a buffer cross a page boundary onto a page that is not committed, leading to an immediate page fault.

Guard Pages are Reserved rather than Committed. Using the !address command returns MEM_RESERVE for the State, and PAGE_NOACCESS for the Protection, thereby disallowing reading, writing or executing on the page without taking up any physical memory or space in the page file. Due to the fact that guard pages are inaccessible but are taking up Virtual Address space, Virtual Address space is wasted. For this reason, page heap is useful for debugging, but could lead to problems in a production environment.

For the 3 main types of Heap Policies below, the following code (compiled to PageHeap.exe) will be used as an example, and the “vulnerable line of code here” comment will be replaced by one of the multiple lines of code in each section to trigger the crash.

#include "stdafx.h"

class Buffer
              //initialize our buffers
              memcpy(buffer1, "AAAAAAA", 8);
              memcpy(buffer2, "BBBBBBB", 8);

       char buffer1[8];
       char buffer2[8];

void main(int argc, char* argv[])
       Buffer * myBuffer = new Buffer(); 
       //vulnerable line of code here

       delete myBuffer;
       while (true){}

Full Page Heap

Command: “gflags /p /enable PageHeap.exe /full”


This Heap Allocation Policy allocates buffers at the END of the memory page. A buffer OVERflow would attempt to access memory PAST the END of the page into the guard page, triggering an immediate page fault.

Vulnerable line of code:

memcpy(myBuffer->buffer2 - 9, myBuffer->buffer1, 8); //Buffer Underflow (write)
memcpy(myBuffer->buffer2 + 1, myBuffer->buffer1, 8); //Buffer Overflow (write)
memcpy(myBuffer->buffer2, myBuffer->buffer1 + 9, 8); //Buffer Overflow (read)

Full Page Heap-allocation at end of page with guard page following

Reverse Page Heap

Command: “gflags /p /enable PageHeap.exe /full /backwards”


This Heap Allocation Policy allocates buffers at the BEGINNING of the memory page. A buffer UNDERflow would attempt to access memory BEFORE the BEGINNING of the page into the guard page, triggering an immediate page fault. Due to the way Full Page Heap handles Buffer Underflow write, one might expect Reverse Page Heap to catch Buffer Overflow writes, but it does not.

Vulnerable line of code:

memcpy(myBuffer->buffer2, myBuffer->buffer1 - 1, 8); //Buffer Underflow (read)
memcpy(myBuffer->buffer2 - 9, myBuffer->buffer1, 8); //Buffer Underflow (write)

Reverse Page Heap-allocation at beginning of page with guard page preceding

Standard Page Heap

Command: “gflags /p /enable PageHeap.exe”


This Heap Allocation Policy tries to save Virtual Address space by avoiding allocating extra guard pages for each allocation. Rather, it adds a special pattern before and after each allocation. This allows it to detect Buffer Overflow writes and Buffer Underflow writes, but not Over/Under flow reads of any kind. One noteworthy point is that the integrity of the surrounding patterns are not checked until the buffer is freed. The significance of this fact is that it does not provide a perfect “fail-fast” scenario because a corruption only causes failure on free, not at the time of corruption.

Vulnerable line of code:

memcpy(myBuffer->buffer2 - 9, myBuffer->buffer1, 8); //Buffer Underflow (write)
memcpy(myBuffer->buffer2 + 1, myBuffer->buffer1, 8); //Buffer Overflow (write)

Standard Page Heap-allocation with pattern preceding and following


Saturday, November 16, 2013

Visual Heap Spray

Prerequisite Reading:
Previous “Low Fragmentation Heap ReAllocation for Use After Free Exploitation” article
Previous “Attacking V-Table Pointers” article
Heap Sprays are a common method attackers use to introduce determinism in a program’s address space. They aim to control a program’s memory layout in such a way that an attacker can reliably predict what will be in memory at a certain address (Address of Interest) at a certain point in execution.

For example, if there is a Use-After-Free bug, on an object with a V-Table, the object can be reallocated and the offset of the V-Table pointer in the object can point to an address that the attacker knows will contain the spray (Address of Interest). This knowledge often comes from trial and error when writing the exploit.

The Address of Interest makes a big difference in the quality of the exploit. For example, a very popular Address of Interest is 0x0c0c0c0c. The reasoning behind this Address of Interest is that the address must be low in the process’s address space (the highest Nibble of this address is 0x0), yet must be at a higher address in memory than the heap being sprayed (the second highest Nibble is 0xc) so that when the heap grows due to the memory pressure of the spray, it will grow into this address. Using high addresses such as 0xc0c0c0c (the highest Nibble is 0xc) would require that the application freezes for a longer period of time before the heap spray is complete. A victim that is being targeted might get bored and close the process (web browser in this case) due to the fact that it has appeared to freeze during the long time taken to spray, thereby precluding any possibility of successful exploitation.
Below are some memory usage visualizations taken with the vmmap tool from SysInternals before and after the heap spray. The orange color represents the Backend Heap in the process’s address space. Two things to notice are the large growth in the orange segment of the graphs below and the difference in the “Committed” usage before and after the spray (it grows from about 136 MB to about 698 MB).
Before Spray:
Memory Usage before Heap Spray
 After Spray:
Memory Usage after Heap Spray
Below are graphical representations of the memory layout before and after the spray. The “After Spray” visualization has the approximate address of 0x0c0c0c0c marked for the reader’s convenience. One might make the argument that since 0x0c0c0c0c is relatively early in the heap spray, the heap spray could have been reduced to minimize the time the victim has to wait for the spray to finish.

Before Spray:
Memory Layout before Heap Spray

After Spray:
Memory Layout after Heap Spray

How to Heap Spray in Internet Explorer:

In IE, heap sprays are often done by allocating and assigning large strings from JavaScript. Sprays are often done on the Backend Heap (rather than the Low Fragmentation Heap). In order to get strings allocated on the Backend Heap, the strings must be larger than 16KB. Example JavaScript follows:

for (var z = primeAmount; z < numObjects; z++)
    objectArray[z].title = pattern;

The Heap Spraying technique does not come without some drawbacks, leading to some researchers referring to heap sprays as “For the 99%”. In some cases, exploitation can be made more reliable by finding multiple "good" bugs rather than heap spraying:
  • It might take a long time to spray (user might get impatient and terminate the program).
  • Depending on preexisting memory layout due to external factors (loaded plugins, other webpages visited prior to this one, etc), spraying can be unreliable.
  • Too much spraying might cause the Operating System to swap memory pages out to disk (depending on how much physical memory the victim’s machine has) and JavaScript Exceptions.
  • New IE mitigations might prevent highjacking virtual function calls.
  • There is no guarantee that the Address of Interest will contain the spray-an executable image or something else might be mapped at the Address of Interest, depending on the address space and system configuration unique to the victim. 
The community has done some great work to reduce the barrier of entry into this space. Multiple open source libraries have been written by researchers to abstract away the details of heap mechanics. In the example presented in this article, the heap reallocation/spray was done manually, but libraries such as HeapLib by Alex Sotirov and HeapLib2 by Chris Valasek allow users to just call into them in order to perform reallocation/sprays. Code review of HeapLib2 shows that this article, the prerequisite readings and HeapLib2 all use the same technique to reallocate and spray the heap.

Tuesday, October 8, 2013

Low Fragmentation Heap ReAllocation for Use After Free Exploitation

Use After Free (UAF) vulnerabilities occur when the application frees an object on the heap, and then tries to erroneously use it again (usually due to a stale pointer reference to the freed object). A common exploitation technique to target a UAF vulnerability is to try to reallocate heap memory between the time when the application frees the memory and when it erroneously dereferences the stale pointer to the freed memory. This reallocation would fill the “hole” left by the application when it initially freed the object. The typical timeline of this type of attack is as follows:

1) Application frees object on the heap
2) Attacker reallocates objects on the heap
3) Application erroneously dereferences freed pointer

Note that in normal circumstances, A UAF would crash the program due to an Access Violation. However, when the application is being exploited, the attacker’s reallocation serves 2 main purposes: 1) Make sure that there is data at the location pointed to by the stale pointer so the application doesn’t crash on erroneous dereference and 2) Craft the data in the reallocation in such a way that the dereference would help the attacker gain control over the Instruction Pointer (EIP on x86).
In Internet Explorer, exploits often use JavaScript to reallocate freed objects on a heap with the Windows Low Fragmentation Heap (LFH) policy activated and groom the backend heap in order to eventually gain control over the Instruction Pointer. The connection between JavaScript and freed objects is as follows:

"When string attributes of HTML objects are assigned in JavaScript, the strings are often stored in the same Front End LFH that the freed objects were stored in."

The significance of this statement is that attackers can craft maliciously formatted strings in JavaScript and have the strings fill holes left by the freed objects. In order to reallocate the holes left by the freed objects, the attacker must make sure the strings are of the same size as the freed objects, so that the stings will get allocated in the same LFH bucket as the freed object. Once the object of interest has been freed, and the attacker can assign those strings as attributes to an array of HTML nodes, and those strings are likely to be allocated on the same LFH as the freed object. Eventually, if this process is repeated enough, a reallocation of the “hole” left by the freed object would occur. This means that the next time the application dereferences the stale pointer (ie for a virtual function call), it would get the attacker’s string rather than the object it expects to be there. This would be how the attacker initially takes control of the Instruction Pointer. Below is an example of what this process may look like in JavaScript:

//create an array of HTML elements whose string attributes we will assign later
for (var i = 0 ; i < numObjects; i++)
    objectArray[i] = document.createElement('div');

"prime" LFH by allocating a few strings of the same size as the object of interest to enable the LFH bucket for this allocation size
for (var x = 0; x < primeAmount; x++)
    objectArray[x].title = pattern;

//application allocates object here

//application frees object here

//fill the hole left by the freed object assuming primeAmount < numObjects
for (var z = primeAmount; z < numObjects; z++)
    objectArray[z].title = pattern; //attributes should be allocated on LFH

//application erroneously uses object here

The JavaScript “pattern” string above was carefully crafted to meet the following criteria:

1) It must be the same size as the erroneously freed object, so that it will be reallocated in the same LFH Bucket.
2) Its value must be specifically crafted to help gain code execution. For example, if the freed object was a C++ object with a V-Table pointer, one of the first few DWORDs must point to a location in memory which the attacker controls (usually via a Heap Spray of the backend heap) that contains a malicious V-Table. For more information, see prior article about V-Tables.

For more information about the Low Fragmentation Heap, see:

Sunday, June 30, 2013

Heap Debugging Tricks

On Windows, the LFH (Low Fragmentation Heap) is commonly used to dynamically allocate small chunks of memory (<16KB in size). Many programs use the LFH as it is intended for high performance allocation/free of small objects, even in a multithreaded environment. Debugging memory corruptions on the heap can often be complex, but the following Windbg tricks may help:
  • !heap -p -a MEMORY_ADDRESS
This command is extremely useful when tracking down Use-After-Free vulnerabilities in applications. When run on a memory address, there are 2 relevant scenarios:

1) The memory address is inside a current allocation- Windbg displays the call stack that led to the allocation.

2) The memory address is inside an allocation that was freed- Windbg displays the call stack that led to the free.

Using this command requires the page heap to be enabled. Page heap can be enabled per application using the gflags.exe utility that is distributed in the Windbg package.

  • Caller Based Conditional Breakpoint

This Caller Based Conditional Breakpoint is very useful when one is interested in the place where the object gets allocated/freed, especially when the allocation/free codepath is very “hot” (meaning it is executed very often). The idea behind this breakpoint is to only break when there is a certain function (MemberFunction in the example below) on the call stack (ie only break if a certain function called this function directly or indirectly). If a breakpoint is placed on the constructor/destructor of the object’s class, and the object is allocated/freed very often, this could result in the breakpoint getting hit many more times that a human can reasonably look at. For this reason, placing a Caller Based Conditional Breakpoint in the constructor/destructor usually greatly reduces the number of times the breakpoint is hit, allowing a human to reasonably be able to investigate each time the breakpoint is hit. The Caller Based Conditional Breakpoint is of the following format:
bp Module!MyFunctionWithConditionalBreakpoint "r $t0 = 0;.foreach (v { k }) { .if ($spat(\"v\", \"*Module!ClassA:MemberFunction*\")) { r $t0 = 1;.break } }; .if($t0 = 0) { gc }"

  • Register Watching Breakpoint

The Register Watching Breakpoint is a slight variation on the Caller Based Conditional Breakpoint. In the Register Watching Breakpoint, instead of actually breaking when the appropriate caller is on the call stack, the command in Caller Based Conditional Breakpoint is modified to just print the CPU registers (using the ‘r’ command). Very often, in destructors, the address of the object that is being destroyed is in one of the CPU registers. The end result would be that each time the Register Watching Breakpoint is hit, the CPU registers will be printed rather than breaking execution. Since computers are very deterministic machines, if there is no entropy introduced in the program’s execution, one can count the number of times the breakpoint was hit before the object of interest was freed, and predict it the next time the program is run, and eventually get a live debugging session which is watching the object as it is being freed. The Register Watching Breakpoint is of the following format:

bp Module!MyFunctionWithConditionalBreakpoint "r $t0 = 0;.foreach (v { k }) { .if ($spat(\"v\", \"* Module!ClassA:MemberFunction *\")) { r; r $t0 = 1; gc; } }; .if($t0 = 0) { gc }"


Monday, April 29, 2013

ROP (Return Oriented Programming)

Prerequisite Reading: previous “Stack Pivoting” article
Cyber security seems to be an arms race between attackers and defenders (in addition to the arms race between nations). Every time defenders devise a new mechanism to defend computers and mitigate exploits, attackers seem to find a way around it. Such was the case with DEP (Data Execution Prevention). Defenders used this mechanism to prevent execution from regions of memory that were supposed to contain data only rather than code. This was supposed to prevent attackers from executing shellcode from memory structures such as the program stack or the heap. To bypass DEP, the ROP exploitation technique was devised. It is similar to the idea of Ret2LibC [1].
ROP works by taking advantage of the fact that the attacker can manipulate program execution control data. In this technique, the attacker injects a fake call stack and executes a “stack pivot” (see prerequisite reading) to pivot to it. The call stack can be thought of as recording the causality chain [2] that specifies how execution got to its current position (ie which functions called which functions in order to get to the current function). When returning from the current function, the normal call stack serves the purpose of controlling where the execution will return to. For example with the normal stack, the immediate return address is supposed to be in the function that directly called the currently executing function. Rather than pointing in the functions that are part of the current causality chain, each return address in the fake call stack points to what is known as a “ROP Gadget”.
A ROP Gadget is any “useful instruction(s)” to an attacker followed by a return instruction. The instruction(s) that are considered “useful” depend on the vulnerability the attacker is trying to exploit. The return instruction gets the next value off of the attacker controlled call stack, which in turn points to the next ROP Gadget to be executed. One crucial property of a ROP gadget is that it must be at a predictable address in memory every time the vulnerable program is executed, so the fake call stack can accurately point to the intended ROP gadgets. A stack pivot gadget is a subset of the more general ROP Gadget in that the useful instruction(s) of a stack pivot gadget switches the value of the ESP register from the real stack to the fake stack. Examples of the stack pivot instructions are given in the prerequisite reading. Here are some examples of general ROP gadgets:
  • push EAX
  • pop ECX
  • sub EAX, 4
  • pop EBX
    xor EAX, EAX
  • add ECX, 8
ROP exploits work because the attacker tricks the machine into using attacker controlled program control data which is injected into a page that’s not necessarily executable. This technically is allowed even with DEP enabled because the fake call stack bytes injected by the attacker are technically not being executed. Rather, they are just controlling where execution of the CPU will go next. In some sense, the attacker is actually turning the process’s address space against itself by using instructions already present in executable sections, but just executing them in different orders to achieve the attacker’s intentions. Often the end goal of ROP exploits is to make executable a currently non-executable region in memory, so that shellcode that has been injected into that region can be executed. This requires a return address on the fake call stack to contain a pointer to a function (such as VirtualProtect) along with the required parameters.
The execution of a ROP exploit looks similar to the following once the fake call stack is injected in memory:
  1. Execute stack pivot gadget to pass control to the fake call stack
  2. Execute a ROP Gadget at the top of the fake callstack
    1. Execute “useful instruction(s)"
    2. Execute a return instruction
      1. If the next return address is another ROP gadget, goes back to step 2.1
      2. Else if the next return address is a function, executes that function using parameters that are on the fake stack
Above, step 2.2.1 is implicitly “goto step 2.1” if the return address points to another ROP gadget. This forms a repetitive chain, also known as a “ROP Chain”. A function can be executed if the fake call stack contains the function address and the required parameters (step 2.2.2).
An example of a ROP exploit follows. A local variable buffer on the stack has already been overflowed and the return address of the current stack frame has been overwritten with the address of the below stack pivot gadget. That function has already returned to the stack pivot gadget below and the stack pivot instruction below has already been executed.
Stack Pivot Gadget:
5c   pop ESP    //actual stack pivot instruction (already executed)
c3   ret               //EIP points here. This is the next instruction to be executed

Fake Call Stack:

Fake Call stack right before "ret" instruction of the stack pivot gadget is executed
Above, the ROP exploit is ready to be executed. The ROP exploit leverages a stack based buffer overflow vulnerability (with DEP enabled on the target process) to pop a message box saying “You got pwn3d”, which represents arbitrary code execution. As presented above, the fake call stack has the addresses of various functions to execute, along with arguments to those functions. These steps correspond to 2.2.2 of the ROP execution steps outlined above.

Defenders created DEP to stop shellcode execution from data-only regions of memory. Attackers created ROP to bypass DEP. Then, Defenders created ASLR to stop ROP exploits. And so the cyber security arms race goes on and on…