When a programmer builds native source code (for example C, as opposed to an interpreted language), many complex steps happen behind the scenes to generate the end product: an executable binary. The process can be broken into two high-level steps: compiling and linking. Compiling parses the source code and generates binary "object" files. Linking combines these object files into the final product: an executable.
The final executable contains the machine code that the CPU can execute once the executable is loaded into memory. In fact, the executable file is so close to the final image loaded into memory that Windows memory-maps the file from disk and performs a few small fixups (such as relocations) to prepare it for execution.
Windows uses the PE (Portable Executable) format, which is mostly what I will talk about in this post. In writing this blog post, I referenced Matt Pietrek's lengthy and informative article "Peering Inside the PE: A Tour of the Win32 Portable Executable File Format", which can be found here (http://msdn.microsoft.com/en-us/library/ms809762.aspx). If you are interested in seeing exactly how the data structures are defined and laid out, check out WinNt.h, which is publicly distributed by Microsoft.
At the beginning of a PE file is a header that describes what kind of executable this is and where the various parts of the file are located. A depiction of the PE header is shown below:
1) DOS Header
   1.1) e_lfanew (points to PE header)
2) PE Header
   2.1) Signature
   2.2) FileHeader
   2.3) OptionalHeader
3) SectionTable
Rest of executable
1) DOS Header-this part of the file contains a small MS-DOS program that is automatically included in the executable and is meant to print a message similar to "This program cannot be run in DOS mode." Below is a screenshot of the hex display of the DOS header of user32.dll:
|DOS header of user32.dll|
2) PE Header-this is the actual header that identifies the executable as a PE executable
2.1) Signature-this is the signature of the header. If this is indeed a PE file, the signature bytes read "PE\0\0" as text; WinNt.h defines IMAGE_NT_SIGNATURE as 0x00004550. Note the difference in endianness between the numerical value and the characters visible in the file: the bytes 50 45 00 00 form the value 0x00004550 when read as a little-endian DWORD.
2.2) FileHeader-this structure contains information about the machine architecture that this executable is meant for, and other characteristics of this file.
2.3) OptionalHeader-this header isn't actually optional in PE executables, and contains information such as the linker version, operating system version, and flags.
3) SectionTable-this contains information about sections of the file. It is essentially an array of IMAGE_SECTION_HEADER structures, discussed below.
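The walk from the DOS header to the PE header described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function name parse_pe_headers and the dictionary keys are made up for this post, the field offsets follow the IMAGE_DOS_HEADER and IMAGE_FILE_HEADER layouts in WinNt.h, and the input is assumed to be the raw bytes of the file on disk.

```python
import struct


def parse_pe_headers(data: bytes) -> dict:
    """Follow e_lfanew to the PE header and decode the signature and FileHeader.

    Hypothetical helper for illustration; not part of any library.
    """
    # The DOS header starts with the magic "MZ"; e_lfanew is the DWORD at offset 0x3C.
    if data[:2] != b"MZ":
        raise ValueError("missing MZ magic; not a DOS/PE executable")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)

    # The signature is "PE\0\0" as text, 0x00004550 as a little-endian DWORD.
    (signature,) = struct.unpack_from("<I", data, e_lfanew)
    if signature != 0x00004550:
        raise ValueError("missing PE signature")

    # IMAGE_FILE_HEADER (20 bytes) immediately follows the 4-byte signature.
    (machine, number_of_sections, time_date_stamp,
     _pointer_to_symbol_table, _number_of_symbols,
     size_of_optional_header, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, e_lfanew + 4)

    return {
        "e_lfanew": e_lfanew,
        "machine": machine,
        "number_of_sections": number_of_sections,
        "time_date_stamp": time_date_stamp,
        "size_of_optional_header": size_of_optional_header,
        "characteristics": characteristics,
    }
```

To try it on a real file, read the file in binary mode, for example open(path, "rb").read(), and pass the resulting bytes to the function.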
A section is the unit by which the parts of the binary are organized. Some sections contain the actual machine code executed by the CPU, while other sections contain the program's global variables. PE guarantees that a section's contents are stored contiguously in the executable file. Here are the noteworthy sections I found:
- .text-executable code goes here. In the PE format, the code from all the object files is essentially concatenated into this section of the executable.
- .data-initialized global and static variables from all the object files and static libraries go here. String literals may end up here as well, although many compilers place read-only data in a separate .rdata section.
- .reloc-base relocations that need to be performed at program load time are detailed here. These are adjustments to instructions or initialized variables that are made if the loader cannot load the file at the address anticipated by the linker (note: x86 near jmp and call instructions use relative offsets, so they don’t need to be relocated)
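To make the .reloc description more concrete, here is a rough sketch of how a loader might apply base relocations. It is a simplification under stated assumptions: apply_base_relocations is a hypothetical name, image is assumed to be the already-mapped image indexed by RVA, reloc is the raw contents of the base relocation data, and only the 32-bit IMAGE_REL_BASED_HIGHLOW type is handled. The block layout (a DWORD page RVA, a DWORD block size, then 16-bit entries holding a 4-bit type and a 12-bit offset) follows IMAGE_BASE_RELOCATION in WinNt.h.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0  # padding entry, nothing to patch
IMAGE_REL_BASED_HIGHLOW = 3   # patch a full 32-bit address


def apply_base_relocations(image: bytearray, reloc: bytes, delta: int) -> int:
    """Add delta to every 32-bit address listed in reloc; return how many were patched.

    delta is (actual load address - preferred load address).
    """
    patched = 0
    pos = 0
    while pos + 8 <= len(reloc):
        # Each block starts with the RVA of a 4 KB page and the block's total size.
        page_rva, block_size = struct.unpack_from("<II", reloc, pos)
        if block_size < 8:
            break  # malformed or terminating block
        end = min(pos + block_size, len(reloc))
        # The 16-bit entries follow the 8-byte block header.
        for entry_pos in range(pos + 8, end - 1, 2):
            (entry,) = struct.unpack_from("<H", reloc, entry_pos)
            kind, offset = entry >> 12, entry & 0x0FFF
            if kind == IMAGE_REL_BASED_HIGHLOW:
                target = page_rva + offset
                (value,) = struct.unpack_from("<I", image, target)
                struct.pack_into("<I", image, target, (value + delta) & 0xFFFFFFFF)
                patched += 1
            # IMAGE_REL_BASED_ABSOLUTE and other types are skipped in this sketch.
        pos += block_size
    return patched
```

If the image loads at its preferred address, delta is zero and the loader can skip this work entirely, which is why the linker's anticipated address matters.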
As shown in the table above, the "Section Table" follows the PE Header in the executable file. The Section Table contains an array of IMAGE_SECTION_HEADER structures. Here are the fields I found interesting in the IMAGE_SECTION_HEADER structure:
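- PointerToRawData-the offset from the beginning of the file to where the section's raw data (for .text, the machine code) resides
- NumberOfRelocations-the number of relocation entries for this section. This appears to be meaningful mainly in object files; in executables, load-time relocations are described in .reloc instead
- NumberOfLinenumbers-the number of line-number entries for this section, which ties the code back to the original source for debugging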
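As a sketch of how the Section Table can be read, the following Python walks the array of IMAGE_SECTION_HEADER structures. The function name read_section_table and the dictionary keys are hypothetical; the 40-byte layout and the offsets used to find the table (24 bytes of signature plus FileHeader, followed by SizeOfOptionalHeader bytes) are taken from the structure definitions in WinNt.h. The input is assumed to be the raw file contents.

```python
import struct

SECTION_HEADER_FORMAT = "<8sIIIIIIHHI"  # one IMAGE_SECTION_HEADER, 40 bytes
SECTION_HEADER_SIZE = struct.calcsize(SECTION_HEADER_FORMAT)


def read_section_table(data: bytes) -> list:
    """Return one dict per IMAGE_SECTION_HEADER found in the file."""
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # NumberOfSections and SizeOfOptionalHeader live in the FileHeader.
    (number_of_sections,) = struct.unpack_from("<H", data, e_lfanew + 6)
    (size_of_optional_header,) = struct.unpack_from("<H", data, e_lfanew + 20)

    # The table starts right after the signature, FileHeader and OptionalHeader.
    table_offset = e_lfanew + 24 + size_of_optional_header

    sections = []
    for index in range(number_of_sections):
        (name, virtual_size, virtual_address, size_of_raw_data,
         pointer_to_raw_data, _pointer_to_relocations, _pointer_to_linenumbers,
         number_of_relocations, number_of_linenumbers,
         characteristics) = struct.unpack_from(
            SECTION_HEADER_FORMAT, data, table_offset + index * SECTION_HEADER_SIZE)
        sections.append({
            # Names are padded with NUL bytes up to 8 characters.
            "name": name.rstrip(b"\0").decode("ascii", "replace"),
            "virtual_size": virtual_size,
            "virtual_address": virtual_address,
            "size_of_raw_data": size_of_raw_data,
            "pointer_to_raw_data": pointer_to_raw_data,
            "number_of_relocations": number_of_relocations,
            "number_of_linenumbers": number_of_linenumbers,
            "characteristics": characteristics,
        })
    return sections
```

With the result in hand, the raw bytes of a section are data[pointer_to_raw_data : pointer_to_raw_data + size_of_raw_data], which is a quick way to dump .text or .data from a file on disk.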
In terms of computer forensics and virus dissection, I was interested in the following points (structure/field names are as they appear in WinNt.h):
- _IMAGE_FILE_HEADER.TimeDateStamp records when the linker produced the file, which can be used for executable forensics
- _IMAGE_OPTIONAL_HEADER.MajorLinkerVersion & _IMAGE_OPTIONAL_HEADER.MinorLinkerVersion- these fields can also be used for forensics/getting info on the source of the binary
- _IMAGE_IMPORT_DESCRIPTOR.TimeDateStamp-the time/date the DLL we are linking to was compiled (this is typically 0 unless the executable has been bound to that DLL)
- _IMAGE_EXPORT_DIRECTORY.TimeDateStamp-the time/date when this file was created. This field is in the .edata section of binaries that export functions (usually libraries)
- _IMAGE_RESOURCE_DIRECTORY.TimeDateStamp-the time/date that a resource(such as a picture) was created
- _IMAGE_OPTIONAL_HEADER.SizeOfImage/_IMAGE_OPTIONAL_HEADER.SizeOfHeaders-these fields might be manipulated to allow us to "hide" or store extra data inside an executable
- _IMAGE_AUX_SYMBOL-contains information about line numbers, used for debugging. This might give some insight into how many lines of source code the program had.
- Hooking function calls to other DLLs is apparently easy: just overwrite the value of the function pointer in the IAT (Import Address Table), since the IAT is writable (I don’t know if this has changed with the introduction of DEP).
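Two of the forensic fields above, the FileHeader time stamp and the linker version, are easy to pull out with a short script. The sketch below makes a few assumptions: build_fingerprint is a made-up name, TimeDateStamp is interpreted as seconds since 1970-01-01 UTC, and the linker version bytes are read from the start of the OptionalHeader (a WORD magic followed by MajorLinkerVersion and MinorLinkerVersion). Keep in mind that these values are written by the toolchain and can be altered afterwards, so they are hints rather than proof.

```python
import struct
from datetime import datetime, timezone


def build_fingerprint(data: bytes) -> dict:
    """Extract the link time and linker version from raw PE file bytes."""
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)

    # _IMAGE_FILE_HEADER.TimeDateStamp sits 8 bytes past the start of the signature.
    (time_date_stamp,) = struct.unpack_from("<I", data, e_lfanew + 8)

    # The OptionalHeader begins 24 bytes past the signature start:
    # WORD Magic, BYTE MajorLinkerVersion, BYTE MinorLinkerVersion.
    _magic, major, minor = struct.unpack_from("<HBB", data, e_lfanew + 24)

    return {
        "time_date_stamp": time_date_stamp,
        "built_utc": datetime.fromtimestamp(time_date_stamp, tz=timezone.utc),
        "linker_version": f"{major}.{minor}",
    }
```

Comparing the linker version and time stamp across a set of samples can suggest whether they came from the same build environment.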