Gealber Morales

Posted on Sep 18, 2023 • Originally published at gealber.com

Challenge RE #8

#reverseengineer #assembly #c #challenge

At this point we don't need further introduction, I'll bring you my solution to the 8th challenge.

Description

The description of the challenge can be found in Challenge RE #8. In the previous ones I forgot the share the link, my bad.

Introduction

Assembly code to understand


<f>:
   0:       push   r12
   2:       test   rsi,rsi
   5:       mov    r12,rdi
   8:       push   rbp
   9:       mov    rbp,rdx
   c:       push   rbx
   d:       mov    rbx,rsi
  10:       je     32 
  12:       nop    WORD PTR [rax+rax*1+0x0]
  18:       mov    rsi,QWORD PTR [rbx]
  1b:       mov    rdi,rbp
  1e:       call   r12
  21:       test   eax,eax
  23:       je     56 
  25:       js     40 
  27:       mov    rbx,QWORD PTR [rbx+0x18]
  2b:       test   rbx,rbx
  2e:       xchg   ax,ax
  30:       jne    18 
  32:       pop    rbx
  33:       pop    rbp
  34:       xor    eax,eax
  36:       pop    r12
  38:       ret
  39:       nop    DWORD PTR [rax+0x0]
  40:       mov    rbx,QWORD PTR [rbx+0x10]
  44:       test   rbx,rbx
  47:       je     32 
  49:       mov    rsi,QWORD PTR [rbx]
  4c:       mov    rdi,rbp
  4f:       call   r12
  52:       test   eax,eax
  54:       jne    25 
  56:       mov    rax,rbx
  59:       pop    rbx
  5a:       pop    rbp
  5b:       pop    r12
  5d:       ret

This time f it's way more longer than in our previous challenges, instead of complaining about this and give up at the first moment. It's better to try to understand small pieces of this code.

Let's start from instructions in memory 0 to instruction in memory 0x10. From these instructions we can notice, that our parameters came from registers rdi, rsi and rdx. We perform a check on one of them, to see if it's equal to 0.

Now interesting enough, we have that in rdi we receive a callback function. If you take a look, rdi it's moved to r12 who later it's been called on instruction in 1e. For that reason we can infer, that one of the arguments passed to f it's a callback function.

What can we tell about the signature of this callback function? Well tracking the use of register r12 you can notice that the function takes two arguments, rsi and rdi and return the value int eax. Every call r12 it's followed by a check on eax, where if we have zero or NULL value we jump to the end of the program, otherwise we jump to position 0x25. Looking at the use of its arguments, we can take a save assumption that the first one can be number while the second...well the second one is not so straight forward, let's take a look at its use in the program.

Identifying `rbx` layout

The second argument it's been taken from rbx, this register it's only modified once, when we copied rsi into it on instruction mov rbx,rsi. After that we've been calling it in this three times:

;; first time
mov rsi,QWORD PTR [rbx]

;; more down in position 0x27
mov rbx,QWORD PTR [rbx+0x18]

;; more down in position 0x40
mov rbx,QWORD PTR [rbx+0x10]

In the last two cases, we are overwriting the value of rbx, with QWORD PTR [rbx+0x18] and QWORD PTR [rbx+0x10]. The padding or offset from rbx it's 0x18 = 24 and 0x10 = 16, so we can assume the data stored in rbx has to following arrangement.

[0......|0x10.....|0x18....]

And the data in [rbx+0x10] and [rbx+0x18] can overwrite rbx. I don't know you, but two me this smells like a binary tree, where [rbx+0x10] and [rbx+0x18] are the left and right children of the node rbx.

This is a huge insight. If we are on the right path, we can be looking at a binary search here, but's let's not take hasty conclusions.

First resume

At the moment, we have that:

r12 contains a callback function
rbx contains a binary tree

With this in mind we can start declaring our Node structure and our callback function.

typedef struct Node
{
    int data;
    struct Node *left;
    struct Node *right;
} Node;

// signature of callback function
int g_callback(int val, Node *n);

Signature of f

We already know that the arguments for f comes from registers:

rdi, which is a callback function
rsi, which is our binary tree
rdx, which it's data of a node of our binary tree, let's assume the data are just numbers, for simplicity. In any case later we can check if this is the case.

What it's missing it's our returned value. For this, looking at the last ret you can notice that what we return it's actually the value in rbx which hold a Node of our binary tree.

With this in mind we can assume the following signature for f


Node *f(int (int, Node *), Node *, int);

Main logic

With the signature, will be easier to infer the logic in the program. Having a clear idea of what are the structures involved also help us a lot. I mean, we haven't even looked at the logic, and you might be guessing that this has to be some kind of search in a binary tree. Maybe I'm the only one imagining that, given that I'm my only user in my blog 😟. Anyway let's continue.

Let's gather all our finding and put it in C code

typedef struct Node
{
    int data;
    struct Node *left;
    struct Node *right;
} Node;

int g_callback(int val, Node *n);


Node *f(int cb(int, Node *), struct Node *n, int data)
{
    return NULL;
}

Ok, this looks good. Does it? Don't doubt on yourself great Gealber, this is amazing.

Let's analyze the logic by small chunks, first chunk

  18:       mov    rsi,QWORD PTR [rbx]
  1b:       mov    rdi,rbp
  1e:       call   r12
  21:       test   eax,eax
  23:       je     56 
  25:       js     40
  ;; in 56 we just return the Node

Nothing fancy, now that we know more about the layout of rbx, we know it's a Node, so this code can be interpreted as a call to to the callback function we passed as an argument. In C code would be:

int ret = cb(n, data);
if (ret == 0)
    return n;
if (ret < 0) {
    // do something
}

Second chunk of code...maybe chunk is not the right word Gealber 🤔

  27:       mov    rbx,QWORD PTR [rbx+0x18]
  2b:       test   rbx,rbx
  2e:       xchg   ax,ax
  30:       jne    18

In this case, we first overwrite our current node for our right child, and later perform a check on it, followed by a jump to 18 if it's not NULL. Interesting, the code in 0x18 it's our previous chunk of code. This seems as a recursion call, because the code in 0x18 it's actually the start of our f method. Given the nature of binary trees and the operations we normally perform with them, is not a crazy assumption to say that this is a recursion call.

Taking this into C, would be like this

Node *f(int cb(int, Node *), struct Node *n, int data)
{
    int ret = cb(n, data);
    if (ret == 0)
        return n;

    if (ret < 0) {
        // do something
    }

    n = n->right;
    if (n != NULL)
        n = f(cb, n, data);

    return NULL;
}

The next most important part, is the code in 0x40. This code it's quite similar to our previous one, just that in this case we check with our left node. The way to reach this code is from instruction 0x25, with js 40.

A way to put this in C would be

n = n->left;
if (n != NULL)
    n = f(cb, n, data);

We already can make a correct forecast of what this function does, in a simple way this code performs a search in a binary tree, with a supplied comparison callback function.

Let's put all this together in a clean way

typedef struct Node
{
    int data;
    struct Node *left;
    struct Node *right;
} Node;

// f performs a search in a binary tree
// using cb callback function as our comparison data
Node *f(int cb(int, Node *), struct Node *n, int data)
{
    int ret = cb(n, data);
    if (ret == 0)
        return n;

    if (ret < 0) {
       return f(cb, n->left, data);
    }

    return f(cb, n->right, data);
}

Final notes

Take into account that data doesn't has to be an int, given that we are supplying a comparison function we can declare data as void *. As long as the comparison function can handle it, won't be a problem.

As a personal note, I found this challenge quite instructive, not because it taught me a new data structure. What I found valuable it's that it showed me how a structure is defined in assembly, basically there's just an offset between field and another. The more I do this challenges the more I realize that C is so close to assembly 😅.

DEV Community

Challenge RE #8

Description

Introduction

Identifying `rbx` layout

First resume

Signature of f

Main logic

Final notes

Top comments (0)

Read next

Guess what will be the output?

Implementing I2C for the ATtiny85

Implement the SPWM 3phase/1phase inverter for electrostatic precipitator controller using STM32103 MCU

Import <ncurses.h> library in C on Mac

Description

Introduction

Identifying rbx layout

First resume

Signature of f

Main logic

Final notes

Read next

Guess what will be the output?

Implementing I2C for the ATtiny85

Implement the SPWM 3phase/1phase inverter for electrostatic precipitator controller using STM32103 MCU

Import <ncurses.h> library in C on Mac

Identifying `rbx` layout