arinak1017

Posted on Dec 6

GCC: Automatic Function Multi-Versioning Pt.2

#gcc #compiling #afmv

Before we start...

If you have just stumbled upon my SPO600 series of blog posts, it has been created to document and share my learnings as I progress through my Software Portability and Optimization course at Seneca Polytechnic.

In this blog post, I will talk about the code experiments I conducted as part of Stage 2 of our course Project: implementing a new experimental feature in the GNU Compiler Collection (GCC) — Automatic Function Multi-Versioning.

If you are curious to learn more about the project, feel free to check out my previous blog posts:

Analyzing the Demo Pass

In my previous post, I shared how I integrated the demo pass provided to us by our professor.

After I made sure that I could get the new passes running, I decided to take a closer look at the functionality of the demo pass:

// Omitted the License and the include statements
// ============================================================= vvv
// Test pass
namespace {

const pass_data pass_data_ctyler =
{
  GIMPLE_PASS, /* type */
  "ctyler", /* name */
  OPTGROUP_NONE, /* optinfo_flags */
  TV_NONE, /* tv_id */
  PROP_cfg, /* properties_required */
  0, /* properties_provided */
  0, /* properties_destroyed */
  0, /* todo_flags_start */
  0, /* todo_flags_finish */
};

class pass_ctyler : public gimple_opt_pass
{ 
public:
  pass_ctyler (gcc::context *ctxt)
    : gimple_opt_pass (pass_data_ctyler, ctxt)
  {}

  /* opt_pass methods: */
  bool gate (function *)  final override { 
    return 1; // always execute pass
  }
  unsigned int execute (function *) final override;

}; // class pass_ctyler

unsigned int
pass_ctyler::execute (function *fun)
  {
    basic_block bb;

    int bb_cnt = 0, stmt_cnt = 0;

    FOR_EACH_BB_FN (bb, fun)
      {
        bb_cnt++;
        if (dump_file)
          {
            fprintf (dump_file, "===== Basic block count: %d =====\n", bb_cnt);
          }

        for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
          {
            gimple *g = gsi_stmt (gsi);
            stmt_cnt++;
            if (dump_file)
              {
                fprintf (dump_file, "----- Statement count: %d -----\n", stmt_cnt);
                print_gimple_stmt (dump_file, g, 0, TDF_VOPS|TDF_MEMSYMS);
              }
         }
       }

    return 0;

    if (dump_file)
      {
        fprintf (dump_file, "\n\n##### End ctyler diagnostics, start regular dump of current gimple #####\n\n\n");
      }

  }


} // anon namespace

gimple_opt_pass *
make_pass_ctyler (gcc::context *ctxt)
{
  return new pass_ctyler (ctxt);
}

// ============================================================= ^^^

I first noticed that the if loop was placed below a return statement, making it ineffective. I fixed this by moving the if loop above the return statement.

Then, I decided to break the code down into sections to understand the logic of the demo pass better:

Metadata Declaration: in the first section, the metadata of the pass is defined (type, name, options, and properties)
Pass Object: a pass object of type Gimple Opt Pass is declared
Gate Method: this method determined whether the pass should run.
Execution Logic: the actual logic of the pass is implemented in the execute function. It uses the FOR_EACH_BB_FN macro to loop through basic blocks. Inside this loop, a nested loop iterates through statements, printing the statement number and content
The if loop to print the final message was initially placed below the return statement, making it unreachable
Pass Object Declaration: finally, the pass object of type pass_tyler is declared

Shortly after, the updated version of the pass was released:

/* Test pass
   Chris Tyler, Seneca Polytechnic College, 2024-11 
   Modelled on tree-nrv.cc

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.

GCC is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */

#define INCLUDE_MEMORY
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "tree.h"
#include "gimple.h"
#include "tree-pass.h"
#include "ssa.h"
#include "gimple-iterator.h"
#include "gimple-walk.h"
#include "internal-fn.h"
#include "gimple-pretty-print.h"

// Added headers:
#include "gimple-ssa.h"
#include "cgraph.h"
#include "attribs.h"
#include "pretty-print.h"
#include "tree-inline.h"
#include "intl.h"

// Included for dump_printf:
//#include "tree-pretty-print.h"
//#include "diagnostic.h"
//#include "dumpfile.h"
//#include "builtins.h"
//#include <stdlib.h>

// =================================================== vvv
// Test pass



namespace {

const pass_data pass_data_ctyler =
{
  GIMPLE_PASS, /* type */
  "ctyler", /* name */
  OPTGROUP_NONE, /* optinfo_flags */
  TV_NONE, /* tv_id */
  PROP_cfg, /* properties_required */
  0, /* properties_provided */
  0, /* properties_destroyed */
  0, /* todo_flags_start */
  0, /* todo_flags_finish */
};

class pass_ctyler : public gimple_opt_pass
{ 
public:
  pass_ctyler (gcc::context *ctxt)
    : gimple_opt_pass (pass_data_ctyler, ctxt)
  {}

  /* opt_pass methods: */
  bool gate (function *)  final override { 
    return 1; // always execute pass
  }
  unsigned int execute (function *) final override;

}; // class pass_ctyler

unsigned int
pass_ctyler::execute (function *)
  {
    struct cgraph_node *node;
    int func_cnt = 0;

    FOR_EACH_FUNCTION (node)
      {
        if (dump_file)
          {
            fprintf (dump_file, "=== Function %d Name '%s' ===\n", ++func_cnt, node->name() );
          }
      }

    if (dump_file)
      {
        fprintf (dump_file, "\n\n#### End ctyler diagnostics, start regular dump of current gimple ####\n\n\n");
      }

    return 0;

  }


} // anon namespace

gimple_opt_pass *
make_pass_ctyler (gcc::context *ctxt)
{
  return new pass_ctyler (ctxt);
}

// =================================================== ^^^

The contents of the dump file:

Hello World Program, pass output

;; Function main (main, funcdef_no=0, decl_uid=3267, cgraph_uid=1, symbol_order=0)

=== Function 1 Name 'printf' ===
=== Function 2 Name 'main' ===


#### End ctyler diagnostics, start regular dump of current gimple ####


int main ()
{
  int D.3270;
  int _3;

  <bb 2> :
  printf ("Hello");
  _3 = 0;

  <bb 3> :
<L0>:
  return _3;

}

This pass uses a different accessor macro called FOR_EACH_FUNCTION, which cycles through the code and prints the function name; this version of the pass seems to have a considerably simpler logic.

To test the pass, I extracted the test case files provided by our professor in the /public/spo600-test-clone.tgz file.

In the Makefile, I set DUMP_ALL to 1 to make sure that all files would be dumped into the current directory during the build process.

The test files contain two implementations of the scale_samples function, which were effectively identical, and two implementations of the scale_sample function, which were significantly different. The goal of our pass = pruning the identical functions while leaving the significantly different ones untouched.

After building the files, our pass generated two dump files:

Each dump file listed the function names at the top, followed by the full content of the dump. Since the files are quite lengthy, I can’t display them in full here, but they looked something like this:

;; Function main (main, funcdef_no=24, decl_uid=3961, cgraph_uid=25, symbol_order=24) (executed once)

=== Function 1 Name '__builtin_cpu_supports' ===
=== Function 2 Name '__builtin_cpu_init' ===
=== Function 3 Name 'scale_samples.resolver' ===
=== Function 4 Name 'scale_samples' ===
=== Function 5 Name 'scale_samples.arch_x86_64_v3' ===
=== Function 6 Name 'printf' ===
=== Function 7 Name 'vol_createsample' ===
=== Function 8 Name 'calloc' ===
=== Function 9 Name 'main' ===
=== Function 10 Name 'scale_samples' ===
=== Function 11 Name 'sum_sample' ===


#### End ctyler diagnostics, start regular dump of current gimple ####


int main ()
{
  unsigned long ivtmp.123;
  vector(4) int vect_t_16.116;
  vector(4) int vect_t_22.115;
  vector(4) int vect__14.114;
  vector(8) short int vect__13.113;
  int t;
  int16_t * out;
  int16_t * in;
  void * _10;
  unsigned long _12;
  int _33;
  int _34;

  <bb 2> [local count: 10737416]:
  # DEBUG BEGIN_STMT
  # DEBUG BEGIN_STMT
  # DEBUG ttl => 0
  # DEBUG BEGIN_STMT
  # DEBUG BEGIN_STMT
  # DEBUG BEGIN_STMT
  in_3 = calloc (50000000, 2);
  # DEBUG in => in_3
  # DEBUG BEGIN_STMT
  out_5 = calloc (50000000, 2);
  # DEBUG out => out_5
  # DEBUG BEGIN_STMT
  vol_createsample (in_3, 50000000);
  # DEBUG BEGIN_STMT
  scale_samples (in_3, out_5, 50000000, 50);
  # DEBUG BEGIN_STMT
  # DEBUG buff => out_5
  # DEBUG samples => 50000000
  # DEBUG INLINE_ENTRY sum_sample
  # DEBUG BEGIN_STMT
  # DEBUG BEGIN_STMT
  # DEBUG BEGIN_STMT
  # DEBUG x => 0
  # DEBUG t => t_18(D)
  # DEBUG BEGIN_STMT
  ivtmp.123_21 = (unsigned long) out_5;
  _12 = ivtmp.123_21 + 100000000;

  <bb 3> [local count: 1063004408]:
  # vect_t_22.115_20 = PHI <vect_t_16.116_15(5), { 0, 0, 0, 0 }(2)>
  # ivtmp.123_31 = PHI <ivtmp.123_22(5), ivtmp.123_21(2)>
  # DEBUG x => NULL
  # DEBUG t => NULL
  # DEBUG BEGIN_STMT
  # DEBUG D#26 => D#27 * 2
  # DEBUG D#25 => out_5 + D#26
  _10 = (void *) ivtmp.123_31;
  vect__13.113_25 = MEM <vector(8) short int> [(int16_t *)_10];
  vect__14.114_24 = [vec_unpack_lo_expr] vect__13.113_25;
  vect__14.114_23 = [vec_unpack_hi_expr] vect__13.113_25;
  vect_t_16.116_19 = vect_t_22.115_20 + vect__14.114_24;
  vect_t_16.116_15 = vect_t_16.116_19 + vect__14.114_23;
  # DEBUG D#24 => *D#25
  # DEBUG D#23 => (int) D#24
  # DEBUG t => D#22
  # DEBUG BEGIN_STMT
  # DEBUG x => NULL
  # DEBUG t => D#22
  # DEBUG BEGIN_STMT
  ivtmp.123_22 = ivtmp.123_31 + 16;
  if (_12 != ivtmp.123_22)
    goto <bb 5>; [98.99%]
  else
    goto <bb 4>; [1.01%]

  <bb 5> [local count: 1052266995]:
  goto <bb 3>; [100.00%]

  <bb 4> [local count: 10737416]:
  _33 = .REDUC_PLUS (vect_t_16.116_15);
  _34 = t_18(D) + _33;
  # DEBUG BEGIN_STMT
  # DEBUG buff => NULL
  # DEBUG samples => NULL
  # DEBUG x => NULL
  # DEBUG t => NULL
  # DEBUG ttl => _34
  # DEBUG BEGIN_STMT
  printf ("Result: %d\n", _34);
  # DEBUG BEGIN_STMT
  return 0;

}

This is just one of the blocks from the dump file, specifically from the main function. My goal is to compare the binaries of the two cloned functions within the dump file to check if they are identical. However, as discussed in class (and as John pointed out), the binaries of the two cloned functions in the dump file are not identical, even when the functions themselves are effectively the same.

For a deeper dive into this issue, you can check out my classmate John’s blog, where he explores this problem in detail, providing really good examples.

My Approach

Looking at the problem from a bird’s eye view, I think we should first develop a working logic that compares the binaries of both of the cloned functions, even if it classifies them as no-prune due to some minor differences, it’s essential to establish a basic comparison mechanism. Once this is achieved, we can fine-tune the comparing logic to ignore some specific parts of the binaries. For example:

Credit: Borrowed from John’s blog
Identifying the Essence of a Function Clone

In this specific example, we could focus on checking the type instead of performing a word-by-word comparison. The advantage of this approach is that it at least provides some working foundation to build upon.

However, developing this logic presents challenges since it goes far beyond simple loops and logic operations typically used in day-to-day coding. I spent a significant amount of time exploring the GCC Gimple Documentationto better understand the required macros and functions. But, to be honest, navigating the documentation felt like diving into a rabbit hole... There is no efficient way to search for a specific macro or loop, so finding clues/helpful pointers usually requires looking through the whole docs. On top of that, some of the documentation sections are incomplete, unclear, or perhaps just too complex for me to understand.

While doing the experiments, I was able to come up with draft logic for the execute and compare_gimple_functions:

bool compare_gimple_functions(gimple_seq g1, gimple_seq g2) {
    gimple_stmt_iterator gsi1 = gsi_start(g1);
    gimple_stmt_iterator gsi2 = gsi_start(g2);

    while (!gsi_end_p(gsi1) && !gsi_end_p(gsi2)) {
        gimple *stmt1 = gsi_stmt(gsi1);
        gimple *stmt2 = gsi_stmt(gsi2);

        // Check if the statements are equivalent
        if (!stmt1 || !stmt2 || !simple_cst_equal(stmt1, stmt2)) {
            return false;
        }

        gsi_next(&gsi1);
        gsi_next(&gsi2);
    }

    return gsi_end_p(gsi1) == gsi_end_p(gsi2);
}

unsigned int pass_ctyler::execute(function *) {
    struct cgraph_node *node;

    FOR_EACH_FUNCTION(node) {
        if (!node->analyzed)
            continue;

        std::string function_name = IDENTIFIER_POINTER(DECL_NAME(node->decl));
        if (function_name.find("_1") != std::string::npos) {
            size_t pos = function_name.find_last_of("_");
            std::string base_func_name = function_name.substr(0, pos);

            struct cgraph_node *base_node = node->get_base_function();
            if (base_node && base_node != node) {
                if (compare_gimple_functions(base_node->get_body(), node->get_body())) {
                    fprintf(dump_file, "PRUNE: %s\n", base_func_name.c_str());
                } else {
                    fprintf(dump_file, "NOPRUNE: %s\n", base_func_name.c_str());
                }
            }
        }
    }
    return 0;
}

The logic I developed currently has three errors, mainly due to incorrect usage of GIMPLE functions and macros. I tried digging into cgraph.cc to better understand the issue, but this only added to my confusion. Unfortunately, I ended up corrupting the project and had to rebuild it from scratch. For now, my logic attempts to get the binaries for the cloned functions and compare them using the compare_gimple_functions; if they are the same, it is supposed to print PRUNE, and if they differ, it will print NOPRUNE.

Afterthoughts

This code is the result of a lot of trial and error, and although I don’t have any feasible results to show, I will continue learning more about the code base and try to fix my logic.

All in all, this journey has been rewarding despite constant setbacks due to the extreme complexity of the GCC codebase.

DEV Community

GCC: Automatic Function Multi-Versioning Pt.2

Before we start...

Analyzing the Demo Pass

My Approach

Afterthoughts

Top comments (0)

Read next

🌍 Si po Transformohet Industria e Udhëtimeve me Teknologjinë ✈️

Copilot vs. Cody: All you need to know

ASP.NET Interview Questions: Part 1 - (10 Q & A)

Pushpa's Rulebook for Developers: How to Thrive (Not Just Survive) in the Tech World 🚀