DEV Community

Jennie
Jennie

Posted on • Edited on

How to find what is the dependency of a function, class, or variable in ES6 via AST

About half year ago, I try to be creative and figuring out how to find what my code change could affect in a large code base, to avoid some tedious problems. And I wrote an article of idea(or nonsense🙈) about it.

I naively thought the open souce world has all the Lego pieces I need, with a clear manual in my head, I could build the complicated stuffs like space ship right away.

But, there's aways a "but". The reality is, NO. I found some ready to use pieces, like @babel/parser, @babel/traverse, enhanced-resolve by webpack, and something similar but couldn't fulfill my need. One of them is built by a Russian guy, and it works well for solving to file level. Others are webpack plugins depending on webpack’s stat.json, which I chose not to rely on, as it could limit the possibilities, and it's way of resolving dependencies starting from an entry file does not fit for the idea.

So I did my own, again, and here’s how I make the packages in this ast-lab repo.

The plan

After investigation, I made my 1st phase plan to cover ES modules, it could be summarized into 2 parts:

  1. Use AST to find imports, exports and the relationships of a file's root declarations (variable, function, class, etc.).
  2. Get the data from all ES files under one path(usually repo source path), and re-organise them into a dependency map, then figure out the impact from a declartion entry.

The AST of imports

In an ES file, other than dynamic import, keyword import and export always appear in the root of the file. Here’s an example:

Import AST

Thanks master Lihau create a useful tool AST explorer that uses @babel/parser as AST parser, my little experiment cannot work without it.

They have easy to identify type definitions, like:

  • export default a is ExportDefaultDeclaration
  • export function a {} or export a are ExportNamedDeclaration
  • export * as a is ExportAllDeclaration
  • import a from ‘module_path’ is ImportDeclaration
  • dynamic import const importedModule = import(‘module_path’) the import callee is Import type, the expression is just CallExpression like other function calls.

Each type may have slightly different structure, and multiple situations to handle with. Take import as example:

ImportDeclaration

Though module exports has a name, when they are imported in another file, they could be used via alias, which means alias name is also required.

There are 3 types of aliases in importDeclaration:

  1. import a declared a as alias of module default export, marked as ImportDefaultSpecifier
  2. import * as a declared a as alias of all things a module exported, marked as ImportNamespaceSpecifier
  3. import { a as b } declared "b" as alias of a module's named export "a", same as the one without alias, it is marked as ImportSpecifier

All of them own a local.name property to store the alias name, and for ImportSpecifier, you may get the original name from imported.name.

Module path is stored in node’s source.value property. But the file represented by this path could be uncertain. It could be:

  • relative js/jsx/ts/tsx/json or other resolveable path (configured in your packing tools)
  • npm module from any parent paths’ node_module folder, or other specified paths
  • npm module symlinked from somewhere else, like mono-repo
  • fake npm module use tool’s alias feature

Now I understand why Node creator regrets about making this design, it does cause troubles.

I found Webpack’s enhanced-resolver to help. By default it uses Node’s file system module(fs) to verify whether a file exists in user's file system. It is able to resolve customised module path, symlink, alias by simple configurations, and it will return the resolved absolute path as result.

Problem solved? Wait, there are some special cases to be aware of :

  1. export * from ‘mymodule’ This syntax could be turned on by a babel plugin. To know what it actually exports, I will need to parse another file, which could make things comlicated, and possibly have circular relations. To avoid unpredictable problems, I chose to mark it as extension, and solve it later after I've resolved all the data in the project.
  2. export { a, default as b } from 'mymodule' This line is quite tricky. It includes export, import information at the same time, and a, b actually could not be used anywhere else in the file.
  3. Dynamic import I loved it so much, but the declaration it imported may exist in a local scope.

Exports, and other root imports could be handled in a similar way, with the right types and properties.

Use @babel/traverse to traverse AST

AST is a tree data structure. To work with tree, we usually traverse nodes starting from the root node with DFS(depth first search) or BFS(breadth first search).

I used @babel/traverse package for AST traversing with DFS. It accepts a Visitor object, which takes a callback function or enter, exit callback functions as value, and types of Nodes as key. E.g.:



traverse(ast, {
  Node(path) {
    console.log(`I just visited a ${path.node.type} node!`);
  }
});


Enter fullscreen mode Exit fullscreen mode

The best part of it is it provides various useful alias to reduce the work. E.g., Function type represents either ArrowFunction or FunctionDeclaration. All the visitors "matches" this node type would be executed, from top to bottom.

It also provides useful information, utils in a path object. You may get the path object from a Visitor function:



traverse(ast, {
  Scopable(path) {
    if (path.isFunction()) {
      console.log(`I just visited a function node through Scopable!`);
    }
  },
  Function() {
    console.log(`I just visited a function node through Function!`);
  }
});


Enter fullscreen mode Exit fullscreen mode

As a result, I extracted the import and export data in the following way:



const imports = [];
traverse(ast, {
  ImportDeclaration({ node }) {
    const modulePath = node.source.value;
    node.specifiers.forEach(specifier => {
      const alias = specifier.local.name;
      let name;
      switch (specifier.type) {
        case 'ImportSpecifier':
          name = specifier.imported.name;
          break;
        case 'ImportDefaultSpecifier':
          name = 'default';
          break;
        case 'ImportNamespaceSpecifier':
          name = '*';
          break;
      }
      if (name) {
        imports.push({
          alias,
          name,
          source: modulePath,
        });
      }
    });
  }
});


Enter fullscreen mode Exit fullscreen mode

Figuring out relationships within a file

A file may contain as many variables, functions, classes as they want. And often depend on each other, especially in functional programming. Which means, knowing what affects a file is not enough, not even useful in some cases, I got to be more precise, limiting down the coverage to declarations instead of files.

It requires the program to understand what does each declaration rely on. A word comes immediately to my mind - scope.

A declared name could be declared again, and in a context the local declared would be used instead of it's parent scopes. Thus, to figure out the right "relationships", I will need to resolve scopes.

Remember that I mentioned @babel/traverse visits AST with DFS, this makes it perfectly fit for using stacks.

Since the declaration could happen anywhere in the scope, I chose to store all local declared declarations , and the used ones in the stack while traverse() entering a node, and remove those local declared from the used set before exiting the scope. Then once I exit to the file scope, I know what does this root declaration depend on.

Scope

When to create the stack, and push the declarations? This question made me realize an interesting “in-consistent” design in Javascript that I never questioned about.

Here’s a simple function:



const param = I am the param;
function func(param) {
    return `You passed a param: ${param}.`;
}
func(suprise :p);


Enter fullscreen mode Exit fullscreen mode

It’s clear that the whole function block creates a scope, and in this scope it declared a new variable param overwrites the param in its context without any variable declaration keyword.

Same happens to catch:



try {
  doSomething();
} catch (err) {
  console.error(`You know what? You have a new variable “err” created!`);
}


Enter fullscreen mode Exit fullscreen mode

But how about other scopables, like for loop:



for (let i = 0; i < length; i++) {
  console.log(`Loop counter:${i}`);
}


Enter fullscreen mode Exit fullscreen mode

With the same brackets, but to declare a new param, you got to use keyword let, var, or const. This is also why in AST the function type is FunctionDeclaration, catch is CatchClause, but for, if, switch, while, function calls, try... are called Statement.

This solution works for most of the cases except:

  1. var
    According to the ECMA specification, it stores the variable names in a special list called VarNames in the global context, which causes the famous hoisting issue. And it means that I have to handle it differently.

  2. eval
    It's a disaster. Techniqually it's just a function accepting string as parameter, and could execute the string in runtime.
    So in AST those code are still string. To understand what's going on in the string, I got to parse it to AST. What if there's another eval in this string?
    And it could be called "indirectly". According to ECMA, if it's called indirectly, will be executed in the global context.

And then for a simple file like:



import { a } from 'fileA';
const b = () => a;
export default b;


Enter fullscreen mode Exit fullscreen mode

I could get a map like following:



{
  imports: {
    a: { source: '/repo/fileA.js', name: 'a', alias: 'a' }
  },
  exports: ['default'],
  relations: {
    default: ['b'],
    b: ['a']
  }
}


Enter fullscreen mode Exit fullscreen mode

Get the dependency map

An ES module only knows what declaration changed could affect itself, but no idea who is relying on its exports. Which means, to understand the impact of changing a function/variable/class of a module, searching through all the modules in the file system may find them all, but unnecessary for most of time, because we have practiced to keep files having dependency relationships under one folder/repository.

Let's say I have a simple repo like this:



/repo
- a.js
- b.js
- c.js


Enter fullscreen mode Exit fullscreen mode

And the information I get from the steps above:



// file a.js relies on default of b.js
// it has 2 private declarations
// and exports default
const a = {
  imports: { 
    b: [{ source: '/b.js', name: 'default', alias: 'b' }] 
  },
  exports: ['default'],
  relations: {
    default: ['aPrivateFunc'],
    aPrivateFunc: ['b', 'aPrivateConst'],
    aPrivateConst: []
  }
};
// file b.js relies on all stuff exported in c.js
// it exports default, no privates
const b = {
  imports: { c: [{ source: '/c.js', name: '*',  alias: 'c' }] },
  exports: ['default'],
  relations: { default: ['c'] }
};
// file c.js does not rely on any other files
// it exports c function/variable/class
const c = {
  imports: {},
  exports: ['c'],
  relations: { c: [] }
};


Enter fullscreen mode Exit fullscreen mode

It is clearer once draw this relationship out:

Relation Graph

It's a Directed Graph in computer science. So firstly, we could reorganize the data above into a map could represent graph nodes and directions like this:



const relationGraph = {
    '/c.js': {
      'c': {
        file: '/a.js',
        name: 'aPrivateFunc',
        affects: [relationGraph['/b.js'].default]
      }
    },
    '/b.js': {
      default: {
        file: '/b.js',
        name: 'default',
        affects: [relationGraph['/a.js'].aPrivateFunc]
      }
    },
    '/a.js': {
      aPrivateFunc: {
        file: '/a.js',
        name: 'aPrivateFunc',
        affects: [relationGraph['/a.js'].default]
      },
      default: { file: '/a.js', name: 'default', affects: [] }
    }
};


Enter fullscreen mode Exit fullscreen mode

And then I could do graph traversal from the declaration to check with as following:



// check the impact of c in c.js
const entry = { file: '/c.js', declaration: 'c' };
const { file, declaration } = entry;
const result = {};
const traversed = new Set();
let traversalQueue = relationGraph[file][declaration].affects;
let node;
while(node = traversalQueue.shift()) {
  // Prevent circulation
  if (traversed.has(node)) continue;
  const { file, name, affects } = node;
  if (!result[file]) {
    // Use set to dedupe
    result[file] = new Set([ name ]);
  } else {
    result[file].add(name);
  }
  traversed.add(node);
  traversalQueue = traversalQueue.concat(relationGraph[file][name].affects);
}
return result;


Enter fullscreen mode Exit fullscreen mode

The result of the example is:



{
  'b.js': Set(1) {'default'}
  'a.js': Set(2) {'aPrivateFunc', 'default'}
}


Enter fullscreen mode Exit fullscreen mode

Top comments (3)

Collapse
 
showmeyourhits profile image
Alex Ozhigin

Good article, thanks

Have you tried running this code code on larger (~300 files) repos?

Collapse
 
yalondpsi profile image
Yalon

Great article!
I have a question please - is it possible to get a value of a variable in a program according to a specific line number? (when traversing the AST?)
Thank you

Collapse
 
jennieji profile image
Jennie

can for the variables without reassigning