DEV Community

Cover image for Writing a code analyzer in TypeScript (from scratch)
Derk-Jan Karrenbeld for XP Bytes

Posted on • Edited on

Writing a code analyzer in TypeScript (from scratch)

Exercism is an online platform designed to help you improve your coding skills through practice and mentorship.

Exercism provides you with thousands of exercises spread across numerous language tracks. Once you start a language track you are presented with a core set of exercises to complete. Each one is a fun and interesting challenge designed to teach you a little more about the features of a language.

At the moment of writing, I'm a maintainer of the JavaScript and TypeScript track and recently we've been working on automating a part of the experience.

This article will introduce you to AST parsing and walking using ESTree compatible tools. It specifically looks at certain token types, most commonly found in JavaScript and TypeScript code.

It teaches you how to explore these trees yourself and refers to code samples and actual production implementations.

When reading this article, think about your own JavaScript and TypeScript code. Once you understand how the browser (and tooling, like eslint) parses your code, you might better understand how the language is defined and constructed.

🚧 The links to GitHub are all master links, meaning the contents may change between writing this article and you clicking them. However, in order to make sure the code samples make sense, the analyzer repository links refer to a specific commit (9ff332b). This means that the code you see might not be the same as what is used today.

Table of Contents

Photo called 'Feel the freedom' in Dungeness, United Kingdom, displaying a red Volkswagen Samba parked near brown house.

πŸ“ The exercise

For this article, I'm going to write the analyzer for the gigasecond exercise, for both the TypeScript Γ‘nd JavaScript track. The description is a mere two lines:

Given a moment, determine the moment that would be after a gigasecond has passed.

A gigasecond is 10^9 (1,000,000,000) seconds.

The canonical data hints at the code I need to write, but luckily the exercise is implemented in both the JavaScript and TypeScript tracks.

JavaScript implementation

The JavaScript implementation expects us to write a named export gigasecond which returns a Date that is a gigasecond past the input Date.

  // Test case from the JavaScript track
  test('tells a gigasecond anniversary since midnight', () => {
    const gs = gigasecond(new Date(Date.UTC(2015, 8, 14)));
    const expectedDate = new Date(Date.UTC(2047, 4, 23, 1, 46, 40));
    expect(gs).toEqual(expectedDate);
  });
Enter fullscreen mode Exit fullscreen mode

TypeScript implementation

The TypeScript implementation expects us to write a default export Gigasecond which is a class that has a date() function which returns a Date that is a gigasecond past the constructor Date.

  // Test case from the TypeScript track
  it('tells a gigasecond anniversary since midnight', () => {
    const gs = new Gigasecond(new Date(Date.UTC(2015, 8, 14)))
    const expectedDate = new Date(Date.UTC(2047, 4, 23, 1, 46, 40))
    expect(gs.date()).toEqual(expectedDate)
  })
Enter fullscreen mode Exit fullscreen mode

πŸ’― Optimal solutions

Before tackling how to write an analyzer for these two implementations, I first have to establish what the optimal solutions are. If I know the intended code result, I can try to recognise that and work from there.

JavaScript solution

The implementation in JavaScript is straightforward. It uses the Date constructor together with Date#getTime and a constant to generate a the appropriate result.

const GIGASECOND_IN_MS = (10 ** 9) * 1000

export function gigasecond(date) {
  return new Date(date.getTime() + GIGASECOND_IN_MS)
}
Enter fullscreen mode Exit fullscreen mode

It is vital to note the peculiarities here:

  • Optimal is extracting the GIGASECOND_IN_MS value as a top-level constant
  • The constant's value ((10 ** 9) * 1000) can optimally be written in many equally valid forms. Writing the number out, however, is deemed a smell. All the following SHOULD be recognised as optimal:
    • 10 ** 12
    • 1e9 * 1e3
    • 1e12,
    • Math.pow(10, 9) * 1000
    • Math.pow(10, 12)
  • Date#valueOf is not optimal. It is marked as "This method is usually called internally by JavaScript and not explicitly in code.", even though it's functionally equivalent.
  • Finally, Date.parse(date) is not a good candidate as it's supposed to work with strings only. The reason it returns the same value as getTime when given a date, is because that date is coerced to a string and then parsed.

TypeScript solution

The TypeScript implementation expects a class as default export, and has a method date(). The algorithm is exactly the same as in the JavaScript solution, but it requires type annotations.

const GIGASECOND_IN_MS = (10 ** 9) * 1000

export default class Gigasecond {

  private readonly futureDate: Readonly<Date>

  constructor(date: Readonly<Date>) {
    this.futureDate = Object.freeze(new Date(date.getTime() + GIGASECOND_IN_MS))
  }

  date(): Readonly<Date> {
    return this.futureDate
  }
}
Enter fullscreen mode Exit fullscreen mode

Apart from the variations and rules as described earlier for JavaScript, the calculation may be done either in the constructor (as shown above) or in the date function. In that last case, it will look as follows:

const GIGASECOND_IN_MS = (10 ** 9) * 1000

export default class Gigasecond {

  constructor(private readonly date: Date) { }

  date(): Date {
    return new Date(date.getTime() + GIGASECOND_IN_MS)
  }
}
Enter fullscreen mode Exit fullscreen mode

πŸ‘©πŸ½β€πŸ’» Analysing the code

Now it's time to actually write the analyzer. We'll focus on the JavaScript implementation first. Because there are already JavaScript analyzers running in the wild and that work is open source, this example will use the utilities and base class from the javascript-analyzer repository.

πŸ’¬ Abstract Syntax Trees

The JavaScript Analyzer will be working the Abstract Syntax Tree (AST) of the code solutions. There are other ways to write an analyzer, but for the sake of this article, AST parsing is the way to go.

yarn add @typescript-eslint/parser @typescript-eslint/typescript-estree
Enter fullscreen mode Exit fullscreen mode

The TypeScript ESLint team has built a great parser that outputs an ESTree, a specced format of tokens and information about the input code. It can work with both JavaScript and TypeScript and is therefore great for our use case. I prefer working with this type of tree, because of the spec which allows for interoptability with other tools.

The parser deals with the eslint configuration and then invokes the typescript-estree package which compiles the code using TypeScript and transforms the result to match the ESTree. You can head on over to AST Explorer and try this out yourself by pasting the example code from above into the input field and selecting @typescript-eslint/parser. ⚠ Note: The version of the parser here is usually not up-to-date with the latest parser.

You might wonder: Why not use the TypeScript Compiler Api? It has AST parsing built-in the TypeScript language. It also exposes a lot of helper functions (like isIdentifer).

The reason is that the output is not in the ESTree spec format, which means that you won't be able to use other tools with it. In fact, the TypeScript-ESTree package, actually uses the compiler under the hood but then transforms it to the spec, which means no lock-in. You're not bound by changes in the compiler.

πŸƒπŸ½β€πŸ’¨ Running the parser

Now that the packages are in place, let's parse the input code.

import { parse, TSESTreeOptions } from "@typescript-eslint/typescript-estree";

const options: TSESTreeOptions = {
  comment: false,
  jsx: false
}

const program = parse(source, options)
// => Program({ body: [...], sourceType: "module", tokens: [...] })
Enter fullscreen mode Exit fullscreen mode

This gives us the same output as you see in AST Explorer, with at the root a Program and its body. We won't need the other fields, but tokens is interesting. It lists the parsed tokens from the source, as it was building the tree.

πŸ’Ž You can find this in parsers/AstParser.ts

πŸ”Ž Finding the main entrypoint

We're looking for a function called gigasecond. We know the following things:

  • It must be exported
  • Its name must be gigasecond

The input code declares a function, like in the optimal solution above, so the tree holds a FunctionDeclaration with an Identifier:

export function gigasecond(date) {
  // ...
}
Enter fullscreen mode Exit fullscreen mode
{
  type: "FunctionDeclaration",
  id: {
    type: "Identifier",
    name: "gigasecond"
  },
  generator: false,
  expression: false,
  async: false,
  params: [ ... ],
  // The body is a block { ... }
  body: {
    type: "BlockStatement",
    body: [ ... ]
  }
}
Enter fullscreen mode Exit fullscreen mode

This is something it can search for. The most common way to search in an AST is by walking that tree. You start at some node (usually the root / program) and visit each item.

We know that our parsed EStree is compatible with eslint and that eslint, just like prettier can recognise (and transform) code.

# Not a dev dependency!
yarn add eslint
Enter fullscreen mode Exit fullscreen mode
import { TSESTree } from "@typescript-eslint/typescript-estree"
import { traverse } from 'eslint/lib/util/traverser'

traverse(program, {
  enter(node: TSESTree.Node) {
    // ...
  }
})
Enter fullscreen mode Exit fullscreen mode

When writing this code, TypeScript will complain that there are no types for this library, which is unfortunately still true at moment of writing. However, you can copy this declarations.d.ts I wrote in order to get type completion.

The enter method will be called for each node in the program. Inside the enter block, your in the "TraverserContext" which exposes two methods:

  • this.skip(): skips the node from further traversal, meaning it will not visit any other keys (and therefore children) of the current node;
  • this.break(): completely stop traversing.

Finding the entrypoint is now straightforward.

import { TSESTree, AST_NODE_TYPES } from "@typescript-eslint/typescript-estree"

let entry: TSESTree.FunctionDeclaration | undefined = undefined

traverse(program, {
  enter(node: TSESTree.Node) {
    switch (node.type) {

      // function name() {}
      case AST_NODE_TYPES.FunctionDeclaration:
        if (node.id && node.id.name === 'gigasecond') {
          entry = node
          this.break()
        }
        break;
    }
  }
})

entry
// => FunctionDeclaration({
//      id: { type: "Identifier", name: "gigasecond" }, ...
//    })
Enter fullscreen mode Exit fullscreen mode

Unfortunately the walker above only finds FunctionDeclaration and fails on equivalent code usign an ArrowFunctionExpression or FunctionExpression. More on that later.

πŸ’Ž You can find this in analyzers/utils/extract_main_method.ts

πŸ”Ž Finding the top-level constant

The code finds the first of the two components. Now it also needs to find the second. A top-level const, but the name is not known.

const GIGASECOND_IN_MS = (10 ** 9) * 1000
Enter fullscreen mode Exit fullscreen mode
{
  type: "VariableDeclaration",
  declarations: [
    {
      type: "VariableDeclarator",
      id: {
        type: "Identifier",
        name: "GIGASECOND_IN_MS"
      },
      init: {
        type: "BinaryExpression",
        operator: "*",
        // Left is another BinaryExpression with **
        left: { ... },
        // Right is a Literal
        right: { ... }
      }
    }
  ],
  kind: "const"
}
Enter fullscreen mode Exit fullscreen mode

Nothing here is particularly helpful. Given the variations of data it needs to accept, I can't rely on init being a certain type. The name is also not fixed as it's not exported and therefore not tested.

However, there are a few constraints that will help here:

  • It must be a top-level constant
  • It can not be named gigasecond
  • In an optimal solution, there is really only one top-level constant that is not the entry,
type FoundConst = { kind: TSESTree.VariableDeclaration['kind'] }
  & TSESTree.VariableDeclarator

let bigNumber: FoundConst | undefined = undefined

traverse(program, {
  enter(node: TSESTree.Node) {
    switch (node.type) {

      // const NAME = ...
      case AST_NODE_TYPES.VariableDeclaration:
        const declaration = node.declarations.find(
          (declaration) => declaration.id && declaration.id.name !== 'gigasecond')
        )

        if (declaration) {
          bigNumber = { kind: node.kind, ...declaration }
          this.break()
        }

        break;

      default:
        // This doesn't declare a variable, so skip the node
        this.skip()
    }
  }
})
Enter fullscreen mode Exit fullscreen mode

Later, I can check bigNumber['kind'] and make sure it's const, or otherwise attach a message that const is preferred.

πŸ’Ž You can find this in analyzers/utils/find_top_level_constants.ts

The algorithm

Now that I found the entry point, I can figure out what the name of the argument is (date). Because I also know the top-level constant, I know what the constant name is GIGASECOND_IN_MS.

new Date(...)
Enter fullscreen mode Exit fullscreen mode

Nothing too fancy here. It's a new expression with an expression as the first argument.

{
  type: "NewExpression",
  callee: {
    type: "Identifier",
    name: "Date"
  },
  arguments: [ ... ]
}
Enter fullscreen mode Exit fullscreen mode
let newDate: TSESTree.NewExpression | undefined = undefined

traverse(program, {
  enter(node: TSESTree.Node) {
    switch (node.type) {

      // new Date(...)
      case AST_NODE_TYPES.NewExpression:
        if (
          node.callee.type === AST_NODE_TYPES.Identifier
          && node.callee.name === 'Date'
        ) {
          newDate = node;
          this.break()
        }
        break;

      default:
        // This doesn't declare a variable, so skip the node
        this.skip()
    }
  }
})
Enter fullscreen mode Exit fullscreen mode

πŸ’Ž You can find this in analyzers/utils/find_new_expression.ts

The inner expression is of type BinaryExpression. In EStree compatible output, and operator with two components (such as +, -, *) is a binary expression, whereas those with one component (such as ~ and !) are unary expressions.

date.getTime() + GIGASECOND_IN_MS
Enter fullscreen mode Exit fullscreen mode
{
  type: "BinaryExpression",
  operator: "+",
  left: {
    type: "CallExpression",
    callee: {
      type: "MemberExpression",
      object: {
        type: "Identifier",
        name: "date"
      },
      property: {
        type: "Identifier",
        name: "getTime"
      }
    },
    arguments: []
  },
  right: {
    type: "Identifier",
    name: "GIGASECOND_IN_MS"
  }
}
Enter fullscreen mode Exit fullscreen mode

Quite a few things we've already seen and also a few new types. Let's look at those.

Properties of objects

When the parser encounters a object property accessor (object.property), it is parsed as a MemberExpression. Depending on the way of writing, the property is either an Identifier or a Literal.

date.getTime
// ^       ^
// object  property
// |       |
// |       identifier (name = getTime)
// identifier (name = date)

date['getTime']
// ^       ^
// object  property
// |       |
// |       literal (value = getTime)
// identifier (name = date)
Enter fullscreen mode Exit fullscreen mode

"Executing" a property

If there are parentheses behind the MemberExpression, the entire expression is parsed as a child of a CallExpression. This is also the case for parentheses following an indentifier.

date.getTime ( )
// ---------| ^ |
// ^        | argument(s) of call expression
// member expression
//              |
// -------------|
// call expression

gigasecond(INPUT)
// ------|   ^   |
// ^     | argument of call expression
// identifier    |
//               |
// --------------|
// call expression
Enter fullscreen mode Exit fullscreen mode

Matching the identifiers

There are two identifiers provided by the source code that I need to find and match against:

  • the gigasecond first argument (used in arg.getTime())
  • the top-level constant (used in time + CONSTANT)

const argumentName = entry.id.name
// => "gigasecond"
const constantName = bigNumber.id.name
// => "GIGASECOND_IN_MS"

let optimalExpression: boolean = false

// NOTE: passing in the newDate as root, so this is a subtree traversal!
traverse(newDate, {
  enter(node: TSESTree.Node) {
    switch (node.type) {

      // new Date(x.z() + y)
      case AST_NODE_TYPES.BinaryExpression:
        this.break()

        if (node.operator !== '+') {
          optimalExpression = false;
          return;
        }


        // This allows the order to be reversed
        const leftType = node.left.type
        const constSide = leftType === AST_NODE_TYPES.Identifier
          ? node.left
          : node.right
        const expressionSide = leftType === AST_NODE_TYPES.CallExpression
          ? node.left
          : node.right

        if (constSide === expressionSide) {
          // throw new Error("not optimal! this is not x.z() + y")
          optimalExpression = false
          return
        }

        if (constSide.id.name !== constantName) {
          optimalExpression = false
          return
        }

        const { object, property } = expressionSide.callee
        optimalExpression =
          object.type === AST_NODE_TYPES.Identifier
          && object.name === argumentName
          && ((
            property.type === AST_NODE_TYPES.Identifier
            && property.name === 'getTime'
          ) || (
            property.type === AST_NODE_TYPES.Literal
            && property.value === 'getTime'
          ))

      break;
    }
  }
})
Enter fullscreen mode Exit fullscreen mode

πŸ’Ž You can find this in:

βœ… Automated Mentoring

When all these pieces are put together, it is the analyzer for gigasecond. There are a few more things to check:

  • Is the bigNumber.kind equal to "const"? If not, add a comment
  • Is the value of GIGASECOND_IN_MS using one of the comprehensions? If not, add a comment.
  • Is there only one argument to gigasecond? Make sure it's not a ...splat argument, and that it has no value = "default".
  • Is gigasecond actually exported? Is the export inline? if not, add a comment.

Since the first one was mentioned (kind equality check) and the second one is very similar to the inner expression of the new Date(...) call, I've left out how to implement these. You can check the gigasecond analyzer source code if you need some inspiration. The third one is testing the entry for parameters.

As for exports, these are handled by πŸ’Ž extract_export, but I'll show you the gist of it.

πŸ“¦ Testing exports

Exports in JavaScript and TypeScript basically come in three types. Those using a core language feature (read: use a keyword) are the easiest:

export const inline = {}
export default class InlineClass {}
export default defaultSpecifier
export { specifier }
export { specifier as renamed }
Enter fullscreen mode Exit fullscreen mode

The default export have their own token type ExportDefaultDeclaration.

{
  type: "ExportDefaultDeclaration",
  declaration: {
    // ...
  }
}
Enter fullscreen mode Exit fullscreen mode

Those without a default modifier are of type ExportNamedDeclaration.

{
  type: "ExportNamedDeclaration",
  declaration: {
    // ...
  }
}
Enter fullscreen mode Exit fullscreen mode

The declaration property is where it gets slightly tricky. The inline export statements, regardless if they're default or not, are followed by the same token types as if they did not have the export keyword similarly to how writing parentheses wraps an expression in a CallExpression.

Inline exports

This means that the first example is a VariableDeclaration with a single VariableDeclaractor: the id is an Identifier with name = "inline" and the init is an ObjectExpression. Similarly the second example is a ClassDeclaration with as id an Identifier with name = "InlineClass".

Specifier exports

The third has a declaration of type Identifier with name = "defaultSpecifier". This is similar to inline exports.

The fourth and fifth however, do not have a declaration property. Instead, they have a specifiers property with, in this case, one item:

{
  type: "ExportSpecifier",
  local: {
    type: "Identifier",
    name: "specifier"
  }
  exported: {
    type: "Identifier",
    name: "specifier" // or "renamed"
  }
}
Enter fullscreen mode Exit fullscreen mode

Use the local property to determine what is exported (what the internal name is) and the exported property how it's imported (what the exported name is).

CommonJS exports

Finally there are those exports that don't use a keyword but instead use the (as far as I'm concerned) defunct module.exports.

module.exports = singleExport
module.exports = { specifier }
module.exports.default = defaultExport
module.exports.renamed = specifier
Enter fullscreen mode Exit fullscreen mode

Since these don't use keywords, they're interpreted as ExpressionStatements, as they're AssignmentExpressions. Here is a quick overview table of the important properties and representations:

expression type prop value
module.exports.renamed = specifier AssignmentExpression
operator "="
Identifier right "specifier"
MemberExpression left πŸ”½ module.exports.renamed πŸ”½
module.exports.renamed MemberExpression
Identifier property "renamed"
MemberExpression object πŸ”½ module.exports πŸ”½
module.exports MemberExpression
Identifier property "exports"
Identifier object "module"

There is also the variant of using the object['accessor'], where accessor is not an Identifier but a Literal, but otherwise that is the same.

πŸ”€ Testing varations

As mentioned before, there are many ways to write functions in JavaScript and TypeScript. In the source code for the analyzer there is a utility method πŸ’Ž extract_main_method. It can detect the following variations:

function name() {}
// FunctionDeclaration

const name = () => {}
// ArrowFunctionExpression

const name = function() {}
// FunctionExpression

export default {
  name: () => {}
}
// ExportDefaultDeclaration + ObjectExpression + (Arrow)FunctionExpression
Enter fullscreen mode Exit fullscreen mode

And the TypeScript specific variants (but they work on both)

class Foo {
  name() {}
}
// MethodDefinition + FunctionExpression

class Foo {
  static name = () => {}
  static name = function name() {}
}
// ClassProperty + (Arrow)FunctionExpression
Enter fullscreen mode Exit fullscreen mode

Walking TypeScript Trees

As you've noticed, all we did so far is checking the JavaScript code and shown how that is parsed and walked. In order to adapt the solution so that TypeScript code can be parsed with it, there is one thing to change in the walker and a few extra properties to test for.

πŸ”‘ Visitor Keys

When the traverser is walking the tree, it decides which nodes to "walk πŸšΆπŸ½β€" based on a set of keys called visitor keys. Since TypeScript is a superset of JavaScript it has the same keys and then some.

import { visitorKeys } from "@typescript-eslint/parser/dist/visitor-keys"

traverse(root, {
  enter(node: Node) {
    // ...
  },

  visitorKeys
})
Enter fullscreen mode Exit fullscreen mode

If you look at the source of that file, you'll see that it actually imports the eslint visitor keys (in order to visit all the JavaScript keys) and adds the specific TypeScript keys.

πŸ“– Type annotations

These are interesting.

class Gigasecond {
  constructor(private readonly date: Date) { }
}
Enter fullscreen mode Exit fullscreen mode
{
  type: "ClassDeclaration",
  id: {
    type: "Identifier",
    name: "Gigasecond"
  },
  // everything inside the class { body }
  body: {
    type: "ClassBody",
    body: [
      {
        // a constructor is a regular method definition...
        type: "MethodDefinition",
        key: {
          type: "Identifier",
          name: "constructor"
        },
        value: {
          type: "FunctionExpression",
          params: [{ /*...*/ }],
          generator: false,
          expression: false,
          async: false,
          body: { /*...*/ }
        }
        computed: false,
        static: false,  // (typescript static keyword)
        // ... but with a special kind
        kind: 'constructor'
      }
    ]
  }
}
Enter fullscreen mode Exit fullscreen mode

The above has no special properties over the JavaScript equivalent, but that's because there are no type annotations in the source except for in the params of the constructor:

{
  type: "Identifier",
  name: "date",
  // : ...
  typeAnnotation: {
    type: "TSTypeAnnotation",
    typeAnnotation: {
      // Readonly
      type: "TSTypeReference",
      typeName: {
        type: "Identifier"
        name: "Readonly"
      },
      // <...>
      typeParameters: {
        type: "TSTypeParameterInstantiation",
        // Each type between the < brackets >
        params: [
          {
            type: "TSTypeReference",
            typeName: {
              type: "Identifier",
              name: "Date"
            }
          }
        ]
      }
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

A few key observations:

  • The type annotation has its own visitor key typeAnnotation,
  • All the TypeScript nodes start with TS,
  • A generic type is just a TSTypeReference with both a typeName and one or multiple typeParameters.
  • Stripping types is almost as easy as removing the typeAnnotation keys, which is almost exactly what babel's preset-typescript does.

Class properties

In TypeScript you can annotate class properties with keywords such as private and readonly. Additionally, they can have a type (which is Date in this example).

class Gigasecond {
  private readonly futureDate: Date
}
Enter fullscreen mode Exit fullscreen mode
{
  type: "ClassProperty",
  key: {
    type: "Identifier",
    name: "futureDate"
  },
  computed: false,
  static: false,            // static keyword
  readonly: true,           // readonly keyword
  accessibility: "private", // private, protected, public keywords
  typeAnnotation: {
    type: "TSTypeAnnotation",
    typeAnnotation: {
      type: "TSTypeReference",
      typeName: {
        type: "Identifier",
        name: "Date"
      }
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

The TypeScript keywords private and readonly modify the ClassProperty directly, but the type is again on typeAnnotation. If the type annotation is left out the source code (read: implicit any), the typeAnnotation key is not present on the AST.

↩ Return types

The final type we'll look at for now are function return types. Most other type annotations are just variations on this and the ones mentioned before.

class Gigasecond {
  date(): Date {
    // ...
  }
}
Enter fullscreen mode Exit fullscreen mode
{
  type: "MethodDefinition",
  key: {
    type: "Identifier",
    name: "date"
  },
  value: {
    type: "FunctionExpression",
    generator: false,
    expression: false,
    async: false,
    body: { /*...*/ },
    params: [],
    returnType: {
      type: "TSTypeAnnotation",
      typeAnnotation: {
        type: "TSTypeReference",
        typeName: {
          type: "Identifier",
          name: "Date"
        }
      }
    }
  },
  computed: false,
  static: false, // static keyword
  kind: "method"
}
Enter fullscreen mode Exit fullscreen mode

As you might have noticed, the typeAnnotation is not on the MethodDefinition. That is because a method definition on a class is actually binding a function expression (): Date { ... } to the identifier date.

On the FunctionExpression you can find the previously not encountered type annotation returnType. Its structure is the same as typeAnnotations for ClassProperty.

Conclusion

Interpreting code as an Abstract Syntax Tree and seeking out certain properties is a lot of tree walking; because there is a specification for the format of the output of certain AST Parsers, you can write tooling yourself.

The contents of this article, in a different format, is being used to automatically approve the gigasecond exercise, given that the student has provided an exact variation of the optimal solution. There is enough surface to hook into the findings of the analyzer to provide meaningfull commentary, should the student not have provided an optimal solution.

Photo of Erasmusbrug, Rotterdam, The Netherlands, displaying gray concrete bridge near buildings.

Reference

Analyzer reference

Exercism repositories

Packages

Top comments (5)

Collapse
 
jsjoeio profile image
Joe Previte (he/him)

I like that in your solution you say "Optimal is extracting the GIGASECOND_IN_MS value as a top-level constant".

I did not do this in my solution, but I can see how it makes your code more readable. I also like how you defined your variable name in all caps.

What's your personal rule of them for this? (i.e. when to use all caps)

Collapse
 
sleeplessbyte profile image
Derk-Jan Karrenbeld

Great questions!

I am actually writing the analyser code to approve_with_comment if someone has not extracted it. There are two variants we see that are approvable but not optimal.

// Variation one
export function gigasecond(...) {
 const gigasecond = 10 ** 12
 return ...
}

They already have it as a constant, but it's "re-created" each invocation. Extracting is might be a premature optimisation from a speed standpoint, but that's not it.

  1. The variable name is shadowing the function name, which leads to subtle bugs.

  2. Extracting the constant to the top-level allows for re-use (which is ok, not a necessity)

  3. Extracting the constant is declaring intent: this is a "compile"-time value, that is actually constant in value, and not just assignment, which is further solidified with making it UPPERCASE

In the above case, extracting it is a small gain, but mostly for declaring intent.

// Variation two
export function gigasecond(...) {
 return new Date(... + 10 ** 12)
}

There is nothing wrong with the solution above, but this is a magic number or rather magic expression. It's adding a number to date, but in a year's time, can you remember why it's 10 ** 12 and not 10 ** 9 (which would be giga)? Maintainability is easier if you name your constants.

What's your personal rule of them for this? (i.e. when to use all caps)

This is just a preference I inherited from other languages where this is enforced in the language (Ruby constants start with a Capital for example), but mostly I do this in JavaScript and TypeScript based on:

  • If the constant is constant per invocation, e.g. it is a computed constant that can change based on the state of the process or function (input is a state), then I use const camelCase.
  • If the constant is constant per process, e.g. it is the same value when "compiled" or when "first interpreted", it is probably a file-level (and thus process-level) constant. I use const UPPER_CASE

The added benefit of the approach above is:

  • It's instantly clear which VALUE is always the same and which functionValue depend on state.
  • Most code highlighters (such as prism, or the tsc in vscode etc.) will color the UPPER_CASE differently.

I hope these make sense!

Collapse
 
jsjoeio profile image
Joe Previte (he/him)

They already have it as a constant, but it's "re-created" each invocation.

Great point! Didn't even think about that when I first wrote my solution.

Also, thanks for the detailed explanation and sharing your thoughts on when to use uppercase for your consts. Seems like a solid rule I may adopt into my own practice!

Collapse
 
jsjoeio profile image
Joe Previte (he/him)

Really interesting read! When I heard exercism was going to automate solutions, I didn't realize this is what they meant. I thought there was an "easier" way. But it's really fascinating to read about AST parsing.

The way you broke everything down made it feel more approachable than I originally thought. Thanks for sharing this and putting in all the work for the JS/TS communities.

Collapse
 
sleeplessbyte profile image
Derk-Jan Karrenbeld

C# has a different approach for two-fer and Ruby is going to try it out for another solution where they don't do AST interpretation like I do, but do AST matching:

  • parse the AST for a lot of solutions
  • strip the Identifier names and rename them to be consistent e.g. a, b, c or input_arg_1, body_constant_1 (so that two solutions that are identical except for the name use are now actually identical). This is normalisation
  • match the incoming, normalised AST of the solution to the set of known solutions, and have a fixed output.
  • periodically collect everything that is not matched and add these to the analyser.

I think that is an easier approach, but not necessarily a better one. Especially in JavaScript where you have so many ways to write the same thing (this each way creating a new set of permutations).

The way you broke everything down made it feel more approachable than I originally thought.

Feel free to contribute! πŸ’˜