DEV Community

Jason Barr
Jason Barr

Posted on • Edited on

Create Your Own Programming Language 6: Functions

Welcome to the next installment in the Create Your Own Programming Language series! In this article we're going to add functions to the Wanda language.

As always, if you haven't read the previous article in the series on vectors and records go ahead and do that first.

Ok, let's go!

What We're Going To Do

The most important thing we're going to do today is add functions to our language. Functions are extremely important. In fact, when you have functions you can literally do anything with your language. You might not believe it, but literally everything in your programming language can be functions if you're dedicated enough.

That's one of my all-time favorite computer science talks and I strongly recommend you take the time to watch it (it's split into 2 parts in the playlist).

Adding functions is the most important thing we're doing today, but we'll also do a few other things to support the addition of functions to our language.

First, we'll improve our type annotations and how we parse them. We hadn't really thought through how type annotations were supposed to work, so we'll solidify that and implement a more uniform method of parsing them.

We'll also add desugaring for the first time.

As mentioned before, "desugaring" means converting more complex language forms into simpler core forms that make things easier for the compiler to handle.

To do that, we'll create a base Visitor from which the desugarer will inherit so we only have to implement the methods we need on the Desugarer class. We're going to make it so that function declarations, which use the keyword def, desugar to the var variable declaration form with a Lambda Expression as the value being assigned.

We're also going to make it so that all functions created in the language are curried. If you're not familiar with curried functions, a curried function is one that can take in one argument at a time. If you partially apply a curried function, i.e. give it fewer arguments than it needs to execute, you get back another function that only needs the remaining arguments before it will execute. That means we'll need to make some changes to how call expressions are checked in the type checker.

We're also going to allow functions to be variadic, which is just a fancy way of saying you can allow a function to take any number of arguments.

Finally, we'll add contracts for the functions in our core library. A contract is basically a type signature. It's a promise that this function will receive these types of arguments and produce that type of return value.

There are no changes to the lexer and reader for this article, so we'll start with the parser.

Changes To The Parser

We need to do 2 things with the parser: enable it to parse functions and fix how we handle type annotations. Let's do the latter first.

Fixing Type Annotations

Before now, it's been possible for type annotations to be 2 things: either a primitive or named (Symbol) value type (i.e. number) or what we could call a "generic" type (even though we don't have generics yet) where you have a main type and that type has one or more types it contains.

The two kinds of "generic" types we've used so far are list and vector types. We've been writing them like list (number).

This actually complicates parsing type annotations quite a bit, because we could have to parse 2 things from the list of forms or only 1 thing depending on which kind of annotation it is. I don't want to have that kind of complexity in parsing type annotations, so I'm going to change how we handle generic type annotations.

From now on, instead of writing list (number) we're going to use (list number). That way regardless of whether it's a named type or a generic type the parser only has to handle a single form to parse the type annotation.

When we get generic types in full, we'll continue the same scheme so, i.e., a Result type that can be either a string or an Exception will get the annotation (Result string Exception). If you're not familiar with Result types, don't worry about it for now; it's just an example of a generic type that has more than one type parameter.

We're also going to add type annotations for functions, which will be written like (argtype1 argtype2 ... -> returnType). Again, the type annotation parser only has to handle one form from the reader because the entire annotation will be inside a cons list.

For variadic functions with type annotations, we'll use the & operator and make the rest parameter a vector.

A variadic function annotation could look something like this:

(number &(vector number) -> number)
Enter fullscreen mode Exit fullscreen mode

This will greatly streamline parsing when it comes to handling type annotations, which will simplify things like parsing function parameters.

Parsing Type Annotations

Let's take a look at src/parser/parseTypeAnnotation.js. We need to check if the annotation is a Cons or a Symbol. If it's a Cons we'll flatten it into an array.

Next we need to see if it's a function or generic annotation. If it's a function annotation it will have an arrow. If it has an arrow, we'll handle it as a function annotation.

We'll filter the arrow out because once we know it's a function annotation the arrow has done its job. We'll parse the last item in the array as the return type and all the rest as parameter types. Then construct the annotation and it's ready to go.

We've also changed the way we handle a nil type annotation. I found during further testing that the reader passes nil back as a Nil token, not as a Symbol token.

Everything else stays the same as it was: if the annotation was generic we get the first item in the array as the container type then handle the 2nd item as the contained type. Otherwise we handle the annotation as a named type like number or a type alias.

export const parseTypeAnnotation = (annotation) => {
  if (annotation instanceof Cons) {
    // is function or generic annotation
    // flatten Cons to array
    annotation = [...annotation];

    // if it has an arrow, it's a function annotation
    const hasArrow = annotation.reduce((hasArrow, item) => {
      if (item.type === TokenTypes.Symbol && item.value === "->") {
        return true;
      }
      return hasArrow;
    }, false);

    if (hasArrow) {
      // is function annotation
      // filter out arrow
      annotation = annotation.filter((item) => item.value !== "->");

      // get return type
      const retType = parseTypeAnnotation(annotation.pop());
      // get param types and if it's variadic
      let params = [];
      let variadic = false;

      for (let item of annotation) {
        if (item.type === TokenTypes.Amp) {
          variadic = true;
          continue;
        } else {
          params.push(parseTypeAnnotation(item));
        }
      }

      return { kind: TATypes.Function, params, retType, variadic };
    }
  }

  let annot;
  if (Array.isArray(annotation)) {
    // is generic annotation
    // get container type
    annot = annotation[0];
  } else {
    // is simple annotation
    annot = annotation;
  }

  if (annot.type === "RecordLiteral") {
    return parseObjectAnnotation(annot);
  }

  if (annot.type === TokenTypes.Nil) {
    return { kind: TATypes.NilLiteral };
  }

  if (annot.type === TokenTypes.Symbol) {
    switch (annot.value) {
      case "any":
        return { kind: TATypes.AnyLiteral };
      case "number":
        return { kind: TATypes.NumberLiteral };
      case "string":
        return { kind: TATypes.StringLiteral };
      case "boolean":
        return { kind: TATypes.BooleanLiteral };
      case "keyword":
        return { kind: TATypes.KeywordLiteral };
      case "list":
        // annotation is array with listType as 2nd member
        return parseListAnnotation(annotation[1]);
      case "vector":
        return parseVectorAnnotation(annotation[1]);
      default:
        // must be a named type
        return { kind: TATypes.Symbol, name: annot.value };
    }
  }

  throw new Exception(
    `Unknown type annotation kind ${JSON.stringify(annot, null, 2)}`
  );
};
Enter fullscreen mode Exit fullscreen mode

Parsing Functions

First, let's add AST nodes for the new forms: FunctionDeclaration and LambdaExpression.

We'll add 3 members (including one for function parameters) to the ASTTypes enum in src/parser/ast.js:

export const ASTTypes = {
  // other nodes
  Param: "Param",
  FunctionDeclaration: "FunctionDeclaration",
  LambdaExpression: "LambdaExpression",
};
Enter fullscreen mode Exit fullscreen mode

And we'll add constructors for the nodes:

export const AST = {
  // other constructors...
  FunctionDeclaration(name, params, body, variadic, retType, srcloc) {
    return {
      kind: ASTTypes.FunctionDeclaration,
      name,
      params,
      body,
      variadic,
      retType,
      srcloc,
    };
  },

  LambdaExpression(params, body, variadic, retType, srcloc) {
    return {
      kind: ASTTypes.LambdaExpression,
      params,
      body,
      variadic,
      retType,
      srcloc,
    };
  },
};
Enter fullscreen mode Exit fullscreen mode

There are 2 new nodes, so 2 different function forms we need to parse: function declarations and lambda expressions. Function declarations start with the def symbol and lambdas start with fn.

Let's add those cases to the switch statement in parseList in src/parser/parse.js:

    // other cases...
    case "def":
      return parseFunctionDeclaration(form);
    case "fn":
      return parseLambdaExpression(form);
    // default case
Enter fullscreen mode Exit fullscreen mode

We just delegate to the parser functions for these nodes, which are very similar. The biggest difference is that parseFunctionDeclaration needs to handle the name of the function. We can delegate most of the work to a single parseFunction function. We'll also handle parsing parameters with a separate parseParams function since that will be a little bit of work.

Both function declarations and lambda expressions can have annotations both for parameter types and return types, so we'll need to handle both the cases where these are included and when they are omitted.

We'll use "maybe" variable names for the relevant parts of a function form.

Here are parseFunctionDeclaration and parseLambdaExpression:

const parseFunctionDeclaration = (form) => {
  const [_, name, params, maybeArrow, maybeRetType, ...maybeBody] = form;
  const srcloc = form.srcloc;
  const parsedName = parseExpr(name);
  const { parsedParams, parsedBody, variadic, retType } = parseFunction(
    params,
    maybeArrow,
    maybeRetType,
    maybeBody
  );

  return AST.FunctionDeclaration(
    parsedName,
    parsedParams,
    parsedBody,
    variadic,
    retType,
    srcloc
  );
};

const parseLambdaExpression = (form) => {
  const [_, params, maybeArrow, maybeRetType, ...maybeBody] = form;
  const srcloc = form.srcloc;
  const { parsedParams, parsedBody, variadic, retType } = parseFunction(
    params,
    maybeArrow,
    maybeRetType,
    maybeBody
  );

  return AST.LambdaExpression(
    parsedParams,
    parsedBody,
    variadic,
    retType,
    srcloc
  );
};
Enter fullscreen mode Exit fullscreen mode

Then parseFunction handles most of the real work:

const parseFunction = (params, maybeArrow, maybeRetType, maybeBody) => {
  let retType, body;

  if (maybeArrow.type === TokenTypes.Symbol && maybeArrow.value === "->") {
    // has return type annotation
    retType = parseTypeAnnotation(maybeRetType);
    body = maybeBody;
  } else {
    retType = null;
    body = maybeRetType
      ? [maybeArrow, maybeRetType, ...maybeBody]
      : [maybeArrow, ...maybeBody];
  }

  const variadic = [...params].reduce((isVar, param) => {
    if (param.type === TokenTypes.Amp) {
      return true;
    }

    return isVar;
  }, false);
  const parsedParams = parseParams(params);
  /** @type {AST[]} */
  const parsedBody = body.map(parseExpr);

  return {
    parsedParams,
    parsedBody,
    variadic,
    retType,
  };
};
Enter fullscreen mode Exit fullscreen mode

And parseParams uses a for loop over the parameter forms and handles the cases where they do and do not have type annotations:

const parseParams = (forms) => {
  forms = [...forms];
  /** @type {import("./ast.js").Param[]} */
  let params = [];
  for (let i = 0; i < forms.length; i++) {
    const form = forms[i];
    if (form.type === TokenTypes.Symbol) {
      const name = parseExpr(form);
      let typeAnnotation = null;

      if (
        forms[i + 1]?.type === TokenTypes.Keyword &&
        forms[i + 1].value === ":"
      ) {
        // has type annotation
        typeAnnotation = parseTypeAnnotation(forms[i + 2]);
        i += 2;
      }

      params.push({ kind: ASTTypes.Param, name, typeAnnotation });
    } else if (form.type === TokenTypes.Amp) {
      continue;
    }
  }

  return params;
};
Enter fullscreen mode Exit fullscreen mode

See how simple it is now that we only have to worry about a single form for a type annotation? This would be much more complicated to parse if we'd kept the old list (number) type of annotation.

We also need to make a slight edit to parseVariableDeclaration to handle the new type annotation paradigm:

const parseVariableDeclaration = (decl) => {
  let [_, lhv, expression] = decl;

  let parsedLhv,
    typeAnnotation = null;
  if (lhv instanceof Cons) {
    // has type annotation
    const realLhv = lhv.get(0);
    // skip over : and get type annotation
    typeAnnotation = lhv.cdr.cdr;

    if (typeAnnotation.cdr === null) {
      // is a simple annotation, otherwise it's a Cons type annotation
      typeAnnotation = typeAnnotation.car;
    }
    // parse type annotation
    typeAnnotation = parseTypeAnnotation(typeAnnotation);
    parsedLhv = parseExpr(realLhv);
  } else {
    parsedLhv = parseExpr(lhv);
  }

  if (parsedLhv.kind === ASTTypes.VectorLiteral) {
    parsedLhv = convertVectorLiteralToVectorPattern(parsedLhv);
  }

  const parsedExpression = parseExpr(expression);

  return AST.VariableDeclaration(
    parsedLhv,
    parsedExpression,
    decl.srcloc,
    typeAnnotation
  );
};
Enter fullscreen mode Exit fullscreen mode

Changes to The Type Checker

Adding functions to our language presents a particular problem for the type checker: recursive functions won't have a type inferred for the return type. Even with type annotations, the function's name won't be set yet in the type environment because its body hasn't finished being processed yet.

That also means you can't define mutually recursive functions. Trying to call the 2nd function in the body of the 1st function will throw an error because the type of the 2nd function hasn't been defined yet.

To handle this issue, we're going to implement 2 passes in the type checker.

You'll see that when we get to the changes to the TypeChecker class, but first let's implement the machinery we need to check function types.

As a nod towards handling the issue with the function name not being set in the environment, we're going to add a special internal type to the type checker called undefined.

You'll remember that we actually already have function types because we needed them to implement call expressions a few articles back.

Add the undefined case to the TypeTypes enum in src/typechecker/types.js:

export const TypeTypes = {
  // other members
  Undefined: "Undefined",
};
Enter fullscreen mode Exit fullscreen mode

Also add a constructor in src/typechecker/constructors.js:

export const undefinedType = { kind: TypeTypes.Undefined };
Enter fullscreen mode Exit fullscreen mode

And a validator in src/typechecker/validators.js:

export const isUndefined = (type) => {
  return type.kind === TypeTypes.Undefined;
};
Enter fullscreen mode Exit fullscreen mode

A type should only ever be undefined during the first pass of the type checker, and we need undefined to always pass a subtype check so we'll treat it similarly to any in src/typechecker/isSubtype.js:

  // other cases
  if (Type.isUndefined(type1) || Type.isUndefined(type2)) return true;
  // rest of function
Enter fullscreen mode Exit fullscreen mode

We'll also need to handle subtyping functions. A function is a subtype of another function if all the parameters of one are subtypes of all the parameters of the other and the return type of the first is a subtype of the return type of the second.

  // undefined case
  if (Type.isFunctionType(type1) && Type.isFunctionType(type2)) {
    return (
      type1.params.length === type2.params.length &&
      type1.params.every((a, i) => isSubtype(type2.params[i], a)) &&
      isSubtype(type1.ret, type2.ret)
    );
  }
  // rest of function
Enter fullscreen mode Exit fullscreen mode

We also need to be able to parse a function type from a type annotation in fromTypeAnnotation, in src/typechecker/fromTypeAnnotation.js:

    // other cases
    case TATypes.Function: {
      const paramTypes = typeAnnotation.params.map((p) =>
        fromTypeAnnotation(p, typeEnv)
      );
      const retType = fromTypeAnnotation(typeAnnotation.retType, typeEnv);
      return Type.functionType(paramTypes, retType, typeAnnotation.variadic);
    }
    // default case
Enter fullscreen mode Exit fullscreen mode

We need to be able to infer types for functions. We can use the same inferFunction function to handle both function declarations and lambda expressions, since the type is what matters here and not the function name.

In src/typechecker/infer.js we add cases for LambdaExpression and FunctionDeclaration to the switch statement in the infer function:

    // other cases
    case ASTTypes.LambdaExpression:
      return inferFunction(ast, env);
    case ASTTypes.FunctionDeclaration:
      return inferFunction(ast, env);
    // default case
Enter fullscreen mode Exit fullscreen mode

Note that in inferFunction the environment that will be passed into infer will be the extended function environment, not the global environment.

We map over the function node's parameters. If the parameters have type annotations, we turn checking on.

If the parameters have type annotations we construct the type from the annotation; otherwise we set them to any.

We set each parameter name with its type in the environment and return the type from the map callback to get an array of types for the parameters.

If the node has an annotated return type we make sure type checking is on.

If there's a return type annotation we construct that type, otherwise we assign any. Then we loop over the function body and infer the return type from the function's body using the extended function environment.

If the inferred body type is a subtype of the annotated return type, everything is good and we construct and return the function type.

If the function wasn't annotated (in which case its return annotation type is any) or has been inferred as undefined, we use the inferred return type. Otherwise, if there was a function annotation, we use the return annotation type.

Here's inferFunction:

const inferFunction = (node, env) => {
  const params = node.params.map((p) => {
    if (p.typeAnnotation) {
      env.checkingOn = true;
    }
    const type = p.typeAnnotation
      ? fromTypeAnnotation(p.typeAnnotation, env)
      : Type.any;
    env.set(p.name.name, type);
    return type;
  });

  if (node.retType) {
    env.checkingOn = true;
  }

  const retType = node.retType
    ? fromTypeAnnotation(node.retType, env)
    : Type.any;
  let inferredRetType;

  for (let expr of node.body) {
    // type of last expression "wins"
    inferredRetType = infer(expr, env);
  }

  if (env.checkingOn && !isSubtype(inferredRetType, retType)) {
    throw new TypeException(
      `Inferred return type ${Type.toString(
        inferredRetType
      )} is not a subtype of annotated return type ${Type.toString(retType)}`,
      node.srcloc
    );
  }

  return Type.functionType(
    params,
    Type.isAny(retType) || Type.isUndefined(retType)
      ? inferredRetType
      : retType,
    node.variadic
  );
};
Enter fullscreen mode Exit fullscreen mode

We also need to make some changes to inferCallExpression.

Since we now will have curried functions that can be partially applied, we don't need to throw an error if the call expression has fewer arguments than the type of the function. Instead, we'll construct a new function type using the remaining parameter types and the function's return type and return that.

Let's also do some refactoring and delegate the actual argument checking to a function called checkArgTypes.

Here's the new version of inferCallExpression with checkArgTypes:

const inferCallExpression = (node, env) => {
  const func = infer(node.func, env);

  if (Type.isAny(func)) {
    return Type.any;
  } else if (Type.isUndefined(func) || Type.isUndefined(func.ret)) {
    // this should only happen during first typechecker pass
    return Type.undefinedType;
  }

  if (!func.variadic && node.args.length > func.params.length) {
    throw new TypeException(
      `Too many arguments for function: ${node.args.length} given; ${func.params.length} expected`,
      node.srcloc
    );
  }

  // handle partially applied functions
  if (
    node.args.length <
    (func.variadic ? func.params.length - 1 : func.params.length)
  ) {
    // is partially applied
    const params = func.params.slice(0, node.args.length);

    if (env.checkingOn) {
      checkArgTypes(node, params, env, func);
    }

    const newParams = func.params.slice(node.args.length);
    return Type.functionType(newParams, func.ret, func.variadic);
  }

  if (env.checkingOn) {
    checkArgTypes(node, func.params, env, func);
  }

  return func.ret;
};

const checkArgTypes = (node, params, env, func) => {
  node.args.forEach((arg, i) => {
    const argType = infer(arg, env);

    if (func.variadic && i >= params.length - 1) {
      // is part of rest args
      let p = params[params.length - 1];
      if (!p.vectorType) {
        throw new TypeException(
          `Rest parameter type must be vector; ${Type.toString(p)} given`,
          arg.srcloc
        );
      }
      p = p.vectorType;
      if (!isSubtype(argType, p)) {
        throw new TypeException(
          `${Type.toString(argType)} is not a subtype of ${Type.toString(p)}`,
          arg.srcloc
        );
      }
    } else {
      const p = params[i];
      if (!isSubtype(argType, p)) {
        throw new TypeException(
          `${Type.toString(argType)} is not a subtype of ${Type.toString(p)}`,
          arg.srcloc
        );
      }
    }
  });
};
Enter fullscreen mode Exit fullscreen mode

Note that checkArgTypes has to handle the case of a variadic function where the call expression can have more arguments than the number of parameters in the function type.

In src/typechecker/check.js we need a case for function types. Add this check to check above const inferredType = infer(ast, env);:

  if (
    (ast.kind === ASTTypes.FunctionDeclaration ||
      ast.kind === ASTTypes.LambdaExpression) &&
    Type.isFunctionType(type)
  ) {
    return checkFunction(ast, type, env);
  }
Enter fullscreen mode Exit fullscreen mode

checkFunction checks the type of each parameter in the function type against each parameter in the AST node, then checks the return types. Here's checkFunction:

const checkFunction = (ast, type, env) => {
  if (type.params.length !== ast.params.length) {
    throw new TypeException(
      `Expected ${type.params.length} args; got ${ast.params.length}`,
      ast.srcloc
    );
  }

  const maybeType = env.get(ast.name?.name);
  const funcType = maybeType ? maybeType : infer(ast, env);

  type.params.forEach((p, i) => {
    const pType = funcType.params[i];
    if (!isSubtype(pType, p)) {
      throw new TypeException(
        `${Type.toString(pType)} is not a valid subtype of ${Type.toString(p)}`,
        ast.srcloc
      );
    }
  });

  if (!isSubtype(funcType.ret, type.ret)) {
    throw new TypeException(
      `${Type.toString(funcType.ret)} is not a valid subtype of ${Type.toString(
        type.ret
      )}`,
      ast.srcloc
    );
  }
};
Enter fullscreen mode Exit fullscreen mode

We already have a string conversion for function types, but we also need one for undefined.

In the switch statement in src/typechecker/typeToString.js:

    // other cases
    case TypeTypes.Undefined:
      return "undefined";
    // default case
Enter fullscreen mode Exit fullscreen mode

The biggest changes to the type checker will be in the TypeChecker class as we implement a two-pass typechecker.

The first pass is just to populate the environment, including getting all function names added to the environment (even if the function types return undefined).

Then the 2nd pass actually infers and checks all types now that the environment has been set up.

First, at the top of the src/typechecker/TypeChecker.js file (above the TypeChecker class declaration), add this:

let isSecondPass = false;
Enter fullscreen mode Exit fullscreen mode

Next, rename the check method on the TypeChecker class to checkNode.

If you're using VS Code you can do this easily by highlighting the name of the check method and pressing F2. Other IDEs should have similar refactoring capabilities, so you shouldn't have to search and replace the text.

Now add a new check method that implements the 2-pass checking:

  check(env = this.env) {
    // first pass is to populate environments so valid forward references will resolve
    const firstPassProgram = this.checkNode(this.program, env);
    isSecondPass = true;
    return this.checkNode(firstPassProgram, env);
  }
Enter fullscreen mode Exit fullscreen mode

To keep track of all the environments, we're actually going to add the extended environments to the nodes they're there for. That means with DoExpression, FunctionDeclaration, and LambdaExpression nodes we attach the extended environment to the node itself.

Here's the edited version of checkDoExpression:

  checkDoExpression(node, env) {
    if (!node.env) {
      // during the first pass this will create the child env
      node.env = env.extend("DoExpression");
    }

    const doEnv = node.env;

    /** @type {TypedAST[]} */
    let body = [];
    for (let expr of node.body) {
      const node = this.checkNode(expr, doEnv);
      body.push(node);
    }

    return {
      kind: node.kind,
      body,
      srcloc: node.srcloc,
      type: infer(node, doEnv),
    };
  }
Enter fullscreen mode Exit fullscreen mode

Next let's add the checkFunctionDeclaration method. Similarly to checkDoExpression, we check the node for the existence of an env property and, if we don't find it, we add it.

Next we infer the type of the function using the new function environment.

We have a nested environment, so if checking has been turned on in the nested environment we need to be sure to propagate that change to the parent environment. Also, if checking is on, we need to check the node.

Finally, we set the function type to its name in the parent environment and return the type-annotated AST node.

  checkFunctionDeclaration(node, env) {
    if (!node.env) {
      node.env = env.extend(node.name.name);
    }

    const funcEnv = node.env;
    const type = infer(node, funcEnv);

    if (funcEnv.checkingOn && isSecondPass) {
      env.checkingOn = funcEnv.checkingOn;
      check(node, type, funcEnv);
    }

    env.set(node.name.name, type);
    return { ...node, type };
  }
Enter fullscreen mode Exit fullscreen mode

checkLambdaExpression is almost exactly the same except without setting the function type to its name in the environment:

  checkLambdaExpression(node, env) {
    if (!node.env) {
      node.env = env.extend("Lambda");
    }

    const funcEnv = node.env;
    const type = infer(node, funcEnv);

    if (funcEnv.checkingOn) {
      env.checkingOn = funcEnv.checkingOn;
      check(node, type, funcEnv);
    }

    return { ...node, type };
  }
Enter fullscreen mode Exit fullscreen mode

We also need to make sure in checkCallExpression that we don't return undefined as the type from a function on the 2nd checker pass:

  checkCallExpression(node, env) {
    // infer handles checking for argument types
    let type = infer(node, env);

    if (Type.isUndefined(type) && isSecondPass) {
      type = Type.any;
    }

    return { ...node, type };
  }
Enter fullscreen mode Exit fullscreen mode

That shouldn't ever happen, but just in case it does we'll handle it there.

Finally, we need to handle the new two-pass checking in checkSymbol:

  checkSymbol(node, env) {
    try {
      let type = infer(node, env);

      if (Type.isUndefined(type)) {
        // this should never happen on the 2nd pass
        type = Type.any;
        env.set(node.name, type);
      }

      return { ...node, type };
    } catch (e) {
      if (!isSecondPass) {
        env.set(node.name, Type.undefinedType);
      }
    }
  }
Enter fullscreen mode Exit fullscreen mode

Adding a Generic Visitor

You know by now that for compiler passes over the AST we've been using the Visitor Pattern to execute a postorder traversal over all the nodes of the syntax tree.

We haven't been able to do this until now, because we've needed compile-time environments in both the traversal passes we've implemented thus far, but now we can actually implement a generic Visitor with a method for each node that we use to transform and return new versions of the syntax tree.

We're going to use it for desugaring the typechecked AST into some core forms used by the emitter, which will allow us to simplify the emitter and not have to implement emitter methods for every kind of AST node.

The generic Visitor simply takes in a node with each method, visits any child nodes, and then returns a node exactly like the original.

Then in the Desugarer class we only have to implement the methods we need.

Here's the Visitor class in src/visitor/Visitor.js:

import { AST, ASTTypes } from "../parser/ast.js";
import { SyntaxException } from "../shared/exceptions.js";

export class Visitor {
  constructor(program) {
    this.program = program;
  }

  static new(program) {
    return new Visitor(program);
  }

  visit(node = this.program) {
    switch (node.kind) {
      case ASTTypes.Program:
        return this.visitProgram(node);
      case ASTTypes.NumberLiteral:
        return this.visitNumberLiteral(node);
      case ASTTypes.StringLiteral:
        return this.visitStringLiteral(node);
      case ASTTypes.BooleanLiteral:
        return this.visitBooleanLiteral(node);
      case ASTTypes.KeywordLiteral:
        return this.visitKeywordLiteral(node);
      case ASTTypes.NilLiteral:
        return this.visitNilLiteral(node);
      case ASTTypes.Symbol:
        return this.visitSymbol(node);
      case ASTTypes.CallExpression:
        return this.visitCallExpression(node);
      case ASTTypes.VariableDeclaration:
        return this.visitVariableDeclaration(node);
      case ASTTypes.SetExpression:
        return this.visitSetExpression(node);
      case ASTTypes.DoExpression:
        return this.visitDoExpression(node);
      case ASTTypes.TypeAlias:
        return this.visitTypeAlias(node);
      case ASTTypes.VectorLiteral:
        return this.visitVectorLiteral(node);
      case ASTTypes.VectorPattern:
        return this.visitVectorPattern(node);
      case ASTTypes.RecordLiteral:
        return this.visitRecordLiteral(node);
      case ASTTypes.RecordPattern:
        return this.visitRecordPattern(node);
      case ASTTypes.MemberExpression:
        return this.visitMemberExpression(node);
      case ASTTypes.Param:
        return this.visitParam(node);
      case ASTTypes.FunctionDeclaration:
        return this.visitFunctionDeclaration(node);
      case ASTTypes.LambdaExpression:
        return this.visitLambdaExpression(node);
      default:
        throw new SyntaxException(node.kind, node.srcloc);
    }
  }

  visitBooleanLiteral(node) {
    return node;
  }

  visitCallExpression(node) {
    const func = this.visit(node.func);
    const args = node.args.map(this.visit.bind(this));
    return { ...node, func, args };
  }

  visitDoExpression(node) {
    let body = [];

    for (let n of node.body) {
      body.push(this.visit(n));
    }

    return { ...node, body };
  }

  visitFunctionDeclaration(node) {
    const params = node.params.map(this.visit.bind(this));
    let body = [];

    for (let expr of node.body) {
      body.push(this.visit(expr));
    }

    return { ...node, params, body };
  }

  visitKeywordLiteral(node) {
    return node;
  }

  visitLambdaExpression(node) {
    const params = node.params.map(this.visit.bind(this));
    let body = [];

    for (let expr of node.body) {
      body.push(this.visit(expr));
    }

    return { ...node, params, body };
  }

  visitMemberExpression(node) {
    const object = this.visit(node.object);
    const property = this.visit(node.property);
    return { ...node, object, property };
  }

  visitNilLiteral(node) {
    return node;
  }

  visitNumberLiteral(node) {
    return node;
  }

  visitParam(node) {
    return node;
  }

  visitProgram(node) {
    let body = [];
    for (let n of node.body) {
      body.push(this.visit(n));
    }

    return { ...node, body };
  }

  visitRecordLiteral(node) {
    return node;
  }

  visitRecordPattern(node) {
    return node;
  }

  visitSetExpression(node) {
    const lhv = this.visit(node.lhv);
    const expression = this.visit(node.expression);
    return { ...node, lhv, expression };
  }

  visitStringLiteral(node) {
    return node;
  }

  visitSymbol(node) {
    return node;
  }

  visitTypeAlias(node) {
    return node;
  }

  visitVariableDeclaration(node) {
    const lhv = this.visit(node.lhv);
    const expression = this.visit(node.expression);
    return { ...node, lhv, expression };
  }

  visitVectorLiteral(node) {
    return node;
  }

  visitVectorPattern(node) {
    return node;
  }
}
Enter fullscreen mode Exit fullscreen mode

Note that some methods map over arrays of nodes, which means they need to use the bind method on the method they pass into map.

Now it's time to desugar FunctionDeclaration nodes into VariableDeclaration nodes with LambdaExpression values.

We'll create a new file src/desugarer/Desugarer.js and stub out the Desugarer class:

import { AST } from "../parser/ast.js";
import { TATypes } from "../parser/parseTypeAnnotation.js";
import { Visitor } from "../visitor/Visitor.js";

export class Desugarer extends Visitor {
  constructor(program) {
    super(program);
  }

  static new(program) {
    return new Desugarer(program);
  }
}
Enter fullscreen mode Exit fullscreen mode

Since the base class handles all other nodes, the only method we need to implement is visitFunctionDeclaration. Remember that this is after type checking, so the AST we're getting in this case is already annotated with type information.

The emitter expects to have that type information, so we need to make sure our visitFunctionDeclaration node handles it.

We construct a LambdaExpression node from the FunctionDeclaration node, use the node's name property to create the variable name, and handle type information accordingly.

  visitFunctionDeclaration(node) {
    const variadic = node.variadic;
    // since it's TypedAST it has a type property
    const type = node.type;
    const lambda = AST.LambdaExpression(
      node.params,
      node.body,
      variadic,
      node.retType,
      node.srcloc
    );
    lambda.type = type;
    const paramTypes = node.params.map(
      (p) => p.typeAnnotation ?? { kind: TATypes.AnyLiteral }
    );
    const retType = node.retType ?? { kind: TATypes.AnyLiteral };
    const funcType = {
      kind: TATypes.Function,
      params: paramTypes,
      retType,
      variadic,
    };

    const varDecl = AST.VariableDeclaration(
      node.name,
      lambda,
      node.srcloc,
      funcType
    );

    varDecl.type = type;
    return varDecl;
  }
Enter fullscreen mode Exit fullscreen mode

I bet that's way simpler than you thought it would be at first. I was amazed myself at how straightforward it is the first time I did this.

Now we need to use the Desugarer class. In src/desugarer/desugar.js we rewrite the desugar function:

import { Desugarer } from "./Desugarer.js";

export const desugar = (ast) => Desugarer.new().visit(ast);
Enter fullscreen mode Exit fullscreen mode

Changes to The Runtime

First, we're going to use a package to handle currying functions. Ramda is a good choice, and it also has a lot of other goodies if we decide we want to use them.

npm install ramda
Enter fullscreen mode Exit fullscreen mode

We also need a function to parse contracts so we can implement them on library functions defined in JavaScript.

Let's add the parseContract function to the new file src/runtime/parseContract.js:

import { tokenize } from "../lexer/tokenize.js";
import { read } from "../reader/read.js";
import { parseTypeAnnotation } from "../parser/parseTypeAnnotation.js";
import { fromTypeAnnotation } from "../typechecker/fromTypeAnnotation.js";

export const parseContract = (code) =>
  fromTypeAnnotation(parseTypeAnnotation(read(tokenize(code)).car));
Enter fullscreen mode Exit fullscreen mode

Since we're only ever going to read in one contract at a time, we can take the car property off the output from read and use it as the input to parseTypeAnnotation.

Now we need to rewrite makeFunction in src/runtime/makeFunction.js to handle currying everything, as well as creating contracts for library functions.

We'll pass an optional 2nd parameter into makeFunction that's an options object.

Right now the only option is contract, but that could change in the future.

We'll also add a hash of the function object as a name property since the compiler loses the name.

Here's the new version of makeFunction:

import objectHash from "object-hash";
import { curryN } from "ramda";
import { makeWandaValue } from "./conversion.js";
import { addMetaField } from "./object.js";
import { parseContract } from "./parseContract.js";

export const makeFunction = (func, { contract = "" } = {}) => {
  let fn = curryN(func.length, (...args) => {
    const val = makeWandaValue(func(...args));

    if (typeof val === "function") {
      return makeFunction(val);
    }

    return val;
  });
  const hash = objectHash(fn);

  addMetaField(fn, "wanda", true);
  addMetaField(fn, "arity", func.length);
  addMetaField(fn, "name", hash);

  if (contract !== "") {
    Object.defineProperty(fn, "contract", {
      enumerable: false,
      configurable: false,
      writable: false,
      value: parseContract(contract),
    });
  }

  Object.defineProperty(fn, "name", {
    enumerable: false,
    configurable: false,
    writable: false,
    value: hash,
  });

  return fn;
};
Enter fullscreen mode Exit fullscreen mode

Now we can add contracts to functions from the core library, which we'll do in a minute. First, though, I want to handle emitting code for functions.

Since we're desugaring away FunctionDeclaration nodes, we only need to handle the case for LambdaExpression in the emitter.

Add it to the switch statement in the emit method in src/emitter/Emitter.js:

      // other cases
      case ASTTypes.LambdaExpression:
        return this.emitLambdaExpression(node, ns);
      // default case
Enter fullscreen mode Exit fullscreen mode

We're going to emit the parameters into an array of strings so we can easily join them with ", " in the final output.

We need to handle the case of a variadic function by outputting ... before the rest parameter.

For the function body we loop over the body and emit code for each node.

Finally, we wrap the whole thing in rt.makeFunction.

Here's emitLambdaExpression in its entirety:

  emitLambdaExpression(node, ns) {
    const funcNs = ns.extend("Lambda");
    /** @type {string[]} */
    let params = [];
    let i = 0;

    for (let p of node.params) {
      funcNs.set(p.name.name, makeSymbol(p.name.name));

      if (node.variadic && i === node.params.length - 1) {
        params.push(`...${this.emit(p.name, funcNs)}`);
      } else {
        params.push(this.emit(p.name, funcNs));
      }
      i++;
    }

    let code = `rt.makeFunction((${params.join(", ")}) => {\n`;

    let j = 0;
    for (let expr of node.body) {
      if (j === node.body.length - 1) {
        code += `return ${this.emit(expr, funcNs)};`;
      } else {
        code += this.emit(expr, funcNs) + ";\n";
      }
      j++;
    }

    code += "\n})";

    return code;
  }
Enter fullscreen mode Exit fullscreen mode

That's it for changes to the emitter!

Changes to The Printer

We need to add a case for functions to the switch statement in the printString function in src/printer/printString.js:

    // other cases...
    case "function":
      return `Function: ${value.name || "lambda"}`;
    // case "object"...
Enter fullscreen mode Exit fullscreen mode

We need to add cases for both nodes to the print method in src/printer/printAST.js:

      // other cases...
      case ASTTypes.FunctionDeclaration:
        return this.printFunctionDeclaration(node, indent);
      case ASTTypes.LambdaExpression:
        return this.printLambdaExpression(node, indent);
      // default case
Enter fullscreen mode Exit fullscreen mode

And a printFunctionDeclaration method:

  printFunctionDeclaration(node, indent) {
    let prStr = `${prIndent(indent)}FunctionDeclaration:\n`;
    prStr += `${prIndent(indent + 2)}Name: ${node.name.name}\n`;
    prStr += `${prIndent(indent + 2)}Params:\n`;

    for (let param of node.params) {
      prStr += this.print(param.name, indent + 4) + "\n";
    }

    prStr += `${prIndent(indent + 2)}Body:\n`;

    for (let expr of node.body) {
      prStr += this.print(expr, indent + 4) + "\n";
    }

    return prStr;
  }
Enter fullscreen mode Exit fullscreen mode

And a printLambdaExpression method:

  printLambdaExpression(node, indent) {
    let prStr = `${prIndent(indent)}LambdaExpression:\n`;
    prStr += `${prIndent(indent + 2)}Params:\n`;

    for (let param of node.params) {
      prStr += this.print(param.name, indent + 4) + "\n";
    }

    prStr += `${prIndent(indent + 2)}Body:\n`;

    for (let expr of node.body) {
      prStr += this.print(expr, indent + 4) + "\n";
    }

    return prStr;
  }
Enter fullscreen mode Exit fullscreen mode

Changes to The Core Library

You'll see with all these function contracts that there are still a lot of any annotations.

That will start to change as we add new and better types to our type system, but for now everything is pretty basic. We have to use any in a lot of cases because right now there isn't a better way to do it.

Adding generics will go a long way towards improving this, as will union and intersection types.

Also note that I've added 4 higher-order functions that work on lists to the core library. Eventually I'd like them to work on any iterable type, like vectors, but it will take some time to be able to make that work with the type system.

I'm just going to include the whole core library here because as you'll notice I've made some small edits to a few functions here and there.

If you read closely, you'll note that I've used a from method on the Cons class that you've not seen before. Here's the definition for that in src/shared/cons.js:

  static from(iter) {
    return Cons.of(...iter);
  }
Enter fullscreen mode Exit fullscreen mode

Now for the core library in lib/js/core.js:

import equal from "fast-deep-equal/es6/index.js";
import { makeModule } from "../../src/runtime/Module.js";
import { Cons, cons } from "../../src/shared/cons.js";
import { print } from "../../src/printer/print.js";
import { println } from "../../src/printer/println.js";
import { printString } from "../../src/printer/printString.js";
import { isTruthy } from "../../src/runtime/utils.js";
import { Exception } from "../../src/shared/exceptions.js";

export const theModule = makeModule("Core", (rt, ns) => {
  const isList = (obj) => {
    if (!rt.isNil(obj) && !(obj instanceof Cons)) {
      return false;
    } else if (rt.isNil(obj)) {
      return true;
    }

    // only option left is cons
    return isList(obj.cdr);
  };

  return {
    print: rt.makeFunction(print, { contract: "(any -> string)" }),
    println: rt.makeFunction(println, { contract: "(any -> string)" }),
    cons: rt.makeFunction(cons, { contract: "(any, any -> (list any))" }),
    car: rt.makeFunction((pair) => pair.car, {
      contract: "((list any) -> any)",
    }),
    cdr: rt.makeFunction((pair) => pair.cdr, {
      contract: "((list any) -> any)",
    }),
    string: rt.makeFunction(printString, { contract: "(any -> string)" }),
    number: rt.makeFunction(Number, { contract: "(string -> number)" }),
    boolean: rt.makeFunction((val) => isTruthy(val), {
      contract: "(any -> boolean)",
    }),
    keyword: rt.makeFunction((str) => Symbol.for(":" + str), {
      contract: "(string -> keyword)",
    }),
    "+": rt.makeFunction(
      (a, b, ...nums) => nums.reduce((sum, n) => sum + n, a + b),
      { contract: "(number, number, &(vector number) -> number)" }
    ),
    "-": rt.makeFunction(
      (a, b, ...nums) => nums.reduce((diff, n) => diff - n, a - b),
      { contract: "(number, number, &(vector number) -> number)" }
    ),
    "*": rt.makeFunction(
      (a, b, ...nums) => nums.reduce((prod, n) => prod * n, a * b),
      { contract: "(number, number, &(vector number) -> number)" }
    ),
    "/": rt.makeFunction(
      (a, b, ...nums) => nums.reduce((quot, n) => quot / n, a / b),
      { contract: "(number, number, &(vector number) -> number)" }
    ),
    "%": rt.makeFunction(
      (a, b, ...nums) => nums.reduce((quot, n) => quot % n, a % b),
      { contract: "(number, number, &(vector number) -> number)" }
    ),
    "=": rt.makeFunction((a, b) => a === b, {
      contract: "(number, number -> boolean)",
    }),
    ">": rt.makeFunction((a, b) => a > b, {
      contract: "(number, number -> boolean)",
    }),
    ">=": rt.makeFunction((a, b) => a >= b, {
      contract: "(number, number -> boolean)",
    }),
    "<": rt.makeFunction((a, b) => a < b, {
      contract: "(number, number -> boolean)",
    }),
    "<=": rt.makeFunction((a, b) => a <= b, {
      contract: "(number, number -> boolean)",
    }),
    not: rt.makeFunction((x) => !x, { contract: "(any -> boolean)" }),
    list: rt.makeFunction((...args) => Cons.from(args), {
      contract: "(&(vector any) -> (list any))",
    }),
    length: rt.makeFunction(
      (obj) => {
        if (obj instanceof Cons) {
          let i = 0;
          for (let _ of obj) {
            i++;
          }
          return i;
        }

        return obj.length;
      },
      { contract: "((list any) -> number)" }
    ),
    get: rt.makeFunction(
      (n, obj) => {
        const value = obj.get(n);

        if (value === undefined) {
          throw new Exception(`Value for index ${n} not found on object`);
        }

        return value;
      },
      { contract: "((list any) -> any)" }
    ),
    "list?": rt.makeFunction(isList, { contract: "((list any) -> boolean)" }),
    "pair?": rt.makeFunction((obj) => obj instanceof Cons, {
      contract: "((list any) -> boolean)",
    }),
    "number?": rt.makeFunction((obj) => typeof obj === "number", {
      contract: "(any -> boolean)",
    }),
    "string?": rt.makeFunction((obj) => typeof obj === "string", {
      contract: "(any -> boolean)",
    }),
    "boolean?": rt.makeFunction((obj) => typeof obj === "boolean", {
      contract: "(any -> boolean)",
    }),
    "nil?": rt.makeFunction((obj) => obj == null, {
      contract: "(any -> boolean)",
    }),
    "keyword?": rt.makeFunction(
      (obj) => typeof obj === "symbol" && obj.description.startsWith(":"),
      { contract: "(any -> boolean)" }
    ),
    "equal?": rt.makeFunction(
      (a, b) => {
        if (rt.hasDict(a) && rt.hasDict(b)) {
          return equal(rt.getMetaField(a, "dict"), rt.getMetaField(b, "dict"));
        }
        return equal(a, b);
      },
      { contract: "(any, any) -> boolean" }
    ),
    "is?": rt.makeFunction((a, b) => Object.is(a, b), {
      contract: "(any, any -> boolean)",
    }),
    append: rt.makeFunction(
      (obj1, obj2) => {
        if (typeof obj1 === "string" && typeof obj2 === "string") {
          return obj1 + obj2;
        } else if (obj1 instanceof Cons) {
          return Cons.from(obj1).append(obj2);
        } else if (Array.isArray(obj1)) {
          return [...obj1].push(obj2);
        } else {
          rt.failRuntime(`Value of type ${typeof obj1} cannot be appended`);
        }
      },
      { contract: "(any, any -> any)" }
    ),
    with: rt.makeFunction(
      (rec1, rec2) => {
        const newDict = Object.assign(
          {},
          rt.getMetaField(rec1, "dict"),
          rt.getMetaField(rec2, "dict")
        );
        return rt.makeObject(newDict);
      },
      { contract: "(any, any -> any)" }
    ),
    prop: rt.makeFunction((prop, obj) => rt.getField(obj, prop), {
      contract: "(string, any -> any)",
    }),
    each: rt.makeFunction(
      (fn, lst) => {
        for (let item of lst) {
          fn(item);
        }
      },
      {
        contract: "((any -> nil), (list any) -> nil)",
      }
    ),
    map: rt.makeFunction(
      (fn, lst) => {
        let mapped = [];
        for (let item of lst) {
          mapped.push(fn(item));
        }
        return Cons.from(mapped);
      },
      {
        contract: "((any -> any), (list any) -> (list any))",
      }
    ),
    filter: rt.makeFunction(
      (fn, lst) => {
        let filtered = [];
        for (let item of lst) {
          if (rt.isTruthy(fn(item))) {
            filtered.push(item);
          }
        }
        return Cons.from(filtered);
      },
      {
        contract: "((any -> boolean), (list any) -> (list any))",
      }
    ),
    fold: rt.makeFunction(
      (fn, init, lst) => {
        let acc = init;
        for (let item of lst) {
          acc = fn(acc, item);
        }
        return acc;
      },
      {
        contract: "((any, any -> any), any, (list any) -> any)",
      }
    ),
    "fold-r": rt.makeFunction(
      (fn, init, lst) => {
        let acc = init;
        const reversed = [...lst].reverse();
        for (let item of reversed) {
          acc = fn(acc, item);
        }
        return acc;
      },
      {
        contract: "((any, any -> any), any, (list any) -> any)",
      }
    ),
  };
});
Enter fullscreen mode Exit fullscreen mode

Conclusion

That's it for adding functions to the language! Fire up the REPL with the wanda command and take them for a spin.

(def add-3 (a: number, b: number, c: number) -> number
  (+ a b c))

(add-3 1 2 3) ;-> 6
Enter fullscreen mode Exit fullscreen mode

As always you can see the state of the code as of the end of this article on the relevant tag at the Wanda repo.

Next time we're going to do a lot of work with type checking. We'll add tuple, union, and intersection types to Wanda as well as :as expressions that will tell the type checker to treat a value as a specific type for times when it needs a little help.

I'll see you then!

Top comments (0)