DEV Community

Cover image for Recursive Data Structures and Lazy Evaluation
Roman Liutikov
Roman Liutikov

Posted on • Originally published at romanliutikov.com

Recursive Data Structures and Lazy Evaluation

In this article I’m trying to explain to myself recursive data structures and how they work together with lazy evaluation. I happen to use them everyday, but never really thought about underlying implementation and the idea in general.

First we define a problem and describe recursive solution to it. After that we are going to implement a simple data structure, recursively. And once we get the idea, I’ll cover lazy evaluation and present lazy variant of that data structure.

The Problem

A typical problem in game development is collision detection. This technique is used to detect intersection between entities in coordinate space, such as enemies and bullets in shooter games for example.

Let’s imagine we are building 2D space shooter game. The game involves interaction between enemy entities and player’s missiles. In order to say if enemy entity is destroyed the game should be able to detect when the missile intersects an entity in two-dimensional coordinate space. In the context of our game collision detection algorithm could be described as the following:

  1. Compose a list of all missile entities and a list of all enemy entities currently on the screen
  2. Take the first/next missile entity from the list
  3. Check it’s position against all enemy entities
    • if position matches — collision is detected, remove both entities from the list and repeat from step 2
    • otherwise — repeat from step 2 until the end of the list

As you can see the algorithm is straightforward and can be easily implemented via iteration.



for (let i = 0; i < missiles.length; i++) {
  const missile = missiles[i];
  for (let j = 0; j < enemies.length; j++) {
    const enemy = enemies[j];
    if (missile.intersects(enemy)) {
      // collision is detected
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

The problem however is that this naive approach takes O(n*(n-1)) or simply O(n^2) time to execute, meaning that every time the number of entities doubles the number of iterations required to detect collision between all entities is going to be four times larger. That is for 3 entities it takes roughly 9 iterations, 900 iterations for 30 entities, 90000 for 300 entities and so on. It’s totally fine to use this approach if a single check is relatively cheap operation and there are not much entities on a screen, but sometimes that’s not the case.

Solution

An obvious improvement would be to perform collision check for a given entity only against entities that are close to it, discarding all distant entities. Screen area can be divided into separate areas to form a grid and now for every entity we can perform collision check only against other entities currently in the same area. This technique is called space partitioning.

More sophisticated algorithm would be to use an adaptive grid, where cells can subdivide itself based on how much entities there are currently in a given cell.

Recursive algorithm

One of the algorithms implementing adaptive grid approach is called quadtree space partitioning. A quadtree is a data structure where every node has exactly four child nodes, hence the name. A quadtree is recursive because it is defined in terms of itself (every node, except leaf nodes, has another quadtree as its subtree).

Game canvas can be divided between those nodes where each node would represent an area on the screen defined by its bounds. A node may hold entities that are currently in the area which corresponds to that node. Quadtree nodes should be set a maximum number of entities they can hold (maximum number of entities in a given area). Once capacity of the node is full, the node subdivides itself into four child nodes and spreads its entities across child nodes with respect to their position on the screen. Subdivision can be infinitely deep, but usually the maximum number of levels is set.

The benefit of this approach is that in the best case, when all entities are spread evenly across the screen, the time required to perform all collision checks is reduced by 4x with every level of subdivision. That is, for 100 entities we get 2500 checks instead of 10000 with one level of subdivision, with two levels it is 625 checks and so on. This can reduce CPU load when applied correctly, but don’t forget that maintaining a quadtree also consumes CPU cycles and memory.



What we’ve just learned is that game development is fun and algorithms are not that hard, but most importantly in this section we’ve seen how recursive data structures could be a natural fit as a solution to a certain type of problems. Now let’s see the code.

Recursive data structures

In this section we are going to build List data structure, singly linked list to be precise. A list is fundamental data structure in functional programming known from early days of Lisp programming language. Take a look at type declaration of List data type in Haskell



data List a = Nil | Cons a (List a)


Enter fullscreen mode Exit fullscreen mode

You can see that a list is either Nil (empty list) or Cons (construction) of a value and another list. Having another list as the next value in a list is what makes this data structure recursive or defined in terms of itself. The last value in a list is always Nil, it is used to signal that the end of list is reached. Nil is also treated as empty list, which means that this type inherits properties of a list. To get better understanding here’s how a list of three items could be written in pseudo code:



[1 [2 [3 nil]]]


Enter fullscreen mode Exit fullscreen mode

A tuple of two items here is called a cons (construction) cell. A cons cell consists of a value at the first position (head) and a pointer to another value (tail). Thus a list of three items is essentially a chain of three cons cells with Nil in the last cell to indicate the end of list. This can be also visualized as the following:

A list built on cons cells has interesting properties such as access to the head of list and its tail in constant time. On the other hand lists doesn’t provide efficient random access such as in indexed collections (arrays).

Now that we know underlying representation of a list, lets create our own implementation. First we define data types — building blocks of a list. They are: List, Cons and Nil. List type is not really necessary to build a list, because a list is just a chain of cons cells with Nil in the end. It is rather a useful abstraction which could provide us with an API for creating and manipulating lists.

First we implement type Nil as JavaScript class and create an instance of the class, a variable named nil, to be used as a marker of the end of a list.



class Nil {
  toString() {
    return 'Nil';
  }
}

const nil = new Nil();


Enter fullscreen mode Exit fullscreen mode

The next type is Cons. It is a class that accepts head value and tail — a reference to the next value. We also create a helper static method .of to avoid using new keyword.



class Cons {
  constructor(head, tail = nil) {
    this.head = head;
    this.tail = tail;
  }
  toString() {
    return `Cons(${this.head}, ${this.tail})`;
  }
}

Cons.of = (head, tail) => new Cons(head, tail);


Enter fullscreen mode Exit fullscreen mode

Using just cons cells we can already create lists of values and access those values:



const list = Cons.of(1, Cons.of(2, Cons.of(3, nil)));

console.log(list.head);
console.log(list.tail.toString());
console.log(list.tail.head);
console.log(list.tail.tail.tail.toString());


Enter fullscreen mode Exit fullscreen mode

As I mentioned before cons cells are enough for creating lists, but list abstraction built on top of Cons could save us a little time and code by providing an API to create and manipulate lists. So lets built List type on top of Cons:



class List extends Cons {
  constructor(head, tail) {
    super(head, tail);
  }
  toString() {
    return `List(${this.first()}, ${this.rest()})`;
  }
  first() {
    return this.head;
  }
  rest() {
    return this.tail;
  }
}

List.of = (head, tail) => new List(head, tail);

List.fromValues = (head, ...tail) => {
  if (tail.length > 0) {
    return List.of(head, List.fromValues(...tail));
  } else {
    return List.of(head, nil);
  }
};


Enter fullscreen mode Exit fullscreen mode

We also implement .fromValues helper for creating lists from arbitrary number of arguments. Note how it builds a list recursively.



const list = List.fromValues(1, 2, 3);

console.log(list.toString());


Enter fullscreen mode Exit fullscreen mode

Another useful operation would be generating a range of values. Add a new static method to List class definition:



List.range = (start, end) => {
  if (start < end) {
    return List.of(start, List.range(start + 1, end));
  } else {
    return nil;
  }
};


Enter fullscreen mode Exit fullscreen mode

take method is used to take an arbitrary number of values from a list.



take(n) {
  if (n > 0) {
    return List.of(this.first(), this.rest().take(n - 1));
  } else {
    return List.of(this.first(), nil);
  }
}


Enter fullscreen mode Exit fullscreen mode

Quick test with range and take



const list = List.range(0, 10);

console.log(list.take(3).toString());


Enter fullscreen mode Exit fullscreen mode

So far we’ve implemented only retrieval and list creation operations. Lets also create operations for adding values to list and mapping over its values: add and map methods.

add operation is super cheap, because list supports efficient addition to the head, just create a new list with value in head position and target list in tail position, it is as simple as that:



add(value) {
  return List.of(value, this);
}


Enter fullscreen mode Exit fullscreen mode

map operation also returns a new list where a value in head position is applied recursively to mapper function:



map(fn) {    
  const first = this.first();
  const rest = this.rest();

  if (rest === nil) {
    return List.of(fn(first), nil);
  } else {
    return List.of(fn(first), rest.map(fn));
  }
}


Enter fullscreen mode Exit fullscreen mode


const list = List.range(1, 4).add(0).map(n => n * n);

console.log(list.toString());


Enter fullscreen mode Exit fullscreen mode

Operations such as filter and reduce could be implemented similarly to map.

In this section we’ve learnt about List data structure and its recursive implementation. Similar reasoning could be applied to build quadtree mentioned earlier and any other data structure as well. Later here we will create lazy version of List.

Lazy and eager evaluation

Before we dive into implementing lazy variant of List lets quickly remind ourselves what is lazy evaluation.

There are two evaluation strategies: eager and lazy. In short eager means now or to execute immediately and lazy — later or to delay execution until result is needed. Lazy evaluation is essentially an optimization technique. A real world analogy would be lazy loading images on a web page. When you open a website with a bunch of images the browser will start downloading them all immediately. On the other hand we could save bandwidth by loading only those images that are currently visible in browser window. Non of the images below the fold should load until the page scrolled to a position where they become visible. The same applies to code: a set of operations are executed on demand, when the result is actually needed.

Eager evaluation is more straightforward in a sense that it’s easier to reason about. It’s a standard evaluation strategy in languages like JavaScript and Ruby. Consider the following example (evaluation process is described in comments):



range(0, 10) // [0..9]
  .map(n => n + 1) // [1..10]
  .filter(n => n % 2 === 0) // [2 4 6 8 10]
  .take(3) // [2 4 6]


Enter fullscreen mode Exit fullscreen mode

You can see that the code is executed in-place and result returned immediately.

Lazy evaluation strategy is different.



range(0, 10) // nothing happened
  .map(n => n + 1) // nothing happened
  .filter(n => n % 2 === 0) // nothing happened
  .take(3) // nothing happened
  .run() // [2 4 6]


Enter fullscreen mode Exit fullscreen mode

Here none of the operations in the chain are executed until the call to .run method. nothing happened is of course not entirely true. When called each operation is being saved for later execution and when the result is requested the whole chain is executed and the final value is returned. Languages like Haskell and Clojure relies heavily on delayed execution. In Haskell for example everything is stored in thunks. You can think of a thunk as a container for arbitrary code that prevents its evaluation. It could be a closure that contains an expression. Eager version of 1 + 1 can be represented as lazy () => 1 + 1 in JavaScript. As an example let’s create lazy functions to add and subtract numbers:



const add = (a, b) => {
  return () => {
    a = typeof a === "function" ? a() : a;
    b = typeof b === "function" ? b() : b;
    return a + b;
  };
};

const sub = (a, b) => {
  return () => {
    a = typeof a === "function" ? a() : a;
    b = typeof b === "function" ? b() : b;
    return a - b;
  };
};


Enter fullscreen mode Exit fullscreen mode

Both add and sub returns a function that is closed over the arguments, thus the actual operation is not executed yet, until we force it by calling returned thunk.



const thunk = fn => {
  fn.toString = () => '[Thunk]';
  return fn;
};



const add = (a, b) => {
  return thunk(() => {
    a = typeof a === "function" ? a() : a;
    b = typeof b === "function" ? b() : b;
    return a + b;
  });
};

const sub = (a, b) => {
  return thunk(() => {
    a = typeof a === "function" ? a() : a;
    b = typeof b === "function" ? b() : b;
    return a - b;
  });
};



const result1 = sub(20, add(10, 5));
const result2 = add(3, sub(5, 7));
const result3 = add(result1, result2);

console.log(result1);
console.log(result2);
console.log(result3);

console.log(result3());


Enter fullscreen mode Exit fullscreen mode

Lazy evaluation always comes with memoization which is yet another optimization. Once a thunk was evaluated it can cache the result in closure and return cached value on subsequent calls instead of evaluating the whole thing again and again. Here’s modified version of previous example:



const memoize = (fn) => {
  let cache;
  return () => {
    if (cache) {
      console.log('Return cached');
      return cache;
    } else {
      console.log('Evaluate');
      cache = fn();
      return cache;
    }
  };
};



const add = (a, b) => {
  return memoize(() => {
    a = typeof a === "function" ? a() : a;
    b = typeof b === "function" ? b() : b;
    return a + b;
  });
};

const sub = (a, b) => {
  return memoize(() => {
    a = typeof a === "function" ? a() : a;
    b = typeof b === "function" ? b() : b;
    return a - b;
  });
};



const result1 = sub(20, add(10, 5));
const result2 = add(3, sub(5, 7));
const result3 = add(result1, result2);

console.log(result3());
console.log(result3());


Enter fullscreen mode Exit fullscreen mode

Lazy and recursive data structures

Below is the initial implementation of LazyList abstraction which we are going to build on top of Cons type.



class LazyList {
  constructor(fn) {
    this._fn = fn;
  }
  toString() {
    return `LazyList(${this.next()})`;
  }
  next() {
    return this._fn();
  }
  first() {
    return this.next().head;
  }
  rest() {
    return this.next().tail;
  }
}

LazyList.of = fn => new LazyList(fn);


Enter fullscreen mode Exit fullscreen mode

fn here is a thunk which should return a cons cell and next method evaluates the thunk.

See how .fromValues helper builds a list lazily and recursively. It returns an instance of LazyList which contains a thunk of Cons with head as value from arguments and tail as another lazy list.



LazyList.fromValues = (head, ...tail) => {
  return LazyList.of(() => {
    if (tail.length > 0) {
      return Cons.of(head, LazyList.fromValues(...tail));
    } else {
      return Cons.of(head, nil);
    }
  });
};


Enter fullscreen mode Exit fullscreen mode


const list = LazyList.fromValues(1, 2, 3);

console.log(list.toString());


Enter fullscreen mode Exit fullscreen mode

take also returns a lazy list. But in order to take actual values it evaluates thunks one by one, taking from the tail of the target list, and constructs a new one.



take(n) {
  if (n > 0) {
    return LazyList.of(() => {
      const head = this.first();
      const tail = this.rest();

      if (tail === nil) {
        return Cons.of(head, nil);
      } else {
        return Cons.of(head, tail.take(n - 1));
      }
    });
  }
}


Enter fullscreen mode Exit fullscreen mode

range returns a range of values, except that now with laziness it is possible to create infinite lists.



LazyList.range = (start = 0, end = Infinity) => {
  if (start < end) {
    return LazyList.of(() => Cons.of(start, LazyList.range(start + 1, end)));
  }
};


Enter fullscreen mode Exit fullscreen mode


const list = LazyList.range();

console.log(list.take(2).toString());
console.log(list.take(5).toString());


Enter fullscreen mode Exit fullscreen mode

Adding a value to a lazy list is very similar to List implementation



add(value) {
  return LazyList.of(() => Cons.of(value, this));
}


Enter fullscreen mode Exit fullscreen mode

map operation is also similar to its eager variant



map(fn) {    
  return LazyList.of(() => {
    const first = this.first();
    const rest = this.rest();

    return Cons.of(fn(first), rest.map(fn));
  });
}


Enter fullscreen mode Exit fullscreen mode

Now we can create an infinite list of values with a set of operations defined on it.



const list = LazyList.range().map(n => n * n);

console.log(list.take(2).toString());
console.log(list.take(5).toString());


Enter fullscreen mode Exit fullscreen mode

So far we had only one method for evaluating entire list: .toString. In order to be able to consume a list using common JavaScript facilities it should be converted to something meaningful, for example array. .toArray method does exactly this.



toArray() {
  const first = this.first();
  const rest = this.rest();

  if (rest === nil) {
    return [first];
  } else {
    return [first, ...rest.toArray()];
  }
}


Enter fullscreen mode Exit fullscreen mode


list.take(5).toArray()


Enter fullscreen mode Exit fullscreen mode

Let’s add memoization now. Every time someone calls .next method to retrieve a value from a list we either evaluate a thunk and memoize the result or just return the result if it was already cached. Here’s what modifications need to be done to LazyList class:



constructor(fn) {
  this._fn = fn;
  this._next = null; // cache
}

next() {
  // if there's a thunk
  if (typeof this._fn === 'function') {
    this._next = this._fn(); // evaluate it and cache the result
    this._fn = null; // we don't need thunk anymore
    return this._next; // return cached  value
  } else {
    // other just return cached value
    return this._next;
  }
}


Enter fullscreen mode Exit fullscreen mode

See this example of pulling user records lazily from database. getUsers function produces a list of user records lazily by pulling them out of database on demand, when the thunk is evaluated.



const getUsers = (db, [id, ...ids]) => {
  return LazyList.of(() => {
    if (ids.length > 0) {
      return Cons.of(db.getUserByID(id), getUsers(db, ids));
    } else {
      return Cons.of(db.getUserByID(id), nil);
    }
  });
};

// usage
const ids = db.getUsersIDs();
const users = getUsers(db, ids); // nothing happened
const firstThreeUsers = users.take(3).map(({ id }) => id); // nothing happened

firstThreeUsers.toArray(); // [1, 2, 3]


Enter fullscreen mode Exit fullscreen mode

Let’s test this. If you run the code below you will see that the first run takes ~1s but the next one takes less than 1ms, even though we execute different operations. This happens because users list was evaluated on the first run and result was memoized for subsequent calls.



// benchmark helper
const time = (fn) => {
  const start = performance.now();
  const result = fn();
  const delta = performance.now() - start;
  console.log(Math.round(delta * 100) / 100);
  return result;
};

// db mock
const db = {
  getUsersIDs() {
    return [1, 2, 3, 4, 5, 6];
  },
  getUserByID(id) {
    for (let i = 0; i < 1e8; i++) {}
    return { id };
  }
};

const getUsers = (db, [id, ...ids]) => {
  return LazyList.of(() => {
    if (ids.length > 0) {
      return Cons.of(db.getUserByID(id), getUsers(db, ids));
    } else {
      return Cons.of(db.getUserByID(id), nil);
    }
  });
};

const users = getUsers(db, [1, 2, 3, 4, 5, 6]);

time(() =>
   users
     .take(6)
     .toArray());

time(() =>
  users
    .map(id => 'user id ' + id)
    .take(3)
    .toArray());


Enter fullscreen mode Exit fullscreen mode

Thats pretty much it 🎬

If you are interested in learning functional data structures, I highly recommend to read Okasaki’s book “Purely Functional Data Structures”.

Top comments (1)

Collapse
 
alephnaught2tog profile image
Max Cerrina

The List of [1 [2 [3 nil]]] reminds me super strongly of the Von Neumann format for defining ordinal numbers.