Introduction
We founded MURAL in 2011 with one goal in mind: create a feature-rich digital whiteboard, called a “mural,” for real-time online collaboration. As the company grew, our engineering team focused increasingly on streamlining performance. Then the COVID-19 pandemic hit, forcing companies around the world to rapidly adapt to working remotely. MURAL’s popularity surged, and with it came an influx of new and concurrent users. Increasingly large teams were suddenly coworking in MURAL to create ever more complex and varied content, which was slowing our app down. Performance was now our top priority, and we needed to address it as soon as possible.
The Problem
Initially, we developed MURAL to leverage the DOM; as the programming interface underlying HTML, it's familiar to web developers, making it a practical starting point for our application. However, the DOM was not designed to support dynamically changing interfaces comprising a massive number of graphical components, and MURAL users were creating new elements by the thousands. Each sticky note, image, and text box added to a mural further slowed the app's performance, eventually leading to a bogged-down and frustrating experience.
A collaborative design thinking tool that freezes or lags is a collaborative design thinking tool that won’t be used; we needed to update our approach. To address the limitations of the DOM, we decided to go around it and migrate our application to HTML5 canvas. There was just one problem: we’d designed our entire codebase around the DOM.
A Custom Solution for a Custom Problem
Overnight, MURAL had become massively multiplayer, with concurrency and latency issues. In other words, large teams were working together in MURAL and slowing each other down. We needed to enable many users to collaborate simultaneously within a single mural without affecting the app’s performance, a challenge well traversed by the video game industry. Who better to solve game developer problems than a game developer?
Why performance in game development is important
Performance is often highly prioritized in game development; because games are a time-based medium, developers need to make sure their code doesn't slow down the frame rate.
Enter Fede
Fede, a former game developer, was familiar with building products using existing engines but had never built an engine for an existing product. Excited by the challenge, he began researching and planning his approach, starting with “low-level JavaScript.”
What is an engine?
The definition of an engine can get a bit nebulous. For both the purposes of this piece and game development, an engine can be thought of as a set of code libraries that handles low-level code. This code includes rendering, object management, audio, and user input.
Low-level programming isn’t generally associated with web development. When it is, it often means WebAssembly (WASM) is involved. However, Fede ruled out WASM at the time because of its more limited browser support in comparison to JavaScript. So what did he mean by low-level?
Reminiscent of programming requirements in the financial tech sector, low-level JavaScript describes code in which performance informs every choice. For example, Fede pruned out unnecessary prototypal methods like map and reduce, using for loops instead, and structured the code to avoid using first-class functions while still keeping it readable and maintainable. While taking such an ascetic approach isn't always necessary in general web development, it's crucial in the case of a massively concurrently used app like MURAL.
What is "C-like" JavaScript?
JavaScript automatically handles garbage collection. Therefore, web developers typically aren't concerned about memory allocation and the speed of method calls. However, when speed is the highest priority, techniques borrowed from low-level languages can powerfully boost performance. Some of these techniques include removing prototypal methods, avoiding passing functions as arguments, and ruthlessly managing memory (e.g., mutating an array in place rather than creating a new one).
Here's an example. Compare the following two programs, which both identify the first character of every animal type that is exactly three letters long:
// Common JavaScript approach
// Each chained call allocates a new array and invokes a callback per item.
const animals = ['fish', 'dog', 'cat', 'giraffe'];
const firstCharOfThreeLetterNames = animals
  .filter(animal => animal.length === 3)
  .map(animal => animal[0]);

// C-like JavaScript approach (a separate program)
// A single pass with a plain loop: one result array, no callbacks.
const animals = ['fish', 'dog', 'cat', 'giraffe'];
const threeLetterAnimalTypes = [];
for (let i = 0; i < animals.length; i++) {
  if (animals[i].length === 3) {
    threeLetterAnimalTypes.push(animals[i][0]);
  }
}
Benchmarking these programs against each other shows the second solution runs roughly 50% faster.
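The other technique mentioned above, mutating an array in place rather than creating a new one, can be sketched in the same style. This is only an illustration: the helper name removeShortNamesInPlace and its parameters are hypothetical and not taken from MURAL's codebase.

```javascript
// Hypothetical helper: drops every string shorter than minLength from `names`
// without allocating a second array and without taking a callback.
function removeShortNamesInPlace(names, minLength) {
  let write = 0; // index of the next slot to fill with a kept item
  for (let read = 0; read < names.length; read++) {
    if (names[read].length >= minLength) {
      names[write] = names[read];
      write++;
    }
  }
  names.length = write; // truncate the leftover tail
  return names;
}

const animalNames = ['fish', 'dog', 'cat', 'giraffe'];
removeShortNamesInPlace(animalNames, 4);
console.log(animalNames); // [ 'fish', 'giraffe' ]
```

The trade-off is that the caller's array is modified, so this only suits code that owns its data and does not need the original contents afterwards.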
The Engines
Our existing solution started as an experiment to transform user input into graphics and render them to the DOM. While this DOM-based approach had served us well for years, new needs eventually rendered it obsolete. The existing code was also a monolith designed around painting to the DOM, and the complexity of the existing logic to translate MURAL concepts into renderable content made updating and testing the app a fiddly affair. So, in order to migrate MURAL to canvas in a maintainable way, Fede decided to start anew with a custom system consisting of two engines: one to handle rendering graphical primitives like triangles and circles, and one to translate MURAL objects, like sticky notes and GIFs, into those primitives.
The Rendering Engine
The rendering engine has one responsibility: send visuals to a target. Unlike the other engine, this one has no handlers for MURAL-specific concepts.
The rendering engine expects the graphical primitives output by the MURAL engine and renders them to the specified “surface” — anywhere murals are visible to users.
Potential surfaces include:
- Canvas: used in the MURAL application
- PDF or SVG: used when exporting to these file formats
- WebGL: not yet implemented, but presents interesting potential three-dimensional applications
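To make the idea of interchangeable surfaces concrete, here is a minimal sketch. It assumes a hypothetical primitive format ({ type: 'rect', ... } and { type: 'circle', ... }) and hypothetical function names (createSvgSurface, createCanvasSurface, render); MURAL's actual interfaces are not described in this article and may look quite different.

```javascript
// Hypothetical SVG surface: collects markup for each primitive it receives.
function createSvgSurface(width, height) {
  const parts = [];
  return {
    rect(p) {
      parts.push(`<rect x="${p.x}" y="${p.y}" width="${p.width}" height="${p.height}" fill="${p.fill}"/>`);
    },
    circle(p) {
      parts.push(`<circle cx="${p.cx}" cy="${p.cy}" r="${p.r}" fill="${p.fill}"/>`);
    },
    finish() {
      return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">${parts.join('')}</svg>`;
    },
  };
}

// Hypothetical canvas surface: wraps a CanvasRenderingContext2D passed in by the caller.
function createCanvasSurface(ctx) {
  return {
    rect(p) {
      ctx.fillStyle = p.fill;
      ctx.fillRect(p.x, p.y, p.width, p.height);
    },
    circle(p) {
      ctx.fillStyle = p.fill;
      ctx.beginPath();
      ctx.arc(p.cx, p.cy, p.r, 0, Math.PI * 2);
      ctx.fill();
    },
    finish() {
      return undefined; // drawing already happened on the canvas
    },
  };
}

// The renderer knows only about primitives and the surface contract,
// never about sticky notes or other application concepts.
function render(primitives, surface) {
  for (let i = 0; i < primitives.length; i++) {
    const p = primitives[i];
    if (p.type === 'rect') {
      surface.rect(p);
    } else if (p.type === 'circle') {
      surface.circle(p);
    }
  }
  return surface.finish();
}

const shapes = [
  { type: 'rect', x: 0, y: 0, width: 10, height: 5, fill: 'yellow' },
  { type: 'circle', cx: 5, cy: 5, r: 2, fill: 'red' },
];
console.log(render(shapes, createSvgSurface(20, 20)));
```

Because the renderer depends only on the surface contract, supporting a new target would mean writing one more surface object rather than touching the rendering loop.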
The MURAL Engine
The MURAL engine translates user-facing concepts into graphical primitives, by reducing them down to their most basic characteristics, represented by ASTs.
For example, the aforementioned sticky note could be broken down into a colored rectangle, a border, and a text box.
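As a rough illustration of that breakdown, the sketch below turns a sticky note into a small tree of primitives. The function name stickyNoteToPrimitives, the note fields, the padding value, and the node shapes are all assumptions made for this example, not MURAL's real data model.

```javascript
// Hypothetical translation step: one application-level object in,
// a small tree of graphical primitives out.
function stickyNoteToPrimitives(note) {
  const padding = 8; // assumed inner margin for the text box
  return {
    type: 'group',
    children: [
      // The colored body of the note.
      { type: 'rect', x: note.x, y: note.y, width: note.width, height: note.height, fill: note.color },
      // The outline drawn around the body.
      { type: 'border', x: note.x, y: note.y, width: note.width, height: note.height, stroke: '#333333', lineWidth: 1 },
      // The text box, inset from the edges.
      { type: 'text', x: note.x + padding, y: note.y + padding, maxWidth: note.width - 2 * padding, content: note.text },
    ],
  };
}

const tree = stickyNoteToPrimitives({
  x: 100, y: 50, width: 120, height: 120, color: '#ffe066', text: 'Ship it',
});
console.log(JSON.stringify(tree, null, 2));
```

A tree like this contains nothing specific to sticky notes, so a renderer can draw it without knowing what kind of object produced it.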
What is an AST?
An Abstract Syntax Tree, or AST, is a structure that describes how data relates semantically to itself, or in other words, where each datum fits in a hierarchy with the rest. For example, an AST of a sentence might be broken down into verbs, adjectives, and nouns.
Originally, the dual responsibilities of translating MURAL logic into graphical primitives and sending those primitives to render targets were handled by a single engine. So why did we now divide them into two?
Separating the concerns into two distinct engines enabled us to maintain, update, and build additional support infrastructure around each engine with less overhead going forward. This new design enabled Fede to build a test automation system, a key part of the new approach, alongside the rendering engine to ensure it worked as expected and that future changes didn’t slow down performance.
Challenges
Of course, being the best solution for the job doesn't mean canvas was simple to integrate. To encourage iteration and experimentation, to allow A/B testing, and to avoid the huge merge conflicts such a significant change could cause, new developments were merged to production incrementally, meaning the old and new approaches had to work side by side until the work was completed.
Additional complications with our use case included canvas' complete lack of built-in accessibility support, which we as a company have committed to addressing; and its suboptimal support for rendering text at the subpixel level across devices and resolutions. Text handling also happens to be under-described in the CSS specification, meaning, for example, that each browser is free to decide when and how line breaks occur. While we could have built workarounds, they would have tacked on significant overhead for testing and maintenance, especially in today’s evergreen browser landscape, and our users create murals on a cornucopia of browsers.
Fede chose to track his development progress with Chrome, because it represents 80% of our user base and leverages the GPU to accelerate canvas renders. To ensure all users had a smooth user experience, he also regularly tested and built across other browsers and devices. These manual tests included devices running Linux, Windows, macOS, Android, and iOS at a variety of hardware specifications, as our latency problem was a hardware issue, not a network one.
By doing so, he discovered and addressed additional browser-specific restrictions, such as Safari's variable hardware limitation for renderable assets. He worked around this one by offloading some of the rendering responsibilities to an off-screen canvas whenever the full mural is visible, which slows down scrolling at that zoom level in exchange for enabling the application to continue running smoothly.
Outcomes
After migrating from a monolithic DOM-focused architecture to our custom canvas rendering engine, we improved performance (as measured by the percentage of frames at 30fps or higher) by 80%. When Fede ran into a character painting issue in Chrome during the course of development, he contributed to its resolution by submitting a detailed report with solid reproduction cases.
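The frame-rate metric cited above could be computed along the following lines. This is one plausible reading of "percentage of frames at 30fps or higher", not the measurement code MURAL used; the function name percentFramesAtOrAbove and the idea of feeding it a list of frame timestamps in milliseconds are assumptions for this sketch.

```javascript
// Hypothetical metric: given frame timestamps in milliseconds (for example,
// collected from requestAnimationFrame callbacks), return the percentage of
// frames whose duration is short enough to sustain targetFps.
function percentFramesAtOrAbove(timestamps, targetFps) {
  const maxFrameMs = 1000 / targetFps; // longest frame that still meets the target
  let total = 0;
  let good = 0;
  for (let i = 1; i < timestamps.length; i++) {
    const delta = timestamps[i] - timestamps[i - 1];
    total++;
    if (delta <= maxFrameMs) {
      good++;
    }
  }
  return total === 0 ? 0 : (good / total) * 100;
}

// Frame durations here are 16, 16, 50, and 16 ms; only the 50 ms frame misses 30fps.
console.log(percentFramesAtOrAbove([0, 16, 32, 82, 98], 30)); // 75
```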
Because we separated the rendering process from the application-specific logic, they were no longer intertwined. The codebase became more maintainable and testable with automated end-to-end and visual tests, and by creating a system that can render to multiple targets, we opened up new opportunities for both internal and external developers to leverage MURAL features in a variety of exciting ways.