Too slow without shooting yourself in the foot

#javascript #with #performance #bestpractices

JavaScript, year after year, becomes a more stable, predictable, and safe tool for building large applications, allowing even junior developers to maintain it without high risks of catastrophic failures. It’s the right choice and strategy for the community as a whole.

Except in one and only one case, from my perspective: when you definitely know what you're doing and why.

Yeah, for example, there were a lot of bugs related to == (non-strict) equality when developers wrote code without hesitation. But you know what? Doing anything without hesitation is a bad idea! It makes me sad to find value === null || value === undefined in production code, while in JS, there's a special equality between null and undefined (Comparing equality methods) and the right choice was to make this code look like value == null. There's even a special exception in the ESLint eqeqeq rule for this...

But sorry, I got a little distracted from the topic of shooting oneself in the foot. While many developers could agree with the double equal operator, almost no one can accept the with statement (documentation). There's a strong reason — you can't predict the scope you are working with. And I have to say, it's not just a gun, but a whole big bomb for developers. It's appropriate that it was deprecated (Removal of the with statement). Except, perhaps, in that one situation where you need to inject something into a scope and want to do it fast.

Let’s imagine a situation where you need to process a series of data with some formulas. For example, you have a list of market prices by date and you need to get the average price of companies a, b, and c. Or maybe filtering instead of calculation — you have movies and you want to find an old good movie with an IMDb rating above 7.5 and produced before the millennium. You could always create hand-written functions like (market) => (market.a + market.b + market.c) / 3, or (movie) => movie.year < 2000 && movie.imdb > 7.5.

But what if you need to create a tool rather than provide a specific result? I've seen a lot of applications where such calculations were hardcoded, and even more applications where there were different solutions to pre-build and add any kind of extensions. All of these solutions require developer’s effort and still do not enable users to write any formula they want. What if you finally want to allow users to write something like (a + b + c) / 3 or year < 2000 and imdb > 7.5?

Obviously, it's a well-known SQL syntax. And we can find it, for example, as JQL in Jira, a really popular tool for a lot of non-developer users like project managers. So, what if we want to give the same ability to users?

Some time ago, I developed a library that transformed text inputs into functions based on an extendable grammar set. Importantly, since the final outcome is a function, I kept support for all JavaScript functionalities within a query, allowing users to incorporate calculations such as Math.min(a, b, c) directly into their queries. And it works great and really fast.

The key feature that enabled me to effectively calculate user input within the context of a data item was the when statement. Once a user submits a formula, the library processes it, converts it into code, and then returns a function. Here’s what the process looks like:

function createFilter(query) {
    const functionBody = `with (item) {
        return ${query}
    }`

    return new Function('item', functionBody)
}

Let’s put aside all the input-to-code transpiling and safety stuff and concentrate only on the scoping.

As a result, when we pass a query equal to year < 2000 && imdb > 7.5, we’ll get the ready filtering function:

const filterFn = createFilter('year < 2000 && imdb > 7.5')

// You'll get this function referenced in filterFn:
function anonymous(item) {
    with (item) {
        return year < 2000 && imdb > 7.5
    }
}

Then you can apply the ready function to a whole list and get results.

const movies = [
    {year: 1966, imdb: 8.8, title: 'The Good'},
    {year: 1966, imdb: 5.6, title: 'The Bad'},
    {year: 2024, imdb: 3.5, title: 'The Ugly'}
]

const result = movies.filter(filterFn)
// You'll get the only one result - 'The Good' movie

Fast and easy.

Except that we used with, which is problematic, as I mentioned earlier. So, can we eliminate the with statement and rewrite the solution in strict mode? Certainly.

The initial plan was simply to parse all the keys used in the query and slap the item. prefix on them. No need to mess with the scope at all! However, this method could mess up the integrated JS code, miss some elements, and make searching for keys overly complicated.

The same goes for the idea of incrementally creating the function, step by step fixing all ReferenceError that should help identify such keys. And then there were a few other non-effective ideas that I tried to tweak the function body.

So, let's return to the scope. There is another way to inject keywords into a scope — just pass them as arguments to a function! This makes the filtering function look something like this:

function anonymous(year, imdb, title) {
    return year < 2000 && imdb > 7.5
}

But… there’s a small issue. A tiny one.

We don’t know each object’s content before we run the filter. Because, let me remind you, we are making a common tool, the ‘third-party library’ for other developers. So, we can’t predict an object's structure. You can say — the developers will know the structure, they are using TypeScript, thoughtful architecture, OpenAPI specs, and JSDoc as a cherry on top. They can predict, you’ll say. But what if not? What if our library would be used to filter heterogeneous content exactly to get homogeneous results? It is always easy to say that we’ll know exactly the parameters of the filtering function and can pass it manually. But what if we can’t? What if we still want to keep the SQL-like syntax for users on a dataset of unknown objects?

The most abstract solution will be to collect each time an object's keys and create a new function with the right scope. Something like:

function createFilter(query) {
    return (item) => {
        const keys = Object.keys(item).sort()
        const filterFunction = new Function(...keys, `return ${query}`)
        return filterFunction(...keys.map(key => item[key]))
    }
}

The greatest fail here is that we need to create a new function for each array item! So expensive!

I made a tiny benchmark, and found that it runs on average 17 times slower! On large sets of data, you will be able to notice it. Adding a cache based on the keys of an object makes it a bit better, now it is only 11 times slower. This is the price of just getting rid of the deprecated with statement.

Maybe we can sometimes allow shooting ourselves in the foot when we really want it.
Some kind of "use sloppy; yes I know what I'm doing" pragma ;)