Max Svidlo

Posted on Dec 16, 2023

Is JavaScript's Object.groupBy() worth the hype? Let's find out

#node #javascript #benchmark

With the release of Node.js 21, we get an exciting new method that should make grouping data more straightforward: Object.groupBy().

As the name suggests, this method takes an iterable (such as an array) and groups its elements by the key that a callback function returns for each one. Using it looks like this:



const people = [
  {"first_name":"Dotti","birth_year":1985},
  {"first_name":"Gratia","birth_year":1992},
  {"first_name":"Robert","birth_year":1985},
  {"last_name":"Versey","birth_year":1992}
]

const peopleGroupedByBirthYear = Object.groupBy(people, (person) => person.birth_year)

console.log(peopleGroupedByBirthYear)
/*
{
  "1985":[
    {"first_name":"Dotti","birth_year":1985},
    {"first_name":"Robert","birth_year":1985}
  ],
  "1992":[
    {"first_name":"Gratia","birth_year":1992},
    {"last_name":"Versey","birth_year":1992}
  ]
}
*/

But does this method offer anything beyond syntactic sugar? Does it also improve runtime performance? Let's take a deeper look.

Technicalities

To test this new functionality at different scales, I prepared datasets with 1k, 10k, 100k, and 1m users. Each user in those datasets was given a birth year between 1900 and 2000, so that the number of groups stays small and each group is a bit larger.
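A dataset like the ones described above could be generated along these lines. This is only a sketch: the function name `generateUsers`, the `user_<n>` naming scheme, and the uniform random distribution are assumptions for illustration, not the script from the benchmark repository.

```javascript
// Hypothetical generator: the field names mirror the example earlier in the post,
// everything else here is an assumption made for illustration.
function generateUsers(count, minYear = 1900, maxYear = 2000) {
  const users = new Array(count);
  for (let i = 0; i < count; i++) {
    // Uniform random integer in [minYear, maxYear], inclusive on both ends.
    const birthYear =
      minYear + Math.floor(Math.random() * (maxYear - minYear + 1));
    users[i] = { first_name: `user_${i}`, birth_year: birthYear };
  }
  return users;
}

// Smallest dataset size from the post; the larger ones only change the count.
const users = generateUsers(1_000);
console.log(users.length, users[0]);
```

With at most 101 distinct birth years, even the 1k dataset averages roughly ten users per group, and the group sizes grow linearly with the dataset.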

For the benchmark, I compared it mostly against common loops and array methods, to see which one would shine the brightest.
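For readers who have not written grouping code by hand, here is roughly what a loop-based and a reduce-based alternative can look like. The helper names `groupWithForOf` and `groupWithReduce` are hypothetical, and these are not necessarily the exact implementations used in the benchmark repository.

```javascript
// Grouping with a plain for...of loop.
// Object.create(null) mirrors the null-prototype object Object.groupBy() returns
// and avoids clashes with inherited keys such as "constructor".
function groupWithForOf(items, keyFn) {
  const groups = Object.create(null);
  for (const item of items) {
    const key = keyFn(item);
    // Create the bucket on first use, then append to it.
    (groups[key] ??= []).push(item);
  }
  return groups;
}

// The same idea expressed with Array.prototype.reduce.
function groupWithReduce(items, keyFn) {
  return items.reduce((groups, item) => {
    const key = keyFn(item);
    (groups[key] ??= []).push(item);
    return groups;
  }, Object.create(null));
}

const people = [
  { first_name: "Dotti", birth_year: 1985 },
  { first_name: "Gratia", birth_year: 1992 },
  { first_name: "Robert", birth_year: 1985 },
];

const byYear = groupWithReduce(people, (person) => person.birth_year);
console.log(Object.keys(byYear)); // [ '1985', '1992' ]
console.log(byYear["1985"].length); // 2
```

Both helpers walk the array once and push into a bucket per key, which is the same shape of work Object.groupBy() does internally, so any runtime difference comes from the engine rather than from the algorithm.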

Additionally, every function was run multiple times against each dataset in a Docker container limited to a single CPU core and 512 MB of RAM. This setup was meant to keep the rest of my machine from influencing the results. The runs were repeated to calculate a cumulative average runtime for each function. Now, let's explore some charts based on these evaluations.
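The repeated-run averaging described above can be sketched as follows. The helper name `averageRuntimeMs`, the run count, and the sample data are assumptions, and the resource limits would be applied outside the script (for example with Docker's `--cpus` and `--memory` flags; the exact invocation is not given in this post).

```javascript
// Hypothetical harness: calls fn `runs` times and returns the mean duration in ms.
function averageRuntimeMs(fn, runs = 20) {
  let total = 0;
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    total += performance.now() - start;
  }
  return total / runs;
}

// Deterministic sample data: 10k users spread over the years 1900 to 2000.
const sample = Array.from({ length: 10_000 }, (_, i) => ({
  birth_year: 1900 + (i % 101),
}));

const candidates = {
  reduce: () =>
    sample.reduce((groups, user) => {
      (groups[user.birth_year] ??= []).push(user);
      return groups;
    }, Object.create(null)),
};

// Object.groupBy() only exists on newer runtimes (Node.js 21+), so guard for it.
if (typeof Object.groupBy === "function") {
  candidates.groupBy = () => Object.groupBy(sample, (user) => user.birth_year);
}

for (const [name, fn] of Object.entries(candidates)) {
  console.log(`${name}: ${averageRuntimeMs(fn).toFixed(3)} ms on average`);
}
```

A simple mean like this is sensitive to JIT warm-up and garbage collection pauses, so the first few runs are often slower than the rest. Discarding some warm-up runs or reporting the median alongside the mean makes results easier to compare.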

On a dataset with 1k users, we can see that Object.groupBy() is indeed the fastest, beating the other implementations for grouping data (which is pretty neat, to be honest). I like it, but will it hold up on larger datasets? Let's find out:

With 10k users, Object.groupBy() really starts to shine, beating the other loops and functions by around 50% on average (really nice!). But is it blazingly fast?

On a dataset with 100k users, Object.groupBy() is starting to run out of fuel, but it still outperforms the other implementations! Is it going to win the race with a dataset of 1 million users?

Ouch, it seems that Object.groupBy() falls behind on larger datasets, with reduce taking its place. Why is that, you ask? Well, that's for another article.

Conclusion

Object.groupBy() looks to be a promising new (and almost blazingly fast) addition for small to medium-sized datasets, which is neat! For the average use case, this new feature should improve your existing code base. For larger datasets (around 1 million items in this benchmark), that's not the case.

Tell me what you think in the comments below :)

https://github.com/svidlak/groupby-benchmark
(The 1 million users file is zipped because it exceeds GitHub's file size limit.)

Top comments (13)

Lev N. • Dec 17 '23

I mean, I’ve been using maps in JS for years. groupBy is a convenience.

Max Svidlo • Dec 17 '23

Well, now it's also a performance increase and not just syntactic sugar, which makes it even better

Lev N. • Dec 17 '23

No doubt, but it’s nothing new.

lionel-rowe • Dec 17 '23

I don't think there's any "hype" about Object.groupBy being super performant. It's just a convenience method. If it happens to be slightly more performant for some datasets, that's great, but I wouldn't rely on any performance gains being stable between different engines, different versions of the same engine, different hardware, or even necessarily different runs on the exact same setup.

The key advantage is that it gives better ergonomics. Wanting to split an array into groups is a pretty common use case that didn't have an ergonomic way of doing it before now. For example, if you had a list of users that you wanted to group into admin and non-admin users, you either had to initialize variables for both then iterate with a for loop (a bit messy and verbose, especially with the array initialization in TypeScript), or you had to iterate through the array twice with Array#filter (wasteful as you only really wanted 1 iteration). Now it's simple:

Object.groupBy(users, (user) => user.role === 'admin' ? 'admin' : 'non-admin')

It's also pretty versatile — e.g. you can use it to segment an array into groups of a given size:

const GROUP_SIZE = 42
const arr = new Array(100)

Object.values(Object.groupBy(arr, (_, i) => Math.floor(i / GROUP_SIZE)))
// [Array(42), Array(42), Array(16)]

CitronBrick • Dec 17 '23

Actually, I needed this method this week & came upon this article, & discovered groupBy.

Mike Stemle • Dec 16 '23

Showing performance graphs without posting the code makes them hard to understand.

Max Svidlo • Dec 16 '23

I got you covered @manchicken
github.com/svidlak/groupby-benchmark

Mike Stemle • Dec 16 '23

Thanks! It’s always good to post the code with these things. Well-done.

Max Svidlo • Dec 16 '23

This is the first article I've ever written, so mistakes are unavoidable

Mike Stemle • Dec 16 '23

No doubt. My advice: learn to love your mistakes.

Eckehard • Dec 17 '23

We should also ask whether it's worth blowing up the language core to infinity just to add some convenience functions (see also this post). I would love this feature (if I ever used it) if it were part of a library. But too many features may also be confusing.

Max Svidlo • Dec 17 '23

Well, every language has to go through some kind of evolution in order to improve itself, whether it's via internal optimization or via the introduction of new APIs
(the same way ES6 introduced arrow functions and other features that replaced some lodash / rambda dependencies).

It's important for JavaScript to keep evolving and improving, as the whole client-side JavaScript ecosystem is evolving nonstop, so the language has to support those new needs.