Andrew Durber

Posted on Jun 10, 2020 • Edited on Mar 3, 2023

In search of a faster uniqueBy

#javascript #performance

In a project I'm working on I'm trying to keep it as lean as possible, which means I haven't reached for libraries like Lodash. Instead, I've challenged myself to hand-roll everything I need.

I needed to get an array of items unique by a given key, just like the Lodash uniqBy. I had a quick Google around to see how other folks are approaching it.

I came across the following approach:

function uniqueBy(myArr, prop) {
  // Iterate over the array and filter out duplicates
  return myArr.filter((obj, pos, arr) => {
    // Map over the array and get the values from the key.
    return arr.map(mapObj => mapObj[prop]).indexOf(obj[prop]) === pos
  })
}

While this works, I was not too fond of mapping inside the filter. So I set up some tests around my function and started created benchmarks on jsPerf.

With an array of 10,000 items, this had a whopping 0.63 ops/sec. Zoinks.

Note

I just want to clarify, I'm in no way trying to criticise the author of the above function. Most people can use that and not notice any negative performance impact.

Iteration 1

So I thought, what if I moved the map outside of the filter?

function uniqueBy(myArr, prop) {
  // Get all values for the prop up front
  const vals = myArr.map(obj => obj[prop])
  return myArr.filter((obj, pos, arr) => {
    return vals.indexOf(obj[prop]) === pos
  })
}

Result: 3,067 ops/sec
Extracting the map outside of the filter had much better results, relatively speaking.

Iteration 2

Keeping the same vibe, I moved onto Array.prototype.findIndex

function uniqueBy(arr, prop) {
  return arr.filter((record, index, self) => {
    // Instead of getting the values, just get the index where the predicate is true.
    return index === self.findIndex(t => t[prop] === record[prop])
  })
}

Result: 6,962 ops/sec

But this is much of muchness; this will still make multiple passes over the array—time to whip out the old trusty loops without predicates.

Iteration 3

function uniqueBy(arr, prop) {
  const len = arr.length // get the length up front to ensure it's only accessed once

  const data = [] // This will be our return data

  const seen = [] // This is a collection of values we've already seen

  for (let i = 0; i < len; i++) {
    // Get the things I care about here to only access the properties once.
    const item = arr[i] // The current array item

    const val = item[prop] // The current items' value that we want to unique by

    // If there's no record of this in "seen", push it to seen and add it to our return array
    // What's with the tilde? Since indexOf returns a number between -1 and N, the tilde (~) is used to convert that value into a boolean. It's the bitwise NOT operator. Link at the bottom.
    if (!~seen.indexOf(val)) {
      // Mark this value as seen
      seen.push(val)
      // Add the value to the return array
      data.push(item)
    }
  }

  return data
}

Result: 15,196 ops/sec swoon

So we managed to get rid of the predicate callbacks, our tests still pass, and it's faster. Now we're getting somewhere.
It's somewhat less readable than previous iterations, but that's not my goal. We could stop here, but I think we can squeeze some more out of this.

Iteration 4

What if we use a Set? They're pretty nifty right:

function uniqueBy(arr, prop) {
  const len = arr.length
  const data = []
  const seen = new Set() // Create a Set
  for (let i = 0; i < len; i++) {
    const item = arr[i]
    const val = item[prop]
    if (!seen.has(val)) {
      // Check if the set has the value
      seen.add(val)
      data.push(arr[i])
    }
  }

  return data
}

Result: 11,133 ops/sec

Wait a minute! That's slower than the previous one. Wha-, ugh-, but it's nifty! Ah well, on we go.

Iteration 5

After perusing over some benchmarks on loops, I saw that a while loop vastly outperformed a for loop.

function uniqueBy(arr, prop) {
  const len = arr.length
  const record = []
  const seen = []
  let cursor = 0
  while (cursor < len) {
    const item = arr[cursor]
    const val = item[prop]
    if (!~seen.indexOf(val)) {
      seen.push(val)
      record.push(item)
    }
    cursor++
  }
  return record
}

Result:: 15,164 ops/sec

Boom! A while loop made this one is our fastest one yet, but even less readable.

Iteration 6

Hmm, from the loop benchmarks, decrementing is faster than incrementing, how does that look?

function uniqueBy(arr, prop) {
  let len = arr.length
  const record = []
  const seen = []
  while (len--) {
    const item = arr[len]
    const val = item[prop]
    if (!~seen.indexOf(val)) {
      seen.push(val)
      record.push(item)
    }
  }
  return record
}

Result: 15,535 ops/sec

CAVEAT: We've lost the original order of the array.

These are marginal gains here vs the previous iteration.

Iteration 7

If there's one thing I know about JavaScript, it's that property access is fast. seen doesn't need to be an array, what if we just kept a dictionary of seen keys?

function uniqueBy(arr, prop){
  const len = arr.length
  let cursor = 0
  const record = []
  const seen = {}
  while (cursor < len) {
    const item = arr[cursor]
    const val = item[prop]
    if (!seen[val]) {
      seen[val] = 1
      record.push(item)
    }
    cursor++
  }
  return record
}

Result: 24,970 ops/sec

The best one yet!

Iteration 8

Okay after doing some more research on loops, I came across this little number

function uniqueBy(arr, prop){
  const record = []
  const seen = {}
  for (let i = 0, len = arr.length; i < len; ++i) { // Notice the len = arr.length
    const item = arr[i]
    const val = item[prop]
    if (!seen[val]) {
      seen[val] = 1
      record.push(item)
    }
  }
}

Result: 26,390 ops/sec

Hmm, this is the winner (so far). But why? Didn't we find the while loop faster? All that's happening is the len = array.length is just caching the length. We were already doing that?

All I can think is happening is something to do with Locality of Reference. I have no formal Computer Science, and I'm not a particularly smart man. If someone can explain to me why this is faster, please comment 😅

I recreated these tests on ESBench here: ESBench Results if that's more your cup of tea.

Bonus

Here are some other variations I tested with negligible performance gains/losses:

++cursor vs cursor++

function uniqueBy(arr, prop) {
  const len = arr.length
  let cursor = -1
  const record = []
  const seen = []
  while (++cursor < len) {
    const item = arr[cursor]
    const val = item[prop]
    if (!~seen.indexOf(val)) {
      seen.push(val)
      record.push(item)
    }
  }
  return record
}

Reducing variables (💩 )

function uniqueBy(arr, prop) {
  const len = arr.length
  let cursor = -1
  const record = []
  const seen = []
  while (++cursor < len) {
    if (!~seen.indexOf(arr[cursor][prop])) {
      seen.push(arr[cursor][prop])
      record.push(arr[cursor])
    }
  }
  return record
}

Summary

This whole process is mostly a fruitless endeavour. We could have stopped at iteration number 3 and put our feet up; however, I just wanted to see how fast we could make it. I'm glad I carried on since I found the seen object approach.

You do not need to do this in your applications. You should only go this deep down the rabbit hole (and arguably further), if you're experiencing performance issues.

If you have a faster way, please ping me on Twitter @moistmakerr or comment. I'd love to know how fast we can push this.

DEV Community

In search of a faster uniqueBy

Iteration 1

Iteration 2

Iteration 3

Iteration 4

Iteration 5

Iteration 6

Iteration 7

Iteration 8

Bonus

++cursor vs cursor++

Reducing variables (💩 )

Summary

Resources

Top comments (0)

Read next

Mastering Redux Toolkit: Simplify State Management in Your React App

Progressive Web Applications: Your Ultimate Guide to Modern Web Development

Database Deadlocks: Causes, Examples, and Solutions

Downsize your JavaScript: Mastering Bundler Optimizations