Memoization is a useful concept. It helps avoid time taking or expensive calculations, after it's been done once. Applying memoization to a synchronous function is relatively straightforward. This article aims to introduce the overall concept behind memoization and then dive into the problems and their solutions while trying to memoize async functions.
Memoization
Let's start with memoizing a pure function. Let's say we have a function called getSquare
, which returns square of the given:
function getSquare(x){
return x * x
}
To memoize this we can do something like this:
const memo = {}
function getSquare(x){
if(memo.hasOwnProperty(x)) {
return memo[x]
}
memo[x] = x * x
return memo[x]
}
So, with few lines of code we've memoized our getSquare
function.
Let's create a memoize
helper. It would accept a pure function as the first argument and agetKey
function (which returns a unique key given argument of the function) as the second argument, to return a memoized version of the function:
function memoize(fn, getKey){
const memo = {}
return function memoized(...args){
const key = getKey(...args)
if(memo.hasOwnProperty(key)) return memo[key]
memo[key] = fn.apply(this, args)
return memo[key]
}
}
We can apply this function to getSquare
as follows:
const memoGetSquare = memoize(getSquare, num => num)
Memoizing a function accepting multiple arguments:
const getDivision = (a, b) => a/b
// memoizing using the helper
const memoGetDivision= memoize(getDivision, (a, b) => `${a}_${b}`)
Memoizing async functions
Let's say there' a function called expensiveOperation(key)
which accepts a key as an argument and performs some async operation before returning the final result via a callback:
// does some async operation and invokes the callback with final result
expensiveOperation(key, ( data) => {
// Do something
})
Let's use similar notion as above to memoize this function:
const memo = {}
function memoExpensiveOperation(key, callback){
if(memo.hasOwnProperty(key)){
callback(memo[key])
return
}
expensiveOperation(key, data => {
memo[key] = data
callback(data)
})
}
So that was pretty easy. But wait! It doesn't solve the whole problem yet. Consider the following scenario:
1- Invoked expensiveOperation
with key 'a'
2- While #1 is still in progress, invoked it again with same key
The function would run twice for the same operation, because #1 is yet to save the final data in memo
. That's not something we wanted. We would instead want concurrent calls to be resolved at once, after the earliest call is complete. Let's see how we can accomplish this:
const memo = {}, progressQueues = {}
function memoExpensiveOperation(key, callback){
if(memo.hasOwnProperty(key)){
callback(memo[key])
return
}
if(!progressQueues.hasOwnProperty(key)){
// processing new key, create an entry for it in progressQueues
progressQueues[key] = [callback]
} else {
// processing a key that's already being processed, enqueue it's callback and exit.
progressQueues[key].push(callback);
return
}
expensiveOperation(key, (data) => {
// memoize result
memo[key] = data
// process all the enqueued items after it's done
for(let callback of progressQueues[key]) {
callback(data)
}
// clean up progressQueues
delete progressQueue[key]
})
}
We can go a step further, just like the last section, and create a re-usable helper say memoizeAsync
:
function memoizeAsync(fn, getKey){
const memo = {}, progressQueues = {}
return function memoized(...allArgs){
const callback = allArgs[allArgs.length-1]
const args = allArgs.slice(0, -1)
const key = getKey(...args)
if(memo.hasOwnProperty(key)){
callback(key)
return
}
if( !progressQueues.hasOwnProperty(key) ){
// processing new key, create an entry for it in progressQueues
progressQueues[key] = [callback]
} else {
// processing a key that's already being processed, enqueue it's callback and exit.
progressQueues[key].push(callback);
return
}
fn.call(this, ...args , (data) => {
// memoize result
memo[key] = data
// process all the enqueued items after it's done
for(let callback of progressQueues[key]) {
callback(data)
}
// clean up progressQueues
delete progressQueue[key]
})
}
}
// USAGE
const memoExpensiveOperation = memoizeAsync(expensiveOperation, key => key)
Promises
Let's say we have a function processData(key)
which accepts a key as argument and returns a Promise. Let's see how it can be memoized.
Memoizing the underlying promise:
Simplest way would be to memoize the promise issued against the key. Here's how it would look like:
const memo = {}
function memoProcessData(key){
if(memo.hasOwnProperty(key)) {
return memo[key]
}
memo[key] = processData(key) // memoize the promise for key
return memo[key]
}
The code is fairly simple and self explanatory here. In fact, we can use the memoize
helper we created a while ago:
const memoProcessData = memoize(processData, key => key)
Can we memoize the value returned by Promise?
Yes. We can apply the same approach as the callback here. Though it might be an overkill for the sake of memoizing such a function:
const memo = {}, progressQueues = {}
function memoProcessData(key){
return new Promise((resolve, reject) => {
// if the operation has already been done before, simply resolve with that data and exit
if(memo.hasOwnProperty(key)){
resolve(memo[key])
return;
}
if( !progressQueues.hasOwnProperty(key) ){
// called for a new key, create an entry for it in progressQueues
progressQueues[key] = [[resolve, reject]]
} else {
// called for a key that's still being processed, enqueue it's handlers and exit.
progressQueues[key].push([resolve, reject]);
return;
}
processData(key)
.then(data => {
memo[key] = data; // memoize the returned data
// process all the enqueued entries after successful operation
for(let [resolver, ] of progressQueues[key])
resolver(data)
})
.catch(error => {
// process all the enqueued entries after failed operation
for(let [, rejector] of progressQueues[key])
rejector(error);
})
.finally(() => {
// clean up progressQueues
delete progressQueues[key]
})
})
}
Further improvement
Since we're using an memo
object to keep track of memoized operations, with too many calls to expensiveOperation
with various keys(and each operation returning a sizeable chunk of data post processing) the size of this object may grow beyond what's ideal. To handle this scenario we can use a cache eviction policy such as LRU (Least Recently Used). It would ensure we're memoizing without crossing memory limits!
This article has been originally published at StackFull.dev. If you enjoyed reading this, you may want to opt for my newsletter. It would let me reach out to you whenever I publish a new thought!
Top comments (9)
Good stuff! I like that you pointed out the posible memory problems and LRU
I'd recommend using a Map or a WeakMap as memo, instead of object. Also, I'd recommend using some sort of lock to avoid executing the underlying "expensive" function while another instance is running for the same parameters.
Yes. WeakMap would be a good choice if you are using objects as keys, it would facilitate better garbage collection in this case. If your keys are strings, using object should be fine.
Why do the keyed value need to be handled by a promise? Isn’t the keyed value already calculated? An async function can both return a value or a promise.
I guess you're re-iterating the "Memoizing the underlying promise" section. Yes, for promises memoizing the actual value itself is bit of an overkill, but this approach goes beyond just promises, as explained in the article.
Also re-ordered the sections to convey the idea a bit better.
Thanks, the new order was much clearer and the last example was really interesting and enlightning. Promises are really a powerful tool then you can wrap your head arounds them.
Instead of keeping a progress queue, you could just store the Promise itself in the cache.
Yes. That's already covered in "Memoizing the underlying promise:" section.
We can also benefit from closures by currying the functions with an argument.