Let's say you need to fetch and process a lot of data from a remote server. It could be any REST API endpoint, but for demonstration purposes I'll use JSONPlaceholder. I'll use Node.js, but the same principles apply in the browser.
JSONPlaceholder has a photos endpoint that returns simple JSON data for a given photo ID:
{
"albumId": 1,
"id": 1,
"title": "accusamus beatae ad facilis cum similique qui sunt",
"url": "https://via.placeholder.com/600/92c952",
"thumbnailUrl": "https://via.placeholder.com/150/92c952"
}
I'll use a helper function, getIdList, to generate an array with the required number of IDs.
const getIdList = n => [...new Array(n)].map((item, i) => i + 1);
getIdList(5); // [1,2,3,4,5]
axios will help us fetch the data:
const axios = require('axios');

function fetchPhoto(id) {
  const url = `https://jsonplaceholder.typicode.com/photos/${id}`;
  return axios.get(url)
    .then(res => res.data);
}
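For instance (an illustrative call, not from the original post), fetching a single photo and logging its title:

fetchPhoto(1).then(photo => console.log(photo.title));
// "accusamus beatae ad facilis cum similique qui sunt"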
All at once
My first idea for handling thousands of requests was to start all of them in parallel and process the results once every request had completed.
function all(items, fn) {
  const promises = items.map(item => fn(item));
  return Promise.all(promises);
}
It works great for a small number of items. Making 10, 50, or 100 requests at the same time seems like a good idea: fetching 10 items in parallel on a good connection takes less than a second.
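For example, the helpers compose like this (an illustrative call; the logging is just for demonstration):

all(getIdList(10), fetchPhoto)
  .then(photos => console.log(`Fetched ${photos.length} photos`))
  .catch(err => console.error(err));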
But what about 2000 items?
Chances are you'll hit a problem like
(node:6740) UnhandledPromiseRejectionWarning: Error: read ECONNRESET
or
(node:3808) UnhandledPromiseRejectionWarning: Error: connect ETIMEDOUT
or
(node:7052) UnhandledPromiseRejectionWarning: Error: Client network socket disconnected before secure TLS connection was established
The point is that Node can't handle that many connections at the same time, so we need to rethink the solution.
One by one
Another option is to solve the problem step by step: start the next request only after the previous one has resolved.
function series(items, fn) {
  let result = [];
  return items.reduce((acc, item) => {
    acc = acc.then(() => {
      return fn(item).then(res => result.push(res));
    });
    return acc;
  }, Promise.resolve())
    .then(() => result);
}
Now it takes 4-5 seconds to fetch 10 items instead of under a second in the previous example. But requesting 2000 items no longer fails, so it's a partial success. Still, how can we improve the completion time?
Divide and conquer
Let's take the best parts of both solutions and combine them. We'll split all requests into chunks and fetch the chunks one by one. Feel free to experiment with the chunk size; for this example, 50 requests per chunk seems fine.
function splitToChunks(items, chunkSize = 50) {
  const result = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    result.push(items.slice(i, i + chunkSize));
  }
  return result;
}
function chunks(items, fn, chunkSize = 50) {
  let result = [];
  const chunks = splitToChunks(items, chunkSize);
  return series(chunks, chunk => {
    return all(chunk, fn)
      .then(res => result = result.concat(res));
  })
    .then(() => result);
}
Awesome! Now we can handle a lot of requests in a manageable amount of time.
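For completeness, the final approach for the 2000-item case can be invoked like this (an illustrative call, mirroring the usage above):

chunks(getIdList(2000), fetchPhoto, 50)
  .then(photos => console.log(`Fetched ${photos.length} photos`))
  .catch(err => console.error(err));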
Conclusion
Results for fetching 2000 items:
all: ❌ Fetch error
series: ✅ 11 min 35 s
chunks: ✅ 1 min 12 s
Top comments
Thanks for the article. I was looking for an article on handling a huge number of API calls to the maps.google.com autocomplete API, and I'm glad I found this one: very well written, with step-by-step performance optimisation.
But even though I'm using the above pattern, I'm getting a getaddrinfo ENOTFOUND error. I'm making a huge number of API calls to the maps.google.com API.
So can you suggest a way, within the above pattern, to retry the failed requests?
Also, is there a GitHub repo for the above code?
Great question!
Handling errors heavily depends on your particular use case. I can think of at least three ways.
Here's a simplified example of the third option, using the series helper for simplicity: a fake API request that may fail, which we keep retrying until it succeeds.
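The original snippet isn't reproduced here, so the following is only a sketch of that idea under a few assumptions: a hypothetical fakeRequest that fails at random, a fetchWithRetry wrapper that retries recursively, and the series helper from the article driving it.

// Hypothetical flaky request: resolves with the id or rejects roughly half the time
function fakeRequest(id) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      Math.random() > 0.5 ? resolve(id) : reject(new Error(`Request ${id} failed`));
    }, 100);
  });
}

// Keep retrying a single request until it succeeds
function fetchWithRetry(id) {
  return fakeRequest(id).catch(() => fetchWithRetry(id));
}

// Process items one by one, retrying each until it succeeds
series(getIdList(5), fetchWithRetry)
  .then(result => console.log(result)); // e.g. [1, 2, 3, 4, 5]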
Thanks for the article, the content is very good. I have the impression that the functions you use to generate the chunks could be npm modules; do you suggest any in particular?
Thanks. Actually, I saw a similar module with promise patterns on npm several months ago, but I can't find it now after googling for a couple of minutes.
In my projects I use these promise helpers directly, without installing a module from npm.
Great article. I have a recursive function that makes the requests (it crawls a website), so I cannot know all the requests to be made beforehand. How do I go about it?
I'd take the easiest path first: make the requests recursively, one by one. If performance is not OK, then try to make requests concurrent where possible.
I'm making the requests one by one, but I still get the connection reset and timeout errors.
Well, a timeout error may occur naturally, considering that your app crawls websites. You need to catch such errors and handle them appropriately.
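As a rough illustration of the advice in this thread (a sketch only; fetchPage and extractLinks are hypothetical helpers), a sequential crawler with basic error handling might look like this:

async function crawl(url, visited = new Set()) {
  if (visited.has(url)) return visited;
  visited.add(url);
  try {
    const page = await fetchPage(url); // e.g. axios.get(url).then(res => res.data)
    for (const link of extractLinks(page)) {
      await crawl(link, visited); // sequential: one request at a time
    }
  } catch (err) {
    console.error(`Failed to crawl ${url}:`, err.message); // handle timeouts, resets, etc.
  }
  return visited;
}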
I wrote this post because this research helped me solve problems in my day-to-day job. It doesn't have answers to every question, but it can be used as a reference and adapted to your specific situation.
I can't get this to work recursively with your demo. Is there another condition to consider?
thank you so much for writing this. saved my day!