Laravel "chunk" method is widely used to split large query into smaller chunks when processing large database. But there is a trick if the attribute used in query would be changed in call back function. It is easy to be ignored and thus often be misused.
Problem
Post::where('processed', 0)->chunk(100, function($posts) {
foreach($posts as $post) {
$post->processed = 1;
$post->save();
}
});
Assumption: the total number of unprocessed posts is 400 (Id is 1 - 400).
The problem is: Only half of the posts are processed.
Analyzation
Each chunk is being fetched via single query with limit and offset. Let’s have a look how chunk works.
In the beginning, unprocessed posts are 400 (Ids 1- 400), for 1st chunk query: select * from `posts` where `processed` = 0 limit 100 offset 0
, Ids 1-100 are fetched.
After 1st chunk, unprocessed posts are 300 (Ids 101- 400), for 2nd chunk query: select * from `posts` where `processed` = 0 limit 100 offset 100
, Ids 201-300 are fetched.
After 2nd chunk, unprocessed posts is 200 (Ids 101- 200 and 301- 400), for 3rd chunk query: select * from `posts` where `processed` = 0 limit 100 offset 200
, bummmm ... none of post is fetched.
The result is: 200 posts are processed, but 200 posts are left unprocessed.
Solution:
"chunkById" is recommended to replace "chunk" in this situation. "chunkById" is similar to "chunk". The only difference is how raw SQL is being constructed. "chunkById" uses id
and limit
while "chunk" uses limit
and offset
.
The modified code would be like:
Post::where('processed', 0)->chunkById(100, function($posts) {
foreach($posts as $post) {
$post->processed = 1;
$post->save();
}
});
When using "chunkById", each chunk query sql sentence would be like below:
1st chunk query: select * from `posts` where `processed` = 0 and `id` > 0 order by id asc limit 100
2nd chunk query: select * from `posts` where `processed` = 0 and `id` > 100 order by id asc limit 100
3rd chunk query: select * from `posts` where `processed` = 0 and `id` > 200 order by id asc limit 100
4th chunk query: select * from `posts` where `processed` = 0 and `id` > 300 order by id asc limit 100
Therefore the fetch result would be 1-100, 101- 200, 201 -300, 301- 400 in sequence, and all of the posts are processed.
Top comments (1)
Good insight, thank you