Overview
In the engineering process we often face cases where we need to query all the data that belongs to some parent records. For example, say there is a users table with a one-to-many relationship to a transactions table, and you need to fetch all the users and their transactions for a list of user_ids passed as an argument. The simple approach is to fetch all the users whose id is included in user_ids, then fetch the transactions one by one:
def load_data(user_ids)
  result = []
  users = User.where(id: user_ids)
  users.each do |user|
    result << { user: user, transactions: user.transactions }
  end
  result
end
It's really simple logic, but is it good enough? Is it bad? Can our service endure high throughput with it? Is there any way to make it more efficient?
Look Inside the Query
Let's say we have these models:
class User < ApplicationRecord
  has_many :transactions
end

class Transaction < ApplicationRecord
  belongs_to :user
end
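The post doesn't show the underlying schema, but a minimal setup along these lines is assumed; the column names (name, amount) are only illustrative, while transactions.user_id is what the belongs_to association relies on:

# Assumed schema for the example (not shown in the original post).
class CreateUsersAndTransactions < ActiveRecord::Migration[7.0]
  def change
    create_table :users do |t|
      t.string :name
      t.timestamps
    end

    create_table :transactions do |t|
      # Adds a user_id column plus an index and foreign key to users.
      t.references :user, null: false, foreign_key: true
      t.decimal :amount
      t.timestamps
    end
  end
end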
We're going to simulate the query in the Rails console (run: rails console).
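For example, after pasting the load_data method into the console session, we can call it with a few ids that exist locally (1, 2, and 3 here match the log below):

# In the Rails console, with load_data already defined in the session:
load_data([1, 2, 3])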
As we can see from the logged queries below, the load_data method issues 4 queries to the database. The first one fetches all the users matching user_ids, and the other 3 fetch the transactions for each user.
SELECT "users".* FROM "users" WHERE "users"."id" IN (?, ?, ?) [["id", 1], ["id", 2], ["id", 3]]
SELECT "transactions".* FROM "transactions" WHERE "transactions"."user_id" = ? /* loading for inspect */ LIMIT ? [["user_id", 1], ["LIMIT", 11]]
SELECT "transactions".* FROM "transactions" WHERE "transactions"."user_id" = ? /* loading for inspect */ LIMIT ? [["user_id", 2], ["LIMIT", 11]]
SELECT "transactions".* FROM "transactions" WHERE "transactions"."user_id" = ? /* loading for inspect */ LIMIT ? [["user_id", 3], ["LIMIT", 11]]
What happens if user_ids is very large? Will we run one transactions query for every user we have? We are now facing the N+1 queries problem.
N+1 Query Problem?
This is a common problem in database querying: the code executes one query per record instead of just one or two queries in total. In the example above we fetch the three users, then query the transactions for each user, which adds up to 4 queries (1 + 3). If there are N users, we first fetch the N users and then query the transactions for each of them, hence the name N+1 queries.
The problem with N+1 queries is that every query takes some amount of time: the more data we fetch, the more time we need, and we may even hit timeout issues. N+1 queries are bad for performance, and we need to find a solution.
*We can ignore the N+1 queries if the data set is small or the throughput is low.
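To see the growth concretely, here is a minimal sketch (the count_queries helper is ours, not part of the original post) that counts the SQL statements a block issues via Active Record's sql.active_record notifications:

# Count how many real SQL statements are executed inside the given block.
def count_queries
  count = 0
  counter = lambda do |_name, _start, _finish, _id, payload|
    # Skip schema lookups and cached queries so only real statements count.
    count += 1 unless payload[:name] == "SCHEMA" || payload[:cached]
  end
  ActiveSupport::Notifications.subscribed(counter, "sql.active_record") { yield }
  count
end

# user.transactions is lazy, so call #to_a to force it to load, the same way
# the console's inspect did in the log above.
count_queries { load_data([1, 2, 3]).each { |row| row[:transactions].to_a } }
# => 4 (1 users query + 3 transactions queries); with N users it becomes 1 + N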
Eager Load
In Rails, Active Record gives us an eager loading mechanism to load the data and its associations in a small, fixed number of queries instead of one per record. One of the methods that triggers eager loading is .includes:
def load_data_with_eager_load(user_ids)
  result = []
  users = User.includes(:transactions).where(id: user_ids)
  users.each do |user|
    result << { user: user, transactions: user.transactions }
  end
  result
end
The method above gives the same result as before, but if we look closely, load_data_with_eager_load only triggers two queries. The first query fetches all the users and the second fetches the transactions for all of those users at once:
SELECT "users".* FROM "users" WHERE "users"."id" IN (?, ?, ?) [["id", 1], ["id", 2], ["id", 3]]
SELECT "transactions".* FROM "transactions" WHERE "transactions"."user_id" IN (?, ?, ?) [["user_id", 1], ["user_id", 2], ["user_id", 3]]
This reduces the number of database queries significantly: with eager loading it only costs 2 queries no matter how much data we have.
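For comparison, running the same hypothetical count_queries helper from the earlier sketch against the eager-loaded version confirms the fixed cost:

# The associations are already loaded here, so #to_a triggers no extra queries.
count_queries { load_data_with_eager_load([1, 2, 3]).each { |row| row[:transactions].to_a } }
# => 2 (1 users query + 1 transactions query), no matter how many ids we pass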