So far we've been using joins to support filtering based on related associations. But what if we not only want to filter but also access and use data from those associations?
Let's bring back the domain models we've been using during this series. This time Booking
and Rating
will be enough to illustrate some examples:
class Booking < ApplicationRecord
has_many :ratings
end
class Rating < ApplicationRecord
belongs_to :booking
end
Booking.all
Id | Check In | Check Out |
---|---|---|
62 | 20 Jun 2020 | 23 Jun 2020 |
63 | 07 Aug 2021 | 08 Aug 2021 |
64 | 22 May 2021 | 28 May 2021 |
65 | 01 Apr 2021 | 02 Apr 2021 |
Rating.all
Id | Rate | Comments | Booking Id |
---|---|---|---|
25 | 4 | 'Amazing' | 62 |
26 | 3 | 'Coming back soon' | 64 |
27 | 5 | 'friendly staff' | 63 |
pluck
vs select
Get a list of bookings that have been rated. The list should contain booking ids
with their corresponding rating comments
.
A simple way to return this list could be to join ratings to bookings and pluck
the id
values from bookings
and the corresponding comments
values from ratings
. Remember that we can pass an SQL
string to explicitly say which <table>.<column>
we'd like to pluck:
Booking.joins(:ratings).pluck('bookings.id', 'ratings.comments')
In SQL terms, this means that we're inner joining ratings and selecting only the two columns that we need, id
from bookings
table and the comments
from the ratings
table.
SELECT "bookings"."id", "ratings"."comments"
FROM "bookings"
INNER JOIN "ratings"
ON "ratings"."booking_id" = "bookings"."id"
Since we're using pluck
, the output is an array of arrays with the id
, comments
combinations:
[[62, 'amazing'], [63, 'friendly staff'], [64, 'Coming back soon']]
If we'd like to transform this into an index/dictionary structure that is easier to read and consult we can use the .to_h
method:
Booking.joins(:ratings).pluck('bookings.id', 'ratings.comments').to_h
There, much better:
{ 62 => 'Amazing',
63 => 'friendly staff',
64 => 'Coming back soon' }
So pluck
returns the data prepared for us in a ruby array object. If we'd rather return an ActiveRecord collection, we can solve this exercise in a similar way using select
instead:
Booking.joins(:ratings).select('bookings.id', 'ratings.comments')
This will translate in exactly the same SQL as before:
SELECT "bookings"."id", "ratings"."comments"
FROM "bookings"
INNER JOIN "ratings"
ON "ratings"."booking_id" = "bookings"."id"
But now, instead of an array we're returning ActiveRecord objects:
[#<Booking:0x00007f86084a4530 id: 62>,
#<Booking:0x00007f86084a4418 id: 63>,
#<Booking:0x00007f86084a4288 id: 64>]
Looking at this output, it looks like we've lost the comments
data. But notice what happens when we ask one of these objects what attributes they have:
bookings = Booking.joins(:ratings).select('bookings.id', 'ratings.comments')
bookings.first.attributes
=> {"id"=>62, "comments"=>"Amazing"}
What's going on here? Activerecord's select
is basically making all attributes that we've selected into ActiveRecord instance methods.
So if we do:
bookings = Booking.joins(:ratings).select('bookings.id', 'ratings.comments')
bookings.first.comments
=> "Amazing"
All this without triggering any extra queries.
So to get the list that we want, we can now iterate on each booking and print the available attributes:
Booking.joins(:ratings).select('bookings.id', 'ratings.comments').each do |booking|
puts booking.id
puts booking.comments
end
Which will print:
62
"Amazing"
63
"friendly staff"
64
"Coming back soon"
Be aware of n + 1 queries
It's important to mention that joins
per se does not avoid n + 1
queries. We've been managing to avoid them so far because we're asking SQL to join and select the data we need in a single query.
Watch what happens if you don't use the select
method:
Booking.joins(:ratings).each do |booking|
puts booking.id
puts booking.ratings.map(&:comments)
end
Note that now comments is no longer a booking instance method. That means that for each booking we'll have to get its ratings and from there add another iteration to get each rating's comments.
Booking Load (3.0ms) SELECT "bookings".* FROM "bookings" INNER JOIN "ratings" ON "ratings"."booking_id" = "bookings"."id"
🤯 Rating Load (29.8ms) SELECT "ratings".* FROM "ratings" WHERE "ratings"."booking_id" = $1 [["booking_id", 63]]
🤯 Rating Load (0.3ms) SELECT "ratings".* FROM "ratings" WHERE "ratings"."booking_id" = $1 [["booking_id", 64]]
🤯 Rating Load (2.0ms) SELECT "ratings".* FROM "ratings" WHERE "ratings"."booking_id" = $1 [["booking_id", 65]]
eager_load
If you don't want to select
attributes prior to the iteration, you can eager_load
the ratings
data. What this will not do, however, is to make comments
available as a booking instance method. So we'll have to get comments through the ratings:
Booking.joins(:ratings).eager_load(:ratings).each do |booking|
puts booking.id
puts booking.ratings.map(&:comments)
end
In SQL this will be:
SELECT "bookings"."id" AS t0_r0, "bookings"."guest_id" AS t0_r1, "bookings"."accommodation_id" AS t0_r2, "bookings"."check_in" AS t0_r3, "bookings"."check_out" AS t0_r4, "bookings"."status" AS t0_r5, "bookings"."total_guests" AS t0_r6, "bookings"."created_at" AS t0_r7, "bookings"."updated_at" AS t0_r8, "ratings"."id" AS t1_r0, "ratings"."rate" AS t1_r1, "ratings"."comments" AS t1_r2, "ratings"."booking_id" AS t1_r3, "ratings"."created_at" AS t1_r4, "ratings"."updated_at" AS t1_r5 FROM "bookings" INNER JOIN "ratings" ON "ratings"."booking_id" = "bookings"."id"
Note that now you'll have all booking
and all rating
attributes available which can make the query heavier in some cases.
You can also use eager_load
without the joins
which will default into a left outer join
:
Booking.eager_load(:ratings).each do |booking|
puts booking.id
puts booking.ratings.map(&:comments) if booking.ratings.present?
end
Here's the SQL:
SELECT "bookings"."id" AS t0_r0, "bookings"."guest_id" AS t0_r1, "bookings"."accommodation_id" AS t0_r2, "bookings"."check_in" AS t0_r3, "bookings"."check_out" AS t0_r4, "bookings"."status" AS t0_r5, "bookings"."total_guests" AS t0_r6, "bookings"."created_at" AS t0_r7, "bookings"."updated_at" AS t0_r8, "ratings"."id" AS t1_r0, "ratings"."rate" AS t1_r1, "ratings"."comments" AS t1_r2, "ratings"."booking_id" AS t1_r3, "ratings"."created_at" AS t1_r4, "ratings"."updated_at" AS t1_r5 FROM "bookings" LEFT OUTER JOIN "ratings" ON "ratings"."booking_id" = "bookings"."id"
Conclusion
There are several ways you can access data from joined associations:
- You can use
pluck
if returning a simple ruby object is enough for you - If you'd like to get the benefits of returning an ActiveRecord collection use
select
instead - If you want to work with all the model and the joined association(s) attributes, you can use
eager_load
- Don't forget to keep a close eye on performance. While
pluck
andselect
can reduce the load time of your queries, combining multiplejoins
in a single query and making all the attributes accessible witheager_load
can easily bloat them. In that case, consider breaking up your queries either using apreload
strategy or writing separate queries that support each other.
That's it! Hope this 4 part series on understanding ActiveRecord joins was useful! Would love to hear about how you've been using joins too!
Top comments (0)