The Problem
We have a table called ActorDirector
with the following schema:
Column Name | Type |
---|---|
actor_id | int |
director_id | int |
timestamp | int |
Note: timestamp
is the primary key column for this table.
We need to write a SQL query for a report that provides the pairs (actor_id, director_id)
where the actor has cooperated with the director at least three times. We can return the result table in any order.
For example, given the following ActorDirector
table:
actor_id | director_id | timestamp |
---|---|---|
1 | 1 | 0 |
1 | 1 | 1 |
1 | 1 | 2 |
1 | 2 | 3 |
1 | 2 | 4 |
2 | 1 | 5 |
2 | 1 | 6 |
The output should be:
actor_id | director_id |
---|---|
1 | 1 |
The pair (1, 1)
has cooperated exactly three times.
The Solution
We are going to approach the problem using four different SQL queries, each with a slight variation in the HAVING
clause. We will evaluate the performance of each solution and discuss the differences in the logic.
Source Code 1
The first solution counts all rows for each (actor_id, director_id)
pair and keeps those with count 3 or more.
SELECT
actor_id,
director_id
FROM ActorDirector
GROUP BY
actor_id,
director_id
HAVING COUNT(*) >= 3
The runtime for this query is 1570ms, which is faster than 20.41% of all other submissions for this problem.
Source Code 2
The second solution counts only the actor_id
for each (actor_id, director_id)
pair and keeps those with count 3 or more.
SELECT
actor_id,
director_id
FROM ActorDirector
GROUP BY
actor_id,
director_id
HAVING COUNT(actor_id) >= 3
The runtime for this query is 1106ms, which is faster than 90.16% of all other submissions for this problem.
Source Code 3
The third solution counts only the director_id
for each (actor_id, director_id)
pair and keeps those with count 3 or more.
SELECT
actor_id,
director_id
FROM ActorDirector
GROUP BY
actor_id,
director_id
HAVING COUNT(director_id) >= 3
The runtime for this query is 1084ms, which is faster than 95.14% of all other submissions for this problem.
Source Code 4
The fourth solution counts the unique timestamp
for each (actor_id, director_id)
pair and keeps those with count 3 or more.
SELECT
actor_id,
director_id
FROM ActorDirector
GROUP BY
actor_id,
director_id
HAVING COUNT(timestamp) >= 3
The runtime for this query is 1090ms, which is faster than 93.95% of all other submissions for this problem.
Conclusion
All the solutions presented above yield the correct result, but with different performances. Based on the runtime, the rank of the solutions from fastest to slowest is:
- Source Code 3 (1084ms)
- Source Code 4 (1090ms)
- Source Code 2 (1106ms)
- Source Code 1 (1570ms)
It's interesting to note that counting the director_id
(Source Code 3) results in the best performance on LeetCode. However, in real-world RDBMS, the performance might vary depending on the specific dataset and indexing strategies.
Always bear in mind that understanding the problem thoroughly and choosing the right approach based on the context is the key to writing efficient SQL queries.
You can find the original problem at LeetCode.
For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.
Top comments (0)