The Problem
In this problem, we need to calculate the total score of each hacker, where the total score is defined as the sum of their maximum scores across all challenges. We then need to print the hacker_id, name, and total score of the hackers, sorted by descending total score and ascending hacker_id for ties. We need to exclude all hackers with a total score of 0.
The Input
The input consists of two tables:
Hackers Table: The hacker_id is the id of the hacker, and name is the name of the hacker.
Submissions Table: The submission_id is the id of the submission, hacker_id is the id of the hacker who made the submission, challenge_id is the id of the challenge for which the submission belongs to, and score is the score of the submission.
Sample input and output are available for more in-depth understanding of the problem.
Sample Input
Hackers Table:
Submissions Table:
The Output
Sample Output:
4071 Rose 191
74842 Lisa 174
84072 Bonnie 100
4806 Angela 89
26071 Frank 85
80305 Kimberly 67
49438 Patrick 43
Explanation
Hacker 4071 submitted solutions for challenges 19797 and 49593, so the total score =95+max(43,96)=191.
Hacker 74842 submitted solutions for challenges 19797 and 63132, so the total score =max(98,5)+76=174.
Hacker 84072 submitted solutions for challenges 49593 and 63132, so the total score =100+0=100.
The total scores for hackers 4806, 26071, 80305, and 49438 can be similarly calculated.
The Solution
We'll discuss three SQL solutions, each with different strategies and trade-offs.
Source Code 1
The first source code creates a total_score
Common Table Expression (CTE) that joins the Hackers and Submissions tables and calculates the maximum score per challenge for each hacker. It then sums these max scores per hacker and filters out hackers with a total score of 0. It orders the result by total score in descending order and hacker_id in ascending order for ties. This solution uses the OVER
clause with PARTITION BY
to calculate the max score per challenge per hacker, which simplifies the subsequent aggregation.
WITH total_score AS (
SELECT DISTINCT
h.hacker_id,
h.name,
s.challenge_id,
MAX(s.score) OVER (PARTITION BY h.hacker_id, s.challenge_id) AS max_score_per_subm
FROM
Hackers h JOIN Submissions s ON h.hacker_id = s.hacker_id
)
SELECT
hacker_id,
name,
SUM(max_score_per_subm) AS total
FROM total_score
GROUP BY
hacker_id,
name
HAVING SUM(max_score_per_subm) > 0
ORDER BY
total DESC,
hacker_id
Source Code 2
The second solution differs in that it first calculates the max score per challenge per hacker using a GROUP BY
clause in the total_score
CTE, then sums these max scores per hacker in the score_per_hacker
CTE. This involves two separate groupings, which may increase execution time. However, it also separates concerns and can be easier to understand.
WITH total_score AS (
SELECT
s.hacker_id,
MAX(s.score) AS max_score
FROM
Submissions s
GROUP BY
s.hacker_id,
s.challenge_id
),
score_per_hacker AS (
SELECT
ts.hacker_id,
SUM(ts.max_score) AS total_score
FROM
total_score ts
GROUP BY
ts.hacker_id
)
SELECT
sp.hacker_id,
h.name,
sp.total_score
FROM
score_per_hacker sp
JOIN
Hackers h ON sp.hacker_id = h.hacker_id
WHERE
sp.total_score > 0
ORDER BY
sp.total_score DESC,
sp.hacker_id
Source Code 3
The third solution uses the ROW_NUMBER()
function to assign a unique rank to each submission by each hacker for each challenge. It then only includes the highest-ranked (i.e., highest-scoring) submission for each challenge in the total score. This strategy avoids the need to use DISTINCT
in the CTE or to group by challenge_id, potentially improving performance.
SELECT
hacker_id,
name,
SUM(CASE WHEN rn = 1 THEN score ELSE 0 END) AS total_score
FROM (
SELECT
h.hacker_id,
h.name,
s.score,
ROW_NUMBER() OVER(PARTITION BY s.hacker_id, s.challenge_id ORDER BY s.score DESC) as rn
FROM
Hackers h
JOIN
Submissions s ON h.hacker_id = s.hacker_id
) t
GROUP BY
hacker_id,
name
HAVING
SUM(CASE WHEN rn = 1 THEN score ELSE 0 END) > 0
ORDER BY
total_score DESC,
hacker_id
Conclusion
All three solutions achieve the same result but use different techniques to calculate the total score of each hacker. The first solution is straightforward but might not be as efficient due to its use of DISTINCT
. The second solution separates concerns by creating a CTE for each step, which could improve readability at the cost of execution time. The third solution is likely the most efficient due to its use of the ROW_NUMBER()
function to select the highest score per challenge per hacker directly.
In summary, while different SQL queries can achieve the same result, their performance can vary significantly based on the SQL features and functions used. Therefore, it's crucial to consider different approaches and understand their trade-offs.
You can find the original problem at HackerRank.
For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.
Top comments (0)