The Problem
We are given two tables: Prices
and UnitsSold
.
Prices
table structure:
Column Name | Type |
---|---|
product_id | int |
start_date | date |
end_date | date |
price | int |
The (product_id, start_date, end_date)
is the primary key for this table. Each row indicates the price of the product_id
in the period from start_date
to end_date
. There are no overlapping periods for the same product_id
.
UnitsSold
table structure:
Column Name | Type |
---|---|
product_id | int |
purchase_date | date |
units | int |
This table may contain duplicates. Each row indicates the purchase_date
, units
, and product_id
of each product sold.
The task is to write an SQL query to find the average selling price for each product, with average_price
rounded to 2 decimal places.
Explanation
For example:
Prices
table:
product_id | start_date | end_date | price |
---|---|---|---|
1 | 2019-02-17 | 2019-02-28 | 5 |
1 | 2019-03-01 | 2019-03-22 | 20 |
2 | 2019-02-01 | 2019-02-20 | 15 |
2 | 2019-02-21 | 2019-03-31 | 30 |
UnitsSold
table:
product_id | purchase_date | units |
---|---|---|
1 | 2019-02-25 | 100 |
1 | 2019-03-01 | 15 |
2 | 2019-02-10 | 200 |
2 | 2019-03-22 | 30 |
Expected Output:
product_id | average_price |
---|---|
1 | 6.96 |
2 | 16.96 |
The average selling price is calculated by dividing the total price of a product by the number of products sold. For example, the average selling price for product_id 1
is calculated as ((100 * 5) + (15 * 20)) / 115 = 6.96.
The Solution
We'll discuss three SQL solutions with varying degrees of complexity. The primary differences among these solutions lie in the SQL features they use, such as aggregate functions, window functions, and subqueries. Each strategy has its strengths and weaknesses.
Source Code 1: Join with Aggregate Functions
This approach uses a simple JOIN operation combined with aggregate functions to calculate the average selling price.
SELECT
p.product_id,
ROUND(SUM(p.price*u.units) * 1.0 / SUM(u.units), 2) AS average_price
FROM Prices p JOIN UnitsSold u
ON (p.product_id = u.product_id) AND (u.purchase_date BETWEEN p.start_date AND p.end_date)
GROUP BY p.product_id
In this solution, we join the Prices
and UnitsSold
tables on product_id
and the condition that purchase_date
lies within the start_date
and end_date
. We then group the results by product_id
and calculate the average selling price.
Runtime: 995ms, beats 7.10% of submissions on LeetCode.
Source Code 2: Window Functions
This approach makes use of window functions, which allow operations on a set of rows related to the current row. This technique can simplify complex queries.
SELECT DISTINCT
p.product_id,
ROUND(
SUM(p.price*u.units) OVER (PARTITION BY p.product_id)
* 1.0 / SUM(u.units) OVER (PARTITION BY p.product_id)
, 2) AS average_price
FROM Prices p JOIN UnitsSold u
ON (p.product_id = u.product_id) AND (u.purchase_date BETWEEN p.start_date AND p.end_date)
Here, the OVER (PARTITION BY p.product_id)
clause divides the result set into partitions based on product_id
, and the aggregate function SUM
is applied to each partition separately. This strategy enhances performance by avoiding repetitive calculations.
Runtime: 690ms, beats 25.97% of submissions on LeetCode.
Source Code 3: Subqueries
This approach uses subqueries to calculate the total price and total units separately before joining the results to compute the average price.
SELECT
prices.product_id,
ROUND(total_price * 1.0 / total_units, 2) AS average_price
FROM
(
SELECT
p.product_id,
SUM(price * units) AS total_price
FROM
Prices p
JOIN
UnitsSold u
ON
p.product_id = u.product_id
AND u.purchase_date BETWEEN p.start_date AND p.end_date
GROUP BY
p.product_id
) prices
JOIN
(
SELECT
product_id,
SUM(units) AS total_units
FROM
UnitsSold
GROUP BY
product_id
) units
ON
prices.product_id = units.product_id
While subqueries can enhance readability, they might impact performance, as seen in this solution's slower runtime.
Runtime: 3141ms, beats 5.8% of submissions on LeetCode.
Conclusion
These solutions offer different ways to approach the same problem, with trade-offs between readability and performance. Based on the LeetCode environment, Source Code 2 has the best performance, followed by Source Code 1, and then Source Code 3. However, real-world RDBMS performance may vary due to factors such as data distribution and indexing.
Ranked from best to worst based on LeetCode performance:
- Source Code 2
- Source Code 1
- Source Code 3
You can find the original problem at LeetCode.
For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.
Top comments (1)
We have to use LEFT JOIN instead of join in Source code 1,
Where the case is that if any product_id does't match than the value should be 0
leetcode #COALESCE