Average Selling Price | LeetCode | MSSQL

#mssql #leetcode #programming #codenewbie

The Problem

We are given two tables: Prices and UnitsSold.

Prices table structure:

Column Name	Type
product_id	int
start_date	date
end_date	date
price	int

The (product_id, start_date, end_date) is the primary key for this table. Each row indicates the price of the product_id in the period from start_date to end_date. There are no overlapping periods for the same product_id.

UnitsSold table structure:

Column Name	Type
product_id	int
purchase_date	date
units	int

This table may contain duplicates. Each row indicates the purchase_date, units, and product_id of each product sold.

The task is to write an SQL query to find the average selling price for each product, with average_price rounded to 2 decimal places.

Explanation

For example:

Prices table:

product_id	start_date	end_date	price
1	2019-02-17	2019-02-28	5
1	2019-03-01	2019-03-22	20
2	2019-02-01	2019-02-20	15
2	2019-02-21	2019-03-31	30

UnitsSold table:

product_id	purchase_date	units
1	2019-02-25	100
1	2019-03-01	15
2	2019-02-10	200
2	2019-03-22	30

Expected Output:

product_id	average_price
1	6.96
2	16.96

The average selling price is calculated by dividing the total price of a product by the number of products sold. For example, the average selling price for product_id 1 is calculated as ((100 * 5) + (15 * 20)) / 115 = 6.96.

The Solution

We'll discuss three SQL solutions with varying degrees of complexity. The primary differences among these solutions lie in the SQL features they use, such as aggregate functions, window functions, and subqueries. Each strategy has its strengths and weaknesses.

Source Code 1: Join with Aggregate Functions

This approach uses a simple JOIN operation combined with aggregate functions to calculate the average selling price.

SELECT
    p.product_id,
    ROUND(SUM(p.price*u.units) * 1.0 / SUM(u.units), 2) AS average_price
FROM Prices p JOIN UnitsSold u
ON (p.product_id = u.product_id) AND (u.purchase_date BETWEEN p.start_date AND p.end_date)
GROUP BY p.product_id

In this solution, we join the Prices and UnitsSold tables on product_id and the condition that purchase_date lies within the start_date and end_date. We then group the results by product_id and calculate the average selling price.

Runtime: 995ms, beats 7.10% of submissions on LeetCode.

Source Code 2: Window Functions

This approach makes use of window functions, which allow operations on a set of rows related to the current row. This technique can simplify complex queries.

SELECT DISTINCT
    p.product_id,
    ROUND(
        SUM(p.price*u.units) OVER (PARTITION BY p.product_id)
        * 1.0 / SUM(u.units) OVER (PARTITION BY p.product_id)
    , 2) AS average_price
FROM Prices p JOIN UnitsSold u
ON (p.product_id = u.product_id) AND (u.purchase_date BETWEEN p.start_date AND p.end_date)

Here, the OVER (PARTITION BY p.product_id) clause divides the result set into partitions based on product_id, and the aggregate function SUM is applied to each partition separately. This strategy enhances performance by avoiding repetitive calculations.

Runtime: 690ms, beats 25.97% of submissions on LeetCode.

Source Code 3: Subqueries

This approach uses subqueries to calculate the total price and total units separately before joining the results to compute the average price.

SELECT 
    prices.product_id,
    ROUND(total_price * 1.0 / total_units, 2) AS average_price
FROM 
    (
        SELECT 
            p.product_id, 
            SUM(price * units) AS total_price
        FROM 
            Prices p 
        JOIN 
            UnitsSold u 
        ON 
            p.product_id = u.product_id 
            AND u.purchase_date BETWEEN p.start_date AND p.end_date
        GROUP BY 
            p.product_id
    ) prices
JOIN 
    (
        SELECT 
            product_id, 
            SUM(units) AS total_units
        FROM 
            UnitsSold
        GROUP BY 
            product_id
    ) units
ON 
    prices.product_id = units.product_id

While subqueries can enhance readability, they might impact performance, as seen in this solution's slower runtime.

Runtime: 3141ms, beats 5.8% of submissions on LeetCode.

Conclusion

These solutions offer different ways to approach the same problem, with trade-offs between readability and performance. Based on the LeetCode environment, Source Code 2 has the best performance, followed by Source Code 1, and then Source Code 3. However, real-world RDBMS performance may vary due to factors such as data distribution and indexing.

Ranked from best to worst based on LeetCode performance:

Source Code 2
Source Code 1
Source Code 3

You can find the original problem at LeetCode.

For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.

ranggakd - Link in Bio & Creator Tools | Beacons

@ranggakd | center details summary summary Oh hello there I m a an Programmer AI Tech Writer Data Practitioner Statistics Math Addict Open Source Contributor Quantum Computing Enthusiast details center.

beacons.ai