The Problem
Let's first look at the problem. We have a Department
table structured as follows:
Column Name | Type |
---|---|
id | int |
revenue | int |
month | varchar |
(id, month)
is the primary key for this table. The table contains the revenue of each department per month, and the month
column values are in the form of ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"].
The task at hand is to write an SQL query to reformat the table such that there is a department id column and a revenue column for each month. We will return the result table in any order. The expected query result format is shown in the following example.
Input:
id | revenue | month |
---|---|---|
1 | 8000 | Jan |
2 | 9000 | Jan |
3 | 10000 | Feb |
1 | 7000 | Feb |
1 | 6000 | Mar |
Output:
id | Jan_Revenue | Feb_Revenue | Mar_Revenue | ... | Dec_Revenue |
---|---|---|---|---|---|
1 | 8000 | 7000 | 6000 | ... | null |
2 | 9000 | null | null | ... | null |
3 | null | 10000 | null | ... | null |
The revenue from April to December is null in this example. The result table will have 13 columns: 1 for the department id and 12 for the months.
The Solution
We are going to explore four SQL solutions to this problem, each with a distinct approach. We will highlight their main differences, strengths and weaknesses, and structures.
Source Code 1
The first approach is the most straightforward one, using the PIVOT function to transform the data from rows into columns:
SELECT
id,
[Jan] AS Jan_Revenue,
[Feb] AS Feb_Revenue,
[Mar] AS Mar_Revenue,
[Apr] AS Apr_Revenue,
[May] AS May_Revenue,
[Jun] AS Jun_Revenue,
[Jul] AS Jul_Revenue,
[Aug] AS Aug_Revenue,
[Sep] AS Sep_Revenue,
[Oct] AS Oct_Revenue,
[Nov] AS Nov_Revenue,
[Dec] AS Dec_Revenue
FROM
(
SELECT id, month, revenue
FROM Department
) AS SourceTable
PIVOT
(
SUM(revenue)
FOR month IN ([Jan], [Feb], [Mar], [Apr], [May], [Jun], [Jul], [Aug], [Sep], [Oct], [Nov], [Dec])
) AS PivotTable
This approach resulted in a runtime of 1443ms, beating 79.74% of other solutions.
Source Code 2
The second approach is similar to the first one but simplifies the SQL syntax. Instead of using an alias for the source table, we directly use the Department table:
SELECT
id,
[Jan] [Jan_Revenue],
[Feb] [Feb_Revenue],
[Mar] [Mar_Revenue],
[Apr] [Apr_Revenue],
[May] [May_Revenue],
[Jun] [Jun_Revenue],
[Jul] [Jul_Revenue],
[Aug] [Aug_Revenue],
[Sep] [Sep_Revenue],
[Oct] [Oct_Revenue],
[Nov] [Nov_Revenue],
[Dec] [Dec_Revenue]
FROM Department
PIVOT
(
SUM(revenue)
FOR month IN ([Jan], [Feb], [Mar], [Apr], [May], [Jun], [Jul], [Aug], [Sep], [Oct], [Nov], [Dec])
) [PivotTable]
This change results in a slightly faster runtime of 1390ms, beating 89.9% of other solutions.
Source Code 3
The third approach takes a different path, eschewing the PIVOT function in favor of a CASE expression inside a SUM function:
SELECT
id,
SUM(CASE WHEN month = 'Jan' THEN revenue ELSE NULL END) [Jan_Revenue],
SUM(CASE WHEN month = 'Feb' THEN revenue ELSE NULL END) [Feb_Revenue],
SUM(CASE WHEN month = 'Mar' THEN revenue ELSE NULL END) [Mar_Revenue],
SUM(CASE WHEN month = 'Apr' THEN revenue ELSE NULL END) [Apr_Revenue],
SUM(CASE WHEN month = 'May' THEN revenue ELSE NULL END) [May_Revenue],
SUM(CASE WHEN month = 'Jun' THEN revenue ELSE NULL END) [Jun_Revenue],
SUM(CASE WHEN month = 'Jul' THEN revenue ELSE NULL END) [Jul_Revenue],
SUM(CASE WHEN month = 'Aug' THEN revenue ELSE NULL END) [Aug_Revenue],
SUM(CASE WHEN month = 'Sep' THEN revenue ELSE NULL END) [Sep_Revenue],
SUM(CASE WHEN month = 'Oct' THEN revenue ELSE NULL END) [Oct_Revenue],
SUM(CASE WHEN month = 'Nov' THEN revenue ELSE NULL END) [Nov_Revenue],
SUM(CASE WHEN month = 'Dec' THEN revenue ELSE NULL END) [Dec_Revenue]
FROM Department
GROUP BY id
Although this approach is more readable and easier to understand, the execution is significantly slower, with a runtime of 3372ms and beating only 5.20% of other solutions.
Source Code 4
The final approach utilizes window functions, specifically the OVER
clause with PARTITION BY
. This allows us to perform the calculation within each department's context:
SELECT DISTINCT
id,
SUM(CASE WHEN month = 'Jan' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Jan_Revenue],
SUM(CASE WHEN month = 'Feb' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Feb_Revenue],
SUM(CASE WHEN month = 'Mar' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Mar_Revenue],
SUM(CASE WHEN month = 'Apr' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Apr_Revenue],
SUM(CASE WHEN month = 'May' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [May_Revenue],
SUM(CASE WHEN month = 'Jun' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Jun_Revenue],
SUM(CASE WHEN month = 'Jul' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Jul_Revenue],
SUM(CASE WHEN month = 'Aug' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Aug_Revenue],
SUM(CASE WHEN month = 'Sep' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Sep_Revenue],
SUM(CASE WHEN month = 'Oct' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Oct_Revenue],
SUM(CASE WHEN month = 'Nov' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Nov_Revenue],
SUM(CASE WHEN month = 'Dec' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Dec_Revenue]
FROM Department
The window function approach has a runtime of 2198ms, beating 16.89% of other submissions.
Conclusion
All the presented solutions solve the problem, but they differ in efficiency and readability.
The second approach seems to perform the best, beating 89.9% of other solutions. The first approach follows closely, with a performance beating 79.74% of submissions. The fourth approach, using window functions, outperforms only 16.89% of the submissions, and the third approach is the least efficient, outperforming only 5.2%.
However, keep in mind that these efficiency ratings come from LeetCode and might not always reflect performance in a real-world RDBMS setting. For instance, the third approach might perform better in certain database systems that optimize CASE expressions well.
Ultimately, the choice between these methods depends on your specific use case, system, and personal preferences for readability and maintainability.
You can find the original problem at LeetCode.
For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.
Top comments (0)