Retiago Drago

Posted on Jun 7, 2023 • Edited on Jun 12, 2023

Reformat Department Table | LeetCode | MSSQL

#mssql #beginners #tutorial #leetcode

The Problem

Let's first look at the problem. We have a Department table structured as follows:

Column Name	Type
id	int
revenue	int
month	varchar

(id, month) is the primary key for this table. The table contains the revenue of each department per month, and the month column values are in the form of ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"].

The task at hand is to write an SQL query to reformat the table such that there is a department id column and a revenue column for each month. We will return the result table in any order. The expected query result format is shown in the following example.

Input:

id	revenue	month
1	8000	Jan
2	9000	Jan
3	10000	Feb
1	7000	Feb
1	6000	Mar

Output:

id	Jan_Revenue	Feb_Revenue	Mar_Revenue	...	Dec_Revenue
1	8000	7000	6000	...	null
2	9000	null	null	...	null
3	null	10000	null	...	null

The revenue from April to December is null in this example. The result table will have 13 columns: 1 for the department id and 12 for the months.

The Solution

We are going to explore four SQL solutions to this problem, each with a distinct approach. We will highlight their main differences, strengths and weaknesses, and structures.

Source Code 1

The first approach is the most straightforward one, using the PIVOT function to transform the data from rows into columns:

SELECT 
    id, 
    [Jan] AS Jan_Revenue, 
    [Feb] AS Feb_Revenue, 
    [Mar] AS Mar_Revenue, 
    [Apr] AS Apr_Revenue, 
    [May] AS May_Revenue, 
    [Jun] AS Jun_Revenue, 
    [Jul] AS Jul_Revenue, 
    [Aug] AS Aug_Revenue, 
    [Sep] AS Sep_Revenue, 
    [Oct] AS Oct_Revenue, 
    [Nov] AS Nov_Revenue, 
    [Dec] AS Dec_Revenue
FROM 
    (
        SELECT id, month, revenue 
        FROM Department
    ) AS SourceTable
PIVOT
(
    SUM(revenue)
    FOR month IN ([Jan], [Feb], [Mar], [Apr], [May], [Jun], [Jul], [Aug], [Sep], [Oct], [Nov], [Dec])
) AS PivotTable

This approach resulted in a runtime of 1443ms, beating 79.74% of other solutions.

Source Code 2

The second approach is similar to the first one but simplifies the SQL syntax. Instead of using an alias for the source table, we directly use the Department table:

SELECT 
    id, 
    [Jan] [Jan_Revenue], 
    [Feb] [Feb_Revenue], 
    [Mar] [Mar_Revenue], 
    [Apr] [Apr_Revenue], 
    [May] [May_Revenue], 
    [Jun] [Jun_Revenue], 
    [Jul] [Jul_Revenue], 
    [Aug] [Aug_Revenue], 
    [Sep] [Sep_Revenue], 
    [Oct] [Oct_Revenue], 
    [Nov] [Nov_Revenue], 
    [Dec] [Dec_Revenue]
FROM Department
PIVOT
(
    SUM(revenue)
    FOR month IN ([Jan], [Feb], [Mar], [Apr], [May], [Jun], [Jul], [Aug], [Sep], [Oct], [Nov], [Dec])
) [PivotTable]

This change results in a slightly faster runtime of 1390ms, beating 89.9% of other solutions.

Source Code 3

The third approach takes a different path, eschewing the PIVOT function in favor of a CASE expression inside a SUM function:

SELECT
    id,
    SUM(CASE WHEN month = 'Jan' THEN revenue ELSE NULL END) [Jan_Revenue],
    SUM(CASE WHEN month = 'Feb' THEN revenue ELSE NULL END) [Feb_Revenue],
    SUM(CASE WHEN month = 'Mar' THEN revenue ELSE NULL END) [Mar_Revenue],
    SUM(CASE WHEN month = 'Apr' THEN revenue ELSE NULL END) [Apr_Revenue],
    SUM(CASE WHEN month = 'May' THEN revenue ELSE NULL END) [May_Revenue],
    SUM(CASE WHEN month = 'Jun' THEN revenue ELSE NULL END) [Jun_Revenue],
    SUM(CASE WHEN month = 'Jul' THEN revenue ELSE NULL END) [Jul_Revenue],
    SUM(CASE WHEN month = 'Aug' THEN revenue ELSE NULL END) [Aug_Revenue],
    SUM(CASE WHEN month = 'Sep' THEN revenue ELSE NULL END) [Sep_Revenue],
    SUM(CASE WHEN month = 'Oct' THEN revenue ELSE NULL END) [Oct_Revenue],
    SUM(CASE WHEN month = 'Nov' THEN revenue ELSE NULL END) [Nov_Revenue],
    SUM(CASE WHEN month = 'Dec' THEN revenue ELSE NULL END) [Dec_Revenue]
FROM Department
GROUP BY id

Although this approach is more readable and easier to understand, the execution is significantly slower, with a runtime of 3372ms and beating only 5.20% of other solutions.

Source Code 4

The final approach utilizes window functions, specifically the OVER clause with PARTITION BY. This allows us to perform the calculation within each department's context:

SELECT DISTINCT 
    id,
    SUM(CASE WHEN month = 'Jan' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Jan_Revenue],
    SUM(CASE WHEN month = 'Feb' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Feb_Revenue],
    SUM(CASE WHEN month = 'Mar' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Mar_Revenue],
    SUM(CASE WHEN month = 'Apr' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Apr_Revenue],
    SUM(CASE WHEN month = 'May' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [May_Revenue],
    SUM(CASE WHEN month = 'Jun' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Jun_Revenue],
    SUM(CASE WHEN month = 'Jul' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Jul_Revenue],
    SUM(CASE WHEN month = 'Aug' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Aug_Revenue],
    SUM(CASE WHEN month = 'Sep' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Sep_Revenue],
    SUM(CASE WHEN month = 'Oct' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Oct_Revenue],
    SUM(CASE WHEN month = 'Nov' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Nov_Revenue],
    SUM(CASE WHEN month = 'Dec' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Dec_Revenue]
FROM Department

The window function approach has a runtime of 2198ms, beating 16.89% of other submissions.

Conclusion

All the presented solutions solve the problem, but they differ in efficiency and readability.

The second approach seems to perform the best, beating 89.9% of other solutions. The first approach follows closely, with a performance beating 79.74% of submissions. The fourth approach, using window functions, outperforms only 16.89% of the submissions, and the third approach is the least efficient, outperforming only 5.2%.

However, keep in mind that these efficiency ratings come from LeetCode and might not always reflect performance in a real-world RDBMS setting. For instance, the third approach might perform better in certain database systems that optimize CASE expressions well.

Ultimately, the choice between these methods depends on your specific use case, system, and personal preferences for readability and maintainability.

You can find the original problem at LeetCode.

For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.

ranggakd - Link in Bio & Creator Tools | Beacons

@ranggakd | center details summary summary Oh hello there I m a an Programmer AI Tech Writer Data Practitioner Statistics Math Addict Open Source Contributor Quantum Computing Enthusiast details center.

beacons.ai

DEV Community

Reformat Department Table | LeetCode | MSSQL

The Problem

The Solution

Source Code 1

Source Code 2

Source Code 3

Source Code 4

Conclusion

ranggakd - Link in Bio & Creator Tools | Beacons

Top comments (0)

Read next

Oh My Zsh: A Simple Guide for Developers

From Lama2 to LiveAPI: Building Super-Convenient API Documentation (Part II)

Welcome 2025 with 7 must-try New Year decorating ideas

How to Make a Retro 2D JavaScript Game Part 2