Retiago Drago

Posted on May 20, 2023

Trips and Users | LeetCode | MSSQL

#leetcode #mssql #codenewbie #beginners

The Problem

The challenge involves two tables: Trips and Users. The Trips table holds information about all taxi trips, including the unique trip id, client id, driver id, city id, status of the trip, and the date of the request. The Users table maintains records of all users, encompassing the unique user id, user's status (banned or not), and their role (client, driver, partner).

The task is to compute the cancellation rate each day between "2013-10-01" and "2013-10-03". The cancellation rate is defined as the ratio of the number of cancelled requests by unbanned users (both client and driver) to the total number of requests made by unbanned users that day. The cancellation rate should be expressed up to two decimal places. The tables' schemas are as follows:

Trips

Column Name	Type
id	int
client_id	int
driver_id	int
city_id	int
status	enum
request_at	date

Users

Column Name	Type
users_id	int
banned	enum
role	enum

Explanation

Consider the following input:

Trips

id	client_id	driver_id	city_id	status	request_at
1	1	10	1	completed	2013-10-01
2	2	11	1	cancelled_by_driver	2013-10-01
3	3	12	6	completed	2013-10-01
4	4	13	6	cancelled_by_client	2013-10-01
5	1	10	1	completed	2013-10-02
6	2	11	6	completed	2013-10-02
7	3	12	6	completed	2013-10-02
8	2	12	12	completed	2013-10-03
9	3	10	12	completed	2013-10-03
10	4	13	12	cancelled_by_driver	2013-10-03

Users

users_id	banned	role
1	No	client
2	Yes	client
3	No	client
4	No	client
10	No	driver
11	No	driver
12	No	driver
13	No	driver

For this data, the expected output is:

Day	Cancellation Rate
2013-10-01	0.33
2013-10-02	0.00
2013-10-03	0.50

Explanation:

On 2013-10-01:

There were 4 requests in total, 2 of which were canceled.
The request with Id=2 was made by a banned client (User_Id=2), so it is excluded from the calculation.
Hence, considering unbanned users, there were 3 requests in total, 1 of which was canceled.
Therefore, the Cancellation Rate is (1 / 3) = 0.33

On 2013-10-02:

There were 3 requests in total, none of which were canceled.
The request with Id=6 was made by a banned client, so it is excluded from the calculation.
Thus, considering unbanned users, there were 2 requests in total, none of which were canceled.
Hence, the Cancellation Rate is (0 / 2) = 0.00

On 2013-10-03:

There were 3 requests in total, 1 of which was canceled.
The request with Id=8 was made by a banned client, so it is excluded from the calculation.
Hence, considering unbanned users, there were 2 requests in total, 1 of which was canceled.
Therefore, the Cancellation Rate is (1 / 2) = 0.50

The cancellation rate for each day is calculated considering only unbanned users (both client and driver) and requests that occurred on the day. If a request was made by a banned user, it is ignored in the calculation. The final cancellation rate is presented as a fraction of the total requests made by unbanned users that were canceled, rounded to two decimal places.

The Solution

Let's delve into three different approaches to solve this problem. As we progress from Source Code 1 to Source Code 3, the code gets more efficient and easier to read, even though all three of them provide the correct results.

Source Code 1

This SQL query makes use of a Common Table Expression (CTE) called 'cancellation_rates'. It generates a temporary result set by joining the Trips table with the Users table twice, once for the client and once for the driver. This query uses window functions COUNT(*) OVER() to get the number of unbanned requests and unbanned statuses for each day. It then calculates the cancellation rate in the main SELECT query.

This query might seem a bit complex due to the use of window functions and CTE. However, it provides a clear separation of calculating the unbanned request count and the actual calculation of cancellation rate.

The performance of this query is good but not the best, with a runtime of 572ms, beating 45.44% of other submissions.

WITH cancellation_rates AS (
    SELECT DISTINCT
        t.request_at AS Day,
        t.status,
        COUNT(*) OVER(PARTITION BY t.request_at, t.status) AS unbanned_stat,
        COUNT(*) OVER(PARTITION BY t.request_at) AS unbanned_req
    FROM Trips t JOIN Users c ON t.client_id = c.users_id JOIN Users d ON t.driver_id = d.users_id
    WHERE
        c.banned = 'No'
        AND d.banned = 'No'
        AND  t.request_at BETWEEN '2013-10-01' AND '2013-10-03'
)
SELECT
    Day,
    CASE
        WHEN status = 'completed' AND unbanned_stat = unbanned_req THEN 0
        ELSE ROUND(SUM(unbanned_stat) OVER (PARTITION BY Day, unbanned_stat) * 1.0 / unbanned_req, 2)
    END AS 'Cancellation Rate'
FROM cancellation_rates
WHERE
    status LIKE 'cancelled%'
    OR (status = 'completed' AND unbanned_stat = unbanned_req)

Source Code 2

This query simplifies the approach by removing the use of a CTE. Instead, it calculates the cancellation rate directly within the main SELECT statement using conditional aggregation. It uses a CASE statement inside the SUM() function to count the number of cancellations and then divides that by the total number of requests.

This approach is easier to understand but has a slightly worse performance, with a runtime of 741ms, beating 12.61% of other submissions.

SELECT 
    t.request_at AS Day,
    ISNULL(
        ROUND(
            CAST(
                SUM(
                    CASE 
                        WHEN t.status LIKE 'cancelled%' THEN 1
                        ELSE 0 
                    END
                ) AS FLOAT) / 
            CAST(
                COUNT(*) AS FLOAT),
            2), 
        0) AS 'Cancellation Rate'
FROM 
    Trips t 
JOIN 
    Users c ON t.client_id = c.users_id AND c.banned = 'No'
JOIN 
    Users d ON t.driver_id = d.users_id AND d.banned = 'No'
WHERE 
    t.request_at BETWEEN '2013-10-01' AND '2013-10-03'
GROUP BY 
    t.request_at

Source Code 3

The third query is even more simplified and performs better than the previous two queries. It removes the explicit JOIN operations, replacing them with subqueries in the WHERE clause to filter out banned users.

This not only improves readability but also enhances performance because the query doesn't need to create large intermediate join tables. Instead, it filters rows directly based on client_id and driver_id, which is more efficient.

This query has the best performance among the three, with a runtime of 435ms, beating 81.43% of other submissions.

SELECT 
    t.request_at AS Day,
    ISNULL(
        ROUND(
            SUM(
                CASE 
                    WHEN t.status LIKE 'cancelled%' THEN 1.0
                    ELSE 0.0 
                END
            ) / 
            COUNT(*),
            2), 
        0) AS 'Cancellation Rate'
FROM 
    Trips t 
WHERE 
    t.request_at BETWEEN '2013-10-01' AND '2013-10-03'
    AND t.client_id IN (SELECT users_id FROM Users WHERE banned = 'No' AND role = 'client')
    AND t.driver_id IN (SELECT users_id FROM Users WHERE banned = 'No' AND role = 'driver')
GROUP BY 
    t.request_at

Conclusion

From these solutions, we can learn that while different SQL queries can achieve the same result, their performance can vary based on how we leverage the SQL features and functions. Using joins and subqueries strategically can have a significant impact on the query execution speed.

You can find the original problem at LeetCode.

For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.

ranggakd - Link in Bio & Creator Tools | Beacons

@ranggakd | center details summary summary Oh hello there I m a an Programmer AI Tech Writer Data Practitioner Statistics Math Addict Open Source Contributor Quantum Computing Enthusiast details center.

beacons.ai

DEV Community

Trips and Users | LeetCode | MSSQL

The Problem

Explanation

Explanation:

The Solution

Source Code 1

Source Code 2

Source Code 3

Conclusion

ranggakd - Link in Bio & Creator Tools | Beacons

Top comments (0)

Read next

2182. Construct String With Repeat Limit

Top APIs Every Frontend Developer Should Use in 2025

🚀 When to Use VPS, Vercel, and Cloudflare Worker: A Detailed Comparison

PHP OOP Part-2: Constructor and Destructor