The Problem
Given a Users
table containing user information, including potentially invalid emails, the task is to write an SQL query to find users who have valid emails.
The Users
table is structured as follows:
Column Name | Type |
---|---|
user_id | int |
name | varchar |
varchar |
The user_id
is the primary key for this table. A valid email has a prefix and a domain with specific constraints. The prefix may contain upper or lower case letters, digits, underscore '_', period '.', and dash '-', but it must start with a letter. The domain must be '@leetcode.com'.
The goal is to return the users with valid emails in any order.
Here's a sample input for illustration:
user_id | name | |
---|---|---|
1 | Winston | winston@leetcode.com |
2 | Jonathan | jonathanisgreat |
3 | Annabelle | bella-@leetcode.com |
4 | Sally | sally.come@leetcode.com |
5 | Marwan | quarz#2020@leetcode.com |
6 | David | david69@gmail.com |
7 | Shapiro | .shapo@leetcode.com |
The expected output from the above Users
table would be:
user_id | name | |
---|---|---|
1 | Winston | winston@leetcode.com |
3 | Annabelle | bella-@leetcode.com |
4 | Sally | sally.come@leetcode.com |
The mail of user 2 does not have a domain, the mail of user 5 has the # sign which is not allowed, the mail of user 6 does not have the leetcode domain, and the mail of user 7 starts with a period.
The Solution
We'll explore six different SQL solutions, each with a unique approach to the problem. We'll be using the LIKE
, PATINDEX
, RIGHT
, LEFT
, and LEN
functions, as well as subqueries and WHERE clauses to implement our solutions.
Source Code 1
The first solution selects all users whose emails start with a letter and end with '@leetcode.com', and do not contain any disallowed characters. It checks the email format in the WHERE clause.
SELECT *
FROM Users
WHERE mail LIKE '[A-Za-z]%@leetcode.com'
AND PATINDEX('%[^A-Za-z0-9\-\_\.\-]%@leetcode.com', mail) = 0
This query finishes in 1675ms, beating 50% of submissions.
Source Code 2
The second solution uses a subquery to first filter emails that start with a letter and end with '@leetcode.com'. The main query then ensures that there are no disallowed characters.
SELECT *
FROM (
SELECT *
FROM Users
WHERE mail LIKE '[A-Za-z]%@leetcode.com'
) [u]
WHERE PATINDEX('%[^A-Za-z0-9\-\_\.\-]%@leetcode.com', mail) = 0
This query runs in 2006ms, beating 17.62% of submissions.
Source Code 3
The third solution also filters users with emails ending with '@leetcode.com' and starting with a letter. Instead of PATINDEX
, it uses LEFT
and LEN
to filter out disallowed characters.
SELECT *
FROM Users
WHERE RIGHT(mail, 13) = '@leetcode.com'
AND mail LIKE '[a-zA-Z]%'
AND LEFT(mail, LEN(mail)-13) NOT LIKE '%[^0-9a-zA-Z\_\.\-]%'
This query finishes in 2022ms, beating 16.66% of submissions.
Source Code 4
The fourth solution is similar to the first, but it explicitly selects the columns to return.
SELECT
user_id,
name,
mail
FROM Users
WHERE mail LIKE '[A-Za-z]%@leetcode.com'
AND PATINDEX('%[^A-Za-z0-9\-\_\.\-]%@leetcode.com', mail) = 0
This query finishes in 2212ms, beating 9.76% of submissions.
Source Code 5
The fifth solution, similar to the second, uses a subquery to first filter the emails, then applies the PATINDEX
in the main query. It explicitly selects the columns to return.
SELECT
user_id,
name,
mail
FROM (
SELECT
user_id,
name,
mail
FROM Users
WHERE mail LIKE '[A-Za-z]%@leetcode.com'
) [u]
WHERE PATINDEX('%[^A-Za-z0-9\-\_\.\-]%@leetcode.com', mail) = 0
This query finishes in 1524ms, beating 68.9% of submissions.
Source Code 6
The final solution is similar to the third, but it also explicitly selects the columns to return.
SELECT
user_id,
name,
mail
FROM Users
WHERE RIGHT(mail, 13) = '@leetcode.com'
AND mail LIKE '[a-zA-Z]%'
AND LEFT(mail, LEN(mail)-13) NOT LIKE '%[^0-9a-zA-Z\_\.\-]%'
This query finishes in 1851ms, beating 29.52% of submissions.
Conclusion
While all these solutions accomplish the same goal, their performance varies. It's evident that the use of subqueries (Source Code 2 and 5) can affect the query's efficiency, but in some cases, the performance difference is marginal and may not be a significant concern, especially when working with smaller datasets.
In this particular LeetCode problem, the best performing solution overall was Source Code 5, followed by Source Code 1, Source Code 6, Source Code 3, Source Code 2, and Source Code 4. However, it's crucial to remember that the efficiency of SQL operations can significantly differ based on the database's specifics, including its size, structure, indexing, and the database management system itself.
You can find the original problem at LeetCode.
For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.
Top comments (0)