I recently implemented a feature that required using the SQL CASE statement. Thus, I decided to write a post on the exciting capabilities of the SQL CASE statement. I would be using sample student result data which you can generate by running the SQL in this gist. Note that PostgreSQL is RDBMS used in this post but the CASE statement is similar in other RDBMS
What is the CASE statement
The CASE statement is the SQL way of creating conditional statements(if-else-then) when querying for column values. Here is the general structure of a CASE statement
CASE
WHEN {condition1} THEN {value1}
WHEN {condition2} THEN {value2}
.
.
.
WHEN {conditionN} THEN {valueN}
END
The CASE statement comes in handy when:
- Categorizing data
- Filtering data
- Aggregating data
- Updating data
I would explain each use case in subsequent sections using the student results data
Categorising data
Let's say we have a table results
that contains student results in several courses as follows
student_id | subject | score |
---|---|---|
1 | Arts | 49 |
1 | Biology | 0 |
1 | Chemistry | 0 |
1 | Computer Science | 74 |
1 | Economics | 73 |
1 | English Language | 63 |
1 | History | 72 |
1 | Literature | 0 |
1 | Mathematics | 0 |
1 | Physical Education | 88 |
1 | Physics | 71 |
1 | Statistics | 14 |
2 | Arts | 76 |
2 | Biology | 78 |
2 | Chemistry | 73 |
2 | Commerce | 76 |
2 | Computer Science | 77 |
2 | Economics | 81 |
2 | English Language | 0 |
2 | History | 71 |
2 | Literature | 73 |
2 | Mathematics | 66 |
2 | Physical Education | 70 |
2 | Physics | 73 |
2 | Statistics | 91 |
4 | Arts | 70 |
4 | Biology | 77 |
4 | Chemistry | 71 |
4 | Commerce | 66 |
4 | Computer Science | 52 |
4 | Economics | 70 |
4 | English Language | 0 |
4 | History | 78 |
4 | Literature | 77 |
4 | Mathematics | 71 |
4 | Physical Education | 70 |
4 | Physics | 80 |
. | . | . |
. | . | . |
. | . | . |
. | . | . |
If the pass mark for each course is 60 then we can use the CASE statement to categorize this data by displaying "Passed" when the student passes and "Failed" when the student fails. Here is how it looks like:
SELECT subject, student_id, score,
CASE
WHEN score >= 60 THEN 'Passed'
ELSE 'Failed'
END AS status
FROM results
ORDER BY subject, score, status;
This would produce the following results:
subject | student_id | score | status |
---|---|---|---|
Arts | 12 | 45 | Failed |
Arts | 21 | 47 | Failed |
Arts | 1 | 49 | Failed |
Arts | 20 | 50 | Failed |
Arts | 23 | 67 | Passed |
Arts | 16 | 67 | Passed |
Arts | 6 | 69 | Passed |
Arts | 17 | 70 | Passed |
Arts | 11 | 70 | Passed |
Arts | 4 | 70 | Passed |
Arts | 22 | 70 | Passed |
Arts | 19 | 71 | Passed |
Arts | 15 | 71 | Passed |
Arts | 8 | 71 | Passed |
Arts | 5 | 72 | Passed |
Arts | 7 | 75 | Passed |
Arts | 2 | 76 | Passed |
Arts | 18 | 76 | Passed |
Arts | 13 | 77 | Passed |
Arts | 10 | 79 | Passed |
Arts | 14 | 80 | Passed |
Arts | 9 | 80 | Passed |
Arts | 24 | 83 | Passed |
Biology | 23 | 0 | Failed |
Biology | 21 | 0 | Failed |
Biology | 12 | 0 | Failed |
Biology | 1 | 0 | Failed |
Biology | 15 | 0 | Failed |
Biology | 20 | 55 | Failed |
Biology | 22 | 56 | Failed |
Biology | 11 | 57 | Failed |
Biology | 6 | 63 | Passed |
Biology | 5 | 64 | Passed |
Biology | 8 | 65 | Passed |
Biology | 13 | 65 | Passed |
Biology | 10 | 66 | Passed |
Biology | 16 | 70 | Passed |
Biology | 19 | 71 | Passed |
Biology | 18 | 73 | Passed |
Biology | 14 | 74 | Passed |
Biology | 17 | 77 | Passed |
Biology | 4 | 77 | Passed |
Biology | 2 | 78 | Passed |
Biology | 7 | 81 | Passed |
Biology | 9 | 84 | Passed |
Biology | 24 | 85 | Passed |
Chemistry | 21 | 0 | Failed |
Chemistry | 1 | 0 | Failed |
Chemistry | 23 | 36 | Failed |
Chemistry | 20 | 40 | Failed |
Chemistry | 12 | 40 | Failed |
Chemistry | 22 | 41 | Failed |
Chemistry | 15 | 45 | Failed |
Chemistry | 10 | 51 | Failed |
Chemistry | 11 | 56 | Failed |
Chemistry | 17 | 60 | Passed |
Chemistry | 5 | 61 | Passed |
Chemistry | 8 | 61 | Passed |
Chemistry | 13 | 62 | Passed |
Chemistry | 6 | 63 | Passed |
Chemistry | 16 | 67 | Passed |
Chemistry | 14 | 70 | Passed |
Chemistry | 4 | 71 | Passed |
Chemistry | 18 | 71 | Passed |
Chemistry | 2 | 73 | Passed |
Chemistry | 7 | 74 | Passed |
Chemistry | 19 | 75 | Passed |
Chemistry | 9 | 81 | Passed |
Chemistry | 24 | 82 | Passed |
Commerce | 12 | 44 | Failed |
Commerce | 22 | 55 | Failed |
Commerce | 15 | 60 | Passed |
Commerce | 11 | 61 | Passed |
Commerce | 21 | 62 | Passed |
. | . | . | . |
. | . | . | . |
. | . | . | . |
Now we can see which student passed or failed each subject.
The CASE statement does not affect our other parts of our select statement above(although it can be used in other parts), thus we can decide to view the people that have passed/failed Statistics as follows:
SELECT student_id, subject, score,
CASE
WHEN score >= 60 THEN 'Passed'
ELSE 'Failed'
END AS status
FROM results
WHERE subject = 'Statistics'
ORDER BY score
;
We get the following results
student_id | subject | score | status |
---|---|---|---|
1 | Statistics | 14 | Failed |
21 | Statistics | 48 | Failed |
22 | Statistics | 49 | Failed |
12 | Statistics | 52 | Failed |
6 | Statistics | 53 | Failed |
20 | Statistics | 54 | Failed |
15 | Statistics | 57 | Failed |
23 | Statistics | 57 | Failed |
13 | Statistics | 58 | Failed |
8 | Statistics | 60 | Passed |
14 | Statistics | 61 | Passed |
11 | Statistics | 62 | Passed |
18 | Statistics | 62 | Passed |
10 | Statistics | 65 | Passed |
5 | Statistics | 65 | Passed |
16 | Statistics | 66 | Passed |
4 | Statistics | 71 | Passed |
17 | Statistics | 76 | Passed |
7 | Statistics | 80 | Passed |
19 | Statistics | 80 | Passed |
9 | Statistics | 85 | Passed |
24 | Statistics | 88 | Passed |
2 | Statistics | 91 | Passed |
Now it is easy to see that there are quite a number of students that failed Statistics
.
Adding more conditions to the CASE statement
Let's say we want to retrieve the grades(A, B, C,D, F) for each student in the Statistics
course such that
an A is 90-100, a B is 80-89, a C is 70-79, a D is 60-69 and an F is <60
. We can achieve that by using AND
operator in the conditional
part of our CASE statement as follows:
SELECT student_id, subject, score,
CASE
WHEN score >= 70 AND score <= 79 THEN 'C'
WHEN score >= 60 AND score <= 69 THEN 'D'
WHEN score >= 90 THEN 'A'
WHEN score >= 80 AND score <= 89 THEN 'B'
ELSE 'F'
END AS grade
FROM results
WHERE subject = 'Statistics'
ORDER BY score DESC
;
;
student_id | subject | score | grade |
---|---|---|---|
2 | Statistics | 91 | A |
24 | Statistics | 88 | B |
9 | Statistics | 85 | B |
7 | Statistics | 80 | B |
19 | Statistics | 80 | B |
17 | Statistics | 76 | C |
4 | Statistics | 71 | C |
16 | Statistics | 66 | D |
10 | Statistics | 65 | D |
5 | Statistics | 65 | D |
11 | Statistics | 62 | D |
18 | Statistics | 62 | D |
14 | Statistics | 61 | D |
8 | Statistics | 60 | D |
13 | Statistics | 58 | F |
15 | Statistics | 57 | F |
23 | Statistics | 57 | F |
20 | Statistics | 54 | F |
6 | Statistics | 53 | F |
12 | Statistics | 52 | F |
22 | Statistics | 49 | F |
21 | Statistics | 48 | F |
1 | Statistics | 14 | F |
You could also use the OR
operator within the CASE
statement. Also, note that the CASE statement checks the conditions in the order in which they were created thus the following CASE
statement does the same thing as before:
CASE
WHEN score >= 90 THEN 'A'
WHEN score >= 80 THEN 'B'
WHEN score >= 70 THEN 'C'
WHEN score >= 60 THEN 'D'
ELSE 'F'
END AS grade
Aggregating data
COUNT
Let us say we want to retrieve the number of students that passed and the number that failed each subject. We can do that using the following query:
SELECT subject,
COUNT(CASE
WHEN score >= 60 THEN 'Passed'
END ) as number_of_success,
COUNT(CASE
WHEN score < 60 THEN 'Failed'
END ) as number_of_failure
FROM results
GROUP BY subject
ORDER BY subject;
subject | number_of_success | number_of_failure |
---|---|---|
Arts | 19 | 4 |
Biology | 15 | 8 |
Chemistry | 14 | 9 |
Commerce | 20 | 2 |
Computer Science | 19 | 4 |
Economics | 22 | 1 |
English Language | 16 | 6 |
History | 21 | 2 |
Literature | 21 | 2 |
Mathematics | 21 | 2 |
Physical Education | 23 | 0 |
Physics | 23 | 0 |
Statistics | 14 | 9 |
SUM
We can also get a similar result using the SUM function. This time we would be returning numbers (in this case 1 when the student passes and 0 otherwise) to the SUM function so that it can add all the numbers together.
SELECT subject,
SUM(
CASE
WHEN score >= 60 THEN 1
ELSE 0
END ) as number_of_success,
SUM(
CASE
WHEN score < 60 THEN 1
ELSE 0
END ) as number_of_failure
FROM results
GROUP BY subject
ORDER BY subject;
subject | number_of_success | number_of_failure |
---|---|---|
Arts | 19 | 4 |
Biology | 15 | 8 |
Chemistry | 14 | 9 |
Commerce | 20 | 2 |
Computer Science | 19 | 4 |
Economics | 22 | 1 |
English Language | 16 | 6 |
History | 21 | 2 |
Literature | 21 | 2 |
Mathematics | 21 | 2 |
Physical Education | 23 | 0 |
Physics | 23 | 0 |
Statistics | 14 | 9 |
Gives exactly the same result as before.
AVG
The CASE statement can be used along with the AVG function to get the percentage of a set of values. For example, we can use the following query to get what percentage of students passed and failed each subject:
SELECT subject,
AVG(
CASE
WHEN score >= 60 THEN 1
ELSE 0
END )* 100 AS percentage_passed,
AVG(
CASE
WHEN score < 60 THEN 1
ELSE 0
END )* 100 AS percentage_failed
FROM results
GROUP BY subject;
subject | percentage_passed | percentage_failed |
---|---|---|
Arts | 82.60869565217391304300 | 17.39130434782608695700 |
Biology | 65.21739130434782608700 | 34.78260869565217391300 |
Chemistry | 60.86956521739130434800 | 39.13043478260869565200 |
Commerce | 90.90909090909090909100 | 9.09090909090909090900 |
Computer Science | 82.60869565217391304300 | 17.39130434782608695700 |
Economics | 95.65217391304347826100 | 4.34782608695652173900 |
English Language | 72.72727272727272727300 | 27.27272727272727272700 |
History | 91.30434782608695652200 | 8.69565217391304347800 |
Literature | 91.30434782608695652200 | 8.69565217391304347800 |
Mathematics | 91.30434782608695652200 | 8.69565217391304347800 |
Physical Education | 100.00000000000000000000 | 0.00000000000000000000 |
Physics | 100.00000000000000000000 | 0.00000000000000000000 |
Statistics | 60.86956521739130434800 | 39.13043478260869565200 |
To make the result more readable let us round to 2 d.p:
SELECT subject,
ROUND(AVG(
CASE
WHEN score >= 60 THEN 1
ELSE 0
END )* 100 , 2) AS percentage_passed,
ROUND(AVG(
CASE
WHEN score < 60 THEN 1
ELSE 0
END )* 100 , 2) AS percentage_failed
FROM results
GROUP BY subject
ORDER BY subject;
subject | percentage_passed | percentage_failed |
---|---|---|
Arts | 82.61 | 17.39 |
Biology | 65.22 | 34.78 |
Chemistry | 60.87 | 39.13 |
Commerce | 90.91 | 9.09 |
Computer Science | 82.61 | 17.39 |
Economics | 95.65 | 4.35 |
English Language | 72.73 | 27.27 |
History | 91.30 | 8.70 |
Literature | 91.30 | 8.70 |
Mathematics | 91.30 | 8.70 |
Physical Education | 100.00 | 0.00 |
Physics | 100.00 | 0.00 |
Statistics | 60.87 | 39.13 |
Any other aggregation function
You can use any other aggregation function with the CASE statement to get any result that requires some form of conditional operation.
Updating Data
Let's take a look at the students that passed/failed 'Chemistry' once again.
SELECT student_id, subject, score,
CASE
WHEN score >= 60 THEN 'Passed'
ELSE 'Failed'
END AS status
FROM results
WHERE subject = 'Chemistry'
ORDER BY score
;
student_id | subject | score | status |
---|---|---|---|
1 | Chemistry | 0 | Failed |
21 | Chemistry | 0 | Failed |
23 | Chemistry | 36 | Failed |
12 | Chemistry | 40 | Failed |
20 | Chemistry | 40 | Failed |
22 | Chemistry | 41 | Failed |
15 | Chemistry | 45 | Failed |
10 | Chemistry | 51 | Failed |
11 | Chemistry | 56 | Failed |
17 | Chemistry | 60 | Passed |
8 | Chemistry | 61 | Passed |
5 | Chemistry | 61 | Passed |
13 | Chemistry | 62 | Passed |
6 | Chemistry | 63 | Passed |
16 | Chemistry | 67 | Passed |
14 | Chemistry | 70 | Passed |
18 | Chemistry | 71 | Passed |
4 | Chemistry | 71 | Passed |
2 | Chemistry | 73 | Passed |
7 | Chemistry | 74 | Passed |
19 | Chemistry | 75 | Passed |
9 | Chemistry | 81 | Passed |
24 | Chemistry | 82 | Passed |
If the students with ids 12, 10, 11 and 15 re-took the course and had the following scores
student_id | score |
---|---|
12 | 70 |
10 | 80 |
11 | 85 |
15 | 90 |
Now we have to update the Chemistry table with this new result. One way we can do this is to use multiple UPDATE statements like so:
UPDATE results
SET score = 70
WHERE student_id = 12 AND subject = 'Chemistry';
UPDATE results
SET score = 80
WHERE student_id = 10 AND subject = 'Chemistry';
.
.
.
But this can be quite inefficient and might be a little difficult to maintain. You can perform multiple updates with different values using the CASE statement to specify a value for each student_id and an IN clause to select all the student_ids you want like so:
UPDATE results
SET score =
CASE
WHEN student_id = 12 THEN 70
WHEN student_id = 10 THEN 80
WHEN student_id = 11 THEN 85
WHEN student_id = 15 THEN 90
END
WHERE student_id IN (12, 10, 11, 15) AND subject = 'Chemistry';
Now when we retrieve the data for 'Chemistry'
student_id | subject | score | status |
---|---|---|---|
1 | Chemistry | 0 | Failed |
21 | Chemistry | 0 | Failed |
23 | Chemistry | 36 | Failed |
20 | Chemistry | 40 | Failed |
22 | Chemistry | 41 | Failed |
17 | Chemistry | 60 | Passed |
8 | Chemistry | 61 | Passed |
5 | Chemistry | 61 | Passed |
13 | Chemistry | 62 | Passed |
6 | Chemistry | 63 | Passed |
16 | Chemistry | 67 | Passed |
14 | Chemistry | 70 | Passed |
12 | Chemistry | 70 | Passed |
4 | Chemistry | 71 | Passed |
18 | Chemistry | 71 | Passed |
2 | Chemistry | 73 | Passed |
7 | Chemistry | 74 | Passed |
19 | Chemistry | 75 | Passed |
10 | Chemistry | 80 | Passed |
9 | Chemistry | 81 | Passed |
24 | Chemistry | 82 | Passed |
11 | Chemistry | 85 | Passed |
15 | Chemistry | 90 | Passed |
We can see that the students we specified were updated successfully.
Note that while doing an update and the data type returned in the CASE statement does not match the data type of the column, we would need to do an explicit cast. For example, if we had returned strings in our case statement previously
UPDATE results
SET score =
CASE
WHEN student_id = 12 THEN '70'
WHEN student_id = 10 THEN '80'
WHEN student_id = 11 THEN '85'
WHEN student_id = 15 THEN '90'
END
WHERE student_id IN (12, 10, 11, 15) AND subject = 'Chemistry';
Then Postgresql would have thrown the following error
ERROR: syntax error at or near "score"
LINE 2: SET SET score =
To solve this we just cast the values in the case to INT like so:
UPDATE results
SET score =
CASE
WHEN student_id = 12 THEN '70'
WHEN student_id = 10 THEN '80'
WHEN student_id = 11 THEN '85'
WHEN student_id = 15 THEN '90'
END::int
WHERE student_id IN (12, 10, 11, 15) AND subject = 'Chemistry';
Works just like before!
Conclusion
So we have seen that the CASE statement can come in handy for categorizing, aggregating and updating data. So in case you ever want to perform a conditional statement in SQL, always remember to use the CASE statement
Top comments (3)
edited to have syntax highlighting eg
from
to
Wonderful write up this CASE statement is immediately applicable in my current project nice work
Thanks