This is the third post in a series about database design and normalization. In our last post, we discussed the first normal form and took an initial step toward proper database design. We learned that we should have a primary key, atomic values, and no repeating groups. We also learned how to separate our entities to avoid redundancy.
Now we can go a step further and make sure we abide by the second normal form.
Second Normal Form
A table is in 2NF if it is in 1NF, and there are no partial dependencies. This means that non-prime columns (columns that are not part of the key, or possible keys) depend entirely on the primary key.
In a table that has a single primary key column, this will always hold true, so we don't need to stress about it. If we have a composite key, though, we could break this rule.
For example, let's add a teacher column to our student_classes
table from the last part. Let's say that Ms. Smith teaches math, Ms. Jackson teaches programming, and Ms. James teaches history.
student_id | class_id | teacher |
---|---|---|
1 | 1 | Smith |
1 | 2 | Jackson |
2 | 3 | James |
The teacher column is determined only by the class_id, not by the entire primary identifier of (student_id, class_id).
Problem
Before we discuss a solution, let's look at the problems this could cause. Consider if Prince also took programming:
student_id | class_id | teacher |
---|---|---|
1 | 1 | Smith |
1 | 2 | Jackson |
2 | 3 | James |
2 | 2 | Jackson |
Notice any familiar problems? We have data redundancy issues. Twice we see that Ms. Jackson teaches programming, which also indicates we could be at risk of an update anomaly if I only changed one of those rows.
Solution
Since the teacher column is dependent only on the class_id column, we should group those entities together.
Let's put the teacher column in the classes table:
id | class_name | teacher |
---|---|---|
1 | Math | Smith |
2 | Programming | Jackson |
3 | History | James |
We've got a lot to keep track of in our three tables so far, so we'll just look at a high level diagram. Shout out to dbdiagram.io for making a helpful tool:
The 2NF was a quick one to explain, but now we've got a pretty well designed database. Let's recap everything up to the 2NF:
- Each table has a primary identifier.
- Each column only has atomic values.
- No tables have repeating groups.
- There are no partial dependencies.
We're in a good spot now that we know how to ensure these four things. In the next post, we'll continue on this normalization adventure to comply with the third normal form.
Top comments (10)
Second lol
I just finish part two of this series, and can't wait for more xD
These techniques have been around like 50 years and still apply. It shows how brilliant the relational model is.
Absolutely, and I think normalization makes database design fun! It starts to feel more like a puzzle, and figuring out how to fit the pieces together can be really fascinating! Especially as you get further along the normal forms.
This series is really helpful and easy to digest. Kudos for great work! Keep it up!
Thank you so much!
Good Job Adam! My 3month database design college class broken down into 5mins read per post... awesome.
nice piece again....
Can’t wait for more! Loving your database design explanations. Berry clear and concise 👌🏽
Thank you! Next post coming tomorrow. :)
Some comments may only be visible to logged-in visitors. Sign in to view all comments.