Note: This is a republication of a post that I wrote almost 10 years ago, over on my other blog. I was on vacation last week and I am at That Conference this week, so I didn't have the time to write something new. Database normalization isn't really an evolving topic, though. Enjoy! :)
Database normalization is a process meant to reduce or eliminate data redundancy. Typically, a normalized database exists in first normal form (1NF), second normal form (2NF), or third normal form (3NF).
First Normal Form
In order for a database to be in first normal form, your tables must not contain repeating groups of data. Let’s look at a simple example. In the table below, we are tracking basic contact information. We have an ID, a first name, a last name, and the contact’s phone numbers.
So, what wrong with this table? Users can have more than one phone number, so we are storing multiple values within the “PhoneNumbers” column. This means that our DBMS is storing multiple values as a single value and we have to manage the phone number values manually. How can we better this?
In the above example, we placed phone numbers into their own table. We connected the contacts to their phone numbers using the “ContactID” column. Notice that there are no duplicate rows and every row/column intersection contains exactly one value.
Second Normal Form
In order for a database to be in second normal form, it first must satisfy the conditions for first normal form. In addition to that, all of it’s non-key attributes must be fully dependent upon its primary key. Let’s look at a table that tracks the things our contacts are experts in.
You will notice in the above table that contacts #2 and #3 are experts in two things. In the case of contact #2, he is an expert in PHP and Digital Design. You will also notice that we keep repeating the facility where the contact is. This was probably placed there to make it easy to find a ColdFusion expert at the Downtown facility. This example table can cause problems because, if contact #2 relocates to the Downtown facility, there is a chance that only one record will be modified, leave contradicting information. So, how do we fix it?
In this new example, the facility is now part of the “Contacts” table. You can still find your ColdFusion expert at the Downtown office but you can avoid update anomalies.
Third Normal Form
In order for a database to be in second normal form, it first must satisfy the conditions for first normal form. In addition to that, the columns should depend solely upon the primary key of the table. What does this mean? Let’s take a look at our “Contact Expertise” table.
Let’s say that we want to see a list of possible skills somebody could be an expert in. This could be so that we can indicate which skill a new contact is an expert in. Right now, we can query “Contact Expertise” for unique values in the “Expertise” column but if we don’t already have an expert in that skill, it won’t be available as an option. We can solve that with a “Skills” table.
Notice that you can now query for possible skills that a contact can specialize in. Additionally, all values are dependent upon the primary key of the table.
How to “cheat” at database normalization
A recent post on the Perl blog had a great set of rules to make sure your database is adequately normalized.
Every “noun” gets its own table.
“many to many” relationship get their own tables
“one to many” relationship require the table that “owns” another table have its ID in that other table
Any time a table has an ID that refers to a row in another table, use a foreign key constraint to make sure that ID really exists
Remember, your database is the foundation of your application. If you build it wrong, your application will struggle to succeed.
Top comments (0)