DEV Community

Cover image for Database Normalization Explained
Lorraine for Next Tech

Posted on • Edited on • Originally published at next.tech

Database Normalization Explained

Normalization is a technique for organizing data in a database. It is important that a database is normalized to minimize redundancy (duplicate data) and to ensure only related data is stored in each table. It also prevents any issues stemming from database modifications such as insertions, deletions, and updates.

The stages of organization are called normal forms. In this tutorial we will be redesigning a database for a construction company and ensuring that it satisfies the three normal forms:

First Normal Form (1NF):

  • Data is stored in tables with rows uniquely identified by a primary key
  • Data within each table is stored in individual columns in its most reduced form
  • There are no repeating groups

Second Normal Form (2NF):

  • Everything from 1NF
  • Only data that relates to a table’s primary key is stored in each table

Third Normal Form (3NF):

  • Everything from 2NF
  • There are no in-table dependencies between the columns in each table

Note that there are actually six levels of normalization; however, the third normal form is considered the highest level necessary for most applications so we will only be discussing the first three forms.

Let's get started!

This tutorial is adapted from Next Tech's Database Fundamentals course which comes with an in-browser MySQL database and interactive tasks and projects to complete. You can get started here for free!


Our Database: Codey's Construction

schemawCustomers.png

Codey's Construction's database schema with a new table that causes the database to violate the rules of normalization.

The database we will be working with in this tutorial is for Codey's Construction company (Codey is a helpful coding bot that works with you in the course mentioned earlier). As you can see from the schema above, the database contains the tables projects, job_orders, employees, project_employees. Recently, they have decided to add the table customers to store customer data.

Unfortunately, this table has not designed in a way that satisfies the three forms of normalization... Let's fix that!

First Normal Form

First normal form relates to the duplication and over-grouping of data in tables and columns.

Codey’s Construction's table customers violates all three rules of 1NF.

  1. There is no primary key! A user of the database would be forced to look up companies by their name, which is not guaranteed to be unique (since unique company names are registered on a state-by-state basis).
  2. The data is not in its most reduced form. The column contact_person_and_role can be further divided into two columns, such as contact_person and contact_role.
  3. There are two repeating groups of columns - (project1_id, project1_feedback) and (project2_id, project2_feedback).

The following SQL statement was used to create the customers table:

CREATE TABLE customers (
    name                          VARCHAR(255),
    industry                      VARCHAR(255),
    project1_id                   INT(6),
    project1_feedback             TEXT,
    project2_id                   INT(6),
    project2_feedback             TEXT,
    contact_person_id             INT(6),
    contact_person_and_role       VARCHAR(300),
    phone_number                  VARCHAR(12),
    address                       VARCHAR(255),
    city                          VARCHAR(255),
    zip                           VARCHAR(5)
  );
Enter fullscreen mode Exit fullscreen mode

Screen Shot 2019-06-27 at 10.59.36 am.png

Example data for `customers` table.

By modifying some columns, we can help redesign this table so that it satisfies 1NF.

First, we need to add a primary key column called id with data type INT(6):

ALTER TABLE customers
    ADD COLUMN id INT(6) AUTO_INCREMENT PRIMARY KEY FIRST;
Enter fullscreen mode Exit fullscreen mode

With this statement, we added an automatically incrementing primary key as the first column in the table.

To satisfy the second condition, we need to split the contact_person_and_role column:

ALTER TABLE customers
    CHANGE COLUMN contact_person_and_role contact_person VARCHAR(300);

ALTER TABLE customers
    ADD COLUMN contact_person_role VARCHAR(300) AFTER contact_person;
Enter fullscreen mode Exit fullscreen mode

Here, we simply renamed it as contact_person, and added a column contact_person_role immediately after it.

To satisfy the third condition, we need to move the columns containing project IDs and project feedback to a new table called project_feedbacks. First, let's drop these columns from the customers table:

ALTER TABLE customers
    DROP COLUMN project1_id,
    DROP COLUMN project1_feedback,
    DROP COLUMN project2_id,
    DROP COLUMN project2_feedback;
Enter fullscreen mode Exit fullscreen mode

And then create the project_feedbacks table:

CREATE TABLE project_feedbacks (
    id                  INT(6) AUTO_INCREMENT PRIMARY KEY,
    project_id          INT(6),
    customer_id         INT(6),
    project_feedback    TEXT
);
Enter fullscreen mode Exit fullscreen mode

Here's what the database schema looks like now:
schemawCustomers1NF.png

Modified schema that now satisfies 1NF.

As you can see, there are no more repeating groups in either the project_feedbacks table or the customers table. We still know which customer said what since project_feedbacks.customer_id refers back to the customers table.

Now our customers table satisfies 1NF! Let's move on to second normal form.

Second Normal Form

To achieve second normal form, a database must first satisfy all the conditions for 1NF. After this, satisfying 2NF requires that all data in each table relates directly to the record that the primary key of the table identifies.

We are in violation of 2NF because the contact_person, contact_person_role and phone_number columns track data that relate to the contact person, not the customer. If the contact person for a customer changes, we would have to edit all of these columns, running the risk that we will change the values in one of the columns but forget to change another.

To help Codey's Construction fix this table to satisfy 2NF, these columns should be moved to a table containing data on the contact person. First, let's remove the columns in customers that are not related to our primary key:

ALTER TABLE customers
    DROP COLUMN contact_person,
    DROP COLUMN contact_person_role,
    DROP COLUMN phone_number;
Enter fullscreen mode Exit fullscreen mode

Note that we kept the contact_person_id so we still know who to contact. Now, let's create our new table contact_persons so we have somewhere to store data about each contact.

CREATE TABLE contact_persons (
  id            INT(6) PRIMARY KEY,
  name          VARCHAR(300),
  role          VARCHAR(300),
  phone_number  VARCHAR(15)
);
Enter fullscreen mode Exit fullscreen mode

Codey's Construction's database schema now looks like this:
schemawCustomers2NF.png

Modified schema that now satisfies 2NF.

Now, if the contact person for a customer changes, the construction company just has to insert a record into the contact_persons table and change the contact_person_id in the customers table.

Third Normal Form

For a database to be in third normal form, it must first satisfy all the criteria for 2NF (and therefore, also 1NF).

Then, each column must be non-transitively dependent on the table’s primary key. This means that all columns in a table should rely on the primary key and no other column. If column_a relies on the primary key and also on column_b then column_a is transitively dependent on the primary key so the table does not satisfy 3NF.

Does your brain hurt from reading that? Don't worry! It's explained more below.

This is how the customers table looks after we have satisfied 1NF and 2NF:
Screen Shot 2019-06-27 at 11.03.13 am.png

Example data for modified `customers` table.

The table currently has transitively dependent columns. The transitively dependent relationship is between city and zip. The city in which a customer is located relies on the customer, so this satisfies 2NF; however, the city also depends on the zip code. If a customer relocates, there may be a chance we update one column but not the other. Because this relationship exists, the database is not in 3NF.

To fix our database to satisfy 3NF, we need to drop the city column from customers, and create a new table zips to store this data:

ALTER TABLE customers
    DROP COLUMN city;

CREATE TABLE zips (
  zip   VARCHAR(5) PRIMARY KEY, 
  city  VARCHAR(255)
);
Enter fullscreen mode Exit fullscreen mode

schemawCustomers3NF.png

Modified schema that now satisfies 3NF.

That's it! Finding issues that violate 3NF can be difficult, but it's worth it to ensure that your database is resilient to errors caused by only partially updating data.


I hope you enjoyed this tutorial on database normalization! Codey's Construction's database now satisfies the three forms of normalization.

If you'd like to continue learning about databases, Next Tech's Database Fundamentals course covers all you need to know to get started with databases and SQL. By helping an interactive coding bot named Codey, you will learn how to create and design databases, modify data, and how to write SQL queries to answer business problems. You can get started for free here!

Top comments (4)

Collapse
 
anwar_nairi profile image
Anwar

I whish we teach this in database class, everything makes perfect sense and is clean with NF! My former boss taught me 3NF principles, I did not know it was formalized in this form, very instructive. I am gonna seek for the other forms 🤓

Collapse
 
lorrli274 profile image
Lorraine

Glad you liked the post!

Collapse
 
thebrotherfromasouthernmother profile image
Christian Lowe

Thanks for writing this article, this really helped me wrap my head around normalization.

Collapse
 
lorrli274 profile image
Lorraine

No problem! :)