Normalization

Suvam Banerjee
4 min readNov 23, 2020

It is a technique to remove and reduce redundancy from a table in the database.

The problem we face that can be resolved by normalization are:

  1. Insertion anomaly: The inability to add data into the database because other data are absent. Let's take an example of university table where it has student id as primary key and that university has just introduced new course (MBBS) where no student has registered, therefore we can’t add that course into the database as student id is absent.
  2. Deletion anomaly: When we want to delete a particular record but because of that deletion other vital information is also deleted.Example- If we delete first student information ie: ram then c++ course and faculty information are also deleted, which occurs only once in that table.
  3. Update anomaly: It occurs when updating is done base on a value which is present in multiple rows. Example: if we want to update the salary of faculty Jeremy then two updates will occur, as above one is a small table but some table has lots of data in it, this will effect performance.

First Normal Form

A table shouldn't contain a multi-valued attribute. Every cell should contain only one value.

We can make a table first normal form in the following way

a)Making multiple rows if values are repeated in the same cell, but here the single-column primary key is not possible.

b)Making multiple columns for repeated values. Here if a student is enrolled in a single course and another student is enrolled in multiple courses then like this, most of that table will be null or empty.

c) We can make a combination of the base table and reference table with foreign key .

BASE TABLE

REFERENCE TABLE

2)Second Normal Form

a)Table should satisfy the first normal form first.

b)It should not have any partial dependency.

When there is one primary key and other columns depend on that primary key for uniquely identify, then those columns are dependent on that primary key. When a table has a composite primary key ie: the primary key is a combination of two keys or more attributes, and other column depends on both the keys to uniquely identify and among them one column has a dependency on only one of that primary key then it is known as partial dependency. In the below example to identify a mark, we require student_Id and Subject_Id but the teacher is dependent only on Subject_Id, so a partial dependency is present which is not accepted in second normal form.

We can get rid of the above problem and make it second normal dependency, by making a separate subject table and make subject_id as primary key and teacher name nad subject name as dependency on subject_id or create a separate table for Teacher with teacher id and place that teacher_id in the score table.

3)Third Normal Form:

a)Table should satisfy the second normal form first.

b)It shouldn't have Transitive Dependency which means, an attribute in a table that depends on non-prime Attribute, non-prime attribute are those who do not participate for candidate key determination.

In the above example, STUDENT_ID and SUBJECT_ID is a primary key, but total marks of exam depend on EXAM_NAME which is Transitive Dependency.

4)Boyce-Codd Normal Form (BCNF)

a) Table should satisfy the third normal form first.

b)Prime attribute should not be derived from a non-prime attribute.

--

--