Table of Contents
- Introduction to Normalization
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce-Codd Normal Form (BCNF)
- Fourth Normal Form (4NF)
- Fifth Normal Form (5NF)
- Denormalization: When and Why to Use It
- Summary & Best Practices
Introduction to Normalization
Normalization is the process of organizing data to minimize redundancy and improve integrity. It involves splitting tables and defining relationships.
Key Goals:
- Eliminate duplicate data.
- Ensure data dependencies make sense.
- Optimize storage and maintainability.
Levels of Normalization:
1NF → 2NF → 3NF → BCNF → 4NF → 5NF
1. First Normal Form (1NF)
Every table column must contain atomic (single) values with no nested lists, arrays, or repeating groups. This ensures strong data independence and lays the groundwork for higher normal forms.
Rules:
- All columns contain atomic (indivisible) values.
- No repeating groups.
Example: Before 1NF
OrderID | Products |
---|---|
101 | Laptop, Mouse, Keyboard |
After 1NF
OrderID | Product |
---|---|
101 | Laptop |
101 | Mouse |
101 | Keyboard |
2. Second Normal Form (2NF)
A table is in 2NF if it is already in 1NF and all non-key attributes fully depend on the entire primary key, not just part of it. In this way, partial dependencies are prevented and redundancy reduced.
Rules:
- Must be in 1NF.
- No partial dependencies (all non-key columns depend on the full primary key).
Example: Before 2NF
OrderID (PK) | ProductID (PK) | ProductName |
---|---|---|
101 | P1 | Laptop |
After 2NF
Orders Table:
| OrderID (PK) |
Products Table:
| ProductID (PK) | ProductName |
OrderDetails Table:
| OrderID (PK, FK) | ProductID (PK, FK) |
3. Third Normal Form (3NF)
A relation is in 3NF if it is in 2NF and no non-prime attribute depends transitively on a candidate key. Essentially, each non-key attribute must directly depend on the key, the whole key, and nothing but the key.
This eliminates transitive dependencies and further enforces data integrity.
Rules:
- Must be in 2NF.
- No transitive dependencies (non-key columns depend only on the primary key).
Example: Before 3NF
| StudentID | Department | DepartmentHead |
After 3NF
Students Table:
| StudentID | Department |
Departments Table:
| Department | DepartmentHead |
3b. Boyce-Codd Normal Form (BCNF)
This is a refinement of 3NF where every determinant must be a candidate key. Used to resolve corner cases where 3NF still has undesirable dependencies.
Rules:
- Must be in 3NF.
- Every determinant must be a superkey.
Example: Before BCNF
| StudentID | Course | Professor |
After BCNF
StudentCourses Table:
| StudentID | Course |
ProfessorCourses Table:
| Professor | Course |
4. Fourth Normal Form (4NF)
A table is in 4NF if it is already in Boyce–Codd Normal Form (BCNF) and no non-trivial multivalued dependencies exist besides those originating from a superkey.
Having a table in the fourth normal form ensures that multiple independent relationships don’t cause data duplication across rows.
Rules:
- Must be in BCNF.
- No multi-valued dependencies.
Example: Before 4NF
| EmployeeID | Skill | Language |
After 4NF
EmployeeSkills Table:
| EmployeeID | Skill |
EmployeeLanguages Table:
| EmployeeID | Language |
5. Fifth Normal Form (5NF)
Also called Project–Join Normal Form, 5NF ensures every join dependency in the table is a consequence of candidate keys.
This form addresses complex join constraints and ensures data is irreducible and free of redundancy due to joint relationships.
Rules:
- Must be in 4NF.
- No join dependencies.
Example: Before 5NF
| Supplier | Part | Project |
After 5NF
SupplierParts Table:
| Supplier | Part |
SupplierProjects Table:
| Supplier | Project |
PartProjects Table:
| Part | Project |
Denormalization: When and Why to Use It
Denormalization refers to intentionally adding redundancy to improve read performance.
When to Use:
- Read-heavy workloads (e.g., analytics).
- Reducing complex joins.
- Real-time applications.
Example 1: E-Commerce Order History
Denormalized Table:
| OrderID | CustomerName | ProductName | TotalPrice |
Example 2: Social Media Like Count
Denormalized Column:
Posts (PostID, Content, LikeCount)
Summary & Best Practices
Normal Form | Purpose | Denormalization Use Case |
---|---|---|
1NF | Atomic values | Rarely needed |
2NF | Eliminate partial dependencies | Reporting systems |
3NF | Remove transitive dependencies | Data warehouses |
BCNF | Superkey dependencies | High-traffic web apps |
4NF/5NF | Handle multi-valued/join dependencies | Complex enterprise systems |
Recommendation:
- Normalize first for integrity.
- Denormalize selectively for performance.
Visual Workflow
Raw Data → 1NF → 2NF → 3NF → BCNF → 4NF → 5NF
↓
Denormalize (for reads)