DP-900Chapter 20 of 101Objective 2.1

Database Normalization (1NF, 2NF, 3NF)

This chapter covers database normalization, specifically the first three normal forms (1NF, 2NF, 3NF). Normalization is a foundational concept for designing relational databases that minimize redundancy and prevent update anomalies. On the DP-900 exam, approximately 5-10% of questions touch on normalization concepts, often asking you to identify normal forms or recognize violations. Understanding these forms is crucial for designing efficient, maintainable data schemas in Azure SQL Database and other relational data services.

25 min read
Intermediate
Updated May 31, 2026

Database Normalization as Library Organization

Imagine a public library receives a massive donation of books. The books are dumped in a pile on the floor. Each book may have multiple authors, multiple genres, and the same author may appear on many books. The librarians decide to organize the collection to avoid redundancy and make updates easy. First, they create a single table where each row is a book, but they store all authors as a comma-separated list (like 1NF violation). This causes problems: if an author changes their name, every book row must be updated, and queries for books by a specific author are slow. To fix this, they apply 1NF: ensure each cell contains a single value. They split the authors into separate rows for each book-author pair, but now they have duplicate author names across rows. Next, they apply 2NF: they separate out partial dependencies. They create an Authors table with AuthorID and AuthorName, and a BookAuthors table linking BookID and AuthorID. This removes redundancy for author names. Finally, they apply 3NF: they notice that the genre description depends on the genre code, not directly on the book. They create a Genres table with GenreCode and GenreDescription, and reference GenreCode in the Books table. Now, the library is fully normalized: no redundant data, each fact is stored once, and updates are consistent. This mirrors the mechanistic steps of normalization: eliminating repeating groups (1NF), removing partial dependencies (2NF), and removing transitive dependencies (3NF).

How It Actually Works

What is Database Normalization?

Database normalization is a systematic process of organizing columns (attributes) and tables (relations) to reduce data redundancy and improve data integrity. The goal is to ensure that each piece of data is stored in exactly one place, eliminating update anomalies (inconsistencies when data is modified), insertion anomalies (inability to add data without additional unrelated data), and deletion anomalies (loss of unintended data when deleting rows). Normalization was first proposed by Edgar F. Codd in 1970 and later expanded by Raymond F. Boyce and others. The process involves applying a series of rules called normal forms. The DP-900 exam focuses on the first three normal forms (1NF, 2NF, 3NF), which are the most commonly used in practice.

Why Normalization Exists

Without normalization, databases suffer from three types of anomalies: - Update anomaly: When the same fact is stored in multiple rows, updating it in one place but not others causes inconsistency. For example, if an author's name is stored in every book row, changing the name requires updating many rows. - Insertion anomaly: You cannot insert a new entity (e.g., a new author) without also inserting a related entity (e.g., a book). For instance, if author data is embedded in a book table, you cannot add a new author until they have written a book. - Deletion anomaly: Deleting a row may inadvertently delete unrelated data. For example, deleting the last book by an author also deletes the author's information.

Normalization eliminates these anomalies by decomposing tables into smaller, focused tables linked by foreign keys.

First Normal Form (1NF)

Definition: A table is in 1NF if and only if: 1. All columns contain atomic (indivisible) values. Each cell holds a single value, not a set of values or a composite value. 2. Each column contains values of the same type. 3. Each row is unique (typically enforced by a primary key). 4. The order of rows and columns does not matter.

Violation Example: Consider a Books table with columns BookID, Title, Authors (where authors are stored as a comma-separated list). This violates 1NF because the Authors column contains multiple values in a single cell.

How to Achieve 1NF: Split the repeating group into separate rows. Create a BookAuthors table with one row per book-author pair. Now each cell contains a single value.

Exam Tip: The exam may present a table with multiple values in a single cell (e.g., 'John, Jane') and ask which normal form is violated. The answer is 1NF.

Second Normal Form (2NF)

Definition: A table is in 2NF if and only if: 1. It is in 1NF. 2. Every non-key column is fully functionally dependent on the entire primary key (no partial dependencies).

Partial Dependency: A partial dependency occurs when a non-key column depends on only part of a composite primary key. For example, consider a table OrderDetails with composite primary key (OrderID, ProductID). If the column ProductName depends only on ProductID (part of the key), that is a partial dependency.

How to Achieve 2NF: Remove partial dependencies by decomposing the table into two tables: one for the full key and the dependent columns, and another for the partial key and its dependent columns. In the example, create a Products table with ProductID as primary key and ProductName, and keep OrderDetails with OrderID, ProductID, and quantity.

Exam Tip: The exam often tests partial dependencies with composite keys. If a table has a composite key and a column depends on only one part of that key, the table is not in 2NF. The correct answer is to move that column to a separate table.

Third Normal Form (3NF)

Definition: A table is in 3NF if and only if: 1. It is in 2NF. 2. Every non-key column is non-transitively dependent on the primary key (no transitive dependencies).

Transitive Dependency: A transitive dependency occurs when a non-key column depends on another non-key column, which in turn depends on the primary key. For example, consider a table Employees with columns EmployeeID (PK), DepartmentID, and DepartmentName. Here, DepartmentName depends on DepartmentID, which depends on EmployeeID. So DepartmentName is transitively dependent on the primary key.

How to Achieve 3NF: Remove transitive dependencies by decomposing the table. Create a Departments table with DepartmentID as PK and DepartmentName, and reference DepartmentID in the Employees table as a foreign key.

Exam Tip: The exam often tests transitive dependencies with columns like ZipCode and City. If City depends on ZipCode, and ZipCode depends on the primary key, then City is transitively dependent. The solution is to create a ZipCodes table.

How Normalization Interacts with Related Technologies

In Azure SQL Database, normalization is applied during schema design. Indexes, such as clustered indexes, are typically created on primary keys. Normalized tables often have more foreign key relationships, which can impact query performance due to joins. However, denormalization (the reverse process) is sometimes used for read-heavy workloads to reduce joins. The DP-900 exam expects you to understand when normalization is appropriate and when denormalization might be considered for performance.

Key Terms and Values

Atomic value: A single, indivisible value (e.g., a string, number, date).

Primary key (PK): A column or set of columns that uniquely identifies each row.

Foreign key (FK): A column that references the primary key of another table.

Functional dependency: A relationship where knowing the value of one column (or set) determines the value of another.

Full functional dependency: A non-key column depends on the entire primary key, not just part of it.

Configuration and Verification

Normalization is not a configuration setting; it is a design process. You verify normalization by examining the schema. In SQL Server Management Studio (SSMS) or Azure Data Studio, you can view table schemas and dependencies. There is no built-in tool to automatically detect normal form violations, but you can write queries to check for anomalies. For example, to find duplicate author names, you could query:

SELECT AuthorName, COUNT(*) FROM Authors GROUP BY AuthorName HAVING COUNT(*) > 1;

This would indicate a potential redundancy that normalization should eliminate.

Common Pitfalls

Over-normalization: Creating too many tables can lead to excessive joins and poor performance. In practice, many databases are normalized to 3NF, but sometimes BCNF or higher forms are used.

Misidentifying dependencies: Ensure you correctly identify functional dependencies. A common mistake is to think that a column is partially dependent when it is actually fully dependent.

Ignoring business rules: Normalization should reflect real-world relationships. If business rules change, the schema may need to be re-normalized.

Walk-Through

1

Identify Repeating Groups

Examine the table for columns that contain multiple values in a single cell, such as a list of authors separated by commas. This is a violation of 1NF. To fix, create a new table that stores each value in a separate row, linking back to the original row via a foreign key. Ensure every column holds atomic values.

2

Check for Partial Dependencies

If the table has a composite primary key, identify any non-key column that depends on only part of that key. For example, in a table with (OrderID, ProductID) as key, ProductName depends only on ProductID. This is a partial dependency violating 2NF. Remove it by creating a separate table for the partial key and its dependent columns.

3

Remove Transitive Dependencies

Look for non-key columns that depend on other non-key columns, not directly on the primary key. For example, in an Employees table, DepartmentName depends on DepartmentID, which depends on EmployeeID. This transitive dependency violates 3NF. Create a new table for the intermediate entity (Departments) and reference it via foreign key.

4

Verify Normal Forms

After each decomposition, verify that the resulting tables satisfy the normal form. Check that all columns are atomic (1NF), no partial dependencies exist (2NF), and no transitive dependencies exist (3NF). Use sample data to test for update anomalies: try updating a value in one place and ensure consistency.

5

Document Dependencies

Maintain a clear record of functional dependencies. This helps in future schema changes and in verifying that normalization rules are correctly applied. Use a dependency diagram or a data dictionary to list all functional dependencies and their justifications.

What This Looks Like on the Job

Enterprise Scenario 1: E-Commerce Product Catalog

An online retailer stores product information in a single table: ProductID, ProductName, CategoryID, CategoryName, SupplierID, SupplierName, Price. This table is in 1NF, but it has partial dependencies (CategoryName depends on CategoryID, SupplierName depends on SupplierID) and transitive dependencies (CategoryName depends on CategoryID which is part of the key? Actually CategoryID is not part of the key; it's a non-key column, so CategoryName depends on CategoryID, which is a transitive dependency from ProductID). To normalize to 3NF, the retailer creates separate tables: Categories (CategoryID, CategoryName), Suppliers (SupplierID, SupplierName), and Products (ProductID, ProductName, CategoryID, SupplierID, Price). This eliminates redundancy: category names are stored once, and updating a supplier name requires a single update. In production, this schema supports millions of products and thousands of categories and suppliers. Indexes on foreign keys (CategoryID, SupplierID) improve join performance. A common misconfiguration is forgetting to create foreign key constraints, leading to orphaned records. When misconfigured, queries may return inconsistent data, such as a product referencing a non-existent category.

Enterprise Scenario 2: University Student Enrollment System

A university database initially has a table Enrollments with columns: StudentID, StudentName, CourseID, CourseName, InstructorID, InstructorName, Grade. This violates 2NF because StudentName depends only on StudentID (partial dependency if the key is (StudentID, CourseID)), and CourseName and InstructorName depend only on CourseID. Also, InstructorName depends on InstructorID, which is a transitive dependency from CourseID? Actually, InstructorID is a non-key column, and InstructorName depends on InstructorID, so it's a transitive dependency from the primary key. Normalization splits into Students (StudentID, StudentName), Courses (CourseID, CourseName, InstructorID), Instructors (InstructorID, InstructorName), and Enrollments (StudentID, CourseID, Grade). This design supports 50,000 students, 2,000 courses, and 500 instructors. The system uses foreign key constraints to ensure referential integrity. A common mistake is to store the instructor name directly in the Courses table, causing update anomalies when an instructor changes their name. In production, the normalized schema allows easy updates and consistent data across the university.

Performance Considerations

Normalization often increases the number of joins required for queries, which can impact performance. In Azure SQL Database, proper indexing on foreign keys and primary keys mitigates this. For read-heavy workloads, denormalization (adding redundant columns) may be used to reduce joins, but this must be balanced with the risk of update anomalies. The DP-900 exam expects you to understand that normalization improves data integrity but may reduce query performance.

How DP-900 Actually Tests This

What DP-900 Tests on Normalization (Objective 2.1)

The DP-900 exam covers normalization under the domain 'Relational Data' and objective '2.1 Describe relational data structures'. Specifically, you need to:

Identify the characteristics of 1NF, 2NF, and 3NF.

Recognize violations of each normal form given a table schema.

Understand the purpose of normalization: reduce redundancy and prevent anomalies.

Know that normalization is a design principle, not a configuration option.

Common Wrong Answers and Why Candidates Choose Them

1.

'The table is in 1NF because it has a primary key.' This is wrong because having a primary key is necessary but not sufficient for 1NF. The table must also have atomic columns. Candidates often confuse primary key requirement with 1NF.

2.

'2NF requires that all non-key columns depend on the primary key.' This is true but incomplete. 2NF specifically requires no partial dependencies, meaning non-key columns must depend on the entire primary key, not just part of it. Candidates may pick an option that says 'all non-key columns depend on the primary key' without specifying 'full functional dependency'.

3.

'3NF is the same as 2NF but with no transitive dependencies.' This is correct but often candidates choose a distractor that says '3NF requires that all non-key columns depend on the primary key' (which is 2NF). They miss the transitive dependency aspect.

4.

'Normalization always improves performance.' This is false. Normalization can hurt performance due to increased joins. Candidates often assume normalization is always beneficial.

Specific Numbers and Terms That Appear on the Exam

The terms 'atomic', 'partial dependency', 'transitive dependency', 'composite key' appear verbatim.

The exam may ask: 'A table with a composite key and a column that depends on only part of the key violates which normal form?' Answer: 2NF.

'A table where a non-key column depends on another non-key column violates which normal form?' Answer: 3NF.

'What is the purpose of normalization?' Answer: Reduce data redundancy and improve data integrity.

Edge Cases and Exceptions

A table in 3NF may still have anomalies if it has multiple candidate keys. This leads to BCNF, which is beyond DP-900 scope.

Denormalization is sometimes used for performance in data warehouses (star schemas). The exam may ask when denormalization is appropriate: for read-heavy analytical queries.

The exam does not test 4NF or 5NF.

How to Eliminate Wrong Answers

If an option mentions 'repeating groups' or 'multiple values in a cell', the correct normal form violation is 1NF.

If an option mentions 'composite key' and 'partial dependency', the correct violation is 2NF.

If an option mentions 'transitive dependency', the correct violation is 3NF.

Always look for the most specific violation. For example, if a table violates both 1NF and 2NF, the question may ask 'Which normal form is violated first?' or 'What is the highest normal form satisfied?'

Key Takeaways

1NF requires atomic columns and a primary key.

2NF requires 1NF and no partial dependencies on a composite key.

3NF requires 2NF and no transitive dependencies.

Partial dependency: a non-key column depends on only part of a composite primary key.

Transitive dependency: a non-key column depends on another non-key column.

Normalization reduces redundancy and prevents update, insert, and delete anomalies.

Normalization may increase joins and reduce query performance.

The DP-900 exam tests identification of normal form violations, not the process of normalization.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

1NF

Requires atomic columns (no repeating groups).

Requires a primary key.

Does not address partial or transitive dependencies.

Violation: multiple values in a cell.

Example: Authors column with comma-separated names.

2NF

Must be in 1NF.

Requires no partial dependencies on a composite key.

Non-key columns must depend on the entire primary key.

Violation: column depends on part of a composite key.

Example: ProductName depends on ProductID in a table with (OrderID, ProductID) key.

2NF

Must be in 1NF.

No partial dependencies.

Allows transitive dependencies.

Violation: partial dependency.

Example: StudentName depends on StudentID in a table with (StudentID, CourseID) key.

3NF

Must be in 2NF.

No transitive dependencies.

Non-key columns must depend only on the primary key.

Violation: column depends on another non-key column.

Example: DepartmentName depends on DepartmentID, which depends on EmployeeID.

Watch Out for These

Mistake

Normalization is only about splitting tables into smaller ones.

Correct

Normalization is about removing redundancy and anomalies through a systematic process of decomposing tables based on functional dependencies. Simply splitting tables arbitrarily does not guarantee normalization; you must follow the rules of each normal form.

Mistake

A table with a primary key is automatically in 1NF.

Correct

A primary key is required for 1NF, but the table must also have atomic columns. If any column contains multiple values (e.g., comma-separated list), the table violates 1NF even with a primary key.

Mistake

2NF is automatically satisfied if the table has a single-column primary key.

Correct

2NF concerns partial dependencies, which only exist with composite keys. If the primary key is a single column, there cannot be a partial dependency, so the table is automatically in 2NF if it is in 1NF. However, this is a common trick: the exam may state 'the table has a single-column primary key, so it is in 2NF' — this is true, but candidates might think it's false.

Mistake

3NF eliminates all redundancy.

Correct

3NF eliminates most redundancy but not all. For example, if two columns are functionally dependent on each other (e.g., ZipCode and City), 3NF allows redundancy because City depends on ZipCode, which is a non-key column. BCNF or 4NF may be needed to eliminate such redundancy. DP-900 does not go beyond 3NF.

Mistake

Normalization always improves database performance.

Correct

Normalization improves data integrity but often degrades query performance due to increased joins. In practice, denormalization is used for read-heavy workloads. The exam may test this trade-off.

Do You Actually Know This?

Reveal each answer, then mark whether you got it right. Score 60%+ to unlock the next chapter.

Frequently Asked Questions

What is the difference between 2NF and 3NF?

2NF addresses partial dependencies, which occur when a non-key column depends on only part of a composite primary key. 3NF addresses transitive dependencies, which occur when a non-key column depends on another non-key column, not directly on the primary key. Both require the table to be in the previous normal form. For example, a table with a single-column primary key is automatically in 2NF if in 1NF, but may still violate 3NF if it has transitive dependencies.

Can a table be in 3NF but still have redundancy?

Yes. 3NF eliminates most redundancy but not all. For instance, if you have columns ZipCode and City, and City is functionally dependent on ZipCode, then City is transitively dependent on the primary key (if ZipCode is not the key). But if ZipCode is a candidate key, the table might be in 3NF yet still have redundancy because the same City appears for many ZipCodes. Higher normal forms like BCNF address this, but they are beyond DP-900 scope.

How do I know if a table violates 2NF?

Check if the table has a composite primary key. If it does, identify non-key columns that depend on only part of that key. For example, in a table with (OrderID, ProductID) as key, if ProductName depends only on ProductID, that is a partial dependency violating 2NF. If the table has a single-column primary key, it cannot have a partial dependency, so it is automatically in 2NF if in 1NF.

What is an example of a transitive dependency?

Consider an Employees table with EmployeeID (PK), DepartmentID, and DepartmentName. DepartmentName depends on DepartmentID, which depends on EmployeeID. So DepartmentName is transitively dependent on the primary key. To fix, create a Departments table with DepartmentID (PK) and DepartmentName, and remove DepartmentName from Employees.

Does normalization improve query performance?

Not necessarily. Normalization reduces redundancy and improves data integrity, but it often increases the number of joins required to retrieve data, which can slow down queries. In read-heavy systems, denormalization (adding redundant columns) may be used to improve performance at the cost of potential update anomalies.

What normal form is most commonly used in practice?

Third Normal Form (3NF) is the most commonly used in transactional databases because it balances data integrity with reasonable performance. Many database designers stop at 3NF, as higher normal forms can lead to excessive table decomposition and complexity.

How does normalization relate to Azure SQL Database?

Azure SQL Database is a relational database service that supports normalized schemas. When designing tables in Azure SQL Database, you apply normalization principles to ensure data integrity. The service does not automatically normalize data; you must design the schema accordingly. Indexes can be created on foreign keys to improve join performance.

Terms Worth Knowing

Ready to put this to the test?

You've just covered Database Normalization (1NF, 2NF, 3NF) — now see how well it sticks with free DP-900 practice questions. Full explanations included, no account needed.

Done with this chapter?