This chapter covers relational data concepts, a foundational topic for the DP-900 exam. You will learn about tables, keys, relationships, constraints, and normalization. Approximately 15-20% of exam questions touch on relational data concepts, often asking you to identify primary/foreign keys, understand relationship types, or recognize normalization levels. Mastery of these concepts is essential for any Azure data role.
Jump to a section
A relational database is like a library's card catalog system. In a library, each book has a unique call number (like a primary key) and is stored on a shelf. The card catalog contains drawers of cards, each representing a book, with fields like title, author, and subject. To find a book, you look up the card, note the call number, and retrieve the book. The catalog itself is separate from the books; it's a structured index. Relationships exist: an author card links to all books by that author via the author's name (foreign key). If a book is checked out, the card remains, but a note indicates it's unavailable (like a NULL value). The catalog enforces rules: no two books can have the same call number (unique constraint), and you cannot file a card for a book that doesn't exist (referential integrity). When you search by subject, the catalog quickly narrows down cards matching that subject, similar to a database index. The library also has a checkout log (another table) that links borrower IDs to book call numbers, with dates — this is a junction table for a many-to-many relationship between borrowers and books. Just as a library patron doesn't need to know the exact shelf location — only the call number — a database user queries using keys and lets the system locate the data physically.
What is Relational Data?
Relational data is data organized into tables (relations) where each table represents an entity (e.g., Customer, Order) and rows represent instances. The relational model was proposed by E.F. Codd in 1970 and is based on set theory and predicate logic. In Azure, relational databases include Azure SQL Database, Azure SQL Managed Instance, and Azure Database for PostgreSQL/MySQL. The exam expects you to understand the logical structure, not physical storage.
Tables and Rows
A table consists of columns (attributes) and rows (tuples). Each column has a data type (e.g., INT, VARCHAR, DATE). In a well-designed table, every row is unique — typically enforced by a primary key. For example, a Customer table might have columns: CustomerID (INT, PK), FirstName (VARCHAR(50)), LastName (VARCHAR(50)), Email (VARCHAR(100)).
Primary Keys
A primary key uniquely identifies each row in a table. It must contain unique values and cannot be NULL. A table can have only one primary key, which may consist of a single column or multiple columns (composite key). In Azure SQL Database, you define a primary key using:
CREATE TABLE Customer (
CustomerID INT PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50)
);The exam tests that a primary key ensures entity integrity. Common wrong answer: 'A primary key can be NULL.' That is false — primary keys enforce NOT NULL implicitly.
Foreign Keys
A foreign key is a column (or set of columns) in one table that references the primary key of another table. It establishes a link between the two tables and enforces referential integrity. For example, an Order table might have CustomerID as a foreign key referencing Customer.CustomerID.
CREATE TABLE Order (
OrderID INT PRIMARY KEY,
OrderDate DATE,
CustomerID INT,
FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID)
);The exam tests that foreign keys prevent orphan records: you cannot insert an order with a CustomerID that doesn't exist in Customer. Also, you cannot delete a customer that has orders unless you handle it (CASCADE, SET NULL, etc.).
Relationships
There are three types of relationships between tables: - One-to-One (1:1): Each row in Table A relates to exactly one row in Table B, and vice versa. Example: Employee and EmployeeDetails. Rare in practice; often merged. - One-to-Many (1:N): A row in Table A can relate to many rows in Table B, but a row in Table B relates to exactly one row in Table A. Example: Customer and Order. This is the most common relationship. - Many-to-Many (M:N): Rows in Table A can relate to many rows in Table B, and vice versa. Implemented via a junction table. Example: Student and Course with Enrollment table.
The exam may ask you to identify the relationship type given a scenario. For example, 'A book can have multiple authors, and an author can write multiple books' — that's many-to-many.
Constraints
Constraints enforce rules on data. Common constraints: - NOT NULL: Column cannot have NULL values. - UNIQUE: All values in a column must be different. - PRIMARY KEY: Combination of NOT NULL and UNIQUE. - FOREIGN KEY: Ensures referential integrity. - CHECK: Validates data based on a logical expression (e.g., Age >= 0). - DEFAULT: Provides a default value if none is specified.
The exam expects you to know which constraint ensures uniqueness (UNIQUE or PRIMARY KEY) and which prevents NULLs (NOT NULL).
Indexes
Indexes speed up data retrieval but slow down writes. They are not explicitly part of the relational model but are crucial for performance. In Azure SQL, you can create a clustered index (determines physical order) or nonclustered index. Primary keys automatically create a clustered index by default. The exam may ask: 'What is the primary purpose of an index?' Answer: To improve query performance by reducing data scans.
Normalization
Normalization is the process of organizing data to reduce redundancy and improve integrity. There are several normal forms: - 1NF (First Normal Form): Each column contains atomic (indivisible) values, and each row is unique. Example: A 'PhoneNumbers' column with multiple numbers violates 1NF. - 2NF (Second Normal Form): Must be in 1NF and all non-key columns depend on the entire primary key (no partial dependency). Relevant for composite keys. - 3NF (Third Normal Form): Must be in 2NF and no transitive dependency (non-key column depends on another non-key column). Example: Customer table with City and State — State depends on City, not CustomerID, so it violates 3NF.
The exam typically tests 1NF and 3NF. You might be asked: 'Which normal form removes transitive dependencies?' Answer: 3NF.
Denormalization
Denormalization intentionally introduces redundancy for performance (fewer joins). It is the opposite of normalization. In Azure data warehousing (e.g., Synapse), star schemas use denormalized fact and dimension tables. The exam may compare normalized OLTP vs denormalized OLAP.
Entity Integrity and Referential Integrity
Entity Integrity: Ensures each row is uniquely identified by the primary key (no duplicates, no NULLs).
Referential Integrity: Ensures foreign key values match a primary key value in the referenced table (or are NULL).
Views
A view is a virtual table based on a SELECT query. It does not store data but presents data from underlying tables. Views can simplify complex queries and provide security by restricting access to specific columns. In Azure SQL, you create a view with:
CREATE VIEW CustomerOrders AS
SELECT c.CustomerID, c.FirstName, o.OrderID, o.OrderDate
FROM Customer c JOIN Order o ON c.CustomerID = o.CustomerID;The exam may ask: 'Does a view store data?' Answer: No, it's a saved query.
Stored Procedures
Stored procedures are precompiled SQL code that can accept parameters and perform operations. They improve performance and security. Example:
CREATE PROCEDURE GetCustomerOrders @CustomerID INT
AS
SELECT * FROM Order WHERE CustomerID = @CustomerID;The exam expects you to know that stored procedures can return multiple result sets and are executed with EXEC.
Transactions
A transaction is a unit of work that is ACID (Atomic, Consistent, Isolated, Durable). In Azure SQL, transactions are used to ensure data integrity. BEGIN TRANSACTION, COMMIT, ROLLBACK. The exam tests ACID properties: Atomicity (all or nothing), Consistency (valid state), Isolation (concurrent transactions don't interfere), Durability (committed changes persist).
NULL Handling
NULL represents unknown or missing data. It is not zero or empty string. Comparisons with NULL use IS NULL or IS NOT NULL, not = NULL. Aggregate functions like COUNT(*) count all rows, but COUNT(column) excludes NULLs. The exam often tests that NULL is not equal to anything, even another NULL.
Data Types
Common SQL data types: - Exact Numeric: INT, BIGINT, SMALLINT, TINYINT, DECIMAL(p,s), NUMERIC, MONEY, SMALLMONEY - Approximate Numeric: FLOAT, REAL - Date and Time: DATE, TIME, DATETIME, DATETIME2, SMALLDATETIME, DATETIMEOFFSET - Character Strings: CHAR(n), VARCHAR(n), TEXT (deprecated), NCHAR(n), NVARCHAR(n), NTEXT (deprecated) - Binary: BINARY, VARBINARY, IMAGE (deprecated) - Other: UNIQUEIDENTIFIER, XML, GEOGRAPHY, GEOMETRY
The exam may ask which type to use for large Unicode text: NVARCHAR(MAX).
Relational vs Non-Relational
Relational databases use structured schema, ACID transactions, and SQL. Non-relational (NoSQL) databases like Cosmos DB use flexible schemas, eventual consistency, and different query languages. The exam compares these: 'Which Azure data service is best for structured data with complex relationships?' Answer: Azure SQL Database (relational).
Identify Entities and Attributes
Begin by identifying the real-world entities your system needs to store data about. For example, in an e-commerce system, entities are Customer, Product, Order, and Category. For each entity, list its attributes (columns). Customer has CustomerID, FirstName, LastName, Email. Product has ProductID, Name, Price, CategoryID. This step is conceptual and does not involve SQL. The goal is to create a logical data model.
Define Primary Keys
For each entity, select a primary key that uniquely identifies each instance. Typically, use a surrogate key like an auto-incrementing integer (e.g., CustomerID INT IDENTITY(1,1)). In Azure SQL, you can use IDENTITY property or SEQUENCE. Ensure the primary key is minimal — single column if possible. Avoid using natural keys like Email because they can change. The primary key enforces entity integrity.
Establish Relationships and Foreign Keys
Determine how entities relate. For a one-to-many relationship (e.g., Customer to Orders), add a foreign key in the 'many' table (Order.CustomerID) referencing the 'one' table's primary key. For many-to-many (e.g., Product and Order), create a junction table (OrderDetail) with composite primary key (OrderID, ProductID) and foreign keys to both tables. Enforce referential integrity with foreign key constraints.
Apply Normalization
Normalize the design to reduce redundancy. Ensure 1NF: all columns are atomic. For example, if Product has a 'Colors' column with multiple values, split into a separate ProductColor table. Then check for 2NF: no partial dependencies (relevant only for composite keys). Finally, check 3NF: no transitive dependencies. For instance, if Order table includes CustomerCity and CustomerState, those depend on CustomerID, not OrderID — remove them to a Customer table.
Implement Constraints and Indexes
Add constraints to enforce data integrity: NOT NULL on required columns, UNIQUE on alternative keys (e.g., Email), CHECK for business rules (e.g., Price > 0), DEFAULT for default values. Create indexes on columns used in WHERE, JOIN, and ORDER BY to improve query performance. The primary key automatically gets a clustered index. Consider nonclustered indexes on foreign keys and frequently searched columns.
Scenario 1: E-Commerce Platform Migration to Azure SQL Database
A mid-sized e-commerce company migrates from on-premises SQL Server to Azure SQL Database. They have tables: Customer, Product, Order, OrderItem, Category. The relational design ensures that each order is linked to a customer via CustomerID foreign key, and each order item links to order and product. The database uses 3NF to avoid data duplication. In production, they handle millions of orders per month. They create indexes on OrderDate and CustomerID for common queries. A common misconfiguration: not indexing foreign keys, leading to slow joins. They also use elastic pools to scale. The problem solved: ensuring data integrity across transactions (e.g., no orphan order items).
Scenario 2: Healthcare Patient Records with Azure SQL Managed Instance
A hospital uses Azure SQL Managed Instance for patient records. Tables: Patient, Appointment, Doctor, Diagnosis. The relationship between Patient and Appointment is one-to-many. The Diagnosis table links to Appointment (one-to-one). They enforce referential integrity to prevent appointments without patients. They use CHECK constraints on Age (0-120) and UNIQUE on Patient.SSN. They also use views to restrict access: doctors see only their patients. A common issue: forgetting to set cascading deletes, causing errors when deleting a patient with appointments. They use transactions to ensure atomicity when updating multiple tables.
Scenario 3: Financial Reporting with Azure Synapse Analytics
A financial firm uses Azure Synapse Analytics (dedicated SQL pool) for large-scale reporting. They use a star schema: a central FactTransaction table (millions of rows) with foreign keys to dimension tables: DimCustomer, DimProduct, DimDate. This denormalized design (3NF is not used) optimizes for read performance. They use clustered columnstore indexes for compression and fast scans. A misconfiguration: not distributing fact tables correctly (e.g., using ROUND_ROBIN when hash distribution on CustomerID would reduce data movement). They also use materialized views to pre-aggregate daily totals.
The DP-900 exam tests relational data concepts under objective '1.2 Describe core data concepts'. You must be able to:
Identify primary keys and foreign keys in a schema.
Understand the purpose of constraints (NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK).
Recognize relationship types (one-to-one, one-to-many, many-to-many).
Know normalization basics (1NF, 2NF, 3NF).
Distinguish between relational and non-relational data.
Common wrong answers: 1. 'A primary key allows NULL values.' — FALSE. Primary keys enforce NOT NULL. 2. 'A foreign key must be unique.' — FALSE. Foreign keys can have duplicates (one-to-many). 3. 'Normalization always improves performance.' — FALSE. It reduces redundancy but can slow reads due to joins; denormalization may be used for performance. 4. 'A view stores data.' — FALSE. A view is a virtual table; it does not store data.
Specific numbers/terms: The exam asks about ACID properties, the difference between clustered and nonclustered indexes, and the default index type for primary keys (clustered).
Edge cases: The exam may present a table with a composite primary key and ask about partial dependency (2NF). Another edge: a foreign key can be NULL if the relationship is optional.
Elimination strategy: If a question asks about 'ensuring uniqueness', remember that both PRIMARY KEY and UNIQUE enforce uniqueness, but PRIMARY KEY also implies NOT NULL. For 'referential integrity', the answer is FOREIGN KEY.
A primary key uniquely identifies each row and cannot be NULL.
A foreign key enforces referential integrity by linking to a primary key in another table.
One-to-many is the most common relationship type; many-to-many requires a junction table.
Normalization reduces redundancy: 1NF requires atomic columns, 3NF removes transitive dependencies.
Constraints include NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK, and DEFAULT.
A view is a virtual table; it does not store data.
ACID properties: Atomicity, Consistency, Isolation, Durability.
NULL is not equal to anything, even another NULL; use IS NULL to test.
These come up on the exam all the time. Here's how to tell them apart.
Primary Key
Uniquely identifies each row in a table.
Cannot contain NULL values.
Only one per table (can be composite).
Automatically creates a clustered index by default.
Ensures entity integrity.
Foreign Key
Links to a primary key in another table.
Can contain NULL values (optional relationship).
Multiple foreign keys per table allowed.
Does not automatically create an index (but should be indexed for performance).
Ensures referential integrity.
Clustered Index
Determines the physical order of data in the table.
Only one per table.
Leaf nodes contain actual data rows.
Faster for range queries because data is physically sorted.
Primary key creates a clustered index by default.
Nonclustered Index
Does not affect physical order; stored separately.
Multiple per table (up to 999 in SQL Server).
Leaf nodes contain pointers to data rows (bookmark lookup).
Slower for range queries due to lookups.
Can be created on any column to speed up queries.
Mistake
A primary key can be NULL.
Correct
A primary key column cannot contain NULL. It implicitly has a NOT NULL constraint.
Mistake
Foreign keys must be unique.
Correct
Foreign keys can have duplicate values; they only need to match a primary key value in the referenced table.
Mistake
Normalization always improves database performance.
Correct
Normalization reduces redundancy and improves write performance and integrity, but can degrade read performance due to more joins. Denormalization is often used for read-heavy workloads.
Mistake
A view stores data physically.
Correct
A view is a virtual table based on a SELECT query. It does not store data; it retrieves data from underlying tables each time it is queried.
Mistake
All relationships must be enforced with foreign keys.
Correct
Foreign keys enforce referential integrity, but relationships can exist logically without them (though not recommended). In some scenarios, like data warehouses, foreign keys may be omitted for performance.
Reveal each answer, then mark whether you got it right. Score 60%+ to unlock the next chapter.
Both enforce uniqueness, but a primary key also implies NOT NULL and can only be one per table. A unique constraint allows one NULL value (in SQL Server) and can have multiple per table. The primary key is used to uniquely identify rows; a unique constraint is for alternate keys.
Yes, a foreign key can reference a column with a UNIQUE constraint. However, it commonly references a primary key. The referenced column must have unique values (either primary key or unique constraint).
2NF requires that all non-key columns depend on the entire primary key (no partial dependency). This only applies to tables with composite keys. 3NF requires that no non-key column depends on another non-key column (no transitive dependency). 3NF builds on 2NF.
A junction table (also called association table) is used to implement a many-to-many relationship. It contains foreign keys referencing the primary keys of the two related tables, and its primary key is often a composite of those foreign keys. For example, StudentCourse with StudentID and CourseID.
Use a clustered index for columns that are frequently used in range queries or sorting (e.g., OrderDate). Use nonclustered indexes for columns used in WHERE clauses with exact matches or for covering indexes. Remember, a table can have only one clustered index because data can be physically sorted in only one order.
CHAR(n) is fixed-length; it always uses n characters, padding with spaces. VARCHAR(n) is variable-length; it uses only the actual data length plus overhead. Use CHAR for fixed-length data (e.g., state abbreviations), VARCHAR for variable-length data (e.g., names).
ACID stands for Atomicity (all or nothing), Consistency (valid state), Isolation (concurrent transactions don't interfere), Durability (committed changes persist). It ensures reliable processing of database transactions, which is critical for financial systems and other applications requiring data integrity.
You've just covered Relational Data Concepts — now see how well it sticks with free DP-900 practice questions. Full explanations included, no account needed.
Done with this chapter?