DP-900 | Chapter 3 of 101 | Objective 1.3

Non-Relational Data Concepts

This chapter covers non-relational data concepts, which are a core part of the DP-900 exam under Objective 1.3: Describe core data concepts. Understanding non-relational (NoSQL) data stores is critical because Azure offers several such services (Cosmos DB, Table Storage, Blob Storage) and the exam tests your ability to differentiate them from relational databases. Approximately 15-20% of exam questions touch on non-relational concepts, including data types, storage structures, and use cases. By the end of this chapter, you will be able to identify the four main categories of non-relational databases, explain their mechanisms, and choose the right service for a given scenario.

25 min read
Intermediate
Updated May 31, 2026

Warehouse vs. Library: Organizing Data

Imagine you run a large company that receives thousands of packages daily. A relational database is like a library: every book (record) has a fixed set of fields (columns) like title, author, ISBN, and shelf location. You must place each book in a specific shelf (table) with the same structure. If a book has an extra attribute, like a signed copy, you need a separate table and a join to relate it — rigid but consistent. Now consider a non-relational database: it's like a warehouse where you can store anything — boxes, pallets, loose items — without predefining a format. Each item can have its own set of properties. A pallet of electronics might have a shipping label, a weight, and a destination, while a bin of screws might just have a part number and quantity. You can group similar items loosely (like a container), but there's no enforced schema. To find something, you look up its unique ID or scan tags (indexes). The warehouse approach is faster for diverse, unpredictable data and scales by adding more floor space (horizontal scaling), whereas the library requires careful planning and is harder to expand. This is the core difference between relational and non-relational data systems.

How It Actually Works

What Is Non-Relational Data?

Non-relational data, often called NoSQL (Not Only SQL), refers to data storage systems that do not enforce a fixed schema or relational model. Unlike relational databases that store data in tables with predefined columns and rows, non-relational databases use flexible data models such as key-value pairs, documents, column families, or graphs. This flexibility allows them to handle unstructured, semi-structured, and polymorphic data efficiently. The term NoSQL emerged in the late 2000s as a response to the scalability limitations of traditional relational databases for web-scale applications.

Why Non-Relational Databases Exist

Relational databases excel at maintaining data integrity through ACID transactions (Atomicity, Consistency, Isolation, Durability) and complex joins. However, they face challenges with horizontal scaling (sharding across many servers), handling varied data structures, and achieving low latency at massive scale. Non-relational databases address these by:
- Schema flexibility: Each record can have different fields.
- Horizontal scalability: Data is partitioned across many nodes automatically.
- High availability: Built-in replication and eventual consistency models.
- Performance: Optimized for simple read/write operations at scale.

Categories of Non-Relational Databases

There are four main types, each optimized for different data access patterns:

#### 1. Key-Value Stores

Key-value stores are the simplest NoSQL model. Each item consists of a unique key and a value, which can be a string, JSON, binary data, or any blob. The database provides fast lookups by key, similar to a hash table or dictionary. Examples: Azure Cosmos DB (Table API), Redis, Amazon DynamoDB.
- Mechanism: When a write occurs, the key is hashed to determine which partition stores the data. Reads use the same hash to locate the partition and retrieve the value. There is no support for querying by value attributes unless secondary indexes are built.
- Default values: Azure Cosmos DB Table API uses a default consistency level of Session. The maximum item size is 2 MB.
- Configuration: In Azure, you create a Cosmos DB account, then a database, and then a container with a partition key. The partition key is critical for performance; choosing a high-cardinality key (e.g., user ID) ensures even distribution.
- Use case: Session management, shopping cart contents, user preferences.
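The hash-then-lookup mechanism described above can be sketched in a few lines of Python. This is a toy in-memory model, not how Cosmos DB or Redis is implemented: the partition count, the `partition_for` helper, and the `KeyValueStore` class are all hypothetical names chosen for illustration.

```python
import hashlib

NUM_PARTITIONS = 4  # hypothetical partition count for this sketch


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class KeyValueStore:
    """Toy key-value store whose items are spread across partitions."""

    def __init__(self, num_partitions: int = NUM_PARTITIONS) -> None:
        self.partitions = [dict() for _ in range(num_partitions)]

    def put(self, key: str, value: object) -> None:
        # The write hashes the key to pick the partition that stores the value.
        index = partition_for(key, len(self.partitions))
        self.partitions[index][key] = value

    def get(self, key: str, default: object = None) -> object:
        # The read repeats the same hash, so only one partition is consulted.
        index = partition_for(key, len(self.partitions))
        return self.partitions[index].get(key, default)


store = KeyValueStore()
store.put("session:42", {"user": "alice", "cart": ["sku-1"]})
print(store.get("session:42"))
print(store.get("session:unknown"))
```

Note that the store offers no way to ask "which sessions belong to alice?" without scanning every partition, which is the limitation the mechanism bullet describes.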

#### 2. Document Databases

Document databases store data as documents, typically in JSON, BSON, or XML format. Each document contains self-describing data with nested structures. Documents are grouped into collections (analogous to tables but schema-less). Examples: Azure Cosmos DB (SQL API), MongoDB.
- Mechanism: Every document is indexed on its primary key (id). Queries can filter on any field within the document and are served by secondary indexes: Cosmos DB indexes every property by default, while engines such as MongoDB index only the ID until you define additional indexes. The database can return entire documents or projections. Updates can modify specific fields within a document without affecting others.
- Key components: Document ID (unique within a collection), partition key (logical partition for scaling), indexing policy (which fields are indexed).
- Default values: Cosmos DB SQL API has a default indexing policy that indexes all properties with automatic indexing. The maximum document size is 2 MB.
- Configuration: Create a Cosmos DB account, database, container with partition key, and optionally define custom indexing paths to optimize queries.
- Use case: Content management, user profiles, product catalogs.
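To make the filter-and-project behaviour concrete, here is a minimal sketch that treats a Python list of dictionaries as a collection. The sample products and the `get_path` and `find` helpers are hypothetical; a real document database would answer the same questions through its query language and indexes rather than a linear scan.

```python
# Three documents in one "collection"; each has a different attribute set.
products = [
    {"id": "p1", "category": "electronics", "name": "Charger",
     "attributes": {"voltage": 5}},
    {"id": "p2", "category": "clothing", "name": "T-shirt",
     "attributes": {"size": "M", "color": "blue"}},
    {"id": "p3", "category": "electronics", "name": "Lamp",
     "attributes": {"voltage": 230, "color": "white"}},
]


def get_path(doc, path, default=None):
    """Follow a dotted path such as 'attributes.color' into a nested document."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


def find(docs, path, value, fields=None):
    """Return documents whose value at `path` equals `value`.

    When `fields` is given, return only those fields (a projection).
    """
    matches = [d for d in docs if get_path(d, path) == value]
    if fields is None:
        return matches
    return [{f: get_path(d, f) for f in fields} for d in matches]


# Filter on a top-level field and project two fields.
print(find(products, "category", "electronics", fields=["id", "name"]))
# Filter on a nested field that only some documents have.
print(find(products, "attributes.color", "blue", fields=["id"]))
```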

#### 3. Column-Family Stores

Column-family stores organize data into rows and columns, but unlike relational tables, columns are grouped into families, and each row can have different columns. Data is stored contiguously on disk by column family, enabling efficient reads of specific columns across many rows. Examples: Apache Cassandra, HBase, Azure Cosmos DB (Cassandra API).
- Mechanism: Each row is identified by a row key. Column families contain related columns. For example, a "user" column family might have columns: name, email, age. Another row might omit age. Data is partitioned by row key hash. Queries are efficient when filtering by row key and column family.
- Key components: Keyspace (analogous to database), column family, row key, column qualifier (name), timestamp (for conflict resolution).
- Default values: Cassandra defaults to eventual consistency with tunable consistency levels (ONE, QUORUM, ALL). The maximum column value size is 2 GB.
- Configuration: Define keyspace, replication factor, and column families with data types.
- Use case: Time-series data, IoT sensor data, recommendation engines.
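The row key, column family, and sparse-column ideas above can be modelled with nested dictionaries. This is only a conceptual sketch: the `table`, `put`, and `read_column` names are hypothetical, and it says nothing about how Cassandra or HBase lay data out on disk.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))


def put(row_key, family, column, value):
    """Write one cell; rows are free to use different columns."""
    table[row_key][family][column] = value


def read_column(family, column):
    """Return {row_key: value} for every row that actually has the column."""
    return {
        row_key: families[family][column]
        for row_key, families in table.items()
        if column in families.get(family, {})
    }


put("user1", "user", "name", "Ana")
put("user1", "user", "email", "ana@example.com")
put("user1", "user", "age", 31)
put("user2", "user", "name", "Ben")
put("user2", "user", "email", "ben@example.com")  # this row omits age

print(read_column("user", "email"))
print(read_column("user", "age"))
```

The second read returns a value only for user1, because user2 never stored an age column; no NULL placeholder is kept for the missing cell.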

#### 4. Graph Databases

Graph databases store data as nodes (entities) and edges (relationships), each with properties. They are optimized for traversing relationships, such as social networks or fraud detection. Examples: Azure Cosmos DB (Gremlin API), Neo4j.
- Mechanism: Nodes and edges are stored with adjacency lists. Traversal queries start from a node and follow edges using pattern matching (e.g., "find friends of friends"). The database uses index-free adjacency to avoid expensive joins.
- Key components: Vertex (node), edge (relationship), property (key-value on vertex/edge), label (type of vertex/edge).
- Default values: Cosmos DB Gremlin API uses a default indexing policy that indexes all properties.
- Configuration: Define graph structure implicitly by inserting vertices and edges. No schema required.
- Use case: Social networks, recommendation engines, network topology.
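A "friends of friends" traversal over an adjacency list can be sketched as follows. The sample graph and the `friends_of_friends` function are hypothetical illustrations; a graph database would express the same traversal in a query language such as Gremlin.

```python
# Adjacency list: each user maps to the set of users they are friends with.
friends = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"Alice", "Dave"},
    "Carol": {"Alice", "Dave", "Erin"},
    "Dave": {"Bob", "Carol"},
    "Erin": {"Carol"},
}


def friends_of_friends(graph, start):
    """Return users exactly two hops from `start`, excluding direct friends."""
    direct = graph.get(start, set())
    two_hops = set()
    for friend in direct:
        # Following an edge is a direct lookup on the neighbour set, not a join.
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {start}


print(sorted(friends_of_friends(friends, "Alice")))  # ['Dave', 'Erin']
```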

Common Features Across Non-Relational Databases

Horizontal scaling: Data is partitioned across multiple servers using a partition key. Azure Cosmos DB automatically manages partitions and distributes throughput (RU/s).

Replication: Data is replicated across regions for high availability. Cosmos DB offers multi-region writes and automatic failover.

Consistency models: Most NoSQL databases offer tunable consistency levels, from strong (linearizable) to eventual (reads may return stale data). Cosmos DB provides five well-defined levels: Strong, Bounded Staleness, Session, Consistent Prefix, and Eventual.

Schema-on-read: The database does not enforce a schema on write; the application interprets the data on read. This allows storing heterogeneous data in the same container.
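Schema-on-read can be illustrated with two differently shaped JSON items stored side by side and interpreted only when the application reads them. The item contents, the `Shipment` class, and `read_shipment` are hypothetical; the point is that the reader, not the store, decides which fields matter and what to do when one is absent.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Two items in the same container with different shapes; nothing validated them on write.
RAW_ITEMS = [
    '{"id": "1", "type": "pallet", "weight_kg": 120.5, "destination": "Oslo"}',
    '{"id": "2", "type": "bin", "part_number": "SCR-8", "quantity": 500}',
]


@dataclass
class Shipment:
    id: str
    type: str
    weight_kg: Optional[float] = None
    quantity: Optional[int] = None


def read_shipment(raw: str) -> Shipment:
    """Apply the application's schema at read time, tolerating missing fields."""
    data = json.loads(raw)
    return Shipment(
        id=data["id"],
        type=data["type"],
        weight_kg=data.get("weight_kg"),
        quantity=data.get("quantity"),
    )


shipments = [read_shipment(raw) for raw in RAW_ITEMS]
for shipment in shipments:
    print(shipment)
```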

Comparison with Relational Databases

Schema: Relational requires predefined schema; NoSQL is schema-flexible.

ACID: Relational ensures full ACID; NoSQL often sacrifices strong consistency for availability (BASE: Basically Available, Soft state, Eventual consistency).

Joins: Relational supports complex joins; NoSQL discourages joins and denormalizes data.

Scaling: Relational scales vertically (bigger server); NoSQL scales horizontally (more servers).

Query language: Relational uses SQL; NoSQL uses APIs (REST, SDKs) with limited query capabilities.

How to Choose in Azure

When selecting an Azure data service, consider:
- Key-value: Azure Cosmos DB Table API or Azure Table Storage (for simpler needs). Use for high-speed lookups by key.
- Document: Azure Cosmos DB SQL API. Use for flexible schemas and complex queries.
- Column-family: Azure Cosmos DB Cassandra API. Use for high write throughput and time-series data.
- Graph: Azure Cosmos DB Gremlin API. Use for highly connected data.
- Blob: Azure Blob Storage for unstructured binary data (images, videos, backups). Though not a database per se, it is a non-relational store.

Common Misconfigurations

Choosing the wrong partition key: A low-cardinality partition key (e.g., status = 'active') leads to hot partitions and throttling.

Ignoring indexing: In Cosmos DB, the default policy indexes all properties, which raises the RU cost of every write. Exclude rarely queried paths from the indexing policy.

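Assuming ACID: Most NoSQL databases do not support multi-record transactions across partitions. For critical financial data, prefer a relational database, or use Cosmos DB with strong consistency and transactional batch, keeping in mind that a transactional batch is atomic only within a single logical partition.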

Exam-Relevant Details

Azure Cosmos DB: Supports multiple APIs (SQL, MongoDB, Cassandra, Gremlin, Table); the SQL API is now branded Azure Cosmos DB for NoSQL. The default consistency is Session. RU/s (Request Units per second) measures throughput. 1 RU = 1 point read (a lookup by ID and partition key) of a 1 KB item.

Azure Table Storage: A key-value store with a maximum table size of 500 TiB. An entity group transaction (batch) supports up to 100 entities, all of which must share the same PartitionKey.

Azure Blob Storage: Three types: Block blobs (for large objects, up to about 190.7 TiB with current service versions; older versions capped block blobs at about 4.75 TiB), Append blobs (for logging), Page blobs (for VHDs up to 8 TiB).

Azure Data Lake Storage Gen2: Built on Blob Storage with hierarchical namespace, suitable for big data analytics.

Verification Commands

Using Azure CLI:

# Create a Cosmos DB account
az cosmosdb create --name mycosmosdb --resource-group myrg --kind GlobalDocumentDB

# List containers
az cosmosdb sql container list --account-name mycosmosdb --database-name mydb --resource-group myrg

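# The Azure CLI has no command for running item queries against a container.
# Run a query such as the following from the portal's Data Explorer or an SDK:
#   SELECT * FROM c WHERE c.category = 'electronics'
# Show a container's settings (partition key, indexing policy)
az cosmosdb sql container show --account-name mycosmosdb --database-name mydb --name mycontainer --resource-group myrg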

For Azure Table Storage:

# Create storage account
az storage account create --name mystorage --resource-group myrg --kind StorageV2

# List tables
az storage table list --account-name mystorage

# Insert entity
az storage entity insert --table-name mytable --entity PartitionKey=pk1 RowKey=rk1 Name=value --account-name mystorage

Interaction with Related Technologies

Non-relational databases often work with:
- Azure Functions: Serverless compute that triggers on data changes (e.g., Cosmos DB change feed).
- Azure Stream Analytics: Process real-time streaming data into Cosmos DB.
- Azure AI Search (formerly Azure Search and Azure Cognitive Search): Index documents from Cosmos DB for full-text search.
- Power BI: Connect to Cosmos DB for analytics, but beware of high RU consumption.

Understanding these fundamentals will help you answer DP-900 questions on choosing the right data store, explaining NoSQL characteristics, and identifying Azure services.

Walk-Through

1. Identify Data Requirements

Begin by analyzing the data you need to store. Determine if the data is structured (fixed schema), semi-structured (varying fields), or unstructured (binary). For DP-900, you must recognize scenarios where non-relational is appropriate: high volume, low latency, flexible schema, or need for horizontal scaling. For example, a product catalog with variable attributes (some products have size, others have color) is a candidate for a document database. Avoid relational if you anticipate frequent schema changes or need to scale out easily.

2. Choose the Data Model Type

Based on access patterns, select one of the four NoSQL models. For simple key-based lookups (e.g., user sessions), choose key-value. For complex queries on nested data, choose document. For high-write time-series data, choose column-family. For relationship-heavy queries, choose graph. On the exam, you might be given a scenario and asked which Azure service to use. For instance, if the requirement is to store JSON documents with ad-hoc queries, the answer is Cosmos DB SQL API (document).

3. Select Azure Service

Map the data model to an Azure service. For key-value: Azure Cosmos DB Table API or Azure Table Storage. For document: Cosmos DB SQL API or MongoDB API. For column-family: Cosmos DB Cassandra API. For graph: Cosmos DB Gremlin API. For unstructured blobs: Azure Blob Storage. The exam may ask about differences: Table Storage is cheaper but less feature-rich than Cosmos DB Table API (e.g., no global distribution, lower throughput). Blob Storage is for large binaries, not for querying by content.
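The mapping in this step can be written down as a simple lookup table. The dictionary below only restates the pairings from the paragraph above; the `SERVICE_BY_MODEL` and `candidate_services` names are hypothetical.

```python
from typing import List

# Data model -> candidate Azure services, as listed in this step.
SERVICE_BY_MODEL = {
    "key-value": ["Azure Cosmos DB Table API", "Azure Table Storage"],
    "document": ["Azure Cosmos DB SQL API", "Azure Cosmos DB MongoDB API"],
    "column-family": ["Azure Cosmos DB Cassandra API"],
    "graph": ["Azure Cosmos DB Gremlin API"],
    "blob": ["Azure Blob Storage"],
}


def candidate_services(model: str) -> List[str]:
    """Return the services this chapter associates with a data model."""
    try:
        return SERVICE_BY_MODEL[model.lower()]
    except KeyError:
        raise ValueError(f"unknown data model: {model!r}") from None


print(candidate_services("Graph"))
print(candidate_services("key-value"))
```

When a model has two candidates, the tie-breakers are the ones the exam tends to probe: global distribution and throughput point to Cosmos DB, while low cost and simple needs point to Table Storage.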

4. Define Partitioning and Indexing

Configure the partition key to distribute data evenly. For Cosmos DB, choose a high-cardinality attribute (e.g., userId, deviceId). Avoid partition keys with few values (e.g., status, region). Set indexing policies: in Cosmos DB, default indexes all properties, which may slow writes. For read-heavy workloads, keep default; for write-heavy, exclude some paths. On the exam, remember that a bad partition key causes hot partitions and throttling (429 errors).
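The effect of key cardinality on load distribution can be simulated locally. This is a rough sketch with made-up data: the eight partitions, the CRC32 hash, and the sample orders are assumptions for illustration and do not reflect how Cosmos DB actually hashes or places partitions.

```python
import zlib
from collections import Counter


def partition_of(value, num_partitions):
    """Assign a partition key value to a partition with a stable hash."""
    return zlib.crc32(str(value).encode("utf-8")) % num_partitions


def load_by_partition(items, key_field, num_partitions=8):
    """Count how many items land on each partition for a given key field."""
    counts = Counter(partition_of(item[key_field], num_partitions) for item in items)
    return [counts.get(p, 0) for p in range(num_partitions)]


# 1,000 hypothetical orders: every user is unique, but 90% share one status.
orders = [
    {"userId": f"user-{i}", "status": "active" if i % 10 else "closed"}
    for i in range(1000)
]

by_status = load_by_partition(orders, "status")
by_user = load_by_partition(orders, "userId")

print("partitioned by status:", by_status)
print("partitioned by userId:", by_user)
```

With only two distinct status values, at most two partitions receive any data and one of them holds at least 900 items, which is the hot-partition pattern that leads to 429 throttling. Partitioning by userId spreads the same items across the partitions.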

5. Configure Consistency and Durability

Set the consistency level based on your application's needs. Cosmos DB offers five levels. Strong consistency ensures linearizability but has higher latency and lower availability. Eventual consistency offers the lowest latency but may return stale data. The default is Session, which guarantees read-your-writes and monotonic reads and writes within a single client session. Also configure replication: single-region or multi-region writes. With multi-region writes you must use one of the four weaker levels, and the exam may test that Strong consistency cannot be combined with multi-region writes.
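The rule about Strong consistency and multi-region writes can be captured as a small validation check. This sketch models the rule only; the `Consistency` enum and `validate_account` function are hypothetical and are not part of any Azure SDK.

```python
from enum import Enum


class Consistency(Enum):
    STRONG = "Strong"
    BOUNDED_STALENESS = "Bounded Staleness"
    SESSION = "Session"
    CONSISTENT_PREFIX = "Consistent Prefix"
    EVENTUAL = "Eventual"


DEFAULT_CONSISTENCY = Consistency.SESSION  # the Cosmos DB default


def validate_account(consistency: Consistency, multi_region_writes: bool) -> None:
    """Raise ValueError for the one combination the chapter rules out."""
    if multi_region_writes and consistency is Consistency.STRONG:
        raise ValueError(
            "Strong consistency cannot be combined with multi-region writes"
        )


validate_account(DEFAULT_CONSISTENCY, multi_region_writes=True)  # accepted
try:
    validate_account(Consistency.STRONG, multi_region_writes=True)
except ValueError as error:
    print("rejected:", error)
```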

What This Looks Like on the Job

Scenario 1: E-Commerce Product Catalog

A large online retailer needs to store product data where each product has different attributes (e.g., electronics have voltage, clothing has size). They use Azure Cosmos DB SQL API to store JSON documents. Each document contains a product ID, name, category, and a dynamic attributes object. The initial partition key was category (e.g., 'Electronics', 'Clothing'), which is a bad choice because some categories are huge and become hot partitions: during a flash sale, the 'Electronics' partition was throttled because all traffic hit it. In production, they learned to use a composite partition key like category_id, which adds a distribution factor (such as a hashed productId) to spread load. They configured indexing to exclude the attributes subfields that are rarely queried, reducing RU consumption. Performance: 50,000 point reads/sec with <10 ms latency; at 1 RU per 1 KB point read, that requires at least 50,000 RU/s of provisioned throughput.
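A back-of-the-envelope throughput check is useful for scenarios like this one. The sketch below uses the chapter's rule of 1 RU per point read of a 1 KB item and assumes, as a simplification, that read cost grows linearly with item size; real charges depend on the operation type and indexing, so treat the numbers as rough estimates. The function names are hypothetical.

```python
import math

READ_RU_PER_KB = 1  # chapter's rule of thumb: 1 RU per point read of a 1 KB item


def required_ru_per_second(reads_per_second: int, item_size_kb: float = 1.0) -> int:
    """Estimate the RU/s needed to sustain a given point-read rate."""
    ru_per_read = READ_RU_PER_KB * max(1.0, item_size_kb)  # simplifying assumption
    return math.ceil(reads_per_second * ru_per_read)


def max_point_reads_per_second(provisioned_ru: int, item_size_kb: float = 1.0) -> int:
    """Estimate the point-read rate a given RU/s budget can sustain."""
    ru_per_read = READ_RU_PER_KB * max(1.0, item_size_kb)
    return int(provisioned_ru // ru_per_read)


print(required_ru_per_second(50_000))      # 50000
print(max_point_reads_per_second(10_000))  # 10000
```

Under these assumptions, 10,000 RU/s sustains roughly 10,000 point reads per second of 1 KB items, so a 50,000 reads/sec workload needs about five times that budget.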

Scenario 2: IoT Sensor Data

A manufacturing company collects temperature and vibration data from thousands of sensors every second. They use Azure Cosmos DB Cassandra API (column-family) to store time-series data. Each row key is a combination of sensor ID and timestamp (e.g., sensor123_20250321120000). Column families include readings with columns: temperature, vibration, humidity. They use a TTL (time-to-live) of 30 days to automatically expire old data. They configured a replication factor of 3 across two regions for disaster recovery. Performance: 100,000 writes/sec with eventual consistency. Misconfiguration: they initially used the sensor ID alone as the partition key, which caused hot partitions for the most active sensors. They fixed this by adding a time-based bucket to the partition key (e.g., sensor ID + hour).
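The time-bucketed key from this scenario could be built as follows. This is a sketch under assumptions: the `bucketed_partition_key` name, the underscore separator, and the UTC hour format are illustrative choices, not something the scenario specifies.

```python
from datetime import datetime, timezone


def bucketed_partition_key(sensor_id: str, reading_time: datetime) -> str:
    """Combine the sensor ID with the UTC hour of the reading."""
    hour_bucket = reading_time.astimezone(timezone.utc).strftime("%Y%m%d%H")
    return f"{sensor_id}_{hour_bucket}"


noon = datetime(2025, 3, 21, 12, 0, 0, tzinfo=timezone.utc)
late = datetime(2025, 3, 21, 12, 59, 59, tzinfo=timezone.utc)
next_hour = datetime(2025, 3, 21, 13, 0, 0, tzinfo=timezone.utc)

print(bucketed_partition_key("sensor123", noon))       # sensor123_2025032112
print(bucketed_partition_key("sensor123", next_hour))  # sensor123_2025032113
```

Readings from the same sensor and hour share a partition key, so a query for one sensor's recent data touches few partitions, while a busy sensor's writes roll over to a new key every hour instead of piling onto one.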

Scenario 3: Social Network Graph

A social media startup wants to recommend friends based on mutual connections. They use Azure Cosmos DB Gremlin API. Nodes represent users with properties like name and location. Edges represent friendships. Queries traverse the graph: "find friends of friends of user X". They use the Gremlin traversal: g.V().has('name','Alice').repeat(out('friends').simplePath()).times(2).dedup(). The database automatically indexes properties. Performance: 5,000 queries/sec with 20 ms latency. Issue: they initially stored all data in a single partition (no partition key), causing throttling. They added a partition key to distribute data. A coarse value such as region (e.g., 'US', 'EU') has too few distinct values to spread load evenly, so a high-cardinality property such as user ID is the safer choice. They also learned to limit traversal fan-out on high-degree nodes (celebrities), for example with vertex-centric indexes on graph engines that support them.

How DP-900 Actually Tests This

What DP-900 Tests

Objective 1.3 covers non-relational data concepts. The exam expects you to:

Distinguish between relational and non-relational data.

Identify the four NoSQL categories (key-value, document, column-family, graph) and their Azure implementations.

Understand when to use each type based on scenarios.

Know the characteristics of Azure Cosmos DB, Table Storage, and Blob Storage.

Recognize the term 'schema-less' or 'flexible schema' as a property of non-relational databases.

Common Wrong Answers

1. "Non-relational databases support ACID transactions" – False as a general statement. Most NoSQL databases follow the BASE model (Basically Available, Soft state, Eventual consistency). Some, like Cosmos DB, support ACID transactions scoped to a single logical partition. The exam expects you to know that NoSQL typically trades strong consistency for scalability and availability.

2. "All NoSQL databases use SQL for queries" – False. Each has its own API (e.g., Cassandra Query Language (CQL), Gremlin, MongoDB query language). Cosmos DB SQL API uses SQL-like syntax, but it's not standard SQL.

3. "Azure Blob Storage is a NoSQL database" – False. Blob Storage is an object store, not a database. It stores unstructured binary data but does not support querying by content like a database. The exam may include it under 'non-relational data stores' but not as a NoSQL database.

4. "Column-family stores are the same as relational tables" – False. In column-family, each row can have different columns, and columns are grouped into families. It's more flexible than relational.

Specific Numbers and Terms

Cosmos DB default consistency: Session.

Maximum item size in Cosmos DB: 2 MB.

Table Storage maximum table size: 500 TB.

Table Storage maximum entity size: 1 MB.

Blob Storage block blob max size: about 190.7 TiB with current service versions (older service versions capped block blobs at about 4.75 TiB).

Blob Storage page blob max size: 8 TB.

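RU (Request Unit): 1 RU = 1 point read of a 1 KB item. Writes cost more: a 1 KB write consumes several RUs (figures of roughly 5 to 10 RU are commonly quoted), and the exact charge depends on item size and the indexing policy.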

Partition key: Must be immutable; choose high cardinality.

Edge Cases

Multi-region writes: Cannot use Strong consistency; must use Eventual, Consistent Prefix, Session, or Bounded Staleness.

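Change feed: Cosmos DB provides a change feed for real-time processing. It is most commonly used with the SQL API (for example, through the Azure Functions trigger) and is also exposed through other APIs, including MongoDB (as change streams), Cassandra, and Gremlin.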

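TTL (Time-to-Live): Can be set at item level or container level; expired items stop appearing in query results and are deleted automatically by a background process, so physical removal is not guaranteed within a fixed number of seconds.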

Indexing: Default indexes all properties; you can set custom indexing policy to include/exclude paths.

How to Eliminate Wrong Answers

If a scenario requires complex joins or transactions, eliminate NoSQL.

If the data is highly structured with fixed schema, prefer relational.

If the scenario mentions 'flexible schema', 'horizontal scaling', or 'high throughput', lean towards NoSQL.

For Azure services: if the question says 'globally distributed, multi-model', it's Cosmos DB. If 'simple, cheap key-value', it's Table Storage. If 'binary files', it's Blob Storage.

Key Takeaways

Non-relational (NoSQL) databases include key-value, document, column-family, and graph types.

Azure Cosmos DB is a multi-model NoSQL database that supports multiple APIs and global distribution.

Azure Table Storage is a simple key-value store, cheaper but less feature-rich than Cosmos DB Table API.

Azure Blob Storage stores unstructured binary data (block, append, page blobs).

NoSQL databases typically sacrifice ACID for scalability and performance (BASE model).

Choose a high-cardinality partition key to avoid hot partitions and throttling.

Cosmos DB default consistency level is Session; strong consistency cannot be used with multi-region writes.

The maximum item size in Cosmos DB is 2 MB; in Table Storage, it is 1 MB.

Column-family stores group columns into families and are optimized for write-heavy time-series data.

Graph databases are best for highly connected data with complex relationships.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Relational Database (SQL)

Fixed schema: tables with predefined columns and data types.

Supports ACID transactions (Atomicity, Consistency, Isolation, Durability).

Uses SQL for queries, supporting complex joins and aggregations.

Scales vertically (more powerful server) primarily.

Best for structured data with consistent relationships (e.g., financial records).

Non-Relational Database (NoSQL)

Flexible schema: each record can have different fields.

Typically BASE (Basically Available, Soft state, Eventual consistency).

Uses API-specific query languages (e.g., SQL-like for Cosmos DB, CQL for Cassandra).

Scales horizontally (more servers) easily.

Best for semi-structured or unstructured data, high throughput, and rapid scaling (e.g., IoT, social media).

Azure Cosmos DB

Multi-model: supports SQL, MongoDB, Cassandra, Gremlin, Table APIs.

Global distribution with multi-region writes and multiple consistency levels.

Auto-indexing, change feed, and low latency (<10 ms reads).

Higher cost per RU, but offers unlimited throughput.

Maximum item size: 2 MB.

Azure Table Storage

Key-value store only (Table API).

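Single write region (geo-redundant replication can add a read-only secondary, but there is no multi-region write or turnkey global distribution).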

No automatic indexing; only indexed on PartitionKey and RowKey.

Lower cost, suitable for simple scenarios.

Maximum entity size: 1 MB; maximum table size: 500 TB.

Watch Out for These

Mistake

NoSQL databases cannot handle transactions at all.

Correct

Some NoSQL databases, like Azure Cosmos DB, support multi-record transactions with ACID guarantees within a single logical partition (using stored procedures or transactional batch). However, cross-partition transactions are not supported in most NoSQL systems. The exam tests that NoSQL generally sacrifices ACID for scalability, while recognizing that Cosmos DB can provide ACID transactions scoped to a single partition.

Mistake

All NoSQL databases are schema-less, meaning no structure at all.

Correct

Schema-less means the database does not enforce a schema on write. However, data still has an implicit structure defined by the application. For example, a document database may store JSON with nested fields; the application expects certain fields. The term 'flexible schema' is more accurate. The exam uses 'schema-less' to contrast with relational's rigid schema.

Mistake

Azure Table Storage and Cosmos DB Table API are identical.

Correct

They share the same API but differ in features. Cosmos DB Table API offers global distribution, multi-region writes, automatic indexing of all properties, and five consistency levels, with higher throughput and lower latency. Table Storage is cheaper but writes to a single primary region, indexes only PartitionKey and RowKey, has lower throughput targets (up to 20,000 entities per second per account for 1 KiB entities), and offers no tunable consistency: reads are strongly consistent in the primary region and eventually consistent from a geo-redundant secondary. The exam may ask you to choose between them based on requirements like global distribution.

Mistake

Column-family databases store data in columns like relational tables.

Correct

In column-family stores, data is stored by column family, not by row. Each row can have different columns, and columns within a family are stored contiguously on disk. This allows efficient reads of specific columns across many rows. Relational tables store all columns of a row together. The exam expects you to know that column-family is optimized for write-heavy workloads and time-series data.

Mistake

Graph databases are only for social networks.

Correct

Graph databases are ideal for any domain with complex relationships, such as fraud detection (finding unusual connections), recommendation engines (product affinity), network management (topology), and knowledge graphs. The exam may present a scenario like 'detecting fraudulent transactions' and expect you to choose a graph database.


Frequently Asked Questions

What is the difference between relational and non-relational databases?

Relational databases store data in tables with fixed schemas and support ACID transactions, using SQL for queries. Non-relational databases (NoSQL) offer flexible schemas, horizontal scalability, and are often BASE (eventually consistent). They are categorized into key-value, document, column-family, and graph types. For DP-900, remember that NoSQL is chosen for high throughput, flexible data, and global distribution, while relational is for structured data requiring complex joins and strong consistency.

When should I use Azure Cosmos DB vs Azure Table Storage?

Use Azure Cosmos DB when you need global distribution, multi-region writes, multiple consistency levels, automatic indexing, or support for multiple data models (document, graph, etc.). Use Azure Table Storage for simple, cost-effective key-value storage within a single region, with lower throughput requirements. Both use the same Table API, but Cosmos DB offers higher performance and features at a higher cost.

What is a partition key and why is it important?

A partition key is a property that determines how data is distributed across physical partitions. It is critical for performance: a good partition key has high cardinality (many unique values) to spread load evenly. A bad partition key (e.g., with few values) causes hot partitions, leading to throttling (HTTP 429 errors). In Cosmos DB, the partition key is immutable and must be specified at container creation.

What are the consistency levels in Azure Cosmos DB?

Cosmos DB offers five consistency levels: Strong (linearizable), Bounded Staleness (reads may lag writes by a configurable staleness window), Session (guarantees monotonic reads/writes for a single session), Consistent Prefix (reads never see out-of-order writes), and Eventual (no ordering guarantees). The default is Session. Strong consistency cannot be used with multi-region writes.

Is Azure Blob Storage considered a NoSQL database?

No, Azure Blob Storage is an object storage service for unstructured binary data (images, videos, backups). It is not a database because it does not support querying by content or indexing beyond metadata. However, it is a non-relational data store. For DP-900, you should know that Blob Storage is for large binaries, while Cosmos DB and Table Storage are for structured/semi-structured data.

What is the maximum size of an item in Cosmos DB and Table Storage?

In Azure Cosmos DB, the maximum item (document or entity) size is 2 MB. In Azure Table Storage, the maximum entity size is 1 MB. This is a common exam detail.

What is the difference between document and column-family databases?

Document databases store data as self-describing documents (JSON, BSON) with nested structures, ideal for complex objects. Column-family databases store data in rows with column families; each row can have different columns, and data is stored contiguously by column family, optimizing for queries that read specific columns across many rows (e.g., time-series data).
