Non-Relational DB Types: Document, Key-Value, Graph, Column

This chapter covers the four main types of non-relational (NoSQL) databases: document, key-value, graph, and column-family stores. Understanding these is critical for the DP-900 exam, as roughly 20–25% of questions on Core Data Concepts (Objective 1.3) test your ability to differentiate these models and match them to appropriate use cases. You will learn the internal mechanics, Azure service offerings, and common exam traps for each type.

Updated May 31, 2026

Warehouse vs Filing Cabinet vs Mind Map vs Ledger

Imagine you are organizing information for a large company.

A document database is like a warehouse where each item is a self-contained box with a label (the document ID). Inside is a collection of papers (fields) that can vary from box to box. You can search for any box by its contents, such as finding all boxes that mention 'urgent'.

A key-value store is like a massive bank of filing cabinets with only two things: a drawer label (key) and a single folder (value). You can only open a drawer by its exact label; you cannot search inside folders.

A graph database is like a giant mind map on a whiteboard where each idea (node) is connected to others by lines (edges) that have meanings like 'reports to' or 'is related to'. To find a path from one idea to another, you follow the lines.

A column-family store is like an accounting ledger where data is grouped into column families (e.g., 'Personal Info', 'Order History'), and within each family, rows can have different columns. You can efficiently read all columns for a given row within a family, but joining across families is expensive.

Each model is optimized for a different access pattern: documents for flexible schemas, key-value for high-speed lookups by ID, graphs for traversing relationships, and column families for high-volume writes and reads of wide rows by key.

How It Actually Works

1. Overview of Non-Relational Databases

Non-relational databases (often called NoSQL) are designed to handle data that does not fit neatly into tables with fixed schemas and relationships. Many of them relax some ACID guarantees in exchange for scalability, performance, and flexibility, although the exact trade-off varies by product and configuration. The DP-900 exam expects you to know the four main types: document, key-value, graph, and column-family. Each has distinct characteristics, query patterns, and typical use cases.

2. Document Databases

What it is: A document database stores data as documents, typically in JSON, BSON, or XML format. Each document is a self-contained unit that contains all the data for an entity. Documents in the same collection can have different fields and structures (schema flexibility).

How it works internally: The database indexes fields within documents so that queries can filter on any field. For example, in Azure Cosmos DB (which supports the document model through its SQL API, now named API for NoSQL), documents are stored in a container (called a collection in older material), and the database indexes all properties by default. When you query, it uses the index to locate matching documents instead of scanning every document. Each document is identified by an ID that is unique within its logical partition. Partitioning is based on a partition key, which distributes documents across physical partitions.

Key components:
- Collection (container): Holds documents, much as a table holds rows in a relational database.
- Document: A JSON object with fields and values.
- ID: Unique identifier for the document.
- Partition key: A property used to distribute data across partitions.

Configuration and verification: In Azure Cosmos DB, create a database and a container with a partition key. Use the Data Explorer to insert and query documents.

Example:

{
  "id": "1",
  "name": "John Doe",
  "address": {
    "street": "123 Main St",
    "city": "Seattle"
  },
  "orders": [
    { "orderId": "A100", "total": 50.00 }
  ]
}

Interaction with related technologies: Document databases often support SQL-like query languages (e.g., Cosmos DB SQL API). They can be used with change feed for real-time processing.
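
To make the indexing idea above concrete, here is a minimal sketch in Python of a document collection that indexes fields so a query does not have to scan every document. It is a toy model, not how Cosmos DB is implemented: the class name DocumentCollection, its methods, and the choice to index only top-level scalar fields are all assumptions made for illustration.

```python
from collections import defaultdict


class DocumentCollection:
    """Toy in-memory collection that indexes every top-level scalar field."""

    def __init__(self):
        self.docs = {}                  # document id -> document (a dict)
        self.index = defaultdict(set)   # (field, value) -> ids of matching documents

    def insert(self, doc):
        self.docs[doc["id"]] = doc
        for field, value in doc.items():
            # Simplification: only top-level scalars are indexed here;
            # nested objects and arrays are stored but not indexed.
            if isinstance(value, (str, int, float, bool)):
                self.index[(field, value)].add(doc["id"])

    def find(self, field, value):
        # Look up ids through the index instead of scanning every document.
        ids = self.index.get((field, value), set())
        return [self.docs[doc_id] for doc_id in sorted(ids)]


customers = DocumentCollection()
customers.insert({"id": "1", "name": "John Doe", "address": {"city": "Seattle"}})
# The second document has a field the first one lacks: schema flexibility.
customers.insert({"id": "2", "name": "Jane Roe", "loyaltyTier": "gold"})

gold_ids = [doc["id"] for doc in customers.find("loyaltyTier", "gold")]
print(gold_ids)
```

The two inserted documents do not share the same fields, yet both live in one collection and both are reachable through the index.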

3. Key-Value Stores

What it is: A key-value store is the simplest NoSQL model. Data is stored as an associative array (dictionary) where each key is unique and maps to a value. The value can be a simple string, a complex object (like JSON), or a binary blob.

How it works internally: The database typically maps keys to storage locations with a hash table, which gives O(1) average-time lookups; stores that keep keys in a sorted structure such as a B-tree pay O(log n) per lookup but can scan key ranges. The value is opaque to a pure key-value store, so it cannot be filtered or joined on; you must know the key to retrieve the value. In Azure, Azure Cache for Redis is a popular key-value store. It keeps data in memory for low latency. Azure Cosmos DB also offers a Table API that follows the key-value model (similar to Azure Table Storage).

Key components:
- Key: Unique identifier. In the table model the key is the PartitionKey plus RowKey pair; Azure Table Storage limits each of these strings to 1 KiB.
- Value: Data associated with the key (max size 1 MB per entity in Azure Table Storage).
- Partition key: Determines which partition holds the item, for scalability.

Configuration and verification: In Azure Cache for Redis, use SET key value and GET key. In Cosmos DB Table API, use SDKs to insert entities with PartitionKey and RowKey.

Example:

SET user:1000 '{"name":"Alice","email":"alice@example.com"}'
GET user:1000

Interaction with related technologies: Key-value stores are often used for caching, session management, and real-time data. They integrate with web apps via Redis clients.
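
The lookup path described above can be sketched in a few lines of Python. This is an illustrative in-memory stand-in, not Redis or Table Storage: the class KeyValueStore, the partition count of 4, and the use of SHA-256 to pick a partition are assumptions chosen for the example.

```python
import hashlib


class KeyValueStore:
    """Toy key-value store: hash the key to pick a partition, then do a dict lookup."""

    def __init__(self, partition_count=4):
        self.partitions = [dict() for _ in range(partition_count)]

    def _partition_for(self, key):
        # Hash the key so that keys spread across partitions deterministically.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.partitions)

    def set(self, key, value):
        self.partitions[self._partition_for(key)][key] = value

    def get(self, key):
        # The store never inspects the value; it only matches on the exact key.
        return self.partitions[self._partition_for(key)].get(key)


store = KeyValueStore()
store.set("user:1000", '{"name":"Alice","email":"alice@example.com"}')
profile = store.get("user:1000")
missing = store.get("user:9999")
```

Note that there is no method for "find every user named Alice": answering that would mean reading and parsing every value, which is exactly the access pattern this model is not built for.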

4. Graph Databases

What it is: A graph database stores data as nodes (entities) and edges (relationships). Both nodes and edges can have properties. The focus is on traversing relationships efficiently.

How it works internally: The database uses adjacency lists or index-free adjacency to store connections. Each node keeps references to its adjacent nodes, so the cost of a traversal depends on how much of the graph it visits rather than on the total size of the graph. Queries use graph query languages such as Gremlin or SPARQL. In Azure, Cosmos DB offers a Gremlin API for graph data.

Key components:
- Node (vertex): Represents an entity (e.g., person, product).
- Edge (relationship): Connects two nodes with a direction and label (e.g., "knows", "purchased").
- Properties: Key-value pairs on nodes and edges.
- Label/Type: Categorizes nodes or edges (e.g., "Person", "Product").

Configuration and verification: In Cosmos DB, create a graph container and use Gremlin queries:

g.addV('person').property('name', 'Alice')
g.addV('person').property('name', 'Bob')
g.V().has('name','Alice').addE('knows').to(g.V().has('name','Bob'))
g.V().has('name','Alice').out('knows').values('name')

Interaction with related technologies: Graph databases are used for social networks, recommendation engines, and fraud detection. They can be integrated with analytics tools.
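
As a rough illustration of the adjacency-list idea, the following Python sketch mirrors the four Gremlin statements above with plain dictionaries. The Graph class and its method names are hypothetical; a real graph database adds indexing, persistence, and partitioning on top of this shape.

```python
class Graph:
    """Toy property graph stored as adjacency lists."""

    def __init__(self):
        self.vertices = {}    # vertex id -> {"label": str, "properties": dict}
        self.out_edges = {}   # vertex id -> list of (edge label, target vertex id)

    def add_vertex(self, vertex_id, label, **properties):
        self.vertices[vertex_id] = {"label": label, "properties": properties}
        self.out_edges.setdefault(vertex_id, [])

    def add_edge(self, source_id, label, target_id):
        # Edges are directed: they are recorded on the source vertex only.
        self.out_edges[source_id].append((label, target_id))

    def out(self, vertex_id, label):
        # Follow outgoing edges with the given label, like Gremlin's out('knows').
        return [target for (edge_label, target) in self.out_edges.get(vertex_id, [])
                if edge_label == label]


graph = Graph()
graph.add_vertex("alice", "person", name="Alice")
graph.add_vertex("bob", "person", name="Bob")
graph.add_edge("alice", "knows", "bob")

known_names = [graph.vertices[v]["properties"]["name"] for v in graph.out("alice", "knows")]
print(known_names)  # ['Bob']
```

The traversal touches only Alice's own edge list, which is why its cost does not grow with the number of unrelated vertices in the graph.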

5. Column-Family Stores

What it is: A column-family store (also called a wide-column store) organizes data into rows identified by a key, with each row's columns grouped into column families. Rows in the same family can have different columns. It is optimized for high write throughput and for reading the columns of a row, or a range of rows within one partition, by key.

How it works internally: Despite the name, a column-family store such as Apache Cassandra is not a columnar analytics engine; it keeps the cells of a row together and partitions rows by key. In Apache Cassandra (whose query language and wire protocol the Azure Cosmos DB Cassandra API is compatible with), data is partitioned by partition key and sorted within a partition by clustering keys. Writes are appended to a commit log and buffered in memtables, which are later flushed to immutable SSTables on disk.

Key components:
- Column family (table): Container for rows.
- Row key (partition key): Determines which partition holds the row.
- Column: Has a name, value, and timestamp. Columns can be added dynamically.
- Clustering key: Sorts rows within a partition.

Configuration and verification: In Cosmos DB Cassandra API, use CQL:

CREATE TABLE users (user_id UUID PRIMARY KEY, name text, email text);
INSERT INTO users (user_id, name, email) VALUES (uuid(), 'Alice', 'alice@example.com');
SELECT * FROM users;

Interaction with related technologies: Column-family stores are used for time-series data, IoT, and large-scale analytics. They integrate with Spark and Hadoop.
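
The partition-key and clustering-key layout can be sketched as a dictionary of sorted row lists. This is a simplified model written for this chapter, not Cassandra's storage engine: the class WideRowTable, the sensor data, and the timestamp strings are all made up for illustration.

```python
import bisect


class WideRowTable:
    """Toy wide-column table: partition key -> rows kept sorted by clustering key."""

    def __init__(self):
        self.partitions = {}  # partition key -> list of (clustering key, columns dict)

    def upsert(self, partition_key, clustering_key, **columns):
        rows = self.partitions.setdefault(partition_key, [])
        keys = [existing_key for existing_key, _ in rows]
        position = bisect.bisect_left(keys, clustering_key)
        if position < len(rows) and rows[position][0] == clustering_key:
            rows[position][1].update(columns)          # same row: merge columns
        else:
            rows.insert(position, (clustering_key, dict(columns)))  # keep sort order

    def read_partition(self, partition_key):
        # One partition is read in clustering-key order without touching the others.
        return list(self.partitions.get(partition_key, []))


readings = WideRowTable()
readings.upsert("sensor-1", "2024-01-01T00:10", temperature=21.5)
readings.upsert("sensor-1", "2024-01-01T00:05", temperature=21.0, humidity=40)
readings.upsert("sensor-2", "2024-01-01T00:05", temperature=19.0)

rows = readings.read_partition("sensor-1")
```

The rows come back in timestamp order even though they were written out of order, and the second row simply has no humidity column, which is the "different columns per row" property in practice.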

6. Comparison Summary

Document: Best for content management, catalogs, and applications with evolving schemas. Azure Cosmos DB (SQL API) is the primary offering.

Key-Value: Best for caching, session state, and high-speed lookups. Azure Cache for Redis and the Cosmos DB Table API are the usual offerings.

Graph: Best for interconnected data like social networks, recommendation engines. Cosmos DB Gremlin API.

Column-Family: Best for time-series, IoT, and analytical workloads. Cosmos DB Cassandra API.

7. Azure Services for Non-Relational Databases

Azure Cosmos DB: Multi-model database supporting document (SQL API, now named API for NoSQL), key-value (Table API), graph (Gremlin API), and column-family (Cassandra API) models. It offers global distribution, multiple consistency levels, and SLA-backed throughput.

Azure Cache for Redis (called Azure Redis Cache in older material): In-memory key-value store for caching.

Azure Table Storage: Part of Azure Storage, a key-value store for structured data.

8. Exam-Relevant Details

Cosmos DB is the primary Azure service for non-relational databases.

Each API corresponds to a different data model.

Partition key design is critical for performance.

Consistency levels affect read performance and staleness.

Throughput is provisioned in Request Units per second (RU/s).

Walk-Through

1. Identify Data Access Patterns

Start by determining how the application will access data. Will it need point lookups by ID (key-value)? Or complex queries with filters on multiple fields (document)? Or traversals of relationships (graph)? Or analytical aggregation over many rows (column-family)? This step determines which non-relational model fits best. For example, a user profile service that needs fast retrieval by user ID is a key-value use case. A product catalog with varying attributes is a document use case.

2. Choose Azure Service and API

Based on the model, select the appropriate Azure service. For document, use Cosmos DB SQL API. For key-value, consider Azure Cache for Redis for caching, or Cosmos DB Table API or Azure Table Storage for persistent storage. For graph, use Cosmos DB Gremlin API. For column-family, use Cosmos DB Cassandra API. Each API has its own SDKs and query languages. The exam tests your ability to match use cases to these APIs.
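
The first two steps amount to a lookup from access pattern to model to service. The sketch below encodes the pairings described in this chapter as a small Python table; the pattern strings, the ModelChoice type, and the choose_model function are hypothetical names used only to make the mapping explicit.

```python
from typing import NamedTuple


class ModelChoice(NamedTuple):
    model: str
    azure_service: str
    query_style: str


# Pairings taken from this chapter; the dictionary keys are illustrative labels.
MODEL_GUIDE = {
    "point lookup by key": ModelChoice(
        "key-value", "Azure Cache for Redis or Cosmos DB Table API", "SET/GET or SDK calls"),
    "filter on many fields": ModelChoice(
        "document", "Cosmos DB SQL API", "SQL-like queries"),
    "traverse relationships": ModelChoice(
        "graph", "Cosmos DB Gremlin API", "Gremlin traversals"),
    "wide rows by key": ModelChoice(
        "column-family", "Cosmos DB Cassandra API", "CQL"),
}


def choose_model(access_pattern):
    """Return the chapter's suggested model for a known access pattern."""
    if access_pattern not in MODEL_GUIDE:
        raise ValueError(f"No guidance for access pattern: {access_pattern!r}")
    return MODEL_GUIDE[access_pattern]


choice = choose_model("traverse relationships")
print(choice.model, "->", choice.azure_service)
```

Real workloads rarely fit one label cleanly, so treat a table like this as a memory aid for the exam rather than a design tool.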

3. Design Partition Key

Partition key determines how data is distributed across physical partitions. A good partition key should have high cardinality and evenly distribute requests. For Cosmos DB, choose a property that appears in most queries and has many distinct values. For example, a user ID is often a good partition key. Bad choices include status fields with few values (e.g., 'active'/'inactive') which cause hot partitions. The exam often includes questions about partition key selection.
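
The effect of cardinality is easy to see in a small simulation. The Python sketch below hashes 10,000 items into 10 partitions twice, once by a high-cardinality user ID and once by a two-valued status field. The hash function, partition count, and data are assumptions for illustration; Cosmos DB uses its own internal hashing.

```python
import hashlib
from collections import Counter


def partition_of(key_value, partition_count):
    """Map a partition key value to a partition number with a stable hash."""
    digest = hashlib.sha256(str(key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def distribution(key_values, partition_count=10):
    """Count how many items land in each partition."""
    counts = Counter(partition_of(value, partition_count) for value in key_values)
    return [counts.get(p, 0) for p in range(partition_count)]


user_ids = [f"user-{i}" for i in range(10_000)]                              # 10,000 distinct values
statuses = ["active" if i % 5 else "inactive" for i in range(10_000)]        # only 2 distinct values

by_user = distribution(user_ids)
by_status = distribution(statuses)

print("by userId:", by_user)
print("by status:", by_status)
```

With only two distinct key values, at most two partitions ever receive data, and the one holding 'active' takes 80% of it. That concentration is the hot partition the exam questions describe.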

4. Configure Throughput and Consistency

In Cosmos DB, you provision throughput in RU/s (Request Units per second). Choose either dedicated throughput per container or shared throughput across a database. Consistency level affects read performance and data staleness. The default is Session consistency. Strong consistency offers the highest guarantee but lower performance. The exam may ask about the trade-offs between consistency levels.
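
A back-of-the-envelope throughput estimate can be written as simple arithmetic. In the Python sketch below, the 1 RU cost for a point read of a 1 KB item matches Cosmos DB's definition of a Request Unit, but the write cost of 5 RU, the 20% headroom, and the function name required_rus are assumptions for illustration; actual charges depend on item size, indexing, and query shape and should be measured.

```python
import math


def required_rus(reads_per_sec, writes_per_sec,
                 ru_per_read=1, ru_per_write=5, headroom_percent=20):
    """Estimate RU/s to provision, rounded up to a multiple of 100 with a floor of 400."""
    raw = reads_per_sec * ru_per_read + writes_per_sec * ru_per_write
    # Add headroom, then round up to the next 100 RU/s increment.
    hundreds = math.ceil(raw * (100 + headroom_percent) / 10_000)
    return max(400, hundreds * 100)


# 500 point reads/s and 100 writes/s: 500*1 + 100*5 = 1000 RU/s raw, 1200 RU/s with headroom.
peak_estimate = required_rus(reads_per_sec=500, writes_per_sec=100)

# A very light workload still lands on the 400 RU/s floor.
light_estimate = required_rus(reads_per_sec=10, writes_per_sec=0)

print(peak_estimate, light_estimate)  # 1200 400
```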

5. Write and Query Data

Use the appropriate SDK or query language to insert and retrieve data. For document, use SQL-like queries: SELECT * FROM c WHERE c.name = 'Alice'. For key-value, use SET/GET commands. For graph, use Gremlin traversal steps. For column-family, use CQL. Ensure queries use the partition key to avoid cross-partition scans, which are slower and consume more RU. The exam tests knowledge of query patterns and best practices.
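
To show why supplying the partition key matters, the following Python sketch counts how many partitions a query has to visit with and without it. PartitionedContainer, its toy hash, and the sample documents are hypothetical; the point is only the difference in fan-out, not the real cost model.

```python
class PartitionedContainer:
    """Toy container that routes documents to partitions by a partition key field."""

    def __init__(self, partition_key_field, partition_count=4):
        self.partition_key_field = partition_key_field
        self.partitions = [[] for _ in range(partition_count)]

    def _partition_index(self, key_value):
        # Deterministic toy hash; a real service uses its own hashing scheme.
        return sum(key_value.encode("utf-8")) % len(self.partitions)

    def insert(self, doc):
        index = self._partition_index(doc[self.partition_key_field])
        self.partitions[index].append(doc)

    def query(self, predicate, partition_key=None):
        """Return (matching documents, number of partitions visited)."""
        if partition_key is None:
            targets = self.partitions                      # cross-partition fan-out
        else:
            targets = [self.partitions[self._partition_index(partition_key)]]
        hits = [doc for partition in targets for doc in partition if predicate(doc)]
        return hits, len(targets)


container = PartitionedContainer("userId")
for n in range(8):
    container.insert({"id": str(n), "userId": f"user-{n}",
                      "name": "Alice" if n == 3 else "Other"})


def is_alice(doc):
    return doc["name"] == "Alice"


scoped_hits, scoped_partitions = container.query(is_alice, partition_key="user-3")
fanout_hits, fanout_partitions = container.query(is_alice)
print(scoped_partitions, fanout_partitions)  # 1 4
```

Both queries return the same document, but the second one visits every partition to find it, which is what drives up latency and RU consumption in a real container.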

What This Looks Like on the Job

Scenario 1: E-commerce Product Catalog
An online retailer needs to store product information where each product can have different attributes (e.g., electronics have 'wattage', clothing has 'size'). Using a relational database would require a complex schema with many tables or sparse columns. They choose Azure Cosmos DB with SQL API (document model). Each product is a JSON document with flexible fields. The 'category' field is used as the partition key, on the assumption that the catalog has many categories of roughly similar size. Queries like 'find all products with price < $50' are supported by indexing. They provision 10,000 RU/s to handle peak traffic. A common misconfiguration is choosing a partition key with low cardinality (e.g., 'productType' with only 3 values), which causes hot partitions and throttling. The solution is to use a high-cardinality key like 'productId' or a synthetic key.

Scenario 2: Real-Time Leaderboard
A gaming company needs a low-latency leaderboard that updates frequently and supports point lookups by player ID. They use Azure Cache for Redis as a key-value store. Individual player scores are stored as strings with the key pattern 'player:{playerId}:score' and updated with SET commands. The ranked leaderboard itself is maintained in a Redis sorted set (ZADD), which keeps members ordered by score. Because the data is in memory, reads and writes typically complete with sub-millisecond latency. A common mistake is using a relational database for this, which would introduce latency and scalability issues. Another mistake is storing transient keys, such as per-session data kept in the same cache, without an expiration, which can exhaust memory.
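
The sorted-set behaviour the leaderboard relies on can be imitated in plain Python. The class below is an in-memory stand-in written for this chapter, not a Redis client: the method name zadd is borrowed from the Redis command for readability, while top, score_of, and the tie-breaking rule are assumptions.

```python
class Leaderboard:
    """In-memory stand-in for a sorted set: each member has exactly one score."""

    def __init__(self):
        self.scores = {}  # member -> score

    def zadd(self, member, score):
        # Adding an existing member overwrites its score, as a sorted set would.
        self.scores[member] = score

    def top(self, count):
        # Highest score first; ties broken by member name (an assumption of this sketch).
        ranked = sorted(self.scores.items(), key=lambda item: (-item[1], item[0]))
        return ranked[:count]

    def score_of(self, member):
        # Point lookup by member, analogous to fetching one player's score.
        return self.scores.get(member)


board = Leaderboard()
board.zadd("player:1", 1500)
board.zadd("player:2", 2300)
board.zadd("player:3", 1900)
board.zadd("player:1", 2500)  # player 1 improves their score

top_two = board.top(2)
print(top_two)  # [('player:1', 2500), ('player:2', 2300)]
```

This sketch sorts on every read for simplicity. A real sorted set keeps its members ordered as they are written, which is what makes ranked reads cheap at scale.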

Scenario 3: Social Network Friend Recommendations
A social media platform wants to suggest friends based on mutual connections. They use Cosmos DB with Gremlin API (graph model). Users are nodes, 'friend' relationships are edges. Traversals like 'find friends of friends' are efficient using graph queries. They partition by user ID. A misconfiguration would be using a document model and trying to store friend lists as arrays in documents, which makes traversals expensive and limits scale. The graph model naturally supports this use case.
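
A 'friends of friends' recommendation is a two-hop traversal followed by a count. The Python sketch below works on a small made-up friendship graph; the function name recommend_friends and the sample users are hypothetical, and a Gremlin query would express the same idea as traversal steps rather than nested loops.

```python
from collections import Counter


def recommend_friends(friends, user):
    """Rank people two hops away from `user` by the number of mutual friends."""
    direct = friends.get(user, set())
    mutual_counts = Counter()
    for friend in direct:                              # first hop
        for candidate in friends.get(friend, set()):   # second hop
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most mutual friends first; ties broken alphabetically.
    return sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))


# Undirected friendships stored as adjacency sets (sample data).
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave", "erin"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol"},
    "erin": {"bob"},
}

suggestions = recommend_friends(friends, "alice")
print(suggestions)  # [('dave', 2), ('erin', 1)]
```

Dave ranks first because both Bob and Carol know him. The work done is proportional to the neighbourhoods visited, not to the total number of users.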

How DP-900 Actually Tests This

Objective Coverage: This chapter aligns with DP-900 Objective 1.3: Describe non-relational data. The exam expects you to identify characteristics of each non-relational type and match them to appropriate Azure services. Common objective codes: 1.3.1 (describe document databases), 1.3.2 (describe key-value stores), 1.3.3 (describe graph databases), 1.3.4 (describe column-family stores).

Common Wrong Answers:
1. Confusing document and column-family: Candidates often think column-family stores are like relational tables with columns. In reality, column-family stores allow different columns per row and are optimized for access by row key within a column family, not for joins across tables.
2. Assuming all NoSQL databases are schema-less: Document and key-value stores are schema-flexible, but graph databases impose structure through node and edge labels, and column-family stores require column families (tables) and their keys to be defined up front.
3. Mismatching Azure services: Many candidates think Azure SQL Database is a NoSQL option (it is relational). Others don't realize Cosmos DB supports multiple models via different APIs.
4. Ignoring partition key importance: Questions about performance often include answers that ignore partition key design. The correct answer will emphasize choosing a high-cardinality partition key.

Specific Numbers and Terms:
- Cosmos DB default indexing: all properties are indexed automatically.
- Partition key value max size: 2,048 bytes (101 bytes on containers created without large partition key support).
- RU/s: minimum of 400 RU/s for a container with manually provisioned throughput.
- Consistency levels: Strong, Bounded Staleness, Session, Consistent Prefix, Eventual.
- Redis: maximum size of a string value is 512 MB.

Edge Cases:
- Cosmos DB Table API is key-value, not column-family.
- Azure Table Storage is also key-value, not relational.
- Graph databases can also support key-value lookups but are optimized for traversals.

Eliminating Wrong Answers:
- If a question mentions 'relationships' and 'traversals', eliminate document and key-value.
- If a question mentions 'flexible schema' and 'JSON', eliminate column-family and graph.
- If a question mentions 'high-speed lookups by ID', eliminate document and graph.
- If a question mentions 'columnar storage' or 'wide rows', eliminate document and key-value.

Key Takeaways

Document databases (e.g., Cosmos DB SQL API) store JSON documents with flexible schemas and support querying on any field.

Key-value stores (e.g., Redis, Cosmos DB Table API) provide fast lookups by key but cannot query by value.

Graph databases (e.g., Cosmos DB Gremlin API) excel at traversing relationships between entities.

Column-family stores (e.g., Cosmos DB Cassandra API) are optimized for wide rows and columnar access patterns.

Partition key selection is critical for performance in Cosmos DB; choose high-cardinality keys to avoid hot partitions.

Cosmos DB is multi-model; you must select the correct API for the data model you need.

Consistency levels in Cosmos DB range from Strong (lowest performance) to Eventual (highest performance).

Provisioned throughput in Cosmos DB is measured in Request Units per second (RU/s).

Azure Table Storage is a key-value store, not a relational database.

Graph databases use adjacency lists for efficient traversals; they can serve simple lookups, but that is not what they are optimized for.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Document Database

Stores data as self-contained documents (JSON/BSON).

Supports querying on any field within the document.

Schema is flexible; documents can have different fields.

Ideal for catalogs, content management, and complex objects.

Azure Cosmos DB SQL API is the primary Azure service.

Key-Value Store

Stores data as key-value pairs (associative array).

Only supports lookups by exact key; no field-level queries.

Value can be any blob (string, binary, etc.), but no schema enforcement.

Ideal for caching, session state, and high-speed lookups.

Azure Cache for Redis and Cosmos DB Table API are common services.

Graph Database

Data is stored as nodes (entities) and edges (relationships).

Optimized for traversing relationships (e.g., friends of friends).

Uses query languages like Gremlin or SPARQL.

Best for interconnected data like social networks, fraud detection.

Azure Cosmos DB Gremlin API is the primary service.

Column-Family Store

Data is stored in column families; each row can have different columns.

Optimized for wide-row reads and columnar aggregation.

Uses CQL (Cassandra Query Language).

Best for time-series, IoT, and analytical workloads.

Azure Cosmos DB Cassandra API is the primary service. (Azure Table Storage is key-value, not column-family.)

Watch Out for These

Mistake

All NoSQL databases are schema-less.

Correct

Schema flexibility varies by model. Document stores let each document carry different fields, and key-value stores treat the value as opaque. Graph databases impose structure through node and edge labels and their properties, and column-family stores require column families (tables) and their keys to be defined up front, even though individual rows may populate different columns.

Mistake

Key-value stores can query by value.

Correct

Pure key-value stores only support lookups by key. Finding items by value means either scanning every entry or maintaining your own secondary index, because the model itself does not provide one. This is a common exam trap: if a use case requires filtering on attributes, key-value is not appropriate.

Mistake

Column-family stores are the same as relational column-oriented databases.

Correct

Column-family stores (e.g., Cassandra) are different: they store data in rows but group columns into families, allowing flexible columns per row. Relational column-oriented databases store data by column for compression, but still enforce a fixed schema.

Mistake

Cosmos DB is only a document database.

Correct

Cosmos DB is multi-model: it supports document (SQL API), key-value (Table API), graph (Gremlin API), and column-family (Cassandra API). The exam often tests this by asking which API to use for a given model.

Mistake

Graph databases are just for social networks.

Correct

While social networks are a classic use case, graph databases are also used for fraud detection, recommendation engines, network topology, and knowledge graphs. Any domain with interconnected data benefits from graph.

Frequently Asked Questions

What is the difference between document and key-value databases?

Document databases store data as structured documents (usually JSON) and allow querying on any field within the document. Key-value stores store data as simple key-value pairs and only support retrieval by key. For example, a product catalog with many attributes is best suited for a document database, while a session cache is best for a key-value store.

When should I use a graph database instead of a relational database?

Use a graph database when your data is highly interconnected and you need to traverse relationships efficiently, such as in social networks, recommendation engines, or fraud detection. Relational databases require complex joins that become slow at scale, whereas graph databases use index-free adjacency for fast traversals.

What is a column-family store and how is it different from a relational columnar database?

A column-family store (e.g., Cassandra) groups columns into families, and each row can have different columns within a family. It is optimized for write-heavy workloads and wide rows. A relational columnar database stores table data by column for compression but still enforces a fixed schema. Column-family stores are schema-flexible and designed for horizontal scaling.

Which Azure services support non-relational databases?

Azure Cosmos DB supports document (SQL API), key-value (Table API), graph (Gremlin API), and column-family (Cassandra API) models. Azure Cache for Redis is an in-memory key-value store. Azure Table Storage is also a key-value store. These are the primary services for non-relational data in Azure.

What is the role of partition key in Cosmos DB?

The partition key determines how data is distributed across physical partitions. It must be chosen carefully to ensure even distribution and avoid hot partitions. A good partition key has high cardinality (many distinct values) and is used in most queries. For example, 'userId' is often a good choice, while 'status' with only a few values is bad.

Can I use SQL to query a non-relational database in Azure?

Yes, Cosmos DB SQL API supports a SQL-like query language for document data. However, other non-relational databases use different query languages: Redis uses commands (SET/GET), Gremlin uses traversal steps, and Cassandra uses CQL. The exam tests your knowledge of which query language matches which model.

What is the difference between Azure Table Storage and Cosmos DB Table API?

Both are key-value stores, but Cosmos DB Table API offers global distribution, multiple consistency levels, and dedicated throughput (RU/s). Azure Table Storage is simpler and cheaper but has lower performance and fewer features. The exam may ask which service to use for a given scenario based on requirements like latency or global distribution.
