This chapter covers Azure Cosmos DB, a globally distributed, multi-model database service that is a key topic in the AZ-204 exam domain Storage (Objective 2.1). The storage domain carries approximately 15-20% of the exam, and its Cosmos DB questions focus on consistency models, partitioning strategies, request unit (RU) estimation, and SDK usage. Mastering Cosmos DB is critical because it is the go-to database for applications requiring low-latency reads and writes at global scale, and the exam tests your ability to design and optimize its configuration.
Imagine Cosmos DB as a global library system with branches in multiple cities. Each branch (region) has a complete copy of the catalog (data). When a patron (application) wants to check out a book (read data), they go to their local branch and get the book quickly, because every branch holds a copy of the catalog. When a patron returns a book (writes data), they hand it to a librarian at a branch that accepts returns (a write region), who updates the local catalog and then sends a notification to all other branches via a fast courier service (replication). The system offers five different ways to organize the catalog: as JSON documents (SQL API), as MongoDB-style documents (MongoDB API), by key-value pairs (Table API), by graph connections (Gremlin API), or by column families (Cassandra API). The library system can be configured to strongly guarantee that all branches see the same catalog update at the same time (strong consistency), to allow some branches to show slightly outdated catalogs for a few seconds (eventual consistency) in exchange for faster performance, or to use one of the three levels in between. With multi-master enabled, every branch accepts returns, so patrons always write to the nearest branch, and if a branch is damaged (region outage), patrons are automatically redirected to another branch.
What is Azure Cosmos DB?
Azure Cosmos DB is a fully managed NoSQL database service designed for globally distributed, low-latency, and high-availability applications. It offers turn-key global distribution across any number of Azure regions, elastic scaling of throughput and storage, and multiple consistency models. Unlike traditional databases, Cosmos DB abstracts the underlying infrastructure, allowing developers to focus on data modeling and application logic.
Why Does Cosmos DB Exist?
Traditional relational databases struggle with global scale due to the overhead of ACID transactions and synchronous replication. Cosmos DB was built to solve the challenges of modern cloud applications: millions of users worldwide, sub-second response times, 99.999% availability, and the ability to handle both read-heavy and write-heavy workloads. It achieves this through a combination of horizontal partitioning (sharding), asynchronous replication, and a resource governance model based on Request Units (RUs).
How It Works Internally
Cosmos DB stores data in containers (like tables in SQL) whose items are grouped into logical partitions by partition key and spread across physical partitions. Each physical partition is backed by a replica set (typically four replicas) in every region where the database account is configured. When you write data, the operation is first committed to a quorum of replicas in the local write region, then replicated to the other regions (asynchronously for most consistency levels, synchronously when Strong consistency is selected). The consistency model determines how quickly replicas in other regions see the write. Reads can be served from any region, and if multi-master (multi-region writes) is enabled, writes can be accepted in any region.
Key Components, Values, Defaults, and Timers
Request Unit (RU): The currency for throughput. 1 RU is the cost of a point read (a lookup by ID and partition key) of a 1 KB item; reads at Strong or Bounded Staleness consistency cost roughly twice as much. Minimum manual throughput per container: 400 RU/s. There is no fixed upper limit; higher throughput is served by adding physical partitions. Typical costs: a 1 KB point read = 1 RU, a 1 KB write = about 5 RUs, a 10-item query = 10-50 RUs depending on complexity.
Consistency Levels (5): Strong, Bounded Staleness, Session (default), Consistent Prefix, Eventual. Bounded Staleness lets reads lag behind writes by at most K versions or T seconds; the minimum bounds are 10 operations or 5 seconds for a single-region account and 100,000 operations or 300 seconds for a multi-region account. Strong guarantees linearizability but requires synchronous replication across regions (higher write latency). Eventual offers the lowest latency but no ordering guarantees.
Partition Key: A JSON property (or synthetic key) that determines how data is distributed across physical partitions. Choose a key with high cardinality (e.g., user ID) to avoid hot partitions. Maximum logical partition size: 20 GB. A physical partition serves at most 10,000 RU/s, so no single logical partition can exceed 10,000 RU/s either.
Time-to-Live (TTL): Default: none. Can be set at container level (in seconds) to automatically delete expired documents. Example: TTL = 86400 deletes documents after 24 hours.
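Indexing Policy: By default, all properties are indexed automatically. You can customize the policy to include or exclude specific paths and reduce the RU cost of writes. Default indexing mode: Consistent (index updated synchronously with each write). Alternatives: None (indexing disabled, useful for pure key-value access) and Lazy (asynchronous, eventually consistent), which is a legacy mode and not recommended for new containers.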
Multi-Master: When enabled, writes can be accepted in any region. Conflict resolution policies: Last Writer Wins (LWW) using a timestamp or custom conflict resolution.
SDKs: .NET, Java, Python, Node.js, Go. Use Direct mode (TCP) for best performance where the SDK supports it (the .NET and Java SDKs); use Gateway mode (HTTPS) for restricted networks and for SDKs without Direct mode.
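The Last Writer Wins policy listed under Multi-Master above can be illustrated with a minimal Python sketch. It is not Cosmos DB code: the `Version` class and the `last_writer_wins` function are hypothetical names, and the integer `ts` field stands in for whatever numeric property is configured as the conflict resolution path (the item timestamp by default).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One conflicting copy of an item, as written in a given region."""
    region: str
    ts: int      # stand-in for the conflict resolution path (assumed numeric)
    body: dict


def last_writer_wins(versions):
    """Return the version with the highest timestamp.

    Ties are broken by list order here; that tie-break is an assumption
    of this sketch, not documented service behavior.
    """
    if not versions:
        raise ValueError("no versions to resolve")
    return max(versions, key=lambda v: v.ts)


# Two regions updated the same item at almost the same time.
conflict = [
    Version("eastus", 1700000010, {"id": "42", "stock": 7}),
    Version("westeurope", 1700000012, {"id": "42", "stock": 5}),
]
winner = last_writer_wins(conflict)
print(winner.region, winner.body)
```

The losing write is simply discarded under this policy, which is why a custom policy is worth considering when both updates carry information that must be merged.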
Configuration and Verification Commands
Create a Cosmos DB account (Azure CLI):
az cosmosdb create --name mycosmosdb --resource-group myrg --locations regionName=eastus failoverPriority=0 --locations regionName=westus failoverPriority=1 --default-consistency-level Session
Create a database and container:
az cosmosdb sql database create --account-name mycosmosdb --name mydb --resource-group myrg
az cosmosdb sql container create --account-name mycosmosdb --database-name mydb --name mycontainer --resource-group myrg --partition-key-path "/userId" --throughput 400
Check throughput:
az cosmosdb sql container throughput show --account-name mycosmosdb --database-name mydb --name mycontainer --resource-group myrg
Interaction with Related Technologies
Cosmos DB integrates with Azure Functions (triggers and bindings), Azure Stream Analytics (as output), Azure Cognitive Search (indexer), and Azure Data Factory (copy activity). For change feed, you can use Azure Functions or a custom processor to react to inserts and updates. Cosmos DB also supports the MongoDB API, Cassandra API, Gremlin API, and Table API, allowing migration from existing NoSQL databases with minimal code changes.
Select API and Consistency Model
Choose the appropriate API (SQL, MongoDB, Cassandra, Gremlin, Table) based on your application's existing drivers or data model. For greenfield projects, SQL API is recommended due to its native support for indexing and queries. Then select a default consistency level. For globally distributed apps, Session is the most common default as it provides read-your-writes guarantees with low latency. Strong consistency should only be used when absolute ordering is required, as it increases write latency and limits availability during region failures.
Design Partition Key
Analyze your data access patterns to choose a partition key that evenly distributes requests. A good partition key has high cardinality (e.g., 10,000+ distinct values) and is used in most queries as a filter. Avoid keys with a few values (e.g., 'status' with values 'active' and 'inactive') as they cause hot partitions. Use a synthetic key if necessary, such as combining user ID and date. The partition key cannot be changed after container creation, so careful design is critical.
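To make the cardinality argument concrete, here is a small Python sketch that compares how evenly three candidate keys spread the same requests. The sample events and the helper names `synthetic_key` and `hottest_share` are made up for illustration; real analysis would use your own traffic data.

```python
from collections import Counter


def synthetic_key(user_id: str, day: str) -> str:
    """Combine two properties into one partition key value."""
    return f"{user_id}-{day}"


def hottest_share(keys):
    """Fraction of requests that land on the single most-used key value."""
    counts = Counter(keys)
    return max(counts.values()) / len(keys)


# Eight hypothetical requests from four users over two days.
events = [
    {
        "userId": f"u{i % 4}",
        "status": "active" if i < 6 else "inactive",
        "date": "2024-05-01" if i < 4 else "2024-05-02",
    }
    for i in range(8)
]

by_status = hottest_share([e["status"] for e in events])
by_user = hottest_share([e["userId"] for e in events])
by_synthetic = hottest_share(
    [synthetic_key(e["userId"], e["date"]) for e in events]
)
print(by_status, by_user, by_synthetic)
```

The lower the share of the hottest value, the more evenly requests can be distributed. The trade-off of a synthetic key is that queries filtering only on user ID become cross-partition queries.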
Provision Throughput (RUs)
Estimate required RU/s from the workload: about 5 RU per 1 KB write, 1 RU per 1 KB point read, and 2-5 RU per 1 KB of query result, each multiplied by the number of operations per second. Use the Cosmos DB Capacity Calculator or Azure Monitor metrics to analyze traffic. Start with autoscale (max RU/s = 2x initial estimate) to handle bursts. For predictable workloads, use manual throughput. The minimum manual throughput is 400 RU/s per container or database. For shared throughput, provision at database level (minimum 400 RU/s total, up to 25 containers).
Configure Global Distribution
Add regions via Azure Portal or CLI. Specify write region (failover priority 0) and read regions. Enable multi-master if writes must be accepted in multiple regions (requires conflict resolution policy). Understand that with strong consistency, all regions must confirm writes, increasing latency. Use Service-Managed Failover for automatic failover; Manual failover for planned maintenance. Test failover using Azure CLI: `az cosmosdb failover-priority-change`.
Implement SDK and Change Feed
Use the appropriate SDK (e.g., Microsoft.Azure.Cosmos for .NET). Configure connection mode: Direct mode with TCP for lower latency (outbound TCP ports 10000-20000 must be open, in addition to 443 for the gateway). For high availability, set either `ApplicationRegion` or `ApplicationPreferredRegions` on `CosmosClientOptions`; the two are alternatives and are not meant to be combined. Implement retry policies for 429 (rate limiting) errors. Use Change Feed to react to data changes: set up a processor with `Container.GetChangeFeedProcessorBuilder`. The change feed is persisted per partition and can be used for real-time analytics or caching.
Monitor and Optimize
Use Azure Monitor metrics: Total Request Units, Normalized RU Consumption (per partition), Server-Side Latency, 429 responses. Set alerts when Normalized RU Consumption exceeds 90%. Optimize indexing: exclude unused paths to reduce write RU cost. For read-heavy workloads, use materialized views via change feed. For write-heavy workloads, consider batching and the SDK's bulk mode, which can raise ingestion throughput substantially. Regularly review partition key distribution using the PartitionKeyStatistics diagnostic logs.
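The reason to alert on Normalized RU Consumption rather than on total RU usage can be shown with a short Python sketch. It assumes provisioned throughput is divided evenly across physical partitions and that the metric reports the busiest partition's utilization; the function name and the sample figures are hypothetical.

```python
def normalized_ru_consumption(consumed_by_partition, provisioned_ru_per_sec):
    """Percent utilization of the busiest physical partition (capped at 100)."""
    budget = provisioned_ru_per_sec / len(consumed_by_partition)
    return max(
        min(100.0, 100.0 * used / budget)
        for used in consumed_by_partition.values()
    )


# 10,000 RU/s spread over four physical partitions: 2,500 RU/s each.
consumed = {"p0": 2400, "p1": 500, "p2": 300, "p3": 200}
provisioned = 10_000

normalized = normalized_ru_consumption(consumed, provisioned)
overall = 100.0 * sum(consumed.values()) / provisioned
should_alert = normalized > 90

print(f"normalized={normalized}% overall={overall}% alert={should_alert}")
```

In this made-up example the container as a whole looks lightly loaded, yet one partition is close to throttling, which is the situation a hot partition key produces.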
Scenario 1: E-Commerce Product Catalog
A global e-commerce company uses Cosmos DB to store product catalog data across 3 regions (US, Europe, Asia). They chose SQL API with Session consistency. The partition key was categoryId (e.g., 'electronics', 'clothing'), which initially seemed good but caused hot partitions for popular categories like 'electronics'. Because a partition key cannot be changed in place, they fixed this by migrating to a new container with a synthetic key combining categoryId and productId (e.g., 'electronics-12345'). They provisioned 10,000 RU/s with autoscale (max 20,000 RU/s) to handle Black Friday spikes. Misconfiguration: initially they left TTL unset (disabled) and never cleaned up old products, causing storage costs to balloon. Solution: implemented TTL of 90 days for inactive products and used change feed to archive to Blob Storage.
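The TTL behavior behind this scenario can be sketched in Python. The sketch assumes the usual rules: TTL is off when the container has no default, a value of -1 means "never expire", and an item-level `ttl` overrides the container default only when TTL is enabled on the container. The function names are hypothetical, and the exact boundary second of expiry is an assumption.

```python
from typing import Optional

NINETY_DAYS = 90 * 24 * 3600  # seconds


def effective_ttl(container_default: Optional[int],
                  item_ttl: Optional[int]) -> Optional[int]:
    """Seconds an item lives after its last write, or None if it never expires."""
    if container_default is None:
        return None  # TTL disabled on the container: item-level ttl is ignored
    ttl = item_ttl if item_ttl is not None else container_default
    return None if ttl == -1 else ttl


def is_expired(last_modified_ts: int, now_ts: int,
               container_default: Optional[int],
               item_ttl: Optional[int] = None) -> bool:
    """True once the item's age reaches its effective TTL."""
    ttl = effective_ttl(container_default, item_ttl)
    return ttl is not None and now_ts - last_modified_ts >= ttl


# Container default of 90 days; one item opts out with ttl = -1.
print(is_expired(0, NINETY_DAYS, NINETY_DAYS))      # aged 90 days
print(is_expired(0, NINETY_DAYS, NINETY_DAYS, -1))  # pinned item
```

Because the clock restarts on every write, updating a product resets its expiry, which suits the "inactive products" rule described above.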
Scenario 2: IoT Telemetry Ingestion
A manufacturing company ingests sensor data from 100,000 devices every 5 seconds. They use Cosmos DB with Cassandra API for low-latency writes. Partition key is deviceId (high cardinality). They provision 100,000 RU/s at database level (shared across 25 containers). Problem: write spikes cause 429 errors. Solution: implement client-side retry with exponential backoff and smooth bursts by batching writes on the client. Common mistake: they initially used strong consistency for all writes, causing high latency (strong consistency also cannot be combined with multi-region writes); they switched to Eventual consistency for telemetry data (acceptable because analytics can tolerate slight staleness). With that change in place, they enabled multi-master so devices write to the nearest region (US or Europe), reducing latency further.
Scenario 3: Social Media Feed
A social media app uses Cosmos DB to store user posts and generate a real-time feed. They use SQL API with Change Feed to update a materialized view in a second container. Partition key is userId. They provision 5,000 RU/s per container. Issue: cross-partition queries for the feed (querying posts from friends) were slow and expensive (high RU). They redesigned to use a fan-out pattern: when a user posts, the change feed triggers an Azure Function that writes the post to each follower's feed container (denormalization). This increased write RU but reduced read RU by 90%. Misconfiguration: they initially set indexing policy to default (index all) which increased write RU; they optimized by excluding unnecessary fields from indexing.
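The fan-out pattern in this scenario can be sketched in plain Python. The in-memory dictionaries stand in for the follower graph and the feed container, and `on_post_changed` is a hypothetical handler of the kind a change feed consumer would call for each new or updated post. Writes are keyed by post ID so that redelivered changes do not create duplicates.

```python
# Stand-ins for real storage; both structures are assumptions of this sketch.
followers = {"alice": ["bob", "carol"], "bob": ["alice"]}
feeds = {}  # follower userId -> {postId: feed entry}


def on_post_changed(post):
    """Copy a post into each follower's feed (an upsert keyed by post id)."""
    for follower in followers.get(post["userId"], []):
        feeds.setdefault(follower, {})[post["id"]] = {
            "id": post["id"],
            "author": post["userId"],
            "text": post["text"],
        }


def read_feed(user_id):
    """Single-partition read: everything for one user lives under one key."""
    return list(feeds.get(user_id, {}).values())


post = {"id": "p1", "userId": "alice", "text": "hello"}
on_post_changed(post)
on_post_changed(post)  # the same change delivered twice is harmless

print(read_feed("bob"))
print(read_feed("alice"))
```

Each post costs one write per follower, which is the extra write RU the scenario mentions, while reading a feed touches only the reader's own partition.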
What AZ-204 Tests on Cosmos DB
The exam focuses on Objective 2.1: 'Develop solutions that use Cosmos DB storage.' Specific sub-objectives include:
Choose the appropriate API and SDK
Implement partitioning schemes
Configure consistency levels
Manage throughput and scaling
Implement change feed
Handle global distribution and multi-region writes
Common Wrong Answers and Why
Choosing Strong consistency for all workloads: Candidates think strong is 'best' because it's most consistent. But the exam emphasizes that Strong increases latency and reduces availability. The correct answer is usually Session or Eventual for globally distributed apps.
Using partition key with low cardinality: E.g., status (active/inactive). Candidates forget that this creates hot partitions. Correct answer: high cardinality like userId.
Setting RU/s too low to save costs: Candidates underestimate traffic. Exam questions test that you must add buffer for spikes. Use autoscale or provision 2x estimated peak.
Assuming change feed captures deletes: It does NOT by default. You must enable 'delete' tracking via a soft-delete flag or use a custom solution.
Specific Numbers and Terms
Minimum RU/s per container: 400
Maximum storage per logical partition: 20 GB
Default session consistency
Bounded staleness minimum thresholds: 10 operations or 5 seconds (single-region account); 100,000 operations or 300 seconds (multi-region account)
RU cost for 1 KB point read: 1 RU
RU cost for 1 KB write: 5 RU
Ports for Direct mode: TCP 10000-20000 (Gateway mode uses HTTPS on 443)
Default indexing mode: Consistent
Autoscale range: 10% of the set max up to the set max (e.g., set max 4000, scales between 400 and 4000 RU/s)
Edge Cases and Exceptions
Multi-master conflict resolution: Must choose LWW or custom. If not set, LWW is default.
Change feed for deletes: Not supported natively; use TTL to delete after a period, or implement soft delete.
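Throughput reduction: The lowest RU/s you can scale down to is bounded by the container's current storage and by the highest throughput ever provisioned on it, so a container that was once scaled very high cannot always return to 400 RU/s.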
Partition key change: Not possible after container creation. You must create a new container and migrate data.
How to Eliminate Wrong Answers
If an answer suggests using strong consistency for a globally distributed app with low-latency requirements, eliminate it.
If an answer proposes a partition key with low cardinality (e.g., 'color'), eliminate it.
If an answer says change feed captures deletes, eliminate it.
If an answer suggests using Gateway mode for high throughput, eliminate it (Direct mode is faster).
Cosmos DB offers five APIs: SQL, MongoDB, Cassandra, Gremlin, and Table.
The default consistency level is Session, which provides read-your-writes guarantees.
Minimum throughput per container is 400 RU/s; per database shared throughput is also 400 RU/s.
A good partition key has high cardinality (e.g., userId) and is used in most queries as a filter.
Each logical partition can store up to 20 GB and handle up to 10,000 RU/s.
Strong consistency requires synchronous replication across all regions, increasing write latency.
Change feed does not capture deletes; use soft-delete or TTL to handle deletions.
Direct mode (TCP) provides better performance than Gateway mode (HTTPS) for SDK connections.
Autoscale throughput scales between 10% and 100% of the set maximum RU/s.
Multi-master writes require a conflict resolution policy (default is Last Writer Wins).
These come up on the exam all the time. Here's how to tell them apart.
SQL API
Native Cosmos DB API with rich querying using SQL syntax.
Supports stored procedures, triggers, and UDFs in JavaScript.
Best for new applications; no driver compatibility issues.
Indexing is automatic by default; can be customized.
Change feed is available natively.
MongoDB API
Wire-protocol compatible with MongoDB drivers (version 4.0).
Limited query capabilities compared to native SQL API (no JOIN across collections).
Best for migrating existing MongoDB applications with minimal code changes.
Indexing follows MongoDB conventions; some features like unique indexes are limited.
Change feed is available but requires using MongoDB change streams.
Mistake
Cosmos DB is just a document database like MongoDB.
Correct
Cosmos DB is multi-model: it supports document (SQL API), key-value (Table API), graph (Gremlin API), column-family (Cassandra API), and MongoDB API. It is not limited to documents.
Mistake
Strong consistency is always the best choice for data integrity.
Correct
Strong consistency requires synchronous replication across all regions, increasing write latency and reducing availability during region failures. For most globally distributed apps, Session or Eventual consistency is sufficient and recommended.
Mistake
Partition key can be changed after container creation.
Correct
The partition key is immutable once the container is created. To change it, you must create a new container and migrate data using the change feed or bulk operations.
Mistake
Provisioning more RU/s always improves performance linearly.
Correct
Performance can be limited by hot partitions. If a single partition key value (e.g., a popular user) receives too many requests, it can hit the 10,000 RU/s per partition limit, causing throttling even if total RU/s is high.
Mistake
Change feed captures all operations including deletes.
Correct
By default, change feed only captures inserts and updates. Deletes are not included. To capture deletes, you must implement a soft-delete pattern (e.g., set a 'deleted' flag) and use TTL for physical deletion.
Use autoscale when your workload has unpredictable spikes or variable traffic. Autoscale scales throughput between 10% and 100% of the maximum RU/s you set, based on demand. You are billed each hour for the highest RU/s the system scaled to in that hour. Manual throughput is suitable for predictable workloads where you want fixed capacity and cost. Note that manual throughput cannot be set below 400 RU/s, the lowest autoscale maximum is 1,000 RU/s (which scales between 100 and 1,000 RU/s), and you can switch a container between the two modes.
In Cosmos DB Table API, the partition key (PartitionKey) is used to distribute data across physical partitions, similar to other APIs. The row key (RowKey) is a unique identifier within a partition. Together they form the primary key. The partition key must have high cardinality to avoid hot partitions. The row key can be used for efficient point queries when combined with the partition key.
When you receive a 429 error (too many requests), the response includes an `x-ms-retry-after-ms` header indicating how long to wait before retrying. The SDK automatically retries throttled requests by default, waiting for the interval the server specifies. You can customize this in `CosmosClientOptions` (e.g., `MaxRetryAttemptsOnRateLimitedRequests` and `MaxRetryWaitTimeOnRateLimitedRequests`). To reduce 429s, increase RU/s, optimize queries (use point reads instead of queries), or ensure even partition key distribution.
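For code paths where the SDK's built-in retries are exhausted or unavailable, the same idea can be sketched by hand in Python. `ThrottledError` is a hypothetical exception standing in for a 429 response that carries the `x-ms-retry-after-ms` value, and `call_with_retry` with its `max_attempts` default is illustrative, not an SDK API.

```python
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a 429 response with a retry-after hint."""

    def __init__(self, retry_after_ms):
        super().__init__(f"429: retry after {retry_after_ms} ms")
        self.retry_after_ms = retry_after_ms


def call_with_retry(operation, max_attempts=5, sleep=time.sleep):
    """Run operation, waiting the server-suggested interval after each 429."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottledError as err:
            if attempt == max_attempts:
                raise  # give up and surface the throttling error
            sleep(err.retry_after_ms / 1000.0)


# Simulated operation: throttled twice, then succeeds.
responses = iter([ThrottledError(20), ThrottledError(50), "ok"])


def flaky_read():
    outcome = next(responses)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


waits = []  # record the waits instead of actually sleeping
result = call_with_retry(flaky_read, sleep=waits.append)
print(result, waits)
```

Passing `sleep` as a parameter keeps the sketch testable; in real code the default `time.sleep` would apply.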
Yes, Azure Functions has a Cosmos DB trigger that uses the change feed to react to inserts and updates. You can also use output bindings to write to Cosmos DB from a function. The trigger is ideal for real-time scenarios like updating a materialized view or sending notifications. Note: the trigger does not capture deletes unless you implement a soft-delete pattern.
You can add any number of Azure regions to a Cosmos DB account, up to the total number of Azure regions available (currently over 60). However, consider the cost and latency implications. Each additional region increases write latency if strong consistency is used, and adds storage costs for data replication. For most applications, 2-3 regions are sufficient.
Use the Cosmos DB Capacity Calculator (available in Azure Portal) or use the formula: total RU/s = (number of writes per second * RU cost per write) + (number of reads per second * RU cost per read) + (number of queries per second * RU cost per query). For point reads (1 KB), 1 RU; for writes (1 KB), 5 RU. For queries, test with sample data. Add a 20-30% buffer for spikes. Consider using autoscale to handle variable traffic.
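The formula above translates directly into a few lines of Python. This is a rough sketch: the function name, the 25% default buffer (chosen from the 20-30% range), the rounding up to a multiple of 100 RU/s, and the sample workload are all assumptions, and the per-query cost must come from measuring your own queries.

```python
import math


def estimate_ru_per_sec(writes_per_sec, reads_per_sec, queries_per_sec,
                        ru_per_query, ru_per_write=5.0, ru_per_read=1.0,
                        buffer=0.25):
    """Estimate provisioned RU/s, rounded up to the next multiple of 100."""
    base = (writes_per_sec * ru_per_write
            + reads_per_sec * ru_per_read
            + queries_per_sec * ru_per_query)
    return math.ceil(base * (1 + buffer) / 100) * 100


# Hypothetical workload: 200 writes/s, 1,000 point reads/s,
# and 50 queries/s measured at 10 RU each.
estimate = estimate_ru_per_sec(200, 1000, 50, ru_per_query=10)
print(estimate)
```

Here the base load is 1,000 + 1,000 + 500 = 2,500 RU/s, and the buffer lifts the provisioned figure to 3,200 RU/s after rounding.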
Cosmos DB is a NoSQL database with flexible schema, automatic indexing, and global distribution. Azure SQL Database is a relational database with fixed schema, full T-SQL support, and ACID transactions. Cosmos DB is better for globally distributed, low-latency applications with high throughput, while Azure SQL is better for complex relational queries and transactions. They serve different use cases.