AZ-204 · Chapter 45 of 102 · Objective 2.1

Cosmos DB SDK: Bulk Operations and Indexing

This chapter covers two critical Cosmos DB SDK features for high-throughput scenarios: bulk operations and indexing policies. On the AZ-204 exam, these topics appear in roughly 10-15% of questions related to Storage (Objective 2.1: Develop solutions that use Cosmos DB storage). You must understand how the SDK's bulk mode works internally, how to configure indexing policies to balance write performance and query efficiency, and the exact thresholds and defaults the exam tests. Mastering these concepts will allow you to design scalable, cost-effective solutions and avoid common pitfalls that lead to throttled requests or excessive RU consumption.

25 min read
Intermediate
Updated May 31, 2026

Warehouse Shipping Dock Analogy

Imagine a large warehouse shipping department that must send out thousands of packages each day. Without bulk processing, each package is individually carried to the dock, a shipping label is printed and attached, and the package is loaded onto a truck. This one-package-at-a-time approach is slow because each package requires a full round trip from the storage area to the dock and back.

Now consider bulk processing: the worker collects 100 packages at once, places them on a pallet, prints a single batch shipping manifest, and loads the entire pallet onto the truck. The truck departs only when full, and the manifest is sent electronically to the recipient. Cosmos DB bulk operations work the same way. Instead of sending individual point operations (one document at a time), the SDK batches multiple operations into a single network request. The bulk support built into the v3 SDK (the successor to the BulkExecutor library) accumulates operations in memory, groups them by physical partition, and sends them as a single batch when a threshold (e.g., 100 operations or 2 MB payload) is reached. The server processes each operation in the batch independently and returns a consolidated response with one result per operation. Just as the warehouse reduces trips and label printing, bulk operations reduce network round trips and per-request overhead, which increases throughput.

The analogy also extends to indexing: the warehouse's inventory system indexes each package's destination, weight, and priority. Cosmos DB's default automatic indexing indexes every property of every document; with bulk operations, every document in a batch is still indexed according to the container's indexing policy. Turning indexing off on a container that receives bulk inserts is like removing the inventory system entirely: you lose query ability but gain write speed, a trade-off the exam tests.

How It Actually Works

What Are Bulk Operations and Why Do They Exist?

Cosmos DB is a fully managed NoSQL database that scales horizontally across physical partitions. Each operation — create, read, update, delete — consumes Request Units (RUs) and incurs network latency. When applications need to ingest large volumes of data (e.g., IoT telemetry, log streaming, data migration), performing individual point operations is inefficient due to per-request overhead. Bulk operations allow you to group multiple operations into a single network call, reducing latency and improving throughput.

The Azure Cosmos DB .NET SDK v3 (and similar patterns in other SDKs) introduced a built-in bulk mode that replaces the older BulkExecutor library. In bulk mode, the SDK automatically batches concurrent operations and sends them to the service in a single request. This is not a separate API; it's a mode you enable via AllowBulkExecution = true in the CosmosClientOptions.

How Bulk Operations Work Internally

When bulk mode is enabled, the SDK does the following:

Operation Accumulation: Each CreateItemAsync, UpsertItemAsync, ReplaceItemAsync, etc., call is added to an internal buffer. The buffer is organized by physical partition key range. Operations targeting the same partition are grouped together.

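Batching Thresholds: The SDK flushes the buffer when any of the following occurs (these are internal SDK values rather than settings, and they can differ between SDK versions):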

The number of operations in a partition group reaches 100 (default).

The total payload size for a partition group exceeds 2 MB.

A configurable timeout (default 1 second) elapses since the last flush.

Batch Request: The SDK sends each partition group to the service as a single batch request carrying multiple operations, through the gateway or directly to replicas in direct mode. The batch is not transactional: the service executes each operation independently, so a conflict on one operation (for example, a duplicate ID on a create) fails only that operation while the rest of the batch can still succeed. When all-or-nothing behavior within one logical partition is required, use `TransactionalBatch` instead of bulk mode.

Response Handling: The service returns a batch response with a status code for each operation. The SDK parses the response and completes each caller's task with its own result or exception. Operations that were throttled (HTTP 429) are retried after a delay, subject to the retry policy.

Flow Control: Calls made in bulk mode return tasks that stay incomplete until their batch has been dispatched and answered. The SDK also limits how many batches are in flight per partition and lowers that limit when it sees throttling. Together these give the producer backpressure, provided it awaits outstanding tasks instead of queuing operations without bound.
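The accumulate-and-flush behavior described above can be modelled in a few lines. The sketch below is a simplified model, not the SDK's implementation: `BatchingModel`, `Plan`, and the tuple shapes are hypothetical names, the 100-operation and 2 MB limits are the figures quoted in this chapter, and the timer-based flush is represented only by sending whatever is still buffered at the end.

```csharp
using System;
using System.Collections.Generic;

public static class BatchingModel
{
    // Thresholds as quoted in this chapter; treat them as illustrative.
    public const int MaxOperationsPerBatch = 100;
    public const int MaxPayloadBytesPerBatch = 2 * 1024 * 1024;

    // Groups operations by partition and closes a batch when adding the next
    // operation would cross either threshold. Returns the batches in send order.
    public static List<(string Partition, int Count, int Bytes)> Plan(
        IEnumerable<(string Partition, int PayloadBytes)> operations)
    {
        var open = new Dictionary<string, (int Count, int Bytes)>();
        var flushed = new List<(string Partition, int Count, int Bytes)>();

        foreach (var (partition, payloadBytes) in operations)
        {
            // A missing key leaves 'current' at its default of (0, 0).
            open.TryGetValue(partition, out var current);

            bool wouldOverflow = current.Count > 0 &&
                (current.Count + 1 > MaxOperationsPerBatch ||
                 current.Bytes + payloadBytes > MaxPayloadBytesPerBatch);

            if (wouldOverflow)
            {
                flushed.Add((partition, current.Count, current.Bytes));
                current = (0, 0);
            }

            open[partition] = (current.Count + 1, current.Bytes + payloadBytes);
        }

        // Stand-in for the timer-based flush: send whatever is still buffered.
        foreach (var (partition, batch) in open)
        {
            flushed.Add((partition, batch.Count, batch.Bytes));
        }

        return flushed;
    }
}
```

Feeding 250 operations of 1,000 bytes each for a single partition into `Plan` yields three batches of 100, 100, and 50 operations, which is the count threshold at work. Three 1 MB operations for one partition yield two batches (2 operations, then 1), which is the payload threshold at work.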

Key Components, Values, and Defaults

`CosmosClientOptions.AllowBulkExecution`: Boolean. Default false. Set to true to enable bulk mode.

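Max concurrency per partition: The SDK limits the number of concurrent batch requests per partition to avoid head-of-line blocking and adjusts that limit as it observes throttling. This chapter uses 10 concurrent batches per partition as the working figure; the value is internal to the SDK and is not exposed as an option.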

Batch size: Default 100 operations per batch per partition. Batch size is not directly configurable in the SDK (there is no `CosmosClientOptions` property for it), but you can influence it by controlling the rate at which you submit operations.

Payload limit: 2 MB per batch request. If the accumulated payload exceeds 2 MB, the SDK flushes even if the operation count is below 100.

Timeout: The SDK flushes after 1 second of inactivity (no new operations added to the buffer). This ensures that low-volume streams still make progress.

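Retry policy: The SDK retries on 429 (rate limiting), waiting for the interval the service returns with the throttled response before each attempt. Default MaxRetryAttemptsOnRateLimitedRequests is 9, and the default MaxRetryWaitTimeOnRateLimitedRequests (the cumulative wait across all attempts) is 30 seconds.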
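The two retry settings named above are ordinary properties on `CosmosClientOptions`. A minimal sketch, assuming a long-running ingestion job that would rather wait longer than surface a 429 to the caller; `RetryOptionsSketch` is a hypothetical helper, and the values 20 and 60 seconds are illustrative, not recommendations.

```csharp
using System;
using Microsoft.Azure.Cosmos;

public static class RetryOptionsSketch
{
    // Builds client options with bulk mode on and a larger 429 retry budget.
    public static CosmosClientOptions BuildOptions() => new CosmosClientOptions
    {
        AllowBulkExecution = true,

        // SDK default is 9 attempts; 20 is an illustrative value.
        MaxRetryAttemptsOnRateLimitedRequests = 20,

        // SDK default is 30 seconds of cumulative wait; 60 is an illustrative value.
        MaxRetryWaitTimeOnRateLimitedRequests = TimeSpan.FromSeconds(60)
    };
}
```

Raising the budget hides more throttling from the application, at the price of longer worst-case latency per operation. If 429s are frequent, the usual fix is more throughput or a slower producer, not a bigger retry budget.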

Indexing Policies: What They Are and Why They Matter

Cosmos DB automatically indexes every property of every document by default. This enables rich queries without manual index management, but it comes at a cost: every write operation must update the index, consuming additional RUs and increasing write latency. Indexing policies let you control which property paths are indexed and which index types apply (range, spatial, composite). For bulk write scenarios, you may want to disable indexing temporarily to maximize throughput.

Indexing Policy Components:

- `automatic`: Boolean. Default true. When true, items are indexed as they are written. When false, items are not indexed automatically on write, so queries that rely on the index may return stale or no results.
- `indexingMode`: consistent (default), none, or lazy (deprecated, not recommended). consistent updates the index synchronously on every write. none disables indexing on the container, which suits pure key-value access and temporary bulk loads. lazy updated the index asynchronously and could lead to inconsistent query results.
- `includedPaths`: Array of paths to include in the index. Default includes all paths (/*). You can restrict to specific paths like /name/?.
- `excludedPaths`: Array of paths to exclude from indexing. Useful for large binary or unused properties.
- `compositeIndexes`: For multi-property order-by queries. Not covered in detail here but important for performance.
- `spatialIndexes`: For geospatial queries.
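The policy components above map directly onto the `IndexingPolicy` type in the .NET SDK. The sketch below builds a policy that indexes only two paths and excludes everything else. The container name `telemetry`, the partition key path `/deviceId`, and the `/timestamp` path are assumptions chosen for illustration, and `IndexingPolicySketch` is a hypothetical helper.

```csharp
using Microsoft.Azure.Cosmos;

public static class IndexingPolicySketch
{
    // Builds container properties whose policy indexes only /deviceId and /timestamp.
    public static ContainerProperties BuildContainerProperties()
    {
        var policy = new IndexingPolicy
        {
            IndexingMode = IndexingMode.Consistent, // index updated synchronously on write
            Automatic = true
        };

        // Index only the two properties that queries filter on.
        policy.IncludedPaths.Add(new IncludedPath { Path = "/deviceId/?" });
        policy.IncludedPaths.Add(new IncludedPath { Path = "/timestamp/?" });

        // Exclude every other path to keep write cost down.
        policy.ExcludedPaths.Add(new ExcludedPath { Path = "/*" });

        var properties = new ContainerProperties(id: "telemetry", partitionKeyPath: "/deviceId");
        properties.IndexingPolicy = policy;
        return properties;
    }
}
```

The resulting properties can be passed to `Database.CreateContainerIfNotExistsAsync` when creating the container. Building the object makes no network call, so it is cheap to unit test the policy shape before deploying it.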

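Indexing and RU Consumption:

- Each indexed property adds to the write cost of an operation. This chapter uses roughly 2-10 RUs per property as a working figure; the real number depends on property size and index type.
- Excluding paths or disabling indexing can cut write RU consumption substantially (the chapter's estimate is 50-80%). Measure your own workload rather than relying on the estimate.
- Queries that filter on non-indexed properties fall back to a scan (across all partitions if no partition key is specified), which is expensive and slow.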

Configuration and Verification Commands

Enable Bulk Mode in .NET SDK v3:

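```csharp
using Microsoft.Azure.Cosmos;

// connectionString is supplied by the caller, for example from configuration.
CosmosClientOptions options = new CosmosClientOptions
{
    AllowBulkExecution = true // route concurrent point operations through bulk batching
};
CosmosClient client = new CosmosClient(connectionString, options);
```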
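Enabling the option is only half of the pattern: the SDK can batch only operations that are in flight at the same time. A minimal sketch of issuing writes concurrently and awaiting them together follows. The `Telemetry` type, its property names, and the `BulkInsertSketch` helper are hypothetical, and the container is assumed to be partitioned on `/deviceId`.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Newtonsoft.Json;

// Hypothetical document type; "id" is required by Cosmos DB.
public class Telemetry
{
    [JsonProperty("id")] public string Id { get; set; }
    [JsonProperty("deviceId")] public string DeviceId { get; set; }
    [JsonProperty("temperature")] public double Temperature { get; set; }
}

public static class BulkInsertSketch
{
    // Creates a client with bulk mode switched on.
    public static CosmosClient CreateBulkClient(string connectionString) =>
        new CosmosClient(connectionString, new CosmosClientOptions { AllowBulkExecution = true });

    // Starts one CreateItemAsync per document, awaits them together,
    // and returns the total request charge in RUs.
    public static async Task<double> InsertAllAsync(Container container, IReadOnlyList<Telemetry> items)
    {
        var tasks = new List<Task<ItemResponse<Telemetry>>>(items.Count);
        foreach (Telemetry item in items)
        {
            // Do not await here: awaiting each call in turn would leave
            // nothing for the SDK to group into a batch.
            tasks.Add(container.CreateItemAsync(item, new PartitionKey(item.DeviceId)));
        }

        ItemResponse<Telemetry>[] responses = await Task.WhenAll(tasks);

        double totalCharge = 0;
        foreach (ItemResponse<Telemetry> response in responses)
        {
            totalCharge += response.RequestCharge;
        }
        return totalCharge;
    }
}
```

`Task.WhenAll` throws if any operation fails, so this version suits loads where a failure should stop the run. For very large inputs, submit the items in bounded chunks so the number of pending tasks (and the memory they hold) stays under control.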

Set Indexing Policy via Azure CLI:

```bash
az cosmosdb sql container update \
    --resource-group myResourceGroup \
    --account-name myAccount \
    --database-name myDatabase \
    --name myContainer \
    --idx @indexing-policy.json
```

Example indexing-policy.json that excludes all property paths:

```json
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [],
  "excludedPaths": [
    {
      "path": "/*"
    }
  ]
}
```

Note: An empty includedPaths with excludedPaths set to /* keeps indexing active but indexes no user-defined properties. To switch indexing off entirely, set indexingMode to none together with automatic: false.

Monitor RU Consumption:

- Use Azure Monitor metrics: Total Request Units, Max Consumed RU/s per partition.
- In the .NET SDK, read `RequestCharge` on the `ItemResponse<T>` returned by each operation (or `Headers.RequestCharge` on a `ResponseMessage` from the stream APIs).

Interaction Between Bulk Operations and Indexing

When bulk mode is enabled and indexing is active, each operation in a batch still incurs indexing RU costs, because the index is updated for every document written. If indexing is disabled, the write RU cost drops significantly, but you lose the ability to query without a full scan. The exam often tests the trade-off: for migration or high-volume ingestion, temporarily disable indexing to increase throughput, then re-enable it and let the index rebuild afterward.
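The disable, load, re-enable sequence can be sketched with `ReadContainerAsync` and `ReplaceContainerAsync`. This is one possible shape, not a prescribed procedure: it assumes "disabled" means `IndexingMode.None` with `Automatic = false`, `IndexingToggleSketch` is a hypothetical helper, and `bulkLoad` is a caller-supplied delegate that performs the actual writes.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class IndexingToggleSketch
{
    // Policy that switches indexing off for the container.
    public static IndexingPolicy BuildDisabledPolicy() =>
        new IndexingPolicy { IndexingMode = IndexingMode.None, Automatic = false };

    // Turns indexing off, runs the load, then restores the original policy.
    public static async Task LoadWithIndexingOffAsync(Container container, Func<Task> bulkLoad)
    {
        ContainerProperties properties = (await container.ReadContainerAsync()).Resource;
        IndexingPolicy originalPolicy = properties.IndexingPolicy;

        properties.IndexingPolicy = BuildDisabledPolicy();
        await container.ReplaceContainerAsync(properties);

        try
        {
            await bulkLoad();
        }
        finally
        {
            // Restoring the policy starts an index transformation on the service;
            // the index is rebuilt in the background after this call returns.
            properties.IndexingPolicy = originalPolicy;
            await container.ReplaceContainerAsync(properties);
        }
    }
}
```

The `finally` block restores the policy even when the load fails, so the container is not left unindexed by accident. Queries issued before the transformation completes may still be slow or incomplete.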

Index Transformation: You can change the indexing policy on an existing container. This triggers an index transformation that consumes RUs and can take time. During transformation, writes and queries continue, but the index may be in an intermediate state. The SDK handles this transparently, but you should monitor progress via the Azure portal or CLI.
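Progress can also be read from the SDK. A minimal sketch, assuming the progress value is returned in the `x-ms-documentdb-collection-index-transformation-progress` response header when the container is read with `PopulateQuotaInfo = true`; `IndexProgressSketch` and its method names are hypothetical.

```csharp
using System.Globalization;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class IndexProgressSketch
{
    private const string ProgressHeader =
        "x-ms-documentdb-collection-index-transformation-progress";

    // Parses the header value, a percentage from 0 to 100.
    public static long ParseProgress(string headerValue) =>
        long.Parse(headerValue, CultureInfo.InvariantCulture);

    // Reads the container with quota info so the progress header is populated.
    public static async Task<long> ReadProgressAsync(Container container)
    {
        ContainerResponse response = await container.ReadContainerAsync(
            new ContainerRequestOptions { PopulateQuotaInfo = true });
        return ParseProgress(response.Headers[ProgressHeader]);
    }
}
```

A caller would typically poll `ReadProgressAsync` with a delay between reads and treat 100 as complete before routing production queries to the container.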

Common Trap Patterns on the Exam

Assuming bulk mode is always faster: Bulk mode reduces network overhead but does not reduce RU consumption per operation. In fact, if the client is not producing enough operations to fill batches, bulk mode may add latency due to buffering.

Confusing `AllowBulkExecution` with `BulkExecutor`: The older BulkExecutor library (v2 SDK) is deprecated. The v3 SDK's built-in bulk mode is the recommended approach.

Thinking indexing off means zero RU write cost: Even without indexing, writes consume RU for storage and replication. Indexing adds overhead, but the base write RU still applies.

Using `lazy` indexing mode: This mode is deprecated and should not be used. The exam expects you to know it exists, and that the modes to choose between are consistent (the default) and none.

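**Relying on automatic: false alone to disable indexing:** Setting automatic: false while leaving includedPaths as /* and indexingMode as consistent is not the documented way to switch indexing off. To disable indexing, set indexingMode to none together with automatic: false. To keep indexing on but index no user-defined properties, exclude all paths with excludedPaths: [{"path":"/*"}].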

Summary of Key Defaults

| Parameter | Default Value |
|-----------|---------------|
| AllowBulkExecution | false |
| Batch size per partition | 100 operations |
| Batch payload limit | 2 MB |
| Flush timeout | 1 second |
| Max concurrency per partition | 10 batches |
| Indexing automatic | true |
| Indexing indexingMode | consistent |
| Max retry attempts on 429 | 9 |
| Max retry wait time | 30 seconds |

Walk-Through

1. Enable Bulk Execution on Client

Create a `CosmosClient` instance with `AllowBulkExecution = true` in the `CosmosClientOptions`. This tells the SDK to use its internal batching mechanism; without it, each `CreateItemAsync` call sends a separate request. Bulk support requires .NET SDK v3.4.0 or later (other language SDKs have their own equivalents). With the option set, the SDK routes point operations through the bulk pipeline, so issue them concurrently rather than awaiting each one in turn.

2. Accumulate Operations in Buffer

As you call `CreateItemAsync`, `UpsertItemAsync`, etc., the SDK does not immediately send a request. Instead, it adds the operation to a concurrent dictionary keyed by physical partition key range. Each operation is serialized to JSON and stored in a memory buffer. The buffer is bounded by the number of operations and total size. If the buffer exceeds 100 operations for a partition or 2 MB total, the SDK triggers a flush. The client can continue to add operations while a flush is in progress; the SDK handles concurrency using a semaphore per partition.

3. Flush Batch to Cosmos DB Service

When a flush condition is met, the SDK constructs a batch request that carries each operation's type, partition key, and document payload. The request is sent to the Cosmos DB gateway (or directly to replicas in direct mode). The service processes the operations for that partition one by one: each write is checked for conflicts (e.g., duplicate IDs), indexed if indexing is enabled, and committed independently of the others. The service returns a batch response with individual status codes (e.g., 201 Created, 409 Conflict, 429 Throttled).

4. Handle Batch Response and Retries

The SDK parses the batch response and completes the `Task` for each operation with success or an exception. Operations that were throttled (HTTP 429) are retried after a delay, within the limits of the retry policy. If only some operations fail (e.g., 409 Conflict), those operations surface exceptions individually; the successful ones complete normally. The SDK also watches for throttling and adjusts its concurrency to avoid overwhelming the service.
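Because each operation completes or fails on its own task, a load that should continue past individual failures needs to capture the outcome per operation instead of letting `Task.WhenAll` throw on the first one. A minimal sketch follows; `OperationOutcome`, `BulkResultSketch`, and `CaptureAsync` are hypothetical names.

```csharp
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Outcome of one bulk operation, keyed by the document id.
public record OperationOutcome(string Id, bool Succeeded, HttpStatusCode StatusCode);

public static class BulkResultSketch
{
    // Awaits one operation and converts a CosmosException into a result value,
    // so a single 409 or exhausted 429 retry does not abort the whole load.
    public static async Task<OperationOutcome> CaptureAsync<T>(
        string id, Task<ItemResponse<T>> operation)
    {
        try
        {
            ItemResponse<T> response = await operation;
            return new OperationOutcome(id, true, response.StatusCode);
        }
        catch (CosmosException ex)
        {
            return new OperationOutcome(id, false, ex.StatusCode);
        }
    }
}
```

A caller wraps each write, for example `CaptureAsync(item.Id, container.CreateItemAsync(item, new PartitionKey(item.DeviceId)))`, collects the wrapped tasks, awaits them with `Task.WhenAll`, and then filters the outcomes where `Succeeded` is false to decide what to retry or report.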

5. Configure Indexing Policy for Bulk

Before starting a bulk ingestion, evaluate the indexing policy. For maximum write throughput, disable indexing by setting `indexingMode` to `none` together with `automatic` set to `false`. This prevents index updates on writes and can reduce RU consumption considerably (this chapter's estimate is up to 80%). After the bulk load, change the policy back to index the needed paths, which triggers an index transformation. The transformation runs asynchronously and can take minutes to hours depending on data volume. Until it completes, queries may be slow or may not return complete results.

What This Looks Like on the Job

Scenario 1: IoT Telemetry Ingestion A manufacturing company ingests sensor data from thousands of devices every second. Each device sends a JSON document with temperature, pressure, and vibration readings. They use Cosmos DB with bulk mode enabled. The ingestion service runs on Azure Functions with a Cosmos DB output binding. By enabling AllowBulkExecution, they achieve 50,000 writes per second across 10 physical partitions, consuming 100,000 RU/s. The indexing policy excludes all paths except /deviceId and /timestamp to reduce write RU cost. Queries are limited to recent data by device ID, which uses the indexed properties. The team monitors the TotalRequestUnits metric and scales the RU/s manually during high-load events. A common misconfiguration is forgetting to set AllowBulkExecution = true in the Functions host builder, resulting in individual requests and throttling at 10,000 RU/s.

Scenario 2: Database Migration A retail company migrates from MongoDB to Cosmos DB. They use the Cosmos DB Data Migration Tool (which internally uses bulk operations) to copy 10 TB of data. Before migration, they set the container's indexing policy to indexingMode: none with automatic: false. This reduces the write RU cost by 70%, allowing them to provision less throughput (e.g., 50,000 RU/s instead of 200,000). After migration, they restore indexingMode: consistent with automatic: true and include all paths (includedPaths: [{"path":"/*"}]). The index transformation takes 12 hours, during which queries are slow even though the data copy is complete. The team learns that they cannot change the partition key during migration, as that would require a new container. A pitfall is running production queries before the index has been rebuilt, which leads to full scans and high RU consumption.

Scenario 3: E-commerce Order Processing An e-commerce platform processes orders in batches during peak hours. Each order triggers multiple writes: order header, line items, and inventory updates. With bulk mode enabled, writes that share a partition key (e.g., customerId) and are issued concurrently tend to travel together, which can cut the request count from 10 per order to as few as 1 batch. The SDK decides the grouping, and the writes are not transactional; where an order must be written all-or-nothing, the team uses `TransactionalBatch` instead. The indexing policy includes all properties except orderDetails.internalNotes (excluded path) to save RU. They also use composite indexes for sorting orders by date and status. A frequent issue is hitting the batch payload limit when an order has many line items; the SDK automatically splits the operations into multiple batches for the same partition. The team monitors request counts and RU consumption in Azure Monitor to confirm that batching is having the expected effect.

How AZ-204 Actually Tests This

AZ-204 Objective 2.1: Develop solutions that use Cosmos DB storage specifically tests your ability to "implement bulk operations" and "configure indexing policies." The exam expects you to know:

1. Bulk mode is enabled via `AllowBulkExecution = true` in `CosmosClientOptions`. The older BulkExecutor class is deprecated and should not be used. A common wrong answer is selecting BulkExecutor or DocumentClient options.

2. Default batch size is 100 operations or 2 MB, whichever comes first. Many candidates mistakenly think the batch size is configurable directly; it is not. The SDK manages it internally. Another trap: thinking the batch size is based on a timer only (the 1-second timeout is a fallback).

3. Indexing policy can be changed at any time, but changes trigger an index transformation that consumes RUs. The exam may present a scenario where you need to maximize write throughput during a one-time data load. The correct answer is to disable indexing temporarily (indexingMode: none with automatic: false) and restore the policy afterward. The wrong answer is to set indexingMode: lazy (deprecated) or to delete and recreate the container (which is expensive and loses existing data).

4. Bulk operations do not reduce RU cost per operation. They reduce network overhead and improve throughput, but the total RU consumed per document is the same as individual writes (plus a small overhead for the batch). A common wrong answer is claiming bulk operations reduce RU cost.

5. The exam tests the trade-off between write performance and query performance. For example: "You need to ingest 1 million records as fast as possible. After ingestion, users will query by timestamp. What should you do?" Correct: disable indexing during ingestion, then enable and rebuild index. Wrong: keep indexing on, or use a different partition key.

6. Specific numbers to memorize:

- Batch size: 100 operations per partition.
- Payload limit: 2 MB.
- Flush timeout: 1 second.
- Max concurrency per partition: 10 batches.
- Default max retry attempts: 9.
- Default max retry wait time: 30 seconds.

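7. Edge cases:

- A conflict on one operation (for example, a duplicate ID on a create) returns 409 Conflict for that operation, and the SDK does not retry 409 errors. Bulk mode is not transactional, so other operations in the same batch can still succeed.
- Bulk mode batches point operations (such as Create, Upsert, Replace, Delete). It does not batch query operations.
- Bulk mode is a client-side feature and does not raise an account's throughput ceiling. On serverless accounts the per-container RU/s cap still applies, so sustained high-volume ingestion is usually better served by provisioned throughput.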

How to eliminate wrong answers:

- If the answer mentions BulkExecutor, it's wrong (deprecated).
- If the answer says bulk operations reduce RU cost, it's wrong.
- If the answer suggests using lazy indexing, it's wrong.
- If the answer claims you can change the partition key after creation, it's wrong.
- If the answer says you must use a stored procedure for bulk operations, it's wrong (stored procedures are for transactional batches within a partition, but they are not the same as SDK bulk mode).

Key Takeaways

Enable bulk mode by setting `AllowBulkExecution = true` in `CosmosClientOptions`.

Default batch size is 100 operations per partition or 2 MB payload, whichever comes first.

Bulk mode does not reduce RU cost per operation; it reduces network overhead.

To maximize write throughput during bulk ingestion, disable indexing by setting `indexingMode` to `none` together with `automatic: false`.

After bulk ingestion, re-enable indexing and allow the index transformation to complete.

The `lazy` indexing mode is deprecated and should not be used.

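Bulk mode does not raise throughput limits; on serverless accounts the per-container RU/s cap still applies, so heavy ingestion usually calls for provisioned throughput.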

The SDK retries throttled requests up to 9 times by default, waiting the interval the service indicates, up to 30 seconds of cumulative wait.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Individual Point Operations

Each operation sends a separate HTTP request.

Higher network overhead per operation.

Slower throughput due to per-request latency.

No buffering; operations are sent immediately.

Easier to debug with per-operation logs.

Bulk Operations (SDK Bulk Mode)

Multiple operations grouped into a single HTTP request.

Lower network overhead per operation.

Higher throughput due to reduced round trips.

Operations are buffered and sent in batches.

Harder to debug; individual failures are surfaced in batch response.

Indexing Enabled (Default)

All properties indexed automatically.

Higher write RU consumption (2-10 RU per property).

Queries are fast and efficient.

Index is always up-to-date.

Best for mixed read/write workloads.

Indexing Disabled (Temporary)

No properties indexed.

Lower write RU consumption (up to 80% reduction).

Queries require full scan (slow and expensive).

Index must be rebuilt after re-enabling.

Best for bulk ingestion with no immediate query needs.

Watch Out for These

Mistake

Bulk operations reduce the total RU cost per document.

Correct

Bulk operations do not reduce RU consumption per operation. Each document write still consumes the same number of RUs as an individual write. The benefit is reduced network overhead and improved throughput by batching multiple operations into one request. The total RU cost for a set of documents is the same whether sent individually or in bulk.

Mistake

You can configure the batch size in CosmosClientOptions.

Correct

The batch size is not directly configurable. The SDK uses internal thresholds: 100 operations per partition or 2 MB payload, whichever is reached first. You can influence batching by controlling the rate at which you submit operations, but you cannot set a custom batch size via the SDK options.

Mistake

Setting indexingMode to 'lazy' is a valid way to improve write performance.

Correct

The 'lazy' indexing mode is deprecated and not recommended. It provided asynchronous indexing that could lead to inconsistent query results. Use 'consistent', which updates the index synchronously on every write, or 'none' when the container should not be indexed at all.

Mistake

Bulk mode lets a workload exceed the throughput available to the account, including on serverless.

Correct

Bulk mode is a client-side batching feature; it does not change how much throughput the service will accept. Serverless containers have a capped maximum RU/s, so a bulk workload that exceeds the cap is throttled with 429 responses just as individual writes would be. For sustained high-volume ingestion, provisioned throughput (manual or autoscale) is usually the better fit.

Mistake

Disabling indexing completely eliminates RU consumption for writes.

Correct

Even with indexing disabled, writes still consume RUs for storage, replication, and consistency. Indexing adds overhead (typically 2-10 RU per property), but the base write RU cost remains. Disabling indexing can reduce RU consumption by up to 80%, not 100%.


Frequently Asked Questions

How do I enable bulk operations in the .NET SDK v3 for Cosmos DB?

Set `AllowBulkExecution = true` in `CosmosClientOptions` when creating the `CosmosClient` instance. For example: `var client = new CosmosClient(connectionString, new CosmosClientOptions { AllowBulkExecution = true });`. Then call `CreateItemAsync`, `UpsertItemAsync`, etc., concurrently, and the SDK batches the operations automatically. Ensure you are using SDK version 3.4.0 or later.

Can I use bulk operations with any Cosmos DB API (SQL, MongoDB, Cassandra, etc.)?

Bulk mode as described here is a feature of the SQL API SDKs (for example, .NET and Java). For MongoDB API, you can use the MongoDB driver's bulk write operations (e.g., `InsertMany`), which are similar but not the same as Cosmos DB SDK bulk mode. For Cassandra and Gremlin APIs, bulk operations are handled differently and are not part of the SQL SDK bulk mode.

What happens if a batch operation fails due to a conflict (409)?

A conflict (for example, a duplicate ID on a create) fails only the operation that caused it: that operation's task throws a `CosmosException` with status 409 Conflict, while other operations in the same batch can still succeed. The SDK does not retry 409 errors because they indicate a client-side issue. Handle the conflict by catching the `CosmosException` and taking appropriate action (e.g., using upsert instead of create).

Does bulk mode work with the Change Feed?

Bulk mode is for write operations only. The Change Feed is a read-only feature that captures inserts and updates. You can use bulk mode to write data that will later appear in the Change Feed. The Change Feed processor reads changes individually, not in bulk.

How do I monitor the performance of bulk operations?

Use Azure Monitor metrics like `Total Request Units` and `Max Consumed RU/s per partition`. In the SDK, read `RequestCharge` on each `ItemResponse<T>` to see the RU cost of every operation, and inspect the response `Diagnostics` when latency needs investigation.

What is the difference between bulk mode and stored procedures for batch operations?

Stored procedures run server-side and can perform multiple operations within a single transaction (all or nothing) on the same partition. Bulk mode batches operations client-side and sends them as a single request, but the batch is not transactional: each operation succeeds or fails on its own. Bulk mode is simpler and does not require writing JavaScript, while stored procedures offer transactional guarantees but are limited to a single partition.

Can I use bulk mode with autoscale provisioned throughput?

Yes, bulk mode works with both manual and autoscale provisioned throughput. However, autoscale may limit the maximum RU/s, so if you exceed that, you will get 429 throttling. Bulk mode's retry policy will handle this, but throughput may be lower than with manual throughput set to a high value.
