DP-900 | Chapter 62 of 101 | Objective 3.1

Azure Synapse Link for Cosmos DB

This chapter covers Azure Synapse Link for Azure Cosmos DB, a cloud-native hybrid transactional/analytical processing (HTAP) capability that enables near-real-time analytics on operational data without impacting transaction performance. For the DP-900 exam, understanding Synapse Link is critical because it appears in multiple questions on analytics workloads, typically representing 5-10% of the exam. You will need to know its architecture, benefits, and how it differs from traditional ETL/ELT approaches.

25 min read
Intermediate
Updated May 31, 2026

Real-Time Data Pipeline for Analytics

Imagine a large retail warehouse with two separate departments: the sales floor (OLTP) and the analytics office (OLAP). The sales floor processes customer purchases in real time, updating inventory counts and receipts instantly. The analytics office needs to generate daily sales reports, but traditionally, they had to wait until the end of the day when the sales floor closed, then manually carry all the day's receipts over to the analytics office (ETL batch process). This was slow, and the reports were always a day behind.

Now, consider a new system: a conveyor belt (Azure Synapse Link) that runs directly from the sales floor to the analytics office. Every time a cashier rings up a sale, the receipt is automatically placed on the conveyor belt and arrives in the analytics office within seconds. The analytics office can then immediately update its dashboards and reports without waiting for the store to close.

The key mechanic is that the conveyor belt only carries the changes (deltas), not the entire inventory each time. It uses a special sensor (Change Feed) that detects every new or modified receipt and sends it along the belt. The analytics office stores the incoming data in a special format (columnar storage) optimized for fast queries, not for individual transaction lookups. This way, the sales floor continues to operate at high speed for customer transactions, while the analytics office gets near-real-time data without interfering with sales operations.

How It Actually Works

What is Azure Synapse Link for Cosmos DB?

Azure Synapse Link for Cosmos DB is a cloud-native HTAP (Hybrid Transactional/Analytical Processing) feature that integrates Azure Cosmos DB (the operational database) with Azure Synapse Analytics (the analytics service). It lets you run near-real-time analytics over your Cosmos DB data without affecting the transactional workload. The key point is that analytical queries run against the analytical store, a separate columnar copy of the data that Cosmos DB maintains, rather than against the transactional store. This removes the need for complex ETL pipelines and reduces data latency from hours to minutes.

Why Synapse Link Exists

Traditional architectures separate transactional and analytical systems. Data is extracted from the operational database (e.g., Cosmos DB), transformed, and loaded into a data warehouse (e.g., Azure Synapse dedicated SQL pool) in batch processes. This approach has several drawbacks:

Latency: Data can be hours or even days old in the analytics system.

Complexity: ETL pipelines require maintenance and can break.

Cost: Duplicate storage and compute resources are needed.

Impact: Running analytics directly on the transactional store can degrade OLTP performance.

Synapse Link addresses these problems by providing a fully managed, near-real-time analytical store that is automatically synchronized with the transactional store. The analytical store is column-oriented, which suits fast scan queries, while the transactional store remains row-oriented for low-latency reads and writes.
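To make the row-oriented versus columnar distinction concrete, here is a minimal sketch in plain Python. It is only a toy model of the two layouts, not how Cosmos DB stores data internally; the sample documents and the `to_columns` helper are made up for illustration, and the helper assumes every document has the same fields.

```python
# Toy data: three order documents, as a row-oriented store would hold them.
rows = [
    {"id": "1", "category": "books", "amount": 12.0},
    {"id": "2", "category": "games", "amount": 40.0},
    {"id": "3", "category": "books", "amount": 8.0},
]


def to_columns(documents):
    """Pivot a list of documents into one list per field (a columnar layout)."""
    columns = {}
    for doc in documents:
        for field_name, value in doc.items():
            columns.setdefault(field_name, []).append(value)
    return columns


columns = to_columns(rows)

# Point lookup: natural on rows, because one document comes back whole.
order_2 = next(doc for doc in rows if doc["id"] == "2")

# Aggregate scan: only the "amount" column needs to be read.
total_amount = sum(columns["amount"])
```

The point lookup touches a single document, while the aggregate reads one column and ignores the rest. That is the trade-off the two stores are each tuned for.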

How It Works Internally

Azure Synapse Link relies on two key components:

1. Cosmos DB Analytical Store: A columnar representation of the data in Cosmos DB, automatically maintained by the system. When you enable Synapse Link on a Cosmos DB container, the analytical store is created and kept in sync with the transactional store using the Change Feed.

2. Azure Synapse Apache Spark Runtime: A serverless Apache Spark pool in Azure Synapse that reads from the analytical store through a built-in connector. Serverless SQL pools in Synapse can query the analytical store as well.

The synchronization process works as follows:

Every write to the Cosmos DB transactional store (insert, update, delete) generates a change feed event.

The analytical store processes these events asynchronously and applies the changes to the columnar store.

The latency between a write to the transactional store and its appearance in the analytical store is typically under 2 minutes (often 10-30 seconds).

The analytical store stores data in a columnar format (Parquet-like) that is optimized for analytical queries (aggregations, projections, filters).
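The synchronization steps above can be sketched as a small simulation. This is a hypothetical in-memory model, assuming a simple "upsert"/"delete" event shape; `ChangeEvent` and `ToyAnalyticalStore` are illustrative names, not part of any Azure SDK, and the real service does this work internally.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeEvent:
    """One change captured from the transactional store (assumed shape)."""
    op: str                     # "upsert" or "delete"
    doc_id: str
    doc: dict = field(default_factory=dict)


class ToyAnalyticalStore:
    """Hypothetical stand-in for the columnar copy of a container."""

    def __init__(self):
        self._docs = {}         # latest version of each document, keyed by id

    def apply(self, event):
        if event.op == "upsert":
            # The whole document is replaced; there is no partial update.
            self._docs[event.doc_id] = dict(event.doc)
        elif event.op == "delete":
            self._docs.pop(event.doc_id, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")

    def column(self, name):
        """Read one field across all documents, as a scan query would."""
        return [doc.get(name) for doc in self._docs.values()]


# Writes happen first; the events are applied afterwards, in order.
pending = [
    ChangeEvent("upsert", "1", {"id": "1", "qty": 2}),
    ChangeEvent("upsert", "2", {"id": "2", "qty": 5}),
    ChangeEvent("upsert", "1", {"id": "1", "qty": 3}),
    ChangeEvent("delete", "2"),
]

store = ToyAnalyticalStore()
for event in pending:
    store.apply(event)

quantities = store.column("qty")
```

Because the events are applied after the writes rather than as part of them, a reader of the toy store can briefly see older data. This is the eventual consistency the chapter describes.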

Key Components and Defaults

Cosmos DB API: Synapse Link is supported for SQL (Core) API and MongoDB API. For MongoDB, the analytical store stores documents in a relational-like schema.

Analytical Storage Time-to-Live (TTL): You can set a TTL, in seconds, for the analytical store. A value of -1 means no expiration, which is the value the portal applies when you turn analytical store on. This is independent of the transactional TTL.

Throughput: The analytical store does not consume Request Units (RUs). It uses separate, automatically managed throughput resources.

Data Consistency: The analytical store is eventually consistent with the transactional store. The default consistency level for Synapse Link reads is "eventual".

Supported Regions: Available in all Azure regions where both Cosmos DB and Azure Synapse are present.
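The analytical TTL setting can be read as a simple retention rule. The sketch below encodes that rule under stated assumptions: -1 means no expiration (as described above), a positive value is a number of seconds counted from the item's last modification, and a missing or zero value is treated as "analytical store not enabled". The last two points reflect my understanding of the setting rather than anything stated in this chapter, and the function name is hypothetical.

```python
def retained_in_analytical_store(analytical_ttl, seconds_since_last_modified):
    """Return True if an item would still be kept under the given analytical TTL.

    Assumed semantics (check against current Azure documentation):
      None or 0 -> analytical store is not enabled, so nothing is retained
      -1        -> no expiration
      n > 0     -> item expires n seconds after it was last modified
    """
    if analytical_ttl is None or analytical_ttl == 0:
        return False
    if analytical_ttl == -1:
        return True
    if analytical_ttl < -1:
        raise ValueError("analytical TTL must be -1, 0, or a positive number of seconds")
    return seconds_since_last_modified < analytical_ttl


one_hour = 3600
kept_forever = retained_in_analytical_store(-1, 10**9)
kept_recent = retained_in_analytical_store(one_hour, 60)
expired_old = retained_in_analytical_store(one_hour, 2 * one_hour)
```

Since the analytical TTL is independent of the transactional TTL, the same item can expire from one store while remaining in the other.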

Configuration and Verification

To enable Synapse Link on an existing Cosmos DB account:

1. In the Azure portal, navigate to your Cosmos DB account.

2. Under "Features", enable "Azure Synapse Link".

3. Create a new container with analytical store enabled, or enable it on an existing container by setting the container's analytical storage TTL.

Using Azure CLI:

# Enable Synapse Link at account level
az cosmosdb update --name mycosmosdb --resource-group myrg --enable-analytical-storage true

# Create a container with analytical store enabled
az cosmosdb sql container create --account-name mycosmosdb --resource-group myrg --database-name mydb --name mycontainer --partition-key-path /id --analytical-storage-ttl -1
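For the verification half of this section, one option is to inspect the container definition after running the commands above. The sketch below parses JSON of the kind that `az cosmosdb sql container show` prints. The sample payload is hand-written, and the `resource.analyticalStorageTtl` path is an assumption about the output shape that you should confirm against your own CLI output.

```python
import json

# Hand-written sample; the real output contains many more fields.
sample_output = """
{
  "name": "mycontainer",
  "resource": {
    "id": "mycontainer",
    "analyticalStorageTtl": -1
  }
}
"""


def analytical_ttl_from_show_output(raw_json):
    """Extract the analytical TTL from container JSON, or None if absent."""
    container = json.loads(raw_json)
    return container.get("resource", {}).get("analyticalStorageTtl")


def analytical_store_enabled(raw_json):
    """Treat a missing or zero TTL as 'not enabled' (assumed semantics)."""
    ttl = analytical_ttl_from_show_output(raw_json)
    return ttl is not None and ttl != 0


ttl = analytical_ttl_from_show_output(sample_output)
enabled = analytical_store_enabled(sample_output)
```

In practice the JSON would come from the CLI rather than from a string literal, for example by capturing the command's standard output.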

In Azure Synapse, you can then create a linked service to Cosmos DB and load data into a Spark DataFrame:

# Read from Cosmos DB analytical store using Spark
cosmos_df = spark.read.format("cosmos.olap").option("spark.synapse.linkedService", "CosmosDbLinkedService").option("spark.cosmos.container", "mycontainer").load()
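When several notebooks read from the same analytical store, it can help to build the reader options in one place. The helper below is a small sketch: the function name is hypothetical, and the two option keys are the ones used in the snippet above. It only builds a dictionary, so it runs without a Spark session.

```python
def analytical_store_options(linked_service, container):
    """Build the reader options used with the cosmos.olap format."""
    if not linked_service or not container:
        raise ValueError("linked_service and container are both required")
    return {
        "spark.synapse.linkedService": linked_service,
        "spark.cosmos.container": container,
    }


options = analytical_store_options("CosmosDbLinkedService", "mycontainer")

# In a Synapse notebook, where `spark` already exists, the dict can be passed as:
#   cosmos_df = spark.read.format("cosmos.olap").options(**options).load()
```

Keeping the linked service and container names in one function makes it harder to mistype an option key, which would otherwise only surface as an error when the read runs.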

Interaction with Related Technologies

Azure Synapse Pipelines: You can use Synapse pipelines to orchestrate data movement from Cosmos DB analytical store to other destinations like Azure Data Lake Storage or dedicated SQL pool.

Power BI: Reports can use DirectQuery against views defined in a Synapse serverless SQL pool over the analytical store, enabling near-real-time dashboards.

Azure Machine Learning: You can train ML models using data from the analytical store without impacting production.

Cosmos DB Change Feed: The foundation of Synapse Link; it captures all changes in the transactional store.

Performance Considerations

Latency: Typically under 2 minutes from write to analytical store availability. For updates, the entire document is replaced in the analytical store (no partial updates).

Storage Costs: Analytical storage costs are separate from transactional storage. They are billed per GB/month, typically lower than transactional storage.

Query Performance: Analytical queries on the columnar store are much faster than on the transactional store for aggregations and scans, but they are not as low-latency as transactional point reads.

Limitations:

Analytical store does not support spatial queries or user-defined functions (UDFs).

Only SQL and MongoDB APIs are supported (not Cassandra, Gremlin, or Table).

Analytical store is read-only from Synapse; writes must go to the transactional store.

Exam-Relevant Details

Synapse Link is a no-ETL solution: data is automatically replicated to the analytical store.

It enables HTAP scenarios: run transactional and analytical workloads on the same data.

The analytical store is columnar, optimized for analytical queries.

It uses Change Feed to sync changes.

Latency is near-real-time (seconds to a few minutes).

It does not consume RUs for analytical reads.

You can set analytical TTL independent of transactional TTL.

Supported APIs: SQL (Core) and MongoDB.

Walk-Through

1

Enable Synapse Link on Cosmos DB

In the Azure portal, navigate to your Cosmos DB account. Under the 'Features' blade, locate 'Azure Synapse Link' and enable it. This action enables the analytical store capability at the account level. It may take a few minutes to propagate. You can also use Azure CLI: `az cosmosdb update --name <account> --resource-group <resource-group> --enable-analytical-storage true`. This step does not affect existing containers; you must enable analytical store on each container individually.

2

Create container with analytical store

When creating a new container, set the 'Analytical store' option to 'On'. You can also configure the analytical storage TTL (time-to-live) in seconds. A value of -1 means no expiration. For existing containers, you can enable analytical store using Azure CLI: `az cosmosdb sql container update --account-name <account> --resource-group <resource-group> --database-name <database> --name <container> --analytical-storage-ttl -1`. The analytical store will start populating from the existing data and then sync changes via Change Feed.

3

Set up Azure Synapse workspace

Create or use an existing Azure Synapse Analytics workspace. Ensure it is in the same region as your Cosmos DB account to minimize latency. In Synapse Studio, create a linked service to Cosmos DB using the 'CosmosDB (SQL API)' connector. Provide the Cosmos DB account endpoint, database name, and authentication key. This linked service will be used to access the analytical store.

4

Load data into Spark DataFrame

In a Synapse notebook (Spark pool), use the `cosmos.olap` format to read from the analytical store. Example: `df = spark.read.format("cosmos.olap").option("spark.synapse.linkedService", "CosmosDbLinkedService").option("spark.cosmos.container", "mycontainer").load()`. This reads the columnar data directly, without going through the transactional store. You can then apply transformations, aggregations, and write to other destinations.

5

Query and analyze with T-SQL or Power BI

You can also use serverless SQL pool in Synapse to query the analytical store via T-SQL. Query the container with OPENROWSET, and optionally wrap the query in a view so that reporting tools can reuse it. CREATE EXTERNAL TABLE AS SELECT (CETAS) can export query results to storage if you need a persisted copy. Example: `SELECT TOP 10 * FROM OPENROWSET('CosmosDB', 'Account=myaccount;Database=mydb;Key=mykey', mycontainer) AS documents`. For Power BI, connect to the serverless SQL pool endpoint in DirectQuery mode and build dashboards on those views. This avoids duplicating data and keeps dashboards near-real-time.
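If you generate the serverless SQL query from a script, a small builder keeps the connection string and container name consistent. This is a sketch only: the function name is hypothetical, the account, database, and key values are the placeholders from the example above, and the row-set alias is included because, as far as I know, T-SQL expects one after OPENROWSET. In real use the key should come from a secret store, not from source code.

```python
def cosmos_openrowset_query(account, database, key, container,
                            alias="documents", top=10):
    """Build a T-SQL query that reads a container through OPENROWSET."""
    # Container and alias are spliced in as identifiers, so keep them simple.
    for name in (container, alias):
        if not name.isidentifier():
            raise ValueError(f"not a simple identifier: {name!r}")
    connection = f"Account={account};Database={database};Key={key}"
    escaped = connection.replace("'", "''")   # escape quotes for the SQL literal
    return (
        f"SELECT TOP {int(top)} *\n"
        f"FROM OPENROWSET('CosmosDB', '{escaped}', {container}) AS {alias}"
    )


query = cosmos_openrowset_query("myaccount", "mydb", "mykey", "mycontainer")
```

The resulting string can be pasted into Synapse Studio or sent through any SQL client connected to the serverless SQL pool endpoint.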

What This Looks Like on the Job

Enterprise Scenario 1: E-commerce Real-Time Dashboard

A large online retailer uses Cosmos DB to store customer orders, inventory, and user sessions. The marketing team needs a real-time dashboard showing current sales by product category, region, and channel. Previously, they ran hourly ETL jobs to a dedicated SQL pool, causing a 60-minute delay. With Synapse Link, they enable analytical store on the orders container. In Azure Synapse, they create a Spark notebook that reads from the analytical store every minute, aggregates sales data, and writes results to a Power BI dataset. The dashboard now updates within 2 minutes of an order being placed. The key consideration is that the analytical store does not consume RUs, so the transactional workload remains unaffected. However, they had to ensure the analytical TTL was set to -1 to retain historical data. A misconfiguration could cause data to expire prematurely, leading to incomplete reports.

Enterprise Scenario 2: IoT Device Telemetry Analysis

A manufacturing company collects sensor data from thousands of IoT devices into Cosmos DB using the MongoDB API. They need to run real-time anomaly detection on temperature and vibration metrics. Previously, they streamed data into Azure Stream Analytics, which added complexity and cost. With Synapse Link, they enable analytical store on the telemetry container. In Synapse, they use a serverless SQL pool to query the analytical store with T-SQL, joining with reference data stored in another container. They built a Power BI report that alerts when average temperature exceeds thresholds. The analytical store's columnar format makes aggregate queries (AVG, MAX, MIN) extremely fast. They discovered that the analytical store does not support MongoDB-specific queries like $lookup or $unwind, so they had to flatten the data model. Also, they set an analytical TTL of 7 days to limit storage costs, as old telemetry is moved to cold storage.
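Because the analytical TTL is expressed in seconds, the 7-day retention in this scenario has to be converted before it is applied. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
from datetime import timedelta


def ttl_seconds(days):
    """Convert a retention period in days to a TTL value in seconds."""
    return int(timedelta(days=days).total_seconds())


seven_day_ttl = ttl_seconds(7)   # 7 * 24 * 60 * 60 = 604800
```

The resulting number is what would be passed as the analytical storage TTL for the telemetry container.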

Common Pitfalls

Not enabling Synapse Link at account level: Many users forget to enable the feature on the Cosmos DB account before creating containers. The portal may not show the option if not enabled.

Incorrect linked service configuration: Using the wrong API (e.g., using MongoDB connector for SQL API) leads to errors. Ensure the linked service matches the Cosmos DB API type.

Ignoring analytical TTL: If analytical TTL is set too low, historical data disappears from the analytical store, causing gaps in reports. Default is -1 (no expiration), but users often change it without understanding the impact.

Assuming real-time (sub-second) latency: Synapse Link is near-real-time (seconds to minutes). For sub-second analytics, consider Azure Stream Analytics or change feed triggers.

How DP-900 Actually Tests This

What DP-900 Tests

Objective 3.1: "Describe analytics workloads" includes understanding HTAP and the role of Azure Synapse Link for Cosmos DB. Specific sub-objectives:

Identify scenarios where Synapse Link is appropriate (e.g., real-time dashboards, no-ETL integration).

Understand that Synapse Link uses the analytical store, not the transactional store, for analytics.

Know that it supports SQL and MongoDB APIs only.

Recognize that it does not consume RUs for analytical queries.

Common Wrong Answers

1.

"Synapse Link runs analytics directly on the transactional store." This is wrong because it uses a separate analytical store to avoid impacting OLTP performance. Candidates often confuse it with running queries directly against Cosmos DB.

2.

"Synapse Link requires ETL pipelines to move data." The key selling point is no-ETL; data is automatically synced. Candidates may associate Synapse with traditional data movement.

3.

"Synapse Link supports all Cosmos DB APIs." It only supports SQL and MongoDB. Cassandra, Gremlin, and Table APIs are not supported. The exam may list these as distractors.

4.

"Analytical store consumes RUs." Analytical reads are free from RU consumption. Only transactional operations consume RUs. This is a common trick question.

Specific Numbers and Terms

Latency: "Near-real-time" or "under 2 minutes". The exam may ask about expected delay.

APIs: SQL (Core) and MongoDB.

Feature name: "Azure Synapse Link for Cosmos DB" (not "Synapse Analytics Link" or "Cosmos DB Synapse").

Analytical store TTL: Default is -1 (no expiration).

Consistency level for analytical reads: Eventual.

Edge Cases

If you disable Synapse Link, the analytical store is deleted and must be rebuilt from scratch if re-enabled.

Updates to a document in the transactional store replace the entire document in the analytical store (no partial updates).

The analytical store does not support transactional guarantees; it is eventually consistent.

How to Eliminate Wrong Answers

If an answer mentions "ETL" or "data movement" as a requirement, it is likely wrong because Synapse Link is no-ETL.

If an answer implies sub-second latency, it is wrong; the exam uses "near-real-time".

If an answer says all APIs are supported, it is wrong; only SQL and MongoDB.

If an answer says analytical queries consume RUs, it is wrong.

Key Takeaways

Azure Synapse Link for Cosmos DB enables HTAP by using an analytical store that is automatically synced via Change Feed.

The analytical store is columnar and does not consume RUs for analytical queries.

Only SQL (Core) and MongoDB APIs support Synapse Link.

Data synchronization latency is near-real-time, typically under 2 minutes.

Analytical store TTL is independent of transactional TTL; default is -1 (no expiration).

Synapse Link is a no-ETL solution; no data movement pipelines are required.

Power BI DirectQuery is supported through Synapse serverless SQL pool views over the analytical store, for near-real-time dashboards.

Analytical store consistency is eventual; it is not transactional.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Azure Synapse Link for Cosmos DB

No ETL pipeline required; data syncs automatically via Change Feed

Near-real-time latency (seconds to minutes)

Analytical store is columnar, optimized for analytics

Does not consume RUs for analytical reads

Supports only SQL and MongoDB APIs

Traditional ETL with Azure Data Factory

Requires building and maintaining ETL pipelines

Batch latency (hours to days)

Data is copied to separate storage (e.g., Data Lake, SQL DW)

Consumes RUs if reading from transactional store

Works with any Cosmos DB API (but requires custom pipeline)

Watch Out for These

Mistake

Azure Synapse Link for Cosmos DB runs analytics directly on the transactional store.

Correct

It uses a separate analytical store (columnar) that is automatically synced via Change Feed, so analytics do not impact transactional performance.

Mistake

Synapse Link is an ETL tool that moves data from Cosmos DB to Synapse.

Correct

It is a no-ETL solution; data remains in Cosmos DB's analytical store and is accessed directly by Synapse without copying.

Mistake

Synapse Link supports all Cosmos DB APIs including Cassandra, Gremlin, and Table.

Correct

It only supports SQL (Core) API and MongoDB API. Other APIs are not supported for analytical store.

Mistake

Analytical queries via Synapse Link consume Cosmos DB Request Units (RUs).

Correct

The analytical store has its own throughput and does not consume RUs from the transactional store. RUs are only for transactional operations.

Mistake

Synapse Link provides real-time (sub-second) data synchronization.

Correct

It is near-real-time: changes typically reach the analytical store within seconds to about 2 minutes. It is not sub-second.


Frequently Asked Questions

What is Azure Synapse Link for Cosmos DB?

Azure Synapse Link for Cosmos DB is a cloud-native HTAP feature that enables near-real-time analytics on operational data in Cosmos DB without impacting transactional performance. It automatically replicates data from the transactional store to a columnar analytical store using Change Feed. You can then query the analytical store using Azure Synapse Apache Spark or serverless SQL pools.

How does Synapse Link differ from traditional ETL?

Traditional ETL requires building pipelines to extract data from Cosmos DB, transform it, and load it into a separate analytics store. Synapse Link eliminates this by automatically syncing a columnar analytical store within Cosmos DB. This reduces latency from hours to minutes, simplifies architecture, and avoids additional compute costs for data movement.

Which Cosmos DB APIs are supported by Synapse Link?

Only the SQL (Core) API and MongoDB API are supported. Other APIs like Cassandra, Gremlin, and Table are not supported. You must use the appropriate connector in Azure Synapse to access the analytical store.

Does Synapse Link consume RUs?

No, analytical reads from the analytical store do not consume Request Units (RUs). The analytical store uses its own throughput resources, separate from the transactional store. Only transactional operations (inserts, updates, deletes) consume RUs.

What is the latency of Synapse Link?

The latency between a write to the transactional store and its availability in the analytical store is typically under 2 minutes, often 10-30 seconds. It is near-real-time, not sub-second. For real-time sub-second requirements, consider using Change Feed with Azure Functions or Azure Stream Analytics.

Can I use Power BI with Synapse Link?

Yes. Power BI can use DirectQuery mode against views in a Synapse serverless SQL pool that read the Cosmos DB analytical store. This allows near-real-time dashboards without importing data into Power BI. Ensure analytical store is enabled on the container and that Power BI connects to the serverless SQL pool endpoint rather than to the transactional store.

What is analytical storage TTL?

Analytical storage TTL (time-to-live) controls how long data is retained in the analytical store. It is independent of the transactional TTL. The default is -1 (no expiration). You can set it in seconds to automatically delete old data from the analytical store, reducing storage costs.

