AZ-900Chapter 93 of 127Objective 2.4

Azure Synapse Analytics

This chapter covers Azure Synapse Analytics, a limitless analytics service that brings together big data and data warehousing. For AZ-900, understanding Synapse helps you identify how Azure supports large-scale data integration, exploration, and analysis. This objective (2.4) focuses on core Azure services, and Synapse is a key example of a modern cloud analytics platform. You'll learn its components, how it differs from traditional data warehouses, and the exam's focus points.

25 min read

Intermediate

Updated Jul 20, 2026

Reviewed by Johnson Ajibi· Senior Network & Security Engineer · MSc IT Security

Jump to a section

Explain it to me simply Where people get tripped up Look up key terms

Azure Synapse Analytics as a Data Factory

A large manufacturing company has dozens of suppliers (data sources) delivering raw materials. The company has a massive warehouse where they store all the raw materials (data lake), but the warehouse is chaotic—materials are piled randomly, and it's hard to find what you need. To turn these materials into finished products (insights), they need a factory with multiple specialized departments. Azure Synapse Analytics is that factory. It has a receiving dock (data ingestion) where raw data comes in from various sources (databases, files, streams). Then, a sorting and cleaning department (data preparation and transformation) organizes the materials, removing defects and standardizing formats. Next, the factory has a powerful assembly line (analytics engine) that can quickly combine materials into complex products (queries and reports). The factory also has a quality control lab (machine learning integration) that can test products and predict future demand. Finally, the finished goods are stored in a showroom (data warehousing) for customers (business analysts) to view and purchase (query and visualize). The factory manager (Azure Synapse Studio) oversees the entire operation from a single control room, adjusting workflows and monitoring performance. This unified approach means the company doesn't need separate buildings for each step—everything is under one roof, reducing delays and errors. In the same way, Synapse integrates big data and data warehousing into one service, so you don't have to stitch together separate tools for storage, processing, and analytics.

How It Actually Works

What is Azure Synapse Analytics?

Azure Synapse Analytics is a unified analytics platform that combines enterprise data warehousing, big data analytics, data integration, and machine learning. It allows you to ingest, prepare, manage, and serve data for immediate BI and machine learning needs. The service replaces the need for separate tools like SQL Server, Spark clusters, and data factories by integrating them into a single environment.

The Business Problem It Solves

Traditional data analytics often involves silos: a data warehouse for structured data, a Hadoop cluster for unstructured data, and separate ETL tools. This leads to complexity, data duplication, and slow insights. Synapse solves this by providing a single service that can handle both structured and unstructured data, batch and real-time processing, and all stages of analytics—from ingestion to visualization.

How It Works – Step by Step

Data Ingestion: Data flows into Synapse from various sources (Azure Blob Storage, Data Lake Storage, SQL databases, IoT streams) using pipelines (like Azure Data Factory).

Data Storage: Data is stored in a data lake (Azure Data Lake Storage Gen2) or within Synapse's built-in SQL pools.

Data Preparation: Using Synapse Pipelines or notebooks (with Spark or SQL), data is cleaned, transformed, and shaped.

Analytics: You run queries using serverless SQL (on-demand) or dedicated SQL pools (provisioned). Synapse also supports Apache Spark for big data processing.

Data Serving: Processed data is exposed to BI tools like Power BI, Azure Analysis Services, or custom applications.

Governance: Synapse integrates with Azure Purview for data cataloging and lineage tracking.

Key Components

- Synapse SQL: Two types of SQL endpoints: - Dedicated SQL Pool: Provisioned, persistent storage. Good for predictable, high-performance workloads. You pay for provisioned DWUs (Data Warehouse Units). - Serverless SQL Pool: On-demand, pay-per-query. Ideal for ad-hoc exploration, data lake queries, and prototyping. - Apache Spark Pool: Managed Spark clusters for big data processing. You define autoscaling and session timeouts. - Synapse Pipelines: Orchestration engine (built on Azure Data Factory) to create ETL/ELT workflows. - Synapse Studio: Web-based IDE for managing all resources, writing code, and visualizing data. - Synapse Link: Real-time analytics on operational data (e.g., from Cosmos DB) without ETL.

Tiers and Pricing Models

Synapse doesn't have traditional tiers like Basic/Standard. Instead, you pay for:

Dedicated SQL Pool: per DWU-hour (e.g., DW100c to DW30000c).

Serverless SQL: per TB of data scanned.

Apache Spark: per vCore-hour.

Pipelines: per activity run and data movement.

Comparison to On-Premises Equivalent

An on-premises data warehouse typically uses a single large server (e.g., Teradata, SQL Server PDW) with fixed storage and compute. Scaling requires hardware upgrades. Synapse separates compute and storage, allowing independent scaling. Additionally, on-premises solutions lack integrated Spark, serverless options, and cloud-native data lake connectivity.

Azure Portal and CLI Touchpoints

In the Azure portal, you create a Synapse workspace (like a logical container). Then you add SQL pools, Spark pools, and pipelines. The Azure CLI can automate creation:

az synapse workspace create --name MySynapseWorkspace \
  --resource-group MyResourceGroup --location eastus \
  --storage-account MyStorageAccount \
  --file-system MyFileSystem --sql-admin-login-user admin \
  --sql-admin-login-password P@ssw0rd1234

You can also manage pools via CLI:

az synapse sql pool create --workspace-name MySynapseWorkspace \
  --resource-group MyResourceGroup --name MySqlPool \
  --performance-level DW100c

Concrete Business Scenarios

Retail Analytics: A retailer ingests sales data from stores (structured) and social media feeds (unstructured). Synapse unifies the data, runs Spark to analyze sentiment, and serves aggregated reports via Power BI.

IoT Telemetry: A manufacturer collects sensor data from machines. Synapse ingests streaming data via Event Hubs, uses serverless SQL to query recent data, and stores historical data in a dedicated pool for long-term analysis.

Healthcare Claims: A health insurer processes millions of claims daily. Synapse pipelines transform raw claims, deduplicate records, and load into a dedicated SQL pool for fraud detection models.

Walk-Through

Create a Synapse Workspace

In the Azure portal, search for 'Azure Synapse Analytics' and click 'Create'. You'll need to provide a resource group, workspace name, region, and select a Data Lake Storage Gen2 account (or create one). The workspace is the logical container for all Synapse resources. Behind the scenes, Azure provisions a managed resource group with a SQL server, Spark pools, and storage. The workspace name must be globally unique. After creation, you access Synapse Studio via a URL like https://web.azuresynapse.net.

Configure Data Storage

Synapse uses Azure Data Lake Storage Gen2 as its primary data lake. In the workspace, you can create containers and folders for raw, processed, and curated data. You can also link other storage accounts (Blob, ADLS Gen1) using linked services. Data is stored in Parquet, CSV, JSON, or other formats. For dedicated SQL pools, data is stored internally in columnar format. The separation of storage and compute means you can scale compute without moving data.

Ingest Data with Pipelines

In Synapse Studio, go to the 'Integrate' hub and create a pipeline. Pipelines are like Azure Data Factory pipelines: they contain activities (copy, data flow, notebook) that move and transform data. For example, you can copy data from an on-premises SQL Server to the data lake using a self-hosted integration runtime. Pipelines can be triggered on a schedule or event. Each activity run incurs cost based on execution time and data movement.

Query Data with Serverless SQL

Serverless SQL is built into every Synapse workspace. You can query files in the data lake using T-SQL without provisioning any resources. For example, to query a CSV: SELECT * FROM OPENROWSET(BULK 'https://mystorage.dfs.core.windows.net/container/folder/file.csv', FORMAT='CSV') AS [result]. You pay only for the data scanned (per TB). Serverless SQL is perfect for ad-hoc exploration and prototyping. It can also create views and external tables.

Create a Dedicated SQL Pool

For production workloads with predictable performance, create a dedicated SQL pool. Choose a performance level from DW100c (100 compute DWUs) to DW30000c. The pool provisions dedicated resources. You can pause the pool to save costs when not in use. Data is stored in the pool's internal storage. You can load data using PolyBase, COPY statement, or pipelines. Queries run in seconds on large datasets. Scaling up/down takes a few minutes.

What This Looks Like on the Job

Scenario 1: E-commerce Company Unifying Sales and Clickstream Data

An online retailer wants to combine structured sales data from their transactional database with unstructured clickstream logs from their website. They use Synapse to ingest both into a data lake (ADLS Gen2). Sales data is loaded nightly via a pipeline from SQL Server, while clickstream data streams in real-time via Azure Event Hubs. A Synapse notebook (Spark) joins the datasets, calculates conversion rates, and writes results to a dedicated SQL pool. Business analysts use Power BI to visualize dashboards. The team configures Synapse Pipelines to run hourly for clickstream and daily for sales. Cost considerations: they use serverless SQL for ad-hoc queries and a small dedicated pool (DW100c) for dashboards. If they misconfigure the pipeline (e.g., wrong file format), data may not load, causing stale reports.

Scenario 2: Financial Services Firm for Risk Analytics

A bank processes terabytes of transaction data for fraud detection. They use Synapse to run complex SQL queries and machine learning models. Data is ingested from multiple sources (mainframe, CRM) using pipelines. They create a dedicated SQL pool (DW1000c) for high-performance queries. Data scientists use Spark pools to train anomaly detection models. Governance is handled via Azure Purview integration. The team uses Synapse Studio to monitor pipeline runs and query performance. Common mistake: not pausing the dedicated pool overnight, leading to unnecessary costs. They set up autoscale for Spark pools to handle varying workloads.

Scenario 3: Healthcare Provider for Patient Analytics

A hospital system wants to analyze patient records, lab results, and IoT device data. They use Synapse to ingest data from on-premises SQL databases and HL7 feeds. Serverless SQL is used to explore data lake files. They build a pipeline that transforms data into a star schema and loads into a dedicated SQL pool. Power BI reports show patient outcomes. They integrate Azure Machine Learning to predict readmission risk. If they don't properly secure the data (e.g., missing column-level security), sensitive patient data could be exposed. They use Synapse's row-level security and dynamic data masking.

How AZ-900 Actually Tests This

Exam Objective: 2.4 Describe core Azure services

AZ-900 expects you to know the purpose of Azure Synapse Analytics as a unified analytics service. You won't be asked to write queries or configure pipelines, but you must understand its role in big data and data warehousing.

Common Wrong Answers and Why

'Synapse is just a rebranded SQL Data Warehouse' – Wrong. Synapse combines SQL Data Warehouse with Spark, pipelines, and data lake integration. The exam tests that it's more than just a data warehouse.

'Synapse replaces Azure Analysis Services' – Wrong. Analysis Services is for tabular models and is complementary, not replaced.

'Serverless SQL pool is the same as dedicated SQL pool' – Wrong. Serverless is on-demand, pay-per-query; dedicated is provisioned and persistent.

'Synapse requires a separate data factory' – Wrong. Synapse has built-in pipelines; you don't need a separate Azure Data Factory.

Specific Terms and Values

Dedicated SQL Pool – provisioned, persistent, uses DWUs (Data Warehouse Units).

Serverless SQL Pool – on-demand, pay per TB scanned.

Synapse Pipelines – built on Azure Data Factory.

Synapse Studio – web-based IDE.

Synapse Link – real-time analytics on operational data.

Edge Cases and Tricky Distinctions

Synapse vs. Azure Data Lake: Data Lake is storage; Synapse provides compute and analytics on top of that storage.

Synapse vs. Azure HDInsight: HDInsight is open-source Hadoop/Spark; Synapse is a managed, integrated platform.

Synapse vs. Azure Databricks: Databricks is a Spark-based platform with collaborative notebooks; Synapse is more SQL-centric with integrated pipelines.

Memory Trick: 'SUDS' – Synapse Unifies Data and Services

Storage (data lake)

Unified SQL (serverless and dedicated)

Data integration (pipelines)

Spark (big data processing)

Decision Tree for Elimination

If a question asks about 'big data and data warehousing combined' or 'unified analytics', pick Synapse. If it's about 'real-time analytics on operational data', think Synapse Link. If it's about 'orchestrating ETL', think Synapse Pipelines. If it's about 'querying data lake files without provisioning', think serverless SQL.

Key Takeaways

Azure Synapse Analytics is a unified analytics platform combining big data and data warehousing.

It includes two SQL options: serverless (on-demand, pay-per-query) and dedicated (provisioned, persistent).

Synapse Pipelines are built-in for data integration, no separate Data Factory needed.

Synapse Studio is the web-based IDE for managing all resources.

Synapse Link enables real-time analytics on operational data (e.g., Cosmos DB) without ETL.

Dedicated SQL pools scale in DWUs (Data Warehouse Units) from DW100c to DW30000c.

Serverless SQL queries data lake files directly without provisioning.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Azure Synapse Analytics

Unified analytics platform with SQL, Spark, and pipelines

Includes data warehousing (dedicated/ serverless SQL)

Provides a web-based IDE (Synapse Studio)

Best for end-to-end analytics including BI

Pricing includes DWU-hours, vCore-hours, and per-TB scanning

Azure Data Factory

Cloud-scale data integration and orchestration only

No built-in data warehousing or compute engines

Uses Data Factory Studio for monitoring

Best for ETL/ELT and data movement

Pricing based on activity runs and integration runtime hours

Watch Out for These

Mistake

Azure Synapse Analytics is just a new name for Azure SQL Data Warehouse.

Correct

While Synapse evolved from SQL Data Warehouse, it adds integrated Spark, serverless SQL, pipelines, and a unified studio. It's a superset, not a rename.

Mistake

You must choose between serverless and dedicated SQL pools; you can't use both.

Correct

A Synapse workspace can have both serverless and dedicated SQL pools simultaneously. They serve different use cases.

Mistake

Synapse Pipelines are a separate service and require Azure Data Factory.

Correct

Synapse Pipelines are built into Synapse and do not require a separate Data Factory instance, though they share the same engine.

Mistake

Synapse can only analyze data stored in Azure Data Lake Storage Gen2.

Correct

Synapse can connect to many sources (Blob, SQL DB, Cosmos DB, on-premises) via linked services and pipelines.

Mistake

Serverless SQL pool stores data persistently.

Correct

Serverless SQL pool does not store data; it queries files in the data lake on demand. Only dedicated SQL pools have persistent storage.

Frequently Asked Questions

What is the difference between serverless SQL and dedicated SQL pool in Synapse?

Serverless SQL is on-demand: you query files in the data lake and pay per TB scanned. It has no persistent storage. Dedicated SQL pool is provisioned: you allocate compute resources (DWUs) and store data internally. Dedicated pool offers predictable performance and supports advanced features like materialized views and row-level security. Use serverless for ad-hoc exploration and prototyping; use dedicated for production workloads.

Do I need Azure Data Factory if I have Synapse?

No. Synapse includes built-in pipelines (Synapse Pipelines) that are based on Azure Data Factory. You can create and run ETL/ELT workflows directly within Synapse without a separate Data Factory instance. However, if you already use Data Factory, you can integrate it with Synapse.

What is Synapse Link and when should I use it?

Synapse Link enables near real-time analytics on operational data from sources like Azure Cosmos DB and SQL Server. It automatically syncs changes from the operational database to Synapse, eliminating the need for ETL. Use it when you need low-latency analytics on transactional data without impacting operational performance.

Can I use Power BI with Synapse?

Yes. Synapse integrates seamlessly with Power BI. You can connect Power BI to a dedicated SQL pool, serverless SQL, or even directly to data lake files. Synapse Studio even has a built-in Power BI integration for creating reports.

How does Synapse pricing work?

Pricing is consumption-based: dedicated SQL pool charges per DWU-hour (e.g., DW100c costs ~$1.10/hr). Serverless SQL charges per TB of data scanned (e.g., $5/TB). Spark pools charge per vCore-hour. Pipelines charge per activity run and data movement. You can pause dedicated pools to stop costs.

What is the difference between Synapse and Azure Databricks?

Synapse is a unified analytics platform with strong SQL support, integrated pipelines, and data warehousing. Databricks is a Spark-based platform focused on collaborative data science and engineering. Synapse is better for SQL-centric BI and ETL; Databricks excels at machine learning and advanced analytics. Both can work together.

Is Synapse suitable for real-time streaming analytics?

Yes. Synapse can ingest streaming data from Azure Event Hubs and IoT Hub using pipelines or Spark Structured Streaming. You can then query the data in real-time using serverless SQL or store it for later analysis. For low-latency (<1 second) streaming, consider Azure Stream Analytics.

Terms Worth Knowing

Azure Active Directory Region Resource group

Ready to put this to the test?

You've just covered Azure Synapse Analytics — now see how well it sticks with free AZ-900 practice questions. Full explanations included, no account needed.

Try AZ-900 practice questions Back to all chapters

Done with this chapter?

Azure Database for MySQL and PostgreSQL

Azure Cache for Redis

See the full AZ-900 study guide