DP-900Chapter 90 of 101Objective 3.1

BigQuery vs Azure Synapse Analytics

This chapter provides a comprehensive comparison of Google BigQuery and Microsoft Azure Synapse Analytics, two leading cloud data warehousing solutions. Understanding their differences is crucial for the DP-900 exam, particularly for objective 3.1: Describe core data workloads and the differences between transactional and analytical workloads. While DP-900 does not require deep product knowledge, you must understand the high-level capabilities and use cases of these services. Approximately 5-10% of exam questions may reference Azure Synapse Analytics, and you may need to distinguish it from BigQuery in scenario-based questions.

25 min read
Intermediate
Updated May 31, 2026

Comparing Two Data Warehouses: BigQuery vs Azure Synapse

Imagine you are a city planner needing to analyze traffic patterns across millions of vehicles daily. You have two options: a fully automated, serverless traffic analysis center (BigQuery) or a customizable, scalable traffic command center with dedicated servers you can control (Azure Synapse). BigQuery is like a cloud-based analytics service where you simply upload your traffic data and it automatically allocates computing power and storage as needed. You pay only for the queries you run, and you never worry about infrastructure. Azure Synapse is like building your own command center with a dedicated pool of servers that you can scale up or down, and you have the option to use serverless resources for bursty workloads. You have more control over performance and costs, but you must manage the infrastructure. Both can handle massive amounts of data, but they differ in how they allocate resources and charge for usage. For the DP-900 exam, you need to understand these fundamental architectural differences and when to choose one over the other.

How It Actually Works

What Are BigQuery and Azure Synapse Analytics?

BigQuery is Google Cloud's fully managed, serverless data warehouse that allows you to run super-fast SQL queries using the processing power of Google's infrastructure. It automatically scales compute and storage, and you pay only for the data processed by your queries (or flat-rate pricing for predictable workloads). Azure Synapse Analytics is Microsoft's unified analytics platform that combines enterprise data warehousing, big data analytics, data integration, and visualization. It offers both dedicated SQL pools (provisioned resources) and serverless SQL pools (pay-per-query), along with Apache Spark for big data processing.

How They Work Internally

BigQuery separates storage and compute. When you load data into BigQuery, it is stored in a columnar format (Capacitor) in Colossus, Google's distributed file system. Compute resources are dynamically allocated from a shared pool when you run a query. BigQuery uses a tree architecture: a root node distributes query fragments to worker nodes, which scan the data in parallel. The results are aggregated and returned. This allows massive parallelism without manual tuning.

Azure Synapse uses a Massively Parallel Processing (MPP) architecture. Data is stored in Azure Blob Storage or Azure Data Lake Storage Gen2. When using a dedicated SQL pool, you provision a specific number of Data Warehouse Units (DWUs), which define compute, memory, and I/O. The control node distributes queries to compute nodes, which process data in parallel. For serverless SQL pools, Synapse automatically scales compute resources based on query complexity and data volume.

Key Components, Values, and Defaults

BigQuery: - Default query timeout: 6 hours (for interactive queries) or up to 24 hours for batch queries. - Maximum query size: 1 TB per query (can be increased with flat-rate pricing). - Data storage cost: $0.02 per GB per month for active storage, $0.01 per GB per month for long-term storage (90 days without modification). - Query pricing: $5 per TB processed (on-demand), or flat-rate reservations starting at 100 slots ($2,000/month). - Slots: units of compute capacity; each query consumes a number of slots based on complexity.

Azure Synapse: - Dedicated SQL pool: DWU sizes range from DW100c to DW30000c (c = compute-optimized). - Serverless SQL pool: pay per TB of data processed ($5 per TB). - Data storage: $0.024 per GB per month for Azure Data Lake Storage Gen2 (hot tier). - Maximum concurrency: depends on DWU level; e.g., DW100c supports up to 4 concurrent queries. - Query timeout: default 30 minutes (configurable).

Configuration and Verification Commands

BigQuery (using bq command-line tool):

# Create a dataset
bq mk mydataset

# Load data into a table
bq load --autodetect --source_format=CSV mydataset.mytable gs://mybucket/data.csv

# Run a query
bq query --use_legacy_sql=false 'SELECT COUNT(*) FROM mydataset.mytable'

# View job details
bq show -j job_id

Azure Synapse (using Azure CLI or T-SQL):

# Create a dedicated SQL pool
az synapse sql pool create --workspace-name myworkspace --name mypool --performance-level DW100c

# Pause a dedicated SQL pool
az synapse sql pool pause --workspace-name myworkspace --name mypool

# Resume a dedicated SQL pool
az synapse sql pool resume --workspace-name myworkspace --name mypool

Serverless SQL pool queries are run via T-SQL without provisioning, e.g.,

SELECT COUNT(*) FROM OPENROWSET(BULK 'https://mystorage.dfs.core.windows.net/container/data/*.csv', FORMAT='CSV') AS r;

How They Interact with Related Technologies

BigQuery integrates tightly with Google Cloud services:

Data ingestion: Cloud Storage, Dataflow, Pub/Sub, Transfer Service.

Analytics: Looker, Data Studio, Vertex AI.

Machine Learning: BigQuery ML allows training models directly with SQL.

External tables: query data in Cloud Storage, Google Sheets, or Bigtable without loading.

Azure Synapse integrates with the Microsoft ecosystem:

Data ingestion: Azure Data Factory, Azure Stream Analytics, AzCopy.

Analytics: Power BI, Azure Machine Learning, Azure Data Explorer.

Big data: Apache Spark pools within Synapse for ETL and machine learning.

External tables: PolyBase to query data in Azure Data Lake, Blob Storage, or Cosmos DB.

Both support standard SQL and have built-in connectors for common data sources. The key difference is that BigQuery is serverless by default, while Synapse offers both serverless and provisioned options.

Walk-Through

1

Provisioning the Service

In BigQuery, there is no provisioning step. You simply create a project and a dataset, then load data. In Azure Synapse, you must first create a Synapse workspace, then optionally provision a dedicated SQL pool (specify DWU level). The workspace acts as a container for all analytics resources. For serverless, no provisioning is needed; you just connect and query.

2

Loading Data

BigQuery supports loading data from Cloud Storage, local files, or streaming via the API. Data is automatically distributed and compressed. Azure Synapse loads data using PolyBase or COPY INTO T-SQL command. For dedicated pools, data is distributed across 60 distributions (hash or round-robin). For serverless, data remains in external storage and is read on the fly.

3

Running a Query

BigQuery sends the SQL query to the root server, which compiles it into a distributed execution plan. Worker nodes scan data in parallel, and results are aggregated. In Azure Synapse dedicated pool, the control node compiles the query and distributes it to compute nodes, which process data in their distributions. Serverless SQL pool uses a similar distributed engine but without fixed resources.

4

Scaling Resources

BigQuery scales automatically; you do not manage resources. For predictable workloads, you can purchase slots (compute capacity) via flat-rate pricing. Azure Synapse dedicated pools can be scaled up or down manually or automatically (via Azure Autoscale). Serverless SQL pools scale automatically but have limited control. Scaling a dedicated pool changes the number of compute nodes and is typically done via T-SQL or Azure CLI.

5

Managing Costs

BigQuery on-demand charges per query (data scanned). You can set cost controls like custom quotas and maximum bytes billed. Flat-rate pricing provides predictable costs for a fixed number of slots. Azure Synapse dedicated pools charge per hour for provisioned DWUs, regardless of usage. Serverless SQL pools charge per TB of data processed. You can pause dedicated pools to stop compute charges while retaining storage costs.

What This Looks Like on the Job

Scenario 1: Ad Hoc Analytics for a Marketing Team A marketing team needs to run occasional, unpredictable SQL queries on terabytes of clickstream data stored in Google Cloud Storage. They have limited DevOps support and want to avoid managing infrastructure. BigQuery is ideal because it is serverless: they simply create external tables pointing to Cloud Storage and run queries. They pay only for the data scanned, which is cost-effective for low-volume usage. However, if they run many large queries, costs can spike. To control costs, they set a maximum bytes billed per query and use partitioned tables to reduce scanned data.

Scenario 2: Enterprise Data Warehouse with Consistent Workloads A financial services company has a predictable, high-volume nightly ETL process that loads 500 GB of transaction data into a data warehouse. They need consistent performance and predictable costs. Azure Synapse with a dedicated SQL pool (DW2000c) is chosen because it provides reserved resources and performance isolation. The DWU size is chosen based on the workload: 2000 DWUs provide sufficient concurrency and query performance. The pool is scaled up during the ETL window and scaled down during off-hours to save costs. They use PolyBase to load data from Azure Data Lake Storage Gen2.

Scenario 3: Hybrid Data Lake and Warehouse A retail company uses Azure Data Lake Storage as a data lake for raw sensor data. They need to run both ad hoc queries and scheduled reports. They deploy Azure Synapse with both serverless and dedicated pools. Serverless SQL pool is used for exploratory queries on raw data, while dedicated SQL pool handles curated, aggregated data for Power BI dashboards. This hybrid approach optimizes cost and performance. A common misconfiguration is using a dedicated pool for all queries, leading to high costs for infrequent queries. The solution is to route ad hoc queries to the serverless pool.

What Goes Wrong? - BigQuery cost spikes: Users forget to set query cost limits or do not partition tables. A single query scanning 10 TB can cost $50. Solution: use partitioned tables and set maximum bytes billed. - Synapse dedicated pool performance: Choosing too low a DWU level leads to slow queries and concurrency issues. Conversely, over-provisioning wastes money. Solution: monitor using DMVs and scale appropriately. - Serverless SQL pool cold start: First query on a cold serverless pool may take longer as resources are allocated. Solution: use a warm-up query or consider dedicated pool for latency-sensitive workloads.

How DP-900 Actually Tests This

What DP-900 Tests DP-900 objective 3.1 focuses on describing core data workloads and the differences between transactional and analytical workloads. You are expected to understand that Azure Synapse Analytics is an analytical data store (data warehouse) optimized for complex queries and large volumes of data, as opposed to Azure SQL Database which is transactional. The exam may compare cloud data warehouse services generically, but specific questions about BigQuery are unlikely since DP-900 is Azure-focused. However, you may encounter questions that contrast Azure Synapse with other Azure services or with on-premises solutions.

Common Wrong Answers 1. Choosing Azure SQL Database for analytics: Candidates often confuse transactional (OLTP) and analytical (OLAP) workloads. Azure SQL Database is for OLTP, not for large-scale analytics. The correct answer for data warehousing is Azure Synapse. 2. Thinking BigQuery is part of Azure: BigQuery is a Google Cloud service. In an Azure-only exam, it may be a distractor. Always pair Azure Synapse with Azure data services. 3. Assuming Synapse is only serverless: Synapse offers both dedicated and serverless pools. The exam may test that you know dedicated pools provide predictable performance. 4. Confusing DWU with DTU: DWU is for Azure Synapse dedicated pool; DTU is for Azure SQL Database. They are not interchangeable.

Key Numbers and Terms - DWU: Data Warehouse Unit (e.g., DW100c, DW1000c). - PolyBase: technology for querying external data in Synapse. - Massively Parallel Processing (MPP): architecture of Synapse dedicated pool. - Serverless vs. provisioned: two compute models in Synapse. - Data Lake Storage Gen2: recommended storage for Synapse.

Edge Cases - Synapse can query data in Azure Cosmos DB via the Synapse Link feature (not in DP-900 scope but good to know). - BigQuery supports ML models via SQL (BigQuery ML), but Synapse uses Azure Machine Learning integration. - Both support external tables, but Synapse's PolyBase requires a data source and file format definition.

How to Eliminate Wrong Answers - If the scenario mentions 'predictable performance' or 'consistent workload', look for 'dedicated SQL pool' or 'provisioned resources'. - If the scenario says 'ad hoc queries' or 'variable workloads', look for 'serverless' or 'pay-per-query'. - If the question asks about 'data warehousing' in Azure, the answer is almost always Azure Synapse Analytics, not Azure SQL Database or Azure Data Lake. - For questions about 'unified analytics' or 'big data and data warehousing together', Synapse is the correct choice.

Key Takeaways

BigQuery is a serverless data warehouse from Google Cloud; Azure Synapse is a unified analytics platform from Microsoft Azure.

Azure Synapse offers two compute models: dedicated SQL pool (provisioned DWU) and serverless SQL pool (pay-per-query).

Dedicated SQL pools provide predictable performance and are billed per hour, regardless of usage.

Serverless SQL pools are ideal for ad hoc queries and cost only when data is processed.

PolyBase is a technology in Azure Synapse that allows querying external data in Azure Data Lake or Blob Storage.

DWU (Data Warehouse Unit) is the unit of compute for dedicated SQL pools in Synapse.

BigQuery separates storage and compute; storage is in a columnar format (Capacitor) and compute is allocated dynamically.

Both services support standard SQL and can query external data sources (external tables).

For DP-900, remember that Azure Synapse is the primary data warehousing solution in Azure, not Azure SQL Database.

Cost management is critical: BigQuery can have unpredictable costs; Synapse dedicated pools have predictable but continuous costs.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Google BigQuery

Fully serverless; no provisioning required.

Pay per query (data scanned) or flat-rate per slot.

Automatic scaling of compute resources.

Integrated with Google Cloud services (e.g., Cloud Storage, Data Studio).

Supports BigQuery ML for in-database machine learning.

Azure Synapse Analytics

Offers both serverless and dedicated (provisioned) SQL pools.

Dedicated pool: pay per hour for DWUs; serverless: pay per TB processed.

Dedicated pool requires manual scaling; serverless scales automatically.

Integrated with Microsoft ecosystem (e.g., Azure Data Lake, Power BI, Azure Machine Learning).

Supports Apache Spark for big data processing within the same workspace.

Watch Out for These

Mistake

BigQuery and Azure Synapse are essentially the same service from different cloud providers.

Correct

While both are cloud data warehouses, they have fundamental architectural differences. BigQuery is fully serverless and separates compute and storage completely, while Azure Synapse offers both serverless and provisioned (dedicated) options. BigQuery automatically scales compute, whereas Synapse dedicated pools require manual scaling. Pricing models also differ: BigQuery charges per query (data scanned) or per slot, while Synapse dedicated pools charge per hour for provisioned DWUs.

Mistake

Azure Synapse Analytics is only for data warehousing and cannot handle big data workloads.

Correct

Azure Synapse is a unified analytics platform that includes both SQL-based data warehousing and Apache Spark for big data processing. You can run Spark jobs within the same workspace, and integrate with Azure Data Lake, Cosmos DB, and other services. It supports both relational and non-relational data.

Mistake

You can use BigQuery within Azure because it is a multi-cloud service.

Correct

BigQuery is a Google Cloud Platform service and is not available natively in Azure. However, you can access BigQuery from Azure through third-party tools or by using Google Cloud APIs. For Azure-native data warehousing, the correct service is Azure Synapse Analytics.

Mistake

Serverless SQL pool in Azure Synapse is free because you only pay for storage.

Correct

Serverless SQL pool charges per TB of data processed (currently $5 per TB). Storage costs for data in Azure Data Lake are separate. There is no charge for idle time, but queries incur costs based on the amount of data scanned. It is not free.

Mistake

Dedicated SQL pools in Azure Synapse are always better than serverless because they are faster.

Correct

Dedicated pools provide predictable performance and are ideal for production workloads with consistent demand. However, they incur costs even when idle. Serverless pools are cost-effective for ad hoc, intermittent, or exploratory queries. The choice depends on workload patterns, not just speed.

Do You Actually Know This?

Reveal each answer, then mark whether you got it right. Score 60%+ to unlock the next chapter.

Frequently Asked Questions

What is the difference between Azure Synapse Analytics and Azure SQL Database?

Azure SQL Database is a transactional (OLTP) database for online transaction processing, optimized for many small, concurrent writes. Azure Synapse Analytics is an analytical (OLAP) data warehouse for large-scale complex queries and data aggregation. Synapse uses MPP architecture and can handle petabytes of data, while SQL Database uses a single-node architecture. For data warehousing workloads, use Synapse; for OLTP, use SQL Database.

Can Azure Synapse Analytics query data from Azure Data Lake Storage?

Yes, Azure Synapse can query data stored in Azure Data Lake Storage Gen2 using PolyBase or the OPENROWSET function in serverless SQL pool. This allows you to run SQL queries on files (Parquet, CSV, JSON, etc.) without loading them into the data warehouse. For dedicated pools, you can also use PolyBase to load data into tables.

What is a Data Warehouse Unit (DWU) in Azure Synapse?

DWU (Data Warehouse Unit) is a unit of measure that represents the combination of CPU, memory, and I/O resources allocated to a dedicated SQL pool. The higher the DWU level (e.g., DW100c to DW30000c), the more resources and better performance. You can scale up or down by changing the DWU level. The 'c' suffix indicates compute-optimized, which provides faster query performance.

How does pricing differ between BigQuery and Azure Synapse?

BigQuery offers two pricing models: on-demand ($5 per TB of data processed) and flat-rate (reserved slots starting at $2,000/month for 100 slots). You also pay for storage ($0.02/GB/month). Azure Synapse dedicated pools charge per hour for the selected DWU level (e.g., DW100c costs ~$1.00/hour). Serverless SQL pool charges $5 per TB of data processed. Storage costs for Azure Data Lake (~$0.024/GB/month) are separate.

Is BigQuery available in Azure?

No, BigQuery is a Google Cloud Platform service and is not available in Azure. However, you can access BigQuery from Azure applications via REST APIs or third-party connectors. For Azure-native data warehousing, use Azure Synapse Analytics.

What is the difference between serverless and dedicated in Azure Synapse?

Serverless SQL pool is a pay-per-query service that automatically scales compute resources based on the query. It is ideal for ad hoc, exploratory, or intermittent workloads. Dedicated SQL pool is a provisioned set of resources (DWUs) that you manage and pay for per hour, regardless of usage. Dedicated pools provide consistent performance and are suitable for production workloads with predictable demand.

Can I use Apache Spark with Azure Synapse?

Yes, Azure Synapse includes built-in Apache Spark pools for big data processing. You can create Spark notebooks and run PySpark, Scala, or .NET for Spark jobs. This allows you to perform ETL, machine learning, and data transformation within the same analytics platform.

Terms Worth Knowing

Ready to put this to the test?

You've just covered BigQuery vs Azure Synapse Analytics — now see how well it sticks with free DP-900 practice questions. Full explanations included, no account needed.

Done with this chapter?