You are designing a data processing solution in Azure Synapse Analytics. The solution must support incremental loading of data from an Azure SQL Database to a dedicated SQL pool using PolyBase. Which approach should you use to minimize data movement and maximize performance?
PolyBase external tables enable direct querying of the source data, and CTAS loads the incremental rows efficiently with minimal data movement.
Why this answer
Option B is correct because PolyBase external tables in Azure Synapse Analytics let you query the source Azure SQL Database directly, without moving the data first. A CREATE TABLE AS SELECT (CTAS) statement that filters on a change-tracking column (for example, a last-modified timestamp) then loads only the incremental rows into the dedicated SQL pool. This minimizes data movement, and PolyBase's parallel streaming capability maximizes load performance.
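To make the incremental part concrete, here is a minimal sketch in Python that builds a CTAS statement restricted to rows newer than a stored watermark. It is an illustration under assumptions, not a definitive implementation: the function name build_incremental_ctas, the tables stg.Orders_Delta and ext.Orders, and the ModifiedDate column are all hypothetical, and the external table is assumed to already exist. The distribution and index options shown are one common choice for a staging table, not a requirement.

```python
from datetime import datetime


def build_incremental_ctas(target_table: str, external_table: str,
                           watermark_column: str,
                           last_watermark: datetime) -> str:
    """Return a CTAS statement that selects only rows newer than the watermark.

    All identifiers are hypothetical and are interpolated as-is, so they
    must come from trusted configuration, never from user input.
    """
    # Format the watermark as an ISO 8601 literal (yyyy-mm-ddThh:mm:ss).
    watermark_literal = last_watermark.strftime("%Y-%m-%dT%H:%M:%S")
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)\n"
        "AS\n"
        "SELECT *\n"
        f"FROM {external_table}\n"
        f"WHERE {watermark_column} > '{watermark_literal}';"
    )


if __name__ == "__main__":
    # Example: load only rows modified after the last successful run.
    print(build_incremental_ctas(
        "stg.Orders_Delta", "ext.Orders", "ModifiedDate",
        datetime(2024, 1, 15, 8, 30, 0),
    ))
```

After each successful load, the pipeline would store the highest ModifiedDate it saw and pass that value as last_watermark on the next run, so each CTAS reads only the rows that changed since the previous load.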
Exam trap
The trap here is that candidates often assume external tables are only for static data or Hadoop sources. PolyBase in Synapse supports external tables against Azure SQL Database for efficient incremental loading, so options that introduce extra hops (such as Data Factory or bcp) may look more familiar but are less optimal.
How to eliminate wrong answers
Option A is wrong because the bcp utility exports data to a text file, which adds an intermediate storage step and extra I/O, increasing data movement and latency compared to direct PolyBase access.
Option C is wrong because the Azure Data Factory copy activity typically moves data through an intermediate staging area (e.g., Azure Blob Storage), which adds data transfer and storage costs, whereas PolyBase can read directly from the source without staging.
Option D is wrong because Azure Databricks with the Spark connector must first pull data out of Azure SQL Database into a Spark cluster for processing and then write it to the dedicated SQL pool, which increases data movement and complexity compared to the native PolyBase approach.