A company uses Azure Synapse Analytics to process large datasets. They need to transform JSON data stored in Azure Data Lake Storage Gen2 into a star schema. Which data processing approach minimizes data movement and leverages the compute closest to the data?
- A
Use Azure Data Factory to copy the JSON data into Azure SQL Database, then use T-SQL to transform.
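Why wrong: Copies the data out of the data lake into a separate database before any transformation happens, which is the data movement the question asks to avoid.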
- B
Use Azure Data Factory with SSIS to transform and load into dedicated SQL pool.
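Why wrong: SSIS packages run on a separate Azure-SSIS integration runtime, so the data is pulled out of the lake and transformed away from where it is stored.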
- C
Load data into a Spark DataFrame in Synapse notebooks, transform, and write back.
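Why wrong: Reads the data out of the lake into a Spark pool's memory and writes it back, adding a separate compute engine and an extra round trip for the data.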
- D
Create external tables on the JSON files using PolyBase, then use CREATE EXTERNAL TABLE AS SELECT (CETAS) to write transformed Parquet files.
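Why correct: Queries the files in place in the data lake and writes the transformed Parquet output back to storage, so no data is copied into another service.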
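As a rough illustration of the CETAS step in option D, the sketch below renders a CREATE EXTERNAL TABLE AS SELECT statement as a T-SQL string. Every object name in it (`dbo.dim_customer`, `ext.raw_orders`, `lake_source`, `parquet_format`, the column names, and the `curated/dim_customer/` path) is hypothetical and not taken from the question. It also assumes that the external data source, the Parquet file format, and an external table over the raw files already exist.

```python
def build_cetas(table, location, data_source, file_format, select_sql):
    """Render a CREATE EXTERNAL TABLE AS SELECT (CETAS) statement as T-SQL text.

    The statement is only built here, not executed; it would have to be
    run against a Synapse SQL pool separately.
    """
    return (
        f"CREATE EXTERNAL TABLE {table}\n"
        f"WITH (\n"
        f"    LOCATION = '{location}',\n"
        f"    DATA_SOURCE = {data_source},\n"
        f"    FILE_FORMAT = {file_format}\n"
        f")\n"
        f"AS\n"
        f"{select_sql};"
    )


# Hypothetical customer dimension of the star schema, selected from an
# assumed external table (ext.raw_orders) defined over the raw files.
dim_customer_sql = build_cetas(
    table="dbo.dim_customer",
    location="curated/dim_customer/",
    data_source="lake_source",      # assumed EXTERNAL DATA SOURCE name
    file_format="parquet_format",   # assumed EXTERNAL FILE FORMAT (Parquet)
    select_sql=(
        "SELECT DISTINCT customer_id, customer_name\n"
        "FROM ext.raw_orders"
    ),
)

print(dim_customer_sql)
```

The same pattern would be repeated once per dimension and fact table. Each statement reads from the lake and writes Parquet files to the given `LOCATION` under the external data source, which is why this option keeps the data in storage throughout.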