DP-203 Design and implement data storage — All Questions With Answers

Question 1 (medium, multiple choice)

A company is designing a data lake solution on Azure Data Lake Storage Gen2. Data will be ingested from IoT devices at high frequency (every 5 seconds). Each device sends a JSON payload of 2 KB. The data must be stored in a hierarchical namespace and partitioned by date and device ID to optimize query performance. Which partition strategy should be used?
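A minimal Python sketch of one Hive-style (`key=value`) folder layout that puts the date levels before the device ID, so an engine that understands partition folders can skip whole days when a query filters on time. The `telemetry` root, the `deviceId` key name, and the date-then-device ordering are illustrative assumptions.

```python
from datetime import datetime, timezone


def partition_path(root: str, device_id: str, ts: datetime) -> str:
    """Build a Hive-style (key=value) folder path for one telemetry event.

    Assumed layout: date levels first, then device ID, so a time-range
    query only needs to list the folders for the requested days.
    """
    ts = ts.astimezone(timezone.utc)  # normalize so folder boundaries are UTC days
    return (
        f"{root}/year={ts.year:04d}/month={ts.month:02d}/day={ts.day:02d}"
        f"/deviceId={device_id}/"
    )


if __name__ == "__main__":
    event_time = datetime(2024, 3, 7, 14, 30, 5, tzinfo=timezone.utc)
    print(partition_path("telemetry", "dev-0042", event_time))
    # telemetry/year=2024/month=03/day=07/deviceId=dev-0042/
```

Because each 2 KB payload is tiny, writing one file per event would produce a very large number of small files; events are usually batched into larger files inside each partition folder.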

Question 2 (hard, multiple choice)

You are designing a near-real-time analytics pipeline for a retail company. Transaction data is generated in Azure SQL Database and must be replicated to Azure Synapse Analytics (dedicated SQL pool) with less than 5 minutes of latency. The source table has 50 million rows and 200 columns, but only 30 columns are needed for analytics. Which approach should you recommend?

Question 3 (easy, multiple choice)

A data engineer needs to store semi-structured JSON log files from a web application. Each log entry is about 1 KB. The logs are rarely queried (once a month) and must be retained for 7 years for compliance. The solution must minimize storage cost. Which storage option should be used?

Question 4 (medium, multiple choice)

You are designing a solution to store streaming data from multiple sources into Azure Data Lake Storage Gen2. The data must be organized by ingestion time and source system. Each source system produces data in a different format: CSV, JSON, and Parquet. The solution must allow efficient querying using Azure Synapse Serverless SQL and must support partitioning on ingestion date. What is the recommended folder structure?

Question 5 (hard, multiple choice)

A healthcare company stores sensitive patient data in Azure Data Lake Storage Gen2. They need to ensure that only authorized users can access data and that all access is audited. They also need to prevent data from being accessed by unauthorized Azure services. Which combination of security features should be used?

Question 6 (medium, multi-select)

Which TWO of the following are supported storage options for use as a source in Azure Synapse Pipeline Copy Activity?

Question 7 (hard, multi-select)

Which THREE of the following are required to configure a managed private endpoint for Azure Data Factory when connecting to an Azure SQL Database that has a private endpoint?

Question 8 (medium, multiple choice)

You are reviewing a copy job configuration in Azure Data Factory that copies Parquet files from Azure Data Lake Storage Gen2 to Azure Synapse Analytics. The exhibit shows the job settings. If the source folder contains a file that is not in Parquet format (e.g., a CSV file), what will happen?

Exhibit

Refer to the exhibit.

```json
{
  "data": [
    {
      "name": "order_data",
      "path": "orders/*.parquet",
      "partitionBy": ["year", "month", "day"],
      "format": "parquet",
      "options": {
        "compression": "snappy"
      }
    }
  ],
  "source": {
    "provider": "AzureDataLakeStorage",
    "connectionString": "DefaultEndpointsProtocol=https;AccountName=storagedatalake;AccountKey=...;EndpointSuffix=core.windows.net",
    "container": "data"
  },
  "sink": {
    "provider": "AzureSynapseAnalytics",
    "table": "dbo.orders",
    "staging": {
      "linkedServiceName": "AzureDataLakeStorage",
      "folderPath": "staging"
    }
  },
  "copyBehavior": "MergeFiles",
  "faultTolerance": {
    "skipIncompatibleFiles": true,
    "skipIncompatibleRows": true
  }
}
```
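The exhibit's `path` value is a wildcard. A rough way to reason about which files such a pattern selects is Python's `fnmatch` module, which implements shell-style wildcards. This is only an approximation of how Data Factory resolves wildcard paths, and the file names below are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical contents of the "orders" folder in the "data" container.
files = [
    "orders/part-0001.parquet",
    "orders/part-0002.parquet",
    "orders/legacy_export.csv",
]

# Case-sensitive, shell-style match against the wildcard from the exhibit.
pattern = "orders/*.parquet"
selected = [f for f in files if fnmatchcase(f, pattern)]
skipped = [f for f in files if not fnmatchcase(f, pattern)]

print("selected:", selected)
print("not matched by the wildcard:", skipped)
```

Under this approximation, the CSV file is filtered out by the wildcard before any format handling happens.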

Question 9 (easy, multiple choice)

You are an administrator for an Azure Synapse Analytics dedicated SQL pool. You execute the T-SQL statements shown in the exhibit. The external table 'dbo.Orders' is created. Which statement about querying this external table is true?

Exhibit

Refer to the exhibit.

```sql
CREATE EXTERNAL DATA SOURCE MyDataSource
WITH (
    LOCATION = 'abfss://data@storagedatalake.dfs.core.windows.net',
    TYPE = HADOOP,
    CREDENTIAL = MyCredential
);

CREATE EXTERNAL FILE FORMAT MyFileFormat
WITH (
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);

CREATE EXTERNAL TABLE dbo.Orders (
    OrderID INT,
    CustomerID INT,
    OrderDate DATE,
    TotalAmount DECIMAL(10,2)
)
WITH (
    LOCATION = '/orders/',
    DATA_SOURCE = MyDataSource,
    FILE_FORMAT = MyFileFormat
);
```

Question 10 (easy, multiple choice)

A company is designing a data storage solution for IoT device telemetry. Each device sends a JSON payload every second. The data must be stored in a way that supports real-time dashboards and long-term analytics with low latency. Which Azure data store should be used for the ingestion layer?

Question 11 (medium, multiple choice)

A data engineer needs to store JSON documents that are frequently updated by multiple users concurrently. The solution must support optimistic concurrency control and have built-in indexing on all fields. Which Azure data store should be used?

Question 12 (hard, multiple choice)

A company stores sensitive customer data in Azure Data Lake Storage Gen2. They need to implement a data retention policy where data older than 90 days is automatically moved to the 'cold' access tier, and data older than 365 days is deleted. Which Azure feature should be used to automate this?
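Blob Storage lifecycle management policies, which also apply to Data Lake Storage Gen2 accounts, are JSON documents made of rules. The sketch below builds such a policy in Python with the scenario's 90-day cool and 365-day delete thresholds; the rule name and the `customerdata/` prefix (prefixes start with a container name) are hypothetical.

```python
import json


def retention_policy(container_prefix: str, cool_after_days: int, delete_after_days: int) -> dict:
    """Return a lifecycle management policy document as a Python dict.

    Both thresholds are measured from the blob's last modification time.
    """
    return {
        "rules": [
            {
                "enabled": True,
                "name": "cool-then-delete",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        # Prefixes are matched from the start of the path,
                        # beginning with the container name.
                        "prefixMatch": [container_prefix],
                    },
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {"daysAfterModificationGreaterThan": cool_after_days},
                            "delete": {"daysAfterModificationGreaterThan": delete_after_days},
                        }
                    },
                },
            }
        ]
    }


if __name__ == "__main__":
    policy = retention_policy("customerdata/", 90, 365)
    print(json.dumps(policy, indent=2))
```

Saved to a file, the printed JSON could then be applied with `az storage account management-policy create --account-name <account> --resource-group <group> --policy @policy.json`.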

Question 13 (medium, multi-select)

A company is designing a data storage solution for a global application that requires low-latency reads and writes for user session data. The solution must support automatic failover across multiple Azure regions. Which TWO Azure services meet these requirements?

Question 14 (hard, multi-select)

A company ingests streaming data from multiple sources into Azure Event Hubs. The data must be stored in Azure Data Lake Storage Gen2 in Parquet format, partitioned by date and hour. The solution must minimize cost and processing latency. Which THREE actions should be taken?

Question 15 (hard, multiple choice)

A company uses Azure Synapse Analytics dedicated SQL pool to store sales data. The sales table is partitioned by month and has a clustered columnstore index. Over time, the performance of queries filtering on a specific month has degraded. The data engineer suspects poor rowgroup elimination. Which action should be taken to improve performance?
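In a clustered columnstore index, each rowgroup keeps per-column segments with min/max metadata; when a predicate falls outside a segment's range, the engine can skip that rowgroup. The toy Python model below illustrates only that idea: the rowgroup ranges are made up, and the real engine's metadata and decisions are more involved.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RowGroup:
    rowgroup_id: int
    min_date: date  # minimum value of the filter column in this rowgroup
    max_date: date  # maximum value of the filter column in this rowgroup


def rowgroups_to_scan(groups: list[RowGroup], start: date, end: date) -> list[int]:
    """Return IDs of rowgroups whose [min, max] range overlaps [start, end]."""
    return [g.rowgroup_id for g in groups if g.max_date >= start and g.min_date <= end]


if __name__ == "__main__":
    # Well-ordered rowgroups: each covers a narrow, non-overlapping date range.
    ordered = [
        RowGroup(1, date(2024, 1, 1), date(2024, 1, 10)),
        RowGroup(2, date(2024, 1, 11), date(2024, 1, 20)),
        RowGroup(3, date(2024, 1, 21), date(2024, 1, 31)),
    ]
    # Poorly ordered rowgroups: each spans almost the whole month, so none can be skipped.
    mixed = [
        RowGroup(1, date(2024, 1, 1), date(2024, 1, 31)),
        RowGroup(2, date(2024, 1, 2), date(2024, 1, 30)),
        RowGroup(3, date(2024, 1, 1), date(2024, 1, 29)),
    ]
    print(rowgroups_to_scan(ordered, date(2024, 1, 12), date(2024, 1, 15)))  # [2]
    print(rowgroups_to_scan(mixed, date(2024, 1, 12), date(2024, 1, 15)))  # [1, 2, 3]
```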

Question 16 (easy, multiple choice)

A company stores IoT sensor data in Azure Blob Storage. The data is appended every minute and must be queried in near real-time using a SQL interface. Which Azure service should be used to enable this?

Question 17 (medium, multiple choice)

A company is designing a data lake on Azure Data Lake Storage Gen2. Data comes from multiple sources with varying schemas. The team must minimize storage costs while keeping all data available for future processing. Which storage tier should they use for the raw ingested data?

Question 18 (hard, multiple choice)

You are designing a solution to store telemetry data from millions of devices. Each device sends a JSON payload every 5 seconds. The data must be partitioned by device ID and time for efficient querying and must support real-time streaming ingestion. Which Azure storage solution should you recommend?

Question 19 (easy, multiple choice)

A data engineer needs to store CSV files containing customer data in Azure Blob Storage. The files must be encrypted at rest using a customer-managed key stored in Azure Key Vault. What should they configure?

Question 20 (medium, multiple choice)

A company uses Azure SQL Database for an OLTP application. They need to run complex analytical queries without impacting OLTP performance. Which solution should they implement?

Question 21 (hard, multi-select)

Which TWO options are valid ways to load data into Azure Synapse SQL Pool? (Choose two.)

Question 22 (medium, multi-select)

Which THREE security features are available for Azure SQL Database? (Choose three.)

Question 23 (hard, multiple choice)

A data engineer runs the Azure CLI command shown in the exhibit. The blob is stored in Azure Blob Storage. The team previously set a lifecycle management rule to move blobs to the Archive tier after 30 days. The blob was created 45 days ago. What is the most likely reason the blob is still in the Cool tier?

Exhibit

Refer to the exhibit.

```bash
az storage blob show \
  --account-name exampledatalake \
  --container-name raw \
  --name sensor/2023/01/01/data.parquet \
  --query "properties.blobTier"
```

Output: "Cool"
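A lifecycle rule's `daysAfterModificationGreaterThan` condition counts days since the blob was last modified (not since it was created), and a `prefixMatch` filter must match the blob path starting with the container name. The Python sketch below models only those two checks to show how a blob created 45 days ago can still fail a 30-day archive rule; the dates and prefixes are hypothetical, and the real service also applies policies asynchronously, so actions can lag.

```python
from datetime import datetime, timedelta, timezone


def archive_rule_matches(
    blob_path: str,
    last_modified: datetime,
    now: datetime,
    prefixes: list[str],
    days: int,
) -> bool:
    """Model a tierToArchive rule with a prefix filter and an age condition.

    blob_path is "<container>/<blob name>", matching how prefixMatch is written.
    An empty prefix list means the rule is not filtered by prefix.
    """
    prefix_ok = not prefixes or any(blob_path.startswith(p) for p in prefixes)
    # Age is measured from last modification, not from creation.
    age_ok = (now - last_modified) > timedelta(days=days)
    return prefix_ok and age_ok


if __name__ == "__main__":
    now = datetime(2023, 2, 15, tzinfo=timezone.utc)
    path = "raw/sensor/2023/01/01/data.parquet"
    # Created 45 days ago but rewritten 10 days ago: the 30-day rule does not fire yet.
    print(archive_rule_matches(path, now - timedelta(days=10), now, ["raw/sensor/"], 30))  # False
    # Untouched for 45 days, but the rule's prefix points at another folder.
    print(archive_rule_matches(path, now - timedelta(days=45), now, ["raw/logs/"], 30))  # False
    # Untouched for 45 days and the prefix matches: the rule applies.
    print(archive_rule_matches(path, now - timedelta(days=45), now, ["raw/sensor/"], 30))  # True
```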

Question 24 (medium, multiple choice)

A company is designing a data storage solution for streaming IoT telemetry data. The data is JSON-formatted, arrives at up to 10,000 events per second, and must be stored for at least 30 days for real-time dashboards and ad-hoc querying. The solution must minimize operational overhead and query latency. Which Azure service should they use?

Question 25 (hard, multiple choice)

You are a data engineer for a financial services company. The company stores sensitive transaction data in Azure Data Lake Storage Gen2. The data is partitioned by date and loaded daily via Azure Data Factory. Recently, an audit found that the storage account allows public network access, and some containers have anonymous read access enabled. You need to secure the storage account according to the principle of least privilege while ensuring that Azure Data Factory can still load data. You must also ensure that data can be accessed by Azure Databricks for analytics. The solution must minimize administrative overhead. Which course of action should you take?

Question 26 (easy, multiple choice)

A retail company uses Azure Synapse Analytics dedicated SQL pool to store sales data. The data is loaded nightly from Azure Data Lake Storage Gen2 using PolyBase. Recently, the load process started failing with the error 'External table 'sales' is not accessible because the location does not exist or is used by another process.' You verify that the storage account, container, and file path are correct. The file is a CSV file named 'sales_20250301.csv' and it exists. Other files in the same container load successfully. What is the most likely cause of the error?

Question 27 (hard, multiple choice)

A healthcare company stores patient records in Azure Blob Storage. The compliance team requires that all data be encrypted at rest using customer-managed keys (CMK) stored in Azure Key Vault. Additionally, the storage account must be accessible only from a specific virtual network (VNet) and must support versioning to protect against accidental deletion. The storage account is currently using Microsoft-managed keys and has public network access enabled. You need to implement the required changes with minimal downtime. Which course of action should you take?

Question 28 (easy, multiple choice)

A company is designing a data storage solution for IoT device telemetry data. The data is append-only, needs to be stored cost-effectively for long-term analytics, and must support querying by device ID and timestamp. Which Azure storage solution should they use?

Question 29 (medium, multiple choice)

A data engineering team is designing a batch processing pipeline that reads from Azure Data Lake Storage Gen2, transforms data using Azure Databricks, and writes to Azure Synapse Analytics. The pipeline must process data incrementally and handle late-arriving data up to 2 hours. Which approach should they use to track processed files?
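One common way to track processed files incrementally while tolerating late arrivals is a high-water mark combined with a lookback window and a record of files already loaded. The Python sketch below assumes a hypothetical in-memory tracker and uses each file's timestamp (for example, the event hour encoded in its folder path); a real pipeline would persist this state, for example in a control table.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class FileTracker:
    """Hypothetical incremental-load tracker: watermark + lookback + processed set."""

    lateness: timedelta = timedelta(hours=2)  # how late a file may arrive
    watermark: Optional[datetime] = None  # newest file timestamp loaded so far
    processed: set[str] = field(default_factory=set)  # paths already loaded

    def select(self, listing: dict[str, datetime]) -> list[str]:
        """Return unprocessed paths whose timestamp falls inside the lookback window."""
        cutoff = None if self.watermark is None else self.watermark - self.lateness
        return sorted(
            path
            for path, stamp in listing.items()
            if path not in self.processed and (cutoff is None or stamp >= cutoff)
        )

    def commit(self, listing: dict[str, datetime], paths: list[str]) -> None:
        """Record a successful load and advance the watermark."""
        for path in paths:
            self.processed.add(path)
            if self.watermark is None or listing[path] > self.watermark:
                self.watermark = listing[path]


if __name__ == "__main__":
    t0 = datetime(2024, 5, 1, 10, 0)
    tracker = FileTracker()

    first = {"in/a.json": t0, "in/b.json": t0 + timedelta(hours=1)}
    batch1 = tracker.select(first)
    tracker.commit(first, batch1)
    print(batch1)  # ['in/a.json', 'in/b.json']; the watermark is now 11:00

    second = {
        **first,
        "in/c.json": t0 - timedelta(minutes=30),  # 90 minutes behind the watermark: still loaded
        "in/d.json": t0 - timedelta(hours=2),  # 3 hours behind: outside the 2-hour window
    }
    print(tracker.select(second))  # ['in/c.json']
```

The processed set prevents double-loading files that fall inside the overlapping lookback window, while the window bounds how far back each run has to look.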

Question 30 (hard, multiple choice)

A company is migrating an on-premises Hadoop cluster to Azure. The cluster uses Hive tables stored as Parquet files on HDFS. They want to minimize changes to existing Hive queries and continue using HiveQL. Which Azure storage solution should they choose?

Question 31 (medium, multi-select)

Which TWO of the following are recommended practices for designing a data storage solution using Azure Data Lake Storage Gen2?

Question 32 (hard, multi-select)

Which THREE of the following are valid methods to load data into Azure Synapse Analytics?

Question 33 (easy, multiple choice)

Which Azure service provides fully managed, serverless relational database capabilities for transactional workloads in a data storage solution?

Question 34 (medium, multiple choice)

You need to partition a large Azure SQL Database table by date to improve query performance and manageability. Which partitioning strategy should you use?
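For date-based partitioning in SQL Server and Azure SQL Database, a partition function is commonly declared `RANGE RIGHT` with boundary values on the first day of each period, so each boundary date falls in the partition it starts. The Python sketch below emulates how `RANGE LEFT` and `RANGE RIGHT` map a value to a 1-based partition number (the number `$PARTITION` would report); the monthly boundaries are hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Hypothetical monthly boundary values for a partition function.
BOUNDARIES = [date(2024, 1, 1), date(2024, 2, 1), date(2024, 3, 1)]


def partition_number(value: date, boundaries: list[date], range_right: bool) -> int:
    """Return the 1-based partition a value maps to.

    RANGE RIGHT: each boundary value belongs to the partition on its right.
    RANGE LEFT:  each boundary value belongs to the partition on its left.
    N boundaries always produce N + 1 partitions.
    """
    if range_right:
        return bisect_right(boundaries, value) + 1
    return bisect_left(boundaries, value) + 1


if __name__ == "__main__":
    for d in (date(2023, 12, 31), date(2024, 2, 1), date(2024, 2, 15), date(2024, 3, 5)):
        print(d, "RIGHT ->", partition_number(d, BOUNDARIES, True),
              "LEFT ->", partition_number(d, BOUNDARIES, False))
```

With `RANGE LEFT`, 1 February lands in the same partition as January's rows, which is why first-of-month boundaries pair naturally with `RANGE RIGHT`.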

Question 35 (hard, multiple choice)

You are designing a data storage solution for a global IoT application that ingests millions of events per second. The data is write-heavy with occasional reads for real-time dashboards. Which Azure storage option and configuration would provide the lowest latency writes with high throughput?

Question 36 (medium, multi-select)

Which of the following are valid methods to secure data at rest in Azure Data Lake Storage Gen2? (Choose two.)

Question 37 (hard, multi-select)

You are designing a data lake architecture using Azure Data Lake Storage Gen2. You need to optimize query performance for Azure Synapse Analytics serverless SQL. Which three design considerations should you follow? (Choose three.)

Question 38 (medium, multiple choice)

Match each Azure data storage service to its primary use case.

Question 39 (easy, multiple choice)

You are designing a data storage solution for IoT sensor data. The data is written thousands of times per second and requires low-latency reads for real-time dashboards. Which Azure storage solution should you use?

Question 40 (medium, multiple choice)

Your company stores sensitive customer data in Azure SQL Database. You need to encrypt the data at rest and ensure that only your application can decrypt it, so that the plaintext is hidden even from database administrators. What should you implement?

Question 41 (hard, multiple choice)

You are designing a data lake on Azure Data Lake Storage Gen2. The data will be used by both batch processing (Spark) and interactive querying (Azure Synapse Serverless SQL). The data is partitioned by date and stored as Parquet. What is the optimal folder structure to minimize cross-partition scans for both workloads?

Question 42 (medium, multi-select)

You are designing a hybrid data storage architecture for a global e-commerce platform. Which two Azure services should you combine to achieve low-latency read access for users worldwide and durable archival storage for compliance?

Question 43 (hard, multi-select)

Your company stores JSON documents in Azure Cosmos DB Core (SQL) API. You need to improve query performance for a common filter on the 'status' field and a sort on 'timestamp'. Which three actions should you take?

Question 44 (hard, multiple choice)

Match each Azure storage service to its primary use case.

Question 45 (easy, multiple choice)

Which Azure storage solution is best suited for storing large volumes of unstructured data, such as log files and media files, and supports both hierarchical namespace and POSIX-like access control lists?

Question 46 (medium, multiple choice)

You are designing a data storage solution for a retail company that needs to store transaction data that is frequently updated and requires strong consistency. The solution must support complex queries and joins across multiple tables. Which Azure data service should you recommend?

Question 47 (hard, multi-select)

A multinational corporation is designing a data lake on Azure Data Lake Storage Gen2. The data must be accessible from multiple regions with low latency, but only one region needs writable access. The solution must also comply with data residency requirements. Which two features or configurations should be implemented? (Choose two.)

Question 48 (medium, drag order)

Drag and drop the steps to configure an Azure Synapse Analytics serverless SQL pool to query data in Azure Data Lake Storage Gen2 into the correct order.

Question 49 (medium, drag order)

Drag and drop the steps to configure an Azure Databricks autoscaling cluster for ETL workloads into the correct order.

Question 50 (medium, drag order)

Drag and drop the steps to implement Azure Data Lake Storage Gen2 lifecycle management to move data to cool and archive tiers into the correct order.

Question 51 (medium, matching)

Match each Azure service to its primary purpose in a data engineering pipeline.

Scalable data lake for analytics workloads

Unified analytics platform with SQL and Spark

Cloud-based ETL and data integration service

Real-time stream processing service

Apache Spark-based analytics platform

Question 52 (medium, matching)

Match each Azure security feature to its description.

Role-based access control for Azure resources

Cloud-based identity and access management service

Manage cryptographic keys and secrets

Private connectivity to Azure services over VNet

Question 53 (medium, matching)

Match each data storage format to its characteristic.

Columnar storage format optimized for analytics

Row-based format with schema embedded

Columnar format with high compression

ACID transactions on data lakes

Question 54 (easy, multiple choice)

A company wants to ingest streaming data from IoT devices into Azure for real-time analytics. The data must be available for immediate querying and also stored long-term in a cost-effective format. Which Azure service should be used as the primary ingestion endpoint?

Question 55 (medium, multiple choice)

A data engineer is designing a solution to store historical sales data for a retail company. The data is append-only and accessed infrequently for compliance reports. The solution must minimize storage costs while allowing retrieval within 24 hours. Which storage tier should be used for the data?

Question 56 (hard, multiple choice)

A company uses Azure Synapse Analytics dedicated SQL pool for data warehousing. They notice that queries against a large fact table are slow. The table is hash-distributed on ProductID, but many queries filter on OrderDate. What should the data engineer do to improve query performance?

Question 57 (medium, multi-select)

Which TWO Azure services can be used to implement a data lake architecture for storing structured, semi-structured, and unstructured data?

Question 58 (hard, multi-select)

Which THREE factors should be considered when designing a partitioning strategy for a large fact table in Azure Synapse Analytics dedicated SQL pool?

Question 59 (medium, multiple choice)

A data engineer needs to store semi-structured JSON logs for analysis using Azure Synapse Serverless SQL. Which file format should be used for optimal query performance?

Question 60 (hard, multiple choice)

A company is migrating its on-premises SQL Server data warehouse to Azure Synapse Analytics. They have a fact table with 2 billion rows and 30 columns. The table is frequently joined on CustomerID and filtered on OrderDate. What is the recommended table design?
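A dedicated SQL pool spreads every table across 60 distributions, and a hash-distributed table sends all rows with the same distribution-column value to the same distribution. If two tables are hash-distributed on the join column (here, CustomerID), matching rows are already co-located and the join needs no data movement. The Python sketch below models that with MD5 from `hashlib` as a stand-in hash; the engine's real hash function is internal and different, so only the properties carry over, not the specific distribution numbers.

```python
import hashlib
from collections import Counter

DISTRIBUTIONS = 60  # a dedicated SQL pool always has 60 distributions


def distribution_for(key) -> int:
    """Map a distribution-column value to a distribution number (0-59).

    MD5 is a stand-in; the engine's real hash function is internal.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % DISTRIBUTIONS


def rows_per_distribution(keys) -> Counter:
    """Count how many rows land in each distribution for the given key values."""
    return Counter(distribution_for(k) for k in keys)


if __name__ == "__main__":
    # Same key, same distribution: fact and dimension rows for one customer meet locally.
    print(distribution_for(1001) == distribution_for(1001))  # True

    # High-cardinality key such as CustomerID: rows spread over many distributions.
    print(len(rows_per_distribution(range(10_000))))

    # Low-cardinality key (3 status values): at most 3 distributions hold all rows.
    print(len(rows_per_distribution(["open", "shipped", "closed"] * 3_000)))
```

The low-cardinality case shows why a column with few distinct values makes a poor distribution key: most distributions sit idle while a few do all the work.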

Question 61 (easy, multiple choice)

A data engineer is setting up Azure Data Lake Storage Gen2 for a new project. The security requirement is to prevent direct access to the storage account from the internet while allowing access from a specific virtual network. Which network security feature should be enabled?

Question 62 (medium, multiple choice)

A data engineer is designing a solution that uses Azure Data Factory to copy data from an on-premises SQL Server to Azure Synapse Analytics. The data transfer must be encrypted in transit. Which property should be configured in the linked service?

Question 63 (easy, multi-select)

Which TWO are valid methods to load data into Azure Synapse Analytics dedicated SQL pool?

Question 64 (hard, multi-select)

Which THREE are best practices for optimizing query performance in Azure Synapse Analytics dedicated SQL pool?

Question 65 (hard, multiple choice)

A company uses Azure Data Lake Storage Gen2 with a hierarchical namespace. They need to secure access to specific directories using RBAC roles. Which RBAC role should be assigned to a user to grant read and write access to a specific folder without giving access to other folders in the same container?

Question 66 (medium, multiple choice)

Your company has an Azure Synapse Analytics dedicated SQL pool. You need to implement a solution that automatically moves data between the 'PRIMARY' filegroup and a secondary filegroup based on data age, without manual intervention. Which feature should you use?

Question 67 (hard, multiple choice)

You are designing a data storage solution for an Azure Data Lake Storage Gen2 account that will store sensitive customer data. The solution must enforce that all data is encrypted at rest using customer-managed keys (CMK) stored in Azure Key Vault. Additionally, you need to prevent data from being accessed by any Azure service except Azure Synapse Analytics. Which combination of configurations should you implement?

Question 68 (easy, multiple choice)

You are designing a data storage solution for a retail company. The data includes transactional data that requires low-latency queries (under 10 milliseconds) and large historical data for analytics. The solution must minimize storage costs. Which approach should you recommend?

Question 69 (medium, multiple choice)

You are using Azure Synapse Analytics serverless SQL pool to query data in Parquet files stored in Azure Data Lake Storage Gen2. The queries are slow when filtering on a date column. You need to improve query performance without changing the data structure. What should you do?

Question 70 (hard, multiple choice)

You have an Azure Data Factory pipeline that loads data from an on-premises SQL Server to an Azure Synapse Analytics dedicated SQL pool. The pipeline uses a staging Azure Blob Storage account. Recently, the pipeline has been failing with timeout errors. You need to ensure the pipeline completes successfully within the scheduled window. What should you do?

Question 71 (easy, multiple choice)

You need to design a storage solution for streaming data from IoT devices. The solution must support real-time analytics and long-term storage for historical analysis. Which combination of Azure services should you use?

Question 72 (medium, multiple choice)

You are designing a data storage solution for a healthcare organization that stores patient records. The solution must comply with HIPAA and support point-in-time restore (PITR) for the last 35 days. The data is frequently accessed for reporting. Which Azure data service should you use?

Question 73 (hard, multiple choice)

Your organization uses Azure Synapse Analytics dedicated SQL pool. You need to implement a solution that reduces storage costs for historical data that is rarely accessed but must be available for querying within minutes. The solution should not require application changes. What should you do?

Question 74 (easy, multiple choice)

You need to store semi-structured JSON data from a web application. The data schema may change over time. The solution must support low-latency queries and be globally distributed. Which Azure data service should you use?

Question 75 (medium, multi-select)

You are designing a data storage solution for a manufacturing company that collects sensor data from machines. The data is stored in Azure Data Lake Storage Gen2. You need to ensure that the solution can handle large volumes of streaming data (up to 100 MB/s) and provide real-time dashboards. Which TWO services should you include?

Question 76 (hard, multi-select)

You are designing a data storage solution for a financial services company. The solution must meet the following requirements: store transaction data for 7 years for regulatory compliance, support point-in-time restore (PITR) for the last 30 days, and minimize storage costs for historical data. Which THREE actions should you take?

Question 77 (easy, multi-select)

You need to design a storage solution for a data lake that will be used by multiple teams for analytics. The solution must support fine-grained access control, versioning of files, and integration with Azure Purview for data cataloging. Which THREE features should you enable in Azure Data Lake Storage Gen2?

Question 78 (medium, multiple choice)

You are designing a data storage solution for a global e-commerce company. The company needs to store clickstream data from millions of users with high write throughput and low-latency reads for real-time analytics. The data is semi-structured and includes nested JSON objects. Which Azure data store should you recommend?

Question 79 (hard, multiple choice)

You are migrating a large on-premises SQL Server database to Azure Synapse Analytics. The database includes tables with up to 500 million rows and frequent updates. You need to minimize data movement during the migration while ensuring optimal query performance in the dedicated SQL pool. Which table design strategy should you use?

Question 80 (easy, multiple choice)

Your company stores sensitive customer data in Azure Data Lake Storage Gen2. You need to implement a security solution that prevents unauthorized access from the public internet while allowing access from specific Azure services and on-premises networks. Which feature should you configure?

Question 81 (medium, multiple choice)

You are designing a data lake architecture using Azure Data Lake Storage Gen2. The data will be ingested from multiple sources with varying schemas. You need to organize the data in a way that supports both batch and streaming analytics while maintaining data lineage. Which folder structure convention should you use?

Question 82 (hard, multiple choice)

You are tuning a dedicated SQL pool in Azure Synapse Analytics. A query that joins two large tables (fact_sales and dim_product) is slow. The fact_sales table is hash-distributed on product_id, and dim_product is replicated. You notice that the query plan shows a shuffle move. What is the most likely cause?

Question 83 (easy, multiple choice)

You need to store data that is rarely accessed but must be retained for 10 years for compliance. The data will be accessed occasionally for audits. Which Azure storage tier is the most cost-effective?

Question 84 (medium, multiple choice)

You are designing a change data capture (CDC) solution to incrementally load data from an on-premises SQL Server database to Azure Synapse Analytics. The source tables have no timestamp columns and you cannot modify the schema. Which Azure service should you use?

Question 85 (medium, multiple choice)

Your team needs to provide near-real-time analytics on IoT sensor data streaming into Azure Event Hubs. The data must be stored in Azure Data Lake Storage Gen2 in Parquet format, partitioned by date and device ID. Which architecture should you implement?

Question 86 (hard, multiple choice)

You are responsible for securing an Azure Synapse Analytics workspace. The workspace contains dedicated SQL pools and serverless SQL pools. You need to ensure that only users with specific Microsoft Entra ID roles can query serverless SQL pools, while dedicated SQL pools use SQL authentication. What should you do?

Question 87 (medium, multi-select)

Which TWO of the following are valid methods to load data into a dedicated SQL pool in Azure Synapse Analytics?

Question 88 (hard, multi-select)

Which THREE of the following are best practices for designing tables in a dedicated SQL pool in Azure Synapse Analytics?

Question 89 (easy, multi-select)

Which TWO of the following Azure services can be used to orchestrate data pipelines that include data transformation?

Question 90 (medium, multiple choice)

You run the T-SQL shown in the exhibit in a serverless SQL pool in Azure Synapse Analytics. The external table is created successfully, but querying it returns zero rows. The folder 'sales/products/' exists in the container and contains multiple .parquet files. What is the most likely cause?

Exhibit

```sql
CREATE EXTERNAL DATA SOURCE myDataSource
WITH (
  LOCATION = 'abfss://container@storageaccount.dfs.core.windows.net',
  CREDENTIAL = myCredential
);

CREATE EXTERNAL FILE FORMAT myFileFormat
WITH (
  FORMAT_TYPE = PARQUET,
  DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);

CREATE EXTERNAL TABLE myExternalTable
(
  ProductID int,
  ProductName varchar(100),
  Price decimal(10,2)
)
WITH (
  LOCATION = 'sales/products/',
  DATA_SOURCE = myDataSource,
  FILE_FORMAT = myFileFormat
);
```

Question 91 (hard, multiple choice)

You need to grant a service principal permission to write data to a specific container in Azure Data Lake Storage Gen2, but not to delete blobs. The JSON in the exhibit shows the built-in role 'Storage Blob Data Contributor', which includes the delete permission in its DataActions. What should you do?

Exhibit

```json
{
  "RoleName": "Storage Blob Data Contributor",
  "Type": "BuiltInRole",
  "Description": "Allows for read, write, and delete access to Azure Storage containers and blobs.",
  "Actions": [
    "Microsoft.Storage/storageAccounts/blobServices/containers/read",
    "Microsoft.Storage/storageAccounts/blobServices/containers/write",
    "Microsoft.Storage/storageAccounts/blobServices/containers/delete"
  ],
  "NotActions": [],
  "DataActions": [
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete"
  ],
  "NotDataActions": [],
  "AssignableScopes": ["/subscriptions/..."]
}
```
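Azure RBAC derives a role's effective data-plane permissions as its DataActions minus its NotDataActions. The sketch below models that subtraction for a hypothetical custom role that grants every blob data action except delete. `fnmatchcase` approximates Azure's `*` wildcard, and the role name is made up for illustration.

```python
from fnmatch import fnmatchcase


def effective_data_actions(role: dict, requested: list) -> set:
    """Return the requested data actions that a role actually grants.

    An action is granted when it matches an entry in DataActions and does
    not match any entry in NotDataActions ('*' treated as a wildcard).
    """
    granted = set()
    for action in requested:
        allowed = any(fnmatchcase(action, p) for p in role.get("DataActions", []))
        excluded = any(fnmatchcase(action, p) for p in role.get("NotDataActions", []))
        if allowed and not excluded:
            granted.add(action)
    return granted


BLOBS = "Microsoft.Storage/storageAccounts/blobServices/containers/blobs"

# Hypothetical custom role: all blob data actions except delete.
custom_role = {
    "RoleName": "Blob Writer No Delete",  # hypothetical name
    "DataActions": [f"{BLOBS}/*"],
    "NotDataActions": [f"{BLOBS}/delete"],
}

requested = [f"{BLOBS}/read", f"{BLOBS}/write", f"{BLOBS}/delete"]
print(sorted(effective_data_actions(custom_role, requested)))
```

NotDataActions is not a deny rule. If the same principal also holds another role that includes delete, the delete is still allowed.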

Question 92 (easy, multiple choice)

You run the query shown in the exhibit against a table named 'visits' in a dedicated SQL pool. The table has 1 billion rows and is hash-distributed on user_id. The query takes a long time to run. What is the most likely reason?

Exhibit

```sql
SELECT
  COUNT(DISTINCT user_id) AS unique_users,
  COUNT(DISTINCT session_id) AS unique_sessions
FROM visits
WHERE date BETWEEN '2024-01-01' AND '2024-01-31';
```
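Because visits is hash-distributed on user_id, all rows for a given user sit in one distribution. Distinct user counts can therefore be computed per distribution and simply summed. That shortcut does not hold for a column the table is not distributed on: the same value can appear in several distributions, so rows have to be moved before they can be deduplicated globally. The sketch below shows the difference on hypothetical rows. Python's `hash()` stands in for the real distribution hash.

```python
from collections import defaultdict

NUM_DISTRIBUTIONS = 60  # fixed number of distributions in a dedicated SQL pool


def local_distinct_counts(rows, dist_key, column):
    """Distinct count of `column` inside each distribution.

    Rows are placed by hashing `dist_key`; Python's hash() of an int is a
    stand-in for the real distribution hash.
    """
    per_distribution = defaultdict(set)
    for row in rows:
        per_distribution[hash(row[dist_key]) % NUM_DISTRIBUTIONS].add(row[column])
    return [len(values) for values in per_distribution.values()]


# Hypothetical rows: session "a" appears under two different users.
visits = [
    {"user_id": 1, "session_id": "a"},
    {"user_id": 1, "session_id": "b"},
    {"user_id": 2, "session_id": "a"},
    {"user_id": 3, "session_id": "c"},
]

# Summing per-distribution counts is exact for the distribution column...
users_summed = sum(local_distinct_counts(visits, "user_id", "user_id"))
users_true = len({v["user_id"] for v in visits})
# ...but over-counts for a column whose values span distributions.
sessions_summed = sum(local_distinct_counts(visits, "user_id", "session_id"))
sessions_true = len({v["session_id"] for v in visits})
print(users_summed, users_true, sessions_summed, sessions_true)  # 3 3 4 3
```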

Question 93 (medium, multiple choice)

You are designing a data storage solution for a real-time analytics application that ingests IoT sensor data. The data must be stored in a format that supports both streaming ingestion and batch processing with low latency for queries. Which Azure storage solution should you use?

Question 94 (hard, multi-select)

You are designing a Delta Lake architecture in Azure Synapse Analytics. Which TWO practices should you follow to ensure ACID transactions and data consistency?

Question 95 (hard, multiple choice)

Your company uses Azure Synapse Analytics dedicated SQL pool for a data warehouse. You notice that queries on a large fact table are slow. The table is hash-distributed on CustomerID and has 60 distributions. After reviewing the query plan, you see that many queries filter on OrderDate. You want to improve query performance without redesigning the table. What should you do?

Question 96 (easy, multiple choice)

You need to store semi-structured JSON data from a web application that requires low-latency reads and writes at a global scale. The data must be indexed automatically and support SQL-like queries. Which Azure data store should you use?

Question 97 (medium, multi-select)

You are implementing a data lake using Azure Data Lake Storage Gen2. Which THREE actions should you take to secure the data at rest and in transit?

Question 98 (hard, multiple choice)

You are troubleshooting a slow-running query in Azure Synapse Analytics dedicated SQL pool. The query joins a large fact table (hash-distributed on ProductID) with a small dimension table (replicated). Upon reviewing the query plan, you see a 'ShuffleMove' operation. What is the most likely cause of the slow performance?

Question 99 (easy, multiple choice)

You need to design a data storage solution for a batch processing pipeline that processes petabytes of data daily. The data is stored in Parquet format and must be accessible by both Azure Databricks and Azure Synapse Analytics. Which storage solution should you recommend?

Question 100 (medium, multiple choice)

Your team is migrating an on-premises SQL Server data warehouse to Azure Synapse Analytics. The source has a fact table with 500 million rows and several dimension tables. You need to choose the best distribution strategy for the fact table to minimize data movement during joins. Which distribution type should you use?

Question 101 (hard, multi-select)

You are designing a data storage solution that must support both operational and analytical workloads using a single copy of data. Which THREE technologies should you consider?

Question 102 (easy, multiple choice)

You need to store log files from multiple applications in a central location for long-term retention and occasional analysis. The data is rarely accessed after 30 days. Which storage solution should you use to minimize cost?

Question 103 (easy, multi-select)

You are designing a data storage solution for a real-time dashboard that displays streaming data from Azure Event Hubs. The data must be stored in a format that supports both real-time and batch analytics with minimal latency. Which TWO technologies should you use?

Question 104 (hard, multiple choice)

You are using Azure Synapse Analytics serverless SQL pool to query Parquet files in Azure Data Lake Storage Gen2. The query is slow and you suspect that the file layout is not optimized. You examine the files and find that each file is 50 MB. What should you do to improve query performance?
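One way to picture file compaction is to group the small files into batches, each of which would be rewritten as a single larger Parquet file. The greedy sketch below assumes a 1 GB target. That target sits inside the roughly 100 MB to 10 GB file-size range recommended in Microsoft's serverless SQL pool best-practices guidance. The file sizes are hypothetical, and the actual rewrite (for example, with Spark) is out of scope.

```python
def plan_compaction(file_sizes_mb, target_mb=1024):
    """Greedily group small files into batches of at most ~target_mb each.

    Each batch would be rewritten as one larger Parquet file. The 1 GB
    target is an assumption for illustration.
    """
    batches, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb, reverse=True):
        # Start a new batch when adding this file would exceed the target.
        if current and current_size + size > target_mb:
            batches.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        batches.append(current)
    return batches


# 100 hypothetical 50 MB files.
batches = plan_compaction([50] * 100)
print(len(batches), [sum(b) for b in batches])  # 5 [1000, 1000, 1000, 1000, 1000]
```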

Question 105 (easy, multiple choice)

Your company is migrating an on-premises SQL Server database to Azure SQL Database. The database includes a large fact table with hourly updates. You need to minimize downtime during migration. Which Azure service should you use to replicate data continuously?

Question 106 (medium, multiple choice)

You are designing a data lake on Azure Data Lake Storage Gen2. The data includes customer PII that must be encrypted at rest using customer-managed keys. Which feature should you enable?

Question 107 (hard, multiple choice)

You have an Azure Synapse Analytics dedicated SQL pool with a large fact table partitioned by date. As data grows, query performance on recent data degrades. You need to improve performance for queries filtering on the current month without affecting queries on older data. What should you do?

Question 108 (easy, multiple choice)

You are designing a data storage solution for real-time streaming data from IoT devices. The data must be stored in its original format for immediate processing and later transformed for analytics. Which Azure service should you use for raw data ingestion?

Question 109 (medium, multiple choice)

Your team uses Azure Databricks to process data stored in Azure Data Lake Storage Gen2. You need to ensure that only authorized users can access the data and that access is audited. What should you implement?

Question 110 (hard, multiple choice)

You have an Azure Synapse Analytics dedicated SQL pool with a table that uses hash distribution on CustomerID. You notice that queries joining this table with another table on OrderDate are slow. What is the most likely cause?

Question 111 (easy, multiple choice)

You need to store JSON files from an external partner in Azure Blob Storage. The files contain sensitive financial data. Which access method provides the highest security while allowing the partner to upload files?

Question 112 (medium, multiple choice)

Your company uses Azure Data Lake Storage Gen2 for a data lake. You need to implement a folder structure that separates data by sensitivity level. Which access control method should you use?

Question 113 (hard, multiple choice)

You are troubleshooting slow COPY INTO performance in Azure Synapse Analytics dedicated SQL pool when loading Parquet files from Azure Data Lake Storage Gen2. The files are 1 GB each. What should you do to improve performance?

Question 114 (medium, multi-select)

Which TWO factors should you consider when choosing between Azure SQL Database and Azure SQL Managed Instance for migrating a legacy application? (Choose two.)

Question 115 (hard, multi-select)

Which THREE considerations are important when designing a table distribution strategy for an Azure Synapse Analytics dedicated SQL pool? (Choose three.)

Question 116 (easy, multi-select)

Which TWO features are available in Azure Data Lake Storage Gen2 but not in Azure Blob Storage? (Choose two.)

Question 117 (medium, multiple choice)

You are designing a data storage solution for a retail company that expects high volumes of small, time-series sensor data from thousands of IoT devices. The data must be stored cost-effectively and queried by time range with low latency. Which Azure data store should you recommend?

Question 118 (hard, multiple choice)

A multinational bank needs to store customer transaction records for 10 years to meet regulatory compliance. The data is rarely accessed after the first year. The solution must minimize storage costs while allowing queries on recent data with low latency. Which tiering strategy should you implement?

Question 119 (easy, multiple choice)

You are designing a data lake for a manufacturing company that will store sensor readings in Parquet format. The data will be used by data scientists for batch training and by analysts for ad-hoc queries. Which Azure service should you use as the primary storage layer?

Question 120 (medium, multiple choice)

Your company stores sensitive customer data in Azure Data Lake Storage Gen2. You need to ensure that only authorized users can access the data, and that access is audited. Which approach should you use to control access to the data lake?

Question 121 (hard, multiple choice)

You are migrating an on-premises SQL Server database to Azure. The database has a large fact table (500 GB) and several dimension tables (10 GB total). Reporting queries join the fact table with dimension tables and aggregate by date. Which Azure service and table design should you recommend to minimize query latency?

Question 122 (easy, multiple choice)

A healthcare organization needs to store electronic health records (EHR) in a format that supports schema flexibility and complex nested data. The solution must allow fast queries by patient ID and enable analytics with Azure Synapse. Which data store should you choose?

Question 123 (medium, multiple choice)

You are designing a solution for a social media company that needs to store user profile data with strong consistency and low latency (under 10 ms) for reads and writes. The data model is simple key-value with occasional queries on secondary attributes. Which Azure data store meets these requirements?

Question 124 (hard, multiple choice)

You are using an Azure Synapse Analytics dedicated SQL pool to store a large fact table partitioned by date. Queries frequently filter on a specific date range and aggregate by a column called 'product_id'. Which table distribution and indexing strategy will minimize query execution time?

Question 125 (easy, multiple choice)

A logistics company needs to store delivery tracking data that is updated frequently by multiple services. The solution must support transactions across multiple documents and provide real-time analytics. Which Azure service should you recommend?

Question 126 (medium, multi-select)

Which TWO Azure services can be used to implement a polyglot persistence architecture for an e-commerce application that requires both a relational database for orders and a document database for product catalogs?

Question 127 (hard, multi-select)

Which THREE factors should you consider when choosing a partition key (sometimes called a shard key) for Azure Cosmos DB to ensure even distribution and optimal performance?
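A rough way to compare candidate partition keys is to count how many logical partitions each one produces and how uneven those partitions are. The sketch below reports the number of distinct key values and the ratio of the largest partition to the average one. The order documents and the `country` alternative are hypothetical. A real evaluation should also weigh request volume per key, not just item counts.

```python
from collections import Counter


def partition_skew(items, key):
    """Return (distinct_keys, max/mean ratio) for a candidate partition key.

    A ratio near 1.0 means logical partitions are evenly sized; a large
    ratio points to a hot partition.
    """
    counts = Counter(item[key] for item in items)
    mean = sum(counts.values()) / len(counts)
    return len(counts), max(counts.values()) / mean


# Hypothetical order documents.
orders = [
    {"customerId": "c1", "country": "US"},
    {"customerId": "c2", "country": "US"},
    {"customerId": "c3", "country": "US"},
    {"customerId": "c4", "country": "US"},
    {"customerId": "c5", "country": "US"},
    {"customerId": "c6", "country": "DE"},
]
print(partition_skew(orders, "customerId"))  # high cardinality, even
print(partition_skew(orders, "country"))     # low cardinality, skewed
```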

Question 128 (medium, multi-select)

Which TWO Azure Blob Storage access tiers are suitable for data that must be available within milliseconds but is accessed infrequently (e.g., once per month)?

Question 129 (easy, multiple choice)

A data engineer needs to store semi-structured JSON logs from IoT devices. The data will be queried using SQL and must support high-throughput writes. Which Azure data store is most appropriate?

Question 130 (medium, multiple choice)

A company is designing a data lake on Azure Data Lake Storage Gen2. They need to enforce row-level security on the data for different departments. Which approach should they use?

Question 131 (hard, multiple choice)

A healthcare organization stores patient data in Azure SQL Database. They need to encrypt sensitive columns (e.g., SSN) such that only authorized users can decrypt the data at query time. Which feature should they use?

Question 132 (easy, multi-select)

Which TWO options are valid methods to load data from on-premises SQL Server into Azure Synapse Analytics?

Question 133 (medium, multi-select)

Which THREE considerations should be evaluated when designing a partitioning strategy for a large fact table in Azure Synapse Dedicated SQL Pool?
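Microsoft's guidance for clustered columnstore tables in a dedicated SQL pool calls for at least about 1 million rows per distribution per partition, and every table is spread over 60 distributions. The arithmetic below uses a hypothetical 2-billion-row table with 36 monthly partitions to show how quickly monthly partitioning can push each cell below that threshold. It assumes rows are spread evenly.

```python
DISTRIBUTIONS = 60             # fixed in a dedicated SQL pool
MIN_ROWS_PER_CELL = 1_000_000  # columnstore guidance per distribution per partition


def rows_per_cell(total_rows: int, partitions: int) -> float:
    """Average rows per (distribution, partition) cell, assuming even spread."""
    return total_rows / (DISTRIBUTIONS * partitions)


def max_partitions(total_rows: int) -> int:
    """Most partitions that still keep about 1M rows in every cell."""
    return total_rows // (DISTRIBUTIONS * MIN_ROWS_PER_CELL)


total = 2_000_000_000               # hypothetical fact table size
print(rows_per_cell(total, 36))     # 36 monthly partitions: below 1M per cell
print(max_partitions(total))        # 33
```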

Question 134 (hard, multi-select)

Which TWO actions should be taken to secure data at rest in Azure Data Lake Storage Gen2?

Question 135 (easy, multiple choice)

Refer to the exhibit. An Azure Policy is defined to enforce network security on storage accounts. What does this policy do?

Exhibit

```json
{
  "policyRule": {
    "if": {
      "field": "type",
      "equals": "Microsoft.Storage/storageAccounts"
    },
    "then": {
      "effect": "deny",
      "details": {
        "field": "Microsoft.Storage/storageAccounts/networkAcls.defaultAction",
        "equals": "Allow"
      }
    }
  }
}
```

Question 136 (medium, multiple choice)

Refer to the exhibit. A data engineer creates an external table in Azure Synapse Serverless SQL. Which statement about this table is correct?

Exhibit

```sql
CREATE EXTERNAL DATA SOURCE MyDataSource
WITH (
  LOCATION = 'abfss://container@storageaccount.dfs.core.windows.net',
  TYPE = HADOOP,
  CREDENTIAL = MyCredential
);

CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (
  FORMAT_TYPE = PARQUET,
  DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);

CREATE EXTERNAL TABLE dbo.Sales
(
  SaleID INT,
  ProductID INT,
  Quantity INT,
  SaleDate DATE
)
WITH (
  LOCATION = '/sales/',
  DATA_SOURCE = MyDataSource,
  FILE_FORMAT = ParquetFormat
);
```

Question 137 (hard, multiple choice)

Refer to the exhibit. A template is used to deploy an Azure Synapse Analytics workspace; the exhibit shows the workspace properties in JSON. What is the purpose of the 'purviewConfiguration' property?

Exhibit

```json
{
  "properties": {
    "dataLakeStorageAccountDetails": [
      {
        "accountUrl": "https://mystorageaccount.dfs.core.windows.net"
      }
    ],
    "defaultDataLakeStorage": {
      "accountUrl": "https://mystorageaccount.dfs.core.windows.net",
      "filesystem": "synapseworkspace"
    },
    "sqlAdministratorLogin": "adminuser",
    "sqlAdministratorLoginPassword": "",
    "managedResourceGroupName": "managedRG",
    "purviewConfiguration": {
      "purviewResourceId": "/subscriptions/sub-id/resourceGroups/rg/providers/Microsoft.Purview/accounts/purview-account"
    },
    "encryption": {
      "cmk": {
        "key": {
          "name": "cmk-key",
          "keyVaultUrl": "https://kv.vault.azure.net/"
        }
      }
    }
  }
}
```

Question 138 (easy, multiple choice)

A data engineer needs to store log data from multiple applications in Azure. The data is append-only, heavily compressed, and queried infrequently. Cost minimization is critical. Which storage solution is best?

Question 139 (medium, multiple choice)

A financial services company is migrating its data warehouse to Azure Synapse Analytics. They have a star schema with a 10-billion-row fact table and 50 dimension tables. Query performance is critical, and they need to minimize data movement during joins. Which distribution strategy should they use for the fact table?

Question 140 (hard, multiple choice)

A data engineering team uses Azure Data Factory to load data from Azure SQL Database to Azure Data Lake Storage Gen2. They notice that the pipeline runs fail intermittently due to transient errors. They need to implement a retry policy with exponential backoff. What is the most efficient way to achieve this?
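With exponential backoff, each retry waits longer than the one before, typically doubling up to a cap, often with random jitter added. The generic Python sketch below illustrates the technique only. It is not Azure Data Factory's own retry setting, and the base delay, cap, retry count, and `flaky_copy` step are all hypothetical. A production version would retry only errors known to be transient.

```python
import random
import time


def backoff_delays(retries, base=2.0, cap=60.0):
    """Seconds to wait before each retry: base * 2**attempt, capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def run_with_retry(operation, retries=4, base=2.0, cap=60.0, sleep=time.sleep):
    """Run operation(); on failure, wait with exponential backoff and retry.

    After the last retry fails, the exception is re-raised. A random jitter
    factor spreads out retries from many concurrent runs.
    """
    for delay in backoff_delays(retries, base, cap) + [None]:
        try:
            return operation()
        except Exception:
            if delay is None:  # no retries left
                raise
            sleep(delay * random.uniform(0.5, 1.0))


# Hypothetical flaky copy step: fails twice with a transient error, then succeeds.
attempts = []


def flaky_copy():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient error")
    return "copied"


result = run_with_retry(flaky_copy, sleep=lambda seconds: None)  # skip real waits
```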

Question 141 (medium, multi-select)

Which THREE factors should be considered when choosing between Azure Synapse Dedicated SQL Pool and Azure SQL Database for a data warehouse workload?

Question 142 (hard, multi-select)

Which TWO strategies can be used to optimize storage costs for historical data in Azure Data Lake Storage Gen2?

Question 143 (medium, multiple choice)

Refer to the exhibit. An ARM template deploys an Azure Synapse Analytics workspace. What is the purpose of the 'managedVirtualNetwork' property set to 'default'?

Exhibit

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.Synapse/workspaces",
      "apiVersion": "2021-06-01",
      "name": "myworkspace",
      "location": "eastus",
      "properties": {
        "defaultDataLakeStorage": {
          "accountUrl": "https://mystorageaccount.dfs.core.windows.net",
          "filesystem": "synapse"
        },
        "sqlAdministratorLogin": "admin",
        "sqlAdministratorLoginPassword": "P@ssw0rd123!",
        "managedVirtualNetwork": "default"
      },
      "identity": {
        "type": "SystemAssigned"
      }
    }
  ]
}
```

Question 144 (easy, multiple choice)

A company uses Azure Synapse Analytics dedicated SQL pool. They need to load data from Azure Data Lake Storage Gen2 (ADLS Gen2) incrementally. Which PolyBase external table configuration supports incremental loading without reprocessing historical data?
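A common way to avoid reprocessing history is a high-water mark: record the latest modification time already loaded, then pick up only newer files on the next run. The sketch below shows that watermark pattern on a hypothetical folder listing. It is a generic technique, not a PolyBase setting, and a real pipeline would persist the watermark between runs.

```python
from datetime import datetime


def select_new_files(files, watermark):
    """Return files modified after the watermark, plus the new watermark.

    `files` is a list of (path, last_modified) pairs; only files newer than
    the previous high-water mark are selected, so history is not reprocessed.
    """
    new_files = [(p, m) for p, m in files if m > watermark]
    new_watermark = max((m for _, m in new_files), default=watermark)
    return [p for p, _ in new_files], new_watermark


# Hypothetical listing of an ADLS Gen2 folder.
listing = [
    ("sales/2024/01/01/part-0.parquet", datetime(2024, 1, 1, 2, 0)),
    ("sales/2024/01/02/part-0.parquet", datetime(2024, 1, 2, 2, 0)),
    ("sales/2024/01/03/part-0.parquet", datetime(2024, 1, 3, 2, 0)),
]
to_load, watermark = select_new_files(listing, datetime(2024, 1, 1, 23, 59))
```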

Question 145 (medium, multiple choice)

You are designing a data storage solution for real-time analytics on IoT telemetry. The system must ingest 10,000 events per second and support sub-second query latency. Which Azure data store should you use?

Question 146 (hard, multiple choice)

A company uses Azure Synapse Analytics serverless SQL pool to query data in ADLS Gen2. Users report that queries against Parquet files are slow. What should you recommend to improve query performance?

Question 147 (medium, multiple choice)

You are migrating an on-premises SQL Server database to Azure Synapse Analytics dedicated SQL pool. The database includes a table with 500 million rows that is frequently queried by date range. Which distribution strategy should you use for this table?

Question 148 (easy, multiple choice)

A data engineer needs to store semi-structured JSON logs from multiple sources in Azure. The logs must be queryable using T-SQL and support schema-on-read. Which Azure service should be used?

Question 149 (hard, multiple choice)

You are designing a data lake architecture for a healthcare company. The solution must support fine-grained access control at the file level, encryption at rest and in transit, and integration with Microsoft Purview for data lineage. Which storage solution should you recommend?

Question 150 (medium, multiple choice)

A company uses Azure Synapse Analytics dedicated SQL pool. They notice that some queries are slow due to high data movement. What should you do to minimize data movement for queries that join large fact tables?

Question 151 (easy, multiple choice)

You need to store historical sales data for 10 years with infrequent queries. The storage cost must be minimized while retaining the ability to query using Azure Synapse serverless SQL pool. Which storage tier should you use?

Question 152 (hard, multiple choice)

A financial services company needs to store transaction data for audit purposes. The data must be immutable and cannot be modified or deleted for 7 years. Which Azure storage feature should be used?

Question 153 (medium, multi-select)

Which TWO actions should you take to optimize query performance in Azure Synapse Analytics dedicated SQL pool when working with large fact tables?

Question 154 (hard, multi-select)

Which THREE components are required to implement a modern data warehouse architecture on Microsoft Azure using Azure Synapse Analytics?

Question 155 (easy, multi-select)

Which TWO Azure services can be used to ingest streaming data into Azure Synapse Analytics?

Question 156 (medium, multiple choice)

You are reviewing an ARM template snippet for an Azure Blob Storage container. What is the effect of this configuration?

Exhibit

```json
{
  "type": "Microsoft.Storage/storageAccounts/blobServices/containers/immutabilityPolicies",
  "apiVersion": "2021-09-01",
  "name": "audit-container/policy",
  "properties": {
    "immutabilityPeriodSinceCreationInDays": 2555,
    "allowProtectedAppendWrites": true
  }
}
```

Question 157 (hard, multiple choice)

You are examining a T-SQL script that creates an external table in Azure Synapse serverless SQL pool. The query SELECT * FROM dbo.Sales returns zero rows, but the folder /year=2024/ in ADLS Gen2 contains Parquet files. What is the most likely cause?

Exhibit

```sql
CREATE EXTERNAL DATA SOURCE SalesData
WITH (
    LOCATION = 'https://datalakegen2.dfs.core.windows.net/sales',
    CREDENTIAL = StorageCred
);

CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);

CREATE EXTERNAL TABLE dbo.Sales (
    SaleID INT,
    ProductID INT,
    SaleDate DATE,
    Amount DECIMAL(10,2)
)
WITH (
    LOCATION = '/year=2024/',
    DATA_SOURCE = SalesData,
    FILE_FORMAT = ParquetFormat
);
```

Question 158 (hard, multiple choice)

You are a data engineer for a large e-commerce company. The company uses Azure Synapse Analytics dedicated SQL pool as its enterprise data warehouse. A new business requirement mandates that the Sales fact table, which contains 2 billion rows, must support real-time analytics with a maximum query latency of 1 second for aggregations on the most recent 24 hours of data. The table is currently hash-distributed on CustomerID and partitioned monthly by SaleDate. The current query performance for recent data is slow due to full partition scans. The data is ingested via Azure Event Hubs and processed by Azure Stream Analytics, which writes to staging tables every minute. You need to redesign the storage to meet the latency requirement while minimizing cost and maintaining data integrity. Which approach should you take?

Question 159 (medium, multiple choice)

A company is designing a data lake in Azure Data Lake Storage Gen2 (ADLS Gen2) to store IoT sensor data from millions of devices. The data is ingested in Parquet format, partitioned by date and device ID. The analytics team frequently queries the last 30 days of data for specific device types. Which partition strategy minimizes query cost and optimizes performance?

Question 160 (hard, multiple choice)

You are designing a solution to store semi-structured JSON logs from a web application in Azure Cosmos DB. The logs are written once and rarely read. The application writes up to 10,000 documents per second, and each document is about 2 KB. You need to minimize RU/s cost. Which API and indexing policy should you choose?

Question 161 (easy, multiple choice)

Your company uses Azure Synapse Analytics dedicated SQL pool to store a fact table with 2 billion rows. You need to improve query performance for a workload that frequently aggregates sales by date and product category. Which distribution and index type should you use?

Question 162 (medium, multiple choice)

You are designing a change data capture (CDC) pipeline to ingest incremental changes from an on-premises SQL Server database into Azure Data Lake Storage Gen2. The pipeline must run every 5 minutes and handle high-volume DML changes. Which Azure service should you use to capture the changes with low latency?

Question 163 (hard, multiple choice)

You have an Azure Synapse Analytics dedicated SQL pool with a large fact table partitioned by month. You notice that queries filtering on a specific month still scan all partitions. The table has a clustered columnstore index. What is the most likely cause?
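Partition elimination works by mapping a filter's values onto the table's partition boundaries. The sketch below mirrors the RANGE LEFT and RANGE RIGHT rules of a dedicated SQL pool's `PARTITION (... RANGE ... FOR VALUES (...))` clause using `bisect`, with hypothetical monthly boundaries for 2024. If a predicate cannot be resolved against the boundaries this way (for example, because it is on another column), every partition has to be scanned.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def partition_number(value, boundaries, range_right=True):
    """1-based partition number for a value, mirroring RANGE LEFT/RIGHT rules.

    RANGE RIGHT: a boundary value belongs to the partition on its right.
    RANGE LEFT:  a boundary value belongs to the partition on its left.
    """
    if range_right:
        return bisect_right(boundaries, value) + 1
    return bisect_left(boundaries, value) + 1


def partitions_for_range(start, end, boundaries, range_right=True):
    """Partitions that a filter on [start, end] must touch after elimination."""
    first = partition_number(start, boundaries, range_right)
    last = partition_number(end, boundaries, range_right)
    return list(range(first, last + 1))


# Hypothetical boundaries: the first day of each month in 2024 (RANGE RIGHT).
# 12 boundaries give 13 partitions; partition 1 holds everything before 2024.
boundaries = [date(2024, m, 1) for m in range(1, 13)]
print(partitions_for_range(date(2024, 3, 1), date(2024, 3, 31), boundaries))  # [4]
```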

Question 164 (easy, multiple choice)

You need to store streaming data from Azure Event Hubs into Azure Data Lake Storage Gen2 in near real-time. The data should be stored in Avro format with a folder structure: /raw/{eventhub}/{yyyy}/{MM}/{dd}/{HH}/{mm}. Which Azure service should you use to ingest the data?
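In the question's folder pattern, `{MM}` is the two-digit month and `{mm}` is the two-digit minute. The sketch below formats an event timestamp into that path, assuming UTC times. The event hub name `telemetry` is hypothetical.

```python
from datetime import datetime, timezone


def raw_path(eventhub: str, ts: datetime) -> str:
    """Build /raw/{eventhub}/{yyyy}/{MM}/{dd}/{HH}/{mm} for an event time."""
    # %m = zero-padded month, %M = zero-padded minute, %H = 24-hour clock.
    return f"/raw/{eventhub}/{ts:%Y/%m/%d/%H/%M}"


print(raw_path("telemetry", datetime(2024, 3, 5, 7, 9, tzinfo=timezone.utc)))
# /raw/telemetry/2024/03/05/07/09
```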

Question 165 (medium, multiple choice)

You are designing a solution to store large amounts of log data that is written once and accessed rarely. The data must be retained for 7 years for compliance. After 30 days, the data should be moved to a lower-cost storage tier. After 1 year, the data should be archived. Which Azure Storage lifecycle management policy should you implement for an Azure Data Lake Storage Gen2 account?
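The schedule in the question can be written as a simple rule on blob age. The sketch below uses strict "greater than" comparisons, as lifecycle management rules based on `daysAfterModificationGreaterThan` do. Two choices are assumptions rather than requirements stated in the question: Cool as the lower-cost tier, and deleting data once the 7-year (about 2,555-day) retention period has passed.

```python
RETENTION_DAYS = 7 * 365  # about 2,555 days


def target_tier(age_days: int) -> str:
    """Tier a blob belongs in under the schedule described in the question.

    Hot for the first 30 days, Cool until 1 year, Archive afterwards.
    Choosing Cool and deleting after the retention period are assumptions.
    """
    if age_days > RETENTION_DAYS:
        return "delete"
    if age_days > 365:
        return "archive"
    if age_days > 30:
        return "cool"
    return "hot"


print([target_tier(d) for d in (10, 45, 400, 3000)])  # ['hot', 'cool', 'archive', 'delete']
```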

Question 166 (hard, multiple choice)

You are designing a data storage solution for a global e-commerce company. The company's analytics team uses Azure Synapse Serverless SQL to query Parquet files in ADLS Gen2. The data is partitioned by year, month, and day. The team frequently runs queries that aggregate sales by product category across the last 30 days. However, the queries are slow and scan too much data. What should you do to improve performance?

Question 167 (easy, multiple choice)

Your company uses Azure Cosmos DB for NoSQL to store user profiles. The application frequently reads profiles by user ID (the partition key). Occasionally, the application needs to query by email address, which is not part of the partition key. What should you do to optimize the occasional queries by email?

Question 168 (medium, multi-select)

Which TWO factors should you consider when choosing between Azure SQL Database and Azure Cosmos DB for a transactional workload that requires low-latency reads and writes globally?

Question 169 (hard, multi-select)

Which THREE statements are true about partitioning in Azure Synapse Analytics dedicated SQL pool?

Question 170 (medium, multi-select)

Which TWO Azure services can be used to implement a data lakehouse architecture with Delta Lake?

Question 171 (hard, multiple choice)

Refer to the exhibit. You are reviewing an Azure Cosmos DB for NoSQL container configuration. The container stores customer orders. The application frequently queries orders by orderId. However, these queries are consuming high RUs and are slow. What is the most likely cause?

Exhibit

{
  "name": "CustomerOrders",
  "properties": {
    "ResourceType": "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers",
    "Options": {
      "throughput": 400
    },
    "Resource": {
      "id": "CustomerContainer",
      "partitionKey": {
        "paths": ["/customerId"],
        "kind": "Hash"
      },
      "indexingPolicy": {
        "indexingMode": "consistent",
        "automatic": true,
        "includedPaths": [
          {
            "path": "/*"
          }
        ],
        "excludedPaths": [
          {
            "path": "/\"_etag\"/?"
          }
        ]
      },
      "uniqueKeyPolicy": {
        "uniqueKeys": [
          {
            "paths": ["/orderId"]
          }
        ]
      }
    }
  }
}
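A toy model illustrates why a filter on a property other than the partition key fans out: without the `customerId` value, the query cannot be routed to one partition and must visit all of them. It assumes 4 physical partitions and an MD5-based stand-in hash; Cosmos DB's real hashing and partition counts are internal, and every name here is hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Optional

NUM_PARTITIONS = 4  # hypothetical physical partition count


def partition_for(customer_id: str) -> int:
    # Stand-in hash; Cosmos DB's actual partition hashing is internal.
    return int(hashlib.md5(customer_id.encode()).hexdigest(), 16) % NUM_PARTITIONS


def load(orders: list[dict]) -> dict[int, list[dict]]:
    """Place each order in the partition chosen by its customerId (the partition key)."""
    parts: dict[int, list[dict]] = defaultdict(list)
    for order in orders:
        parts[partition_for(order["customerId"])].append(order)
    return parts


def find_order(parts, order_id: str, customer_id: Optional[str] = None):
    """Return (matching orders, number of partitions visited)."""
    if customer_id is not None:
        targets = [partition_for(customer_id)]  # routed to a single partition
    else:
        targets = list(range(NUM_PARTITIONS))  # fan-out: every partition is queried
    hits = [o for p in targets for o in parts.get(p, []) if o["orderId"] == order_id]
    return hits, len(targets)


if __name__ == "__main__":
    data = load([{"customerId": f"c{i}", "orderId": f"o{i}"} for i in range(20)])
    print(find_order(data, "o7")[1])        # 4 partitions visited
    print(find_order(data, "o7", "c7")[1])  # 1 partition visited
```

In a real container, each extra partition visited adds request-unit cost and latency, which grows with the number of physical partitions.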

Question 172 (hard, multiple choice)

You are a data engineer at a healthcare analytics company. The company stores patient records in an Azure Data Lake Storage Gen2 account organized as /patient/{patientId}/year={yyyy}/month={MM}/day={dd}/*.parquet. There are 10,000 patients, and each patient has about 1 GB of data per year. The data is used by data scientists who run ad-hoc queries using Azure Synapse Serverless SQL. They complain that queries scanning multiple patients over the last year take too long and scan too much data. They often need to filter by patientId and a date range. You need to improve query performance and reduce the amount of data scanned. You cannot change the folder structure because it is used by other processes. What should you do?
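A back-of-the-envelope estimate using the scenario's figures (10,000 patients, about 1 GB per patient per year) shows why pruning on both patientId and date matters. It assumes data is spread evenly across months and, for the unpruned case, that one year of data is stored; `scanned_gb` and its `years_stored` parameter are hypothetical.

```python
TOTAL_PATIENTS = 10_000  # from the scenario
GB_PER_PATIENT_YEAR = 1.0  # from the scenario


def scanned_gb(patients: int, months: int, prune_patients: bool,
               prune_dates: bool, years_stored: int = 1) -> float:
    """Approximate GB read, assuming data is spread evenly over months (hypothetical model)."""
    # Without patient pruning, every patient folder is read.
    patient_factor = patients if prune_patients else TOTAL_PATIENTS
    # Without date pruning, every stored year is read.
    year_factor = months / 12 if prune_dates else years_stored
    return patient_factor * GB_PER_PATIENT_YEAR * year_factor


if __name__ == "__main__":
    print(scanned_gb(50, 12, prune_patients=False, prune_dates=False))  # 10000.0
    print(scanned_gb(50, 12, prune_patients=True, prune_dates=True))    # 50.0
```

Under these assumptions, a 50-patient query drops from about 10 TB read to about 50 GB once both filters map onto folders.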

Question 173 (medium, multiple choice)

You are designing a near real-time analytics solution for a retail company. The company has a transactional database in Azure SQL Database that records sales transactions. The data must be available in Azure Synapse Analytics dedicated SQL pool for reporting with less than 15 minutes of latency. The data volume is about 10 GB per day. You need to design the data ingestion pipeline. You also need to ensure that the pipeline can handle schema changes (e.g., new columns added to the source table) without manual intervention. Which approach should you use?
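For sizing, the 10 GB/day volume can be converted into the amount of data each 15-minute load window must move. This assumes an even arrival rate across the day and decimal units (1 GB = 1,000 MB); `per_window_mb` is a hypothetical helper.

```python
def per_window_mb(gb_per_day: float, window_minutes: int) -> float:
    """Average MB arriving per load window, assuming an even rate and 1 GB = 1,000 MB."""
    windows_per_day = 24 * 60 / window_minutes  # 96 windows for 15-minute loads
    return gb_per_day * 1000 / windows_per_day


if __name__ == "__main__":
    print(round(per_window_mb(10, 15), 1))  # 104.2
```

Each incremental load is therefore roughly 100 MB on average, small enough that the 15-minute budget is dominated by scheduling and commit overhead rather than raw transfer.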

Question 174 (easy, multiple choice)

You are a data engineer at a financial services company. The company uses Azure Cosmos DB for NoSQL to store customer transaction data. The data is partitioned by customerId. The application team needs to run analytical queries that aggregate transactions by date across all customers. These queries are currently slow and consume high RUs. You need to enable faster analytical queries without impacting the transactional workload. What should you do?

Question 175 (medium, multiple choice)

You are designing a data storage solution for a retail company that needs to store semi-structured JSON data from IoT sensors. The data is ingested continuously and must support both real-time analytics and batch processing. Which Azure data store should you recommend?

Question 176 (hard, multiple choice)

Your team is migrating an on-premises SQL Server data warehouse to Azure Synapse Analytics. The source data includes fact tables and dimension tables with complex relationships. You need to design the storage in Azure Synapse to minimize query latency for star schema queries. Which distribution and index strategy should you use for the fact table?

Question 177 (easy, multiple choice)

You need to store log data from multiple Azure services in a single location for long-term retention and cost-effective querying. The data is append-only and rarely modified. Which storage solution should you use?

Question 178 (medium, multi-select)

You are designing a data storage solution for a healthcare company that must comply with HIPAA. The solution needs to store structured patient records and unstructured medical images. Data must be encrypted at rest and in transit. Which TWO storage solutions meet these requirements?

Question 179 (hard, multi-select)

Your company uses Azure Synapse Analytics for a data warehouse. The fact table is 500 GB and distributed by hash on CustomerID. You notice that queries joining the fact table with the Customer dimension table are slow due to data movement. The Customer dimension table is 10 GB. Which THREE actions should you take to improve query performance?
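The sketch below models why a join on the distribution key avoids data movement. A dedicated SQL pool spreads each table across 60 distributions; when both tables are hash-distributed on CustomerID, a fact row and its Customer row always land in the same distribution, whereas a hypothetical round-robin placement of the dimension scatters them. The SHA-256 hash is a stand-in, not Synapse's internal function, and the customer keys are made up.

```python
import hashlib

DISTRIBUTIONS = 60  # dedicated SQL pool tables are split into 60 distributions


def distribution_of(customer_id: str) -> int:
    # Stand-in hash for illustration; Synapse uses its own internal hash function.
    return int(hashlib.sha256(customer_id.encode()).hexdigest(), 16) % DISTRIBUTIONS


def rows_needing_movement(fact_customer_ids, dim_placement) -> int:
    """Count fact rows (hash-distributed on CustomerID) whose Customer row sits elsewhere."""
    return sum(1 for cid in fact_customer_ids if dim_placement[cid] != distribution_of(cid))


if __name__ == "__main__":
    customers = [f"C{i:05d}" for i in range(1_000)]
    fact = customers * 5  # hypothetical: 5 fact rows per customer
    hash_placed = {c: distribution_of(c) for c in customers}
    round_robin = {c: i % DISTRIBUTIONS for i, c in enumerate(customers)}
    print(rows_needing_movement(fact, hash_placed))  # 0: joins are co-located
    print(rows_needing_movement(fact, round_robin))  # most rows would need to move
```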

Question 180 (easy, multi-select)

You need to design a storage solution for IoT device telemetry data that will be queried by time range. The data is append-only and arrives at high velocity. Which TWO features should you use to optimize query performance and reduce costs?

Question 181 (hard, multiple choice)

Refer to the exhibit. You are reviewing an Azure Data Factory dataset JSON definition for a data lake. The dataset is used in a copy activity that loads sales data into Azure Data Lake Storage Gen2. The pipeline runs successfully, but you notice that the output file always overwrites the previous file named 'sales.parquet', regardless of the folderPath parameter value. What is the most likely cause?

Exhibit

{
  "dataLakeStorage": {
    "type": "AzureBlobFS",
    "linkedServiceName": "LS_ADLSGen2",
    "parameters": {
      "folderPath": {
        "value": "@dataset().folderPath",
        "type": "Expression"
      }
    },
    "typeProperties": {
      "folderPath": {
        "value": "@{dataset().folderPath}",
        "type": "Expression"
      },
      "fileName": "sales.parquet",
      "format": {
        "type": "ParquetFormat"
      }
    }
  },
  "parameters": {
    "folderPath": {
      "type": "string"
    }
  }
}

Question 182 (hard, multiple choice)

You are a data engineer for a multinational e-commerce company. The company uses Azure Synapse Analytics as its data warehouse. The current fact table, SalesFact, is distributed using hash distribution on the CustomerID column. It has 2 billion rows and is 2 TB in size. Recently, the business team has been running many queries that aggregate sales by product category and date, and these queries are experiencing high data movement and long execution times. The product dimension table (ProductDim) has 100,000 rows and is 100 MB. The date dimension table (DateDim) has 5,000 rows and is 5 MB. You need to redesign the storage to minimize data movement for these aggregation queries. You cannot change the fact table distribution key to ProductID because of other critical queries that rely on CustomerID. What should you do?

Question 183 (medium, multiple choice)

Your company is building a real-time analytics solution for monitoring manufacturing equipment. Sensors send JSON data every second to an Azure Event Hubs instance. The data must be stored in Azure Data Lake Storage Gen2 in Parquet format, partitioned by date and hour. You use Azure Stream Analytics to read from Event Hubs and write to ADLS Gen2. Currently, the output is writing many small Parquet files (under 1 MB each), which is causing performance issues when reading the data. You need to optimize the output to produce fewer, larger files while maintaining low latency. What should you do?
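The small-file problem comes from how often the writer closes a file. Below is a hypothetical size-or-time batching rule, assuming one event per second and a time check that runs only when an event arrives (a real writer would also flush on a timer). Stream Analytics Blob storage and ADLS Gen2 outputs expose comparable "Minimum rows" and "Maximum time" batch settings, but `flush_points` and its parameters are illustrative only.

```python
def flush_points(arrivals, min_rows: int, max_wait_secs: float) -> list[int]:
    """Return the row count of each file a size-or-time batching writer would produce.

    A file is closed when `min_rows` rows are buffered, or when an arriving row
    finds the oldest buffered row at least `max_wait_secs` old. Simplification:
    time is only checked on arrival, not by a background timer.
    """
    files: list[int] = []
    buffer: list[float] = []
    for t in arrivals:
        if buffer and t - buffer[0] >= max_wait_secs:
            files.append(len(buffer))  # time limit reached: close the file
            buffer = []
        buffer.append(t)
        if len(buffer) >= min_rows:
            files.append(len(buffer))  # size limit reached: close the file
            buffer = []
    if buffer:
        files.append(len(buffer))  # flush the remainder at end of stream
    return files


if __name__ == "__main__":
    one_hour = range(3600)  # one event per second, timestamps in seconds
    print(len(flush_points(one_hour, min_rows=1, max_wait_secs=300)))       # 3600
    print(flush_points(one_hour, min_rows=100_000, max_wait_secs=300)[:3])  # [300, 300, 300]
```

With a file per row, one hour produces 3,600 files; with a 300-second cap, it produces 12 files of 300 rows each, and the cap still bounds how long any event waits before it lands in storage.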

Question 184 (easy, multiple choice)

You are designing a data storage solution for a marketing analytics platform. The platform collects clickstream data from websites and needs to store it for both real-time dashboards and historical analysis. The data is semi-structured (JSON) and arrives at a rate of 10,000 events per second. You need to choose an Azure storage solution that can handle the ingestion rate, support schema-on-read, and integrate with Azure Databricks for advanced analytics. The solution must also be cost-effective for long-term storage. What should you use?

Question 185 (medium, multiple choice)

You are a data engineer for a financial services company. The company uses Azure Data Lake Storage Gen2 as its data lake. You have a directory structure where each customer has a folder containing transaction files in CSV format. The security team requires that each customer's data be accessible only to that customer's users. You need to implement fine-grained access control using Azure Data Lake Storage Gen2's POSIX-like ACLs. However, you have thousands of customers, and managing ACLs individually is not feasible. What should you do?
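A simplified model of group-based access, assuming one security group per customer folder; the folder paths, group names, and users are hypothetical. Real ADLS Gen2 ACL evaluation also involves the owning user, owning group, mask, and other entries. This sketch only shows the idea that onboarding a user means changing group membership, not editing the folder's ACL.

```python
# Hypothetical ACL table: each customer folder grants read+execute to one group.
FOLDER_ACLS = {
    "/customers/contoso": {"grp-contoso": "r-x"},
    "/customers/fabrikam": {"grp-fabrikam": "r-x"},
}

# Hypothetical group membership, managed in the identity provider, not on folders.
GROUP_MEMBERS = {
    "grp-contoso": {"alice", "carol"},
    "grp-fabrikam": {"bob"},
}


def can_read(user: str, folder: str) -> bool:
    """True if a group named on the folder's ACL grants 'r' and contains the user."""
    for group, perms in FOLDER_ACLS.get(folder, {}).items():
        if "r" in perms and user in GROUP_MEMBERS.get(group, set()):
            return True
    return False


if __name__ == "__main__":
    print(can_read("alice", "/customers/contoso"))   # True
    print(can_read("alice", "/customers/fabrikam"))  # False
    GROUP_MEMBERS["grp-fabrikam"].add("dave")        # onboard a user: no ACL edit
    print(can_read("dave", "/customers/fabrikam"))   # True
```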

Question 186 (hard, multiple choice)

Your company uses Azure Synapse Analytics for its enterprise data warehouse. The main fact table, OrdersFact, is distributed using hash on OrderID. It has 10 billion rows. The table is partitioned by month. Recently, the data engineering team added a new column, OrderStatus, that is used in many queries with filters such as WHERE OrderStatus = 'Shipped'. These queries scan the entire table because partition pruning is not effective. You need to improve query performance for these status-based queries without redesigning the entire table. What should you do?

Question 187 (easy, multiple choice)

You are tasked with designing a data storage solution for a social media analytics company. They need to store user profile data (JSON) and social media posts (text and images). The data is used for machine learning models that require fast random access to individual user profiles and the ability to run analytical queries over posts. The solution must provide low-latency reads for user profiles (milliseconds) and support for large-scale analytics on posts. Which combination of Azure data services should you recommend?

Question 188 (medium, multiple choice)

Your organization is implementing a data lake using Azure Data Lake Storage Gen2. You have a folder structure like '/data/landing/' for raw data and '/data/curated/' for cleaned data. The data is ingested daily from various sources. You need to ensure that data in the curated zone is immutable and cannot be modified or deleted by anyone, including administrators, for compliance reasons. However, data in the landing zone should be modifiable. What should you do?

Question 189 (hard, multiple choice)

You are a data engineer for a gaming company that uses Azure Data Lake Storage Gen2. The data lake stores player event data in JSON format. The data is organized by date and event type. The analytics team frequently runs queries that filter by player ID to analyze individual player behavior. These queries are slow because they scan entire daily partitions. You need to improve the performance of queries that filter by player ID without restructuring the entire data lake. The data is stored as JSON files. What should you do?

Question 190 (easy, multi-select)

You are designing a data storage solution for a retail company that needs to store semi-structured IoT sensor data from thousands of devices. The data is ingested in near real-time, and queries will involve filtering by device ID and timestamp. The solution must minimize storage costs while supporting interactive queries. Which TWO Azure data storage options are most appropriate?

Question 191 (hard, multiple choice)

You are a data engineer at a financial services company. The company uses Azure Synapse Analytics with a dedicated SQL pool for its data warehouse. The current table 'FactTransactions' is 2 TB and uses round-robin distribution. Query performance is poor for queries that frequently filter on 'CustomerID' and join with a 'DimCustomer' table (10 GB, replicated). You need to redesign the table to improve query performance while minimizing data movement during queries. The solution must also support incremental data loading with minimal overhead. You cannot change the storage size limit or add more DWU. What should you do?