MLS-C01 Exploratory Data Analysis • Set 27
MLS-C01 Exploratory Data Analysis Practice Test 27: 15 questions with explanations.
A data scientist is performing EDA on a dataset of customer transaction records. The dataset has 5 million rows and the columns 'transaction_id', 'customer_id', 'transaction_amount', 'transaction_date', and 'product_category'. The data is stored in S3, and the data scientist is working in a SageMaker Studio notebook on an ml.t3.medium instance. The data scientist wants to check for duplicate transactions and to identify suspicious patterns, such as multiple transactions from the same customer on the same day for the same amount. What is the most efficient way to perform this analysis?
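To make the two checks in the scenario concrete, here is a minimal pandas sketch run on a tiny in-memory sample. The column names come from the question; the sample rows and the helper column `txn_day` are hypothetical and exist only for illustration. The sketch shows the logic of the checks only. It does not address where or how to run them against 5 million rows in S3, which is what the question asks.

```python
import pandas as pd

# Hypothetical toy data; only the column names are taken from the question.
df = pd.DataFrame({
    "transaction_id": [1, 2, 2, 3, 4, 5],
    "customer_id": ["a", "a", "a", "b", "b", "c"],
    "transaction_amount": [10.0, 10.0, 10.0, 25.0, 30.0, 5.0],
    "transaction_date": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 17:30", "2024-01-01 17:30",
        "2024-01-01 10:00", "2024-01-02 10:00", "2024-01-03 12:00",
    ]),
    "product_category": ["x", "x", "x", "y", "y", "z"],
})

# Check 1: exact duplicate rows (every column identical).
# keep=False marks all copies, not just the second and later ones.
exact_dupes = df[df.duplicated(keep=False)]

# Check 2: distinct transactions that share customer, calendar day, and amount.
# Drop exact duplicates first so they are not counted twice, then truncate
# the timestamp to midnight in a helper column (txn_day is an assumed name).
deduped = df.drop_duplicates().assign(
    txn_day=lambda d: d["transaction_date"].dt.normalize()
)
suspicious = deduped[
    deduped.duplicated(
        subset=["customer_id", "txn_day", "transaction_amount"], keep=False
    )
]

print(exact_dupes)
print(suspicious)
```

Separating the two checks keeps re-ingested rows (same `transaction_id`) apart from genuinely distinct transactions that merely look alike, since the two cases usually call for different follow-up.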