Scenario-based practice

Hard Difficulty Questions

Practise hard CompTIA Data+ DA0-001 questions: original exam-style scenarios covering every exam domain, with detailed explanations, wrong-answer analysis, and common exam traps.

Scenario questions: 20
Exam code: DA0-001
Vendor: CompTIA

Scenario guide

How to approach hard difficulty questions

These are the questions most candidates get wrong. They require connecting multiple concepts, reading tricky output, or knowing edge-case behaviour that isn't on most study cards. Practising them trains you to operate under uncertainty — a necessary skill on the real exam.

Quick answer

Hard-difficulty questions test whether you can apply a concept in context, not just recognise a definition.

How the topic appears in realistic exam-style scenarios.

Which detail in the question changes the correct answer.

How to eliminate plausible but wrong options.

How to connect the question back to the wider exam objective.

Practice set

Practice scenarios

Question 1 (hard, multiple choice)

A company is analyzing customer feedback sentiment. The dataset is highly imbalanced with 95% positive and 5% negative comments. Which technique should the analyst use to address class imbalance before modeling?
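
Resampling is one widely used family of techniques for a 95% / 5% split like this one. The following is a minimal sketch of random oversampling using only the standard library; the function name `random_oversample`, the field names, and the sample rows are all hypothetical and are not taken from the question's answer options.

```python
import random
from collections import Counter

def random_oversample(rows, label_key, seed=0):
    """Duplicate minority-class rows at random until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw extra copies with replacement; k is 0 for the majority class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced

# Hypothetical data mirroring the 95% / 5% split in the question.
comments = [{"text": f"comment {i}", "sentiment": "positive"} for i in range(95)]
comments += [{"text": f"complaint {i}", "sentiment": "negative"} for i in range(5)]

balanced = random_oversample(comments, "sentiment")
counts = Counter(row["sentiment"] for row in balanced)
print(counts)
```

In practice, resampling is normally applied to the training split only, so that evaluation still reflects the original class proportions.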

Question 2 (hard, multiple choice)

A data analyst creates a bubble chart showing country GDP (x-axis), life expectancy (y-axis), and population (bubble size). However, large bubbles overlap and obscure many data points. Which corrective action should the analyst take?

Question 3 (hard, multi-select)

Which THREE factors should be considered when choosing a chart type for a dataset?

Question 4 (hard, multiple choice)

A data analyst creates a heatmap to show website click-through rates by hour and day of week. The heatmap uses a green-to-red gradient, but users cannot distinguish between moderate values. What is the best fix?

Question 5 (hard, multiple choice)

A data engineer is designing a data warehouse for a multinational corporation. The company has sales data from different regions with varying currencies and date formats. To ensure consistency, which data concept should be applied to standardize the data before loading into the warehouse?

Question 6 (hard, multiple choice)

A data team is preparing a dashboard for executives. The team wants to highlight key performance indicators (KPIs) that are below target. Which of the following visualization techniques would most effectively draw attention to underperforming metrics without causing confusion?

Question 7 (hard, multiple choice)

Based on the exhibit, what is the most likely cause of the import failure?

Exhibit

Refer to the exhibit.

Data Import Log:

```
[2024-03-15 10:22:34] INFO: Starting import from source 'sales_raw.csv'
[2024-03-15 10:22:35] WARN: Row 1502: 'price' field contains non-numeric value '12.5A'. Skipping row.
[2024-03-15 10:22:36] ERROR: Row 3450: 'date' field value '2024-02-30' is invalid. Import halted.
[2024-03-15 10:22:36] INFO: Import process terminated with errors.
```
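
The two flagged values in the log can be checked before an import is attempted. Below is a minimal sketch of such a pre-load check, assuming dates use the `YYYY-MM-DD` format seen in the log; `validate_row` is a hypothetical helper and not part of whatever tool produced the exhibit.

```python
from datetime import datetime

def validate_row(row):
    """Return a list of problems found in one row; an empty list means the row is clean."""
    problems = []
    try:
        float(row["price"])
    except ValueError:
        problems.append(f"price {row['price']!r} is not numeric")
    try:
        # strptime rejects dates that do not exist on the calendar.
        datetime.strptime(row["date"], "%Y-%m-%d")
    except ValueError:
        problems.append(f"date {row['date']!r} is not a real calendar date")
    return problems

# Values taken from the log lines for rows 1502 and 3450.
print(validate_row({"price": "12.5A", "date": "2024-03-01"}))
print(validate_row({"price": "19.99", "date": "2024-02-30"}))
```

Note the difference in severity in the log itself: the non-numeric price produced a WARN and the row was skipped, while the impossible date produced an ERROR and halted the import.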

Question 8 (hard, multiple choice)

A data analyst is troubleshooting a report that shows unusually high sales for a specific product. Upon investigation, the analyst finds that the product was returned by several customers, but the returns were recorded in a separate system and not reflected in the sales data. Which data integration concept was likely missing?

Question 9 (hard, multiple choice)

A data analyst is reviewing the error log from a nightly batch load. What is the most likely cause of the error?

Exhibit

Refer to the exhibit.

Error log from a data pipeline:

```
[2025-03-15 10:32:14] ERROR: Duplicate key value violates unique constraint 'order_pkey'
[2025-03-15 10:32:14] Detail: Key (order_id)=(12345) already exists.
[2025-03-15 10:32:15] WARNING: Batch load incomplete. 4999 of 5000 rows inserted.
```
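
The behaviour in the log can be reproduced on a small scale. The sketch below uses an in-memory SQLite database as a stand-in for whatever database produced the exhibit; the table layout and the three-row batch are hypothetical, and SQLite's error wording differs from the log's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL)")

# Hypothetical batch: the last row repeats order_id 12345.
batch = [(12344, 10.0), (12345, 25.0), (12345, 25.0)]

inserted, rejected = 0, []
for order_id, amount in batch:
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, amount))
        inserted += 1
    except sqlite3.IntegrityError as exc:
        # The primary key already holds this order_id, so the row is refused.
        rejected.append((order_id, str(exc)))

print(f"{inserted} of {len(batch)} rows inserted")
print(rejected)
```

As in the exhibit, every row except the one carrying the repeated key is loaded, which is why the count falls exactly one short of the batch size.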

Question 10 (hard, multi-select)

Which THREE of the following are valid methods for handling missing data?

Question 11 (hard, multiple choice)

A company is merging two databases from different departments. In Database A, customer IDs are integers. In Database B, customer IDs are alphanumeric strings. To merge, the data analyst must reconcile these differences. Which step should be taken first?

Question 12 (hard, multiple choice)

An organization needs to acquire data from a third-party vendor. The data will be used for regulatory reporting. Which of the following should be the primary consideration before acquiring the data?

Question 13 (hard, multiple choice)

A data analyst is building a model to predict customer churn. The dataset has 10,000 records with 500 churned customers. The model predicts churn with 95% accuracy, but only identifies 10% of actual churners. Which metric best highlights this issue?
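
The figures in this question are enough to rebuild the full confusion matrix by hand. The sketch below works through that arithmetic; the variable names are illustrative, and the percentages are kept as integers to avoid floating-point rounding.

```python
total = 10_000        # records in the dataset
churners = 500        # actual churned customers
accuracy_pct = 95     # reported accuracy
identified_pct = 10   # share of actual churners the model finds

true_pos = churners * identified_pct // 100    # churners correctly flagged
false_neg = churners - true_pos                # churners the model missed
correct = total * accuracy_pct // 100          # all correct predictions
true_neg = correct - true_pos                  # non-churners correctly passed
false_pos = (total - churners) - true_neg      # non-churners wrongly flagged

recall = true_pos / (true_pos + false_neg)
precision = true_pos / (true_pos + false_pos)
# Accuracy of a "model" that predicts no churn for every customer.
baseline_accuracy = (total - churners) / total

print(true_pos, false_neg, true_neg, false_pos)    # 50 450 9450 50
print(recall, precision, baseline_accuracy)        # 0.1 0.5 0.95
```

Predicting "no churn" for every customer would also score 95% accuracy on this dataset, so accuracy alone cannot separate this model from one that finds no churners at all.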

Question 14 (hard, multiple choice)

A data analyst sees this error in the ETL logs. What is the most likely cause?

Exhibit

Refer to the exhibit.

Error log:

```
2025-03-15 10:23:45 ERROR: ORA-12034: materialized view log on "SCHEMA"."SALES" is newer than last refresh
2025-03-15 10:23:45 INFO: Refresh of materialized view "SALES_MV" failed
```

Question 15 (hard, multiple choice)

A data scientist trains a regression model and observes high variance with low bias. Which technique is most appropriate to reduce variance?

Question 16 (hard, multi-select)

Which THREE of the following are appropriate methods to handle outliers in a dataset?

Question 17 (hard, multiple choice)

An analyst presents a report to stakeholders who are not data-savvy. The report includes a box plot showing the distribution of customer satisfaction scores. One stakeholder asks, 'What do the whiskers mean?' Which communication strategy should the analyst use?

Question 18 (hard, multiple choice)

An analyst creates a stacked bar chart showing quarterly sales by product category. The chart becomes hard to read because some categories have very small contributions. Which redesign is most effective?

Question 19 (hard, multiple choice)

During a presentation, a stakeholder questions the validity of a data insight because the sample size appears small. The analyst knows the sample is statistically significant. What is the best way to address this concern?

Question 20 (hard, multiple choice)

A data analyst is reviewing an S3 bucket policy that controls access to a data lake. The analyst wants to communicate that the current policy restricts data downloads to a specific IP range. Which of the following best describes the policy's effect?

Exhibit

Refer to the exhibit.

```
{
  "policy": {
    "effect": "Deny",
    "action": ["s3:GetObject"],
    "resource": "arn:aws:s3:::data-bucket/*",
    "condition": {
      "IpAddress": {
        "aws:SourceIp": "192.0.2.0/24"
      }
    }
  }
}
```
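
Reading the exhibit comes down to two things: which addresses satisfy the `IpAddress` condition, and which effect applies when it is satisfied. The sketch below models only the first part, the CIDR match, using Python's standard `ipaddress` module; `condition_matches` is a hypothetical helper, it does not call AWS, and the sample addresses are illustrative. The exhibit is also a simplified layout rather than a complete AWS policy document.

```python
import ipaddress

# CIDR range copied from the exhibit's aws:SourceIp value.
POLICY_RANGE = ipaddress.ip_network("192.0.2.0/24")

def condition_matches(source_ip):
    """True when the caller's address falls inside the range named in the condition."""
    return ipaddress.ip_address(source_ip) in POLICY_RANGE

for ip in ("192.0.2.10", "198.51.100.7"):
    print(ip, condition_matches(ip))
```

Because the exhibit pairs this condition with a "Deny" effect, the statement applies to requests for which the match is true, that is, to callers inside 192.0.2.0/24. Check the direction of the match against the effect before choosing an answer.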

These DA0-001 practice questions are part of Courseiva's free CompTIA certification practice question bank. Courseiva provides original exam-style DA0-001 questions with detailed explanations, topic-based practice, mock exams, readiness tracking, and study analytics.