DP-900 | Chapter 15 of 101 | Objective 1.1

Data Roles: Engineer, Analyst, Scientist, DBA

This chapter covers the four data roles you need to tell apart for the DP-900 exam: Data Engineer, Data Analyst, Data Scientist, and Database Administrator (DBA). Understanding these roles matters because the exam expects you to identify which role performs a specific task, especially in scenario-based questions. As a rough estimate, 10-15% of DP-900 questions touch on data roles and responsibilities, often as part of larger questions about data processing or analytics. Once you have worked through this chapter, you should be able to eliminate wrong answers quickly and select the correct role for a given task.

Updated May 31, 2026

Data Roles as a Construction Crew

Imagine building a skyscraper. The Data Engineer is the general contractor and the concrete/steel workers. They design the foundation (data storage), pour the concrete (ingest data), install the plumbing (data pipelines), and ensure the building can withstand load (scalability). They don't decide where the offices go; they just build the structure so it's solid and efficient.

The Data Analyst is the interior designer. They walk through the completed floors (cleaned, structured data), measure rooms (query data), and decide where to put desks and chairs (create reports and dashboards). They make the space functional for people, but they don't change the load-bearing walls.

The Data Scientist is the architect who tests wind resistance and energy efficiency. They build scale models (ML models), run simulations (experiment with algorithms), and recommend structural changes (data transformations) to save energy or improve safety. They don't build the building themselves, but their insights change how the building is used.

The Database Administrator (DBA) is the building superintendent. They maintain the HVAC (monitor performance), change filters (backup/restore), and ensure the fire suppression system works (security). They keep the building running day-to-day, but they don't redesign the layout.

Each role has distinct tools and responsibilities, but they must collaborate: a building fails if the architect ignores the contractor's limitations or the interior designer blocks a fire exit.

How It Actually Works

What Are Data Roles and Why Do They Exist?

Data roles are specialized job functions within an organization that handle different aspects of data management, processing, analysis, and governance. They exist because modern data systems are complex—no single person can efficiently design pipelines, clean data, build models, and maintain databases. The DP-900 exam tests your ability to distinguish these roles by the tools they use and the tasks they perform.

Data Engineer

Responsibilities: A Data Engineer designs, builds, and maintains the infrastructure and pipelines that collect, store, and process data. They ensure data is available, reliable, and efficient for downstream consumers.

Key tasks on the exam:
- Designing and implementing data ingestion (e.g., Azure Data Factory, Azure Event Hubs)
- Building and maintaining data pipelines (e.g., Azure Databricks, Azure Synapse Pipelines)
- Managing data storage solutions (e.g., Azure Blob Storage, Azure Data Lake Storage Gen2)
- Performing ETL/ELT operations (Extract, Transform, Load)
- Ensuring data quality and schema evolution
- Optimizing data processing for performance and cost (e.g., partitioning, indexing)
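To make the ETL task concrete, here is a minimal sketch in plain Python. It is only an illustration of the extract, transform, and load steps, not how Azure Data Factory or Databricks implement them. The CSV contents, the `orders` table, and the in-memory SQLite target are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in Azure this might be a file landed in Blob Storage.
RAW_CSV = """order_id,customer,amount
1, alice ,19.99
2,BOB,5.00
3,carol,not_a_number
"""

def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize names, convert types, skip rows that fail conversion."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would likely route bad rows somewhere for review
        clean.append((int(row["order_id"]), row["customer"].strip().lower(), amount))
    return clean

def load(rows, conn):
    """Load: write the transformed rows into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for an analytical store
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

For exam purposes, the point is the ownership: whoever designs and runs this kind of flow is acting as the Data Engineer, regardless of which service executes it.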

Tools commonly associated: Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Stream Analytics, Azure Data Lake Storage, Azure Blob Storage, Apache Spark, SQL.

Exam trap: Candidates often confuse Data Engineers with DBAs. The key distinction: Data Engineers build pipelines and storage systems for analytics, while DBAs focus on operational databases and transactional systems.

Data Analyst

Responsibilities: A Data Analyst interprets data to help organizations make informed decisions. They create reports, dashboards, and visualizations that summarize business performance.

Key tasks on the exam:
- Querying data using SQL or other tools (e.g., Azure SQL Database, Azure Synapse SQL)
- Creating visualizations and dashboards (e.g., Power BI, Microsoft Excel)
- Identifying trends and patterns in data
- Communicating insights to stakeholders
- Cleaning and transforming data for analysis (but not building the pipeline)
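The querying task might look like the following sketch, where an analyst summarizes sales by month before charting the result. SQLite from the Python standard library stands in for Azure SQL Database or Synapse SQL here, and the `sales` table and its rows are hypothetical.

```python
import sqlite3

# In-memory database as a stand-in for a curated analytical store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2024-01-05", 100.0), ("2024-01-20", 50.0), ("2024-02-03", 75.0)],
)

# Descriptive analytics: total sales per month, ready to feed a chart or report.
monthly_totals = conn.execute(
    """
    SELECT substr(sale_date, 1, 7) AS month, SUM(amount) AS total
    FROM sales
    GROUP BY month
    ORDER BY month
    """
).fetchall()
print(monthly_totals)
```

Notice that the analyst only reads from a table that already exists. Creating and populating that table in a real solution is the pipeline work the exam assigns to the Data Engineer.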

Tools commonly associated: Power BI, Microsoft Excel, Azure Synapse SQL, Azure Analysis Services, SQL Server Reporting Services (SSRS).

Exam trap: A common wrong answer is assigning Data Analysts the task of building data pipelines. The exam tests that Data Analysts consume data, not build the infrastructure.

Data Scientist

Responsibilities: A Data Scientist uses advanced analytics, machine learning, and statistical techniques to extract deeper insights and make predictions. They often build and train models that can automate decision-making.

Key tasks on the exam:
- Performing exploratory data analysis (EDA)
- Feature engineering and selection
- Training and evaluating machine learning models (e.g., Azure Machine Learning)
- Deploying models to production (e.g., as web services)
- Experimenting with algorithms and hyperparameters
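If "training and evaluating a model" feels abstract, the sketch below shows the idea with a deliberately tiny model: a single threshold on one feature, fitted on training data and scored on held-out data. It does not use Azure Machine Learning, and the churn data and the `logins_last_month` feature are invented for illustration.

```python
# Hypothetical labeled data: (logins_last_month, churned)
train = [(1, True), (2, True), (3, True), (8, False), (9, False), (12, False)]
test = [(2, True), (4, True), (10, False), (11, False)]

def predict(threshold, logins):
    """Predict churn when activity is at or below the threshold."""
    return logins <= threshold

def accuracy(threshold, rows):
    """Evaluation: fraction of rows where the prediction matches the label."""
    correct = sum(predict(threshold, x) == y for x, y in rows)
    return correct / len(rows)

def fit(rows):
    """Training: choose the candidate threshold with the best training accuracy."""
    candidates = sorted({x for x, _ in rows})
    return max(candidates, key=lambda t: accuracy(t, rows))

threshold = fit(train)
test_accuracy = accuracy(threshold, test)
print(threshold, test_accuracy)
```

The model fits the training data perfectly but misses one held-out customer, which is why evaluation on separate data is listed as its own task.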

Tools commonly associated: Azure Machine Learning, Jupyter Notebooks, Python, R, Azure Databricks (for ML), MLflow.

Exam trap: Candidates may think Data Scientists are responsible for building data pipelines. In reality, they rely on Data Engineers for clean, structured data. The exam emphasizes that Data Scientists work on model development, not data ingestion.

Database Administrator (DBA)

Responsibilities: A DBA manages the operational aspects of databases—ensuring high availability, performance, security, and recoverability. They focus on transactional systems rather than analytical ones.

Key tasks on the exam:
- Installing and configuring database software (e.g., SQL Server) or provisioning managed databases (e.g., Azure SQL Database)
- Managing backups and disaster recovery (e.g., automated backups in Azure SQL Database, geo-replication)
- Monitoring performance and tuning queries
- Managing user access and security (e.g., Microsoft Entra ID integration, formerly Azure Active Directory; firewall rules)
- Applying patches and updates (on self-managed servers; Azure applies them for PaaS databases)

Tools commonly associated: SQL Server Management Studio (SSMS), Azure SQL Database, Azure SQL Managed Instance, Azure Database for PostgreSQL/MySQL, Azure Backup.

Exam trap: Many candidates incorrectly assign DBA tasks to Data Engineers. Remember: DBAs handle operational databases (OLTP), while Data Engineers handle analytical data stores (OLAP).

How They Interact

In a typical Azure data solution:

Data Engineers ingest raw data from sources (e.g., IoT devices, on-premises databases) into Azure Data Lake Storage using Azure Data Factory.

They then transform the data using Azure Databricks or Synapse Pipelines into a structured format.

Data Analysts connect Power BI to the curated data in Azure Synapse SQL to build dashboards.

Data Scientists access the same curated data (or a separate feature store) to train ML models in Azure Machine Learning.

DBAs manage the underlying Azure SQL Database that powers the operational application generating the data.

Exam Focus on Role Distinction

The DP-900 exam does not require deep knowledge of any single role. Instead, it tests your ability to match tasks to the correct role. For example:
- "Who builds a pipeline to ingest data?" → Data Engineer
- "Who creates a Power BI dashboard?" → Data Analyst
- "Who trains a machine learning model?" → Data Scientist
- "Who configures database backups?" → DBA

Common Scenarios on the Exam

Scenario 1: A company wants to analyze customer purchase history. The data is stored in an on-premises SQL Server. The first step is to copy the data to Azure Blob Storage. Who should do this? → Data Engineer (because it involves building a pipeline).

Scenario 2: After the data is in Azure, the company wants to visualize monthly sales trends. Who creates the report? → Data Analyst.

Scenario 3: The company wants to predict which customers are likely to churn. Who builds the predictive model? → Data Scientist.

Scenario 4: The SQL Server database needs nightly backups and performance tuning. Who handles this? → DBA.

Overlapping Tasks

Some tasks may appear to overlap. For example, both Data Engineers and Data Scientists might clean data. However, the exam distinguishes: Data Engineers perform cleaning as part of ETL pipelines (structural cleaning), while Data Scientists clean data for model training (e.g., handling missing values in a dataset). Similarly, both Data Analysts and Data Scientists query data, but Data Analysts use SQL for reporting, while Data Scientists may use Python for EDA.
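The two kinds of cleaning can be contrasted in a short sketch. The record layout, the rule for rejecting malformed IDs, and the choice of mean imputation are assumptions made for the example; real pipelines and real modeling work use whatever rules fit the data.

```python
from statistics import mean

# Hypothetical raw records as they might arrive from a source system.
raw = [
    {"id": "1", "age": "34"},
    {"id": "2", "age": ""},    # missing value
    {"id": "x", "age": "51"},  # malformed id
    {"id": "4", "age": "29"},
]

def structural_clean(rows):
    """Pipeline-style cleaning: enforce types and drop rows that break the schema."""
    out = []
    for row in rows:
        if not row["id"].isdigit():
            continue
        age = int(row["age"]) if row["age"].isdigit() else None
        out.append({"id": int(row["id"]), "age": age})
    return out

def impute_missing_age(rows):
    """Model-prep cleaning: fill missing ages with the mean of the known ages."""
    known = [r["age"] for r in rows if r["age"] is not None]
    fill = mean(known)
    return [{**r, "age": r["age"] if r["age"] is not None else fill} for r in rows]

curated = structural_clean(raw)             # Data Engineer territory
model_ready = impute_missing_age(curated)   # Data Scientist territory
print(model_ready)
```

The first function makes the data valid for everyone downstream; the second makes a modeling decision that only matters for training, which is the distinction the exam draws.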

Key Terms to Memorize for the Exam

Data Engineer: pipeline, ingestion, ETL, Azure Data Factory, Azure Databricks, Azure Data Lake, orchestration, transformation.

Data Analyst: visualization, dashboard, Power BI, report, SQL query, business intelligence.

Data Scientist: machine learning, model, training, evaluation, Azure Machine Learning, experiment, feature engineering.

DBA: backup, restore, high availability, disaster recovery, performance tuning, security, Azure SQL Database, SSMS.
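One way to drill these terms is to treat them as a lookup, as in the sketch below. The keyword sets are a subset of the lists above (shortened to word stems so that "training" and "train" both match), and the simple substring scoring is an assumption for illustration, not a description of how exam questions are built.

```python
# Subset of the key terms above, shortened to stems.
ROLE_KEYWORDS = {
    "Data Engineer": {"pipeline", "ingest", "etl", "orchestrat", "transform"},
    "Data Analyst": {"visualiz", "dashboard", "report", "business intelligence"},
    "Data Scientist": {"machine learning", "model", "train", "feature engineering"},
    "DBA": {"backup", "restore", "high availability", "disaster recovery"},
}

def guess_role(task):
    """Return the role whose keywords appear most often in the task, or None."""
    text = task.lower()
    scores = {
        role: sum(keyword in text for keyword in keywords)
        for role, keywords in ROLE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Substring matching is crude; a zero score means no keyword was found.
    return best if scores[best] > 0 else None

print(guess_role("Build a pipeline to ingest sales data"))
print(guess_role("Configure nightly backup and restore"))
```

If a question scores zero on every list, fall back on any tool it names.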

Walk-Through

1. Identify the Task Scenario

Read the exam question carefully. Look for keywords that indicate the nature of the task: 'ingest', 'transform', 'pipeline' suggest Data Engineer; 'visualize', 'report', 'dashboard' suggest Data Analyst; 'predict', 'model', 'train' suggest Data Scientist; 'backup', 'restore', 'performance' suggest DBA. Also note the tools mentioned: Azure Data Factory (Engineer), Power BI (Analyst), Azure Machine Learning (Scientist), SQL Server Management Studio (DBA).

2. Match the Task to the Role

Once you have identified the task type, match it to the appropriate role. For example, if the task involves 'building a data pipeline to move data from on-premises to Azure', the correct role is Data Engineer. If the task is 'creating a Power BI dashboard', it's Data Analyst. The exam often includes distractor roles that are plausible but incorrect, such as assigning a DBA to build a pipeline.

3. Eliminate Obvious Mismatches

Eliminate any role that clearly does not perform the task. For instance, a Data Scientist does not manage database backups, and a DBA does not train machine learning models. Even if you are unsure between two roles, eliminating the clearly wrong ones increases your odds. The exam typically has one or two obviously wrong answers.

4. Consider the Tool Used

If the question mentions a specific tool, use it to confirm the role. For example, Azure Data Factory is a Data Engineer tool, Power BI is for Analysts, Azure Machine Learning is for Scientists, and SSMS is for DBAs. This is especially helpful when the task description is ambiguous. For instance, "monitoring performance" on its own could fit more than one role: monitoring database performance with Azure Monitor points to the DBA, while reviewing performance metrics in a Power BI report points to the Data Analyst. Focus on the primary tool.

5. Watch for Hybrid Roles

Be aware that in some organizations, roles may overlap. However, the DP-900 exam tests the traditional, distinct definitions. For example, a Data Engineer might also perform some analysis, but the exam expects you to assign the task to the role most closely associated. If the question says 'designs and implements data storage solutions', the answer is Data Engineer, not Data Analyst, even if the Analyst uses the storage.
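The tool check from step 4 can be sketched as a simple lookup. The mapping below contains only the four tool-to-role pairs named in the walk-through (plus the SSMS abbreviation); the function name and the substring matching are illustrative assumptions.

```python
# Tool-to-role pairs named in the walk-through.
TOOL_TO_ROLE = {
    "azure data factory": "Data Engineer",
    "power bi": "Data Analyst",
    "azure machine learning": "Data Scientist",
    "sql server management studio": "DBA",
    "ssms": "DBA",
}

def role_for_tool(question):
    """Return the role for the first known tool mentioned in the question, or None."""
    text = question.lower()
    for tool, role in TOOL_TO_ROLE.items():
        if tool in text:
            return role
    return None  # no known tool named; rely on the action verb instead

print(role_for_tool("Which role uses Azure Data Factory to copy data to Azure?"))
print(role_for_tool("Which role monitors the database?"))
```

A `None` result is a reminder that the tool check only confirms a role; when no tool is named, the action verb still has to do the work.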

What This Looks Like on the Job

Enterprise Scenario 1: E-commerce Company with Real-Time Analytics

A large e-commerce company uses Azure to process clickstream data from millions of users. The Data Engineer sets up Azure Event Hubs to ingest streaming data, uses Azure Stream Analytics to aggregate events in real time, and stores the results in Azure SQL Database. The Data Analyst connects Power BI to the SQL database to create a live dashboard showing current sales and user activity. The Data Scientist builds a recommendation model using Azure Machine Learning, training it on historical data from Azure Data Lake Storage. The DBA manages the Azure SQL Database, setting up geo-replication for disaster recovery and monitoring performance with Azure SQL Analytics.

Misconfiguration example: If the Data Engineer provisions too few Event Hubs partitions or throughput units, ingestion can be throttled, and events may be delayed or lost if producers do not retry. The DBA must ensure the SQL database has enough DTUs/vCores to handle the write load from Stream Analytics; otherwise, the dashboard may show stale data.
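To picture what "aggregate events in real time" means, the sketch below counts click events in fixed, non-overlapping one-minute windows. This is a plain-Python illustration of windowed aggregation in general, not Azure Stream Analytics syntax, and the event tuples and window size are assumptions.

```python
from collections import Counter

WINDOW_SECONDS = 60  # assumed window size for the illustration

# Hypothetical click events: (unix_timestamp_seconds, page)
events = [(0, "home"), (15, "cart"), (59, "home"), (60, "home"), (125, "cart")]

def tumbling_window_counts(rows, window):
    """Count events per fixed, non-overlapping time window, keyed by window start."""
    counts = Counter()
    for timestamp, _page in rows:
        window_start = (timestamp // window) * window
        counts[window_start] += 1
    return dict(sorted(counts.items()))

per_minute = tumbling_window_counts(events, WINDOW_SECONDS)
print(per_minute)
```

Each per-window total is the kind of small, pre-aggregated row that would be written to the SQL database and picked up by the dashboard.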

Enterprise Scenario 2: Healthcare Provider with Regulatory Compliance

A healthcare provider stores patient records in Azure SQL Database. The DBA supports HIPAA compliance by keeping Transparent Data Encryption (TDE) enabled, auditing access with Azure SQL Auditing, and setting up point-in-time restore. The Data Engineer builds a pipeline using Azure Data Factory to copy de-identified data to Azure Data Lake Storage for analytics. The Data Analyst creates Power BI reports that aggregate patient outcomes by demographics. The Data Scientist uses Azure Machine Learning to predict readmission rates, but must only access de-identified data.

Common pitfall: The Data Engineer must ensure the pipeline does not accidentally copy sensitive data to the data lake. The DBA configures firewall rules to block direct access to the production database from non-admin IPs.
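The pitfall above is easier to avoid when the pipeline checks its own output. The sketch below removes a set of direct identifier fields and then refuses to continue if any of them survive. The field names are hypothetical, and dropping a few columns is only an illustration: it is not by itself a complete de-identification method under HIPAA.

```python
# Hypothetical column names; a real schema and de-identification policy would differ.
DIRECT_IDENTIFIERS = {"name", "ssn", "email"}

def deidentify(record):
    """Return a copy of the record without direct identifier fields."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

def assert_no_identifiers(records):
    """Guard to run before loading: fail if any identifier field survived."""
    for record in records:
        leaked = DIRECT_IDENTIFIERS.intersection(record)
        if leaked:
            raise ValueError(f"identifier fields present: {sorted(leaked)}")

patients = [
    {"name": "A. Example", "ssn": "000-00-0000", "age_band": "40-49", "readmitted": True}
]
safe = [deidentify(p) for p in patients]
assert_no_identifiers(safe)  # passes; calling it on `patients` would raise
print(safe)
```

Placing the guard immediately before the load step means a mistake in the transformation stops the run instead of silently landing sensitive fields in the data lake.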

Enterprise Scenario 3: Financial Services with Batch Processing

A bank processes daily transaction files from branch offices. The Data Engineer uses Azure Data Factory to trigger a pipeline at midnight, which copies CSV files from on-premises file servers to Azure Blob Storage, then transforms them using Azure Databricks, and loads the clean data into Azure Synapse SQL. The Data Analyst queries Synapse SQL from Power BI to generate daily risk reports. The Data Scientist trains a fraud detection model on historical transaction data using Azure Machine Learning. The DBA manages the on-premises SQL Server that runs the core banking application, ensuring high availability with Always On Availability Groups. Failure scenario: If the Data Engineer does not handle schema changes (e.g., a new column added to the CSV), the pipeline may fail. The DBA must coordinate with the Data Engineer to update the pipeline when the source database schema changes.
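The schema-change failure can be caught early with a header check before any transformation runs. The following is a minimal sketch assuming a hypothetical three-column transaction file; the column names and the shape of the returned report are illustrative, not part of any Azure service.

```python
import csv
import io

EXPECTED_COLUMNS = ["txn_id", "branch", "amount"]  # hypothetical agreed schema

def check_schema(csv_text, expected):
    """Compare the CSV header to the expected columns and report any drift."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return {
        "added": [c for c in header if c not in expected],
        "missing": [c for c in expected if c not in header],
    }

# A branch office starts sending an extra column.
incoming = "txn_id,branch,amount,currency\n1,North,10.0,USD\n"
drift = check_schema(incoming, EXPECTED_COLUMNS)
print(drift)
```

Whether an added column should fail the run or be passed through is a design decision; what matters is that the pipeline notices the change instead of breaking at load time.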

How DP-900 Actually Tests This

DP-900 Objective Coverage

This chapter maps directly to the DP-900 objective area Describe core data concepts > Identify roles and responsibilities for data workloads. Specifically, the exam tests:

Differentiate between Data Engineer, Data Analyst, Data Scientist, and DBA tasks.

Recognize the appropriate tools for each role (Azure Data Factory, Power BI, Azure Machine Learning, etc.).

Understand the data lifecycle and which role is responsible at each stage.

Common Wrong Answers and Why Candidates Choose Them

1. Assigning Data Engineer tasks to DBA: Many candidates think DBAs build pipelines because they manage databases. However, DBAs focus on operational databases, not analytical pipelines. Wrong answer: "A DBA builds an Azure Data Factory pipeline." Correct: Data Engineer.

2. Assigning Data Scientist tasks to Data Analyst: Candidates see "analyze data" and think Data Analyst. But if the task involves "predicting" or "training a model," it's Data Scientist. Wrong answer: "A Data Analyst trains a machine learning model." Correct: Data Scientist.

3. Assigning DBA tasks to Data Engineer: Data Engineers do manage storage, but not transactional database backups. Wrong answer: "A Data Engineer configures point-in-time restore for Azure SQL Database." Correct: DBA.

4. Confusing Data Analyst and Data Engineer: Both may write SQL, but the Data Engineer writes SQL in pipelines (e.g., Azure Data Factory), while the Data Analyst writes SQL for reporting. Wrong answer: "A Data Engineer creates a Power BI dashboard." Correct: Data Analyst.

Specific Numbers, Values, and Terms on the Exam

Data Engineer: Azure Data Factory, Azure Databricks, Azure Data Lake Storage, ETL, pipeline, orchestration, ingestion, transformation.

Data Analyst: Power BI, dashboard, report, visualization, business intelligence, SQL query.

Data Scientist: Azure Machine Learning, model, training, evaluation, experiment, feature engineering, prediction.

DBA: Backup, restore, high availability, disaster recovery, performance tuning, security, Azure SQL Database, SSMS.

Edge Cases and Exceptions

Role overlap in small teams: The exam assumes traditional role definitions, even though in reality a single person might perform multiple roles. Always choose the most specific role for the task.

Data Science vs. Data Analysis: If a task involves "creating a report," it's Data Analyst even if the report uses advanced statistics. The term "report" is a key indicator.

Data Engineering vs. DBA: If the task mentions "data warehouse" or "analytical store," it's Data Engineer. If it mentions "transactional database" or "OLTP," it's DBA.

How to Eliminate Wrong Answers

1. Read the action verb: "Ingest," "transform," "orchestrate" → Data Engineer. "Visualize," "report" → Data Analyst. "Train," "predict" → Data Scientist. "Backup," "restore," "tune" → DBA.

2. Check the tool: If a tool is named, match it to the role. If no tool, use the action verb.

3. Eliminate roles that don't fit the data type: Transactional data → DBA. Analytical data → Data Engineer/Analyst/Scientist.

4. Remember the data lifecycle: Ingestion & storage → Data Engineer. Analysis & reporting → Data Analyst. Advanced analytics & ML → Data Scientist. Operations & maintenance → DBA.
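Elimination by data type and lifecycle stage amounts to narrowing a set of candidates. The sketch below starts with all four roles and intersects them with whatever clues the question gives. The dictionaries restate the two rules above; the function name and the string labels are assumptions for illustration.

```python
ALL_ROLES = {"Data Engineer", "Data Analyst", "Data Scientist", "DBA"}

# Rule: data type narrows the candidates.
ROLES_BY_DATA_TYPE = {
    "transactional": {"DBA"},
    "analytical": {"Data Engineer", "Data Analyst", "Data Scientist"},
}

# Rule: lifecycle stage narrows the candidates further.
ROLES_BY_STAGE = {
    "ingestion and storage": {"Data Engineer"},
    "analysis and reporting": {"Data Analyst"},
    "advanced analytics and ml": {"Data Scientist"},
    "operations and maintenance": {"DBA"},
}

def remaining_roles(data_type=None, stage=None):
    """Start with every role and eliminate those that do not fit each known clue."""
    candidates = set(ALL_ROLES)
    if data_type is not None:
        candidates &= ROLES_BY_DATA_TYPE[data_type]
    if stage is not None:
        candidates &= ROLES_BY_STAGE[stage]
    return candidates

print(remaining_roles(data_type="analytical"))
print(remaining_roles(data_type="analytical", stage="analysis and reporting"))
```

An empty result would mean the clues contradict each other, which is usually a sign that you have misread the question rather than found a trick.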

Key Takeaways

Data Engineers build and maintain data pipelines and analytical storage (e.g., Azure Data Factory, Azure Data Lake).

Data Analysts create visualizations and reports using tools like Power BI to communicate insights.

Data Scientists develop machine learning models using Azure Machine Learning for predictions.

DBAs manage operational databases (e.g., Azure SQL Database) focusing on backups, security, and performance.

On the DP-900 exam, match tasks to roles using action verbs and tools: 'pipeline' → Data Engineer, 'dashboard' → Data Analyst, 'model' → Data Scientist, 'backup' → DBA.

Data Engineers and DBAs are distinct: Data Engineers handle analytical data, DBAs handle transactional data.

Data Analysts and Data Scientists are distinct: Analysts describe past data, Scientists predict future outcomes.

The exam does not test deep technical skills but expects you to know which role is responsible for common tasks.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Data Engineer

Builds data pipelines for analytics

Works with Azure Data Factory, Databricks

Manages analytical stores (Data Lake, Synapse)

Focuses on data ingestion and transformation

Uses tools like Azure Data Lake Storage

Database Administrator (DBA)

Manages operational databases (OLTP)

Works with SSMS, Azure SQL Database

Manages transactional databases (SQL Server)

Focuses on backups, security, performance

Uses features like automated backups and geo-replication

Data Analyst

Creates reports and dashboards (Power BI)

Uses SQL for querying

Performs descriptive analytics

Communicates insights to stakeholders

Does not build predictive models

Data Scientist

Builds machine learning models

Uses Python/R and Azure Machine Learning

Performs predictive and prescriptive analytics

Experiments with algorithms and features

Deploys models as web services

Watch Out for These

Mistake: Data Engineers and DBAs are the same role.

Correct: Data Engineers focus on building data pipelines and analytical storage (e.g., Azure Data Lake, Synapse), while DBAs manage operational databases (e.g., Azure SQL Database) with tasks like backups, performance tuning, and security. They use different tools and have different objectives.

Mistake: Data Analysts build machine learning models.

Correct: Data Analysts create reports and dashboards (e.g., Power BI) and perform descriptive analytics. Building ML models is the responsibility of Data Scientists, who use tools like Azure Machine Learning.

Mistake: Data Scientists are responsible for data ingestion and pipeline construction.

Correct: Data Scientists consume curated data to train models. Data Engineers are responsible for building and maintaining the pipelines that ingest and transform data.

Mistake: DBAs are not needed in cloud environments like Azure.

Correct: DBAs are still essential for managing Azure SQL Database, Azure SQL Managed Instance, and other PaaS databases. They handle tasks like configuring geo-replication, managing firewalls, and monitoring performance.

Mistake: All data roles work independently and don't need to collaborate.

Correct: In practice, roles must collaborate closely. For example, Data Engineers provide clean data to Data Analysts and Scientists; DBAs ensure operational databases are available for Data Engineers to extract data.


Frequently Asked Questions

What is the difference between a Data Engineer and a DBA on the DP-900 exam?

The DP-900 exam distinguishes them by the type of data they manage. Data Engineers work with analytical data stores (e.g., Azure Data Lake Storage, Azure Synapse) and build pipelines using tools like Azure Data Factory. DBAs manage operational databases (e.g., Azure SQL Database) and focus on transactional data, backups, and performance tuning. The exam may ask: 'Who is responsible for configuring point-in-time restore for an Azure SQL Database?' The answer is DBA, not Data Engineer.

Which role creates Power BI dashboards?

Data Analysts are responsible for creating Power BI dashboards and reports. They use Power BI to visualize data and communicate insights. Data Engineers might provide the data, but they do not build the dashboards. Data Scientists might use Power BI for visualization, but their primary role is model building. On the exam, if you see 'dashboard' or 'report,' select Data Analyst.

Do Data Scientists build data pipelines?

No, Data Scientists typically do not build data pipelines. They rely on Data Engineers to provide clean, structured data. Data Scientists focus on exploratory data analysis, feature engineering, and training machine learning models. Building pipelines is a Data Engineer task. The exam may trick you by saying 'a Data Scientist ingests data from Azure Blob Storage' – that is incorrect; it should be a Data Engineer.

What tools are associated with each role on the DP-900 exam?

Data Engineer: Azure Data Factory, Azure Databricks, Azure Data Lake Storage, Azure Stream Analytics. Data Analyst: Power BI, Excel, Azure Synapse SQL. Data Scientist: Azure Machine Learning, Jupyter Notebooks. DBA: SQL Server Management Studio (SSMS), Azure SQL Database, Azure SQL Managed Instance. The exam often names a tool and asks which role uses it.

Can one person perform multiple data roles?

In small organizations, yes, but the DP-900 exam tests the traditional distinct roles. You should answer based on the primary responsibility. For example, if a task involves 'training a machine learning model,' always choose Data Scientist, even if a Data Analyst might do it in practice. The exam expects you to know the standard definitions.

What is the data lifecycle and how do roles fit?

The data lifecycle includes: ingestion, storage, processing, analysis, and archival. Data Engineers handle ingestion, storage, and processing. Data Analysts handle analysis and reporting. Data Scientists handle advanced analytics and modeling. DBAs handle operational storage and maintenance. The exam may ask which role is responsible for a specific stage.

How do I remember the roles for the exam?

Remember the roles in a fixed order (Engineer, Analyst, Scientist, DBA) and associate each with a keyword: Engineer = Pipeline, Analyst = Dashboard, Scientist = Model, DBA = Backup. When in doubt, look for these keywords in the question. Also, remember that tools are strong indicators: Azure Data Factory = Engineer, Power BI = Analyst, Azure Machine Learning = Scientist, SSMS = DBA.
