Preparing and Using Data for Analysis practice questions

Practise Google Professional Data Engineer (PDE) questions on Preparing and Using Data for Analysis: original exam-style scenarios with answer choices, explanations, and analysis of common mistakes.

Courseiva uses original exam-style practice questions designed for learning and revision. The goal is to understand the concepts, recognise exam patterns, and improve through explanations, not to memorise copied exam dumps.

Reviewed by Johnson Ajibi · MSc IT Security
20 questions · Domain: Preparing and Using Data for Analysis

What the exam tests

What to know about Preparing and Using Data for Analysis

Preparing and Using Data for Analysis questions test whether you can apply the concept in context, not just recognise a definition.

  • How the topic appears in realistic exam-style scenarios.
  • Which detail in the question changes the correct answer.
  • How to eliminate plausible but wrong options.
  • How to connect the question back to the wider exam objective.

Watch out for

Common Preparing and Using Data for Analysis exam traps

  • Answering from memory before reading the full scenario.
  • Missing a constraint such as cost, availability, security, scope or command context.
  • Choosing a broad answer when the question asks for the most specific fix.
  • Ignoring why the wrong options are tempting.

Practice set

Preparing and Using Data for Analysis questions

20 questions · select your answer, then reveal the explanation

A data engineer wants to train a linear regression model in BigQuery ML to predict sales. The training data includes a categorical feature with 1000+ unique values. Which method is most appropriate to handle this feature in the CREATE MODEL statement?

You need to create a Looker model that defines a 'sales' view based on a BigQuery table, with a measure for total revenue. Which LookML object defines the table and dimensions?

A company uses Looker Studio to build dashboards from BigQuery data. They notice that queries take several seconds to return. They want to improve performance without changing the schema or adding materialized views. Which option should they use?

A data scientist is training a binary classification model on an imbalanced dataset (95% negative, 5% positive) using AutoML Tables. Which strategy should they use to handle the class imbalance?

You need to split a time-series dataset into training and evaluation sets for a forecasting model. The data is ordered by timestamp. Which splitting technique should you use?

Which BigQuery SQL function returns the rank of a row within a window, with gaps in the ranking for ties?
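
To make the phrase "gaps in the ranking for ties" concrete, here is a minimal Python sketch of the two behaviours a ranking can have when values tie. It is an illustration only, not BigQuery code: the function names `rank_with_gaps` and `rank_without_gaps` are hypothetical, and ordering is assumed to be descending.

```python
def rank_with_gaps(values):
    """Rank in descending order; tied values share a rank and the next rank is skipped."""
    ordered = sorted(values, reverse=True)
    first_position = {}
    for position, value in enumerate(ordered, start=1):
        # Keep only the first (best) position seen for each value.
        first_position.setdefault(value, position)
    return [first_position[v] for v in values]


def rank_without_gaps(values):
    """Rank in descending order; tied values share a rank and no rank is skipped."""
    distinct = sorted(set(values), reverse=True)
    position = {value: i for i, value in enumerate(distinct, start=1)}
    return [position[v] for v in values]


scores = [90, 85, 90, 70]
print(rank_with_gaps(scores))     # [1, 3, 1, 4]
print(rank_without_gaps(scores))  # [1, 2, 1, 3]
```

With two rows tied for first place, the gapped ranking jumps from 1 to 3, while the gapless ranking continues at 2. The question asks about the first behaviour.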

A company uses Dataplex to manage data quality across multiple BigQuery datasets. They want to define a data quality rule that checks if a column 'email' contains a valid email format. Which Dataplex feature should they use?

A data engineer needs to query data across BigQuery (in Google Cloud) and Snowflake (in AWS) without moving the data. Which service should they use?

You want to train a custom TensorFlow model on Vertex AI using a managed Jupyter notebook environment. Which service should you use?

A company uses Looker to define business logic in LookML. They need to create a new measure that calculates the average order value, defined as total revenue divided by number of orders. Which LookML syntax should they use?
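
The arithmetic behind the measure in this question can be sketched in plain Python. This is not LookML and does not show the syntax the question asks for; the function name `average_order_value` and the choice to return `None` when there are no orders are assumptions made for illustration.

```python
from typing import Optional


def average_order_value(total_revenue: float, order_count: int) -> Optional[float]:
    """Total revenue divided by number of orders, as defined in the question."""
    if order_count == 0:
        # Assumption: report "no value" rather than raise on division by zero.
        return None
    return total_revenue / order_count


print(average_order_value(1000.0, 4))  # 250.0
print(average_order_value(0.0, 0))     # None
```

The zero-orders case is worth keeping in mind: a ratio of two aggregates needs a defined behaviour when the denominator is empty.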

A data scientist wants to import a pre-trained TensorFlow model into BigQuery ML for batch predictions. The model is stored in a Cloud Storage bucket. Which statement is correct?

You need to track the lineage of data in BigQuery, showing how tables are derived from other tables via queries. Which service provides this capability?

A data engineer needs to build a feature engineering pipeline using Vertex AI Pipelines. The pipeline should preprocess data, train a model, and deploy it. Which two components are required to define the pipeline? (Choose 2)

A company uses AutoML Tables to train a classification model. They want to improve model performance by engineering new features from existing timestamp columns. Which three techniques can they apply within AutoML Tables? (Choose 3)

A data team wants to use Approximate Aggregation Functions in BigQuery to get faster query results. Which two functions can they use? (Choose 2)

A data engineer needs to create a BigQuery ML model for predicting customer churn using a dataset with 10 million rows and 50 features. The dataset is highly imbalanced (5% churn). Which approach should the engineer use to handle class imbalance during model training?

A financial analytics team uses Looker to explore BigQuery data. They need to allow business users to filter by a custom date range that is not tied to an existing dimension. The date range must be user-input at query time. What is the best approach in Looker?

A data scientist wants to train a custom TensorFlow model on Vertex AI using a managed Jupyter notebook. Which Vertex AI service should they use to set up a notebook environment with pre-installed deep learning frameworks?

A retail company uses BigQuery to store sales data and wants to forecast weekly demand for the next 8 weeks using historical data from the past 2 years. They need to account for seasonality and holidays. Which BigQuery ML model type and configuration is most appropriate?

A data engineer needs to query data from BigQuery and another cloud provider's storage (AWS S3) using a single SQL query. The data must not be moved or copied to GCP. Which Google Cloud service should they use?

Focused Preparing and Using Data for Analysis sessions

Start a Preparing and Using Data for Analysis only practice session

Every question in these sessions is drawn from the Preparing and Using Data for Analysis domain and nothing else.

Related practice questions

Related PDE topic practice pages

Move into related areas when this topic feels solid.

Frequently asked questions

What does the PDE exam test about Preparing and Using Data for Analysis?
Preparing and Using Data for Analysis questions test whether you can apply the concept in context, not just recognise a definition.
How should I use these practice questions?
Select your answer before revealing the explanation. Then read why each option is right or wrong. This active recall approach builds retention far faster than re-reading notes.
Can I practise just Preparing and Using Data for Analysis questions in a focused session?
Yes. The session launcher on this page draws every question from the Preparing and Using Data for Analysis domain. Use a 10-question session first to gauge your baseline, then move to 20 or 30 once the weak spots are clear.
Where can I practise other PDE topics?
Use the topic links above to move to related areas, or go back to the PDE question bank to see all topics.
Are these real exam questions or dumps?
These are original practice questions written to test the same concepts the PDE exam covers. They are not copied from any real exam or dump site.