
CCNA Pde Analysis Ml Questions

75 of 90 questions · Page 1/2 · Pde Analysis Ml topic · Answers revealed


1

MCQmedium

You are building a binary classification model using AutoML Tables on Vertex AI. The dataset has a severe class imbalance (1% positive class). Which strategy should you use to handle the imbalance?

A.Oversample the minority class using SMOTE before training.

B.Do nothing; AutoML Tables automatically handles class imbalance.

C.Downsample the majority class to match the minority class size.

D.Use the class_weight parameter in AutoML Tables training.

AnswerB

AutoML Tables handles class imbalance automatically during training, so no manual resampling or weighting step is needed.

Why this answer

AutoML Tables automatically applies class weighting and sampling to handle imbalance; the best practice is to let AutoML handle it. Manually resampling may not be optimal.


2

MCQmedium

You are using Vertex AI Feature Store to serve features for online predictions. Your model requires features from multiple sources with low latency (<10ms). Which type of serving should you use?

A.Online serving with Cloud SQL

B.Offline serving with BigQuery

C.Online serving with Bigtable

D.Offline serving with Cloud Storage

AnswerC

Online serving uses Bigtable for low-latency feature retrieval.

Why this answer

Online serving (with Bigtable as backing store) provides low-latency feature retrieval. Offline serving is for batch predictions. Feature Store supports both; online is for real-time.


3

MCQmedium

A company needs to predict whether a product image contains a specific defect. They have 10,000 labeled images and want to build a model quickly without writing custom code or training from scratch. Which GCP service should they use?

A.AutoML Tables

B.AutoML Vision

C.Vertex AI custom training

D.AutoML Natural Language

AnswerB

AutoML Vision is for image classification with minimal coding.

Why this answer

AutoML Vision is designed for custom image classification tasks with minimal ML expertise. It uses transfer learning and supports up to millions of images. AutoML Tables handles tabular data, not images.

Vertex AI custom training would require more effort. AutoML NLP is for text data.


4

MCQmedium

A company uses Looker Studio to build dashboards from BigQuery data. They notice that queries take several seconds to return. They want to improve performance without changing the schema or adding materialized views. Which option should they use?

A.Enable BigQuery BI Engine on the relevant project.

B.Move the data to Cloud SQL.

C.Switch to BigQuery Omni for cross-cloud queries.

D.Use APPROX_COUNT_DISTINCT to speed up distinct counts.

AnswerA

BI Engine provides in-memory analysis for Looker Studio, reducing query latency.

Why this answer

BI Engine delivers sub-second query response times in Looker Studio by caching data in memory within the BigQuery region. It requires no schema changes and no materialized views.


5

Multi-Selectmedium

An e-commerce company uses BigQuery to analyze customer behavior. They need to compute the number of distinct customers per day, approximate quantiles of purchase amounts, and assign a row number per customer partition by date. Which BigQuery SQL functions should they use? (Choose THREE)

Select 3 answers

A.APPROX_COUNT_DISTINCT(customer_id)

B.NTILE(4) OVER (ORDER BY purchase_amount)

C.APPROX_QUANTILES(purchase_amount, 100)

D.ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY date)

E.COUNT(DISTINCT customer_id)

AnswersA, C, D

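APPROX_COUNT_DISTINCT scales better than an exact COUNT(DISTINCT) on large tables; APPROX_QUANTILES and ROW_NUMBER cover the other two requirements.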

Why this answer

APPROX_COUNT_DISTINCT for approximate distinct counts, APPROX_QUANTILES for approximate quantiles, ROW_NUMBER for row numbering within partitions.


6

MCQmedium

An organization wants to integrate BigQuery Omni to query data stored in AWS S3. They have set up the necessary connections. What is the primary benefit of using BigQuery Omni over simply copying the data to BigQuery?

A.Ability to use BigQuery ML models on data in S3 without moving data.

B.Automatic encryption of data at rest in S3.

C.Lower latency queries due to in-memory caching.

D.Support for real-time streaming inserts into S3.

AnswerA

BigQuery Omni supports BigQuery ML, allowing you to train and run models on cross-cloud data.

Why this answer

BigQuery Omni allows you to query data across clouds without moving it, providing a unified analytics experience. It reduces data egress costs and avoids duplication.


7

MCQmedium

A data engineer is building a Looker Studio dashboard that requires a calculated field to compute the running total of sales per day per store. Which Looker Studio function should they use?

A.RANK()

B.TOTAL()

C.RUNNING_SUM()

D.PERCENTILE()

AnswerC

RUNNING_SUM computes cumulative sums.

Why this answer

Looker Studio's RUNNING_SUM function computes a cumulative total over ordered rows, which is what a running total of daily sales per store requires.
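To make the idea of a running total per store concrete, here is a minimal Python sketch of the calculation itself. It does not use Looker Studio; the function name `running_totals` and the sample rows are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import accumulate


def running_totals(rows):
    """Return {store: [(day, running_total), ...]} with days in ascending order.

    rows is an iterable of (store, day, sales) tuples (hypothetical layout).
    """
    by_store = defaultdict(list)
    for store, day, sales in rows:
        by_store[store].append((day, sales))

    result = {}
    for store, entries in by_store.items():
        entries.sort()  # order by day within each store
        totals = accumulate(sales for _, sales in entries)
        result[store] = [(day, total) for (day, _), total in zip(entries, totals)]
    return result


sample = [
    ("north", "2024-01-02", 50),
    ("north", "2024-01-01", 100),
    ("south", "2024-01-01", 30),
    ("north", "2024-01-03", 25),
    ("south", "2024-01-02", 70),
]
totals = running_totals(sample)
print(totals["north"])  # cumulative sales for the "north" store, by day
```

The total restarts for each store because rows are grouped before accumulating, which mirrors a running sum computed per group.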


8

MCQeasy

Which BigQuery SQL function returns the rank of a row within a window, with gaps in the ranking for ties?

A.RANK()

B.NTILE()

C.DENSE_RANK()

D.ROW_NUMBER()

AnswerA

RANK() handles ties with gaps.

Why this answer

RANK() assigns the same rank to ties and leaves gaps (e.g., 1,1,3). ROW_NUMBER() assigns unique consecutive numbers. DENSE_RANK() does not leave gaps.
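The difference between the three functions is easiest to see on a small table with a tie. The sketch below uses Python's built-in sqlite3 module as a local stand-in for BigQuery (an assumption made for runnability; it needs SQLite 3.25 or newer for window functions). The table name `scores` and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 90), ("b", 90), ("c", 80), ("d", 70)],  # "a" and "b" tie on score
)

rows = conn.execute(
    """
    SELECT name,
           RANK()       OVER (ORDER BY score DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num
    FROM scores
    ORDER BY score DESC, name
    """
).fetchall()

for row in rows:
    print(row)  # (name, rank, dense_rank, row_number)
```

In this data the tied rows share rank 1 under both RANK() and DENSE_RANK(); the next row is 3 under RANK() (a gap) and 2 under DENSE_RANK(), while ROW_NUMBER() simply counts 1 through 4.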


9

MCQeasy

What is the primary purpose of Vertex AI Feature Store?

A.To manage and track ML experiments

B.To train machine learning models using AutoML

C.To transform raw data into features using SQL

D.To store and serve features for machine learning models at scale

AnswerD

Feature Store is designed for feature management and serving.

Why this answer

Vertex AI Feature Store is a managed service for storing, serving, and sharing ML features. It supports both online (low-latency serving for prediction) and offline (batch serving for training) serving. It is not for model training, data transformation, or experiment tracking.


10

MCQhard

A company uses Dataplex to manage data quality across multiple BigQuery datasets. They want to define a data quality rule that checks if a column 'email' contains a valid email format. Which Dataplex feature should they use?

A.Use Cloud DLP to classify and validate emails.

B.Use the built-in 'email' rule type in Dataplex.

C.Create a custom Data Quality rule using the 'regex' type.

D.Create a Dataflow pipeline to validate emails and write results to a separate table.

AnswerC

Dataplex Data Quality supports regex rules to validate formats like email.

Why this answer

Dataplex Data Quality allows predefined rule types including a 'regex' rule for pattern matching. There is no built-in 'email' rule, so a regex check is appropriate.


11

MCQmedium

You are building a forecasting model to predict daily sales for the next 90 days using historical sales data with clear seasonality and trend. You want to use BigQuery ML with minimal manual tuning. Which model type should you choose?

A.ARIMA

B.Boosted tree (XGBoost)

C.ARIMA_PLUS

D.Linear regression

AnswerC

ARIMA_PLUS automatically detects seasonality, trend, and holiday effects; ideal for time-series forecasting without manual tuning.

Why this answer

ARIMA_PLUS is specifically designed for time-series forecasting and automatically handles seasonality, trend, and holiday effects without manual tuning. ARIMA is less automated; linear regression would require manual feature engineering for time components.


12

MCQmedium

Your team uses Looker Studio to build dashboards on top of BigQuery. The dashboards are slow when filtering on a high-cardinality dimension (e.g., user ID). You want to improve performance without changing the underlying BigQuery table design. Which action should you take?

A.Create a clustered table on the user ID column.

B.Apply a filter to limit data to the last month only.

C.Enable BigQuery BI Engine on the project.

D.Use Looker Studio's extract data functionality.

AnswerC

BI Engine accelerates queries by caching data in memory, improving Looker Studio dashboard performance.

Why this answer

BI Engine automatically caches the BigQuery tables in memory for Looker Studio, accelerating queries on any dimension. Enabling BI Engine on the project will improve dashboard performance.


13

Multi-Selectmedium

A company wants to track data lineage for their BigQuery tables to understand how data flows from source to derived tables. Which TWO Google Cloud services can be used to capture and visualize data lineage? (Choose 2.)

Select 2 answers

A.Dataplex Data Lineage

B.Vertex AI Feature Store

C.Cloud Composer

D.Cloud Data Fusion

E.BigQuery Lineage API

AnswersA, E

Dataplex provides automated lineage tracking for BigQuery.

Why this answer

Dataplex provides a comprehensive Data Lineage feature that automatically captures lineage for BigQuery jobs. Additionally, the BigQuery Lineage API (part of Data Catalog) allows you to retrieve lineage programmatically.


14

MCQeasy

A data engineer needs to design a data pipeline that ingests streaming data from Cloud Pub/Sub, performs real-time aggregations, and loads the results into BigQuery for dashboarding. Which Google Cloud service should they use for the streaming aggregation step?

A.Cloud Functions

B.Dataflow

C.Cloud Dataproc

D.Cloud Composer

AnswerB

Dataflow is designed for streaming and batch pipelines, with native Pub/Sub and BigQuery IOs.

Why this answer

Dataflow is a fully managed service for stream and batch processing that integrates with Pub/Sub and BigQuery. It supports exactly-once processing and low-latency streaming.


15

MCQhard

A company uses BigQuery and wants to reduce query costs by using BI Engine for Looker Studio dashboards. The data is stored in a BigQuery dataset with 5 TB of frequently accessed tables. The dashboards run dozens of concurrent queries. What is the recommended approach to enable BI Engine acceleration?

A.Enable BI Engine by setting the dataset option 'enable_bi_engine=TRUE' in the dataset metadata.

B.Grant all Looker Studio users the 'biengine.user' IAM role on the project.

C.Create a reservation in the Administration panel and assign it to the project.

D.Create materialized views of the tables and connect Looker Studio to the views.

AnswerC

BI Engine requires creating a capacity reservation (memory) in the region where your data resides. This reservation automatically accelerates queries from Looker Studio.

Why this answer

BI Engine is a reserved capacity service that caches data in memory. You must reserve capacity (amount of memory) for a specific BigQuery region, and then BI Engine automatically accelerates queries from Looker Studio and other BI tools. It does not require you to grant specific IAM roles to users (they just need BigQuery permissions) or to create materialized views.

You do not need to enable it per dataset; it works at the project/region level.


16

MCQeasy

You want to quickly estimate the number of distinct visitors to your website from a large BigQuery table. Which function provides an approximate count with low latency?

A.APPROX_COUNT_DISTINCT

B.HyperLogLog++

C.COUNT(DISTINCT)

D.APPROX_QUANTILES

AnswerA

Approximate count with low latency.

Why this answer

APPROX_COUNT_DISTINCT provides an approximate distinct count with low latency. COUNT(DISTINCT) is exact but slower for large data. APPROX_QUANTILES estimates quantiles.

HyperLogLog++ is the name of an algorithm, not a BigQuery function; BigQuery exposes it through the separate HLL_COUNT functions.


17

MCQeasy

You need to track data lineage from a BigQuery table through a series of transformations and into a Vertex AI model training pipeline. Which Google Cloud service provides automated data lineage tracking?

A.Dataplex

B.Dataflow

C.Data Catalog

D.Cloud Composer

AnswerA

Dataplex provides automated data lineage tracking for BigQuery and Vertex AI pipelines.

Why this answer

Dataplex includes data lineage tracking that captures metadata about data movement and transformations across BigQuery, Vertex AI, and other services. Data Catalog provides metadata but not automated lineage.


18

MCQeasy

Which BigQuery function can be used to retrieve the value of a column from the previous row within a partition, ordered by a timestamp?

A.LAG()

B.FIRST_VALUE()

C.ROW_NUMBER()

D.LEAD()

AnswerA

Correct: LAG() accesses data from a previous row.

Why this answer

LAG() is a window function that returns the value of a column from a row that is a specified number of rows before the current row within the partition. LEAD() retrieves from a following row.
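As a rough illustration of what LAG() and LEAD() return, the following Python sketch reproduces their behavior over rows partitioned by a key and ordered by timestamp. The function `lag_and_lead` and the sample events are hypothetical; None stands in for the SQL NULL returned when no previous or following row exists.

```python
from collections import defaultdict


def lag_and_lead(rows, offset=1):
    """Return (key, ts, value, lag_value, lead_value) tuples.

    rows is an iterable of (partition_key, timestamp, value) tuples.
    Rows are partitioned by key and ordered by timestamp within each partition.
    """
    partitions = defaultdict(list)
    for key, ts, value in rows:
        partitions[key].append((ts, value))

    out = []
    for key in sorted(partitions):
        ordered = sorted(partitions[key])  # order by timestamp
        for i, (ts, value) in enumerate(ordered):
            # LAG looks `offset` rows back, LEAD looks `offset` rows ahead.
            prev_value = ordered[i - offset][1] if i - offset >= 0 else None
            next_value = ordered[i + offset][1] if i + offset < len(ordered) else None
            out.append((key, ts, value, prev_value, next_value))
    return out


events = [("u1", 1, 10), ("u1", 2, 20), ("u1", 3, 30), ("u2", 1, 5)]
result = lag_and_lead(events)
for row in result:
    print(row)
```

The first row of each partition has no previous row, so its lag value is None, and the lookup never crosses from one partition into another.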


19

MCQeasy

A company is using Looker to explore their BigQuery data. They have defined a LookML model with an 'explore' that joins two views: 'orders' and 'customers'. The join is a left join. They want to ensure that only customers with orders are shown when exploring. Which LookML parameter should they modify?

A.Set the 'required_joins' parameter on the 'orders' view

B.Use a derived table with a WHERE clause

C.Add a filter in the 'customers' view to exclude nulls

D.Change the join type from 'left_outer' to 'inner'

AnswerD

Inner join will only include customers that have at least one order.

Why this answer

In LookML, a join's 'type' parameter sets the join type (the default is left_outer). Changing it to 'inner' excludes customers without orders.


20

MCQmedium

You need to build a Looker model that joins multiple tables from BigQuery. Which LookML object defines the relationship between tables?

A.JOIN

B.VIEW

C.MODEL

D.EXPLORE

AnswerD

An explore defines the join relationships between views.

Why this answer

In LookML, an EXPLORE defines the starting point and joins to other views. A VIEW defines a single table or derived table. A MODEL contains explores and views.

A JOIN is not a top-level object; joins are defined inside explores.


21

MCQmedium

Your Looker dashboard uses a BigQuery connection. You notice that some queries take over a minute. Which service can you enable to cache results in memory for sub-second Looker queries?

A.BigQuery BI Engine

B.Cloud SQL

C.Cloud Bigtable

D.Cloud Memorystore

AnswerA

BI Engine caches BigQuery data in memory for sub-second queries from BI tools like Looker.

Why this answer

BI Engine is an in-memory analysis service that accelerates BigQuery queries by caching data in memory. It integrates with Looker and Looker Studio. Cloud Memorystore is a Redis/Memcached cache, not directly for BigQuery.

BigQuery BI Engine is the correct service.


22

MCQmedium

A company uses Vertex AI Workbench notebooks for data exploration and model development. They want to ensure that the notebook environment can access BigQuery data using the same permissions as the user's Google Cloud account. What is the recommended setup?

A.Use a Cloud Functions proxy to authenticate to BigQuery from the notebook.

B.Create a service account with BigQuery access and attach it to the notebook instance.

C.Log in to the notebook using the user's Google Cloud credentials via oauth2client.

D.Grant the Compute Engine default service account BigQuery access.

AnswerB

Best practice: use a service account for consistent, granular permissions that are not tied to a specific user.

Why this answer

Vertex AI Workbench notebooks can use user-managed notebooks with user credentials. By setting the 'User-managed notebook' type and using the 'Add service account' option, you can grant the notebook instance a service account with appropriate BigQuery permissions. Alternatively, you can use the built-in 'Use the same identity as the user' option which uses the user's credentials via OAuth.

However, the recommended approach for production is to use a service account for consistent permissions.


23

MCQmedium

A data engineer needs to query data from BigQuery and another cloud provider's storage (AWS S3) using a single SQL query. The data must not be moved or copied to GCP. Which Google Cloud service should they use?

A.Cloud Storage Transfer Service

B.Dataplex

C.BigQuery Data Transfer Service

D.BigQuery Omni

AnswerD

BigQuery Omni enables cross-cloud queries across AWS and Azure without data movement.

Why this answer

BigQuery Omni allows querying data across multiple clouds (AWS S3, Azure Blob Storage) using BigQuery's interface without moving data. BigQuery Omni runs compute in the other cloud's region. BigQuery Transfer Service moves data into BigQuery.

Dataplex is for data management, not cross-cloud queries. Cloud Storage Transfer Service is for moving data between clouds.


24

MCQhard

You are building a real-time fraud detection system using BigQuery streaming and a BQML logistic regression model. The model must be retrained every hour with new labeled data. What is the MOST cost-effective approach to serve predictions with low latency?

A.Call ML.PREDICT on a BigQuery table that is updated every hour

B.Use a BigQuery materialized view that refreshes every minute and apply ML.PREDICT

C.Stream data into Pub/Sub and use a Dataflow pipeline with Apache Beam's model inference

D.Export the model to a Cloud Storage bucket and deploy it to AI Platform Prediction

AnswerD

Exporting to AI Platform Prediction provides low-latency serving with autoscaling, cost-effective for hourly retraining.

Why this answer

ML.PREDICT runs as a BigQuery query job, which suits batch scoring better than per-transaction, low-latency serving. Exporting the model to Cloud Storage and deploying it to AI Platform Prediction provides an autoscaling online endpoint, and each hourly retrain only requires exporting and deploying a new model version.

Applying ML.PREDICT through a materialized view is either unsupported or inefficient.


25

MCQhard

You are using BigQuery ML to train a matrix factorization model for a recommendation system. The training data consists of user-item interactions. You notice that the model is overfitting. Which of the following hyperparameter changes would most likely reduce overfitting?

A.Increase w_reg (regularization weight) from 0.1 to 0.5

B.Decrease w_reg (regularization weight) from 0.1 to 0.01

C.Increase num_factors from 10 to 20

D.Increase num_training_iterations from 10 to 20

AnswerA

Increasing regularization penalizes large weights and reduces overfitting.

Why this answer

Increasing the L2 regularization weight (w_reg) penalizes large weights and reduces overfitting. Increasing the number of factors (num_factors) adds model capacity, which tends to worsen overfitting, and training for more iterations gives the model more opportunity to fit noise. Decreasing w_reg weakens the penalty.
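The shrinking effect of an L2 penalty can be shown with a deliberately tiny example. The sketch below is not matrix factorization and does not use BigQuery ML; it fits a one-weight linear model without intercept, where minimizing sum((y - w*x)^2) + l2 * w^2 has the closed form w = sum(x*y) / (sum(x*x) + l2). The function name and data are hypothetical.

```python
def ridge_weight(xs, ys, l2):
    """Closed-form weight for one-feature ridge regression without intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + l2)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # exactly y = 2x, so the unregularized weight is 2.0

weights = [ridge_weight(xs, ys, l2) for l2 in (0.0, 0.1, 0.5, 5.0)]
for l2, w in zip((0.0, 0.1, 0.5, 5.0), weights):
    print(l2, round(w, 4))
```

Each increase in the penalty pulls the weight further toward zero. The same mechanism limits how closely a larger model can fit noise in its training data.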


26

Multi-Selectmedium

A data engineer needs to build a real-time dashboard in Looker Studio that displays live sales data from BigQuery. The dashboard must refresh every minute. The underlying BigQuery table is updated continuously via streaming inserts. Which two approaches can reduce query cost and latency? (Choose TWO)

Select 2 answers

A.Use the BigQuery Data Transfer Service to copy data to a separate dataset

B.Create a materialized view that pre-aggregates the data

C.Schedule a script to export the table to Cloud Storage and load into Cloud SQL

D.Use BigQuery BI Engine to accelerate the Looker Studio queries

E.Partition the BigQuery table by the date column

AnswersD, E

BI Engine provides in-memory analysis for sub-second query response.

Why this answer

BI Engine can accelerate queries in Looker Studio by caching data in memory. Partitioning the table on the date column reduces the amount of data scanned.


27

MCQmedium

A data scientist is using AutoML Tables to build a classification model for predicting customer churn. The dataset is highly imbalanced (only 1% churn). Which strategy should they use to handle the class imbalance within AutoML Tables?

A.Manually apply SMOTE to the training data before uploading to AutoML Tables.

B.No action needed; AutoML Tables automatically handles class imbalance by adjusting class weights.

C.Enable the 'enable_class_imbalance_handling' flag during training.

D.Set the 'class_weight' parameter in the AutoML Tables training configuration to 'balanced'.

AnswerB

Correct.

Why this answer

AutoML Tables automatically computes class weights and applies them during training to handle imbalanced data. You do not need to apply SMOTE manually or set any extra flag or parameter; AutoML Tables handles it out of the box.


28

Multi-Selecteasy

A data engineer is using Vertex AI Workbench to develop a custom ML model. They want to store and version datasets, track experiments, and register models. Which three Vertex AI services should they use? (Choose THREE)

Select 3 answers

A.Vertex AI Model Registry

B.Vertex AI Dataset

C.Vertex AI Feature Store

D.Vertex AI Matching Engine

E.Vertex AI Experiments

AnswersA, B, E

For registering and versioning models.

Why this answer

Vertex AI Dataset stores and manages datasets. Vertex AI Experiments tracks ML experiments. Vertex AI Model Registry stores and versions trained models.


29

MCQmedium

A data engineer needs to train a linear regression model in BigQuery ML using a table with 10 million rows. The model will predict sales based on features like advertising spend, seasonality, and store location. Which SQL statement should they use to create and train the model?

A.CREATE MODEL mymodel AS SELECT * FROM sales_data WITH LINEAR REGRESSION

B.CREATE MODEL mymodel OPTIONS(model_type='LINEAR_REG') AS SELECT * FROM sales_data

C.CREATE OR REPLACE MODEL mymodel OPTIONS(model_type='LINEAR_REGRESSION') AS SELECT * FROM sales_data

D.CREATE MODEL mymodel OPTIONS(model_type='linear_reg') AS SELECT * FROM sales_data

AnswerB

Correct syntax for linear regression in BigQuery ML.

Why this answer

In BigQuery ML, the CREATE MODEL statement with option MODEL_TYPE='LINEAR_REG' creates a linear regression model. The training data is specified in the AS SELECT clause.
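The shape of the statement can be sketched with a small Python helper that assembles the SQL text. This is an illustration only: the helper `build_create_model_sql`, the dataset and table names, and the label column `sales` are hypothetical, and the helper does no input sanitization, so it should not be fed untrusted strings.

```python
def build_create_model_sql(model_name, source_table, label_col, model_type="LINEAR_REG"):
    """Assemble a BigQuery ML CREATE MODEL statement as a string (illustrative only)."""
    return (
        f"CREATE MODEL `{model_name}`\n"
        f"OPTIONS(model_type='{model_type}', input_label_cols=['{label_col}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


# Hypothetical names used purely for illustration.
sql = build_create_model_sql("mydataset.mymodel", "mydataset.sales_data", "sales")
print(sql)
```

The options come before the AS SELECT clause, and the model type is passed as a string value inside OPTIONS rather than as a trailing keyword.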


30

MCQeasy

A data analyst wants to rank products by sales within each category. They need to assign a unique rank to each product, with no gaps in the ranking numbers (i.e., ties should have different ranks). Which window function should they use?

A.NTILE()

B.ROW_NUMBER()

C.RANK()

D.DENSE_RANK()

AnswerB

ROW_NUMBER() assigns a unique sequential number to each row, so ties get different ranks without gaps.

Why this answer

ROW_NUMBER() assigns a unique sequential integer to each row within a partition, starting at 1, regardless of ties. RANK() would give the same rank to ties and skip numbers, so it would introduce gaps.


31

MCQmedium

A machine learning engineer needs to deploy a custom TensorFlow model for online predictions with low latency. The model is already trained and saved in SavedModel format. Which Vertex AI service should they use?

A.Vertex AI Workbench

B.Vertex AI Prediction

C.Vertex AI Feature Store

D.Vertex AI AutoML

AnswerB

Correct: Vertex AI Prediction provides model serving endpoints.

Why this answer

Vertex AI Prediction allows you to deploy custom models (including TensorFlow SavedModel) to an endpoint for online predictions. It supports autoscaling and low-latency serving.


32

MCQhard

Your team uses Looker to develop a model on top of BigQuery. The data is partitioned by ingestion time, and analysts frequently query the last 7 days. However, Looker queries are scanning the entire table, causing high costs. Which single change is the best approach?

A.Enable BI Engine on the BigQuery table to accelerate queries.

B.Apply a LookML access_filter to dynamically filter on the partition column.

C.Create a materialized view that aggregates data daily and point Looker to that view.

D.Use clustering on the order_date column to improve query performance.

AnswerB

Access filters in LookML can be used to restrict queries to a specific partition range, reducing full table scans.

Why this answer

The single best approach is to apply a partition filter requirement in LookML and enable partition pruning in BigQuery. Other options are not directly about Looker or are suboptimal.


33

MCQmedium

A data engineer wants to train a linear regression model in BigQuery ML to predict sales. The training data includes a categorical feature with 1000+ unique values. Which method is most appropriate to handle this feature in the CREATE MODEL statement?

A.Set max_categorical_features=100 in the model options.

B.Use TRANSFORM clause with ML.FEATURE_CROSS or manual hashing.

C.Use the OPTIONS(ENCODE='ONE_HOT_ENCODING') parameter in the model options.

D.The model automatically handles high-cardinality features without any additional steps.

AnswerB

TRANSFORM allows custom feature engineering including hashing for high-cardinality features.

Why this answer

BigQuery ML automatically one-hot encodes categorical features with fewer than a threshold of unique values. For high-cardinality features, you can use TRANSFORM to apply feature engineering like hashing or bucketizing.


34

Multi-Selecthard

A company uses AutoML Tables to train a classification model. They want to improve model performance by engineering new features from existing timestamp columns. Which three techniques can they apply within AutoML Tables? (Choose 3)

Select 3 answers

A.Manually add a column with a boolean indicating if the timestamp falls on a weekend.

B.Create a new column with the difference between two timestamps.

C.Use the 'feature engineering' option to add polynomial features.

D.Apply a SQL UDF in the AutoML Tables training configuration.

E.Extract day of week from timestamp using the AutoML Tables UI.

AnswersB, C, E

You can precompute this in the source data and include it as a feature.

Why this answer

AutoML Tables automatically extracts features from timestamp columns, such as day of week, month, hour, etc. Users can also manually create new columns via the UI or by preprocessing data before import. However, manual SQL functions cannot be used directly within AutoML Tables.


35

Multi-Selectmedium

A data engineer needs to build a feature engineering pipeline using Vertex AI Pipelines. The pipeline should preprocess data, train a model, and deploy it. Which two components are required to define the pipeline? (Choose 2)

Select 2 answers

A.Kubeflow Pipelines SDK

B.TensorFlow Extended (TFX)

C.Vertex AI Feature Store

D.Dataflow

E.Cloud Composer

AnswersA, B

The SDK is used to define pipeline components and the pipeline graph.

Why this answer

Vertex AI Pipelines uses the Kubeflow Pipelines SDK (or TFX) to define components and compile them into a pipeline. The pipeline is then run on Vertex AI Pipelines.


36

MCQmedium

You need to create a time-series forecast for inventory demand using BigQuery ML. The data includes daily sales for 5 years. Which model type should you use?

A.K-means

B.Linear regression

C.ARIMA_PLUS

D.Matrix factorization

AnswerC

ARIMA_PLUS is designed for time-series forecasting in BQML.

Why this answer

BigQuery ML supports ARIMA_PLUS for time-series forecasting. Linear regression, k-means, and matrix factorization are not appropriate for time-series forecasting.


37

MCQmedium

A data engineer needs to build a LookML model in Looker to define business logic and relationships for a new dataset. They want to create an 'explore' that joins an 'orders' view with a 'customers' view. Where should they define this join?

A.In the Looker admin panel, under 'Joins', create a new join.

B.In the model file, within the 'explore' definition, add a 'join' parameter.

C.In the 'orders.view.lkml' file, add a 'join' parameter.

D.In the 'customers.view.lkml' file, add a 'join' parameter.

AnswerB

Correct. The explore definition in the model file (or an explore file) contains the join logic linking views.

Why this answer

In LookML, the explore file (or the explore definition within a model file) is where you define which views to include and how they join together. The view files (*.view.lkml) define the dimension and measure logic for a single table or derived table. The model file (*.model.lkml) ties everything together and defines explores.


38

MCQmedium

A retailer wants to use machine learning to predict customer churn based on transaction history and demographic data. The dataset has 500 features, many of which are correlated. The data is highly imbalanced: only 2% churn. They need to deploy a model that provides feature importance and is interpretable. Which model type should they use in BigQuery ML?

A.Logistic regression

B.AutoML Tables model

C.Deep Neural Network (DNN) classifier

D.Boosted tree classifier

AnswerD

Boosted trees handle imbalanced data, provide feature importance, and are reasonably interpretable.

Why this answer

Boosted tree models (like XGBoost) handle imbalanced data well, provide feature importance, and offer interpretability. Deep Neural Networks are less interpretable. Logistic regression is interpretable but may not capture complex patterns.

AutoML Tables is powerful but less interpretable and may cost more.


39

Multi-Selecthard

A company wants to use BigQuery ML to train a time-series forecasting model on historical sales data. The data is recorded daily for 3 years. They need to evaluate model accuracy using time-series aware cross-validation. Which two options should they configure in the CREATE MODEL statement? (Choose TWO)

Select 2 answers

A.Set the data_frequency parameter to 'daily'

B.Use the 'num_trials' parameter for hyperparameter tuning

C.Specify a time_series_timestamp_col and time_series_data_col

D.Set the 'split_method' to 'time_series'

E.Set the model_type to 'ARIMA_PLUS'

AnswersC, E

These are required columns for time-series.

Why this answer

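An ARIMA_PLUS model requires model_type set to 'ARIMA_PLUS' together with time_series_timestamp_col and time_series_data_col. Optional settings include 'horizon' (forecast length) and 'data_frequency' (auto-detected unless set). Cross-validation is not built in for ARIMA_PLUS; instead, you evaluate on held-out later periods.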


40

MCQeasy

A data engineer needs to create a BigQuery ML model for predicting customer churn using a dataset with 10 million rows and 50 features. The dataset is highly imbalanced (5% churn). Which approach should the engineer use to handle class imbalance during model training?

A.Undersample the majority class before training

B.Use the CREATE MODEL statement with CLASS_WEIGHTS = {'0': 0.2, '1': 0.8}

C.Use SMOTE via TRANSFORM clause in BigQuery ML

D.Oversample the minority class by duplicating rows

AnswerB

BigQuery ML supports class weights to handle imbalance by assigning higher weights to the minority class.

Why this answer

BigQuery ML supports class weights for imbalanced datasets via the CLASS_WEIGHTS option in CREATE MODEL. This assigns higher weight to the minority class without generating synthetic data. SMOTE is not available in BigQuery ML.

Undersampling the majority class would lose data, and oversampling with duplication could introduce bias.
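To show how class weights counter imbalance, the sketch below computes weights with a common inverse-frequency heuristic, n_samples / (n_classes * class_count). This heuristic is an assumption chosen for illustration; the source does not say how BigQuery ML derives its weights, and the function name and label data are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} using n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


labels = [0] * 95 + [1] * 5  # 5% positive class, as in the question
weights = balanced_class_weights(labels)
print(weights)
```

With these weights the 5 positive rows carry the same total weight as the 95 negative rows, so the loss no longer rewards a model for simply predicting the majority class.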


41

MCQmedium

You need to split a time-series dataset into training and evaluation sets for a forecasting model. The data is ordered by timestamp. Which splitting technique should you use?

A.Sequential split where training data precedes evaluation data in time.

B.Use k-fold cross-validation with random folds.

C.Stratified split based on the target variable.

D.Random split with 80% training, 20% evaluation.

AnswerA

Sequential split respects the temporal order and prevents leakage.

Why this answer

For time-series data, a random split would leak future information into training. A sequential split (earlier data for training, later for evaluation) is required.
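As a minimal sketch of the sequential split described above, the following Python splits rows at a cutoff date so that every training row precedes every evaluation row. The function name `chronological_split`, the sample data, and the cutoff are hypothetical and chosen only for illustration.

```python
from datetime import date, timedelta


def chronological_split(rows, cutoff):
    """Split (timestamp, value) rows so training strictly precedes evaluation."""
    ordered = sorted(rows, key=lambda r: r[0])
    train = [r for r in ordered if r[0] < cutoff]
    evaluation = [r for r in ordered if r[0] >= cutoff]
    return train, evaluation


# Ten days of made-up daily sales, starting 2024-01-01.
start = date(2024, 1, 1)
daily_sales = [(start + timedelta(days=i), 100 + i) for i in range(10)]

# Everything before the cutoff trains the model; the rest evaluates it.
train, evaluation = chronological_split(daily_sales, cutoff=date(2024, 1, 9))
print(len(train), len(evaluation))
```

Because the split is driven by the timestamp rather than by a random draw, no row from the evaluation period can leak into training.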

42

MCQmedium

A data engineer needs to implement data quality rules and governance policies across multiple data lakes in GCP. They want to automatically discover and catalog data assets, and enforce row-level security. Which service should they use?

A.Security Command Center

B.Dataplex

C.Cloud DLP

D.Dataflow

E.Data Catalog

AnswerB

Dataplex provides data discovery, cataloging, data quality rules, and governance policies; row-level security itself is enforced in BigQuery.

Why this answer

Dataplex provides unified data management including data discovery, cataloging, data quality rules, and policy enforcement (row-level security via BigQuery). Data Catalog is for metadata management and discovery, but Dataplex Universal Catalog includes Data Catalog capabilities. Dataflow is for processing, not governance.

Cloud DLP is for data loss prevention, not row-level security. Security Command Center is for cloud security posture.

43

MCQmedium

A data scientist wants to train a custom TensorFlow model on Vertex AI using a managed Jupyter notebook. Which Vertex AI service should they use to set up a notebook environment with pre-installed deep learning frameworks?

A.Compute Engine with Deep Learning VM

B.Vertex AI Training via custom job

C.Vertex AI Workbench

D.Vertex AI Pipelines

AnswerC

Vertex AI Workbench offers managed, pre-configured Jupyter notebooks with deep learning libraries.

Why this answer

Vertex AI Workbench provides managed Jupyter notebooks with pre-installed deep learning frameworks (TensorFlow, PyTorch, etc.) and easy scaling options. A Deep Learning VM on Compute Engine would require you to set up and manage the notebook environment yourself. Vertex AI Training (custom jobs) is for running training jobs, not interactive notebooks.

Vertex AI Pipelines is for orchestrating ML workflows.

44

MCQeasy

Which BigQuery SQL function returns an approximate count of distinct values in a large column, trading some accuracy for faster execution than COUNT(DISTINCT)?

A.APPROX_QUANTILES

B.COUNT(DISTINCT)

C.APPROX_COUNT_DISTINCT

D.DISTINCT_COUNT

AnswerC

Correct: approximate distinct count with improved performance.

Why this answer

APPROX_COUNT_DISTINCT is a HyperLogLog++ based function that provides an approximate distinct count with standard error of ~1.6%, and is much faster on large datasets.

45

MCQeasy

You want to train a custom TensorFlow model on Vertex AI using a managed Jupyter notebook environment. Which service should you use?

A.Vertex AI Workbench

B.Cloud Datalab

C.Vertex AI Training

D.AI Platform Notebooks

AnswerA

Workbench provides managed notebooks for development and prototyping.

Why this answer

Vertex AI Workbench provides managed Jupyter notebooks with pre-installed frameworks and easy access to Vertex AI services.

46

MCQmedium

You need to analyze customer churn and want to understand the rank of each customer's churn probability within their subscription plan. Which BigQuery window function computes the relative ranking from 1 (highest probability) to N?

A.DENSE_RANK()

B.RANK()

C.NTILE(100)

D.ROW_NUMBER()

AnswerB

RANK() gives the rank within a partition; ties get same rank, next rank skips.

Why this answer

RANK() assigns a rank with gaps after ties; DENSE_RANK() assigns consecutive ranks with no gaps; ROW_NUMBER() assigns unique sequential numbers even to tied rows; NTILE() divides rows into buckets. When a question asks for the "rank" of a row and ties are possible, RANK() is the conventional choice.
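To make the difference between the three numbering functions concrete, here is a small Python sketch that mimics ROW_NUMBER(), RANK(), and DENSE_RANK() over a single partition ordered by descending score. It is an illustration of the semantics only, not BigQuery code; `window_ranks` and the sample probabilities are hypothetical.

```python
def window_ranks(scores):
    """Return (score, row_number, rank, dense_rank) tuples, highest score first."""
    ordered = sorted(scores, reverse=True)
    out = []
    rank = dense = 0
    prev = None
    for i, s in enumerate(ordered, start=1):
        if s != prev:
            rank = i      # RANK jumps to the current row position, leaving gaps
            dense += 1    # DENSE_RANK only advances by one per distinct value
            prev = s
        out.append((s, i, rank, dense))
    return out


# Two customers tie at 0.9, so RANK skips 2 while DENSE_RANK does not.
rows = window_ranks([0.9, 0.7, 0.9, 0.4])
for row in rows:
    print(row)
```

The tied rows share rank 1 under both RANK and DENSE_RANK, but the next row is ranked 3 by RANK and 2 by DENSE_RANK, while ROW_NUMBER never repeats.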

47

Multi-Selectmedium

A data engineer is building a feature store for ML models using Vertex AI Feature Store. The features are computed daily from BigQuery and need to be available for both online predictions (low latency) and offline training. Which two actions must the engineer take? (Choose TWO)

Select 2 answers

A.Deploy a custom TensorFlow model on Vertex AI

B.Use Cloud SQL to store feature metadata

C.Create a BigQuery table for offline serving

D.Create an entity type in the feature store

E.Enable online serving for the feature store

AnswersD, E

Entity types define the logical grouping of features.

Why this answer

Vertex AI Feature Store requires creating an entity type and a featurestore (which serves online and offline). Features are ingested via Dataflow or batch jobs.

48

MCQmedium

You are using Looker to model data from BigQuery. You have a dimension that should be filtered by a user attribute (e.g., user's region). Which LookML concept allows you to apply dynamic row-level security based on user attributes?

A.Custom field

B.Derived table

C.Access filter

D.Required access grant

AnswerC

Access filters dynamically restrict data rows based on user attributes, providing row-level security.

Why this answer

Access filters in LookML allow dynamic filtering based on user attributes from the authentication system. Required access grants are for permissions, not dynamic filtering.

49

MCQeasy

You need to create a Looker model that defines a 'sales' view based on a BigQuery table, with a measure for total revenue. Which LookML object defines the table and dimensions?

A.explore

B.view

C.model

D.dimension

AnswerB

A view in LookML maps to a database table and defines dimensions and measures.

Why this answer

In LookML, a view defines the mapping to a database table (or derived table) and contains dimensions and measures.

50

MCQhard

A data engineer is building a production ML pipeline on Vertex AI. The pipeline must preprocess features (e.g., scaling, encoding) and then train a model. The preprocessing logic must be reusable for serving predictions. Which Vertex AI component should they use?

A.Vertex AI Feature Transform

B.Dataflow with Apache Beam

C.Vertex AI Feature Store

D.Vertex AI Pipelines

AnswerA

Feature Transform provides managed, reusable transformations that can be applied consistently in training and serving.

Why this answer

Vertex AI Feature Transform is a managed service that allows you to define transformations using TFX Transform or BigQuery SQL, which are then applied consistently during training and serving. Vertex AI Pipelines can orchestrate but does not itself provide reusable transformations. Dataflow would require custom code.

Vertex AI Feature Store serves pre-computed features, not transformations.

51

Multi-Selecteasy

You need to implement data quality rules on a Dataplex lake to ensure that critical columns are not null and meet certain constraints. Which two Dataplex features can you use? (Choose TWO)

Select 2 answers

A.Tag Templates

B.Auto Data Quality

C.Data Catalog

D.Data Quality Tasks

E.Data Scan

AnswersB, D

Auto Data Quality automatically profiles data and suggests quality rules.

Why this answer

Dataplex Data Quality Tasks allow you to define and run data quality checks. Dataplex Auto Data Quality automates profiling and monitoring. The other options are not specific to defining quality rules: Data Catalog is for discovery, Tag Templates are for metadata, and Data Scan is the generic resource that runs profiling and quality scans rather than a rule-definition feature in itself.

52

MCQeasy

You need to track the lineage of data in BigQuery, showing how tables are derived from other tables via queries. Which service provides this capability?

A.BigQuery Lineage API

B.Cloud Composer

C.Cloud Data Catalog

D.Dataflow

AnswerA

BigQuery has a built-in lineage API that tracks table dependencies.

Why this answer

BigQuery lineage API and Dataplex lineage both provide data lineage tracking. BigQuery lineage is built-in, while Dataplex extends it across the data lake.

53

MCQhard

A data engineer needs to implement data lineage tracking for a BigQuery data warehouse. They want to automatically capture column-level lineage from ETL jobs run by Dataform and from manual SQL queries executed in the BigQuery console. Which approach meets these requirements?

A.Enable Dataplex Universal Catalog and use the Dataplex Lineage API to capture lineage from both sources

B.Use the BigQuery Lineage API provided by Data Catalog to register lineage manually

C.Use BigQuery's built-in column-level lineage, which automatically tracks lineage for all queries, and query it using INFORMATION_SCHEMA.JOBS

D.Export Dataform execution logs to Cloud Logging and use a custom script to extract lineage

AnswerC

BigQuery automatically captures column-level lineage for all SQL jobs, including Dataform and console queries.

Why this answer

BigQuery's column-level lineage is automatically captured for all queries (including Dataform and console queries) and can be queried via the INFORMATION_SCHEMA.JOBS view. Dataplex lineage requires additional integration, whereas BigQuery lineage is native.

54

MCQhard

A company uses BigQuery Omni to query data stored in AWS S3. They need to join this data with data in BigQuery (GCP). The dataset in AWS is large (10 TB) and frequently updated. Which approach minimizes data movement and cost?

A.Use BigQuery Omni cross-cloud join with the query processed in AWS region

B.Create a materialized view in BigQuery that includes the AWS data

C.Copy the AWS data into BigQuery storage using a scheduled transfer, then join

D.Use a federated query from BigQuery to read the AWS data directly and join in BigQuery

AnswerA

BigQuery Omni processes the join on the AWS side, pulling only the required rows from GCP, minimizing data movement.

Why this answer

BigQuery Omni supports cross-cloud joins by running the query on the cloud where the data resides, using the cross-cloud join feature to pull only the necessary rows from the other cloud.

55

Multi-Selecthard

You are building a time-series forecasting model with BigQuery ML. Which three steps should you perform to properly split the data and evaluate the model? (Choose THREE)

Select 3 answers

A.Evaluate on a holdout set that is later in time than the training set.

B.Use time-series cross-validation with expanding windows.

C.Split the data randomly into training and testing sets.

D.Use a chronological split based on a cutoff date.

E.Use k-fold cross-validation with random folds.

AnswersA, B, D

Testing on future data simulates real-world forecasting.

Why this answer

For time-series, you must maintain temporal order: split chronologically (not randomly), use a cutoff date for training/validation, and evaluate on unseen future data. Cross-validation should be time-series aware (e.g., expanding window). Random split is invalid for time-series.

Using a single train/test split may be insufficient; multiple windows are better.
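The expanding-window idea can be sketched in a few lines of Python. This is an illustration run outside BigQuery ML, under the assumption that observations are already sorted by time and addressed by index; `expanding_window_folds` and its parameters are hypothetical names.

```python
def expanding_window_folds(n_obs, initial_train, horizon):
    """Yield (train_indices, test_indices) pairs with a growing training window."""
    train_end = initial_train
    while train_end + horizon <= n_obs:
        train = list(range(train_end))
        test = list(range(train_end, train_end + horizon))
        yield train, test
        train_end += horizon  # the next fold absorbs the previous test window


# Ten time-ordered observations, four used for the first training window.
folds = list(expanding_window_folds(n_obs=10, initial_train=4, horizon=2))
for train, test in folds:
    print("train up to index", train[-1], "-> test", test)
```

Each fold trains on everything before its test window, so every evaluation uses only data that would have been available at forecast time.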

56

MCQeasy

You need to preprocess tabular data for training a classification model using Vertex AI. The dataset has missing values in numerical columns and categorical columns with high cardinality. Which Vertex AI service provides automated feature engineering and preprocessing as part of the pipeline?

A.Vertex AI Pipelines

B.AutoML Tables

C.Vertex AI Feature Store

D.Vertex AI Workbench

AnswerA

Vertex AI Pipelines orchestrates preprocessing steps such as imputation and encoding as part of an ML pipeline.

Why this answer

Vertex AI Pipelines allows you to build ML pipelines with components for feature engineering, including handling missing values and encoding. Vertex AI Feature Store is for serving features, not preprocessing.

57

MCQhard

You are migrating a large on-premises data warehouse to BigQuery. The data includes sensitive PII columns that must be masked for certain users. Which BigQuery feature can automatically redact PII in query results based on user roles?

A.IAM conditions on tables

B.Authorized views

C.Column-level security with data masking

D.Cloud DLP API

AnswerC

Data masking policy tags can automatically redact PII based on user roles.

Why this answer

BigQuery column-level security with policy tags and data masking can automatically mask sensitive data based on IAM roles. Authorized views require manual creation. Dynamic data masking is part of column-level security.

IAM conditions don't mask data.

58

Multi-Selectmedium

A data scientist wants to use Vertex AI Workbench for exploratory data analysis. Which TWO statements are true about Vertex AI Workbench?

Select 2 answers

A.It is a serverless service that scales to zero when not in use.

B.It supports custom container images for the notebook environment.

C.It can only be used with TensorFlow.

D.It provides a managed JupyterLab environment with pre-installed ML libraries.

E.It includes a built-in SQL query editor for BigQuery.

AnswersB, D

Correct: supports custom containers.

Why this answer

Vertex AI Workbench provides managed Jupyter notebooks with pre-installed ML frameworks. It integrates with BigQuery via the Python client. It does not provide a SQL editor; that's BigQuery.

It supports custom containers. It is not serverless; it runs on Compute Engine VMs.

59

MCQmedium

A company wants to train a machine learning model to predict customer churn using BigQuery ML. The dataset has a severe class imbalance (only 2% churn). Which approach should the data engineer take to handle this imbalance within BigQuery ML?

A.Use SMOTE directly in BigQuery SQL before training

B.Set the AUTO_CLASS_WEIGHTS option to TRUE in the CREATE MODEL statement

C.Create a custom Vertex AI model using TensorFlow and use the class_weight parameter

D.Oversample the minority class using a SQL query that duplicates rows

AnswerB

BigQuery ML's AUTO_CLASS_WEIGHTS option, when set to TRUE, balances the classes by weighting each one in inverse proportion to its frequency.

Why this answer

BigQuery ML supports class weighting in CREATE MODEL, either automatically via AUTO_CLASS_WEIGHTS or explicitly via CLASS_WEIGHTS; both adjust the loss function to penalize misclassifications of the minority class more heavily.
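To see what weighting classes by inverse frequency means numerically, the sketch below computes weights with the common n / (k * count) heuristic. This is an assumption made for illustration; it is not taken from BigQuery ML and may differ from the formula BigQuery ML applies internally. The name `inverse_frequency_weights` is hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


# 2% churn (label 1) versus 98% non-churn (label 0), as in the question stem.
labels = [1] * 2 + [0] * 98
weights = inverse_frequency_weights(labels)
print(weights)
```

With these weights, the two churn rows together carry the same total weight as the 98 non-churn rows, which is why misclassifying a minority example costs more in the loss.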

60

Multi-Selecthard

A data analyst wants to compute the rank of sales per region and also the difference in sales between consecutive months for each region. Which BigQuery analytic functions should they use? (Select TWO)

Select 2 answers

A.RANK()

B.ROW_NUMBER()

C.LAG()

D.LEAD()

E.NTILE()

AnswersA, C

RANK() computes the rank of sales per region.

Why this answer

RANK() computes the rank of rows within a partition. LAG() accesses data from a previous row in the same result set, which can be used to compute differences. ROW_NUMBER() assigns unique sequential integers, not rank.

NTILE() distributes rows into buckets. LEAD() accesses next row, not previous.
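The LAG() pattern for month-over-month differences can be illustrated in plain Python. This sketch partitions by region, orders by month, and subtracts the previous month's sales; the function name `month_over_month` and the sample figures are hypothetical.

```python
from collections import defaultdict


def month_over_month(rows):
    """rows are (region, month, sales); diff is None for a region's first month."""
    by_region = defaultdict(list)
    for region, month, sales in rows:
        by_region[region].append((month, sales))

    out = []
    for region in sorted(by_region):            # one partition per region
        prev = None
        for month, sales in sorted(by_region[region]):   # ordered by month
            diff = None if prev is None else sales - prev
            out.append((region, month, sales, diff))
            prev = sales                        # value that LAG would return next
    return out


sales = [
    ("east", "2024-01", 100),
    ("east", "2024-02", 130),
    ("west", "2024-01", 80),
    ("east", "2024-03", 120),
    ("west", "2024-02", 95),
]
result = month_over_month(sales)
for row in result:
    print(row)
```

As with LAG(), the first row of each partition has no predecessor, so its difference is left empty rather than computed against another region.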

61

Multi-Selectmedium

A company wants to use BigQuery ML to build a recommendation system for movies. The data includes user IDs, movie IDs, and ratings. Which BigQuery ML model types are suitable for this? (Select TWO)

Select 2 answers

A.ARIMA_PLUS

B.k-means

C.AutoML Tables

D.Boosted tree classifier

E.Matrix factorization

AnswersD, E

Boosted trees can be used to predict ratings as a classification problem.

Why this answer

Matrix factorization (via implicit or explicit feedback) is specifically designed for recommendation systems. Boosted tree classifiers can also be used for predicting ratings as a classification problem. AutoML Tables is not a model type in BigQuery ML.

ARIMA_PLUS is for time-series. k-means is for clustering, not recommendations directly.

62

MCQeasy

You have a BigQuery table 'orders' with columns order_id, customer_id, order_amount, and order_date. You need to rank customers by total spend per month, assigning the rank 1 to the highest spender. Which SQL function should you use in a window clause?

A.NTILE()

B.DENSE_RANK()

C.ROW_NUMBER()

D.RANK()

AnswerD

RANK() assigns the same rank to ties and leaves gaps; appropriate for ranking top spenders.

Why this answer

RANK() assigns a rank with gaps for ties, which is appropriate for top-spender ranking. DENSE_RANK() would also put the highest spender at rank 1, just without gaps, and the stem does not say which tie behavior is wanted; RANK() is the conventional choice when ties are possible. ROW_NUMBER() gives unique numbers even for ties.

63

MCQmedium

A financial analytics team uses Looker to explore BigQuery data. They need to allow business users to filter by a custom date range that is not tied to an existing dimension. The date range must be user-input at query time. What is the best approach in Looker?

A.Create an explore with a custom filter field in the Looker UI

B.Use a filter parameter directly on the date dimension

C.Add a dimension with a yesno filter that toggles the date range

D.Create a parameter in LookML using Liquid templating

AnswerD

Parameters allow user input at query time, rendered as filter controls, and can be used in conditions.

Why this answer

Looker uses Liquid templating in LookML to create parameters that render as filter controls at runtime. Users can input values that are then injected into the SQL. Creating a dimension with a yesno filter requires predefined values.

The filter parameter on a dimension only allows selecting from existing values, not arbitrary input.

64

MCQhard

A data scientist is training a binary classification model on an imbalanced dataset (95% negative, 5% positive) using AutoML Tables. Which strategy should they use to handle the class imbalance?

A.Set the budget to a higher value to allow more training on minority class.

B.Use SMOTE in a Dataflow pipeline before importing the data to AutoML Tables.

C.Specify a weight column with higher weights for positive examples in the dataset.

D.Create duplicate copies of the positive class rows to balance the dataset.

AnswerC

AutoML Tables supports a weight column to give more importance to minority class.

Why this answer

AutoML Tables automatically handles class imbalance by applying class weights and downsampling. Users can also specify a weight column explicitly.

65

MCQmedium

A company uses Looker Studio to create dashboards from BigQuery data. They notice that dashboard queries take several seconds to load. They want to improve performance without changing the underlying data or creating materialized views. Which option should they use?

A.Enable BigQuery BI Engine for the project

B.Increase the number of BigQuery slots

C.Switch to Looker instead of Looker Studio

D.Replicate the data to Cloud SQL for faster queries

AnswerA

BI Engine accelerates BI queries by caching data in memory, reducing latency.

Why this answer

BigQuery BI Engine is an in-memory analysis service that accelerates queries from Looker Studio (and other BI tools) by caching data in memory, significantly reducing latency. Replicating data to Cloud SQL would add complexity and may not handle the volume. Using Looker instead of Looker Studio doesn't inherently speed up queries.

Increasing BigQuery slots would help but is more expensive and not as targeted for BI tools.

66

Multi-Selecteasy

You want to query data across Google Cloud and AWS using a single SQL interface without moving data. Which TWO services can you use?

Select 2 answers

A.BigQuery Data Transfer Service

B.Cloud Spanner

C.BigQuery Omni

D.Cloud Data Fusion

E.BigQuery cross-cloud query with Omni

AnswersC, E

BigQuery Omni enables multi-cloud queries.

Why this answer

BigQuery Omni allows querying data in AWS (and Azure) using BigQuery SQL, with the compute running in the cloud where the data resides. Without Omni, BigQuery only queries data held in Google Cloud.

Cloud Spanner and Data Fusion are not for multi-cloud SQL queries across clouds.

67

MCQmedium

A data engineer needs to query data across BigQuery (in Google Cloud) and Snowflake (in AWS) without moving the data. Which service should they use?

A.Dataflow

B.Cloud SQL

C.Vertex AI Feature Store

D.BigQuery Omni

AnswerD

BigQuery Omni supports multi-cloud analytics without data movement.

Why this answer

BigQuery Omni allows querying data across multiple clouds using BigQuery's interface, with compute running in the respective cloud. Data stays in place.

68

MCQmedium

A company wants to use AutoML Tables to build a classification model on a dataset with 100 features and 500,000 rows. They need to deploy the model for online predictions with low latency (<100 ms). Which deployment option should they choose?

A.Export the model as a TF SavedModel and deploy on Cloud Run

B.Deploy the model on AI Platform Prediction

C.Deploy the model to an endpoint in Vertex AI using the AutoML endpoint service

D.Use batch prediction in Vertex AI

AnswerC

Vertex AI provides a managed endpoint for AutoML models with low latency prediction.

Why this answer

AutoML Tables supports online prediction endpoints that are deployed on a dedicated cluster, providing low latency for real-time predictions.

69

MCQmedium

You are building a multi-cloud analytics solution to join data from Google Cloud and AWS S3. You need to query the S3 data using BigQuery without moving it. Which Google Cloud service should you use?

A.Looker

B.Dataproc

C.BigQuery Data Transfer Service

D.BigQuery Omni

AnswerD

BigQuery Omni enables cross-cloud analytics by querying data in AWS S3 and Azure without moving it.

Why this answer

BigQuery Omni allows querying data stored in AWS S3 and Azure Blob Storage using BigQuery SQL, without data movement. BigQuery Omni runs in the cloud provider's region.

70

MCQeasy

You have a BigQuery table with sales data and want to pivot product categories into columns. Which SQL clause should you use?

A.UNPIVOT

B.PIVOT

C.ARRAY_AGG with CROSS JOIN

D.STRUCT

AnswerB

PIVOT transforms rows into columns.

Why this answer

PIVOT is the BigQuery SQL operator that rotates rows into columns by aggregating values for each listed category. UNPIVOT does the reverse, rotating columns into rows. ARRAY_AGG and STRUCT are not used for pivoting.
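As a rough illustration of what rotating rows into columns means, the Python sketch below sums amounts per (month, category) and emits one column per category. It is not BigQuery code; `pivot_sum` is a hypothetical helper, and it fills missing combinations with 0 for simplicity, whereas a SQL aggregate would typically return NULL.

```python
from collections import defaultdict


def pivot_sum(rows, categories):
    """Turn (key, category, amount) rows into {key: {category: total}}."""
    totals = defaultdict(lambda: dict.fromkeys(categories, 0))
    for key, category, amount in rows:
        if category in totals[key]:   # ignore categories not listed as columns
            totals[key][category] += amount
    return dict(totals)


rows = [
    ("2024-01", "toys", 10),
    ("2024-01", "books", 5),
    ("2024-02", "toys", 7),
    ("2024-01", "toys", 3),
]
pivoted = pivot_sum(rows, ["toys", "books"])
print(pivoted)
```

The list of categories plays the role of the column list in a pivot: only the named values become columns, and each cell holds the aggregate for that key and category.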

71

Multi-Selectmedium

You need to select two BigQuery features that improve query performance by reducing the amount of data read. Which two options accomplish this? (Choose TWO)

Select 2 answers

A.Clustering on commonly filtered columns

B.BI Engine reservation

C.Materialized views

D.Partitioning on a DATE column

E.Approximate aggregation functions

AnswersA, D

Clustering enables block-level pruning, reducing data read.

Why this answer

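Partitioning and clustering both reduce the data scanned: partitioning prunes whole partitions, and clustering prunes blocks within them. BI Engine caches data in memory but does not change how much of the table a query must read. Materialized views can reduce scans by precomputing results, but they are separate objects rather than a layout of the base table.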

Approximate aggregation reduces computation but not data read.

72

MCQhard

A company uses Dataplex to manage data lakes on Google Cloud. They want to enforce data quality rules on a BigQuery table, such as ensuring that a 'email' column is not null and matches a regex pattern. Which Dataplex feature should they use?

A.Dataplex Universal Catalog

B.Dataplex Lake

C.Dataplex Data Quality

D.Dataplex Data Lineage

AnswerC

Correct: Data Quality is the feature for defining and running quality rules.

Why this answer

Dataplex Data Quality is a feature that allows you to define and run data quality checks on BigQuery tables. You can create a Data Quality Task using the Dataplex UI or API to specify rules like NOT_NULL and REGEX.

73

MCQhard

A data scientist wants to import a pre-trained TensorFlow model into BigQuery ML for batch predictions. The model is stored in a Cloud Storage bucket. Which statement is correct?

A.Use CREATE MODEL with model_type='tensorflow' and model_path='gs://bucket/model'.

B.Use CREATE MODEL with model_type='imported_tensorflow' and model_path='gs://bucket/model'.

C.First upload the model to Vertex AI Model Registry, then reference it in BigQuery ML.

D.Use the ML.IMPORT_MODEL function to load the model into BigQuery.

AnswerA

This is the correct syntax for importing a TensorFlow model.

Why this answer

BigQuery ML supports importing TensorFlow models via CREATE MODEL with model_type='tensorflow' and model_path pointing to the SavedModel directory in Cloud Storage.

74

Multi-Selectmedium

A company stores sensitive customer data in BigQuery. They need to implement column-level security to restrict access to personally identifiable information (PII) columns. Which two BigQuery features can they use together? (Choose TWO)

Select 2 answers

A.BigQuery row-level security

B.BigQuery foreign key constraints

C.BigQuery data masking

D.Authorized views

E.BigQuery column-level access control using policy tags

AnswersD, E

Authorized views can expose a subset of columns to users based on policy tags.

Why this answer

Column-level security can be achieved via policy tags (with Data Catalog) and then authorized views that filter columns based on user roles. Policy tags enforce access control at the column level.

75

Multi-Selecthard

A company uses Dataplex to manage data quality across multiple BigQuery datasets. They need to define data quality rules that check for null values in critical columns and enforce uniqueness constraints. Which two Dataplex features should they use? (Choose TWO)

Select 2 answers

A.Dataplex Lake

B.Dataplex Data Lineage

C.Dataplex Data Quality Rules

D.Dataplex Data Quality Tasks

E.Dataplex Universal Catalog

AnswersC, D

Data Quality Rules allow defining custom checks like null and uniqueness.

Why this answer

Dataplex Data Quality can define rules (including null check and uniqueness) and schedule them as Data Quality Tasks. Data Quality Rules are defined in YAML and can be attached to entities.
